Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.databases.postgresql > #127

Cool tuning trick

From Mladen Gogala <no@email.here.invalid>
Newsgroups comp.databases.postgresql
Subject Cool tuning trick
Date 2011-06-09 17:51 +0000
Organization A noiseless patient Spider
Message-ID <pan.2011.06.09.17.51.37@email.here.invalid> (permalink)

Show all headers | View raw


I had the following statement to optimize:
select
  t.document_id,m.headline,m.author,m.url,m.content,m.stories_like_this,
  m.harvest_time,m.valid_time,m.keyword,m.language,
  w.category, w.is_subscription, w.location, w.region, 
w.registration_url, w.subregion, w.url as mainURL, w.web_site_id,
  s.internet_metrics_comscore, s.source_id, s.publisher_id, s.rank, 
s.source_name, s.source_type, s.internet_web_print_id, s.media_type,
  p.name as publisher_name, p.source_type as publisher_source_type
  from
  tempids t
  join moreover_documents m  on ( t.document_id=m.document_id)
  join internet_web_sites w on (w.web_site_id= m.internet_web_site_id )
  join internet_sources s on (w.source_id = s.source_id)
  left outer join internet_source_publishers p on (s.publisher_id = 
p.publisher_id)
  where
  m.created_at between  '%s'::TIMESTAMP and '%s'::TIMESTAMP and
  w.url ilike ?
  limit $limit

The table "tempids" is a temporary table, populated by querying Sphinx, 
while the problem table is the "moreover_documents" table. Here is the 
problem with the moreover_documents table:

news=# select relname,n_live_tup
from pg_stat_all_tables
where schemaname='moreover' and 
     relname like 'moreover_documents%';
           relname           | n_live_tup 
-----------------------------+------------
 moreover_documents_y2011m07 |          0
 moreover_documents_y2010m12 |   18424679
 moreover_documents_y2011m05 |   23397022
 moreover_documents_y2010m11 |   18155679
 moreover_documents          |          0
 moreover_documents_y2010m10 |   17470210
 moreover_documents_y2011m02 |   19910241
 moreover_documents_y2011m03 |   23619496
 moreover_documents_y2011m01 |   20346488
 moreover_documents_y2011m04 |   22158352
 moreover_documents_y2011m06 |    6693566


In other words, it's a huge partitioned table, using range partitioning 
on a date column, with each partition having 20+ million rows. The table 
is indexed using Sphinx and then the returned document ID's are searched 
in the large table and joined with a bunch of other tables. The SQL above 
was creating endless problems. As you know, Postgres is developed by 
former pastry bakers who don't understand the need for hints, so there is 
no way to force the particular plan. The "perfect optimizer" is less than 
perfect, so the only recourse left is to rewrite the entire SQL. Here is 
the trick that made the SQL perform well:

with tmp_mdocs as (
select m.* from
moreover_documents m
join tempids t on (m.document_id=t.document_id)
where m.created_at between '%s'::TIMESTAMP and
                           '%s'::TIMESTAMP)
select
  m.document_id,m.headline,m.author,m.url,m.content,m.stories_like_this,
  m.harvest_time,m.valid_time,m.keyword,m.language,
  w.category, w.is_subscription, w.location, w.region, w.registration_url,
  w.subregion, w.url as mainURL, w.web_site_id,
  s.internet_metrics_comscore, s.source_id, s.publisher_id, s.rank,
  s.source_name, s.source_type, s.internet_web_print_id, s.media_type,
  p.name as publisher_name, p.source_type as publisher_source_type
from tmp_mdocs m
  join internet_web_sites w on (w.web_site_id= m.internet_web_site_id )
  join internet_sources s on (s.source_id = w.source_id)
  left outer join internet_source_publishers p on (p.publisher_id = 
s.publisher_id)
  where
  w.url ilike ?
  limit $limit

This works fine. The CTE is executed first, just as it should be and 
everything else is joined using indexes. The solution is in the CTE. This 
technique can be used in general case, to force the optimizer to the 
specific order. This is actually an old trick, advised by mr. Dan Tow in 
his SQL tuning book, another thing disliked by some former pastry bakers 
turned lousy computer geeks.
-- 
http://mgogala.freehostia.com

Back to comp.databases.postgresql | Previous | Next | Find similar


Thread

Cool tuning trick Mladen Gogala <no@email.here.invalid> - 2011-06-09 17:51 +0000

csiph-web