Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.databases.postgresql > #127
| From | Mladen Gogala <no@email.here.invalid> |
|---|---|
| Newsgroups | comp.databases.postgresql |
| Subject | Cool tuning trick |
| Date | 2011-06-09 17:51 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <pan.2011.06.09.17.51.37@email.here.invalid> (permalink) |
I had the following statement to optimize:
select
t.document_id,m.headline,m.author,m.url,m.content,m.stories_like_this,
m.harvest_time,m.valid_time,m.keyword,m.language,
w.category, w.is_subscription, w.location, w.region,
w.registration_url, w.subregion, w.url as mainURL, w.web_site_id,
s.internet_metrics_comscore, s.source_id, s.publisher_id, s.rank,
s.source_name, s.source_type, s.internet_web_print_id, s.media_type,
p.name as publisher_name, p.source_type as publisher_source_type
from
tempids t
join moreover_documents m on ( t.document_id=m.document_id)
join internet_web_sites w on (w.web_site_id= m.internet_web_site_id )
join internet_sources s on (w.source_id = s.source_id)
left outer join internet_source_publishers p on (s.publisher_id =
p.publisher_id)
where
m.created_at between '%s'::TIMESTAMP and '%s'::TIMESTAMP and
w.url ilike ?
limit $limit
The table "tempids" is a temporary table, populated by querying Sphinx,
while the problem table is the "moreover_documents" table. Here is the
problem with the moreover_documents table:
news=# select relname,n_live_tup
from pg_stat_all_tables
where schemaname='moreover' and
relname like 'moreover_documents%';
relname | n_live_tup
-----------------------------+------------
moreover_documents_y2011m07 | 0
moreover_documents_y2010m12 | 18424679
moreover_documents_y2011m05 | 23397022
moreover_documents_y2010m11 | 18155679
moreover_documents | 0
moreover_documents_y2010m10 | 17470210
moreover_documents_y2011m02 | 19910241
moreover_documents_y2011m03 | 23619496
moreover_documents_y2011m01 | 20346488
moreover_documents_y2011m04 | 22158352
moreover_documents_y2011m06 | 6693566
In other words, it's a huge partitioned table, using range partitioning
on a date column, with each partition having 20+ million rows. The table
is indexed using Sphinx and then the returned document ID's are searched
in the large table and joined with a bunch of other tables. The SQL above
was creating endless problems. As you know, Postgres is developed by
former pastry bakers who don't understand the need for hints, so there is
no way to force the particular plan. The "perfect optimizer" is less than
perfect, so the only recourse left is to rewrite the entire SQL. Here is
the trick that made the SQL perform well:
with tmp_mdocs as (
select m.* from
moreover_documents m
join tempids t on (m.document_id=t.document_id)
where m.created_at between '%s'::TIMESTAMP and
'%s'::TIMESTAMP)
select
m.document_id,m.headline,m.author,m.url,m.content,m.stories_like_this,
m.harvest_time,m.valid_time,m.keyword,m.language,
w.category, w.is_subscription, w.location, w.region, w.registration_url,
w.subregion, w.url as mainURL, w.web_site_id,
s.internet_metrics_comscore, s.source_id, s.publisher_id, s.rank,
s.source_name, s.source_type, s.internet_web_print_id, s.media_type,
p.name as publisher_name, p.source_type as publisher_source_type
from tmp_mdocs m
join internet_web_sites w on (w.web_site_id= m.internet_web_site_id )
join internet_sources s on (s.source_id = w.source_id)
left outer join internet_source_publishers p on (p.publisher_id =
s.publisher_id)
where
w.url ilike ?
limit $limit
This works fine. The CTE is executed first, just as it should be and
everything else is joined using indexes. The solution is in the CTE. This
technique can be used in general case, to force the optimizer to the
specific order. This is actually an old trick, advised by mr. Dan Tow in
his SQL tuning book, another thing disliked by some former pastry bakers
turned lousy computer geeks.
--
http://mgogala.freehostia.com
Back to comp.databases.postgresql | Previous | Next | Find similar
Cool tuning trick Mladen Gogala <no@email.here.invalid> - 2011-06-09 17:51 +0000
csiph-web