Tue
21 Dec 2004
3:54 am
Re: frassle.net homepage now speedier
Posted by shimon under computers/relational databases , frassle , kind of writing/tipsNo Comments
Hi Peter,
Thanks for your interest in my SQL optimizations. I'm gradually making the review of all frassle pages and timing the queries, so I can speed up the worst offenders and make the system snappier.
The main page was an easy place to start. Its slow query was the one that shows recent update times. Previously, it used a subquery to calculate the latest update time for each row. Something like:
SELECT feed_id, blog_title, (SELECT MAX(note_time) FROM fa_notes n WHERE n.feed_id = f.feed_id) as last_post_time FROM fa_feeds f ORDER BY last_post_time DESC LIMIT 15;
One problem was in the way the fake last_post_time column was being generated. Although the fa_notes table has an index on the feed_id column, PostgreSQL was doing a sequential scan of that table in order to find the maximum (latest) post time. But even with the use of the index on the subquery, it was uselessly slow to calculate the last_post_time for each feed and then throw away all but the top 15.
The solution was fairly simple: instead of recalculating it with every page load, store the last_post_time for each feed in the feeds table. I added this column, populated it once using something like the query above, and then added an update trigger to the fa_notes table. The trigger runs whenever a blog post is inserted or updated, and if that post's time is greater than its feed's last_post_time, the last_post_time is updated.
Lesson: In retrospect, this inefficiency seems obvious. But early in the design of a system, it's hard to know which parts of it will be too slow. Sometimes, the database is smart enough to optimize things that would take any reasonable piece of software 14.7 days. Other times, it does exactly the opposite of the smart thing and your clever design succeeds only in confusing the database (and probably yourself and your colleagues). On the balance, I think it's best to stay flexible and react once you have some actual data:
SELECT COUNT(1) FROM fa_notes;
(wait 5 minutes)
363488
Hmm… I guess optimization is never finished.
Full Entries RSS