computers/relational databases


link

Hi Peter,

Thanks for your interest in my SQL optimizations. I'm gradually making the review of all frassle pages and timing the queries, so I can speed up the worst offenders and make the system snappier.

The main page was an easy place to start. Its slow query was the one that shows recent update times. Previously, it used a subquery to calculate the latest update time for each row. Something like:

SELECT feed_id, blog_title, (SELECT MAX(note_time) FROM fa_notes n WHERE n.feed_id = f.feed_id) as last_post_time FROM fa_feeds f ORDER BY last_post_time DESC LIMIT 15;

One problem was in the way the fake last_post_time column was being generated. Although the fa_notes table has an index on the feed_id column, PostgreSQL was doing a sequential scan of that table in order to find the maximum (latest) post time. But even with the use of the index on the subquery, it was uselessly slow to calculate the last_post_time for each feed and then throw away all but the top 15.

The solution was fairly simple: instead of recalculating it with every page load, store the last_post_time for each feed in the feeds table. I added this column, populated it once using something like the query above, and then added an update trigger to the fa_notes table. The trigger runs whenever a blog post is inserted or updated, and if that post's time is greater than its feed's last_post_time, the last_post_time is updated.

Lesson: In retrospect, this inefficiency seems obvious. But early in the design of a system, it's hard to know which parts of it will be too slow. Sometimes, the database is smart enough to optimize things that would take any reasonable piece of software 14.7 days. Other times, it does exactly the opposite of the smart thing and your clever design succeeds only in confusing the database (and probably yourself and your colleagues). On the balance, I think it's best to stay flexible and react once you have some actual data:

SELECT COUNT(1) FROM fa_notes;
(wait 5 minutes)
363488

Hmm… I guess optimization is never finished.

link

Funny, I was almost ready to sit down and write a module to do exactly this.

link

by Martin Fowler Pramod Sadalage

link

Here are the slides from a talk about the growth of LiveJournal from single machine hobby to 60-machine supersite. Very technical, very Perl, very interesting.

link

Andrew suggests I should check out these docs to see if there's an easy way to siphon the feeds.scripting.com data out of his oracle DB into a DB of some sort that I can play with. It may be easier to use the feeds.scripting SDK, but we'll see.

link

Full text search. Coming soon to frassle.

link

Describes the nested-set model of trees in SQL, which is how trees should be represented in frassle. I did it the simple adjacency list way though, since it was marginally easier and this is prototype software.