February 2008


Writing a program is fun. Like writing an essay, the endeavor has two kinds of results: stuff that gets written down, and stuff that changes the way you think. Sometimes you’re aiming for the written product, to fulfill some external obligation. But other times, you’re aiming for enlightenment. The end product is irrelevant next to the understanding you developed in making it.

Most computer book authors wouldn’t claim to be writing about enlightenment. They’ll tell you that if you learn their language, framework, or methodology, you’ll win clients and impress coworkers. In this category there are a lot of useful introduction and reference books, along with a lot of buzzword-laden crap. But there’s another genre altogether that doesn’t make these claims; it reflects the geek tradition of explaining interesting things because they’re interesting. Looking back on the experiences that really shaped my thinking as a hacker (a great algorithms course in college, a talk on “Tricks of the Perl Wizards” by Mark-Jason Dominus, Philip Greenspun’s book on web publishing), it’s clear that the most profound things I’ve learned spring from that tradition. Practical Ruby Projects is an exceptional book because it does, too.

Topher Cyll, author of the book, is a friend of mine. In college, Topher and I worked together on a community website for students and spent a lot of time in the same computer lab. In both settings, he was always full of clever and thoughtful ideas. Practical Ruby Projects is full of such ideas, expanded to project form and laid out as an approachable buffet. The projects are indeed eclectic, from computer-generated music to gaming to genetic algorithms, but their common strand is the curiosity they all reward. If you’re one of the unusual folks who maintained this curiosity beyond school, or perhaps if you’re a professor who wants to assemble an intermediate projects course that will appeal to the most curious and passionate of students, this book is for you. With books like this and an open, curious mind, enlightenment might still be unreachable, but at least you’re getting closer.

About two and a half years ago, I wrote a post called How to Design an Interactive RSS Scraper. A scraper is a tool that extracts data from a web page; its most common use is to generate an RSS feed for a blog that doesn’t already have one. While there have been lots of scrapers, most of them focused on automatically figuring stuff out given just a URL. It seemed to me that a fully automatic scraper couldn’t perform reliably across lots of different page styles, but that with a little bit of interactive selection (here’s a date, here’s a title, here’s the story) you could guide the scraper’s initial guesses and make a good feed without much complicated effort.
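To make the interactive idea concrete, here is a minimal sketch of one way it could work. It is not the design from the original post, and it is not how any particular service does it. The assumption is that the user points at one example of each field, the scraper records the tag-and-class path to that example, and every other element on the same path is treated as the same field. All of the names here (`PathCollector`, `learn_path`, `scrape`, the sample `PAGE`) are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; skipped so the path stack stays balanced.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record (path, text) for every non-empty text node.

    A path is a tuple of "tag" or "tag.class" steps from the root
    down to the element that directly encloses the text.
    """

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def collect(html):
    """Parse the page and return its list of (path, text) pairs."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_path(nodes, example_text):
    """Find the path of the element the user pointed at."""
    for path, text in nodes:
        if text == example_text:
            return path
    raise ValueError(f"example not found on page: {example_text!r}")


def scrape(html, examples):
    """Generalize one example per field into a list of feed items.

    Assumes well-formed markup where every item contains every field once.
    """
    nodes = collect(html)
    fields = list(examples)
    columns = []
    for field in fields:
        path = learn_path(nodes, examples[field])
        columns.append([text for p, text in nodes if p == path])
    return [dict(zip(fields, row)) for row in zip(*columns)]


# A made-up page standing in for a blog without a feed.
PAGE = """
<html><body>
<div class="post"><h2 class="title">First post</h2>
<span class="date">2008-02-01</span><p class="story">Hello world.</p></div>
<div class="post"><h2 class="title">Second post</h2>
<span class="date">2008-02-03</span><p class="story">More news.</p></div>
</body></html>
"""

# The "interactive" step: one example of each field, as a user might click it.
items = scrape(PAGE, {
    "title": "First post",
    "date": "2008-02-01",
    "story": "Hello world.",
})
for item in items:
    print(item)
```

The three examples are enough to pull out both posts, which is the point: the user’s selections replace the guesswork a fully automatic scraper would need. A real tool would also have to cope with messy markup, missing fields, and paths that match too much, all of which this sketch ignores.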

I recently found out about Dapper, a scraping service that takes this approach. It works quite well. The UI is pretty nice, and although there are some parts I still can’t figure out, I am able to generate RSS feeds. So if you’re looking for a scraper, try it! Here’s one feed I made with Dapper.