15 Feb 2008
About two and a half years ago, I wrote a post called “How to Design an Interactive RSS Scraper.” A scraper is a tool that extracts data from a web page; its most common use is to generate an RSS feed for a blog that doesn’t already have one. There have been lots of scrapers, but most of them try to figure everything out automatically, given just a URL. It seemed to me that a fully automatic scraper wouldn’t perform reliably across lots of different page styles, but that a little bit of interactive selection (here’s a date, here’s a title, here’s the story) could guide the scraper’s initial guesses and produce a good feed without much effort.
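To make the idea concrete, here is a minimal Python sketch of the "guided" half of such a scraper. It is only an illustration under some loud assumptions: the user's clicks have already been turned into CSS class names, and the class names (`post-date`, `post-title`, `post-body`), the sample page, and the helper names `HintedScraper`, `scrape`, and `build_rss` are all made up for this example. It is not how Dapper or any other particular service works.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical hints: class names the user pointed at, mapped to RSS fields.
HINTS = {"post-date": "pubDate", "post-title": "title", "post-body": "description"}

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class HintedScraper(HTMLParser):
    """Collects the text of elements whose class matches a user-supplied hint."""

    def __init__(self, hints):
        super().__init__()
        self.hints = hints
        self.items = []    # one dict per scraped post
        self.field = None  # RSS field currently being captured, if any
        self.depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.field is not None:
            self.depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.hints:
                field = self.hints[cls]
                # A repeated field means a new post has started.
                if not self.items or field in self.items[-1]:
                    self.items.append({})
                self.items[-1][field] = ""
                self.field, self.depth = field, 1
                break

    def handle_endtag(self, tag):
        if tag in VOID or self.field is None:
            return
        self.depth -= 1
        if self.depth == 0:
            self.field = None

    def handle_data(self, data):
        if self.field is not None:
            self.items[-1][self.field] += data


def scrape(html, hints):
    parser = HintedScraper(hints)
    parser.feed(html)
    parser.close()
    return parser.items


def build_rss(feed_title, link, items):
    """Builds a bare-bones RSS 2.0 document; dates are copied verbatim, not normalized."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = feed_title
    for item in items:
        node = ET.SubElement(channel, "item")
        for field, text in item.items():
            ET.SubElement(node, field).text = text.strip()
    return ET.tostring(rss, encoding="unicode")


# A made-up page standing in for a blog without a feed.
PAGE = """
<div class="post"><span class="post-date">15 Feb 2008</span>
<h2 class="post-title">First <em>post</em></h2>
<div class="post-body">Hello, world.</div></div>
<div class="post"><span class="post-date">16 Feb 2008</span>
<h2 class="post-title">Second post</h2>
<div class="post-body">More text.</div></div>
"""
```

Calling `build_rss("Example blog", "http://example.com/", scrape(PAGE, HINTS))` returns the feed as a string. The interesting part of a real tool is everything this sketch skips: turning a few clicks into hints like these, and generalizing them when the page's markup is less tidy.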
I recently found out about Dapper, a scraping service that takes this approach. It works quite well. The UI is pretty nice, and although there are some parts I still can’t figure out, I am able to generate RSS feeds. So if you’re looking for a scraper, try it! Here’s one feed I made with Dapper.