February 2004
Monthly Archive
Fri
27 Feb 2004
4:19 pm
I've been looking for this capability. Zip code and longitude/latitude distance calculations are immensely useful in online community websites.
Zipdy is a program for calculating the distance between two zip codes and finding all the records in an RDBMS with a zip code within x miles of another zip code. Currently, RDBMS support exists for PostgreSQL.
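To make the idea concrete, here is a minimal sketch of the usual way to get a distance from two latitude/longitude pairs: the haversine great-circle formula. This is not Zipdy's code, and I don't know which formula Zipdy uses. The ZIP_CENTROIDS table, its rough coordinates, and the function names are all assumptions made up for illustration; a real system would load centroids from a zip code database and run the filter in SQL.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical lookup table: zip code -> (latitude, longitude).
# The coordinates are rough, illustrative values only.
ZIP_CENTROIDS = {
    "01267": (42.71, -73.20),
    "02138": (42.38, -71.13),
    "10001": (40.75, -73.99),
}


def zips_within(origin_zip, radius_miles, centroids=ZIP_CENTROIDS):
    """Return (zip, distance) pairs within radius_miles, nearest first."""
    lat0, lon0 = centroids[origin_zip]
    hits = []
    for zip_code, (lat, lon) in centroids.items():
        if zip_code == origin_zip:
            continue
        distance = haversine_miles(lat0, lon0, lat, lon)
        if distance <= radius_miles:
            hits.append((zip_code, distance))
    return sorted(hits, key=lambda pair: pair[1])
```

Scanning every row like this is fine for a sketch. With a real table you would probably narrow the candidates with a latitude/longitude bounding box first and compute exact distances only for what's left.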
Thu
26 Feb 2004
9:55 pm
Google Labs has a cool location-based search. But my test proves that it's not up to the most interesting location-based queries. [Via Garrett]
Wed
25 Feb 2004
2:50 pm
One Jon Husband had an interesting response to a call for visions of the future of blogging tools. He seems to be interested in a lot of the same stuff I am: the intersection of knowledge management, social software, weblogs, and new organizational, social, and economic structures.
Tue
24 Feb 2004
4:15 pm
According to Bradsher, internal industry market research concluded that S.U.V.s tend to be bought by people who are insecure, vain, self-centered, and self-absorbed, who are frequently nervous about their marriages, and who lack confidence in their driving skills. Ford's S.U.V. designers took their cues from seeing "fashionably dressed women wearing hiking boots or even work boots while walking through expensive malls." Toyota's top marketing executive in the United States, Bradsher writes, loves to tell the story of how at a focus group in Los Angeles "an elegant woman in the group said that she needed her full-sized Lexus LX 470 to drive up over the curb and onto lawns to park at large parties in Beverly Hills." One of Ford's senior marketing executives was even blunter: "The only time those S.U.V.s are going to be off-road is when they miss the driveway at 3 a.m."
Mon
23 Feb 2004
8:24 pm
In his blog today, Andrew Grumet waxes nostalgic about AOLServer. I used AOLServer with ArsDigita Community System 3, and it was clearly the finest environment for building database-backed web applications that I've ever used. It took me a little while to get TCL, but eventually I understood that everything is a string—really, really, everything.
This site you're reading, however, isn't programmed in ACS or its excellent successor OpenACS. It's programmed in Apache::PageKit, which sits on top of Apache's mod_perl. This is the same environment I chose for the Williams Students Online (WSO) website.
PageKit works well for WSO for two reasons: it's simple, and it's Perl. Simplicity is key because as a student organization, WSO has high turnover and it must be easy for new people to contribute. Perl is good because it's a language that most young geeks know or aspire to learn, and can learn quickly thanks to many good online sources. Also, it has an insanely great repository of reusable modules.
I had hoped to use a Java framework at first, but these were so brutally complex and huge I couldn't even set them up, let alone train other students with little programming experience to hack them. That failure brought up two priorities: the web framework had to be simple enough that I could explain it to someone with a little programming experience in 5 minutes and they would get it, and the language had to be something that didn't make such a person recoil in horror.
ACS and OpenACS are wonderful systems but they fail on both counts. To someone who is well educated about relational databases and understands the implications of programming multi-user web/db apps, ACS is beautiful and makes excellent sense; but WSO had only one such person: me. To someone who has just taken a Java data structures course and has never developed any software for users, ACS is unfathomably huge, and the unfamiliar languages TCL and SQL range from peculiar to mysterious. Experienced geeks who haven't done DB apps, of which WSO had a handful, can usually see SQL as something potentially useful for real-world stuff but tend to view TCL as a passé fad. You need these experienced folks to lead interesting projects, and you want them to be comfortable, so you tend to avoid suggesting any language that elicits a bitter beer face.
PageKit really is a simple and wonderful framework. The essential feature is that your website is split into a Model (program code) and a View (HTML pages). When a URL is hit, PageKit looks for a corresponding subroutine in your Model code. That subroutine is called, and is responsible for setting some values. The View is then rendered; the HTML in each page is mixed with special tags that the server processes and replaces with the values your Model code produced. The Model code is nice, object-oriented Perl, and the View templates are HTML with about 5 special template tags that do IF blocks, loops, and stick values set by the model into your document. It's a pretty good framework, and though it doesn't have the sophisticated user management, package system, or array of software that OpenACS has, it's much much easier to learn and extend.
But Andrew did bring up one point where Perl/PageKit is simply uglier than the AOLServer API: database calls with loops. One of the most common tasks in generating a dynamic web page is executing a SQL query and looping over the results. AOLServer offers the beautiful db_foreach construction:
set title "Late Night With Conan O'Brien"
db_foreach get_matches {
    select description, tvchannel, when_start, when_stop
    from xmltv_programmes
    where title = :title
} {
    do_something_with $title $description $tvchannel
    do_something_else_with $when_start $when_stop
}
In PageKit right now, you'd write this as:
my $title = "Late Night With Conan O'Brien";
my $sth = $model->dbh->prepare('
    select description, tvchannel, when_start,
    when_stop from xmltv_programmes where title = ? ');
$sth->execute($title);  # fills the ? placeholder
while (my $row = $sth->fetchrow_hashref) {
    # title is not in the select list, so pass the bound variable itself
    do_something_with($title, $row->{description},
                      $row->{tvchannel});
    do_something_else_with($row->{when_start},
                           $row->{when_stop});
}
Now, this isn't 2 to 8 times as much code as AOLServer; it is only 2 extra lines and a few characters. But all the differences are mere scaffolding, and we should be able to abstract that scaffolding away and make the essential feature (running a SQL query and looping over the results) absolutely central.
The TCL does some scary stuff though: it takes the columns returned from the query and creates variables in the loop body's namespace with the same names. It also does something really, really useful that Perl's DBI (generic database interface) lacks in portable form: its :title syntax specifies that the title variable's value should be used (bound) in the query at that point. Some DBI drivers accept :name placeholders, but ? is the only style you can count on everywhere. In Perl, binding is therefore a two-step process: you specify ? in the query where you want a value to be bound, and then when you execute the query, the arguments you pass are filled into the corresponding question marks in order. The big problem with the Perl approach (used also in Java and Microsoft ADO, and probably others) is that you can easily have a dozen ?s in a SQL statement and it's a pain to match them up in order. We should be able to use names, as AOLServer does.
Andrew probably didn't intend his post this way, but I'm looking at it as a challenge: can I make my environment as good?
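The named-bind half of the problem is small enough to sketch. The idea is to rewrite each :name in the SQL to a ? and collect the matching values in the same order, so the result can be handed to any positional-placeholder API. This is only an illustration of the technique, written in Python so that it runs with nothing but the standard library. It is not the Perl module this post is working toward, and named_to_positional is a made-up name.

```python
import re

# A colon followed by an identifier. The lookbehind skips the second
# colon of a PostgreSQL-style cast such as price::int.
_NAMED_PLACEHOLDER = re.compile(r"(?<!:):([A-Za-z_][A-Za-z0-9_]*)")


def named_to_positional(sql, values):
    """Rewrite :name placeholders to ? and return (sql, ordered_args).

    Limitation of this sketch: it does not parse SQL, so a :name that
    appears inside a quoted string literal would be rewritten too.
    """
    args = []

    def swap(match):
        # A KeyError here means the query names a value nobody supplied.
        args.append(values[match.group(1)])
        return "?"

    return _NAMED_PLACEHOLDER.sub(swap, sql), args


sql, args = named_to_positional(
    "select description from xmltv_programmes "
    "where title = :title and tvchannel = :tvchannel",
    {"tvchannel": "NBC", "title": "Late Night With Conan O'Brien"},
)
# sql now holds two ? marks, and args holds the values in query order,
# whatever order the dictionary listed them in.
```

Because the values are looked up by name, a query with a dozen placeholders no longer depends on anyone counting question marks.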
Answering the Challenge
Can I answer this challenge? Let me try to state it first:
- Make it possible to write Perl that has the same effect and structure as the AOLServer db_foreach statement, without extraneous code.
- Package this code as a module so that it's easy for any programmer to use the new construct.
- Slight adjustments to accommodate Perl syntax and conventions are OK, but any programmer who knows AOLServer's db_foreach should be able to instantly recognize and accurately understand the corresponding Perl.
Sketch of a Solution
I'll take the following template as my goal. I'd like to have working Perl that looks like this:
my $title = "Late Night With Conan O'Brien";
$dbh->db_foreach 'get_matches' q{
    select description, tvchannel, when_start, when_stop
    from xmltv_programmes where title = :title
} {
    do_something_with($title, $description, $tvchannel);
    do_something_else_with($when_start, $when_stop);
}
Can I do it? Will Perl let me meddle with the language syntax enough to make this kind of crazy business compilable? This post will be continued tonight…
Update 2/27/04: I didn't get to continue this post the night I wrote it. Since then, I've mostly hacked the code I want, but I haven't had a chance to write it up. Meanwhile, Jay has written some code that basically does the looping for you. He also calls poking around in the symbol table to find existing variables "disgusting", a point which I'm both inclined to agree with and find rather irrelevant. As a teaser, I have Perl code that looks and works almost exactly like the TCL code, including grabbing existing variables and support for named bind variables.
Mon
23 Feb 2004
4:56 am
There was a tradesman, a painter called Wayne, who was very interested in making a penny where he could, so he often would thin down paint to make it go a wee bit further.
As it happened, he got away with this for some time, but eventually the Baptist Church decided to do a big restoration job on the painting of one of their biggest buildings. Wayne put in a bid, and because his price was so low, he got the job.
And so he set to erecting the trestles and setting up the planks, and buying the paint and, yes, I am sorry to say, thinning it down with turpentine.
Well, Wayne was up on the scaffolding, painting away, the job nearly completed when suddenly there was a horrendous clap of thunder, and the sky opened, the rain poured down, washing the thinned paint from all over the church and knocking Wayne clear off the scaffold to land on the lawn among the gravestones, surrounded by telltale puddles of the thinned and useless paint.
Wayne was no fool. He knew this was a judgment from the Almighty, so he got on his knees and cried: "Oh, God! Forgive me! What should I do?"
And from the thunder, a mighty voice spoke…
"Repaint! Repaint! And thin no more!"
Sat
21 Feb 2004
2:50 pm
Posted by shimon under frassle
Sun noticed a bug in the "frassle it!" bookmark on the home page: it referred to the wrong host. I've fixed it. Thanks!
Sat
21 Feb 2004
2:32 pm
Is it another Technorati?
Fri
20 Feb 2004
7:55 pm
There's been a lot of talk in the Berkman group about using technology to help voters make more informed decisions. As I was reading this article from The Nation (thanks Jim Moore), I got to thinking: if Dean knew that the mass media were going to turn on him, why didn't he plug his website at every opportunity? Well, it wouldn't have been very useful because it is mostly a rah-rah brochure with lots of padding in between actual position statements.
Let's face it: when a candidate is on TV or in a debate, they give canned, generic responses that I'll call brochure answers. These are designed to make everyone feel good about the candidate, without giving away enough actual policy to turn anyone off. They're fluff.
Now, if a candidate actually did stake out some positions in any detail, we would have a much more honest and straightforward election. It would become feasible to know what policies Mr./Mrs. Whoever actually advocates. Right now the best place to get candidate position information is perhaps from the League of Women Voters' DemocracyNet, but the candidate-supplied statements there are incomplete and vague. They're statements manufactured during the campaign to have the same brochure appeal as all the other garbage we see. DemocracyNet gives you power to compare, but what you're comparing is sales pitches. We'd like to compare facts about political issues.
So how do we compare facts? First, I'd like to see major issues (e.g. "Abortion", "Gun Control") broken down into specific multiple-choice questions ("Roe v. Wade: should we keep or overturn it?", "Assault rifles: allow or ban civilian purchase?"). Each candidate would then be paired with each question, with one of the choices recorded as that candidate's position. Clearly, the reductionist way that these questions are stated and structured makes it difficult to guarantee objectivity. Therefore, the Voter Support System would have to make available the identity of the researchers who selected the issues and interpreted the candidates' positions. As in a good academic paper, all references must be cited. Any visitor to the site should be able to contribute to the vetting process by submitting specific references that support or challenge the validity or applicability of an existing reference. And very importantly, there must be no aggregate or statistical reasoning. The system shouldn't tell me "266 people agree with this statement and 127 disagree," because that doesn't tell me if it's true or not. Majority opinions, especially on a website, do not correlate with factual accuracy and leave open a huge window for abuse.
Here's a mockup:
Viewing position information for CURIOUS GEORGE: Gun Control
Statement of the Issue | Candidate position
1. Should guns be given to Monkeys?
   Choices: yes, no
   George's Position: yes*
   * IMPORTANT! We have received compelling arguments that CURIOUS GEORGE could support either yes or no on this issue. Therefore we have LOW CONFIDENCE in the summary position above.
   Supported by:
   Challenged by:
   » support or challenge our assessments
2. Should machine guns be sold to minors?
   Choices: yes, no
   George's Position: yes
   Supported by:
   Challenged by:
   » support or challenge our assessments
(Hover over a link to get more information on what that link will do.)
The Voter Support System would involve three major roles:
- readers, who view the position information for candidates and offer new resources that support or challenge certain interpretations of a candidate's position
- moderators, who screen out reader suggestions, but only ones that are obviously off-topic or spam
- researchers, who are tasked with verifying the accuracy of reader suggestions and, if accurate, classifying them on the candidate-issue tree
Readers can be anyone, including known biased sources such as competing campaigns; we evaluate specific arguments, not their messengers. Moderators are a mostly administrative function necessitated by the web-based nature of this system, but all their decisions should still be subject to public review. Researchers must be a disciplined group devoted solely to evaluating the accuracy of claims from readers.
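For the technically inclined, the three roles map fairly directly onto a small data model. What follows is a rough sketch, not a design. Every class, field, and status name is invented for illustration, and the rule that any verified challenge triggers the LOW CONFIDENCE flag is my assumption about how the mockup's warning might be driven. Note that it never counts votes: a reference either survives research or it doesn't.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stance(Enum):
    SUPPORTS = "supports"
    CHALLENGES = "challenges"


class Status(Enum):
    SUBMITTED = "submitted"    # offered by a reader, not yet screened
    SCREENED = "screened"      # a moderator found it on-topic, not spam
    REJECTED = "rejected"      # a moderator screened it out
    VERIFIED = "verified"      # a researcher confirmed its accuracy
    INACCURATE = "inaccurate"  # a researcher could not confirm it


@dataclass
class Reference:
    citation: str
    stance: Stance
    submitted_by: str
    status: Status = Status.SUBMITTED


@dataclass
class PositionJudgment:
    candidate: str
    question: str
    choices: tuple
    position: str
    researcher: str  # published, so readers know who made the call
    references: list = field(default_factory=list)

    def _verified(self, stance):
        return [r for r in self.references
                if r.status is Status.VERIFIED and r.stance is stance]

    def supported_by(self):
        return self._verified(Stance.SUPPORTS)

    def challenged_by(self):
        return self._verified(Stance.CHALLENGES)

    def low_confidence(self):
        # Assumed rule: any verified challenge flags the summary position.
        return bool(self.challenged_by())


def moderate(ref, off_topic_or_spam):
    """Moderator step: screen out only obvious junk."""
    if ref.status is not Status.SUBMITTED:
        raise ValueError("only newly submitted references are moderated")
    ref.status = Status.REJECTED if off_topic_or_spam else Status.SCREENED


def research(ref, accurate):
    """Researcher step: check the accuracy of a screened reference."""
    if ref.status is not Status.SCREENED:
        raise ValueError("researchers only see screened references")
    ref.status = Status.VERIFIED if accurate else Status.INACCURATE
```

Keeping moderation and research as separate steps with their own statuses means every decision leaves a record that can be put up for public review.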
The reputation of the voter support system is staked primarily on the quality of its issues breakdown and the objectivity of its research. I wouldn't use a system that told me if candidates were for "tax relief", because "tax relief" is an abstract term with a loaded meaning—what kind of cruel person could be against relief?! The key is to stick to specifics and avoid opinion, and gather measurable data. To initially populate the issue tree with candidate stances, you'd want to do a vast media review. You might want to enslave a large cadre of grad students for this purpose.
Summary
The key assumption here is that voters want to vote based on rational facts, not hype or emotion. This may not be true for all voters, but it's how I'd like to vote, and I think there should be a system to help me do so.
- break down issues into specific, multiple-choice questions
- research positions of each candidate on each specific issue
- publish rationales, not statistics, for all position judgments
- solicit vetting of those judgments from readers
- research the accuracy of reader arguments and publish that research
Or in short, reduce to specific issues in a systematic, verifiable, open way.
Finally, a Solicitation
If you'd like to work with me on a system like this, I can do the technical side of it. But I'd need help with the issue structures and ongoing research. Email me.
Fri
20 Feb 2004
5:58 pm
I've been working a bit recently on a recommendations engine based on the data of Share Your OPML. The SYO database contains the RSS subscription lists of 733 people, who have 81,438 subscriptions to 24,748 distinct feeds. The engine uses your subscriptions list, and those of everyone else in the database, to help you find feeds you'd like but aren't yet subscribed to.
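I haven't described how the engine scores feeds, so treat the following as one plausible baseline rather than what's actually running: count the feeds of people whose subscription lists overlap yours, weighted by the size of the overlap. The function name and the weighting are assumptions for illustration.

```python
from collections import Counter


def recommend(my_feeds, all_subscriptions, top_n=5):
    """Suggest feeds I don't have, favoring people who share feeds with me.

    my_feeds: iterable of feed URLs I subscribe to.
    all_subscriptions: iterable of subscription lists, one per person.
    """
    mine = set(my_feeds)
    scores = Counter()
    for other in all_subscriptions:
        other = set(other)
        overlap = len(mine & other)
        if overlap == 0:
            continue  # nothing in common, so this person tells us nothing
        for feed in other - mine:
            scores[feed] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [feed for feed, _ in ranked[:top_n]]
```

A baseline like this will happily recommend whatever is most popular among the audience as a whole, which is part of why the results are hard to judge.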
This is an area Andrew Grumet, Kingsley Kerce, and I have been talking about for a few weeks. He had the data in convenient form in an RDBMS the day after Dave Winer released the SDK, but last weekend I took a few hours and got it flowing into my own RDBMS so I could play too.
As I've been working on recommendations, I've found that it's really hard to evaluate whether the recommendations are any good. First of all, because feeds.scripting.com ends in, and is advertised mostly on, scripting.com, the audience is biased to include people interested in blogging and technology, liberal politics, and so forth. If you randomly pick a feed from SYO, how likely is it to be something interesting?
- Task: answer this question by building a feature that takes you to a random feed.
Another problem is that it's rather cumbersome to actually look at any of the recommended feeds. SYO only includes XML links, and an obvious improvement would be to spider the linked XML and dig out a website URL. Indeed, Andrew's site very conveniently does this. I'd like to do it or something like it myself; I'm considering perhaps offering a frassle-ized interface to the RSS feed contents rather than a simple link. This would offer a consistent interface to externally-produced content, which is both good and bad.
The next question is, how do you quickly look at a blog that some goofy software has recommended to you and decide whether you'll like it and what it's about? In a perfect world, you'd have enough time to carefully read everything on it, but in real life we're all in a hurry. Here are some easier metrics that I think can be correlated to quality and relevance, or can at least give you a sense of what the feed's about and what kind of burden it will impose on you, the reader:
- titles of posts
- what outside sites are linked from the posts
- category titles, if the feed has categories
- average time between posts – to help you determine how often new posts will knock on your door. Perhaps better presented as average number of new posts per day?
- average number of words in a post (and min, max?)
- what feeds point to the same website URL? Am I subscribed to any of those? (i.e. is this feed just a copy of something I already read?)
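The numeric items in that list are cheap to compute once a feed's posts have been parsed. Here is a small sketch of the posting-frequency and word-count metrics. The shape of the post dictionaries (title, published, text) is an assumption about what a feed parser would hand over, not the format of any particular library.

```python
from datetime import datetime
from statistics import mean


def feed_metrics(posts):
    """Summarize a feed from its posts.

    Each post is a dict with hypothetical keys 'title' (str),
    'published' (datetime) and 'text' (str with markup removed).
    """
    if not posts:
        raise ValueError("need at least one post")
    times = sorted(p["published"] for p in posts)
    span_days = (times[-1] - times[0]).total_seconds() / 86400
    gaps = len(posts) - 1  # intervals between consecutive posts
    words = [len(p["text"].split()) for p in posts]
    return {
        "titles": [p["title"] for p in posts],
        "avg_days_between_posts": span_days / gaps if gaps else None,
        "posts_per_day": gaps / span_days if span_days else None,
        "avg_words": mean(words),
        "min_words": min(words),
        "max_words": max(words),
    }
```

Both frequency figures come from the same two numbers, so it costs nothing to compute both and decide later which one reads better on the page.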
Currently, there is only a development version of my recommendations engine in an undisclosed location, but I'll try to get it up on this site, and public, real soon now.