

I have a confession to make: I'm learning Ruby and Rails. And I dig it. It is, as it claims, a great leap forward in productivity compared to complex frameworks like J2EE, Struts, or even ASP.NET. I was always prepared to believe this; I would say the same for the PageKit framework for Perl, which I've been using for 4 years and which has been pretty stable for at least 5. But until recently, I didn't believe the hype that Rails could be much better than PageKit.

After all, both PageKit and Rails had the whole MVC thing going, which is the most important advantage. (MVC stands for Model-View-Controller, a pattern whose incarnation in web apps generally means that you have HTML-like files describing page layout, and Perl (or Ruby or Java or whatever) modules containing program code that provides data for that view.) The difference seemed to be mainly that Rails included an object-relational mapping tool.

I am not a fan of object-relational mapping tools. I consider them a special case (no pun intended… well not originally at least) of CASE tools. CASE stands for Computer-Aided Software Engineering, and is based around the idea that you can write programs to do part of the job of software engineering, and thereby make humans more productive.

CASE tools are bullshit. It's not that productive tools aren't helpful to hackers — they certainly are. But tools that write code necessarily degrade the design of your software in a very severe way. If you can't just write the code in an appropriately uncomplicated way, the solution isn't to have a fancy program produce the complicated code. The solution is to change the language you're using so you can express exactly what you need to express, concisely.

That is basically why I like dynamic languages. They remove a large set of restrictions on how you can redefine the meaning of utterances in the language, so if you decide that looking up the key sasquatch in an associative array has a special meaning anywhere in your program, then dammit, you can have it work in a special way. Easily.
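In Perl, one way to give a single hash key special behaviour is tie. The following is only a sketch: the SasquatchHash class and the sample data are made up for illustration, and it relies on Tie::StdHash from the core Tie::Hash module.

```perl
use strict;
use warnings;

# Hypothetical class: a hash that treats one key specially.
package SasquatchHash;
require Tie::Hash;
our @ISA = ('Tie::StdHash');

# FETCH runs on every read of the tied hash.
sub FETCH {
    my ($self, $key) = @_;
    return 'no confirmed sightings' if $key eq 'sasquatch';
    return $self->SUPER::FETCH($key);
}

package main;

tie my %animals, 'SasquatchHash';
$animals{moose}     = 'common in Maine';
$animals{sasquatch} = 'this value is stored but never returned';

print "$animals{moose}\n";       # ordinary lookup
print "$animals{sasquatch}\n";   # special-cased lookup
```

Code that reads %animals needs no changes; the special case lives entirely in FETCH.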

For those who argue that this enables sloppy code, I have two answers:

  1. With great power comes great responsibility. Not everyone is capable of accepting the responsibility of programming in a wide-open language, and if you think you need the sort of guidance that the Java wizards at Sun can produce (aimed at a target market of all application programmers in the world), you should use Java. I'm not going to say great programmers don't need guidelines or rules, because they do— they are constantly both affecting them and affected by them— but their rules are crafted and evolved around the subtle needs of their partners and customers, not pronouncements from language priests.
  2. The alternatives are still sloppy code or code generation. If your language won't let you say things elegantly, you can either buy/invent a new language that will, or you can write inelegant code.

Anyway… back on topic: Object-Relational Mappers. First a definition: an O-R Mapper is a gadget that bridges the divide between your relational database's idea of the important data types in your application, and your programming system's idea of the important data types in your application. Your DB will be relational (stuff goes in tables, powerful querying is built-in) and normally persistent (saved to disk), while your programming system's data will be object-oriented (data is abstracted within object interfaces, querying has to be custom-built) and normally transient (in memory only).

I have so far been doing my web applications in a pretty traditional way, database-wise: my Perl code contains strings of SQL. Then the Perl code processes the results of the SQL query and munges it into data structures (lists of hashes or something) that my templates can handle. This is OK; I've gotten pretty good at dashing out SQL. But:

  • it's hard to reuse code for queries needed in multiple places
  • Perl may not be the most visually clean language, but it has a style, and dropping a fat SQL string into the middle of it is like carpeting the floor of a rainforest
  • each data structure is specifically designed to match the template where it is used, so changes in what details are displayed require changes to SQL and to SQL-result-processing code
  • there isn't an obvious place to put common functions relating to a certain data type.

Well, that list is exactly what O-R Mappers are supposed to solve. Unfortunately, I thought none of them actually worked, mostly because code generation produces very brittle code and it's not OK for code that depends on my (rapidly evolving) database design to be brittle.

Thanks to Rails' ActiveRecord, I now realize O-R mappers aren't doomed. ActiveRecord creates the mapping at runtime, which is possible because it's running in a dynamic language, all of which means that I can add a column to some table and then just program with it. And because things are just expected to follow the same naming convention everywhere, there's no need for bloated configuration files that tell you totally obvious crap, e.g. that the DB's people table's birthday column goes into the Person object's birthday field. And you can still write SQL if you must, though AR will do most of the joins you need.
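The runtime-mapping idea can be roughly illustrated in Perl with AUTOLOAD. This is not how ActiveRecord is implemented; it is a small sketch in which the Person class and the sample row are hypothetical, and the row is a plain hash standing in for something fetched from the people table.

```perl
use strict;
use warnings;

# Hypothetical record class: accessors come from whatever columns the row has.
package Person;

our $AUTOLOAD;

sub new {
    my ($class, %row) = @_;
    return bless { %row }, $class;
}

# Called for any method that is not defined, e.g. $person->birthday.
sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $column = $AUTOLOAD) =~ s/.*:://;    # strip the package name
    return if $column eq 'DESTROY';          # ignore object teardown
    die "No such column: $column\n"
        unless ref($self) && exists $self->{$column};
    $self->{$column} = $args[0] if @args;    # optional setter
    return $self->{$column};
}

package main;

# In a real mapper this hash would come from a row of the people table.
my $person = Person->new(name => 'Ada', birthday => '1815-12-10');
print $person->birthday, "\n";
```

Adding a column to the row makes a new accessor available with no generated code and no configuration file, which is the property praised above.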

This O-R stuff isn't as huge a productivity gain as moving from J2EE to anything dynamic and MVC. So the PageKit to Rails transition isn't as huge as most transitions to Rails. But the benefit, though marginal, is significant. Consider me impressed.

[Of course, if someone had done this in Perl it would have been faster! —ed.]

link

My frassle activity has been light lately. I took a small vacation and didn't think much about blogging. Instead, I ate well, slept soundly, and enjoyed the warm (compared to Boston) Cincinnati weather. When I wasn't fully engaged in laziness or family, I was either moving furniture with my California-bound friend Coleman, or programming Javascript.

Yesterday I told Ingo Muschenetz that I had been programming a lot of Javascript, and he offered his sympathies. This is a common reaction: "real programmers" look down on scripting languages, and scripters don't even associate with the losers who write Javascript. Of course, with cool apps and new buzzwords, people are once again stunned at what you can do in the browser, and Javascript is reaching a status not unlike illegal immigrant labor: shameful and abused, but vital to the great stuff coming out of California.

But Javascript isn't all that bad. It's actually kinda nice. While its syntax mostly mimics its namesake, Java, it is in spirit just another dynamic functional language. My personal programming style, developed through Perl hacking and Java schooling, is a blend of object-oriented and functional/dynamic styles. Meaning that I use classes to support modularity and hold state, but use a lot of hashes, lists, and closures.*

It turns out Javascript is excellent for this. In Javascript, the hash (a map from arbitrarily-valued keys to values) occupies a position like the list does in Lisp. Every object in Javascript is a hash, including regular arrays. An object is a hash whose values are either variables or function references. myObj.getFoo() is equivalent to myObj["getFoo"]().

I am not without complaints about Javascript. With some syntactic sugar my programs could be 40% shorter and 20% easier to read, as they would be in Perl or Lisp. With a better and more stable spec from ECMA, cross-browser compatibility wouldn't be as hard. But I also respect that Javascript's priority should be coverage and consistency rather than elegance. Considering that the next feasible choice is Microsoft's VBScript, which would make my programs ten times longer and force me to write sort routines instead of novel UI, I'm impressed. That puts me in the corner with Douglas Crockford, who wrote a wonderful exposition of Javascript way back in 2001.

* Footnote: If you're a Lisp hacker, or just someone who thinks Perl is dirty, I want to understand why you feel that way. Perl has all the important functional features I'm aware of, and if I should really be using Lisp instead I'd like to know. Of course you can write hideous crap in Perl, but it gives the programmer amazing leverage over the language, which translates into huge productivity gains. Is it the punctuation? The diversity supported by Perl's huge vocabulary? The hackish user community? Or are you just being lazy?

link

Want Python-style "generators" in Perl? You can always use a closure as a generator. Example:

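sub something {
  # …
  my $sth = $dbh->prepare('SELECT * FROM somewhere');
  $sth->execute;

  # fetchrow_array returns an empty list when the rows run out, which ends
  # the while loop in consume(); fetchrow_arrayref would return undef, and
  # assigning undef to @results still counts as one element (always true)
  consume(sub { $sth->fetchrow_array });
}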

sub consume {
  my($generator) = @_;

  while(my @results = $generator->()) {
    # process @results
  }
}

Closures are the bomb. Of course, this is not a perfect solution. If you just wanted to generate the numbers from 1 to 99, you could pass a closure to generate one number, but it would have to store its state externally. I believe Perl 6 has a yield command that would allow you to do this easily (see a more thorough discussion).
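For the 1-to-99 case, the state can sit in a lexical variable that the closure captures, outside the sub's own body but private to it. A minimal sketch, where make_counter is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields $from..$to, one value per call.
sub make_counter {
    my ($from, $to) = @_;
    my $next = $from;              # state captured by the closure below
    return sub {
        return if $next > $to;     # empty list signals exhaustion
        return $next++;
    };
}

my $generator = make_counter(1, 99);
my $sum = 0;
# List assignment in the condition is false once the generator returns nothing.
while (my ($n) = $generator->()) {
    $sum += $n;
}
print "$sum\n";
```

Unlike a yield-style generator, the closure restarts from the top of its body on every call, so anything more complicated than a counter has to encode its position in captured variables by hand.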

link

The classic talk by Larry Wall.

link

Funny, I was almost ready to sit down and write a module to do exactly this.

link

It's rare that a Perl article teaches me something new, but I must admit that I wasn't aware of what the .. operator does in scalar context. Neato.
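For anyone else who hasn't met it: in scalar context, .. acts as a flip-flop rather than building a list. A small sketch with made-up input lines:

```perl
use strict;
use warnings;

my @lines = ('intro', 'BEGIN', 'alpha', 'beta', 'END', 'outro');
my @block;

for my $line (@lines) {
    # In scalar context, .. is a flip-flop: false until the left test matches,
    # then true up to and including the line where the right test matches.
    if ($line =~ /^BEGIN$/ .. $line =~ /^END$/) {
        push @block, $line;
    }
}

print join(',', @block), "\n";   # BEGIN,alpha,beta,END
```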

link

Perl code to do RSS autodiscovery. This is exactly what I need to add one-click subscribe to frassle's aggregator.

link

Here are the slides from a talk about the growth of LiveJournal from single machine hobby to 60-machine supersite. Very technical, very Perl, very interesting.

link

In his blog today, Andrew Grumet waxes nostalgic about AOLServer. I used AOLServer with ArsDigita Community System 3, and it was clearly the finest environment for building database-backed web applications that I've ever used. It took me a little while to get TCL, but eventually I understood that everything is a string—really, really, everything.

This site you're reading, however, isn't programmed in ACS or its excellent successor OpenACS. It's programmed in Apache::PageKit, which sits on top of Apache's mod_perl. This is the same environment I chose for the Williams Students Online (WSO) website.

PageKit works well for WSO for two reasons: it's simple, and it's Perl. Simplicity is key because as a student organization, WSO has high turnover and it must be easy for new people to contribute. Perl is good because it's a language that most young geeks know or aspire to learn, and can learn quickly thanks to many good online sources. Also, it has an insanely great repository of reusable modules.

I had hoped to use a Java framework at first, but these were so brutally complex and huge I couldn't even set them up, let alone train other students with little programming experience to hack them. That failure brought up two priorities: the web framework had to be simple enough that I could explain it to someone with a little programming experience in 5 minutes and they would get it, and the language had to be something that didn't make such a person recoil in horror.

ACS and OpenACS are wonderful systems but they fail on both counts. To someone who is well educated about relational databases and understands the implications of programming multi-user web/db apps, ACS is beautiful and makes excellent sense; but WSO had only one such person—me. To someone who has just taken a Java data structures course and has never developed any software for users, ACS is unfathomably huge and the strange languages of TCL and SQL are peculiar to mysterious. Experienced geeks who haven't done DB apps, of which WSO had a handful, can usually see SQL as something potentially useful for real-world stuff but tend to view TCL as a passe fad. You need these experienced folks to lead interesting projects, and you want them to be comfortable, so you tend to avoid suggesting any language that elicits a bitter beer face.

PageKit really is a simple and wonderful framework. The essential feature is that your website is split into a Model (program code) and a View (HTML pages). When a URL is hit, PageKit looks for a corresponding subroutine in your Model code. That subroutine is called, and is responsible for setting some values. The View is then rendered; the HTML in each page is mixed with special tags that the server processes and replaces with the values your Model code produced. The Model code is nice, object-oriented Perl, and the View templates are HTML with about 5 special template tags that do IF blocks, loops, and stick values set by the model into your document. It's a pretty good framework, and though it doesn't have the sophisticated user management, package system, or array of software that OpenACS has, it's much much easier to learn and extend.

But Andrew did bring up one point where Perl/PageKit is simply uglier than the AOLServer API: database calls with loops. One of the most common tasks in generating a dynamic web page is executing a SQL query and looping over the results. AOLServer offers the beautiful db_foreach construction:

set title "Late Night With Conan O'Brien"

db_foreach get_matches {
    select description, tvchannel, when_start, when_stop
    from xmltv_programmes
    where title = :title
} {
    do_something_with $title $description $tvchannel
    do_something_else_with $when_start $when_stop
}

In PageKit right now, you'd write this as:

my $title = "Late Night With Conan O'Brien";
my $sth = $model->dbh->prepare('
    select description, tvchannel, when_start, when_stop
    from xmltv_programmes where title = ?
');
$sth->execute($title);
while (my $row = $sth->fetchrow_hashref) {
    # title is not among the selected columns, so use the outer $title
    do_something_with($title, $row->{description}, $row->{tvchannel});
    do_something_else_with($row->{when_start}, $row->{when_stop});
}

Now, this isn't 2 to 8 times as much code as AOLServer; it is only 2 extra lines and a few characters. But all the differences are mere scaffolding, and we should be able to abstract that scaffolding and make the essential feature—running a SQL query and looping over the results—absolutely central. The TCL does some scary stuff though: it takes the columns returned from the query and creates variables in the loop body's namespace with the same names. It also does something really, really useful that Perl's DBI (generic database interface) lacks: its :title syntax specifies that the title variable's value should be used (bound) in the query at that point. In Perl, this is a two-step process: you specify ? in the query where you want a value to be bound, and then when you execute the query, the arguments you pass are filled into the corresponding question marks. The big problem with the Perl approach (used also in Java and Microsoft ADO, and probably others) is that you can easily have a dozen ?s in a SQL statement and it's a pain to match them up in order. We should be able to use names like AOLServer. Andrew probably didn't intend his post this way, but I'm looking at it as a challenge: can I make my environment as good?
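Named binds can be layered on top of DBI's ? placeholders by rewriting the SQL before it is prepared. The following is a minimal sketch, not a DBI feature: expand_named_binds is a hypothetical helper, the channel value is made up, and the regex makes no attempt to skip :words that appear inside quoted SQL strings.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite :name placeholders into DBI-style ? placeholders
# and return the bind values in matching order.
sub expand_named_binds {
    my ($sql, $values) = @_;
    my @binds;
    # Match :name, but not ::casts and not colons inside a word.
    $sql =~ s{(?<![:\w]):([A-Za-z_]\w*)}{
        die "No value for :$1\n" unless exists $values->{$1};
        push @binds, $values->{$1};
        '?';
    }ge;
    return ($sql, @binds);
}

my ($sql, @binds) = expand_named_binds(
    'select description from xmltv_programmes where title = :title and tvchannel = :channel',
    { title => "Late Night With Conan O'Brien", channel => 'NBC' },
);
print "$sql\n";
# The result could then go to $dbh->prepare($sql) and $sth->execute(@binds).
```

Because the values are collected in the order the names appear in the statement, nobody has to count question marks, and a missing name fails loudly instead of shifting every later bind by one.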

Answering the Challenge

Can I answer this challenge? Let me try to state it first:

  1. Make it possible to write Perl that has the same effect and structure as the AOLServer db_foreach statement, without extraneous code.
  2. Package this code as a module so that it's easy for any programmer to use the new construct.
  3. Slight adjustments to accommodate Perl syntax and conventions are OK, but any programmer who knows AOLServer's db_foreach should be able to instantly recognize and accurately understand the corresponding Perl.

Sketch of a Solution

I'll take the following template as my goal. I'd like to have working Perl that looks like this:

my $title = "Late Night With Conan O'Brien";
$dbh->db_foreach 'get_matches' q{
    select description, tvchannel, when_start, when_stop
    from xmltv_programmes where title = :title
} {
    do_something_with($title, $description, $tvchannel);
    do_something_else_with($when_start, $when_stop);
}

Can I do it? Will Perl let me meddle with the language syntax enough to make this kind of crazy business compilable? This post will be continued tonight…


Update 2/27/04: I didn't get to continue this post the night I wrote it. Since then, I've mostly hacked the code I want, but I haven't had a chance to write it up. Meanwhile, Jay has written some code that basically does the looping for you. He also calls poking around in the symbol table to find existing variables "disgusting", a point which I'm both inclined to agree with and find rather irrelevant. As a teaser, I have Perl code that looks and works almost exactly like the TCL code, including grabbing existing variables and support for named bind variables.
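For reference, the "does the looping for you" part needs no syntax tricks at all if the loop body is passed as an anonymous sub. The sketch below is hypothetical: it is neither Jay's code nor the module teased above, it takes positional binds, and it hands each row to the body as a hashref instead of creating variables.

```perl
use strict;
use warnings;

# Hypothetical helper: run a query and call $body once per row,
# passing the row as a hashref keyed by column name.
sub db_foreach {
    my ($dbh, $sql, $binds, $body) = @_;
    my $sth = $dbh->prepare($sql);
    $sth->execute(@$binds);
    while (my $row = $sth->fetchrow_hashref) {
        $body->($row);
    }
    return;
}

# Usage, assuming $dbh is a connected DBI handle:
#   db_foreach($dbh, 'select description, tvchannel from xmltv_programmes
#                     where title = ?', [$title], sub {
#       my ($row) = @_;
#       do_something_with($title, $row->{description}, $row->{tvchannel});
#   });
```

This gets the scaffolding out of the way, but it still falls short of the goal template: the body sees $row->{description} rather than $description, and binds are positional rather than named.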

link

Eclipse (www.eclipse.org) is a very nice extensible IDE. Currently it has especially strong support for Java and C++, but this is a promising project to add some Perl support. Note that since Eclipse is written in Java it is painfully slow on machines under, say, 1 GHz. Text editing oughtn't be CPU-bound, but…
