computers/programming languages


Over on Reddit, there’s been lots of buzz about Erlang recently. Yet Joel Spolsky didn’t even mention it in a post today about languages for enterprise web apps. This must be because it’s so thoroughly proven, respected, and well-established that you should use it for everything.

After all, I think there are some Swedish phone companies using it for some phone-related apps. And phone-related apps involve zillions of messages per second, so it will definitely be scalable.

Plus I saw this chart where YAWS is red and Apache is green and blue, and red does way better than green and blue. I’m not sure how the test methodology relates to anything you’d actually see in real life, but at least there is quantified evidence that erlang does better than apache at something.

See, the important thing is that you know it’s trustworthy because they made the critical parts so memorable, rather than concentrating on the complex and confusing methodology. After all, what indicates enterprise-readiness better than the existence of an executive summary?

P.S. Someone told me 37signals is developing their next webapp in Erlang. Pass it on!

link

My frassle activity has been light lately. I took a small vacation and didn't think much about blogging. Instead, I ate well, slept soundly, and enjoyed the warm (compared to Boston) Cincinnati weather. When I wasn't fully engaged in laziness or family, I was either moving furniture with my California-bound friend Coleman, or programming Javascript.

Yesterday I told Ingo Muschenetz that I had been programming a lot of Javascript, and he offered his sympathies. This is a common reaction: "real programmers" look down on scripting languages, and scripters don't even associate with the losers who write Javascript. Of course, with cool apps and new buzzwords, people are once again stunned at what you can do in the browser, and Javascript is reaching a status not unlike illegal immigrant labor: shameful and abused, but vital to the great stuff coming out of California.

But Javascript isn't all that bad. It's actually kinda nice. While its syntax mostly mimics its namesake, Java, it is in spirit just another dynamic functional language. My personal programming style, developed through Perl hacking and Java schooling, is a blend of object-oriented and functional/dynamic styles. Meaning that I use classes to support modularity and hold state, but use a lot of hashes, lists, and closures.*

It turns out Javascript is excellent for this. In Javascript, the hash (a map from string keys to arbitrary values) occupies a position like the list does in Lisp. Every object in Javascript is a hash, including regular arrays. An object is a hash whose values are either data or function references. myObj.getFoo() is equivalent to myObj["getFoo"]().
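Since this style grew out of Perl hacking, here is a rough Perl analog of the same idea: an object as a hash whose values are plain data or code references, with a closure holding the state. It is only an illustration; make_counter and its keys are invented for the example.

```perl
use strict;
use warnings;

# A "hash as object": values are plain data or code references.
# make_counter and its keys are invented for this illustration.
sub make_counter {
    my ($start) = @_;
    my $count = $start;          # private state captured by the closures
    return {
        label     => 'clicks',
        increment => sub { return ++$count },
        get_count => sub { return $count },
    };
}

my $counter = make_counter(10);
$counter->{increment}->();
$counter->{increment}->();

# Looking a "method" up by name works much like myObj["getFoo"]() does.
my $method = 'get_count';
print "$counter->{label}: ", $counter->{$method}->(), "\n";
```

The closures are the only way to reach $count, which is how a plain hash ends up holding state the way a class would.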

I am not without complaints about Javascript. With some syntactic sugar my programs could be 40% shorter and 20% easier to read, as they would be in Perl or Lisp. With a better and more stable spec from ECMA, cross-browser compatibility wouldn't be as hard. But I also respect that Javascript's priority should be coverage and consistency rather than elegance. Considering that the next feasible choice is Microsoft's VBScript, which would make my programs ten times longer and force me to write sort routines instead of novel UI, I'm impressed. That puts me in the corner with Douglas Crockford, who wrote a wonderful exposition of Javascript way back in 2001.

* Footnote: If you're a Lisp hacker, or just someone who thinks Perl is dirty, I want to understand why you feel that way. Perl has all the important functional features I'm aware of, and if I should really be using Lisp instead I'd like to know. Of course you can write hideous crap in Perl, but it gives the programmer amazing leverage over the language, which translates into huge productivity gains. Is it the punctuation? The diversity supported by Perl's huge vocabulary? The hackish user community? Or are you just being lazy?

link

This is what I did my undergraduate thesis (2003) on, and now it's showing up in trade publications. I don't think the author, Ramnivas Laddad, knew about my thesis, but perhaps he'd be interested in its large catalog of aspect-oriented refactorings.

link

I've posted my undergraduate computer science thesis on the web, finally. It's on refactoring AspectJ programs. Here's the abstract:

This thesis extends the state of the art in refactoring to Aspect-Oriented Programming. Refactorings are specific code transformations that improve the design of existing code without changing its observable behavior. Aspect-Oriented Programming (AOP) offers a new approach to software design by encapsulating crosscutting concerns. The novel contributions of this thesis are a recasting of existing refactorings to preserve program behavior in aspect-oriented code, and several new refactorings that can improve the design of code by deploying AOP techniques. The refactorings are described in reference to AspectJ, an AOP extension of Java, and are amenable to partial or full automation.

It is necessary to reevaluate existing OO refactorings because the constructs of AOP programming languages significantly affect what changes can be meaning-preserving. To this end, new preconditions and steps are introduced to about 20 fundamental refactorings (from Opdyke, 1992) such as renaming a class and inlining a method. These extended refactorings can form the basis for an AOP-aware refactoring tool.

About thirty new AOP-specific refactorings are proposed. These refactorings include both fundamental refactorings and more complex refactorings built from these that address specific design problems. For example, a simple refactoring possible in AOP is to move the definition of a method from within a class to an aspect. A more complex refactoring that includes this is moving all code responsible for implementing a particular interface into an aspect. The focus is primarily on accomplishing the desired changes once the involved program parts are identified. These new refactorings can form the basis for a truly AOP-focused refactoring tool.

I've been itching for some time to learn Python and Ruby. I doubt they'll displace my unabated adoration for Perl. I love learning new languages, but it's hard to pull myself away from a day of productive Perl hacking to do it. So I propose to make this a group event.

For one day, several hackers will get together with the sole goal of learning some new programming languages. We'll each purchase and bring a book, and spend the first half-day reading about a language we don't yet know. The second half will be spent attempting to implement a cool web application of some sort using that language. Ideally, the hackers' existing knowledge will overlap such that at least one person in the room will know each language being learned.

For example, suppose you had three hackers, A, B, and C. Here would be a pretty good mesh of knowledge and ambition:

Hacker   Knows          Wants to learn
A        Perl, Java     Python, Ruby
B        Python, Perl   Ruby, PHP
C        Ruby, PHP      Perl, Python

I think the collaborative atmosphere would keep us focused, as well as being tons of fun. Requisite infrastructure: a linux server, wifi, couches, and available pizza delivery.

If you'd be interested in doing something like this in the Boston area in the near future, leave a comment on this post.

link

Gregor writes in his weblog about a conversation he had with Prof. Eberhard Hilf:

hilf demonstrated how equations as jotted down by einstein in 1905 would be almost incomprehensible to modern scientists today. over the years, verbose notations have been replaced by increasingly more succinct ones, new symbols have been introduced. i immediately had to think of leaky abstractions. hilf was adamant that physics was not prone to those problems because it is grounded in solid math.

good for them physicists, and too bad computer science cannot claim the same currently.

Hilf may have misunderstood what leaky abstractions are really about. Had he understood, he probably would have seen that physics and other natural sciences have the exact same problem, and that the mathematical rigor he claims is at best equivalent to the formal definition of computer programs and therefore not even relevant to the problem of leaky abstractions.

First, let me explain the problem of leaky abstractions. As originally explained by Joel Spolsky, leaky abstractions are a challenge to software engineers. Much like mathematics and theoretical science, new achievements in software development build on the foundations already in place. These foundations are abstractions that package up the complexity of other tasks. For example, if you are building a program to download a file over a network, you can use the web protocol, HTTP. Then you can choose a program to serve the file from a number of existing applications, and instead of writing the code to connect to the server, follow the rules of the protocol, and write the file to disk yourself, you can simply invoke an existing piece of software that does this. In highbrow engineering circles, this is called reuse and is highly desirable because it saves development time and avoids creating new code that must be debugged and maintained. It also helps to cement existing standards so that software makers can compete on the basis of innovative features rather than "we crash less".

An abstraction becomes leaky when some of the details it claims to handle leak through and become your problem. Continuing the file-fetching example, what happens if the network is down? You depend on some piece of fetch software to get files, which depends on a network protocol to ensure that two computers can communicate reliably, which depends on a network to allow computers to fling bits at each other. If the network can't handle that job right now, it can tell the network protocol. But the protocol can't do anything about a network that's physically disconnected, so it shrugs. The fetch software you invoked can't do anything about a protocol that won't let it connect, so it shrugs. Your program depends entirely on this piece of software, so you shrug. A leak in the bottommost layer of abstractions has sprung through every other layer, and has to be dealt with outside the realm of the automatic. "Plug in your network cable," your computer says. Do you ever get that message when your cable is still plugged in, but your cat has stepped on a power strip and turned off your network hub? Another leak!
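The chain of shrugs can be sketched in a few lines of Perl. This is a toy, not a real networking API: every subroutine name and the URL are made up, and each layer just calls the one below it.

```perl
use strict;
use warnings;

# Each layer wraps the one below it. All names here are hypothetical
# stand-ins, not a real networking API.
my $cable_plugged_in = 0;

sub network_send  { die "network unreachable\n" unless $cable_plugged_in; return 1 }
sub protocol_open { network_send(); return 'connection' }
sub fetch_file    { my ($url) = @_; protocol_open(); return "contents of $url" }

# The top-level program only wanted a file, yet it is the one that has
# to cope with a failure that started three layers down.
my $file = eval { fetch_file('http://example.com/report.txt') };
if (!defined $file) {
    my $error = $@;
    chomp $error;
    print "Could not fetch the file ($error). Plug in your network cable.\n";
}
```

None of the middle layers can do anything useful with the failure, so the error surfaces unchanged at the top, which is exactly the leak.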

You may already see that leaky abstractions can show up outside of computer science. Do business transactions always go as they should? Have you ever come to a restaurant expecting to get a meal, only to find that they couldn't seat your group? Have you tried to drive home from work in the usual 30 minutes only to find that weather or a car crash dragged that out to 2 hours?

Of course, these are all informal abstractions. In Physics, the abstractions are all mathematically defined. A more rigorous abstraction of driving home from work wouldn't leave any room for leaks. Right?

Well, neither exactly wrong nor exactly right. It depends how you look at it. If you're developing the theory alone, you're not going to find that suddenly F = m*a doesn't hold up because, e.g., it isn't defined for a = 3 m/s/s; the requirement that definitions be rigorous prevents that. But if you're trying to develop a theory that accurately describes the interaction of actual physical objects, the classical Newtonian abstraction above breaks down at certain points, like when mass is really really small or you're moving really really fast. (More knowledgeable readers are welcome to correct/improve my Physics.)
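To put a rough number on "really really fast", here is the standard special-relativity correction to momentum, added as a worked illustration rather than anything from the original argument. Force is the rate of change of momentum, and the Newtonian momentum mv picks up a factor gamma that only matters near the speed of light c:

```latex
\[
  F = \frac{dp}{dt}, \qquad
  p_{\text{Newton}} = m v, \qquad
  p_{\text{relativistic}} = \gamma m v, \qquad
  \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}
\]
\[
  v = 0.1\,c \;\Rightarrow\; \gamma = \frac{1}{\sqrt{0.99}} \approx 1.005,
  \qquad
  v = 0.9\,c \;\Rightarrow\; \gamma = \frac{1}{\sqrt{0.19}} \approx 2.294
\]
```

At a tenth of the speed of light the Newtonian value is off by about half a percent; at nine tenths it is off by more than a factor of two. The math of F = m*a is as rigorous as ever, but it has stopped describing the world.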

What we see here are two ways to judge the rigor of an abstraction, which I'll call theoretical rigor and applicative rigor. The mathematical foundations of physics ensure its theoretical rigor, but when applied to the description of nature, we can find failures in applicative rigor. Newton's models, though we call them laws, do not accurately describe everything they were once claimed to describe. And applying these laws to real life situations requires accounting for a number of other factors—wind resistance and its ilk. We could qualify the law by describing the highly idealized world it assumes, but that would take too long. We'll settle for expecting the laws of physics to describe limited, idealized versions of what actually happens in real life.

Now back to computing. Most programming languages require that you write well-defined programs—you can't leave out a step and expect the computer to ask you what to do when it gets there. The language usually provides a sensible default, like doing nothing, but this is a way to compress the notation, not to escape rigorous definition. So programming languages, at least those that have an actual deterministic implementation on a computer, actually enforce the constraints of theoretical rigor at least as well as the Physics research community.

But when we take those theoretical tools and apply them to solve problems, we find many leaky abstractions: broken networks flummox our web browsers; buggy data compression libraries leave security holes open in our servers. Each of these bugs is like a wind resistance we hadn't thought of. We hackers had been assuming a simpler world, and so the model of the world we coded for doesn't exactly correspond with the world we're selling software to.

But that's OK: it happens to Physicists too.

 



link

For the next few days, I will be working on a paper submission for this conference based on my thesis research. So please don't distract me by offering fun things to do. (:

link

In his blog today, Andrew Grumet waxes nostalgic about AOLServer. I used AOLServer with ArsDigita Community System 3, and it was clearly the finest environment for building database-backed web applications that I've ever used. It took me a little while to get TCL, but eventually I understood that everything is a string—really, really, everything.

This site you're reading, however, isn't programmed in ACS or its excellent successor OpenACS. It's programmed in Apache::PageKit, which sits on top of Apache's mod_perl. This is the same environment I chose for the Williams Students Online (WSO) website.

PageKit works well for WSO for two reasons: it's simple, and it's Perl. Simplicity is key because as a student organization, WSO has high turnover and it must be easy for new people to contribute. Perl is good because it's a language that most young geeks know or aspire to learn, and can learn quickly thanks to many good online sources. Also, it has an insanely great repository of reusable modules.

I had hoped to use a Java framework at first, but these were so brutally complex and huge I couldn't even set them up, let alone train other students with little programming experience to hack them. That failure brought up two priorities: the web framework had to be simple enough that I could explain it to someone with a little programming experience in 5 minutes and they would get it, and the language had to be something that didn't make such a person recoil in horror.

ACS and OpenACS are wonderful systems but they fail on both counts. To someone who is well educated about relational databases and understands the implications of programming multi-user web/db apps, ACS is beautiful and makes excellent sense; but WSO had only one such person: me. To someone who has just taken a Java data structures course and has never developed any software for users, ACS is unfathomably huge and the strange languages of TCL and SQL range from peculiar to mysterious. Experienced geeks who haven't done DB apps, of which WSO had a handful, can usually see SQL as something potentially useful for real-world stuff but tend to view TCL as a passé fad. You need these experienced folks to lead interesting projects, and you want them to be comfortable, so you tend to avoid suggesting any language that elicits a bitter beer face.

PageKit really is a simple and wonderful framework. The essential feature is that your website is split into a Model (program code) and a View (HTML pages). When a URL is hit, PageKit looks for a corresponding subroutine in your Model code. That subroutine is called, and is responsible for setting some values. The View is then rendered; the HTML in each page is mixed with special tags that the server processes and replaces with the values your Model code produced. The Model code is nice, object-oriented Perl, and the View templates are HTML with about 5 special template tags that do IF blocks, loops, and stick values set by the model into your document. It's a pretty good framework, and though it doesn't have the sophisticated user management, package system, or array of software that OpenACS has, it's much much easier to learn and extend.
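The flow described above (URL to subroutine, subroutine sets values, template is filled in) fits in a small toy. To be clear, this is not Apache::PageKit's API: the page name, the {{name}} tag syntax, and render_page() are all invented to show the shape of the idea.

```perl
use strict;
use warnings;

# Toy illustration of the Model/View split described above.
# This is NOT Apache::PageKit's API; the page name, the {{name}} tag
# syntax, and render_page() are all made up for the example.

# Model: one subroutine per page, each returning values for the view.
my %model = (
    hello => sub {
        my (%params) = @_;
        return { name => $params{name} || 'world', count => 3 };
    },
);

# View: HTML with placeholder tags to be filled from the model's values.
my %view = (
    hello => '<p>Hello, {{name}}! You have {{count}} new messages.</p>',
);

sub render_page {
    my ($page, %params) = @_;
    my $code = $model{$page} or die "no model code for page '$page'\n";
    my $values = $code->(%params);
    my $html = $view{$page};
    # Replace each {{tag}} with the matching model value, or nothing.
    $html =~ s/\{\{(\w+)\}\}/defined($values->{$1}) ? $values->{$1} : ''/ge;
    return $html;
}

print render_page('hello', name => 'WSO'), "\n";
```

The point of the split is that someone can edit the HTML without reading the Perl, and vice versa, which is what makes this kind of framework explainable in five minutes.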

But Andrew did bring up one point where Perl/PageKit is simply uglier than the AOLServer API: database calls with loops. One of the most common tasks in generating a dynamic web page is executing a SQL query and looping over the results. AOLServer offers the beautiful db_foreach construction:

set title "Late Night With Conan O'Brien"

db_foreach get_matches {
    select description, tvchannel, when_start, when_stop
    from xmltv_programmes
    where title = :title
} {
    do_something_with $title $description $tvchannel
    do_something_else_with $when_start $when_stop
}

In PageKit right now, you'd write this as:

my $title = "Late Night With Conan O'Brien";
my $sth = $model->dbh->prepare('
    select description, tvchannel, when_start,
    when_stop from xmltv_programmes where title = ? ');
$sth->execute($title);  # $title is bound to the ? placeholder
while (my $row = $sth->fetchrow_hashref) {
    do_something_with($title, $row->{description},  # title is not selected
        $row->{tvchannel});
    do_something_else_with($row->{when_start},
        $row->{when_stop});
}

Now, this isn't 2 to 8 times as much code as AOLServer; it is only 2 extra lines and a few characters. But all the differences are mere scaffolding, and we should be able to abstract that scaffolding away and make the essential feature (running a SQL query and looping over the results) absolutely central. The TCL does some scary stuff though: it takes the columns returned from the query and creates variables in the loop body's namespace with the same names. It also does something really, really useful that Perl's DBI (generic database interface) lacks: its :title syntax specifies that the title variable's value should be used (bound) in the query at that point. In Perl, this is a two-step process: you write ? in the query wherever you want a value to be bound, and then when you execute the query, the arguments you pass fill the question marks in order. The big problem with the Perl approach (used also in Java and Microsoft ADO, and probably others) is that you can easily have a dozen ?s in a SQL statement and it's a pain to match them up in order. We should be able to use names, as AOLServer does. Andrew probably didn't intend his post this way, but I'm looking at it as a challenge: can I make my environment as good?
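The named-binding half of the problem is easy to sketch on its own. The helper below is hypothetical (bind_named is not part of DBI, and the 'NBC' channel value is made up): it rewrites :name placeholders into the ? form that DBI expects and returns the values in matching order, so the bookkeeping is done by the machine rather than by counting question marks.

```perl
use strict;
use warnings;

# Rewrite ":name" placeholders into DBI-style "?" placeholders and
# return the new SQL plus the bind values in matching order.
# Hypothetical helper, not part of DBI; it does not understand SQL
# string literals, so a ':' inside quotes would confuse it.
sub bind_named {
    my ($sql, $vars) = @_;
    my @values;
    $sql =~ s{(?<![:\w]):(\w+)}{
        die "no value supplied for :$1\n" unless exists $vars->{$1};
        push @values, $vars->{$1};
        '?';
    }gex;
    return ($sql, @values);
}

my ($sql, @binds) = bind_named(
    'select description from xmltv_programmes
     where title = :title and tvchannel = :channel',
    { title => "Late Night With Conan O'Brien", channel => 'NBC' },
);

# With a real handle: my $sth = $dbh->prepare($sql); $sth->execute(@binds);
print "$sql\n";
print join(', ', @binds), "\n";
```

Passing the values in a hash is the tame version. Pulling them out of the caller's variables, the way the TCL does, is a separate and scarier trick.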

Answering the Challenge

Can I answer this challenge? Let me try to state it first:

  1. Make it possible to write Perl that has the same effect and structure as the AOLServer db_foreach statement, without extraneous code.
  2. Package this code as a module so that it's easy for any programmer to use the new construct.
  3. Slight adjustments to accommodate Perl syntax and conventions are OK, but any programmer who knows AOLServer's db_foreach should be able to instantly recognize and accurately understand the corresponding Perl.

Sketch of a Solution

I'll take the following template as my goal. I'd like to have working Perl that looks like this:

my $title = "Late Night With Conan O'Brien";
$dbh->db_foreach 'get_matches' q{
    select description, tvchannel, when_start, when_stop
    from xmltv_programmes where title = :title
} {
    do_something_with($title, $description, $tvchannel);
    do_something_else_with($when_start, $when_stop);
}

Can I do it? Will Perl let me meddle with the language syntax enough to make this kind of crazy business compilable? This post will be continued tonight…


Update 2/27/04: I didn't get to continue this post the night I wrote it. Since then, I've mostly hacked the code I want, but I haven't had a chance to write it up. Meanwhile, Jay has written some code that basically does the looping for you. He also calls poking around in the symbol table to find existing variables "disgusting", a point which I'm both inclined to agree with and find rather irrelevant. As a teaser, I have Perl code that looks and works almost exactly like the TCL code, including grabbing existing variables and support for named bind variables.
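For readers who want something concrete in the meantime, here is a conservative sketch of a looping helper. It is neither the code mentioned above nor Jay's: it takes an ordinary callback and passes each row as a hash reference, with no symbol-table tricks. FakeDbh and FakeSth are stand-ins invented so the example runs without a database; a real DBI handle offers the same prepare, execute, and fetchrow_hashref calls.

```perl
use strict;
use warnings;

# Run a query and call $body once per row, passing the row as a hash
# reference. $dbh is expected to behave like a DBI database handle
# (prepare / execute / fetchrow_hashref).
sub db_foreach {
    my ($dbh, $sql, $binds, $body) = @_;
    my $sth = $dbh->prepare($sql);
    $sth->execute(@$binds);
    while (my $row = $sth->fetchrow_hashref) {
        $body->($row);
    }
    return;
}

# Stand-in handles so the sketch runs without a database (hypothetical).
{
    package FakeDbh;
    sub new     { my ($class, @rows) = @_; return bless { rows => [@rows] }, $class }
    sub prepare { my ($self) = @_; return bless { rows => [ @{ $self->{rows} } ] }, 'FakeSth' }
}
{
    package FakeSth;
    sub execute          { return 1 }
    sub fetchrow_hashref { my ($self) = @_; return shift @{ $self->{rows} } }
}

my $dbh = FakeDbh->new({ description => 'Talk show', tvchannel => 'NBC' });
my @seen;
db_foreach($dbh,
    'select description, tvchannel from xmltv_programmes where title = ?',
    ["Late Night With Conan O'Brien"],
    sub {
        my ($row) = @_;
        push @seen, "$row->{tvchannel}: $row->{description}";
    },
);
print "$_\n" for @seen;
```

This gets the loop down to one call, but the body still says $row->{description} rather than $description, which is the gap the TCL version closes.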

link

About time I learned this totally sweet language!

link

Real video of a presentation featuring some Squeak stuff. About teaching or something?
