computers/web app frameworks


I made an unusual implementation choice last week for my project, OracleBot: I built it on Google App Engine, Google’s new Python-and-BigTable-based web application hosting system. I learned a lot about AppEngine and came to think it will be an important deployment platform. In case you don’t know about AppEngine yet, here are the main points:

  • AppEngine provides a way to host Python-based web applications on Google infrastructure.
  • Django, my preferred web app framework, is supported (aside from its object-relational mapper, because there is no relational database available).
  • Apps run in a privilege-limited Python environment, potentially distributed among many individual servers.
  • Developers have no knowledge of or control over any servers; you just upload code and it runs, moving or being replicated as usage demands.
  • There are quotas on CPU, storage, and bandwidth usage. These are ample for a new site and said to handle about 5 million pageviews monthly.
  • All persistent data storage must be done via the AppEngine Data Store.
  • The Data Store is based on Google’s BigTable, and although it provides a SQL-like query interface, it is emphatically NOT a relational database.

Working in AppEngine was great in some ways, and limiting in others. On the upside, uploading an app has never been easier; you type a command on your development machine, a new version is deployed, and an admin console allows you to roll back to previous versions if you broke something. On the downside there are a few points:

  • It’s still a “preview release”. There are still some bugs; I ran into one which cost me a couple hours of confusion. And it’s a limited release, so you have to request and wait for an invitation.
  • You’re working in the Google machine. While in principle there’s nothing stopping other companies from providing a compatible app-hosting environment, that’s not an option yet. Even with my use of Django, OracleBot would take a day or two to port for hosting on my own server. Given that, plus the vague quota system (you can ask for more, but there is no guarantee or price structure yet), it would be foolish to deploy anything mission-critical on AppEngine for now.
  • No computation outside of the HTTP request/response cycle. Your app can’t receive email, run batch data updates, or even maintain a persistent connection to a client.
  • Designing apps for scale is weird. Working in AppEngine, you can’t help thinking about how such-and-such feature will soak up a few extra seconds per request, or whether your quota can handle polling for updates every 5 seconds vs. every 30 seconds. If you’re not Google (and not extremely incompetent), you’ll be lucky to develop an app popular enough to overwhelm a cheap server. So while you’re still working out the feature set, it may make more sense to develop in a more flexible environment — like with an RDBMS — and consider AppEngine an option for scaling later.
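To put rough numbers on the polling worry, here is a back-of-the-envelope sketch. It assumes each poll costs about as much as one pageview and that a month has 30 days; both are my simplifications, since the real quotas meter CPU, storage, and bandwidth separately. The function names are made up for illustration.

```python
# Rough polling budget. Assumption: one poll costs about one pageview,
# which is a simplification of the real CPU/storage/bandwidth quotas.
MONTHLY_PAGEVIEWS = 5_000_000   # the "about 5 million pageviews monthly" figure
DAYS_PER_MONTH = 30             # assumed month length
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(interval_seconds):
    """Requests one always-connected client makes in a day."""
    return SECONDS_PER_DAY // interval_seconds


def clients_supported(interval_seconds):
    """How many always-connected clients fit in one day's share of the budget."""
    daily_budget = MONTHLY_PAGEVIEWS / DAYS_PER_MONTH
    return daily_budget / polls_per_day(interval_seconds)


if __name__ == "__main__":
    for interval in (5, 30):
        print(interval, polls_per_day(interval), round(clients_supported(interval), 1))
```

Under those assumptions, the budget covers roughly 10 always-on clients polling every 5 seconds versus roughly 58 polling every 30 seconds, which is why the interval feels like a design decision rather than a tuning knob.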

It’s impressive how acutely aware of scaling concerns you must be when working in AppEngine. They’ve done an elegant job of this; whereas you’d normally have to work very hard to transform an existing app into something scalable (with all sorts of distributed network caches and complicated database replication/sharding schemes), AppEngine makes it hard to refuse scalability. This is achieved in numerous little ways, mostly evident in the data store. Operations that are trivial in RDBMSs, like multi-way joins or uniqueness constraints, are nearly impossible in AppEngine. And transactions are not something you can layer on at the end. Each persisted entity can optionally be stored as a “child” of some other entity, rather than as a parentless “root”. Since roots can, along with their children, be moved or replicated at will, you can’t run a transaction across multiple roots because that might require some nasty multi-phase commit across several machines. So you again have to step back and plan for the transactions you’ll need as you’re modeling and writing data. This takes time, which is why OracleBot has some embarrassing bugs related to locking. (You may have seen the “-1 lines left” problem.)
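The root/child rule is easier to see in a toy model. The sketch below is plain in-memory Python, not the AppEngine API: `Entity`, `run_in_transaction`, and `take_line` are hypothetical names, the `lines_left` field is only a guess at the shape of the counter behind the “-1 lines left” bug, and a per-root lock stands in for a real commit (there is no rollback here).

```python
# Toy in-memory model of entity groups. This is NOT the AppEngine API;
# every name here is invented for illustration.
import threading


class Entity:
    def __init__(self, name, parent=None, **fields):
        self.name = name
        self.parent = parent          # None means this entity is a root
        self.fields = dict(fields)
        self.lock = threading.Lock()  # only the root's lock is ever used

    def root(self):
        """Walk up the parent chain to the root of this entity group."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def run_in_transaction(func, *entities):
    """Run func atomically, but only if all entities share one root."""
    roots = {id(entity.root()) for entity in entities}
    if len(roots) != 1:
        raise ValueError("transaction spans more than one entity group")
    with entities[0].root().lock:
        return func(*entities)


def take_line(story):
    """Check-and-decrement as one unit, so the count cannot drop below zero."""
    if story.fields["lines_left"] <= 0:
        return False
    story.fields["lines_left"] -= 1
    return True


if __name__ == "__main__":
    story = Entity("story", lines_left=1)
    print(run_in_transaction(take_line, story))  # first taker succeeds
    print(run_in_transaction(take_line, story))  # second taker is refused
```

The point of the toy is the `ValueError`: anything you want to update together has to be modeled under one root up front, because there is no way to bolt a cross-root transaction on afterwards.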

Most of these downsides will erode over time. The environment will get friendlier as documentation, books, and toolkits evolve. AppEngine will likely add new kinds of service, such as batch data loading/retrieval, the ability to receive email, and support for longer-lived connections (Comet). Google will probably offer quota increases at predictable prices soon, and eventually competitors could offer a compatible hosting environment, turning AppEngine into a de-facto deployment standard.

Benefits for Non-coders

A standard deployment process may be the most significant consequence of AppEngine. It used to be that you’d have to get a server, set it up, copy your code, and configure a web server to run it. Now you just create an app in AppEngine’s console, run a command to upload your code and… there is no step 3.

This is significant because reducing the overhead of deploying an app makes it feasible to deploy custom apps for small customers. A software vendor can write a piece of software (a bug tracker, or a CRM, or perhaps a Human Resources Management System) and offer it for free or cheap because its users can buy hosting as a utility from Google (or someday, its competitors). Google is already hinting at this by offering users of Google Apps for Domains the ability to deploy an AppEngine app on a subdomain. If they can open up the Apps for Domains distribution channels a little, independent software vendors (ISVs) would line up to sell apps to customers there. That’s better for everyone: ISVs can build software without setting up customer hosting infrastructure, small customers can run hosted apps without an IT staff, and Google can operate the infrastructure in their efficient data centers.

More at BarCamp

If you enjoyed this post, look for me at BarCamp Boston 3 this weekend (May 17-18, 2008). I’m planning to do a session on AppEngine and would love to team up with other hackers, whether you’re experienced or just curious.

I have a confession to make: I'm learning Ruby and Rails. And I dig it. It is, as it claims, a great leap forward in productivity compared to complex frameworks like J2EE, Struts, or even ASP.NET. I was always prepared to believe this; I would say the same for the PageKit framework for Perl, which I've been using for 4 years and which has been pretty stable for at least 5. But until recently, I wasn't believing the hype that Rails could be much better than PageKit.

After all, both PageKit and Rails had the whole MVC thing going, which is the most important advantage. (MVC stands for Model-View-Controller, a pattern whose incarnation in web apps generally means that you have HTML-like files describing page layout, and Perl (or Ruby or Java or whatever) modules containing program code that provides data for that view.) The difference seemed to be mainly that Rails included an object-relational mapping tool.

I am not a fan of object-relational mapping tools. I consider them a special case (no pun intended… well not originally at least) of CASE tools. CASE stands for Computer Assisted Software Engineering, and is based around the idea that you can write programs to do part of the job of software engineering, and thereby make humans more productive.

CASE tools are bullshit. It's not that productive tools aren't helpful to hackers — they certainly are. But tools that write code necessarily degrade the design of your software in a very severe way. If you can't just write the code in an appropriately uncomplicated way, the solution isn't to have a fancy program produce the complicated code. The solution is to change the language you're using so you can express exactly what you need to express, concisely.

That is basically why I like dynamic languages. They remove a large set of restrictions on how you can redefine the meaning of utterances in the language, so if you decide that looking up the key sasquatch in an associative array has a special meaning anywhere in your program, then damnit, you can have it work in a special way. Easily.
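As a small illustration of the sasquatch idea, here is a sketch in Python (a stand-in for whatever dynamic language you prefer). The class name and the "sighting" behavior are invented for the example.

```python
class SasquatchAwareDict(dict):
    """A dict whose 'sasquatch' lookups are computed instead of stored."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.sightings = 0

    def __getitem__(self, key):
        # Square-bracket lookups of the special key get special treatment.
        if key == "sasquatch":
            self.sightings += 1
            return "sighting #%d" % self.sightings
        # Every other key behaves like an ordinary dict lookup.
        return super().__getitem__(key)


if __name__ == "__main__":
    woods = SasquatchAwareDict(bear="real")
    print(woods["bear"])
    print(woods["sasquatch"])
```

Only square-bracket lookups are intercepted in this sketch; `get()` on a dict subclass does not go through an overridden `__getitem__`, so a fuller version would override that too.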

For those who argue that this enables sloppy code, I have two answers:

  1. With great power comes great responsibility. Not everyone is capable of accepting the responsibility of programming in a wide-open language, and if you think you need the sort of guidance that the Java wizards at Sun can produce (aimed at a target market of all application programmers in the world), you should use Java. I'm not going to say great programmers don't need guidelines or rules, because they do— they are constantly both affecting them and affected by them— but their rules are crafted and evolved around the subtle needs of their partners and customers, not pronouncements from language priests.
  2. The alternatives are still sloppy code or code generation. If your language won't let you say things elegantly, you can either buy/invent a new language that will, or you can write inelegant code.

Anyway… back on topic: Object-Relational Mappers. First a definition: an O-R Mapper is a gadget that bridges the divide between your relational database's idea of the important data types in your application, and your programming system's idea of the important data types in your application. Your DB will be relational (stuff goes in tables, powerful querying is built-in) and normally persistent (saved to disk), while your programming system's data will be object-oriented (data is abstracted within object interfaces, querying has to be custom-built) and normally transient (in memory only).

I have so far been doing my web applications in a pretty traditional way, database-wise: my Perl code contains strings of SQL. Then the Perl code processes the results of the SQL query and munges it into data structures (lists of hashes or something) that my templates can handle. This is OK; I've gotten pretty good at dashing out SQL. But:

  • it's hard to reuse code for queries needed in multiple places
  • Perl may not be the most visually clean language, but it has a style, and dropping a fat SQL string into the middle of it is like carpeting the floor of a rainforest
  • each data structure is specifically designed to match the template where it is used, so changes in what details are displayed require changes to SQL and to SQL-result-processing code
  • there isn't an obvious place to put common functions relating to a certain data type.
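For concreteness, the traditional style looks roughly like this. The sketch uses Python and the standard `sqlite3` module as a stand-in for my Perl code; the `people` table, its `name` column, and both function names are assumptions made up for the example.

```python
import sqlite3


def demo_connection():
    """Build a throwaway in-memory database with one illustrative row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT)"
    )
    conn.execute(
        "INSERT INTO people (name, birthday) VALUES (?, ?)", ("Ada", "1815-12-10")
    )
    return conn


def people_for_template(conn):
    """Run a hand-written SQL string and munge rows into a list of dicts."""
    rows = conn.execute(
        "SELECT name, birthday FROM people ORDER BY name"
    ).fetchall()
    # The dict keys are chosen to match one specific template, so showing
    # another column means editing both the SQL and this munging code.
    return [{"name": name, "birthday": birthday} for name, birthday in rows]


if __name__ == "__main__":
    print(people_for_template(demo_connection()))
```

Every complaint in the list above shows up even at this size: the query is a string pasted into the function, and the result shape is welded to one page.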

Well, that list is exactly what O-R Mappers are supposed to solve. Unfortunately, I thought none of them actually worked, mostly because code generation produces very brittle code and it's not OK for code that depends on my (rapidly evolving) database design to be brittle.

Thanks to Rails' ActiveRecord, I now realize O-R mappers aren't doomed. ActiveRecord creates the mapping at runtime, which is possible because it's running in a dynamic language, all of which means that I can add a column to some table and then just program with it. And because things are just expected to follow the same naming convention everywhere, there's no need for bloated configuration files that tell you totally obvious crap, e.g., that the DB's people table's birthday column goes into the Person object's birthday field. And you can still write SQL if you must, though AR will do most of the joins you need.
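Here is a minimal sketch of the runtime-mapping idea, written in Python with `sqlite3` rather than Ruby; it is not ActiveRecord and not how ActiveRecord is implemented. `Record`, `columns`, and `all` are invented names, and unlike ActiveRecord (which infers the table name from the class name) this toy has each class name its table explicitly.

```python
import sqlite3


class Record:
    """Base class that discovers its columns from the database at runtime."""

    table = None  # subclasses name their table; the columns are discovered

    def __init__(self, **values):
        # Each column value becomes a plain attribute on the object.
        self.__dict__.update(values)

    @classmethod
    def columns(cls, conn):
        # PRAGMA table_info yields one row per column; index 1 is its name.
        # The table name is a class constant, not user input.
        return [row[1] for row in conn.execute("PRAGMA table_info(%s)" % cls.table)]

    @classmethod
    def all(cls, conn):
        names = cls.columns(conn)
        query = "SELECT %s FROM %s" % (", ".join(names), cls.table)
        return [cls(**dict(zip(names, row))) for row in conn.execute(query)]


class Person(Record):
    table = "people"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT)")
    conn.execute("INSERT INTO people (name, birthday) VALUES ('Ada', '1815-12-10')")
    print(Person.all(conn)[0].birthday)
    # Add a column and use it right away, with no change to Person.
    conn.execute("ALTER TABLE people ADD COLUMN nickname TEXT")
    conn.execute("UPDATE people SET nickname = 'Countess'")
    print(Person.all(conn)[0].nickname)
```

Nothing in `Person` mentions `birthday` or `nickname`; the mapping exists only because the table has those columns when the query runs, which is the property that keeps the code from going brittle as the schema evolves.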

This O-R stuff isn't as huge a productivity gain as moving from J2EE to anything dynamic and MVC. So the PageKit to Rails transition isn't as huge as most transitions to Rails. But the benefit, though marginal, is significant. Consider me impressed.

[Of course, if someone had done this in Perl it would have been faster! —ed.]

I keep on hearing good stuff about the Ruby on Rails web application framework. It's like the Republican party—one of the highest values of its supporters is to tout its greatness. But the Rubyblicans have evidence: cool projects like Basecamp and sibling Tadalist, wiki+hierarchy tool Hieraki, and the aforementioned Web Collaborator. Not to mention very nice documentation, such as this tutorial on making, guess what, a to-do list.

Perhaps it's too late to turn my own (upcoming) to-do list application into an experiment with Ruby on Rails, but my beloved Perl on PageKit still let me get a prototype of voo2do kicking in about a day. Definitely, the vast majority of my time on v2 has been spent tweaking CSS and Javascript to make the interface work.

But I'll save the rest of the hype until you can actually visit and try voo2do…

link

LAMPPIX allows you to burn your web projects (i.e. PHP presentations or Perl scripts) onto a CD-ROM and give them away to others. They will only have to insert the CD and reboot — if you configured LAMPPIX right (and this is really easy!) they can view your project

If we ever want to market frassle for intranets, this is the way to do it. Download a 200MB CD image, burn, boot, and you've got your very own frassle to play with. As they say in car sales, the feel of the wheel will seal the deal.

link

now to see if I can manage a second international trip in September…

link

This is Bob Doyle's site, where he has some potentially interesting videos from conferences, including OSCOM. His CMSReview site and related community of practice is also cool. When will video/audio from BloggerCon be up?

link

This site was created to give you the opportunity to "try out" some of the best open source and free php/mysql based software systems in the world. You can log in as the administrator to any site here, thus allowing you to decide which system best suits your needs. Each system is deleted and reinstalled every two hours. This allows you to be the administrator of any system here without fear of messing anything up.

It's php/mysql only, but still, having a single place to try out tons of different CMSs is pretty cool.

link

Here are the slides from a talk about the growth of LiveJournal from single machine hobby to 60-machine supersite. Very technical, very Perl, very interesting.

link

Here is a very interesting, thorough analysis of how Google might build Gmail. It really shows off how much ass Google truly kicks.

link

you had me at "we believe J2EE should be easier to use"

Update 4/6/04: Holy shit, this is easier to use?
