16 May 2008
I made an unusual implementation choice for last week’s project, OracleBot: I built it on Google App Engine, Google’s new Python-and-BigTable-based web application hosting system. I learned a lot about AppEngine, and came to think it will be an important deployment platform. In case you don’t know about AppEngine yet, here are the main points:
- AppEngine provides a way to host python-based web applications on Google infrastructure.
- Django, my preferred web app framework, is supported (aside from its object-relational mapper, because there is no relational database available).
- Apps run in a privilege-limited python environment, potentially distributed among many individual servers.
- Developers have no knowledge of or control over any servers; you just upload code and it runs, moving or being replicated as usage demands.
- There are quotas on CPU, storage, and bandwidth usage. These are ample for a new site and said to handle about 5 million pageviews monthly.
- All persistent data storage must be done via the AppEngine Data Store.
- The Data Store is based on Google’s BigTable, and although it provides a SQL-like query interface, it is emphatically NOT a relational database.
Working in AppEngine was great in some ways, and limiting in others. On the upside, uploading an app has never been easier; you type a command on your development machine, a new version is deployed, and an admin console allows you to roll back to previous versions if you broke something. On the downside there are a few points:
- It’s still a “preview release”, and it still has some bugs. I ran into one that cost me a couple of hours of confusion. It’s also a limited release, so you have to request and wait for an invitation.
- You’re working in the Google machine. While in principle there’s nothing stopping other companies from providing a compatible app-hosting environment, that’s not an option yet. Even with my use of Django, OracleBot would take a day or two to port for hosting on my own server. And the quota system is vague: you can ask for more, but there is no guarantee or price structure yet. Given both, it would be foolish to deploy anything mission-critical on AppEngine for now.
- No computation outside of the HTTP request/response cycle. Your app can’t receive email, run batch data updates, or even maintain a persistent connection to a client.
- Designing apps for scale is weird. Working in AppEngine, you can’t help thinking about how such-and-such feature will soak up a few extra seconds per request, or whether your quota can handle polling for updates every 5 seconds vs. every 30 seconds. If you’re not Google (and not extremely incompetent), you’ll be lucky to develop an app popular enough to overwhelm a cheap server. So while you’re still working out the feature set, it may make more sense to develop in a more flexible environment (one with an RDBMS, say) and consider AppEngine an option for scaling later.
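To put rough numbers on the polling question, here is a back-of-the-envelope sketch. It assumes each poll counts like one pageview and treats the "about 5 million pageviews monthly" figure as a hard budget; neither assumption comes from Google's actual quota accounting, and the function names are mine.

```python
# Rough request counts for a client that polls for updates.
# Assumptions (not from Google's quota docs): one poll costs the same as
# one pageview, and 5 million requests per month is the whole budget.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # a 30-day month
MONTHLY_BUDGET = 5000000


def polls_per_month(interval_seconds):
    """Requests made by one browser that stays open all month."""
    return SECONDS_PER_MONTH // interval_seconds


def always_on_clients(interval_seconds, budget=MONTHLY_BUDGET):
    """How many always-connected clients the budget could absorb."""
    return budget // polls_per_month(interval_seconds)


for interval in (5, 30):
    print(interval, polls_per_month(interval), always_on_clients(interval))
```

Under those assumptions, a single browser left open all month makes 518,400 requests at a 5-second interval, so the budget covers only about 9 always-on clients; at 30 seconds it covers about 57. Real visitors don't stay connected around the clock, but the sixfold difference is the point.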
It’s impressive how acutely aware of scaling concerns you must be when working in AppEngine. They’ve done an elegant job of this: whereas you’d normally have to work very hard to transform an existing app into something scalable, with all sorts of distributed network caches and complicated database replication/sharding schemes, AppEngine makes it hard to refuse scalability. This is achieved in numerous little ways, mostly evident in the data store. Operations that are trivial in RDBMSs, like multi-way joins or uniqueness constraints, are nearly impossible in AppEngine. And transactions are not something you can layer on at the end. Each persisted entity can optionally be stored as a “child” of some other entity, rather than as a parentless “root”. Since roots can, along with their children, be moved or replicated at will, you can’t run a transaction across multiple roots, because that might require some nasty multi-phase commit across several machines. So, again, you have to step back and plan for the transactions you’ll need as you’re modeling and writing data. This takes time, which is why OracleBot has some embarrassing bugs related to locking. (You may have seen the “-1 lines left” problem.)
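The root-and-children rule is easier to see in a toy model. The sketch below is plain Python, not the AppEngine API: ToyStore, ToyTransaction, and the "poem-1" and "lines_left" names are all hypothetical, and the per-root lock is a simplification that says nothing about how the real Data Store handles concurrency. It shows only the shape of the constraint: a transaction is bound to one root, and a counter decremented inside that transaction cannot fall below zero.

```python
import threading


class ToyStore(object):
    """In-memory stand-in for a store with entity groups (hypothetical)."""

    def __init__(self):
        self._data = {}   # (root, name) -> value
        self._locks = {}  # root -> lock guarding that whole group

    def _lock_for(self, root):
        return self._locks.setdefault(root, threading.Lock())

    def put(self, root, name, value):
        with self._lock_for(root):
            self._data[(root, name)] = value

    def get(self, root, name):
        return self._data[(root, name)]

    def run_in_transaction(self, root, func):
        """Run func(txn) atomically against a single root's group."""
        with self._lock_for(root):
            txn = ToyTransaction(self, root)
            result = func(txn)             # an exception here discards writes
            self._data.update(txn.writes)  # commit every write at once
            return result


class ToyTransaction(object):
    def __init__(self, store, root):
        self.store, self.root, self.writes = store, root, {}

    def _check(self, root):
        if root != self.root:
            raise ValueError("transaction is limited to group %r" % self.root)

    def get(self, root, name):
        self._check(root)
        key = (root, name)
        if key in self.writes:
            return self.writes[key]
        return self.store._data[key]

    def put(self, root, name, value):
        self._check(root)
        self.writes[(root, name)] = value


def take_line(txn):
    """Read, check, and decrement in one transaction, so no '-1 lines left'."""
    left = txn.get("poem-1", "lines_left")
    if left <= 0:
        return False
    txn.put("poem-1", "lines_left", left - 1)
    return True


def cross_group(txn):
    txn.put("poem-2", "lines_left", 5)  # a different root: rejected


store = ToyStore()
store.put("poem-1", "lines_left", 1)
print(store.run_in_transaction("poem-1", take_line))  # True
print(store.run_in_transaction("poem-1", take_line))  # False
print(store.get("poem-1", "lines_left"))              # 0
try:
    store.run_in_transaction("poem-1", cross_group)
except ValueError as err:
    print(err)
```

The design consequence is the one described above: whatever must change together has to live under the same root, and that decision is made when you model the data, not when you discover the race.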
Most of these downsides will erode over time. The environment will get friendlier as documentation, books, and toolkits evolve. AppEngine will likely add new kinds of services, such as batch data loading/retrieval, the ability to receive email, and support for longer-lived connections (Comet). Google will probably offer quota increases at predictable prices soon, and eventually competitors could offer a compatible hosting environment, turning AppEngine into a de facto deployment standard.
Benefits for Non-coders
A standard deployment process may be the most significant consequence of AppEngine. It used to be that you’d have to get a server, set it up, copy your code, and configure a web server to run it. Now you just create an app in AppEngine’s console, run a command to upload your code and… there is no step 3.
This is significant because reducing the overhead of deploying an app makes it feasible to deploy custom apps for small customers. A software vendor can write a piece of software (a bug tracker, or a CRM, or perhaps a Human Resources Management System) and offer it for free or cheap because its users can buy hosting as a utility from Google (or someday, its competitors). Google is already hinting at this by offering users of Google Apps for Domains the ability to deploy an AppEngine app on a subdomain. If they can open up the Apps for Domains distribution channels a little, independent software vendors (ISVs) would line up to sell apps to customers there. That’s better for everyone: ISVs can build software without setting up customer hosting infrastructure, small customers can run hosted apps without an IT staff, and Google can operate the infrastructure in their efficient data centers.
More at BarCamp
If you enjoyed this post, look for me at BarCamp Boston 3 this weekend (May 17-18, 2008). I’m planning to do a session on AppEngine and would love to team up with other hackers, whether you’re experienced or just curious.