Last week I made an unusual implementation choice for my project, OracleBot: I built it on Google App Engine, Google’s new Python-and-BigTable-based web application hosting system. I learned a lot about AppEngine and came to think it will be an important deployment platform. In case you don’t know about AppEngine yet, here are the main points:

  • AppEngine provides a way to host Python-based web applications on Google infrastructure.
  • Django, my preferred web app framework, is supported (aside from its object-relational mapping layer, because there is no relational database available).
  • Apps run in a privilege-limited Python environment, potentially distributed among many individual servers.
  • Developers have no knowledge of, or control over, any servers; you just upload code and it runs, moving or being replicated as usage demands.
  • There are quotas on CPU, storage, and bandwidth usage. These are ample for a new site and are said to handle about 5 million pageviews monthly.
  • All persistent data storage must be done via the AppEngine Data Store.
  • The Data Store is based on Google’s BigTable, and although it provides a SQL-like query interface, it is emphatically NOT a relational database.

Working in AppEngine was great in some ways, and limiting in others. On the upside, uploading an app has never been easier; you type a command on your development machine, a new version is deployed, and an admin console allows you to roll back to a previous version if you break something. On the downside, there are a few points:

  • It’s still a “preview release”. It still has some bugs; I ran into one which cost me a couple of hours of confusion. And it’s a limited release, so you have to request and wait for an invitation.
  • You’re working in the Google machine. While in principle there’s nothing stopping other companies from providing a compatible app-hosting environment, that’s not an option yet. Even with my use of Django, OracleBot would take a day or two to port for hosting on my own server. Combined with the vague quota system (you can ask for more, but there is no guarantee or price structure yet), it would be foolish to deploy anything mission-critical on AppEngine for now.
  • No computation outside of the HTTP request/response cycle. Your app can’t receive email, run batch data updates, or even maintain a persistent connection to a client.
  • Designing apps for scale is weird. Working in AppEngine, you can’t help thinking about how such-and-such feature will soak up a few extra seconds per request, or whether your quota can handle polling for updates every 5 seconds vs. every 30 seconds. If you’re not Google (and not extremely incompetent), you’ll be lucky to develop an app popular enough to overwhelm a cheap server. So while you’re still working out the feature set, it may make more sense to develop in a more flexible environment, such as one with an RDBMS, and consider AppEngine an option for scaling later.
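
The polling question in the last point can be roughed out with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes the quota can be treated as a flat budget of about 5 million requests a month (borrowing the pageview figure above), whereas the real quotas are on CPU, storage, and bandwidth. The function name and constants are made up for illustration.

```python
# Back-of-the-envelope request budget for client polling.
# Assumption: the quota is a flat 5 million requests per month, which is
# only a stand-in for the real CPU, storage, and bandwidth quotas.
MONTHLY_REQUEST_BUDGET = 5_000_000
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # a 30-day month


def sustained_clients(poll_interval_seconds, budget=MONTHLY_REQUEST_BUDGET):
    """Number of clients that could poll around the clock within the budget."""
    requests_per_client = SECONDS_PER_MONTH / poll_interval_seconds
    return budget / requests_per_client


for interval in (5, 30):
    clients = sustained_clients(interval)
    print(f"poll every {interval:>2}s: about {clients:.1f} clients around the clock")
```

Under these assumptions, polling every 30 seconds supports six times as many always-on clients as polling every 5 seconds, and neither number is large, which is why the choice keeps nagging at you.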

It’s impressive how acutely aware of scaling concerns you must be when working in AppEngine. They’ve done an elegant job of this: whereas you’d normally have to work very hard to transform an existing app into something scalable, with all sorts of distributed network caches and complicated database replication/sharding schemes, AppEngine makes it hard to refuse scalability. This is achieved in numerous little ways, mostly evident in the data store. Operations that are trivial in RDBMSs, like multi-way joins or uniqueness constraints, are nearly impossible in AppEngine.

Transactions are not something you can layer on at the end, either. Each persisted entity can optionally be stored as a “child” of some other entity, rather than as a parentless “root”. Since a root can be moved or replicated at will along with its children, you can’t run a transaction across multiple roots, because that might require some nasty multi-phase commit across several machines. So you again have to step back and plan for the transactions you’ll need as you’re modeling and writing data. This takes time, which is why OracleBot has some embarrassing bugs related to locking. (You may have seen the “-1 lines left” problem.)
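
To make the root-and-children idea concrete, here is a toy model in plain Python. It is not the AppEngine API, and it does not reproduce OracleBot’s code: the EntityGroup class, the take_line function, the "root:1" key, and the lines_left counter are all hypothetical names chosen for illustration. It shows why a transaction confined to one root is cheap (a single local lock suffices) and how doing the check and the decrement inside that transaction keeps a counter from dropping below zero.

```python
import threading


class EntityGroup:
    """Toy stand-in for a root entity plus its children (not the real data store)."""

    def __init__(self, root_key, **properties):
        self.root_key = root_key
        self.entities = {root_key: dict(properties)}
        self._lock = threading.Lock()

    def add_child(self, key, **properties):
        # Children live alongside their root, so they share its lock.
        self.entities[key] = dict(properties)

    def run_in_transaction(self, func, *args):
        # Everything func touches is in this one group, so one local lock
        # is enough; no commit across several machines is required.
        with self._lock:
            return func(self.entities, *args)


def take_line(entities, root_key):
    """Claim one line if any remain; the check and decrement happen together."""
    root = entities[root_key]
    if root["lines_left"] <= 0:
        return False
    root["lines_left"] -= 1
    return True


group = EntityGroup("root:1", lines_left=1)  # hypothetical counter
results = []


def worker():
    results.append(group.run_in_transaction(take_line, "root:1"))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), group.entities["root:1"]["lines_left"])
```

Two workers compete for the last line; exactly one succeeds and the counter stops at zero. If the counter and the thing being decremented lived under different roots, this model would have no single lock to take, which is the planning problem described above.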

Most of these downsides will erode over time. The environment will get friendlier as documentation, books, and toolkits evolve. AppEngine will likely add new kinds of services, such as batch data loading/retrieval, the ability to receive email, and support for longer-lived connections (Comet). Google will probably offer quota increases at predictable prices soon, and eventually competitors could offer a compatible hosting environment, turning AppEngine into a de facto deployment standard.

Benefits for Non-coders

A standard deployment process may be the most significant consequence of AppEngine. It used to be that you’d have to get a server, set it up, copy your code, and configure a web server to run it. Now you just create an app in AppEngine’s console, run a command to upload your code and… there is no step 3.

This is significant because reducing the overhead of deploying an app makes it feasible to deploy custom apps for small customers. A software vendor can write a piece of software (a bug tracker, a CRM, or perhaps a Human Resources Management System) and offer it for free or cheap because its users can buy hosting as a utility from Google (or someday, its competitors). Google is already hinting at this by offering users of Google Apps for Domains the ability to deploy an AppEngine app on a subdomain. If they can open up the Apps for Domains distribution channels a little, independent software vendors (ISVs) would line up to sell apps to customers there. That’s better for everyone: ISVs can build software without setting up customer hosting infrastructure, small customers can run hosted apps without an IT staff, and Google can operate the infrastructure in their efficient data centers.

More at BarCamp

If you enjoyed this post, look for me at BarCamp Boston 3 this weekend (May 17-18, 2008). I’m planning to do a session on AppEngine and would love to team up with other hackers, whether you’re experienced or just curious.

link

Awesome tool that records a complete trace of your (Java) program's actions, allowing you to debug by finding where something went wrong, then stepping backwards in your program to figure out why.

It's like a pause-and-rewind button for the world.

link

Brian Behlendorf, co-founder of the Apache Web Server project and current CTO of CollabNet, a firm that hosts collaboration systems for engineering teams distributed around the world, has an interesting interview at Netcraft. I went there just to use their handy "What's that site running?" tool, but really enjoyed Behlendorf's comments on open source software development, the SCO case, and his own company's experience with offshoring:

At the beginning of 2003, there was much discussion around the executive staff about outsourcing and/or offshoring. We had a dedicated and productive engineering staff in the U.S., but the amount of stuff we *wanted* to do was huge—and customers were demanding new features constantly. I was skeptical about the model where you hand someone a spec and magically they write code for you. While looking at this we met with a company named Enlite Technologies, who had a collaborative project management tool for the electronics-design market, and who had the majority of their engineers in Chennai, India. We were considering outsourcing some work to them, but I really liked the founder (Gopinath Ganapathy) and the team he'd formed, and I wanted something much closer and more, er, collaborative—so we decided to merge. Our products were complimentary, they had a great team in Chennai, and I figured that it was time for us to become our own best-use case in showing how our product could be used to build worldwide engineering teams, as many of our customers had done.

Since that point in time, we've integrated the two teams very tightly. Engineers in each location are spread across the combined codebases, and they know each other on a first name basis. We were the subject of an article in Salon about this. No doubt the topic is controversial, and there are huge challenges to making an offshore or outsourcing model actually work.

The open source model has a lot to do with making that possible. …

link

Eclipse (www.eclipse.org) is a very nice extensible IDE. Currently it has especially strong support for Java and C++, but this is a promising project to add some Perl support. Note that since Eclipse is written in Java, it is painfully slow on machines under, say, 1 GHz. Text editing oughtn't be CPU-bound, but…

link

A tool to automatically check conformance to best practices guidelines on a live DB.