

I made an unusual implementation choice for last week’s project, OracleBot: I built it on Google App Engine, Google’s new Python-and-BigTable-based web application hosting system. I learned a lot about AppEngine, and came to think it will be an important deployment platform. In case you don’t know about AppEngine yet, here are the main points:

  • AppEngine provides a way to host Python-based web applications on Google infrastructure.
  • Django, my preferred web app framework, is supported (aside from its object-relational mapper, because there is no relational database available).
  • Apps run in a privilege-limited Python environment, potentially distributed among many individual servers.
  • Developers have no knowledge of or control over any servers; you just upload code and it runs, moving or being replicated as usage demands.
  • There are quotas on CPU, storage, and bandwidth usage. These are ample for a new site and said to handle about 5 million pageviews monthly.
  • All persistent data storage must be done via the AppEngine Data Store.
  • The Data Store is based on Google’s BigTable, and although it provides a SQL-like query interface, it is emphatically NOT a relational database.

Working in AppEngine was great in some ways, and limiting in others. On the upside, uploading an app has never been easier; you type a command on your development machine, a new version is deployed, and an admin console allows you to roll back to previous versions if you broke something. On the downside there are a few points:

  • It’s still a “preview release”. It still has some bugs; I ran into one that cost me a couple of hours of confusion. And it’s a limited release, so you have to request and wait for an invitation.
  • You’re working in the Google machine. While in principle there’s nothing stopping other companies from providing a compatible app-hosting environment, that’s not an option yet. Even with my use of Django, OracleBot would take a day or two to port for hosting on my own server. Combined with the vague quota system (you can ask for more, but there is no guarantee or price structure yet), this means it would be foolish to deploy anything mission-critical on AppEngine for now.
  • No computation outside of the HTTP request/response cycle. Your app can’t receive email, run batch data updates, or even maintain a persistent connection to a client.
  • Designing apps for scale is weird. Working in AppEngine, you can’t help thinking about how such-and-such feature will soak up a few extra seconds per request, or whether your quota can handle polling for updates every 5 seconds vs. every 30 seconds. If you’re not Google (and not extremely incompetent), you’ll be lucky to develop an app popular enough to overwhelm a cheap server. So while you’re still working out the feature set, it may make more sense to develop in a more flexible environment — like with an RDBMS — and consider AppEngine an option for scaling later.
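
To put the polling question in rough numbers, here is a back-of-the-envelope sketch in Python. It assumes the "5 million pageviews monthly" figure can be treated as a flat request budget, that every poll costs one request, and that a month has 30 days; none of that is how the quota is officially defined, and the function names are my own.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: treat the rough "5 million pageviews monthly" figure as a
# plain request budget, spread evenly over a 30-day month.
MONTHLY_REQUEST_BUDGET = 5 * 1000 * 1000
DAYS_PER_MONTH = 30


def polls_per_day(interval_seconds):
    """Requests one always-connected client makes per day at this interval."""
    return SECONDS_PER_DAY // interval_seconds


def sustainable_clients(interval_seconds):
    """How many always-polling clients fit in the assumed daily budget."""
    daily_budget = MONTHLY_REQUEST_BUDGET / float(DAYS_PER_MONTH)
    return daily_budget / polls_per_day(interval_seconds)


if __name__ == "__main__":
    for interval in (5, 30):
        print("every %2d s: %5d polls/day per client, about %.1f clients" % (
            interval, polls_per_day(interval), sustainable_clients(interval)))
```

Under those assumptions, polling every 5 seconds lets fewer than ten round-the-clock clients use up the whole budget, while polling every 30 seconds stretches it to just under sixty. Real users are not connected all day, so the absolute numbers are pessimistic, but the factor of six between the two intervals holds.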

It’s impressive how acutely aware of scaling concerns you must be when working in AppEngine. They’ve done an elegant job of this: whereas you’d normally have to work very hard to transform an existing app into something scalable (with all sorts of distributed network caches and complicated database replication/sharding schemes), AppEngine makes it hard to refuse scalability. This is achieved in numerous little ways, mostly evident in the data store. Operations that are trivial in RDBMSs, like multi-way joins or uniqueness constraints, are nearly impossible in AppEngine. And transactions are not something you can layer on at the end. Each persisted entity can optionally be stored as a “child” of some other entity, rather than as a parentless “root”. Since roots can, along with their children, be moved or replicated at will, you can’t run a transaction across multiple roots because that might require some nasty multi-phase commit across several machines. So you again have to step back and plan for the transactions you’ll need as you’re modeling and writing data. This takes time, which is why OracleBot has some embarrassing bugs related to locking. (You may have seen the “-1 lines left” problem.)
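
The root/child rule is easier to see in a toy model. The sketch below is plain in-memory Python, not the AppEngine Data Store API: `Entity`, `run_in_transaction`, `claim_line`, and the `lines_left` property are hypothetical names invented for illustration, and it ignores real concurrency entirely. It only shows the shape of the constraint: a transaction may touch entities that share one root, and a check-then-decrement done inside it cannot leave the counter at -1.

```python
class CrossGroupError(Exception):
    """Raised when a transaction touches more than one root (entity group)."""


class Entity(object):
    """Toy stand-in for a stored entity; not the real Data Store class."""

    def __init__(self, key, parent=None, **props):
        self.key = key
        self.parent = parent          # None marks a parentless root
        self.props = dict(props)

    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def run_in_transaction(entities, update):
    """Apply update(*entities) all-or-nothing, within a single root only."""
    roots = set(id(e.root()) for e in entities)
    if len(roots) != 1:
        raise CrossGroupError("transaction spans %d roots" % len(roots))
    snapshot = [dict(e.props) for e in entities]
    try:
        return update(*entities)
    except Exception:
        # Roll back every entity to its state before the update began.
        for entity, saved in zip(entities, snapshot):
            entity.props = saved
        raise


def claim_line(counter, line, text):
    """Check and decrement together so the count never goes below zero."""
    if counter.props["lines_left"] <= 0:
        raise ValueError("no lines left")
    counter.props["lines_left"] -= 1
    line.props["text"] = text
```

In this model the counter lives on the root and each line is a child of it, so `run_in_transaction([game, line], ...)` is allowed, while mixing in an entity from another root raises `CrossGroupError`. That is the modeling decision you have to make up front: anything that must change together has to hang off the same root.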

Most of these downsides will erode over time. The environment will get friendlier as documentation, books, and toolkits evolve. AppEngine will likely add new kinds of service, such as batch data loading/retrieval, the ability to receive email, and support for longer-lived connections (Comet). Google will probably offer quota increases at predictable prices soon, and eventually competitors could offer a compatible hosting environment, turning AppEngine into a de-facto deployment standard.

Benefits for Non-coders

A standard deployment process may be the most significant consequence of AppEngine. It used to be that you’d have to get a server, set it up, copy your code, and configure a web server to run it. Now you just create an app in AppEngine’s console, run a command to upload your code and… there is no step 3.

This is significant because reducing the overhead of deploying an app makes it feasible to deploy custom apps for small customers. A software vendor can write a piece of software (a bug tracker, or a CRM, or perhaps a Human Resources Management System) and offer it for free or cheap because its users can buy hosting as a utility from Google (or someday, its competitors). Google is already hinting at this by offering users of Google Apps for Domains the ability to deploy an AppEngine app on a subdomain. If they can open up the Apps for Domains distribution channels a little, independent software vendors (ISVs) would line up to sell apps to customers there. That’s better for everyone: ISVs can build software without setting up customer hosting infrastructure, small customers can run hosted apps without an IT staff, and Google can operate the infrastructure in their efficient data centers.

More at BarCamp

If you enjoyed this post, look for me at BarCamp Boston 3 this weekend (May 17-18, 2008). I’m planning to do a session on AppEngine and would love to team up with other hackers, whether you’re experienced or just curious.

link

Jessie Stricchiola, a click fraud expert who frequently represents advertisers seeking refunds from Google and Yahoo, estimates that click fraud accounts for as much as 20 percent of the clicks in some industry sectors. The president of AlchemistMedia.com, Stricchiola said tens of thousands of advertisers, who pay Google and Yahoo by credit card, are being overcharged daily, adding that neither search engine has a large enough staff devoted to monitoring the problem or fielding complaints.

Hyperbole from selfish lawyers aside, click fraud is an interesting problem. It may never be fully solved, but can advertisers tolerate it like retailers tolerate (some) theft? I think it will eventually be accepted as a cost of doing business, with enforcers at ad syndicates in a constant arms race against this next wave of organized crime.

link

It's true that Google obfuscates their JavaScript to make it small, but with a feature as popular as Google Suggest, people are bound to reverse engineer it into something readable. Not to mention write Perl modules for the backend functionality.

By the way, it would be quite nice if one way to add frassle categories was using a text field with autocomplete. You could type fragments of names and see what's already there, or simply type a new category to create it. No XMLHttp would be needed, because the user's categories are already listed on the page.
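
The matching logic for such a field is small. Here is a rough sketch, written in Python for brevity even though in the browser it would be JavaScript running over the category list already on the page; `suggest` and `is_new_category` are made-up names, not anything in frassle.

```python
def suggest(fragment, categories, limit=10):
    """Return existing categories containing the typed fragment.

    Matching ignores case; names that start with the fragment come first.
    """
    needle = fragment.strip().lower()
    if not needle:
        return []
    matches = [c for c in categories if needle in c.lower()]
    matches.sort(key=lambda c: (not c.lower().startswith(needle), c.lower()))
    return matches[:limit]


def is_new_category(text, categories):
    """True if the typed text names no existing category (ignoring case)."""
    wanted = text.strip().lower()
    return bool(wanted) and all(wanted != c.lower() for c in categories)
```

Because the whole list is in memory, each keystroke is just a filter over a few dozen strings, which is why no round trip to the server is needed.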

link

OK, that's just schweeeeeeeeeeeeeeet.

link

A great article on the upcoming Google/Microsoft architecture war, by Charles Ferguson of High Stakes, No Prisoners fame.

link

I think I saw this site's mention of frassle a couple of months ago, but I didn't see Google's goofy attempt at translation:

The egg-laying woolly milch sow wants to be Frassle this service falls therefore here rather from the row. Beside a feed Aggregator (similar as Bloglines) a Blog and a link collection (with RSS feed) are ordered for free organization. The project still is in the alpha stage.

What is this, Google funny mistakes day or something?

link

Joel Spolsky posted a funny contest:

Ebay affiliates are going completely nuts abusing Google sponsored links. Let's have a little blogger fun before Larry & Sergei (or ebay) figure out how to shut them down.

I'm proposing a little contest. Who can find the most horrifying genuine affiliate advertisement on Google? Here's my entry:

Post your screen capture on your own blog and link back to here.

(A quick explanation, before you freak out: people who think they are extraordinarily clever sign up as ebay affiliates and then buy entire dictionaries full of keywords on Google. These keywords link back to their site, so they can track it, and then on to ebay. They pay Google a small fee for each clickthrough and earn a small fee from ebay for each sale made on ebay, and profit on the difference.)

Mysteriously, this posting has disappeared from Joel's site, as has the image of his search (which suggested a great selection of African slaves). I got the text from my aggregator. Whatever the explanation for that may be, I probably set off some alarms while digging up the following searches.

Illicit Goods and Services

(MDMA is another name for the drug Ecstasy)

Stuff You Might Not Actually Want

Wouldn't you love to meet these singles?

Dictatorship and Coups d'Etat

Read these headlines in order:

Finally, If Only it were So Easy…

link

I noticed this interesting idea while browsing around on Google's site:

[A business goal is] Using millions of computers to solve important problems requiring substantial CPU resources, such as cancer and disease research. For example, we have recently begun small-scale tests with the Folding at Home project at Stanford University with a few thousand selected Google Toolbar users, in preparation for a much larger scale system that would enable our millions of Google Toolbar users to opt-in to contributing their CPU cycles to solving important problems.

A company like Google investing its substantial brand recognition and technical know-how to enable world-wide grid computing would be an interesting adventure indeed. That would be a good way to grow beyond search but advance further in their other, behind-the-scenes competency: running lots of computers together. Possibilities boggle the mind.

link

There are some great observations in this overly long essay.

One interesting trend is the shift of value away from software and toward the network effects surrounding software-based services. What this means is that while the software of ebay or amazon or orkut is fairly easy to clone, each of these businesses has its competitive advantage in the scale and involvement of its user base. The advantage is not in writing software, but in developing self-sustaining communities that invite and reward effective participation. This is dependent on software in roughly the same way that good cities are dependent on the layout of public spaces, roads, parks, transit networks, and buildings. Given enough money, you could clone all of these aspects of a city, but your clone wouldn't have any life until it was full of people constantly occupying the physical space and gradually reshaping it to fit their own lives.

In other words, skills now crucial in making software aren't taught in The Art of Computer Programming. If you want to make software, read Philip and Alex's Guide to Web Publishing, or better yet, A Pattern Language.


There is also some grist for the prediction mill in this essay. Here are mine:

  • Microsoft will ship open-source software within 10 years. Leading up to this point, they will transition to a business focused primarily on helping people find and use content (including software) created by third parties. Their software margins will crumble during this time period, but they may be able to sustain a profitable software business by driving quality up and cost down due to explosive growth in the number of devices that use software.
  • Some interesting stuff is going to happen when people start figuring out how to commoditize network effects. This problem will require figuring out how to make software more responsive to user intentions, and less brittle at the mercy of incompatible formal interfaces. The driving forces in the next generation of programming systems will be social, not technical.

link

I considered printing this, but the preview showed it as 171 pages!
