Sunday, December 30, 2007

Grid computing - using web browsers

Grid computing is one of those areas that seems to have a magic appeal to software developers. There is something very attractive about taking some relatively simple computers and wielding their combined power to perform seemingly infinitely large computing tasks within reasonable times.


I've also always been attracted to grids. But like many developers, I thought this kind of power was out of my reach. Only since Google started documenting the "cloud of commodity PCs" behind its vast computing power has it started to seem feasible for even merely "large" companies to have their own computing cloud.


But my problem remains the same: I don't work for Google, Yahoo or IBM, and I'm not a large company myself, so I don't have access to a set of commodity PCs that I can combine into a grid. For years I figured I'd never get a chance to work on a grid unless I started working for one of those big boys.

Recently I've been thinking about an alternative setup for a grid computer, more along the lines of the SETI@home project and all its successors. Those programs allow the home PCs of users all over the world to take part in a giant computer network - a global grid, in essence. The people creating these programs get a lot of computing power, yet they don't have to manage the hardware. A powerful setup.

But such setups already exist. And they have one downside that keeps them from even more mainstream adoption: they require the user to install software to put their computer into the grid. Although that threshold isn't very high, it's still too high for many people. So a lot of potential computing power goes unused, simply because of the barrier of installing software.

Now that got me thinking: is there an existing platform on modern PCs that we can just embed our own grid applications in? Preferably a platform that's been around for a few years, so all its quirks are known. And it would be nice if the platform comes with built-in internet connectivity.

Here's the idea that popped into my head: web browsers! They used to be nothing more than HTML viewers, but those days are long gone. Nowadays our browsers are hosting more and more complete applications, like GMail, PopFly and Yahoo Pipes. These applications prove that there is a lot of computing power in the web browser. Is it possible to use the web browsers that people have open on their PCs all the time and turn those into nodes in the grid?

It is a very simple concept: every browser that has a certain URL open is a node in the grid. To join the grid, you just surf to the URL. To leave the grid again, you navigate away from it. It doesn't get much easier than that, right? No software to install, just a page you have to visit. Put it in your favorites at the office, open it every morning when you boot your PC, and that's one more node in the grid. Even within my own limited reach, I know of at least five machines that I could "grid enable" in this way. Those are all PCs and Macs that are on for a large part of the day, just waiting for me or my family to use them. Or that's what they used to be... now I can't stop thinking about them as nodes in my "web based grid".
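To make the idea a bit more concrete, here is a minimal sketch of what the page behind that URL might run. Everything in it is an assumption made up for illustration: a coordinator at a placeholder address, two hypothetical endpoints (/task and /result), and a trivial "sum these numbers" task standing in for real work.

```typescript
// Hypothetical node loop: poll the coordinator for work, compute, report back.
// The endpoints (/task, /result), the task shape and the URL are assumptions,
// not an existing API.
interface Task {
  id: string;
  numbers: number[]; // toy payload: a chunk of numbers to sum
}

async function runNode(baseUrl: string): Promise<void> {
  while (true) {
    // Ask the coordinator for the next unit of work.
    const response = await fetch(`${baseUrl}/task`);
    if (response.status === 204) {
      // No work available right now; back off before asking again.
      await new Promise((resolve) => setTimeout(resolve, 10_000));
      continue;
    }
    const task: Task = await response.json();

    // Do the actual computation (here: a trivial sum).
    const result = task.numbers.reduce((acc, n) => acc + n, 0);

    // Report the result back to the coordinator.
    await fetch(`${baseUrl}/result`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ id: task.id, result }),
    });
  }
}

// Joining the grid is nothing more than opening the page that calls this.
runNode("https://grid.example.com").catch(console.error);
```

In a real node the computation would have to be chunked, or pushed into a background worker, so the page stays responsive - but the shape of the loop (fetch a task, compute, post the result) would stay the same.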

If you're a software developer reading this, then your mind probably started wandering while reading the last few paragraphs. Is this possible? How would the nodes get their tasks? How would they report their results back? How would you manage the nodes in the grid? Where do you keep the data that is needed for/generated by the nodes? How do you handle XSS issues? Wouldn't the nodes quickly overload the server that manages them? The list of challenges is seemingly endless and definitely too much for me to deal with in one go.
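I don't have answers to most of those questions yet, but to show that the server side doesn't have to be anything exotic, here is an equally hypothetical sketch of a coordinator: an in-memory task queue exposed over the same two assumed endpoints as the node sketch above. The endpoints, the task payload and the port are all made up for illustration.

```typescript
// Hypothetical coordinator: hands out tasks and collects results.
// The endpoints, the in-memory queue and the port are assumptions for illustration.
import * as http from "http";

interface Task {
  id: string;
  numbers: number[];
}

const pending: Task[] = [
  { id: "t1", numbers: [1, 2, 3] },
  { id: "t2", numbers: [4, 5, 6] },
];
const results = new Map<string, number>();

const server = http.createServer((req, res) => {
  if (req.method === "GET" && req.url === "/task") {
    const task = pending.shift();
    if (!task) {
      // Nothing to do at the moment: tell the node to come back later.
      res.writeHead(204);
      res.end();
      return;
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(task));
  } else if (req.method === "POST" && req.url === "/result") {
    // Collect the posted JSON body and store the reported result.
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      const { id, result } = JSON.parse(body);
      results.set(id, result);
      res.writeHead(200);
      res.end();
    });
  } else {
    res.writeHead(404);
    res.end();
  }
});

server.listen(8080);
```

If the coordinator also serves the page itself, the nodes only ever talk to their own origin, which at least sidesteps the browser's cross-origin restrictions. Handling node failures, duplicate results and the load of thousands of polling nodes is a different story, of course.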

All I know is that ever since this idea popped into my head, I can't stop thinking about it. And for every problem, I can see at least a few potential solutions. I have no idea whether they'll work or which one is best, but the only way to figure that out is to actually start building the platform.

Oh man... I really need to make this my 20% project. Or more likely... I really need a lot of people to make this their 20% project. Help?

Saturday, December 22, 2007

The origin of the name Apache web server

I read a lot. Not much literature or novels, unfortunately, as those seem to suffer under my more professional reading habits. I read lots of technical articles, white papers, blog posts and specifications. It's part of what I do to keep up to date with what's happening in the CS field. But in part I also read all kinds of stuff to gain a broader understanding of our profession.

Some of the longer things I read this year include "PPK on JavaScript" and "The No Asshole Rule", but also the venerable "The Art and Science of Smalltalk". And some colleagues even caught me reading a Mac OS 9 AppleScript manual dated somewhere around 1999. They're still making fun of their discovery almost every day, but I don't mind... having read that manual has given me a better understanding of how the now much-heralded Apple engineers thought about making an end-user programming language almost a decade ago.

Recently I read the bulk of Roy Thomas Fielding's thesis, Architectural Styles and the Design of Network-based Software Architectures, in which he introduces the principles of REST. As with any thesis it is a bit too abstract for my taste, but it did give me a better feel for the background and theory behind REST.

Aside from that, I made one stunning discovery when I read about Fielding's involvement in the creation of the Apache HTTP server:

  • At the time, the most popular HTTP server (httpd) was the public domain software developed by Rob McCool at the National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign (NCSA). However, development had stalled after Rob left NCSA in mid-1994, and many webmasters had developed their own extensions and bug fixes that were in need of a common distribution. A group of us created a mailing list for the purpose of coordinating our changes as "patches" to the original source. In the process, we created the Apache HTTP Server Project
Please read that last part again, and again... and again. Until it hits you where it finally hit me. What hit me? Well... I finally understood that the name of the Apache web server might (originally) have had nothing to do with the Apache tribe. The server was created by taking an existing code base and then applying all sorts of patches. So in a sense it was a patchy web server. A patchy... Apache...!

Brilliant! In all my years of knowing the Apache web server and the brand that was created around the Apache name, I never realized where it came from.

The Apache website itself has this to say about it:
  • The name 'Apache' was chosen from respect for the Native American Indian tribe of Apache, well-known for their superior skills in warfare strategy and their inexhaustible endurance. It also makes a cute pun on "a patchy web server" -- a server made from a series of patches -- but this was not its origin.
For the moment I'll take their word for it and accept that the name sounding like "a patchy web server" is pure coincidence. I bet it's also more convenient for them in selling the Apache brand: "we named our web server after its inexhaustible endurance" sounds a lot better than "we named our web server after the fact that it was created from a bunch of unrelated patches".