Sunday, December 30, 2007

Grid computing - using web browsers

Grid computing is one of those areas that seems to have a magic appeal to software developers. There is something very attractive about taking some relatively simple computers and wielding their combined power to perform seemingly infinitely large computing tasks within reasonable times.

I've also always been attracted to grids. But as many developers, I too thought this type of power was not within reach for me. Only since Google started documenting the "cloud of commodity PCs" that power their vast computing power, does it suddenly seem quite feasible for even just "large" companies to have their own computing cloud.

But my problem remains the same. I don't work for Google, Yahoo or IBM and I'm not a large company myself. So I don't have access to a set of commodity PCs that I can combine into a grid. So for years I decided that I'd never get a chance to work on a grid, unless I'd start working for one of those big boys.

Recently I've been thinking about an alternate setup for a grid computer, more along the lines of the SETI@Home project and all its successors. Those programs all allow home PCs of users all over the world to take part in a giant computer network - a global grid in essence. So the people creating these programs get a lot of computing power, yet they don't have to manage the hardware. A powerful setup.

But such setups already exist. And they have one downside that keeps them from even more mainstream adoption: they require the user to install software to put their computer into the grid. And although the threshold isn't very high, it's still too high for many people. So a lot of potential computing power is not used, because the barrier of installing software is too high.

Now that got me thinking: is there an existing platform on modern PCs that we can just embed our own grid applications in? Preferably a platform that's been around for a few years, so all its quirks are known. And it would be nice if the platform comes with built-in internet connectivity.

Here's the idea that popped into my head: web browsers! They used to be nothing more than HTML viewers, but those days are long gone. Nowadays our browsers are hosting more and more complete applications, like GMail, PopFly and Yahoo Pipes. These applications prove that there is a lot of computing power in the web browser. Is it possible to use the web browsers that people have open on their PCs all the time and turn those into nodes in the grid?

It is a very simple concept: every browser that has a certain URL open is a node in the grid. For a computer to join the grid, they just surf to the URL. To leave the grid again, they navigate away from the URL. It doesn't get much easier than that, right? No software to install, just a page you have to visit. Put it in your favorites in the office, open it every morning when you boot your PC and that's one more node in the grid. From even my own limited reach, I know of at least 5 machines that I could "grid enable" in this way. Those are all PCs and Macs that are on for a large part of the day, just waiting for me or my family to use them. Or that's what they used to be... now I can't stop thinking about them as being nodes in my "web based grid".

If you're a software developer reading this, than your mind probably started wandering while reading the last few paragraphs. Is this possible? How would the nodes get their tasks? How would they report their results back? How would you manage the nodes in the grid? Where do you keep the data that is needed for/generated by the nodes? How do you handle XSS issues? Wouldn't the nodes quickly overload the server that manages them? The list of challenges is seemingly endless and definitely too much for me to deal with in one go.

All I know is that ever since this idea popped into my head, I can't stop thinking about it. And for every problem, I can see at least a few potential solutions. I have no idea whether they'll work or which one is best, but the only way to figure that out is to actually start building the platform.

Oh man... I really need to make this my 20% project. Or more likely... I really need a lot of people to make this their 20% project. Help?


Anonymous said...

Hi Frank,

Saw this slightly older post from you only today - as I happen to be thinking of the same thing right now. What happened with your idea after that? Were you able to develop it any further?

- Bhathiya

Frank said...

Hi Bhathiya,

You're not the only one interested in "browser-based map reduce" all of a sudden. The post about this topic on ajaxian certainly gets a lot more attention than my humble idea ever got. ;-)

I indeed went ahead and implemented a prototype of the idea. A large part was in finding types of work that are suitable for this type of grid, because -as you can see in the comments on the ajaxian article- data transfer scalability is the limiting factor for this grid style. I posted a follow up article about one type of work that really fits well: finding prime numbers.

After implementing that one in my prototype, I ran into reliability issues when more than a few nodes started working - a topic you'll also hear people talk about on the ajaxian article comments.

The prototype is running my server, but it is not stable enough to warrant public availability. I stopped working on it about 10 months ago, once I had proven to myself that it could work. You know how things go with these pet projects.

I also experimented with a simpler version of the engine, which just tries to find collatz sequences. You can find that experiment here.

I hope this helps. Let me know if you want to know more about specific implementation details.