Sunday, March 2, 2008

Finding primes using a grid of browsers

It's been a few months since I talked about my idea of combining the power of a lot of browsers into a so-called computing grid. Since writing that post I've been working on such a browser-based grid. Progress has been slow, but steady. And I'm learning a lot in the process, so I'm not complaining at all.

My current prototype grid software has a server-side and a client-side part. Tasks and results get distributed over clients as needed. It's actually pretty cool to see one node start a task and another node finish it. But one thing that my grid lacks is a data storage service. Until I add such a service (in a scalable way) I am stuck with a grid that can only handle tasks like finding prime numbers. But even from such a classic task, there is still a lot to be learned.

For example, one of the problems we'll need to solve for a browser-based grid is the fact that the nodes in the grid are not dedicated to being part of it. Whereas people who download SETI@home at least consciously decide to contribute their computer cycles to the project, for a browser-based grid this decision is less explicit. Of course this is part of the beauty of using the web for this grid: it becomes very easy for people to join. But it also means that we can't do certain things. Like eat up all available CPU cycles...

So how do we find prime numbers without eating up too much CPU time? As you probably recall from school: a prime number is a number greater than 1 that can only be divided by 1 and by itself. So the simplest possible algorithm to determine if a given number is a prime is something like:


function IsPrimeNumber(number) {
  if (number < 2) {
    return false; // by definition, primes are greater than 1
  }
  for (var i = 2; i < number - 1; i++) {
    if (number % i == 0) {
      return false;
    }
  }
  return true;
}

As you can see this is quite a literal translation of the definition of primes we all learned. And although there are plenty of optimizations you can do to this algorithm, we'll stick to this very simple version for now.

What happens if we execute this function on a node? It'll work fine for small numbers. But once the number goes beyond a certain threshold, the loop will start eating up quite some time. And since executing JavaScript in the browser is unrestricted but single-threaded, this means the script will eat up all CPU cycles for a noticeable amount of time. As I said before, that is something that I simply don't want to happen on this grid.

So we need to modify our algorithm to not eat up all CPU time. We could optimize it by for example skipping all even numbers above 2. But that would only double the maximum number we can check with this function before the user starts to notice. What we need is something that will make this function work without noticeable effect on the CPU for any number.

My approach to this has been to split the task into many sub-tasks. So instead of having one loop like in the version above, we split it into multiple loops that each run over a smaller range. Let's say we have the following helper function:

function IsDividableBySomethingInRange(number, start, end);

This function will return true if number can be divided by one of the numbers in the half-open range [start, end). With this helper function, we can rewrite our original IsPrimeNumber function to use ranges.
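A minimal sketch of this helper, assuming a simple linear scan over that half-open range:

function IsDividableBySomethingInRange(number, start, end) {
  // Scan the half-open range [start, end) for a divisor of number.
  for (var i = start; i < end; i++) {
    if (number % i == 0) {
      return true;
    }
  }
  return false;
}

With that in place, the range-based rewrite looks like this: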

function IsPrimeNumber(number) {
  if (number < 2) {
    return false; // same guard as before: primes are greater than 1
  }
  var CHUNK_SIZE = 1000;
  for (var i = 0; i < number; i += CHUNK_SIZE) {
    if (IsDividableBySomethingInRange(number, Math.max(i, 2), Math.min(i + CHUNK_SIZE, number - 1))) {
      return false;
    }
  }
  return true;
}

Now if we leave the algorithm like this, we are still executing the work in one go. To actually remove that problem we need to convert the IsDividableBySomethingInRange calls into sub-tasks, which can be executed on the grid separately. The IsPrimeNumber function/task then just has to wait until all its sub-tasks have completed.
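Before going fully grid-side, here is a sketch of how a single node could run the chunks cooperatively. This is just an illustration of the scheduling idea, not my actual grid code; it uses a timer to give the browser room to breathe between chunks:

function IsPrimeNumberAsync(number, callback) {
  var CHUNK_SIZE = 1000;
  var i = 0;
  function runChunk() {
    if (i >= number) {
      callback(true); // no chunk contained a divisor: number is prime
      return;
    }
    if (IsDividableBySomethingInRange(number, Math.max(i, 2), Math.min(i + CHUNK_SIZE, number - 1))) {
      callback(false); // a divisor was found: not a prime
      return;
    }
    i += CHUNK_SIZE;
    setTimeout(runChunk, 0); // yield to the browser before the next chunk
  }
  runChunk();
}

Calling IsPrimeNumberAsync(2137, function(isPrime) { alert(isPrime); }) keeps the page responsive no matter how large the number gets, because no single chunk runs long enough to be noticed.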

The splitting of a task into sub-tasks that can be executed separately and independently of each other is a typical fork operation, something I learned back in my college days. We would fork off some work to separate sub-threads, so it could be completed in parallel instead of executing each piece in turn. Waiting for the sub-tasks to complete is called a join operation. By creating a fork/join algorithm we're not just keeping each slice of work small enough to go unnoticed, we're also improving parallelism by allowing the sub-tasks to run independently of each other.

So what we really wanted to create all along is a fork/join version of our IsPrimeNumber function/task.

Let's say that we have a fork operation that we can call to somehow fork off a function into a separate sub-task. And let's say that whenever such a sub-task is completed, it will call back into the original task with the result:

class IsPrimeNumber {
  var number;
  var result;
  function IsPrimeNumber(number) {
    this.number = number;
    this.result = true;
  }
  function forkSubTasks() {
    var CHUNK_SIZE = 1000;
    for (var i = 0; i < this.number; i += CHUNK_SIZE) {
      fork IsDividableBySomethingInRange(this.number, Math.max(i, 2), Math.min(i + CHUNK_SIZE, this.number - 1));
    }
  }
  function joinSubTask(subtask) {
    if (subtask.result == true) {
      this.result = false;
    }
  }
}

This is a grid-enabled fork/join version of our IsPrimeNumber task. When we execute this task with a very large number on a grid of nodes, the task will be forked into sub-tasks and each of those sub-tasks can be executed on any of the nodes. When a sub-task is completed, its result is joined back into the IsPrimeNumber task, which ensures that the combined result is correct.

IsPrimeNumber(2137)
forkSubTasks
|
|--------->IsDividableBySomethingInRange(2137, 2, 1000)
| |
|------------------+-------------->IsDividableBySomethingInRange(2137, 1000, 2000)
| | |
|------------------+---------------------+---------------->IsDividableBySomethingInRange(2137, 2000, 2136)
| | |
joinSubTask | | |
| | | |
|<-----------------+ | |
| | |
|<---------------------------------------+ |
| |
|<---------------------------------------------------------------+
|
true<--------+

If you know a bit about Google's Map/Reduce algorithm (or the open-source Hadoop implementation of it) you will probably see the similarities between join and reduce.
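For illustration, here is the same computation phrased in map/reduce terms, using the newer JavaScript array methods (not available in every browser yet): the map step checks each chunk, and the reduce step combines the partial answers, just like joinSubTask does above.

var chunks = [[2, 1000], [1000, 2000], [2000, 2136]];
// Map: does 2137 have a divisor somewhere in this chunk?
var divisible = chunks.map(function(range) {
  return IsDividableBySomethingInRange(2137, range[0], range[1]);
});
// Reduce: combine the partial answers into the final result.
var isPrime = !divisible.reduce(function(a, b) { return a || b; }, false);
// isPrime ends up true, since no chunk contains a divisor of 2137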

There are still many things that we can improve here (not swamping the grid with all sub-tasks at once, keeping track of the number of outstanding sub-tasks, etc.). But essentially we now have a grid-enabled way of determining whether any given number is a prime.

Saturday, February 2, 2008

Scrum: story points, ideal man days, real man weeks

My team completed its seventh sprint of a project. Once again all stories were accepted by the product owner.

While one of the team members was giving the sprint demo, I started looking in more detail at some of the numbers. With seven sprints behind us, we've gathered quite some data on our progress from sprint to sprint. XP practitioners would call that our velocity.

Looking at the data



So far we've had 131 story points of functionality accepted by the product owner, so that's an average of 18-19 per sprint. The distribution has not really been all that stable though. Here's a chart showing the number of accepted story points per sprint:


Although it is a bit difficult to see the trend in sprints 1 to 5, we seemed to be going slightly upward. This is in line with what you'd expect in any project: as the team gets more used to the project and to each other, performance increases a bit.

The jump from sprint 5 to sprint 6 however is very clearly visible. This jump should come as no surprise when I tell you that our team was expanded from 3 developers to 5 developers in sprint 6 and 7. And as you can clearly see, those additional developers were contributing to the team velocity right from the start.

But how much does each developer contribute? To see that we divide the number of accepted story points per sprint by the number of developers in that sprint:

Apparently we've been pretty consistently implementing 5 story points per developer per sprint. There was a slight drop in sprint 6, which is also fairly typical when you add more developers to a project. But overall you can say that our velocity per developer has been pretty stable.

Given this stability it suddenly becomes a simple (but still interesting) exercise to try and project when the project will be completed. All you need, in addition to the data from the previous sprints, is an indication of the total estimate of all stories on the product backlog. We've been keeping track of that number too, so plotting the work completed vs. the total scope gives the following chart:

So it looks like we indeed will be finished with the project after one more sprint. That is of course, if the product owner doesn't all of a sudden change the scope. Or we find out that our initial estimates for the remaining stories were way off. After all: it's an agile project, so anything can happen.
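In code the projection is trivial. The total-scope figure below is an assumed, illustrative number; the real value comes from our product backlog:

var acceptedSoFar = 131;              // story points accepted in sprints 1-7
var developers = 5;
var pointsPerDeveloperPerSprint = 5;  // our observed, stable velocity
var totalScope = 155;                 // assumed: total estimate on the backlog
var velocity = developers * pointsPerDeveloperPerSprint;               // 25
var sprintsLeft = Math.ceil((totalScope - acceptedSoFar) / velocity);  // 1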

Story points vs. ideal man days vs. real man weeks



Whenever I talk about this "number of story points per developer per sprint" to people on other projects, they inevitably ask the same question. What is a story point? The correct Scrum answer would be that it doesn't matter what unit it is. It's a story point and we do about five story points per developer per sprint.

But of course there is a different unit behind the story points. When our team estimates its stories, we ask ourselves the question: if I were locked into a room with no phone or other disturbances and a perfect development setup, after how many days would I have this story completed? So a story point is a so-called "ideal man day".

From the results so far we can see that this is apparently a pretty stable way to estimate the work required. And stability is what matters most here, way more important than, for example, absolute accuracy.

A classic project manager might take the estimate of the team (in ideal man days) and divide that by 5 to get to the ideal man weeks. Then divide by the number of people in the team to get to the number of weeks it should take the team to complete the work. And of course they'll add some time to the plan for "overhead", being the benevolent leaders that they are. This will give them a "realistic" deadline for the project. A deadline that somehow is never made, much to the surprise and outrage of the classic project manager.
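To make that contrast concrete, here is the classic calculation side by side with what our measured numbers say. The 100-day estimate is made up for illustration:

var estimate = 100;  // team estimate in ideal man days (assumed)
var teamSize = 5;

// The classic project manager's schedule:
var classicWeeks = estimate / 5 / teamSize;  // 4 weeks of work...
var plannedWeeks = classicWeeks * 1.25;      // ...plus some "overhead": 5 weeks

// What our measured velocity says (1.25 ideal man days per real week,
// see the sprint-length calculation below):
var actualWeeks = estimate / (1.25 * teamSize);  // 16 weeks

A factor of three between plan and reality. No wonder the deadline is never made.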

I'm just a Scrum master on the project. So I don't set deadlines. And I don't get to be outraged when we don't make the deadline. All I can do is study the numbers and see what they tell me. And what they tell me for the current project is that the numbers are pretty stable. And that's the way I like it.

But there is a bit more you can do with the numbers. If you know that the developers in the team estimate in "ideal man days", you can also determine how many ideal man days fit into a real week. For that you need to know the length of a sprint.

Our team has settled on a sprint length of four weeks. That's the end-to-end time between the sprints. So four weeks after the end of sprint 3, we are at the end of sprint 4. In those four weeks, we have two "slack days". One of those is for the acceptance test and demo. The other is for the retro and planning of the next sprint.

So there are two days of overhead per sprint. But there is a lot more overhead during the sprint, so in calculations that span multiple sprints I tend to think of those two days as part of the sprint.

So a sprint is simply four weeks. And in a sprint a developer on average completes 5 story points, which is just another way of saying 5 ideal man days. So a real week contains just 1.25 ideal man days!

I just hope that our managers don't read this post. Because their initial reaction will be: "What? What are you doing the rest of the time? Is there any way we can improve this number? Can't you people just work harder?"

Like I said before: I don't believe in that logic. It's classic utilization-focused project management. It suggests that you should try to have perfect estimates and account for all variables so that you can come to a guaranteed delivery date. The problem is that it doesn't work! If there's anything that decades of software engineering management should have taught us, it's that there are too many unknown factors to get any kind of certainty about the deadline. So until we get more control over those variables, I'd much rather have a stable velocity than high utilization.

Sunday, January 20, 2008

A WinSCP replacement for the Mac

About half a year ago I wrote a piece on the inability to use my iMac as a web development machine. The reason was very simple: the lack of a utility on OSX with a feature set similar to WinSCP on Windows. It doesn't need to have all the features of WinSCP, but being able to browse a remote SSH/SCP file system as if it were local, and being able to edit remote files without being bothered too much by the latency and remoteness, are must-haves for me.

Apparently I'm not the only one looking for something to replace good old WinSCP on OSX, as this post is one of the most viewed and most commented on this site. In the comments visitors suggested many replacements: Cyberduck, Fugu, Smultron, Filezilla and Transmit. I tried all of them.

But none of these tools seemed to do what I want, as -until last week- I still found myself at my XP laptop whenever I wanted to work on a site. Until last week? Yes, because about a week ago my trial license of Transmit ran out. And although it's no WinSCP replacement, Transmit is a decent SCP program. So I went to the vendor's website to see what it would cost. On the website I ran into their "One-window web development" program, called Coda. After all those tries one more couldn't hurt, so I decided to give it a go.



After installing Coda you have to get used to the fact that it is not a WinSCP clone. It doesn't have a dual-pane, Norton Commander-like interface, but instead just shows the list of remote files on the left-hand side. And when you open one of those files -instead of getting the WinSCP editor popup- Coda opens the file in a large panel on the right-hand side. Open a second file and it adds a tab to that panel, where WinSCP would open a second popup window.



Looking at it like that, it seems like Coda tries to mimic an IDE more than anything else. But then an IDE for remote web site development, which is exactly what I use it for. I'm now about half-way through my trial period for Coda and I must say I love it so far.

The file transfers are incredibly fast, way faster than with WinSCP it seems. They're also handled more gracefully. With WinSCP file transfers are handled in the main window, which is typically covered by my editing popup. So I have to alt-tab to the main window to see whether the transfer is complete. With Coda the file transfer is indicated by a throbber animation on the file listing on the left. When the transfer is complete, a brief notification animation is shown in the top right.

The editor is decent enough for my needs. It highlights most of the languages you typically encounter during web development and it follows most standard editing conventions. But it could do with some more keyboard shortcuts for faster editing and especially navigation. It would also be nice if it could integrate with external or online help files for the supported languages. There's tons of great documentation out there for Java, JavaScript, HTML, CSS, etc. Unfortunately Coda limits its built-in help system to the help that comes with the installation. So for anything beyond that, I still need to tab to my browser and search for the help there. This seriously breaks their "one-window web development" mantra and should be addressed if they want to stick to that claim.

That said: the editor has lots of features that the internal WinSCP editor doesn't have, so I shouldn't be complaining. Syntax highlighting is great of course, as are other features like the overview of functions in the current file. The fact that I'm asking for features that you'd normally find in real IDEs and not in WinSCP is actually a big compliment to the Panic people: Coda really feels like a web development IDE!

The biggest downside of Coda I've found so far is the price tag. Like pretty much all software on the Mac it's commercial. Panic decided to set the price at $79, which definitely doesn't make it an impulse buy for me. So I'll evaluate it a bit longer and see what happens when my trial runs out. I'll probably go back to my XP laptop and WinSCP. But if I then find myself missing Coda features, I'll give in and buy it. After all... that would mean it's better suited to my needs than WinSCP. Who would have ever thought that would be possible?

Sunday, December 30, 2007

Grid computing - using web browsers

Grid computing is one of those areas that seems to have a magic appeal to software developers. There is something very attractive about taking some relatively simple computers and wielding their combined power to perform seemingly infinitely large computing tasks within reasonable times.


I've also always been attracted to grids. But like many developers, I thought this type of power was not within my reach. Only since Google started documenting the "cloud of commodity PCs" that powers their vast computing capacity has it started to seem quite feasible for even just "large" companies to have their own computing cloud.


But my problem remains the same. I don't work for Google, Yahoo or IBM and I'm not a large company myself. So I don't have access to a set of commodity PCs that I can combine into a grid. So for years I decided that I'd never get a chance to work on a grid, unless I'd start working for one of those big boys.

Recently I've been thinking about an alternate setup for a grid computer, more along the lines of the SETI@Home project and all its successors. Those programs all allow home PCs of users all over the world to take part in a giant computer network - a global grid in essence. So the people creating these programs get a lot of computing power, yet they don't have to manage the hardware. A powerful setup.

But such setups already exist. And they have one downside that keeps them from even more mainstream adoption: they require the user to install software to put their computer into the grid. And although the threshold isn't very high, it's still too high for many people. So a lot of potential computing power is not used, because the barrier of installing software is too high.

Now that got me thinking: is there an existing platform on modern PCs that we can just embed our own grid applications in? Preferably a platform that's been around for a few years, so all its quirks are known. And it would be nice if the platform comes with built-in internet connectivity.

Here's the idea that popped into my head: web browsers! They used to be nothing more than HTML viewers, but those days are long gone. Nowadays our browsers are hosting more and more complete applications, like GMail, PopFly and Yahoo Pipes. These applications prove that there is a lot of computing power in the web browser. Is it possible to use the web browsers that people have open on their PCs all the time and turn those into nodes in the grid?

It is a very simple concept: every browser that has a certain URL open is a node in the grid. For a computer to join the grid, they just surf to the URL. To leave the grid again, they navigate away from the URL. It doesn't get much easier than that, right? No software to install, just a page you have to visit. Put it in your favorites in the office, open it every morning when you boot your PC and that's one more node in the grid. From even my own limited reach, I know of at least 5 machines that I could "grid enable" in this way. Those are all PCs and Macs that are on for a large part of the day, just waiting for me or my family to use them. Or that's what they used to be... now I can't stop thinking about them as being nodes in my "web based grid".
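To make the idea a bit more concrete, here is a rough sketch of what the page behind such a URL might do. Every endpoint and helper name here is invented for illustration; this is the concept, not my actual prototype (and it ignores older IE's ActiveX-based XMLHttpRequest):

function pollForWork() {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/grid/next-task", true); // hypothetical endpoint
  xhr.onreadystatechange = function() {
    if (xhr.readyState != 4) return;
    if (xhr.status == 200) {
      // 2007-era JSON parsing; native JSON.parse isn't widely available yet
      var task = eval("(" + xhr.responseText + ")");
      var result = executeTask(task); // assumed: runs the task's work function
      reportResult(task.id, result);  // assumed: POSTs the result back
    }
    setTimeout(pollForWork, 5000); // ask the server for more work in a few seconds
  };
  xhr.send(null);
}
pollForWork(); // the page becomes a grid node the moment it loads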

If you're a software developer reading this, then your mind probably started wandering while reading the last few paragraphs. Is this possible? How would the nodes get their tasks? How would they report their results back? How would you manage the nodes in the grid? Where do you keep the data that is needed for/generated by the nodes? How do you handle XSS issues? Wouldn't the nodes quickly overload the server that manages them? The list of challenges is seemingly endless and definitely too much for me to deal with in one go.

All I know is that ever since this idea popped into my head, I can't stop thinking about it. And for every problem, I can see at least a few potential solutions. I have no idea whether they'll work or which one is best, but the only way to figure that out is to actually start building the platform.

Oh man... I really need to make this my 20% project. Or more likely... I really need a lot of people to make this their 20% project. Help?

Saturday, December 22, 2007

The origin of the name Apache web server

I read a lot. Not much literature or novels, as those unfortunately seem to suffer under my more professional reading habits. I read lots of technical articles, white papers, blog posts and specifications. It's part of what I do to keep up to date with the things happening in the CS field. But in part I also read all kinds of stuff to gain a broader understanding of our profession.

Some of the longer things I read this year include "PPK on JavaScript" and "The no asshole rule", but also the venerable "Art and science of Smalltalk". And some colleagues even caught me reading an OS9 AppleScript manual dated somewhere around 1999. They're still making fun of that discovery almost every day, but I don't mind... having read that manual has given me a better understanding of how the now much-heralded Apple engineers thought about making an end-user programming language almost a decade ago.

Recently I read the bulk of Roy Thomas Fielding's thesis Architectural Styles and the Design of Network-based Software Architectures in which he introduces the principles of REST. As with any thesis it is a bit too abstract for my taste, but it did introduce me somewhat better to the background and theory behind REST.

Aside from that, I made one stunning discovery when I read about Fielding's involvement in the creation of the Apache HTTP server:

  • At the time, the most popular HTTP server (httpd) was the public domain software developed by Rob McCool at the National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign (NCSA). However, development had stalled after Rob left NCSA in mid-1994, and many webmasters had developed their own extensions and bug fixes that were in need of a common distribution. A group of us created a mailing list for the purpose of coordinating our changes as "patches" to the original source. In the process, we created the Apache HTTP Server Project
Please read that last part again, and again... and again. Until it hits you where it finally hit me. What hit me? Well... I finally understood that the name of the Apache web server might (originally) have had nothing to do with the Apache tribe. The server was created by taking an existing code base and then applying all sort of patches. So in a sense it was a patchy web server. A patchy... Apache...!

Brilliant! In all my years of knowing the Apache web server and the brand that was created around the Apache name, I never realized where it came from.

The Apache website itself has this to say about it:
  • The name 'Apache' was chosen from respect for the Native American Indian tribe of Apache, well-known for their superior skills in warfare strategy and their inexhaustible endurance. It also makes a cute pun on "a patchy web server" -- a server made from a series of patches -- but this was not its origin.
For the moment I'll take their word for it and accept that the name sounding like "a patchy web server" is pure coincidence. I bet it's also more convenient for them in selling the Apache brand: "we named our web server after its inexhaustible endurance" sounds a lot better than "we named our web server after the fact that it was created from a bunch of unrelated patches".

Saturday, November 17, 2007

Why devx needs a better print function

I like to read technology articles during my daily commute. And since the train is too crowded for a laptop and I don't have an ebook reader (yet), I still print articles that seem interesting to read during the train ride.

A lot of web sites still have a Print button. What happens when you click that button differs from site to site, but it roughly falls into these categories:

  • Show all pages at once
    Many sites break articles into multiple pages. The print version of the article puts all of these pages together again, to allow them to be printed in one go.
  • Re-layout the site to print better
    Tables seem to be notoriously difficult to print. That's why many sites revert to a table-less layout in their print version.
  • Remove navigation elements
    Global and local navigation elements are pretty useless on paper. So they're removed from the print layout.
  • Images - click to see full size version
    Some graphics-intensive sites show images of reduced size in their normal view, showing the full version in a popup when you click some link. Since you can't click a link in the Print version, the full size images should always be shown there.
These are some things that I wish more sites would do:
  • Replace animated ads with text ads
    I don't mind showing ads next to good content. I do mind the ignorance of including animated ads in a print layout. I'm pretty sure no printer will deal with these in a useful way.
  • Use images that are more appropriate for B&W
    Most people still use B&W printers. So it would be nice if sites allowed the option of replacing their colored images with versions that are more suited to printing on a B&W printer.
    A common example of this is the mostly-black screenshot of a command prompt or shell window. When printed, these really eat through toner at high speed. It would be nice if a site allowed me to replace those images with ones that are mostly white, making my toner last longer.
That's a pretty long list. And most of these things can actually be accomplished on a website without needing a special print version of the articles. Hiding navigation elements, showing non-animated ads and other layout tricks for a print version can easily be accomplished using CSS media types. And why do most sites still use tables for their layouts? Just remove those tables and you have one less difference between the screen and the print version. And I also think it would make sense to show all content on a single page.

So that actually leaves just one reason for having a Print button: showing full sized images inline. And that finally brings us to the title of the article: the print function of DevX.

DevX is a nice development site that sometimes has very interesting content. And one of the reasons their content is good is that they usually include quite a lot of screenshots and diagrams. This just makes their articles so much easier to follow. On screen the articles show the images at a reduced size. Which makes sense, because the images are often full-screen screenshots which would otherwise leave hardly any room for text.

But if you've ever printed an article from www.devx.com you've probably noticed their print versions still only show the images at a reduced size. They're not replaced by the full-resolution version. They're not printed in a larger box. They're not even added at the end of the article, like appendices. The images in the print version are exactly the same as in the screen version: reduced to sometimes a tenth of the original size.

So whenever I find an article in DevX that I want to read on the train, I start up Word and open the print version in there. Then I remove all tables, because they also don't print very well from Word. Then I go back to the browser and open each image, copy it to the clipboard, paste it in Word and then remove the useless downsized version.

And although I normally like the high volume of screenshots that DevX uses in their articles, this is actually a reason why I'd like them to use fewer screenshots and more text. Because this conversion to Word is not just a lot of mindless work; I sometimes forget to do it and print a DevX article as is. And by the time I realize what I've done, I'm already on the train. So I do my best and squint my eyes trying to read the text in there.

So there you have it: please DevX fix your @$@%&^# Print function.

Saturday, October 6, 2007

Viewing and editing Scrum project management data with Google Mashup Editor

Welcome to my first post on the Google Mashup Editor. In this article we'll create a tool for entering and storing data using Google's new mashup editor tool. Depending on available time, the evolution of Google Mashup Editor and the availability of alternative tools, I might improve on the basic data management application in later articles.

Scrum project management

The application we'll be creating is a Scrum project management tool. If you don't know Scrum yet, it's an agile project management framework. Please do yourself a huge favor and read this 90 page story about Scrum (pdf). It's a good and fun read and has already won over many organizations to at least give Scrum a try.

My reasons for wanting to create this type of application are many. One of them is that there seems to be no tool that satisfies my needs with the right price tag. XPlanner is good, but very basic. Mingle looks nice, but is too expensive and a real resource hog. ExtremePlanner also looks nice, but again: it seems a bit expensive for my taste. But one other reason is probably more important than the price issue: building this data model seems do-able and gives me a chance to get to know Google Mashup Editor a bit more.

Google Mashup Editor

Mashup tools seem to be a dime a dozen these days. These tools try to take programming to the masses, allowing everyone to create complex web applications based on existing data or logic.

Yahoo was the first big player in this field, with their Yahoo Pipes. They're aiming for a visual programming environment where the user manipulates blocks rather than writing code. Microsoft followed suit with Popfly, an even richer mashup creation environment combined with what seems to be the next generation of their MSN Spaces platform.

Google was the last entrant into this field (if I recall correctly) and my first glance at their entry left me rather disappointed. No drag-and-drop programming, no cool default widgets, just a pretty basic text editor and some basic tags.

But if you look below the surface you can see that Google Mashup Editor (GME) is actually quite different from the other two. Where Yahoo and Microsoft just seem to focus on allowing you to read and combine data from various sources, Google also allows you to create new applications from scratch. In that respect GME is more of an application creation (and hosting) platform than a mashup editor.

Many of these additional possibilities seem to originate from Google's adoption of the Atom Publishing Protocol, exposed through the Google Data (GData) APIs. This API is what makes GME not only a mashup editor, but also a valid tool for creating completely standalone applications. These applications are then hosted on Google's servers, using Google's servers for data storage and using the GME to create and update the applications. Some people might not like to put so much in the hands of Google. But it will certainly lower the bar for creating scalable web 2.0 applications.

That's enough of the background talk. Let's get to work on the application.

Initial data model

We'll start by defining the basic entities and relations in our application. We'll probably expand on these later, but we can get pretty far with just the following.

A project is something on which a team works in sprints to create a product or a release of a product. This is all intentionally very vague, as our application doesn't need to know the details of the projects it manages.

A project has a product owner and a scrum master. Aside from that there are other team members, but we'll leave them out of the equation for now.

A sprint is a time period during which the team implements certain stories. A sprint has a start date and end date and a description of the general goal of the sprint.

A story is a piece of functionality that the team creates. It has a name, a description of how to demonstrate it and an estimate of the effort it will take to create the functionality. Stories can be either user-focused or technical in nature.

All stories combined are called the product backlog. Stories from the product backlog are planned into sprints. So each project has one product backlog and some of the stories in this product backlog are planned into each sprint.

This all translates into the following very simple data model:
Let's see how we can translate this data model into GME.

Creating the project list in GME

The first step is to create a new project in GME. This will show you a nice empty application with just a pair of gm:page tags.

<gm:page title="Scrum Project Manager" authenticate="true">

</gm:page>

Everything for our application will be inside the gm:page tags. If you want your application to have multiple pages, just add some more files to it. But for this application a single page will do.

Getting data into GME consists of two steps: defining the data itself and defining the GUI for it. The data itself takes the form of a gm:list tag:

<gm:list id="Projects" data="${app}/Projects" template="projectList" />

The gm:list tag defines a list of data that is used in the application. In many applications the data will be pulled from an external -RSS or Atom- feed. But we want to store the data inside the application, right in Google's mashup servers.

The data of our project list is stored under the ${app} feed. This is a location (a "feed" in GME terms) where the data of all users of the application is stored. If we don't want to share the data between users, we can store it under ${user} instead, which is kept separately for each user. Currently there is no way to share data between some users (but not all users of the application), although this feature will probably be added in the future.
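For example, if we wanted each user to keep a private project list, only the feed location would change (a hypothetical variant, not part of our application):

<gm:list id="MyProjects" data="${user}/Projects" template="projectList" />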

To display the data in the list, the page needs a template. A template determines what fields to display and how to display them. It's easiest to use an HTML table, so we'll do that for now.

<gm:template id="projectList">
<table class="gm-table">
<thead><tr>
<td width="200">Name</td>
<td width="100">Product owner</td>
<td width="100">Scrum master</td>
<td width="45"> </td>
</tr></thead>
<tr repeat="true">
<td><gm:text ref="atom:title" hint="Project name"/></td>
<td><gm:text ref="gmd:productOwner"/></td>
<td><gm:text ref="gmd:scrumMaster"/></td>
<td><gm:editButtons/></td>
</tr>
<tfoot><tr>
<td colspan="4" align="right"><gm:create label="New project"/></td>
</tr></tfoot>
</table>
</gm:template>

As you can see we're mixing standard HTML tags, with GME specific tags like gm:text, gm:editButtons and gm:create. Also notice the non-HTML repeat attribute on the second tr (a normal HTML table row). This tells GME to repeat that tr for every item in the ${app}/Projects feed.

If we now compile and test this application, we get an empty table with a "New project" button. Pressing the button adds an empty row to the table, with fields to fill in the values for a project.
Note that editing and creation functionality come for free with GME. Although they're not very flexible, they allow you to get started quickly.

Creating the list of stories in GME

Next is a list of stories for a project. Since stories are always part of a project, we store the data under the feed of a project.

<h2>Stories for selected project</h2>
<gm:list id="Stories" data="${Projects}/Stories" template="storyList" />

This is where GME really adds a lot of logic automatically. The ${Projects}/Stories location refers to a child feed of the Projects list we defined earlier. Each project in the Projects list will have its own list of Stories.

This list also needs a template to display it, which is really similar to the one for the projects.

<gm:template id="storyList">
<table class="gm-table">`
<thead><tr>
<td width="200">Title</td>
<td width="75">Type</td>
<td width="25">Estimate</td>
<td width="100">How to demo</td>
<td width="45"></td>
</tr></thead>
<tr repeat="true">
<td><gm:text ref="atom:title" hint="Story title"/></td>
<td>
<gm:select ref="gmd:storyType">
<gm:option value="user" selected="true">User</gm:option>
<gm:option value="tech">Tech</gm:option>
</gm:select>
</td>
<td><gm:number ref="gmd:estimate"/></td>
<td><gm:text ref="gmd:howToDemo"/></td>
<td><gm:editButtons/></td>
</tr>
<tfoot><tr>
<td colspan="5" align="right"><gm:create label="New story"></td>
</tr></tfoot>
</table>
</gm:template>

Now the only tricky bit left for the list of stories is that it needs to be refreshed when the user selects a different project. This is quite easy to do by setting an event handler.

<h2>Unplanned stories for selected project</h2>
<gm:list id="ProjectStories" data="${Projects}/Stories" template="storyList">
  <gm:handleEvent src="Projects"/>
</gm:list>

This tells the story list to refresh itself when an event happens in the Projects list we defined before. So selecting a different project will display the stories for that project.

So after adding the story list and adding some projects and stories, our application looks like this:
We can easily do the same for the list of sprints for the project. Since this is really similar to the list of stories, I won't show the code here. If you want to have a look at the code, look at the finished project on http://scrummer.googlemashups.com.

Last is the list of stories for the selected sprint. Note that stories can either be part of the project or part of the sprint. So for now we'll call the first type "unplanned stories". Later we'll want to share the stories between the project and the sprints.

Since the list of stories is -again- really similar to the list of unplanned stories, we won't show the code here. But when we now run our mashup it looks like this:
At the bottom you can see that I am entering a story. This is almost a usable application, at least for entering and browsing the data. To make it something you'd really want your entire team to use for your daily managing of Scrum projects, it would require more work.

That's it for now. If you want to have a look at the finished code or play with the application, go to http://scrummer.googlemashups.com.