Tuesday, June 26, 2007

It's all about the data, not about the code

As I've probably mentioned before, I administer (and contribute to) a few photo blogs. Since I wrote the software for the blogs a few years ago, most of the time I spend on it now goes into small tweaks to the code: adding a feature here and there. Nothing big, but enough to keep the editors happy and myself busy.

But recently I noticed that I've been putting off adding one of the requested features. And I wondered why. The feature in itself isn't very spectacular. The blogs work with a scheduling system, where all editors can see everyone else's posts. Since this is the holiday season, editors are sometimes planning their posts weeks ahead of time. This can get confusing for the editors, because it is no longer very clear where "today" is in the list. Something else that slightly bothered me is that some of our editors like to read posts in the administrative interface before they appear on the site. We even get answers to some of our "guess what" photos before they're available on the public site.

It is time to do something about this. Like I said, it is pretty simple: don't allow the editors to see each other's future posts in the administrative interface. They should still see each other's "old" posts, which is a great way to quickly find something you want to link to. But looking into the future, they should only be able to see their own posts.

The heart of the administrative interface is a table with a row for each post. It's very basic and the code looks something like this:

    for each post
        write post to grid

Any half-decent programmer will know how to add the feature:

    for each post
        if (postdate <= now || currentUser == author)
            write post to grid

But this is where it becomes problematic. The editors can sometimes also post on behalf of other (guest) authors. And of course, they should be able to see the posts that they created on behalf of guest authors. So the condition should be more like:

    for each post
        if (postdate <= now || currentUser == editor)
            write post to grid

Here we have added the concept of an editor: the person who created the post in the system. This is not necessarily the same as the author: the person who created the content of the post.
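For readers who prefer something runnable, here is a minimal Python sketch of that last condition. The blog software itself is not shown in this post, so the `Post` class, its field names and the `visible_posts` function are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical record layout; the real blog software may differ.
    title: str
    author: str         # person who created the content
    editor: str         # person who entered the post into the system
    postdate: datetime  # scheduled publication time


def visible_posts(posts, current_user, now):
    """Return the posts current_user may see in the admin grid."""
    return [
        post for post in posts
        # Published posts are visible to everyone; future posts only
        # to the editor who entered them.
        if post.postdate <= now or post.editor == current_user
    ]
```

With this filter, an editor who schedules a guest author's post for next week still sees it, while the other editors only see it once its post date has passed.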

This would indeed implement the feature very easily. One additional if statement and I'm done.

There is only one problem with it: I don't keep track of the editor of the post!

The information was never needed, so it has never been recorded. That means that I have about a year and a half of posts for which I don't know who the editor is. Which means that I have to figure out a way to either manually gather that data now (something I don't look forward to), add an exception for the cases where the data isn't available (resulting in uglier code) or somehow programmatically extract the information from the data that we already have. That last option sounds like the least manual work (both now and in later maintenance). But it means that I'll have to write code that touches all 500+ posts we have in the system.
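To make the third option a bit more concrete, here is a rough Python sketch of what such a backfill could look like. It rests on a guess of mine: that a post whose author is one of the regular editors was also entered by that editor. The function name `backfill_editor`, the dictionary keys and the `editor_accounts` set are hypothetical. Posts by guest authors cannot be resolved this way, so they are returned for manual follow-up.

```python
def backfill_editor(posts, editor_accounts):
    """Fill in a missing 'editor' where it can be inferred.

    Returns the posts that could not be resolved automatically.
    """
    unresolved = []
    for post in posts:
        if post.get("editor"):
            continue  # already recorded, leave it untouched
        if post["author"] in editor_accounts:
            # Assumption: an editor's own post was entered by that editor.
            post["editor"] = post["author"]
        else:
            # Guest author: the editor cannot be inferred from this field.
            unresolved.append(post)
    return unresolved
```

The list that comes back shows how much manual work is left, which is probably the first thing to look at before deciding whether this approach is good enough.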

When I realized this, I finally knew why I'd been putting off adding this feature. I don't have any problem modifying the code, even though I hardly use version control or keep backups of it. Why? Well, simply because I know that if I break it I can just as easily fix it again. That's the benefit of a one-man software project. I wrote the code, so I know how everything works. But I am reluctant to touch the data. Why? Well, I didn't write most of the data, so I have no idea how to restore it if I "break" something in that area.

This realization reminds me of something I always say to fellow developers: "it's all about the data, not about the code". The code can be re-written without any problem. All it takes is time and a developer. It doesn't even need to be the original developer, because a good set of data allows for lots of reverse engineering. But the data is often gathered from many sources. And if it's lost, it's lost. It will take much more work to find all those sources of data again, if it's possible at all.

Now that I've talked about it this much, I'll probably bite the bullet and add the feature anyway. I'll even do it the right way: by modifying/augmenting the existing data to include information about the editor in addition to the author field we already have. But I'll be sure to first do this in a development environment. And even when I know it works, I'll make some extra backups before applying it to the live environment. It might be a lot of work that is most likely not needed. But I'd rather do the extra work than run the risk of corrupting some of our precious data.
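The "backup first, then migrate" routine can be sketched in a few lines of Python. This is only an illustration: it assumes the posts live in a single JSON file, which is not necessarily how the blogs store them, and `migrate_with_backup` and its `migrate` argument are names I made up for the example.

```python
import copy
import json
import shutil
from pathlib import Path


def migrate_with_backup(data_file, migrate):
    """Back up data_file, run migrate on a copy of its posts, write the result."""
    data_file = Path(data_file)
    backup = data_file.with_name(data_file.name + ".bak")
    shutil.copy2(data_file, backup)  # keep an untouched copy of the original

    posts = json.loads(data_file.read_text())
    # Work on a deep copy so the loaded original stays available for checks.
    migrated = migrate(copy.deepcopy(posts))

    # Sanity check: a migration may add fields but must not drop posts.
    if len(migrated) != len(posts):
        raise ValueError(f"post count changed; original kept at {backup}")

    data_file.write_text(json.dumps(migrated, indent=2))
    return backup
```

The sanity check is deliberately crude. The point is that the live file is only overwritten after a backup exists and the migrated data has passed at least one test.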
