Sunday, July 13, 2008

Using PageRank to determine the most powerful people on LinkedIn

Last week I was reading the original paper by Larry Page and Sergey Brin on the PageRank algorithm. PageRank is an algorithm to determine the relative authority of pages that they came up with while at Stanford University. And it is of course also the algorithm they then just started using for a search engine at Stanford - a search engine they named Google.

If you just skip the math for a moment, the PageRank algorithm is surprisingly simple. The basic rules in determining the rank of a page are:
1) the links to a page determine its rank
2) a page spreads its rank over the links it contains

So if you have a page with three links on it, that page will contribute one third of its rank to each of those links. A page with a rank of 100 that has two links on it, will give 50 points to each of its targets.

A simplified PageRank calculation, showing that each page re-distributes its own PageRank over the pages it links to

One thing you may already have noticed is that the definition of the PageRank can't be determined in one go: you need to know the value of all incoming links to determine the rank of a page, which you need to know to determine the value of all outgoing links. According to the paper you can simply calculate all PageRanks in an iterative fashion until it converges after a quite limited number of steps. I'll gladly take their word for it, since that puts the algorithm back into the hands of us programmers. We can iterate over large collections of data like no other.

A converged PageRank calculation, showing that repeated refinement will result in a stable network of PageRanks

The PageRank algorithm is so simple that it not only works, it has turned the search engine world (and the internet in general) upside down. I'm quite sure that today's PageRank is determined in a much more complex fashion, but let's stick to this older and more simple model for now.

What else could we use PageRank for?

What would happen if you let PageRank loose on a completely different type of data? How about instead of taking all the hyperlinked documents on the web, we take a social network? So take the network of people at a site like LinkedIn, Facebook, Plaxo or Orkut and apply the same two basic rules:
1) a person's rank is determined by the incoming links
2) a person's rank is evenly divided over their outgoing links

So the more people link to you, the higher your rank. And if more people link to the people that link to you, your rank will get even higher.

If we let PageRank loose on a social network like this, what will happen? Who will get the highest PageRank? And what does the rank mean in such a scenario? Is your PageRank an indicator of the power of your social network? Will my boss indeed have a higher rank than me, as you would expect? And will his boss have an even higher rank?

There are of course differences between links on a social network site and the web in general. For example: in a social network like this the links between people are non-directed, when you are connected to someone that person is also connected to you. This will likely affect the results, but I am too bad at maths to foresee how.

Basically I have no idea what the outcome of such an experiment will be. It just seems to me that it would be incredibly interesting to see what the numbers are and then determine what they mean. Much like Page and Brin did with their original PageRank: first let the algorithm loose on the data, then see what you can do with the results. If the algorithm feels intuitively good, the results are bound to be interesting.

Does you feel the same? Can this be an interesting experiment? Or is this complete nonsense and is PageRank not applicable to this scenario? And do the social network sites have APIs that will allow you to do this? Or is your data access limited to the people in your own network (as they always promise in their privacy statement) and can only someone working for Facebook/LinkedIn/Plaxo create such an application?

1 comment:

Kas Thomas said... tries to apply PageRank to Twitterers.

Lots of discussion of this sort of thing at