The Technorati Top 100: A Lesson in How Not to Calculate Weblog Popularity

August 14, 2005

@ 02:00 PM

In recent weeks there have been a number of blog postings critical of the Technorati Top 100 list of popular weblogs. The criticisms have primarily been of two flavors: some posts have been critical of the idea of blogging as a popularity contest, which such lists encourage, and others have criticized the actual mechanism Technorati uses to calculate popularity. I agree with both criticisms, especially the former. There have been a number of excellent posts arguing both points which I think are worth sharing.

Mary Hodder, in her post Link Love Lost or How Social Gestures within Topic Groups are More Interesting Than Link, argues that metrics besides link count should be used for calculating popularity and influence. Some of the additional metrics she suggests include comment counts and the number of subscribers to the site's RSS feed. She also suggests creating topic-specific lists instead of one über list for the entire blogosphere. It seems a primary motivation for encouraging this approach is to increase the pool of bloggers that are targeted by PR agencies and the like. Specifically, Mary writes:

However, I'm beginning to see many reports prepared by PR people, communications consultants etc. that make assessments of 'influential bloggers' for particular clients. These reports 'score' bloggers by some random number based on something: maybe inbound links or the number of bloglines subscribers or some such single figure called out next to each blog's name.
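To make the contrast concrete, here is a minimal sketch of the difference between a single-figure ranking and the kind of multi-metric, topic-specific list Mary suggests. Everything in it is an assumption for illustration: the `BlogStats` fields, the weights, and the sample numbers are hypothetical and come from neither Mary's post nor any real service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlogStats:
    name: str
    topic: str
    inbound_links: int
    comments: int
    feed_subscribers: int


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"inbound_links": 0.4, "comments": 0.3, "feed_subscribers": 0.3}


def rank_by_links(blogs):
    """Single-figure ranking: order blogs by inbound link count alone."""
    return [b.name for b in sorted(blogs, key=lambda b: b.inbound_links, reverse=True)]


def composite_score(blog, maxima):
    """Weighted sum of metrics, each scaled to 0..1 by the largest value seen."""
    return sum(w * getattr(blog, m) / maxima[m] for m, w in WEIGHTS.items())


def rank_by_topic(blogs):
    """Topic-specific lists: rank blogs only against others on the same topic."""
    by_topic = defaultdict(list)
    for b in blogs:
        by_topic[b.topic].append(b)
    lists = {}
    for topic, group in by_topic.items():
        # "or 1" avoids dividing by zero when every blog scores 0 on a metric.
        maxima = {m: max(getattr(b, m) for b in group) or 1 for m in WEIGHTS}
        ranked = sorted(group, key=lambda b: composite_score(b, maxima), reverse=True)
        lists[topic] = [b.name for b in ranked]
    return lists


# Made-up sample data.
blogs = [
    BlogStats("alpha", "tech", inbound_links=1000, comments=10, feed_subscribers=100),
    BlogStats("beta", "tech", inbound_links=400, comments=200, feed_subscribers=500),
    BlogStats("gamma", "parenting", inbound_links=50, comments=80, feed_subscribers=60),
]

print(rank_by_links(blogs))
print(rank_by_topic(blogs))
```

With this made-up data, the link-only list puts "alpha" first, while the per-topic composite list puts "beta" ahead of "alpha" within "tech". The choice of weights is itself a judgment call, so a sketch like this changes who ends up on top without removing the ranking.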

Shelley Powers has a different perspective in her post Technology is neither good nor evil. In arguing against the popularity contests inherent in creating competing A-lists, or even just B-lists to complement the A-lists, she writes:

Even if we tried to analyze a person's links to another, we can't derive from this anything other than person A has linked to person B several times. If we use these to define a community to which we belong, and then seek to rank ourselves within these communities, all we've done is create a bunch of little Technorati 100s and communities that are going to form barriers to entry. We see this communal behavior all too often: a small group of people who know each other link to each other frequently and to outsiders infrequently; basically shutting down the discussion outside of the community.
...
I think Mary should stop with "I hate rankism." I understand the motivations behind this work, but ultimately, whatever algorithm is derived will eventually end up replicating the existing patterns of authority rather than replacing them. This pattern repeated itself within the links to Jay Rosen's post; it repeated itself within the speaker list that Mary started for women ("where are the women speakers"), but had its first man within a few hours, and whose purpose was redefined within a day to include both men and women.

Rankings are based on competition. Those who seek to compete will always dominate within a ranking, no matter how carefully we try to 'route' around their own particular form of 'damage'. What we need to challenge is the pattern, not the tools, or the tool results.

I agree with Shelley that attempts to right the so-called "imbalance" created by lists such as the Technorati Top 100 will encourage competition and stratification within certain blogging circles. I also agree that whatever algorithms are used, a lot of the same names will still end up on the lists for a variety of reasons. A major one is that a number of the so-called A-list blogs actually work very hard to be "popular", and changing the metrics by which their popularity is judged won't change this fact.

So Shelley has given us some of the social arguments why popularity lists such as the Technorati Top 100 aren't a good idea. But are the technical flaws in Technorati's approach to calculating weblog popularity so bad? Yes, they are.

Danah Boyd has a post entitled The biases of links where she did some research to show exactly how flawed simply counting links on web pages is as a way to calculate popularity or influence. There are a lot of excellent points in Danah's post and the entire post is worth reading multiple times. Below are some key excerpts from Danah's post:

I decided to do the same for non-group blogs in the Technorati Top 100. I hadn't looked at the Top 100 in a while and was floored to realize that most of those blogs are group blogs and/or professional blogs (with "editors" and clear financial backing). Most are covered in advertisements and other things meant to make them money. It's very clear that their creators have worked hard to reach many eyes (for fame, power or money?).
...
Blogrolls:

All MSNSpaces users have a list of "Updated Spaces" that looks like a blogroll. It's not. It's a random list of 10 blogs on MSNSpaces that have been recently updated. As a result, without special code (like in Technorati), search engines get to see MSNSpace bloggers as connecting to lots of other blogs. This would create the impression of high network density between MSNSpaces which is inaccurate.

Few LiveJournals have a blogroll but almost all have a list of friends one click away. This is not considered by search tools that look only at the front page.
...

Blogrolls seem to be very common on politically-oriented blogs and always connect to blogs with similar political views (or to mainstream media).

Blogrolls by group blogging companies (like Weblogs, Inc.) always link to other blogs in the domain, using collective link power to help all.
...

Male bloggers who write about technology (particularly social software) seem to be the most likely to keep blogrolls. Their blogrolls tend to be dominantly male, even when few of the blogs they link to are about technology. I haven't found one with >25% female bloggers (and most seem to be closer to 10%).

On LJ (even though it doesn't count) and Xanga, there's a gender division in blogrolls whereby female bloggers have mostly female "friends" and vice versa.

I was also fascinated that most of the mommy bloggers that i met at Blogher link to Dooce (in Top 100) but Dooce links to no one. This seems to be true of a lot of topical sites - there's a consensus on who is in the "top" and everyone links to them but they link to no one.
...

Linking patterns:

The Top 100 tend to link to mainstream media, companies or websites (like Wikipedia, IMDB) more than to other blogs (Boing Boing is an exception).

Blogs on blogging services rarely link to blogs in the posts (even when they are talking about other friends who are in their blogroll or friends' list). It looks like there's a gender split in tool use; Mena said that LJ is like 75% female, while Typepad and Moveable Type have far fewer women.

Bloggers often talk about other people without linking to their blog (as though the audience would know the blog based on the person). For example, a blogger might talk about Halley Suitt's presence or comments at Blogher but never link to her. This is much rarer in the Top 100 who tend to link to people when they reference them.

Content type is correlated with link structure (personal blogs contain few links, politics blogs contain lots of links). There's a gender split in content type.

When bloggers link to another blog, it is more likely to be same gender.

I began this investigation curious about gender differences. There are a few things that we know in social networks. First, our social networks are frequently split by gender (from childhood on). Second, men tend to have large numbers of weak ties and women tend to have fewer, but stronger ties. This means that in traditional social networks, men tend to know far more people but not nearly as intimately as those women know. (This is a huge advantage for men in professional spheres but tends to wreak havoc when social support becomes more necessary and is often attributed to depression later in life.)

While blog linking tends to be gender-dependent, the number of links seems to be primarily correlated with content type and service. Of course, since content type and service are correlated by gender, gender is likely a secondary effect.
...
These services are definitely measuring something but what they're measuring is what their algorithms are designed to do, not necessarily influence or prestige or anything else. They're very effectively measuring the available link structure. The difficulty is that there is nothing consistent whatsoever with that link structure. There are disparate norms, varied uses of links and linking artifacts controlled by external sources (like the hosting company). There is power in defining the norms, but one should question whether companies or collectives should define them. By squishing everyone into the same rule set so that something can be measured, the people behind an algorithm are exerting authority and power, not of the collective, but of their biased view of what should be. This is inherently why there's nothing neutral about an algorithm.

There is a lot of good stuff in the excerpts above and it would take an entire post or maybe a full article to go over all the gems in Danah's entry. One random but interesting point is that LiveJournal bloggers are penalized by systems such as the Technorati Top 100. For example, Jamie Zawinski has over 1900 people who link to him from their Friends page in LiveJournal but he somehow doesn't make the cut for the Technorati Top 100. Maybe the fact that most of his popularity is within the LiveJournal community makes his "authority" less valid than that of others in the Technorati Top 100 list who have fewer incoming links.

Yeah, right.
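For the curious, here is a minimal sketch of how a front-page-only link counter ends up undercounting someone whose links live on LiveJournal friends pages. It is a toy model built from the description in Danah's excerpt ("search tools that look only at the front page"), not Technorati's actual algorithm. The page records, the `kind` labels and the blog names are all hypothetical.

```python
from collections import defaultdict


def inbound_sources(pages, counted_kinds):
    """Count distinct blogs linking to each target, looking only at
    pages whose kind is in counted_kinds (e.g. only front pages)."""
    sources = defaultdict(set)
    for page in pages:
        if page["kind"] not in counted_kinds:
            continue  # this kind of page is invisible to the counter
        for target in page["links"]:
            if target != page["owner"]:  # ignore self-links
                sources[target].add(page["owner"])
    return {target: len(owners) for target, owners in sources.items()}


# Made-up data: three LiveJournal users list "jwz" as a friend one click
# away from their front page; two standalone blogs link from front pages.
pages = [
    {"owner": "lj_user1", "kind": "front", "links": []},
    {"owner": "lj_user1", "kind": "friends", "links": ["jwz"]},
    {"owner": "lj_user2", "kind": "friends", "links": ["jwz"]},
    {"owner": "lj_user3", "kind": "friends", "links": ["jwz"]},
    {"owner": "blog_a", "kind": "front", "links": ["blog_b"]},
    {"owner": "blog_c", "kind": "front", "links": ["blog_b", "jwz"]},
]

front_only = inbound_sources(pages, {"front"})
with_friends = inbound_sources(pages, {"front", "friends"})

print(front_only)
print(with_friends)
```

In this toy data "jwz" trails "blog_b" when only front pages are counted and leads once friends pages are included. The people linking haven't changed, only what the counter is willing to look at.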

Categories: Mindless Link Propagation

Sunday, 14 August 2005 17:03:03 (GMT Daylight Time, UTC+01:00)

Hi Dare, a couple of comments.

I think people are responding to this effort, as we have this online conversation, because there is such a need to fix the inbound link problem. No question. Making some new way to understand blogs by changing the metric measures from counting inbound links to one that creates topic communities that shows conversationalness seems most valuable. This is my preference, but I've been holding back some to let people comment first.

That would likely minimize some of the problems you are noting in your post, but would probably bring a host of new problems, so making a solution the community likes, and also will help to police against spammers, and represents things we care about, seems to me the right thing to do. However, as soon as there is a metric, a power law will develop, with a few folks at the top, and everyone starts leaning toward the top. And when we all start changing our social behavior to be like the people at the top, we aren't ourselves, talking in our real voices and doing things we would normally do because we like things. This is what is so wonderful about reading blogs. And it seems to me that we need to address this to keep from ruining what is so delightful about blogs and our connections with people through them. So maybe a solution with 25 metrics that all play a small part will be harder to game?

Also, a separate issue: you quote Shelley's words on the speaker's wiki (though the way men and women network is related to the linking issues...). She says the premise for it was quickly changed from a women's list to a list that included men. In fact, I proposed a speaker's list to get more women who speak noticed in front of conference organizers, but that doesn't preclude men. In fact, the problem is that we see the same people (mostly men) speaking over and over. I want new voices *and* more women, so making a wiki where people can sign up, to sort by topic of expertise, makes sense to me, as a vehicle to get new voices out there. I don't want to segregate it for women, because the real issue is new voices and how we find them. To me the answer is to sort people by topic, to highlight the many people who can speak about something. Take a look at the 'open source' category: https://www.socialtext.net/speakers/index.cgi?action=category_display;category=opensource. It's a great list of mostly women, but also folks like Eugene Kim, who created the Open Source Usability Sprint last spring, and who should really be speaking more about open source. Exposing new voices was what I proposed and that is what we are doing.

On your last point, I looked up JWZ in Technorati, here: http://technorati.com/search/www.livejournal.com/users/jwz, where there are 341 links. The link you show above, which is the user page (profile) upon lookup: http://technorati.com/search/www.livejournal.com/userinfo.bml%3Fuser=jwz has 42 links. I'm guessing that the 341 are post links, which Technorati tracks because they are on the front of a blog. They only count links on the front pages, and if users are linking to each other in LJ, profile page to profile page, (which is the LJ equivalent to blogrolling) then Technorati is missing all those 'blogroll' type links for JWZ.

Building something that recognizes different types of links might help, but building something that recognizes communities that find conversation valuable makes the most sense to me. I'd rather stop talking about how Top 100 lists are terrible and start finding the different types of metrics that matter to us, and figure out how they relate socially (for example, are comments on blogs more valuable, assuming spam is filtered out, than post-to-post links, or is the opposite true, or can we equate them in a measurement of conversationalness?).

Thanks for helping the conversation about how to fix this problem. It's great to see everyone contributing to this discussion.
mary

mary hodder

Monday, 15 August 2005 14:44:55 (GMT Daylight Time, UTC+01:00)

Mary and Dare,
I just put up a post commending both of you and the others for their efforts towards fixing this problem. It's only now that blogs are gaining more and more in popularity that the professionals [marketers, PR firms, etc.] are realizing blogs are not the same as web sites and cannot be measured by the same set of rules. I'm all for a place where we can openly collaborate but I'd rather keep it to small numbers rather than open it up to everyone in the beginning. Not to play a power game or anything but to keep the noise to a minimum so we can carve out the issues and then, once we're comfortable with those issues, open it up. We'll then see more issues and solutions, but bringing in too many folks in the beginning would do more harm than good IMO.

I began working on a doc some time ago that breaks down the differences between blogs and web sites from a technical perspective. I only bring that point out to say that we'll most definitely need to keep discussions to the functional > technical realm. Meaning, we must describe the problem in functional terms and then translate that into the technical so a proposed spec or solution can be written. I initially focused on the technical because that's my area of expertise but with the rise of blog search engines like Technorati and BlogPulse, the functional is being misinterpreted because the technical is not understood... Understanding by this group is critical if it is to succeed in its mission of creating a well-formed set of multi-faceted guidelines to rank and score blogs.

Count me in.

~jason

Jason Dowdell

Dare Obasanjo's weblog
