Jubilee Thoughts: Personalized Meme Tracking

September 4, 2006

@ 03:42 AM

About six months ago, I wrote a blog post entitled Jubilee Thoughts: Tracking Hot Topics, where I talked about adding meme tracking functionality similar to the features of Memeorandum and TailRank to RSS Bandit. Since then I haven't written a lick of code that actually does this, but I've thought and talked about it a lot. While all I've done is talk, I can't help but notice that a few others have been writing code while I've been pontificating in my blog.

In his blog post entitled Spyder Spots a Memetracker, Nick Bradbury writes:

Andy "Spyder" Herron writes about the "personal memetracker" that's hidden in FeedDemon 2.0.0.25.

I had hoped to complete this feature by now, but as Andy points out, it still needs some work (which is why I hid it and gave it a "beta" label). If you'd like to try it out, select "Popular Topics" from the Browse menu (or just add the "Popular Topics" toolbutton to the toolbar above FeedDemon's browser).
I should add that this feature will probably be useful only to people who subscribe to a lot of feeds since it relies on common links to determine popularity. So if you're not subscribed to feeds which link to the same articles, chances are it won't show you any results.

In another blog post entitled MeMeme, Sam Ruby writes:

Ross Mayfield: Cue up not what is popular, or what the people I subscribed to produced. Cue up what my social network has found interesting.
Herewith, a simple demonstration of what aggressive canonicalization can produce. Venus may be in Python, but suppose I’m in a Ruby mood. The cache is simply files in Atom 1.0 format, with all textual content normalized to XHTML.

Let's make a few simplifying assumptions: all posts are created equal, each post can only vote once for any given link (this also takes care of things like summaries which partially repeat content), posts implicitly vote (once!) for themselves, and the weight of a vote degrades as the square of the distance between when the post was made and now.

Here’s the code, and here’s a snapshot of the output. The output took 6.239 elapsed seconds to produce on my laptop. I still have more work to do to eliminate some of the self-referential links (in fact, I a priori removed Bob Sutor’s blog from the analysis as otherwise he would dominate the results). But I am confident that this is solvable; in fact, I am working on expanding what filters can do. I’ll post more on that shortly.
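To make the rules Sam lists concrete, here is a rough Python sketch of how I read them. It is not his code. The function names, the input shape, and the exact decay formula (1 / (1 + age in days)^2) are my own assumptions; his post only says the weight degrades as the square of the distance between when the post was made and now.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def vote_weight(posted, now, unit=timedelta(days=1)):
    """Weight that falls off with the square of the post's age.

    Assumption: the "1 + age" form is used so a brand-new post gets
    weight 1.0 instead of dividing by zero.
    """
    age = max((now - posted) / unit, 0.0)
    return 1.0 / (1.0 + age) ** 2


def rank_links(posts, now):
    """Score links from posts given as (post_url, posted, links) tuples.

    All posts are treated equally; only their age changes the weight.
    """
    scores = defaultdict(float)
    for post_url, posted, links in posts:
        weight = vote_weight(posted, now)
        # The set gives one vote per link per post; adding post_url
        # is the implicit single vote a post casts for itself.
        for target in set(links) | {post_url}:
            scores[target] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With this sketch, a link mentioned three times in one post still gets a single vote from that post, and a post from yesterday counts a quarter as much as one written right now.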

With both Sam and Nick on the case, I'm quite sure that within the next few months it will be taken for granted that a news aggregator provides personalized meme tracking. Although I'm sure that we'll all use the same set of basic rules for providing this feature, I suspect that the problems we are trying to solve will end up being different, which will influence how each of us implements it.

For example, the main reason I want this feature isn't to track the popular topics across multiple blogs but to find the popular topics within aggregated blog feeds such as blogs.msdn.com and the numerous planet sites. From reading Sam's blog, it seems he'd consider the same feed linking to the same news item several times as spam to filter out, while I consider it to be the only part of the feature I'd use. This illustrates the main problem I've had with designing the feature in my head: what "knobs" or options should we give users to control what the meme tracker treats as interesting and what it ignores when generating the list of 'hot topics'? For instance, the various meme trackers have said they filter out link blogs since they tend to dominate the results.
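One way to picture such knobs is as a small options object passed to the ranking code. The sketch below is purely illustrative: MemeOptions, its fields, and hot_topics are names I made up, not settings that exist in RSS Bandit, FeedDemon, or Venus.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MemeOptions:
    # Hypothetical knobs, invented for this sketch.
    one_vote_per_feed: bool = True  # repeated links within one feed count once
    ignored_feeds: set = field(default_factory=set)  # e.g. link blogs
    min_votes: int = 2  # hide links with fewer distinct voters


def hot_topics(items, options):
    """Rank links from items given as (feed_url, post_url, links) tuples."""
    voters = defaultdict(set)
    for feed_url, post_url, links in items:
        if feed_url in options.ignored_feeds:
            continue
        # The voter is either the whole feed or the individual post.
        voter = feed_url if options.one_vote_per_feed else post_url
        for link in links:
            voters[link].add(voter)
    counts = {
        link: len(who)
        for link, who in voters.items()
        if len(who) >= options.min_votes
    }
    # Most votes first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Leaving one_vote_per_feed on is closer to what I understand Sam wants, where a feed repeating a link is noise. Turning it off lets every post inside an aggregated feed like a planet site vote on its own, which is the case I care about.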

Since I've decided to be more focused with regard to RSS Bandit development, I won't touch this feature until podcasting support is done. However, I'd like to hear thoughts from our users in the meantime.