Jubilee Thoughts: Personalized Meme Tracking

September 4, 2006

@ 03:42 AM

About six months ago, I wrote a blog post entitled Jubilee Thoughts: Tracking Hot Topics, where I talked about adding meme tracking functionality to RSS Bandit, similar to the features of Memeorandum and TailRank. Since then I haven't written a lick of code that actually does this, but I've thought and talked about it a lot. While all I've done is talk, I can't help but notice that a few others have been writing code while I've been pontificating on my blog.

In his blog post entitled Spyder Spots a Memetracker, Nick Bradbury writes:

Andy "Spyder" Herron writes about the "personal memetracker" that's hidden in FeedDemon 2.0.0.25.

I had hoped to complete this feature by now, but as Andy points out, it still needs some work (which is why I hid it and gave it a "beta" label). If you'd like to try it out, select "Popular Topics" from the Browse menu (or just add the "Popular Topics" toolbutton to the toolbar above FeedDemon's browser).
I should add that this feature will probably be useful only to people who subscribe to a lot of feeds since it relies on common links to determine popularity. So if you're not subscribed to feeds which link to the same articles, chances are it won't show you any results.

In another blog post entitled MeMeme, Sam Ruby writes:

Ross Mayfield: Cue up not what is popular, or what the people I subscribed to produced. Cue up what my social network has found interesting.
Herewith, a simple demonstration of what aggressive canonicalization can produce. Venus may be in Python, but suppose I’m in a Ruby mood. The cache is simply files in Atom 1.0 format, with all textual content normalized to XHTML.

Let's make a few simplifying assumptions: all posts are created equal, each post can only vote once for any given link (this also takes care of things like summaries which partially repeat content), posts implicitly vote (once!) for themselves, and the weight of a vote degrades as the square of the distance between when the post was made and now.

Here’s the code, and here’s a snapshot of the output. The output took 6.239 elapsed seconds to produce on my laptop. I still have more work to do to eliminate some of the self-referential links (in fact, I a priori removed Bob Sutor’s blog from the analysis as otherwise he would dominate the results). But I am confident that this is solvable, in fact, I am working on expanding what filters can do. I’ll post more on that shortly.
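For anyone who wants to experiment with the rules Sam lists, here is a rough Python sketch of them. It is not Sam's code (his is linked from his post). The post fields (`url`, `published`, `links`) are names I made up for illustration, and reading "degrades as the square of the distance" as `1 / (1 + age_in_days) ** 2` is my own assumption; the `1 +` is only there to avoid dividing by zero for a brand new post.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rank_links(posts, now):
    """Rank URLs by time-decayed votes.

    posts: iterable of dicts with hypothetical keys
        'url' (str), 'published' (datetime), 'links' (list of str).
    now:   datetime used as the reference point for vote decay.
    """
    scores = defaultdict(float)
    for post in posts:
        # Age in days, clamped so future-dated posts do not get extra weight.
        age_days = max((now - post["published"]).total_seconds() / 86400.0, 0.0)
        # Assumed decay: inverse square of (1 + age in days).
        weight = 1.0 / (1.0 + age_days) ** 2
        # A set gives each post one vote per link, however often it repeats
        # the link; adding the post's own URL is the implicit self-vote.
        targets = set(post["links"]) | {post["url"]}
        for target in targets:
            scores[target] += weight
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    now = datetime(2006, 9, 4, 3, 42)
    posts = [
        {"url": "a", "published": now, "links": ["x", "x"]},
        {"url": "b", "published": now - timedelta(days=1), "links": ["x"]},
    ]
    for url, score in rank_links(posts, now):
        print(url, score)
```

All posts are treated equally here, so any filtering of self-referential or dominating feeds would have to happen before or after this step.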

With both Sam and Nick on the case, I'm quite sure that within the next few months it will be taken for granted that one of the features of a news aggregator is personalized meme tracking. Although I'm sure that we'll all use the same set of basic rules for providing this feature, I suspect that the problems we are trying to solve will end up being different, which will influence how we implement it.

For example, the main reason I want this feature isn't to track what the popular topics are across multiple blogs but instead to find what the popular topics are across aggregated blog feeds such as blogs.msdn.com and the numerous planet sites. In reading Sam's blog, it seems he'd consider the same feed linking to the same news items as spam to filter out, while I consider it to be the only part of the feature I'd use. This issue illustrates the main problem I've had with designing the feature in my head: what "knobs" or options should we give users to control how the meme tracker decides what is interesting and what should be ignored when generating the list of 'hot topics'? (For example, the various meme trackers have said they filter out link blogs since they tend to dominate the results as well.)
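To make the "knobs" question concrete, here is a small Python sketch of two such options. Nothing here exists in RSS Bandit, FeedDemon or Venus; the names `TrackerOptions`, `one_vote_per_feed` and `excluded_feeds` are hypothetical, and the input is assumed to be simple `(feed_id, post_url, links)` tuples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TrackerOptions:
    # True: repeated links from one feed collapse into a single vote
    # (roughly the "treat it as spam" view).
    # False: every post votes, so an aggregated feed can surface its own
    # hot topics (the blogs.msdn.com / planet site case).
    one_vote_per_feed: bool = True
    # Feeds to ignore entirely, e.g. link blogs that dominate results.
    excluded_feeds: frozenset = frozenset()


def hot_topics(posts, options):
    """posts: iterable of (feed_id, post_url, links) tuples (assumed shape)."""
    voters = defaultdict(set)
    for feed_id, post_url, links in posts:
        if feed_id in options.excluded_feeds:
            continue
        # The voter identity decides whether a feed can vote more than once.
        voter = feed_id if options.one_vote_per_feed else post_url
        for link in set(links):
            voters[link].add(voter)
    counts = {link: len(who) for link, who in voters.items()}
    # Most votes first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

With `one_vote_per_feed=False`, two posts from the same aggregated feed that link to the same article count as two votes, which is the behavior I'd want; with the default, they count as one.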

Since I've decided to be more focused with regard to RSS Bandit development, I won't touch this feature until podcasting support is done. However, I'd like to hear thoughts from our users in the meantime.

Categories: RSS Bandit | Social Software | Syndication Technology


Monday, 04 September 2006 03:55:57 (GMT Daylight Time, UTC+01:00)

> In reading Sam's blog it seems he'd consider the same
> feed linking to the same news items as spam to filter
> out while I consider it to be the only part of the feature
> I'd use.

My primary goal here is to move this from the realm of a few people deciding what people want, and into a realm where many can experiment.

Don't take my statement as the result of any sort of deep analysis or ingrained preference on my part; I simply wrote the application in minutes and commented on the results I saw. All of Bob's entries have common closing text; many of Tim's posts point to an index of a sort. Perhaps with a wider scope, this will work itself out; but if the target here is the long tail, it just might not.

More experimentation is called for. And all it takes to get started is an OPML file of subscriptions.

Sam Ruby

Monday, 04 September 2006 04:48:33 (GMT Daylight Time, UTC+01:00)

Sam,
I agree that the more people experimenting with what works and what doesn't, the better. My main problem has been trying to figure out the practical use for this once the "cool factor" runs out.

Tech.Memeorandum is useful because I don't have time to subscribe to all the A-list bloggers. However, for the most part I keep on top of the blogs I do subscribe to, so I tend not to need extra tools to tell me what those blogs are talking about. Also, I assume but cannot verify that most people aren't subscribed to enough feeds to make this functionality interesting for their own feed lists.

Where I do see practical usage [for me] is

1. Being able to aggregate all the "me too" posts about recent Google/Yahoo/Microsoft announcements in one place and kill such threads.

2. Keeping on top of high traffic feeds that I have no time to read (e.g. blogs.msdn.com)

That's the perspective I've been taking with this feature. On the other hand, a planet site would probably have different needs from the ones listed above.

Dare Obasanjo

Monday, 04 September 2006 10:21:29 (GMT Daylight Time, UTC+01:00)

Hmm, "consider the same feed linking to the same news items as spam" - presumably in aggregated feeds like the Planets, you'd check for duplicates/self-references based on the source feeds of entries rather than the aggregate?

The ability to kill a "me too" thread en masse would be very nice to have.

I'm able to keep on top of my subscribed feeds only by keeping their number down arbitrarily. Like the high traffic case, I'd like to have a *lot* more input, only with less effort on my part to avoid overload - the personal meme tracker sounds like it could help a lot.

Danny


Dare Obasanjo's weblog
