Jubilee Thoughts: More on Personalized Meme Tracking

September 5, 2006

@ 08:58 PM

Gabe Rivera, author of Techmeme, has a blog post entitled Why I don't offer a personal filter where he writes:

I'm facing another round of inquiries on personal filtering, mostly from Techmeme fans who've read Ross Mayfield's or Dare Obasanjo's recent thoughts on the matter. (Just for the record, the first round included requests from Jeff Clavier and Ted Leung nearly a year ago!)

Why don't I offer a personal filter service aka "meMeme" aka "my.memeorandum"? Briefly, filters based on the editorial approach used for Techmeme/memeorandum don't work well outside of a few topic domains (like politics and tech), because cross linking is typically too sparse to produce a compelling mix of news. Sam Ruby unintentionally confirmed this yesterday should you pause to consider what sort of daily news selection could be derived from his Venus output. While it's true that cross linking is dense in some blogospheres, these are largely the same domains already covered by my existing sites.

Why not try editorial approaches based on new kinds of semantic analyses? My belief is that the requisite technology is harder than anything powering Google News, Topix, or my current sites. Attempts based on current technologies come up woefully short, with the resulting "Daily Me" consisting of a seemingly random mix of content missing most or all "must have" articles and posts. And having the "must haves" is essential for winning the earlier adopter types that would dominate the userbase of such a filter in the first place.

I reread the output from Sam's blogroll and it reminded me that there is a difference between the scenarios that sites like Techmeme and Tailrank are interested in and the goals of a personalized meme tracker. Here are a couple of questions that illustrate how the implementation choices for a personalized meme tracker differ from those for a topic-specific meme tracker:

Q: How do you deal with "noise" links such as http://del.icio.us/tag/rest or http://www.technorati.com/tag/AJAX which may be common in the feeds the user is interested in?

A: In both cases, it would seem the first step is to hard code the application to understand certain kinds of links as "noise". The interesting question is how to deal with the introduction of new types of "noise" links to the ecosystem. A web-based application may be easily updated as new "noisy" links enter the system but things are a bit more difficult for a desktop application. Perhaps allowing users to nominate certain classes of links as noise?
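
A minimal sketch of that idea in Python, assuming noise is recognized by a host plus a path prefix. The rule format, the `BUILTIN_NOISE` list, and the `is_noise` function are hypothetical names chosen for illustration; the user-nominated rules are simply extra entries passed in alongside the hard-coded ones.

```python
from urllib.parse import urlparse

# Hypothetical hard-coded rules: (host, path prefix) pairs treated as noise.
BUILTIN_NOISE = [
    ("del.icio.us", "/tag/"),
    ("technorati.com", "/tag/"),
]


def is_noise(url, user_rules=()):
    """Return True if url matches a built-in or user-nominated noise rule."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com alike
    for rule_host, rule_prefix in list(BUILTIN_NOISE) + list(user_rules):
        host_matches = host == rule_host or host.endswith("." + rule_host)
        if host_matches and parsed.path.startswith(rule_prefix):
            return True
    return False
```

A desktop aggregator could persist the user's nominated rules and pass them as `user_rules`, so new classes of noise can be handled without shipping an update.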

Q: What 'class' of news items or blog posts should be used in evaluating what is [currently] popular?

A: It is quite obvious that simply using the entirety of the posts from a particular feed to calculate a link's popularity is flawed. Using that metric, I suspect that links such as http://adaptivepath.com/publications/essays/archives/000385.php or http://scobleizer.wordpress.com/2006/06/10/correcting-the-record-about-microsoft/ would always be the most popular links from my blogroll. Restricting the calculation to a specific date or time range (e.g. the past 24-48 hours) seems to be what sites such as Techmeme and Tailrank do. An aggregator such as RSS Bandit or FeedDemon may use other techniques, such as only using 'unread' items to calculate currently popular topics.
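
Both options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each item is a dict with hypothetical keys `feed`, `published`, `links`, and `read`, and a link's popularity is taken to be the number of distinct feeds that linked to it inside the window. It is not how Techmeme, Tailrank, RSS Bandit, or FeedDemon actually compute popularity.

```python
from collections import defaultdict
from datetime import timedelta


def popular_links(items, now, window_hours=48, unread_only=False):
    """Rank links by how many distinct feeds cite them in recent items.

    items: iterable of dicts with hypothetical keys
           'feed', 'published' (datetime), 'links' (list of URLs), 'read' (bool).
    """
    cutoff = now - timedelta(hours=window_hours)
    feeds_by_link = defaultdict(set)
    for item in items:
        if item["published"] < cutoff:
            continue  # too old to count as "currently" popular
        if unread_only and item.get("read", False):
            continue  # aggregator-style option: ignore items already read
        for link in item["links"]:
            # A set means one feed counts once per link, however often it repeats it.
            feeds_by_link[link].add(item["feed"])
    # Most-cited first; ties broken by URL so the order is deterministic.
    ranked = sorted(feeds_by_link.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(link, len(feeds)) for link, feeds in ranked]
```

Counting feeds rather than raw mentions keeps a single prolific blogger from pushing a link to the top on their own.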

Q: How do you deal with link blogs?

A: A number of people in my blogroll have blog posts that are basically a repost of all the links they have posted to del.icio.us that day (e.g. Stephen O'Grady and Mark Baker). Sites like Techmeme and Tailrank filter these posts because no one wants to see a bunch of headlines that are all of the form 'links for 2006-09-05' with no real content. On the other hand, if a large number of folks in my blogroll are linking to a particular news item then it is likely to be interesting to me regardless of whether there are 'meaty' blog posts behind their links or just linkblog style postings.
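
One way to reconcile the two points is to let linkblog posts vote for a link while never showing them as headlines. The Python sketch below assumes linkblog posts can be recognized by a title of the form 'links for YYYY-MM-DD'; the post dict keys (`feed`, `title`, `links`) and both function names are hypothetical.

```python
import re

# Assumption: linkblog posts are titled like 'links for 2006-09-05'.
LINKBLOG_TITLE = re.compile(r"^links for \d{4}-\d{2}-\d{2}$", re.IGNORECASE)


def is_linkblog_post(title):
    """Return True if the title looks like a daily del.icio.us link dump."""
    return bool(LINKBLOG_TITLE.match(title.strip()))


def rank_with_linkblogs(posts):
    """Count every feed's link as a vote, but keep linkblog titles out of headlines.

    posts: list of dicts with hypothetical keys 'feed', 'title', 'links'.
    Returns a list of (link, vote_count, headline_titles), most votes first.
    """
    votes = {}
    headlines = {}
    for post in posts:
        for link in post["links"]:
            votes.setdefault(link, set()).add(post["feed"])
            if not is_linkblog_post(post["title"]):
                headlines.setdefault(link, []).append(post["title"])
    ranked = sorted(votes, key=lambda link: (-len(votes[link]), link))
    return [(link, len(votes[link]), headlines.get(link, [])) for link in ranked]
```

A link that only linkblogs mention still ranks, with an empty headline list, so the application can fall back to the target page's own title when displaying it.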

These are a couple of the questions that I've been pondering since I started thinking about this feature a couple of months ago. At the end of the day, I think that although Gabe's perspective is useful, since he did build the site that inspired this thinking, the scenarios are different enough to change some of the implementation choices in ways that may seem surprising to some.

PS: It seems Sam has already turned Gabe's feedback into code based on reading his blog post MeMeme 2.0. There are definitely interesting times ahead.