November 26, 2007
@ 01:56 PM

My weekend project was to read Dive Into Python and learn enough Python to be able to port Sam Ruby's meme tracker (source code) from CPython to IronPython. Sam's meme tracker shows the most popular links from the past week across the blogs in his RSS subscriptions. A nice wrinkle Sam added to his code is that more recent posts are weighted higher than older posts, so a newly hot item with 3 or 4 links from yesterday ends up ranking higher than an item with 6 to 10 posts about it from five days ago. Using that sort of weighting probably wouldn't have occurred to me if I had just hacked the feature on my own, so I'm glad I spent the time learning Sam's code.
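Here's the weighting idea as a minimal sketch (the variable names and sample numbers are mine; the real logic is in the full script below):

from datetime import datetime, timedelta

ONE_WEEK = timedelta(days=7)

def vote_weight(post_date):
    # A vote decays linearly with the age of the post casting it:
    # 1.0 for a post published right now, down to 0.0 at one week old.
    age = datetime.now() - post_date
    age_secs = age.days * 86400 + age.seconds
    week_secs = ONE_WEEK.days * 86400
    return 1.0 - float(age_secs) / week_secs

# 3 links from yesterday:  3 * vote_weight(1 day old)  ~= 3 * 0.86 ~= 2.57
# 8 links from 5 days ago: 8 * vote_weight(5 days old) ~= 8 * 0.29 ~= 2.29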

There are a few differences between Sam's code and mine, the most significant being that I support two modes: showing the most popular items from all unread posts and showing the most popular items from the past week. The other differences mainly have to do with the input types (Atom entries vs. RSS feeds) and using .NET libraries like System.Xml and System.IO instead of CPython libraries like libxml2 and glob. You can see the difference between both approaches for determining top stories on my feed subscriptions below.

Top Stories in the Past Week

  1. Mobile Web: So Close Yet So Far (score: 1.71595943656)
  2. The Secret Strategies Behind Many "Viral" Videos (score: 1.52423410473)
  3. Live Documents is Powerful Stuff (score: 1.35218082421)

Top Stories in all Unread Posts

  1. OpenSocial (score: 5.0)
  2. The Future of Reading (score: 3.0)
  3. MySpace (score: 3.0) [Ed Note: Mostly related to OpenSocial]

As you can probably tell, the weighted scoring isn't used when determining top stories in all unread posts. I did this to ensure that the results didn't end up being too similar for both approaches. This functionality is definitely going to make its way into RSS Bandit now that I've figured out the basic details of how it should work. As much as I'd like to keep this code in IronPython, I'll probably port it to C# when integrating it for a number of practical reasons including maintainability (Torsten shouldn't have to learn Python as well), performance and better integration into our application.

Working with Python was a joy. I especially loved programming with a REPL. If I had a question about what some code does, it's pretty easy to write a few one or two liners to figure it out. Contrast this with using Web searches, trawling through MSDN documentation or creating a full blown program just to test out some ideas when using C# and Visual Studio. I felt a lot more productive even though all I was using was Emacs and a DOS prompt.
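For example, checking what the and/or idiom in Sam's code actually returns is a ten second session at the prompt instead of a throwaway console project:

>>> link_node = None
>>> link_node and link_node.InnerText or ''
''
>>> " ".join("too   many   spaces".split())
'too many spaces'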

I expected the hardest part of my project to be getting my wife to tolerate me spending most of the weekend hacking code. That turned out not to be a problem because it didn't take as long as I expected and for the most part we did spend the time together (me on the laptop, her reading The Other Boleyn Girl, both of us on the sofa). 

There are at least two things that need some fine tuning. The first is that I get the title of a link from the text of the links used to describe it, and that doesn't lead to very useful link text in over half of the cases. After generating the page, there may need to be a step that goes out to the HTML pages and extracts their title elements for use as link text (see the sketch below). The second problem is that popular sites like Facebook and Twitter tend to show up every once in a while in the list just because people talk about them so much. This seems to happen less than I expected, however, so it may not be a problem in practice.
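That title-fetching step might look something like this rough sketch, using the same .NET libraries as the script below (the regular expression is naive and the error handling minimal):

import re
import clr
clr.AddReference("System")
from System.Net import WebClient

title_regex = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def fetch_title(url):
    # Download the page and pull out the contents of its <title> element,
    # falling back to the URL itself if anything goes wrong.
    try:
        html = WebClient().DownloadString(url)
    except:
        return url
    match = title_regex.search(html)
    return match and match.group(1).strip() or url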

Now I just have to worry about whether to call the button [Show Popular Stories] or [Show Most Linked Stories]. Thoughts?


import time, sys, re, System, System.IO, System.Globalization
from System import *
from System.IO import *
from System.Globalization import DateTimeStyles
import clr
clr.AddReference("System.Xml")
from System.Xml import *

#################################################################
#
# USAGE: ipy memetracker.py <directory-of-rss-feeds> <mode>
# mode = 0 show most popular links in unread items
# mode = 1 show most popular links from items from the past week
#################################################################

all_links = {}
one_week =  TimeSpan(7,0,0,0)

cache_location = r"C:\Documents and Settings\dareo\Local Settings\Application Data\RssBandit\Cache"
href_regex     = r"<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>"
regex          = re.compile(href_regex)

(popular_in_unread, popular_in_past_week) = range(2)
mode = popular_in_past_week

class RssItem:
    """Represents an RSS item"""
    def __init__(self, permalink, title, date, read, outgoing_links):
        self.outgoing_links = outgoing_links
        self.permalink      = permalink
        self.title          = title
        self.date           = date
        self.read           = read

def MakeRssItem(itemnode):
    link_node  = itemnode.SelectSingleNode("link")
    permalink  = link_node and link_node.InnerText or ''
    title_node = itemnode.SelectSingleNode("title")
    title      = title_node and title_node.InnerText or ''
    date_node  = itemnode.SelectSingleNode("pubDate")
    date       = date_node and DateTime.Parse(date_node.InnerText, None, DateTimeStyles.AdjustToUniversal) or DateTime.Now 
    read_node  = itemnode.SelectSingleNode("//@*[local-name() = 'read']")
    read       = read_node and int(read_node.Value) or 0
    desc_node  = itemnode.SelectSingleNode("description")
    # obtain href value and link text pairs
    outgoing   = desc_node and regex.findall(desc_node.InnerText) or []
    outgoing_links = {}
    #ensure we only collect unique href values from entry by replacing list returned by regex with dictionary
    if len(outgoing) > 0:
        for url, linktext in outgoing:
            outgoing_links[url] = linktext
    return RssItem(permalink, title, date, read, outgoing_links)   

if __name__ == "__main__":
    if len(sys.argv) > 1: #get directory of RSS feeds
        cache_location = sys.argv[1]
    if len(sys.argv) > 2: # mode = 0 means use only unread items, mode = 1 means use all items from past week
        mode           = int(sys.argv[2]) and popular_in_past_week or popular_in_unread

    print "Processing items from %s seeking items that are %s" % (cache_location,
                                                                  mode and "popular in items from the past week"
                                                                  or "popular in unread items" )
    #decide what filter function to use depending on mode
    filterFunc = mode and (lambda x : (DateTime.Now - x.date) < one_week) or (lambda x : x.read == 0)
    #in mode = 0 each entry linking to an item counts as a vote, in mode = 1 value of vote depends on item age
    voteFunc   = mode and (lambda x: 1.0 - (DateTime.Now.Ticks - x.date.Ticks) * 1.0 / one_week.Ticks) or (lambda x: 1.0)

    di = DirectoryInfo(cache_location)
    for fi in di.GetFiles("*.xml"):     
        doc = XmlDocument()
        doc.Load(Path.Combine(cache_location, fi.Name))
        # for each item in feed       
        #  1. Get permalink, title, read status and date
        #  2. Get list of outgoing links + link title pairs
        #  3. Convert above to RssItem object
        items = [ MakeRssItem(node) for node in doc.SelectNodes("//item")]
        feedTitle = doc.SelectSingleNode("/rss/channel/title").InnerText
        # apply filter to pick candidate items, then calculate vote for each outgoing url
        for item in filter(filterFunc, items):
            vote = (voteFunc(item), item, feedTitle)
            #add a vote for each of the URLs
            for url in item.outgoing_links.Keys:
                all_links.setdefault(url, []).append(vote)

    # tally the votes; only one vote counts per feed (each feed contributes its lowest-weighted vote for the link)
    weighted_links = []
    for link, votes in all_links.items():
        site = {}
        for weight, item, feedTitle in votes:               
            site[feedTitle] = min(site.get(feedTitle,1), weight)
        weighted_links.append((sum(site.values()), link))
    weighted_links.sort()
    weighted_links.reverse()

    # output the results, choose link text from first item we saw story linked from
    print "<ol>"   
    for weight, link in weighted_links[:10]:
        link_text = (all_links.get(link)[0])[1].outgoing_links.get(link)
        print "<li><a href='%s'>%s</a> (%s)" % (link, link_text, weight)
        print "<p>Seen on:"
        print "<ul>"
        for weight, item, feedTitle in all_links.get(link):
            print "<li>%s: <a href='%s'>%s</a></li>" % (feedTitle, item.permalink, item.title)
        print "</ul></p></li>" 
    print "</ol>"


 

Categories: Programming

I've been reading Dive Into Python for a couple of days and have gotten far enough to tell what the following block of code does if the object variable is an arbitrary Python object

# (from the book's info() function; collapse and spacing are parameters of that function)
methodList = [method for method in dir(object) if callable(getattr(object, method))]
processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
print "\n".join(["%s %s" %
                  (method.ljust(spacing),
                   processFunc(str(getattr(object, method).__doc__)))
                 for method in methodList])

However, I've been having a tougher time than I expected learning Python from Dive Into Python, for a couple of reasons: some to do with how the book is laid out and some to do with picking IronPython as my Python implementation. The main problems I've been having are

  • IronPython doesn't have a number of standard libraries that are used in the book such as os, xml.dom, UserDict, sgmllib and so on. This means I can't construct or run a number of the more advanced samples in the book.
  • The book seems to dive into meaty topics before covering the basics. For example, introspection (aka reflection) and lambda functions (aka anonymous methods) are covered in Chapter 4 while defining classes and importing modules are covered in Chapter 5. Similarly, a discussion on for loops seems to have been removed from the most recent version of the book and replaced with a discussion on list comprehensions, which are definitely superior but more alien to the novice (see the sketch after this list).
  • I tend to learn comparatively. Thus I would love to get a more direct mapping of constructs in C# to those in Python with discussions on Python features that don't exist in C#. This is the tack I took with C# from a Java Developer's Perspective and it seems a lot of people found that useful.
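To illustrate the list comprehension point, here's the kind of side-by-side treatment I have in mind (the C# equivalent is sketched in comments):

# C#:  var shouting = new List<string>();
#      foreach (string name in names) { shouting.Add(name.ToUpper()); }

names = ["sam", "torsten", "dare"]

# the novice-friendly for loop...
shouting = []
for name in names:
    shouting.append(name.upper())

# ...and the list comprehension that does the same thing in one expression
shouting = [name.upper() for name in names]
print shouting   # ['SAM', 'TORSTEN', 'DARE']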

For these reasons, I thought it would be useful to create a set of tutorials for myself that would address some of these issues I've been having. Then I wondered if other C# developers wouldn't find such an article or tutorial useful as well. Are there any C# developers out there that would actually be interested in this or am I an edge case?

PS: I wouldn't expect to get time to actually write, edit and publish such tutorials until the summer of next year at the earliest (which I expect should coincide with the availability of IronPython 2.0).


 

Categories: Programming

I’ve been working on simplifying my life and improving my mental state over the last year or so. I’m now at the stage where I think I’ve gotten into a decent routine with diet and exercise. My next step [now that the rigors of buying the house and planning the wedding are over] is to broaden my programming horizons by learning a programming language radically different from my comfort zone (Python and C# respectively), while not harming my personal life or work habits.

It turns out that I can add an hour or two to my day by (i) leaving home earlier and thus avoiding traffic (ii) reading blogs less (iii) unsubscribing from most of the Microsoft internal mailing lists I was on and (iv) scheduling meetings so they are clumped together instead of having three meetings with 30 minutes in between each one thus burning up an hour of my time mentally twiddling my thumbs and checking email. 

So far I’ve installed IronPython and python-mode. I’ve also started reading Dive Into Python and have gotten as far as Chapter 3. I’d just like to thank folks like Mark Pilgrim, Jim Hugunin and Barry Warsaw who are gifting programmers with such wonderful resources. Right now I’m still trying to wrap my mind around “Everything Is an Object”:

Everything in Python is an object, and almost everything has attributes and methods.

This is so important that I'm going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Even modules are objects.

All functions have a built-in attribute __doc__, which returns the doc string defined in the function's source code. The sys module is an object which has (among other things) an attribute called path. And so forth.
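It’s easy to convince yourself of this at the REPL (a CPython session; output abbreviated):

>>> def greet():
...     """Say hello."""
...     return "hello"
...
>>> greet.__doc__          # functions are objects with attributes
'Say hello.'
>>> import sys
>>> type(sys)              # modules are objects too
<type 'module'>
>>> "hello".upper          # even a string's methods are objects you can pass around
<built-in method upper of str object at ...>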

So far this is already an enjoyable experience for someone who has mostly been programming in JavaScript (dynamic but weakly typed, with prototype-based rather than class-based objects) and C# (statically typed, no REPL) for the past few years.

Once I’m done reading Dive into Python, my plan is to integrate Sam Ruby’s MeMeme 2.0 into RSS Bandit. That way even though I’ve stopped reading blogs regularly, I don’t end up finding out days later that Visual Studio 2008 and .NET Framework 3.5 were released because it wasn’t on TechMeme or programming.reddit.  

Optimizing your life by writing code is fun. I guess this is what they call life hacking. ;)

Now playing: The Clash - Rock The Casbah


 

Categories: Programming

Via Steve Vinoski's Answers for Sergey I stumbled upon Sergey Beryozkin's Questions for Steve, which started off with the following question

1. Do you think client code generation is evil ? If yes, do you expect people to do manual programming on a large scale ?

The problem with the software industry [or should I say humanity?] is that we like to take absolutist positions because they are easier to defend or argue against than admitting that the world is full of shades of gray.

Starting with the first question, code generation isn't necessarily bad, let alone evil. However there are lots of problems with how code generation is implemented by the major vendors that support the SOAP/WSDL/XSD/WS-* family of technologies. Steve does a good job of laying out the problems with these approaches.

The first problem Steve points out is that a lot of these SOAP toolkits implement some form of location transparency which tries to hide as much as possible the differences between invoking a remote system and calling a method on a local object. This behavior even flies in the face of SOA since one of the four tenets of service orientation is that boundaries are explicit. Another problem is that the inflexible and rigid requirements of static typing systems run counter to the distributed and flexible nature of the Web. I posted a practical example a few years ago in my post entitled Why You Should Avoid Using Enumerated Types in XML Web Services. In that example, I pointed out that if you have a SOAP Web service that returns an enumeration with the possible values {CDF, RSS10, RSS20} and in a future release modify that enumeration by adding a new syndication format {CDF, RSS10, RSS20, Atom}, then even if you never return that new syndication format to old clients written in .NET, these clients will still have to be recompiled because of the introduction of a new enumeration value. I find it pretty ridiculous that to this day I have a list of "people we need to call and tell to recompile their code whenever we change an enum value in any of our SOAP Web services". Of course, some of this foolishness can be mitigated in a statically typed system using technologies like Java's dynamic class loading, but that is really just an insecure workaround that tries to hide the fact that what you really need here is a dynamically typed system.

The second question is really asking whether we want developers writing XML processing code by hand instead of having a code generator do this work. Even though I used to work on the XML team at Microsoft, I agree that it is a valid concern: you shouldn't have to spend a lot of effort writing code for parsing and processing XML if that is not the core function of your application. Again, Steve Vinoski hits the nail on the head with the suggestion to use standard data formats and MIME types. For example, if I decide to use the application/atom+xml MIME type for the data that is returned by my RESTful Web service, then clients can choose from a vast array of libraries for processing Atom feeds [such as the Windows RSS platform, ROME, Mark Pilgrim's Universal Feed Parser, etc] without having to write a lot of XML parsing code. If you must provide your own custom formats then it is imperative to make sure that it is easy to consume these formats from any platform by using a consistent and simple data model for the data format. A number of popular Web service APIs like the Flickr API and the Facebook platform have provided client libraries for their APIs, but this should be considered the exception and not the rule. Even in their case, it is interesting to note that a large proportion of the client libraries for these services are not actually maintained or developed by the creators of the service. This highlights the value of utilizing simple data formats and straightforward protocols; that way it isn't actually a massive undertaking for client developers to build and share libraries that abstract away the XML processing code. Of course, all of this can be avoided by just using standard MIME types and data formats that are already supported on a wide array of platforms instead of reinventing the wheel.
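To make that concrete, consuming an application/atom+xml feed with Mark Pilgrim's Universal Feed Parser is a few lines under CPython (feedparser leans on sgmllib, one of the standard modules IronPython currently lacks; the feed URL here is hypothetical):

import feedparser

# feedparser doesn't care whether the service returned RSS or Atom;
# the same entry-level accessors work for both.
feed = feedparser.parse("http://example.org/service/recent.atom")
for entry in feed.entries[:5]:
    print entry.title, "->", entry.link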


 

Categories: XML Web Services

In a comment on my previous post, pwb states

"REST" still suffers mightily from no real guidance on how exactly to construct something REST-ful. A dissertation doesn't cut it.

I guess it depends on what is meant by "real guidance". I can argue that there is no real guidance on how to construct object oriented systems, distributed applications or high performance data storage systems. Whether you agree or disagree with any of those statements depends on how "official" or "canonical" one expects such guidance to be.

If you are interested in building RESTful Web services, here are the top 3 ways I suggest one learn about building and designing a RESTful system.

  1. Learn the Atom Publishing Protocol: The best way to figure out how to build a RESTful Web service is to actually use a well designed one, and none is a better example of what a RESTful Web service should be than the Atom Publishing Protocol. For extra credit, you should also read up on Google's GData protocol and come up with a list of pros and cons of the approaches they've taken to extending AtomPub.

  2. Read Joe Gregorio's "Restful Web" column on XML.com: Joe Gregorio is one of the editors of RFC 5023 (the Atom Publishing Protocol) and is currently employed at Google working to make GData even more Web friendly. He started a series of articles on XML.com on building and designing RESTful Web services complete with code samples. The entire list of articles can be found here, but if you don't have the time to read them all, I suggest starting with How to Create a REST Protocol, which covers the four decisions you must make as part of the design process for your service

    • What are the URIs? [Ed note - This is actually "What are the resources?"]
    • What's the format?
    • What methods are supported at each URI?
    • What status codes could be returned?

    If you have more time I suggest following that article up with Constructing or Traversing URIs? which contrasts client access models based on URI construction [where clients have knowledge of the URI structure of your service baked into them] or URI traversal [where clients discover resources by following links either within the primary resources your service returns or via a service document that describes your service's end points]. Just because WSDL is a disaster doesn't mean that interface definition languages aren't useful or that they aren't needed in RESTful applications. After all, even Atom has service documents.

    And finally, because I don't believe you get to design a system without being familiar with what the code looks like and should do, I'd suggest reading Dispatching in a REST Protocol Application, which walks through what the server side code for a particular RESTful service looks like. He even throws in some performance optimizations at the end. (A toy sketch in the same spirit follows this list.)

    Of course, I know you're actually going to read all the articles in the series because you're a dutiful lil' Web head and not a shortcut seeker. Right? :)

  3. Buy "RESTful Web Services" by Leonard Richardson, Sam Ruby, and David Heinemeier Hansson:  If you are the kind of person who prefers to have a book than learning from "a bunch of free articles on the Internet" then RESTful Web Services is for you. I haven't read the book but have seen it on the desks of more co-workers than I'd care to admit and each of them favorably recommended it.  Sam Ruby, Leonard Richardson and DHH know their stuff so you'll be learning at the feet of gurus. 

To me, this seems like an abundance of resources for learning about building RESTful Web services. Now I understand that there are some for whom, until it shows up on MSDN, IBM developerWorks or in a Gartner report, it might as well not exist. To these people all I can say is "It must be great to have all that free time now that you have outsourced your thinking and business critical analysis to someone else". :)


 

Categories: XML Web Services

Recently someone asked in the comments to one of my posts why I seem to be down on the WS-* family of technologies (XSD, WSDL, SOAP, etc) when just a few years ago I worked on Microsoft’s implementations of some of these technologies and used to blog about them extensively.

I was composing a response when I stumbled on James Snell’s notes on the recent QCon conference that capture the spirit of my “conversion”, if you want to call it that. He wrote

Those who are familiar with my history with IBM should know that I was once a *major* proponent of the WS-* approach. I was one of the original members of the IBM Emerging Technologies Toolkit team, I wrote so many articles on the subject during my first year with IBM that I was able to pay a down payment on my house without touching a dime of savings or regular paycheck, and I was involved in most of the internal efforts to design and prototype nearly all of the WS-* specifications. However, over the last two years I haven’t written a single line of code that has anything to do with WS-*. The reason for this change is simple: when I was working on WS-*, I never once worked on an application that solved a real business need. Everything I wrote back then were demos. Now that I’m working for IBM’s WebAhead group, building and supporting applications that are being used by tens of thousands of my fellow IBMers, I haven’t come across a single use case where WS-* would be a suitable fit. In contrast, during that same period of time, I’ve implemented no fewer than 10 Atom Publishing Protocol implementations, have helped a number of IBM products implement Atom and Atompub support, published thousands of Atom feeds within the firewall, etc. In every application we’re working on, there is an obvious need to apply the fundamental principles of the REST architectural style. The applications I build today are fundamentally based on HTTP, XML, Atom, JSON and XHTML.

My movement from being a WS-* advocate to embracing RESTful Web services is based on my experiences first as someone who worked on the fundamental building blocks of these technologies and then as someone who became a user of them when I moved to MSN Windows Live. The seeds were probably sown when I found myself writing code to convert Microsoft’s GetTopDownloads Web service to an RSS feed because the SOAP Web service was more complicated to deal with and less useful than an RSS feed. Later on I realized that RSS was the quintessential RESTful Web service, and just asking people “How many RSS feeds does Microsoft produce?” versus “How many SOAP endpoints does Microsoft expose?” is illuminating in itself.

Since then we’ve reached a world where thousands of applications being utilized by millions of end users are built on RESTful Web services on the public internet. My favorite example of the moment is the Facebook developer platform and before that it was Flickr and Amazon S3. Compare that with the number of SOAP and WS-* interfaces that are being used to build real developer platforms that benefit end users on the Web today.

Earlier today, I was contemplating Don Box’s recent post where he complained about the diversity of authentication schemes of the various “J. Random Facebook/Flickr/GData” RESTful Web services on the Web today. Don seems to hint that WS-Security/WS-Trust would somehow solve this problem, which is rightfully debunked by Sam Ruby, who points out that all those technologies do is give you a more complicated version of the extensible authentication story that is available in HTTP. So the only real issue here is that there are actually enough RESTful Web services on the Internet for Don Box to complain about the diversity that comes from having a flexible authentication model for Web services. On the other hand, there are so few useful public WS-* Web services on the Web (read: zero) that Don Box hasn’t encountered the same problem with WS-Security/WS-Trust since no one is actually using them.

At this point I realize I’m flogging a dead horse. The folks I know from across the industry who have to build large scale Web services on the Web today at Google, Yahoo!, Facebook, Windows Live, Amazon, etc are using RESTful Web services. The only times I encounter someone with good things to say about WS-* is if it is their job to pimp these technologies or they have already “invested” in WS-* and want to defend that investment.

At the end of the day, my job is to enable successful developer platforms that enrich our users’ lives, not pimp a particular technology. So if you are one of my readers and were wondering what was behind my shift from thinking that WS-* related technologies were the cat’s pajamas to my current RESTful Web services/APP/GData/Web3S bent, now you know.

Enjoy.

Now playing: Brooke Valentine - Girlfight (feat. Big Boi & Lil Jon)


 

Categories: XML Web Services

November 14, 2007
@ 01:57 PM

I’ve finally gotten around to uploading a couple of pictures from our wedding day in Las Vegas.  Below are a few of the pictures I liked the most. Click on them to see more pics.


Wedding Favors

Guess Where?

Bride, Maid of Honor and Mom

Groom, Best Man and the Non-Conformist

It was a really nice day

Holding hands

The photographers took several hundred pictures and we’ve sifted through less than half of them. Since it’s taken us this long just to pick out these two dozen or so pictures, I thought that if we waited much longer I’d be posting the wedding pics around our first or second anniversary. :)

Now playing: Jay-Z - What More Can I Say


 

Categories: Personal

Although the choice between WS-* and REST when deciding to build services on the Web seems like a foregone conclusion, there seem to be one or two arguments on the WS-* side that refuse to die. You can find them in the notes on the talk by Sanjiva Weerawarana at QCon, WS-* vs. REST: Mashing up the Truth from Facts, Myths and Lies

  • history: why were WS created? people were doing XML over HTTP in 1998/1999
  • everyone invented their own way to do security, reliability, transactions, … (e.g. RosettaNet, ebXML)
  • Biggest criticism of SOAP in 2000: lack of security
  • REST-* is on its way - ARGH!

Today you can find other members of the Web Services community echoing some of Sanjiva’s points. You have Don Box in his blog post entitled Yes Steve, I've Tried saying

I wouldn't call myself an advocate for any specific technology (ducks), but I've spent a lot of time doing HTTP stuff, including a recent tour of duty to help out on our .NET 3.5 support for REST in WCF.

I have to say that the authentication story blows chunks.

Having to hand-roll yet another “negotiate session key/sign URL” library for J. Random Facebook/Flickr/GData clone doesn't scale. 

and even Sam Ruby adds his voice in agreement with his post Out of the Frying Pan where he writes

I’d suggest that the root problem here has nothing to do with HTTP or SOAP, but rather that the owners and operators of properties such as Facebook, Flickr, and GData have vested interests that need to be considered.

For once I have to agree with Sanjiva and disagree with Sam and Don. The folks at Google, Yahoo! and a bunch of the other Silicon Valley startups realize that having umpteen different application interfaces, authentication and authorization stories is a pain for developers building mashups, widgets and full blown Web applications. The answer isn’t, as Don argues, that we all jump on WS-*, or, as Sam suggests, that Web companies have a vested interest in keeping the situation fragmented so we have to live with it.

In fact, we are already on the road to REST-* as a way to address this problem. What happens when you put together AtomPub/GData, OpenID, OAuth and OpenSocial? Sounds a lot like the same sort of vision Microsoft was pitching earlier in the decade, except this time it is built on a sturdier foundation [not crap like SOAP, WSDL and XSD] and is being worked on collaboratively by members of the Web community instead of a bunch of middleware vendors.

It’s unsurprising that Don and Sam don’t realize this is occurring given that their employers (Microsoft and IBM respectively) are out of the loop on most of this evolution, which is primarily being driven by Google and its coalition of the willing. Then again, it does remind me of how IBM and Microsoft pulled the same thing on the industry with WS-*. I guess turnabout is fair play. ;)

Now playing: D12 - American Psycho


 

I've been pondering the implications of Facebook's SocialAds announcement and it has created some interesting lines of thought. The moment the pin dropped was when Dave Winer linked to one of his old posts that contains the following money quote

that's when the whole idea of advertising will go poof, will disappear. If it's perfectly targeted, it isn't advertising, it's information. Information is welcome, advertising is offensive.

If you think about it, the reason Google makes so much money from search advertising is because the ads are particularly relevant when a user is seeking information or a trusted opinion as part of the process of making a commercial decision. If I'm searching for "iPod" or "car insurance" then it is quite likely that ads selling me these products are relevant to my search and are actually considered to be information instead of intrusive advertising.

Where Google's model breaks down is that a large amount of the advertising out there is intended to make you want to buy crap that you weren't even interested in until you saw the ads. In addition, trusted recommendations are a powerful way to convince customers to make purchases they were otherwise not considering. Former Amazon employee Greg Linden has written blog posts indicating that 20% - 35% of Amazon's sales come from recommendations like "people who like 50 Cent also like G-Unit". Given that Amazon made over 10 billion dollars in revenue last year (see financials), this means that $2 billion to $3.5 billion of that revenue is based on what Facebook is calling "social" ads.

So what does all this have to do with the title of my blog post? Glad you asked. Recently Yaron and I were chatting about the virtues of the Facebook platform. He argued that the fact that applications are encouraged to keep their data within their own silos (e.g. Flixster isn't supposed to be mucking with my iLike data and vice versa) prevents everyone [including Facebook] from benefiting from all this profile data being created from alternate sources. I argued that, having seen the complexities introduced by multiple applications being able to write to the same data store (e.g. the Windows registry), it's a lot better for users and app developers if they don't have to worry that some half baked app written by some drunken college kid is going to hose their Scrabulous scores or corrupt all their movie ratings.

However what this means is that some of the juiciest data to serve "social" ads against within Facebook (i.e. movies and music) is not in Facebook's databases but in the databases of the developers of Facebook applications like Slide, iLike and Flixster. Consider the following entry that shows up in my friends' news feeds after I performed an action in iLike:

This entry could be improved with "social" ads in a way that is informative and valuable to my friends while also providing financial value to the application developer. For instance, would you consider the following changes to that entry to be advertising or information?

Flixster does an even worse job than iLike of making the actions it shows in my news feed useful and monetizable. Here's the kind of stuff that shows up in my news feed from Flixster:

I don't know about you but I consider this spam. In fact, it is also misleading since what it really means is that someone on my friends list (Steve Gordon) has also installed the Flixster application on their profile. However, what if the application actually published some of my movie ratings into the news feed with more context, such as:

People keep asking how Facebook application developers will make money. From where I'm sitting, this looks like a freaking gold mine. The problem seems to be that these applications either haven't yet figured out how lucrative a position they're in or are still in the audience acquisition phase until they flip to the highest bidder.

If Mark Zuckerberg has any strategic bone in his body, he'd snap up these companies before a hostile competitor like Google or Fox Interactive Media does. I'd put money on it that people are slowly realizing this all over Silicon Valley.

What do you think?


 

When I first saw the Meebo Me widget, I thought it was one of the coolest things I’d ever seen on the Web. I immediately went to chat with some folks on our team and the response was that they were already way ahead of me. After a bunch of hard work, I’m glad to say that you can now embed the world’s most popular IM client into any Web page [including your blog or favorite social networking site] and let anyone who’s visiting that page chat with you while you’re online.

More details can be found in Casey’s post on the Windows Live Messenger team’s blog entitled Who wants IMs from the web? I do! I do! where she writes

The Windows Live™ Messenger IM Control lets people on the Web reach you in Messenger by showing your Messenger status on your web site, blog, or social networking profile. The Windows Live™ Messenger IM Control runs in the browser and lets site visitors message you without installing Messenger first. The IM Control is supported in IE6, IE7, and Firefox 2.0 on Windows and Firefox 2.0 on Mac OS. The IM Control is supported in 32 languages.

This is a nice addition to the IM button functionality announced in Ali's post.  An important difference between the two is that the new Windows Live™ Messenger IM Control allows people to send you IMs without installing Windows Live™ Messenger, and the IM button requires that they have it installed and are logged in.

I’ve already thrown it up on my Windows Live Space at http://carnage4life.spaces.live.com so anyone who wants to chat with me in real time can holla at me without having to install any bits. I expect it won’t be long before someone figures out how to port it to the Facebook platform, which is something I’d love to see. I’d do it myself but I have RSS Bandit feature planning to work on in my free time. :)

To prevent IM spam (aka SPIM), there is a Human Interactive Proof (HIP) challenge before a conversation can be initiated from the Web. For users concerned about privacy and wondering if anyone can just copy & paste some HTML, change some values and then spam you from the Web…rest assured this has been considered. In order for your online presence to be detected or IM conversations begun from the Web, you first have to turn on this feature. Safe defaults and making sure our users are always in control of their Web experience are key.

So what are you waiting for? Come over and say hello.    

Now playing: Jodeci - Come & Talk To Me


 

Categories: Windows Live