Monday, 01 October 2007 - Dare Obasanjo's weblog

September 20, 2007

@ 04:00 AM

RSS Bandit v1.5.0.17 Installer Refreshed

I'd like to say big "Thank You" to all the folks who tried out the recent release of RSS Bandit. Based on feedback from the RSS Bandit forums, it looks like so far the release was solid except for two regressons and an oversight on my part.

These issues have been fixed and we just uploaded an updated installer which can be obtained at RssBandit1.5.0.17b_installer.zip.

The issues fixed by this refresh are listed below

Annoying dialog box pops up instead of yellow warning bar when ActiveX warning is displayed. (bug 1796850)
Difference between unread count on MyFeeds node and "Unread Items" folder, (bug 1796849)
Folder hierarchy not uploaded when uploading feed list to Newsgator Online
Feed upload to Newsgator Online stops with “InvalidUrl error” if a URL that cannot be processed by Newsgator Online is encountered (e.g. intranet URLs, local file system URLs, etc).

I’m disconnecting from the grid starting tomorrow morning and shouldn’t be back online until October 1^st. Try not to get into too much trouble while I’m gone. Wink

Now playing: G-Unit - I Wanna Get to Know You

Categories: RSS Bandit

September 19, 2007

@ 04:00 AM

Comments [0]

Microsoft REST APIs: Astoria = Web3S + Atom - RDF

I’ve mentioned in previous posts that various folks at Microsoft have come to grips with the fact that RESTful Web services are the best way to expose data sources on the Web. One problem I’ve voiced is that we may forget that REST is about uniform interfaces and end up with half a dozen different Microsoft protocols for doing essentially the same thing. This seemed to be the case when you consider Project Astoria and Web3S, both of which are designed for creating, retrieving, updating and deleting relational or not so relational data via a uniform interface over the Web. I’ve written about both projects in the past, see Google Base Data API vs. Astoria: Two Approaches to SQL-like Queries in a RESTful Protocol and Web3S: A RESTful Protocol for Accessing Windows Live Services if you’d like an overview of both technologies.

However, thanks to enterprising folks like Yaron and Pablo on both sides there is a much better story coming from Microsoft with regards to RESTful protocols which should please even the Atom contingent. The details are in Pablo’s post Astoria Design: payload formats which contains lots of juicy nuggets.

Let’s begin.

Pablo writes

The goal of Astoria is to make data available to loosely coupled systems for querying and manipulation. In order to do that we need to use protocols that define the interaction model between the producer and the consumer of that data, and of course we have to serialize the data in some form that all the involved parties understand. So protocols and formats are an important topic in our design process.
…
Multiple formats, one protocol (almost)

For the most part there is a single “protocol”, and by that I mean the set of HTTP headers for requests and responses, as well as the overall interaction model. In certain cases in order to make a format really look natural to clients we do need to introduce a format-specific protocol element, but we try to keep those to a minimum.

Also, any added protocol elements on top of HTTP need to be done so that unsophisticated agents can ignore a lot of that, do “plain HTTP” and still get by for the most part.
…
Now, with the almost-single protocol in place, the question comes to which formats should we do. Right now we’re thinking ATOM/APP, Web3S, and JSON. We need to define the basic requirements for any format used in Astoria, and then map those to each format we want to support. That’s what comes next.

What this means in practice is that Astoria defines the protocol semantics while Web3S will define the data format specific semantics. Even more interesting is that services that utilize Astoria will be able to take advantage of any client applications or libraries that support the Atom Publishing Protocol as long as they aren’t in reality tied to a proprietary implementation of APP such as GData (e.g. Windows Live Writer).

Pablo’s post goes on to talk about the data model used by Astoria and how it is mapped to Atom, JSON and Web3S respectively. He also calls for feedback from the community, so if you are interested in Microsoft’s implementation of RESTful protocols either as a developer customer or an interested observer…let Pablo know in the comments to his blog. There are lots of folks at Microsoft who’d love to hear what y’all have to say.

Before I forget, Pablo did have this to day about their RDF support.

What happened with RDF?

The May 2007 CTP also included support for RDF. While we got positive comments about the fact we supported it, we didn’t see any early user actually using it and we haven’t seen a particular popular scenario where RDF was a must-have. So we are thinking that we may not include RDF as a format in the first release of Astoria, and focus on the other 3 formats (which are already a bunch from the development/testing perspective).

My personal take is that while I understand how RDF fits in the picture of the semantic web and related tools, the semantic web goes well beyond a particular format. The point is to have well-defined, derivable semantics from services. I believe that Astoria does this independently of the format being used.

For some reason, I'm not surprised about this decision. I do wonder if dropping RDF will actually bring to light some closet RDF supporters who'd love to see supported in Astoria?

Now playing: N.W.A. - Appetite For Destruction

Categories: Windows Live | XML Web Services

September 19, 2007

@ 04:00 AM

Comments [3]

How Facebook Makes Identity Theft Easier

Last month there was a press release published by Sophos, an IT security company, with the tantalzing title Sophos Facebook ID probe shows 41% of users happy to reveal all to potential identity thieves which reports the following

The Sophos Facebook ID Probe involved creating a fabricated Facebook profile before sending out friend requests* to individuals chosen at random from across the globe.
...
Sophos Facebook ID Probe findings:

87 of the 200 Facebook users contacted responded to Freddi, with 82 leaking personal information (41% of those approached)

72% of respondents divulged one or more email address

84% of respondents listed their full date of birth

87% of respondents provided details about their education or workplace

78% of respondents listed their current address or location

23% of respondents listed their current phone number

26% of respondents provided their instant messaging screenname

In the majority of cases, Freddi was able to gain access to respondents' photos of family and friends, information about likes/dislikes, hobbies, employer details and other personal facts. In addition, many users also disclosed the names of their spouses or partners, several included their complete résumés, while one user even divulged his mother's maiden name - information often requested by websites in order to retrieve account details.

This is another example of how Facebook needs to be better at managing multiple social contexts. Right now, there is no way for me to alter my privacy settings to prevent people who I’ve added to my “friends list” from seeing my personal information. The thing is my “friends list” comprised of more than just friends. It is comprised of co-workers, people who work at the same company, people I went to high school with, and close personal friends. There’s also the category of “people who read my blog or use RSS Bandit” that I generally tend to decline friend requests from. I don’t mind some of these people being able to access my personal information (e.g. cell phone number, email address, birthday, etc) but clearly I also don’t want every random person who reads my blog that wants to be my “friend” on Facebook to have access to this information.

Is there a better way to do this? Below are screenshots of the permissions model we came up with for Profiles on MSN Spaces when I worked on the feature juxtaposed with the Profile permissions options on Facebook.

Facebook

Windows Live Spaces
Profile privacy settings on Windows Live Spaces

Straightforward isn’t it? I suspect that the problem here is that the folks at Facebook are refusing to acknowledge that their user base is changing now that they’ve opened up. As danah boyd writes in her post SNS visibility norms (a response to Scoble)

Facebook differentiated itself by being private, often irritatingly so. Hell, in the beginning Harvard kids couldn't interact with their friends at Yale, but that quickly changed. Teens and their parents worship Facebook for its privacy structures, often not realizing that joining the "Los Angeles" network is not exactly private. For college students and high school students, the school and location network are really meaningful and totally viable structural boundaries for sociability. Yet, the 25+ crowd doesn't really live in the same network boundaries. I'm constantly shifting between LA and SF as my city network. When I interview teens, 80%+ of their FB network is from their high school. Only 8% of my network is from Berkeley and the largest network (San Francisco) only comprises 17% of my network. Networks don't work for highly-mobile 25+ crowd because they don't live in pre-defined networks. (For once, I'm an example!)
...
I don't really understand why Facebook decided to make public search opt-out. OK, I do get it, but I don't like it. Those who want to be PUBLIC are more likely to change settings than those who chose Facebook for its perceived privacy. Why did Facebook go from default-to-privacy-protection to default-to-exposure? I guess I know the answer to this... it's all about philosophy.

The first excerpt illustrates the point well. Facebook worked well as a social tool in the rigid social contexts of high school and college but completely breaks down when you’re all grown up. Of course, the Facebook folks know this is an issue for some of their users. However it may be a “problem” that they consider to be By Design and not a bug.

The second excerpt is there because I’m surprised that danah is unsure about why Facebook profiles will now appear in search results. There are a lot of people for whom their social network profile is their primary or only online presence. Even for me, besides my blog(s), my Facebook profile is the only online identity Web which I keep updated regularly. It totally makes sense for Facebook to capitalize on this by making it so that everytime you search for a person whose primary presence is on their site, you get an ad to join their service [since only the fact that the person has a Facebook profile is exposed]. In addition, if you want to contact the person directly, you’re a lot better off joining Facebook and sending the person a private message than posting a comment on their blog [if they have one] or hoping that they’ve exposed their email address somewhere on the Web that isn’t their profile.

Update: The ability to expose a Limited Profile does render moot a lot of the points I just raised above. However making it a separate option from the privacy settings for the profile and incorrectly stating that your friends can always see your contact information makes it less likely to be used by users who are concerned about their privacy. Another example of a design flaw that is likely considered to be By Design according to the Facebook team.

Now playing: Metallica - The Unforgiven

Categories: Competitors/Web Companies | Social Software

September 18, 2007

@ 04:00 AM

Comments [1]

Message Queues, Denormalization and Scalability

There have been a number of recent posts on message queues and publish/subscribe systems in a couple of the blogs I read. This has been rather fortuitous since I recently started taking a look at the message queuing and publish/subscribe infrastructure that underlies some of our services at Windows Live. It’s been interesting just comparing how different my perspective and those of the folks I’ve been working with are from the SOA/REST blogging set.

Here are three blog posts that caught my eye all around the same time with relevant bits excerpted.

Tim Bray in his post Atom and Pushing and Pulling and Buffering writes

Start with Mike Herrick’s Pub/Sub vs. Atom & AtomPub?, which has lots of useful links. I thought Bill de hÓra’s Rates of decay: XMPP Push, HTTP Pull was especially interesting on the “Push” side of the equation. ¶

But they’re both missing an important point, I think. There are two problems with Push that don’t arise in Pull: First, what happens when the other end gets overrun, and you don’t want to lose things? And second, what happens when all of a sudden you have a huge number of clients wanting to be pushed to? This is important, because Push is what you’d like in a lot of apps. So it doesn’t seem that important to me whether you push with XMPP or something else.

The obvious solution is a buffer, which would probably take the form of a message queue, for example AMQP.

The problems Tim points out are a restating of the same problems you encounter in the Push or HTTP-based polling model favored by RSS and Atom; First, what happens when the client goes down and doesn’t poll the server for a while but you don’t want to lose things? Secondly, what happens when a large number of clients are pilling and server can’t handle the rate of requests?

The same way there are answers to these questions in the Pull model, so also are there answers in the Push model. The answer to the Tim’s first question is for the server to use a message queue to persist messages that haven’t been delivered and also note the error rate of the subscriber so it can determine whether to quit or wait longer before retrying. The are no magic answers for the second problem. In a Push model, the server gets more of a chance of staying alive because it controls the rate at which it delivers messages. It may deliver them later than expected but it can still do so. In the Pull model, the server just gets overwhelmed.

In a response to Tim’s post, Bill de hÓra writes in his post entitled XMPP matters

I'd take XMPP as a messaging backbone over AMQP, which Tim mentions because he's thinking about buffering message backlog, something that AMQP calls out specifically. And as he said - "I’ve been expecting message-queuing to heat up since late 2003." AMQP is a complicated, single purpose protocol, and history suggests that simple general purpose protocols get bent to fit, and win out. Tim knows this trend. So here's my current thinking, which hasn't changed in a long while - long haul, over-Internet MQ will happen over XMPP or Atompub. That said, AMQP is probably better than the WS-* canon of RM specs; and you can probably bind it to XMPP or adopt its best ideas. Plus, I'm all for using bugtrackers :)

So I think XMPP matters insofar as these issues are touched on, either in the core design or in supporting XEPs, or reading protocol tea leaves. XMPP's potential is huge; I suspect it will crush everything in sight just like HTTP did.

What I’ve found interesting about these posts is the focus on the protocol used by the publish/subscribe or message queue system instead of the features and scenarios. Even though I’m somewhat of an XML protocol geek, when it comes to this space I focus on features. It’s like source control, people don’t argue about the relative merits of the CVS, Subversion or Git wire protocols. They argue about the actual features and usage models of these systems.

In this case, message buffering or message queuing is one of the most important features I want in a publish/subscribe system. Of course, I’m probably cheating a little here because sometimes I think message queues are pretty important even if you aren’t really publishing or subscribing. For example, the ultimate architecture is one where you have a few hyper denormalized objects in memory which are updated quickly and when updated you place the database write operations in a message queue [to preserve order, etc] to be performed asynchronously or during off peak hours to smooth out the spikes in activity graphs.

Stefan Tilkov has a rather dreadful post entitled Scaling Messaging contains the following excerpt

Patrick may well be right — I don’t know enough about either AMQP or XMPP to credibly defend my gut feeling. One of my motivations, though, was that XMPP is based on XML, while AMQP (AFAIK) is binary. This suggests to me that AMQP will probably outperform XMPP for any given scenario — at the cost of interoperability (e.g. with regard to i18n). So AMQP might be a better choice if you control both ends of the wire (in this case, both ends of the message queue), while XMPP might be better to support looser coupling.

This calls to mind the quote, it is better to remain silent..yadda, yadda, yadda. You’d have thought that the REST vs. SOAP debates would have put to rest the pointlessness that is dividing up protocols based on “enterprise” vs. “Web” or even worse binary vs. XML. I expect better from Stefan.

As for me, I’ve been learning more about SQL Service Broker (Microsoft’s message queuing infrastructure built into SQL Server 2005) from articles like Architecting Service Broker Applications which gives a good idea of the kind of concerns I’ve grown to find fascinating. For example

Queues

One of the fundamental features of Service Broker is a queue as a native database object. Most large database applications I have worked with use one or more tables as queues. An application puts something that it doesn't want to deal with right now into a table, and at some point either the original application or another application reads the queue and handles what's in it. A good example of this is a stock-trading application. The trades have to happen with subsecond response time, or money can be lost and SEC rules violated, but all the work to finish the trade—transferring shares, exchanging money, billing clients, paying commissions, and so on—can happen later. This "back office" work can be processed by putting the required information in a queue. The trading application is then free to handle the next trade, and the settlement will happen when the system has some spare cycles. It's critical that once the trade transaction is committed, the settlement information is not lost, because the trade isn't complete until settlement is complete. That's why the queue has to be in a database. If the settlement information were put into a memory queue or a file, a system crash could result in a lost trade. The queue also must be transactional, because the settlement must be either completed or rolled back and started over. A real settlement system would probably use several queues, so that each part of the settlement activity could proceed at its own pace and in parallel.

The chapter on denormalizing your database in “Scaling Your Website 101” should have a pointer to the chapter on message queuing so you can learn about one of the options when formulating a strategy for dealing with the update anomalies that come with denormalization.

Now playing: Neptunes - Light Your Ass On Fire (feat. Busta Rhymes)

Categories: Web Development

September 17, 2007

@ 04:50 AM

Comments [16]

RSS Bandit + NewsGator Online: Your Feeds on the Desktop and on the Web

In a recent comment to one of my blog post Daniele Muscetta lists some of the reasons he no longer uses RSS Bandit as much as he used to...

I have been using it less and less this past year:
- at the beginning because of the troubles having it run on Vista,
- then because I really like the possibility to have my feeds everywhere instead than on one PC (I use multiple computers)
- and finally because with a web based aggregator I can immediately SHARE and MARK what I read and make it available again for other people to see. This one is a killer feature of the various google reader or bloglines... letting people make their own link blog republishing stuff they read and like. Of course I am not complaining... to do such a thing you would need an infrastructure - as opposed to just releasing a program that runs on the PC...

I agree that the ability to access your feeds from anywhere is a great advantage of Web-based feed readers. On the other hand, I haven't found a Web-based feed reader that has all of the features I've come to take for granted in RSS Bandit especially when you consider some of the extra features like downloading podcasts and being able to access newsgroups. I've always wanted to be able to provide the best of both worlds to our users which is why a few years, I added support for the NewsGator API with the intent of enabling our users get to no longer have to choose between a desktop feed reader and a Web-based one, and can use whichever they want whenever they want (i.e. the Outlook and Outlook Web Access model).

However my implementation was incomplete and I incorrectly blamed the API when in truth I didn't do a great job in understanding how the API should be used. Yesterday, I revisited the problem and fixed a number of brain dead misuses of the API in our implementation (e.g. ReplaceSubscriptions isn't a shortcut that can be used in place of multiple calls to AddSubscription and DeleteSbscription) . Now the feature works like a charm and I've created two screen casts showing how you can get the world's best RSS reading experience for FREE, today.

RSS Bandit & NewsGator (part 1)

RSS Bandit & NewsGator (part 2)

To learn more about using this feature, you can also read the RSS Bandit documentation on using NewsGator and RSS Bandit to synchronize your feeds across multiple computers.

Categories: RSS Bandit

September 17, 2007

@ 04:29 AM

Comments [2]

RSS Bandit v1.5.0.17 Released

The final version of the ShadowCat release of RSS Bandit is now available. This release isn't about new features but instead about making the existing features work a lot better and fixing a ton of stability issues related to our usage of Lucene for search engine for our full-text search.

Translations
This release is available in the following languages; English, German, Polish, French, Simplified Chinese, Russian, Brazilian Portuguese, Turkish, Dutch, Italian, Serbian and Bulgarian.

Installer
Download the installer from RssBandit1.5.0.17b_installer.zip . A snapshot of the source code will be available later in the week as a source code release.

New Features

Newspaper view can be configured to show unread items in a feed as multiple pages instead of a single page of items
Feed list and read/unread status of news items can be synchronized with NewsGator Online (and thus FeedDemon and NetNewsWire as well).

Major Bug Fixes

Images show up as broken links when viewing the feed for the Facebook blog
Application gets lost on second screen in multi-monitor setup (bug 1288729) and incorrect dual-display handling on startup (bug 1479148)
Marking search engines in options doesn't register as change (bug 1782946)
Application crashes with "InvalidOperationException" or "Index is closed" error message. (bug 1777810)
Application crashes with "docs out of order" error message. (bug 1783688)
Synchronization with Newsgator Online does not recognize items marked as read from within RSS Bandit (bug 1368567).
Automatic Mark as Read behavior while scrolling sometimes stops working.
Deleted items show up as unread after upgrading from v1.3.0.42
Application displays error dialog with "UnauthorizedAccessException" or "access denied" error at random times during the day. (bug 1777411)
Javascript errors on Web pages result in error dialogs being displayed or the application hanging.
Search indexing thread takes 100% of CPU and freezes the computer.
Crashes related to Lucene search indexing (e.g. IO exceptions, access violations, file in use errors, etc)
Crash when a feed has an IRI (i.e. URL with unicode characters such as http://www.bücher-forum.de/forum/rdf.php) instead of issuing an error message since they are not supported.
Context menus no longer work on category nodes after OPML import or remote sync
Crash on deleting a feed which still has enclosures being downloaded
Podcasts downloaded from the http://poweruser.tv/ feed are named "..mp3" instead of the actual file name.
Items marked as read in a search folder not marked as read in original feed
No news shown when subscribed to newsgroups.borland.com
Can't subscribe to feeds on your local hard drive (regression since it worked in previous versions)
Random crashes due to error renaming file "deleteable.new" to "deletable" or "segments.new" to "segments" in search index folder.
Items in Atom 0.3 feeds that have a <created> date but no <issued> date show their date as the last modified date of the feed instead of the created date.
Images don't show up on certain items when clicking on feed or category view if the feed uses relative links such as in Tim Bray’s feed at http://www.tbray.org/ongoing/ongoing.atom.
Empty pages displayed in newspaper view when browsing multiple feeds under a category node.
Newly added feeds do not inherit the feed refresh rate specified in the Options dialog.
In certain cases, the following error message is displayed when attempting feed upload via FTP; "Feedlist upload failed with error: Passive mode not allowed on this server.."
Application crashes on startup with the COMException "unknown error"
None of the options when right-clicking on "This Feed" in feed properties is valid for newsgroups.
Crash because the application cannot modify the .treestate.xml configuration file
Crash when clicking on enclosure link in toast window

After the final release of ShadowCat, the following release of RSS Bandit (codenamed Phoenix) will contain our next major user interface revamp and will be the version where we move to version 2.0 of the .NET Framework. ShadowCat will be the last version of RSS Bandit that will run on version 1.1 of the .NET Framwork. You can find some of our early Phoenix screen shots on Flickr.

Categories: RSS Bandit

September 14, 2007

@ 03:30 PM

Comments [9]

RSS Bandit Apologies

Torsten and I just got an email asking whether RSS Bandit is an abandoned project. This is a fair question given how little activity there has been from us over the past couple of months.

The fact is that we're both pretty busy in our personal lives right now. I'm getting married a week from tomorrow and Torsten just had a baby. Since RSS Bandit is a project we maintain in our free time, it has suffered while we go through transitions on the home front. I'm about a month away from being able to dedicate any serious time to the project and Torsten is in a similar boat.

However the currently checked in code is a lot more stable than the current release since we have fixed or worked around a number of issues in Lucene.NET which we use for building the search index. For that reason, I'll be releasing a new version of RSS Bandit this weekend which should be fix a lot of issues a number of our regular users have had with the last release.

Next month, I'll start thinking about the features I'd like to add in the next release and should be able to start writing new code again. Thanks for your patience and support.

Categories: RSS Bandit

September 12, 2007

@ 04:00 AM

Comments [5]

Some Thoughts on CouchDB and Relational Databases

Recently I took a look at CouchDB because I saw it favorably mentioned by Sam Ruby and when Sam says some technology is interesting, he’s always right. You get the gist of CouchDB by reading the CouchDB Quick Overview and the CouchDB technical overview.

CouchDB is a distributed document-oriented database which means it is designed to be a massively scalable way to store, query and manage documents. Two things that are interesting right off the bat are that the primary interface to CouchDB is a RESTful JSON API and that queries are performed by creating the equivalent of stored procedures in Javascript which are then applied on each document in parallel. One thing that not so interesting is that editing documents is lockless and utilizes optimistic concurrency which means more work for clients.

As someone who designed and implemented an XML Database query language back in the day, this all seems strangely familiar.

So far, I like what I’ve seen but there seems to already be a bunch of incorrect hype about the project which may damage it’s chances of success if it isn’t checked. Specifically I’m talking about Assaf Arkin’s post CouchDB: Thinking beyond the RDBMS which seems chock full of incorrect assertions and misleading information.

Assaf writes

This day, it happens to be CouchDB. And CouchDB on first look seems like the future of database without the weight that is SQL and write consistency.

CouchDB is a document oriented database which is nothing new [although focusing on JSON instead of XML makes it buzzword compliant] and is definitely not a replacement/evolution of relational databases. In fact, the CouchDB folks assert as much in their overview document.

Document oriented database work well for semi-structured data where each item is mostly independent and is often processed or retrieved in isolation. This describes a large category of Web applications which are primarily about documents which may link to each other but aren’t processed or requested often based on those links (e.g. blog posts, email inboxes, RSS feeds, etc). However there are also lots of Web applications that are about managing heavily structured, highly interrelated data (e.g. sites that heavily utilize tagging or social networking) where the document-centric model doesn’t quite fit.

Here’s where it gets interesting. There are no indexes. So your first option is knowing the name of the document you want to retrieve. The second is referencing it from another document. And remember, it’s JSON in/JSON out, with REST access all around, so relative URLs and you’re fine.

But that still doesn’t explain the lack of indexes. CouchDB has something better. It calls them views, but in fact those are computed tables. Computed using JavaScript. So you feed (reminder: JSON over REST) it a set of functions, and you get a set of queries for computed results coming out of these functions.

Again, this is a claim that is refuted by the actual CouchDB documentation. There are indexes, otherwise the system would be ridiculously slow since you would have to run the function and evaluate every single document in the database each time you ran one of these views (i.e. the equivalent of a full table scan). Assaf probably meant to say that there aren’t any relational database style indexes but…it isn’t a relational database so that isn’t a useful distinction to make.

I’m personally convinced that write consistency is the reason RDBMS are imploding under their own weight. Features like referential integrity, constraints and atomic updates are really important in the client-server world, but irrelevant in a world of services.

You can do all of that in the service. And you can do better if you replace write consistency with read consistency, making allowances for asynchronous updates, and using functional programming in your code instead of delegating to SQL.

I read these two paragraph five or six times and they still seem like gibberish to me. Specifically, it seems silly to say that maintaining data consistency is important in the client-server world but irrelevant in the world of services. Secondly, “Read consistency” and “write consistency” are not an either-or choice. They are both techniques used by database management systems, like Oracle, to present a deterministic and consistent experience when modifying, retrieving and manipulating large amounts of data.

In the world of online services, people are very aware of the CAP conjecture and often choose availability over data consistency but it is a conscious decision. For example, it is more important for Amazon that their system is always available to users than it is that they never get an order wrong every once in a while. See Pat Helland’s (ex-Amazon architect) example of a how a business-centric approach to data consistency may shape one’s views from his post Memories, Guesses, and Apologies where he writes

#1 - The application has only a single replica and makes a "decision" to ship the widget on Wednesday. This "decision" is sent to the user.

#2 - The forklift pummels the widget to smithereens.

#3 - The application has no recourse but to apologize, informing the customer they can have another widget in one month (after the incoming shipment arrives).

#4 - Consider an alternate example with two replicas working independently. Replica-1 "decides" to ship the widget and sends that "decision" to User-1.

#5 - Independently, Replica-2 makes a "decision" to ship the last remaining widget to User-2.

#6 - Replica-2 informs Replica-1 of its "decision" to ship the last remaining widget to User-2.

#7 - Replica-1 realizes that they are in trouble... Bummer.

#8 - Replica-1 tells User-1 that he guessed wrong.

#9 - Note that the behavior experienced by the user in the first example is indistinguishable from the experience of user-1 in the second example.

Eventual Consistency and Crappy Computers

Business realities force apologies. To cope with these difficult realities, we need code and, frequently, we need human beings to apologize. It is essential that businesses have both code and people to manage these apologies.

…

We try too hard as an industry. Frequently, we build big and expensive datacenters and deploy big and expensive computers.

In many cases, comparable behavior can be achieved with a lot of crappy machines which cost less than the big expensive one.

The problem described by Pat isn’t a failure of relational databases vs. document oriented ones as Assaf’s implication would have us believe. It is the business reality that availability is more important than data consistency for certain classes of applications. A lot of the culture and technologies of the relational database world are about preserving data consistency [which is a good thing because I don’t want money going missing from my bank account because someone thought the importance of write consistency is overstated] while the culture around Web applications is about reaching scale cheaply while maintaining high availability in situations where the occurence of data loss is unfortunate but not catastrophic (e.g. lost blog comments, mistagged photos, undelivered friend requests, etc).

Even then most large scale Web applications that don’t utilize the relational database features that are meant to enforce data consistency (triggers, foreign keys, transactions, etc) still end up rolling their own app-specific solutions to handle data consistency problems. However since these are tailored to their application they are more performant than generic features which may exist in a relational database.

For further reading, see an Overview of the Flickr Architecture.

Now playing: Raekwon - Guillotine (Swordz) (feat. Ghostface Killah, Inspectah Deck & GZA/Genius)

Categories: Platforms | Web Development

September 12, 2007

@ 04:00 AM

Comments [2]

OAuth: Standardizing Authentication and Authorization for Web APIs

I attended the Data Sharing Summit last Friday and it was definitely the best value (with regards to time and money invested) that I've ever gotten out of a conference. You can get an overview of that day’s activities in Marc Canter’s post Live blogging from the DataSharingSummit. I won’t bother with a summary of my impressions of the day since Marc’s post does a good job of capturing the various topics and ideas that were discussed.

What I will discuss is a technology initiative called OAuth which is being cooked up by the Web platform folks at Yahoo!, Google, Six Apart, Pownce, Twitter and a couple of other startups.

It all started when Leah Culver was showing off the profile aggregation feature of Pownce on her Pownce profile. If you look at the bottom left of that page, you’ll notice that it links to her profiles on Digg, Facebook, Upcoming, Twitter as well as her weblog. Being the paranoid sort, I asked whether she wasn’t worried that her users could fall victim to imposters such as this fake profile of Robert Scoble? She replied that her users should be savvy enough to realize that just because there are links to profiles in a side bar doesn’t prove that the user is actually that person. She likened the feature to blog *bling* as opposed to something that should be taken seriously.

I pressed further and asked whether she didn’t think that it would be interesting if a user of Pownce could prove their identity on Twitter via OpenID (see my proposal for social network interoperability based on OpenID for details) and then she could use the Twitter API to post a user’s content from Pownce onto Twitter and show their Twitter “tweets” within Pownce. This would get rid of all the discussions about having to choose between Pownce and Twitter because of your friends or even worse using both because your social circle spans both services. It should be less about building walled gardens and more about increasing the size of the entire market for everyone via interoperability. Leah pointed out something I’d overlooked. OpenID gives you a way to answer the question “Is leahculver @ Pownce also leahculver @ Twitter?” but it doesn’t tell you how Pownce can then use this information to perform actions on Leah’s behalf on Twitter. Duh. I had implicitly assumed that whatever authentication ticket returned from the OpenID validation request could be used as an authorization ticket when calling the OpenID provider’s API, but there’s nothing that actually says this has to be the case in any of the specs.

Not only was none of this thinking new to Leah, she informed me that she had been working with folks from Yahoo!, Google, Six Apart, Twitter, and other companies on a technology specification called OAuth. The purpose of which was to solve the problem I had just highlighted. There is no spec draft on the official site at the current time but you can read version 0.9 of the specification. The introduction of the specification reads

The OAuth protocol enables interaction between a Web Service Provider(SP) and Consumer website or application. An example use case would be allowing Moo.com, the OAuth Consumer, to access private photos stored on Flickr.com, the OAuth Service Provider. This allows access to protected resources (Protected Resources) via an API without requiring the User to provide their Flickr.com (Service Provider) credentials to Moo.com (Consumer). More generically, OAuth creates a freely implementable and generic methodology for API authentication creating benefit to developers wishing to have their Consumer interact with various Service Providers.

While OAuth does not require a certain form of user interface or interaction with a User, recommendations and emerging best practices are described below. OAuth does not specify how the Service Provider should authenticate the User which makes the protocol ideal in cases where authentication credentials are not available to the Consumer, such as with OpenID.

This is an interesting example of collaboration between competitors in the software industry and a giant step towards actual interoperability between social networking sites and social graph applications as opposed to mere social network portability.

The goal of a OAuth is to move from a world where sites collect users credentials (i.e. username/passwords) for other Web sites so they can screen scrape the user’s information to one where users authorize Web sites and applications to act on their behalf on the target sites in a way that puts the user in control and doesn’t require giving up their usernames and passwords to potentially untrustworthy sites (Quechup flavored spam anyone?). The way it does so is by standardizing the following interaction/usage flow

Pre-Authentication

The Consumer Developer obtains a Consumer Key and a Consumer Secret from the Service Provider.

Authentication

The Consumer attempts to obtain a Multi-Use Token and Secret on behalf of the end user.

Web-based Consumers redirect the User to the Authorization Endpoint URL.

Desktop-based Consumers first obtain a Single-Use Token by making a request to the API Endpoint URL then direct the User to the Authorization Endpoint URL.

If a Service Provider is expecting Consumer that run on mobile devices or set top boxes, the Service Provider should ensure that the Authorization Endpoint URL and the Single Use Token are short and simple enough to remember for entry into a web browser.

The User authenticates with the Service Provider.

The User grants or declines permission for the Service Provider to give the Consumer a Multi-Use Token.

The Service Provider provides a Multi-Use Token and Multi-Use Token Secret or indicates that the User declined to authorize the Consumer’s request.

For Web-based Consumers, the Service Provider redirects to a pre-established Callback Endpoint URL with the Single Use Token and Single-Use Authentication Secret as arguments.

Mobile and Set Top box clients wait for the User to enter their Single-Use Token and Single-Use Secret.

Desktop-based Consumers wait for the User to assert that Authorization has completed.

The Consumer exchanges the Single-Use Token and Secret for a Multi-User Token and Secret.

The Consumer uses the Multi-Use Token, Multi-Use Secret, Consumer Key, and Consumer Secret to make authenticated requests to the Service Provider.

This standardizes the kind of user-centric API model that is utilized by Web services such as the Windows Live Contacts API, Google AuthSub and the Flickr API to authenticate and authorize applications to access a user’s data. I suspect that the reason OAuth does not mandate a particular authentication mechanism is as a way to grandfather in the various authentication mechanisms used by all of these APIs today. It’s one thing to ask vendors to add new parameters and return types to the information exchanged during user authentication between Web sites and another to request that they completely replace one authentication technology with another.

I’d love to see us get behind an effort like this at Windows Live*. I’m not really sure if there is an open way to participate. There seems to be a Google Group but it is private and all the members seem to be folks that know each other personally. I guess I’ll have to shoot some mail to the folks I met at the Data Sharing Summit or maybe one of them will see this post and will respond in the comments.

PS: More details about OAuth can be found in the blog post by Eran Hammer-Lahav entitled Explaining OAuth.

*Disclaimer: This does not represent the intentions, strategies, wishes or product direction of my employer. It is merely wishful thinking on my part.

Now playing: Sean Kingston - Beautiful Girls (remix) (feat. Fabolous & Lil Boosie)

Categories: Social Software | Web Development | XML Web Services

September 11, 2007

@ 04:00 AM

Comments [1]

Justin Timberlake's FutureSex/LoveShow

We attended Justin Timberlake FutureSex/LoveShow on Saturday and it was my best concert going experience in the Seattle area to date. A lot had to do with the location. The Tacoma Dome being an enclosed building and a sports arena has the right combination of acoustics and availability of vending services. I attended Eminem’s Anger Management 3 and Kenny Chesney’s The Road & Radio at the White River Amphitheatre and Qwest Field respectively and both locations were suffered from lousy acoustics because they were open air locations.

We missed the opening band which didn’t bother me once it was confirmed that it was not going to be Timbaland. JT’s performance was top notch, he not only performed the hits from both albums but also threw in some unexpected surprises like his verses from Gone and Dick in a Box. After the audience was done singing along to the latter, he joked “You guys watch too much YouTube”. I was amused that he didn’t say “You guys watch too much SNL”. He also mentioned that it had just won an Emmy that night. “Only in America,” he laughed.

I was surprised to notice that audience seemed to have been an older crowd than the crowd from Bumbershoot. That said, all those JT fans in their 20s and 30s still scream like teenage girls so I’d suggest some ear plugs if you’re ever thinking of attending one of his concerts.

Now playing: Justin Timberlake - What Goes Around.../...Comes Around Interlude

Categories: Music

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Monday, 01 October 2007 - Dare Obasanjo's weblog

Sophos Facebook ID Probe findings:

Queues

Pre-Authentication

Authentication