Monday, 05 March 2007 - Dare Obasanjo's weblog

March 5, 2007

@ 06:20 PM

API Hall of Shame: Lucene's IndexReader and IndexWriter

For the current release of RSS Bandit we decided to forgo our homegrown solution for providing search over a user's subscribed feeds and go with Lucene.NET. The search capabilities are pretty cool but the provided APIs leave a lot to be desired. The only major problem we encountered with Lucene.NET is that concurrency issues are commonplace. We decided to protect against this by having only one thread that modified the Lucene index since a lot of problems seemed to occur when multiple threads were trying to modify the search index.
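The one-writer-thread idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not RSS Bandit's actual code: SingleWriterQueue is a hypothetical class, and BlockingCollection comes from a later version of the .NET Framework than the one this post was written against. Any thread may queue an index operation, but only the single worker thread ever runs one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper (not part of Lucene.NET or RSS Bandit):
// funnels every index modification through one dedicated thread.
public sealed class SingleWriterQueue : IDisposable
{
    private readonly BlockingCollection<Action> pending = new BlockingCollection<Action>();
    private readonly Thread worker;

    public SingleWriterQueue()
    {
        worker = new Thread(Run);
        worker.IsBackground = true;
        worker.Start();
    }

    // Safe to call from any thread; the action runs later on the worker thread.
    public void Enqueue(Action indexOperation)
    {
        pending.Add(indexOperation);
    }

    private void Run()
    {
        // Blocks until work arrives and ends once CompleteAdding() has been
        // called and the queue is empty.
        foreach (Action operation in pending.GetConsumingEnumerable())
        {
            try
            {
                operation();
            }
            catch (Exception e)
            {
                // One failed operation should not kill the writer thread.
                Console.Error.WriteLine("Index operation failed: " + e.Message);
            }
        }
    }

    // Stops accepting work, lets the worker drain the queue, then waits for it.
    public void Dispose()
    {
        pending.CompleteAdding();
        worker.Join();
        pending.Dispose();
    }
}
```

Callers would wrap each index call, for example queue.Enqueue(() => AddDocument(doc)), so that the reader and writer juggling described next only ever happens on one thread.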

This is where programming with Lucene.NET turns into a journey into the land of Daily WTF style proportions.

WTF #1: There are two classes used for modifying the Lucene index. This means you can't just create a singleton and protect access to it from multiple threads. Instead one must keep instances of two different types around and make sure if one instance is open the other is closed.

WTF #2: Although the classes are called IndexReader and IndexWriter, they are both used for editing the search index. There's a fricking Delete() method on a class named IndexReader.

Code Taken from Lucene Examples

public void  DeleteDocument(int docNum)
{
	lock (directory)
	{
		AssureOpen();
		CreateIndexReader();
		indexReader.DeleteDocument(docNum);
	}
}

void  CreateIndexReader()
{
	if (indexReader == null)
	{
		if (indexWriter != null)
		{
			indexWriter.Close();
			indexWriter = null;
		}
		indexReader = IndexReader.Open(directory);
	}
}

void  AddDocument(Document doc)
{
	lock (directory)
	{
		AssureOpen();
		CreateIndexWriter();
		indexWriter.AddDocument(doc);
	}
}

void  CreateIndexWriter()
{
	if (indexWriter == null)
	{
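		if (indexReader != null)
		{
			indexReader.Close();
			indexReader = null;
		}
		// Assumes an 'analyzer' field; 'false' opens the existing index instead of creating a new one.
		indexWriter = new IndexWriter(directory, analyzer, false);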
	}
}

As lame as this is, Lucene.NET is probably the best way to add desktop search capabilities to your .NET Framework application. I've heard they've created an IndexModifier class in newer versions of the API so some of this ugliness is hidden from application developers. How anyone thought it was OK to ship with this kind of API ugliness in the first place is beyond me.

Categories: Programming

March 5, 2007

@ 05:29 PM

Comments [0]

Seattle Startup Shoutout: iLike

Every once in a while someone asks me about software companies to work for in the Seattle area that aren't Microsoft, Amazon or Google. This is the first in a series of weekly posts about startups in the Seattle area that I often mention to people when they ask me this question.

The iLike service from GarageBand.com is one of a new breed of "social" music services, a category popularized by Last.fm. The service consists of two primary aspects:

A website where one can create a profile, add friends, view stats about the music you listen to and see what music is popular among iLike users.
An iTunes plugin which recommends songs from signed and unsigned artists based on what you are listening to and also allows you to see what your friends are currently listening to.

I tried the service and definitely like the concept of getting music recommendations from directly within iTunes. The only downside is that you get samples of the recommended songs (probably the same snippets from the iTunes music store) instead of having the entire recommended song streamed to you. I guess that makes sense since it is a free service and likely makes money via an affiliate program. The company recently got a bunch of funding from Ticketmaster so I expect that they will soon start integrating concert ticket recommendations into their user experience which would explain why they require a zip code when signing up for the service.

The president of iLike is Hadi Partovi who recently left Microsoft for the second time after a stint as a General Manager at MSN where he greenlighted start.com which eventually morphed into the live.com personalized page. One of the key developers of iLike is Steve Rider who was the original developer of start.com.

Press: Seattle Times on iLike

Number of Employees: 25

Location: Seattle, WA (Capitol Hill)

Jobs: jobs@iLike-inc.com, current open positions are for a Web / Server (Ruby) engineer, Software Development Engineer in Test, Web/DHTML engineer, Database engineer, and desktop client engineer

Categories: Seattle Startup Shoutout

March 4, 2007

@ 09:17 PM

Comments [7]

RSS Bandit v1.5.0.10 Released

Although this has taken much longer than I expected, the Jubilee release of RSS Bandit is now done and available for all. Besides the new features there are a number of performance improvements, especially with regard to the responsiveness of the application.

Major differences between v1.5.0.10 and v1.3.0.42 below

Translations
This release is available in the following languages: English, German, Polish, French, Simplified Chinese, Russian, Brazilian Portuguese, Turkish, Dutch, Italian, Serbian and Bulgarian.

Installer
Download the installer from RssBandit1.5.0.10_installer.zip. A snapshot of the source code will be available later in the week as a source code release.

New Features

Comment Watching
Items marked as read when viewed in the Newspaper
Date-based grouping in the list view
Options for opening tabs in the background
Favicons
Enclosures treated as attachments
Certain user-defined enclosures treated as podcasts, including adding to playlists in iTunes and Windows Media Player
Remembering application state on restart - This will work similar to the Session Saver extension in Firefox in that open tabs and the tree view state will be remembered on restart
Revamping the search feature - We've moved the implementation of feed search to Lucene.Net from our custom feed search implementation which should make searches faster and provide richer search options. The syntax for performing Lucene search queries is available at http://lucene.apache.org/java/docs/queryparsersyntax.html
Support for Atom Threading Extensions
Easily configurable keyboard shortcuts - Just right-click when hovering over the toolbar menu and choose "Customize". This feature came as a freebie from switching to Infragistics NetAdvantage for some of our UI components.

Major Bug Fixes

Feed items appear in wrong feed folders - we now apply a set of heuristics to prevent this problem from surfacing ever again. I'm pretty sure this problem is due to bugs in HTTP Pipelining. However it is unclear whether the bugs are in the .NET Framework's HTTP library, proxy servers that RSS Bandit is passing through or the Web servers that the application is fetching feeds from.
Relative Links in Atom 1.0 feeds appear incorrectly - now that this is fixed, the links in Tim Bray's and Sam Ruby's feeds work correctly
Atom feeds from the Blogger beta site show no posts when viewed in RSS Bandit - this was just a dumb bug on my part.
Sites with malformed cookies cause feeds not to be fetched - Specifically, fetching cookies from sites such as Windows Live Spaces results in the error "An error has occurred when parsing Cookie header". Now we just ignore the error and soldier on.
RSS Bandit needs administrator privileges on first run - the fix for this was so esoteric it boggles my mind.
RSS Bandit stops downloading feeds after a while - This was actually two bugs. The first was that the application stopped automatically downloading feeds if any of them timed out while being fetched. The other was that feeds with whitespace in the URLs were not being updated.
Application doesn't work in Windows Vista - this has been tracked down to our use of old versions of the Divelement controls. Although these issues have been fixed in newer versions of the Divelement controls we have decided to move to NetAdvantage for Windows Forms controls for unrelated reasons. The new controls should work fine in Windows Vista.
Application crashes with NullReferenceException during Web browsing - this is another issue which should be fixed with our move to the NetAdvantage for Windows Forms controls.

Categories: RSS Bandit

March 2, 2007

@ 12:23 AM

Comments [0]

I Hate URI Canonicalization

I just got a phone call from an RSS Bandit user whose daily workflow had been derailed by a bug in the application. It seems that we were crashing with an ArgumentException stating "Argument already exists in collection" when she tried to import an OPML file. This seemed weird because I always make sure to check if a feed URL exists in the table of currently subscribed URIs before adding it. Looking at the code made me even more confused:


if(!_feedsTable.ContainsKey(f1.link)){
	f1.lastretrievedSpecified = true;
	f1.lastretrieved = dta[count % dtaCount];
	_feedsTable.Add(f1.link, f1); 	/* exception thrown here */ 
}

So I looked at the implementations of ContainsKey() and Add() in my data structure, which led me to the conclusion that we need better unit tests:


public virtual bool ContainsKey(String key) {			
  return (IndexOfKey(key) >= 0);
}

public virtual void Add(String key, feedsFeed value) {
	if ((object) key == null)
		throw new ArgumentNullException("key");

	/* convert the URI to a canonicalized absolute URI */ 
	try{
		Uri uri = new Uri(key); 
		key = uri.AbsoluteUri;
		value.link = key; 
	}catch {}

	int index = IndexOfKey(key);

	if (index >= 0)
		throw new ArgumentException(
			"Argument already exists in collection.", "key");

	Insert(~index, key, value);
}

My apologies to any of our users who have been hit by this problem. It'll be fixed in the final release of Jubilee.
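For anyone who didn't spot it: as far as the listing shows, ContainsKey() compares the raw string while Add() canonicalizes the key first, so two spellings of the same URL (say, with and without a trailing slash) both pass the check and then collide on insert. Below is a minimal sketch of one way to avoid that, by routing every lookup and insert through the same normalization step. FeedTable and Canonicalize are hypothetical names for illustration, and a Dictionary stands in for the real sorted collection; this is not the actual fix that shipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the feeds table, not RSS Bandit's actual class.
public class FeedTable
{
    private readonly Dictionary<string, string> feeds = new Dictionary<string, string>();

    // The single place where keys are normalized. Strings that do not parse
    // as absolute URIs are used unchanged, mirroring the empty catch above.
    public static string Canonicalize(string key)
    {
        if (key == null)
            throw new ArgumentNullException("key");

        Uri uri;
        return Uri.TryCreate(key, UriKind.Absolute, out uri) ? uri.AbsoluteUri : key;
    }

    public bool ContainsKey(string key)
    {
        return feeds.ContainsKey(Canonicalize(key));
    }

    public void Add(string key, string title)
    {
        string canonical = Canonicalize(key);
        if (feeds.ContainsKey(canonical))
            throw new ArgumentException("Argument already exists in collection.", "key");

        feeds.Add(canonical, title);
    }
}
```

With this shape, a check like ContainsKey("http://EXAMPLE.com") and a later Add("http://example.com/", ...) agree on what the key is, so the guard in the calling code actually guards.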

Categories: Programming | RSS Bandit

March 1, 2007

@ 05:51 PM

Comments [2]

Earn Money for Your Favorite Charity by Chatting on IM

From the blog post entitled The i'm Initiative and new secret emoticon on the Windows Live Messenger team's blog we learn

Not everyone has the financial ability to give money to the causes they care about. That is where the i'm Initiative steps in - it enables Windows Live Messenger users to make a difference by directing a portion of Messenger's advertising revenue to a cause of their choosing.
...
Wonderful! How does it work?
Use Messenger 8.1
Add the i'm emoticon to your display name by entering the code of the cause you would like to support
Send and receive IMs
A portion of the advertising revenue generated by your usage of Messenger will be donated to your cause. So the more IMs you send and receive the more money will be donated to your cause.
How does Messenger even generate revenue\money anyway?

Windows Live Messenger is a free service to users. We do include advertisements in the client that help pay for the service and our salaries. With the i'm Initiative you get to decide where a portion of the revenue goes.

The codes for creating the emoticon are listed in the blog post. I'm using *9mil in my IM handle. This trend of tying charitable donations to the usage of Windows Live services is interesting. It's kinda cool for our users to feel like they are contributing to the betterment of the world simply by using our software the same way they have every day. Good stuff.

Categories: Windows Live

March 1, 2007

@ 02:55 PM

Comments [21]

Top 5 Industries the Internet has Killed

While I was house hunting a couple of weeks ago, I saw a house for sale that had a sign announcing that there was an "Open House" that weekend. I had no idea what an "Open House" was so I asked a real estate agent about it. I learned that during an "Open House", a real estate agent sits in an empty house that is for sale and literally has the door open so that people interested in the house can look around and ask questions about the house. The agent pointed out that with the existence of the Internet, this practice has now become outdated because people can get answers to most of their questions including pictures of the interior of houses for sale on real estate listing sites.

This got me to thinking about the Old Way vs. Net Way column that used to run in the Yahoo! Internet Life magazine back in the day. The column used to compare the "old" way of performing a task such as buying a birthday gift from a store with the "net" way of performing the same task on the Web.

We're now at the point in the Web's existence where some of the "old" ways to do things are clearly obsolete in the same way it is now clear that the horse & buggy is obsolete thanks to the automobile. After looking at my own habits, I thought it would be interesting to put together a list of the top five industries that have been hurt the most by the Web. From my perspective they are:

Map Makers: Do you remember having to buy a map of your city so you could find your way to the address of a friend or coworker when you'd never visited the neighborhood? That sucked, didn't it? When was the last time you did that versus using MapQuest or one of the other major mapping sites?
Travel Agents: There used to be a time when if you wanted to get a good deal on a plane ticket, hotel stay or vacation package you had to call or visit some middle man who would then talk to the hotels and airlines for you. Thanks to sites like Expedia the end result may be the same but the process is a lot less cumbersome.
Yellow Pages: When I can find businesses near me via sites like http://maps.live.com and then go to sites like Judy's Book or City Search to get reviews, the giant yellow page books that keep getting left at my apartment every year are nothing but giant doorstops.
CD Stores: It's no accident that Tower Records is going out of business. Between Amazon and the iTunes Music Store you can get a wider selection of music, customer reviews and instant gratification. Retail stores can't touch that.
Libraries: When I was a freshman in college I went to the library a lot. By the time I got to my senior year most of my research was exclusively done on the Web. Libraries may not be dead but their usefulness has significantly declined with the advent of the Web.

I feel like I missed something obvious with this list but it escapes me at the moment. I wonder how many more industries will be killed by the Internet when all is said and done. I suspect real estate agents and movie theaters will also go the way of the dodo within the next decade.

PS: I suspect I'm not the only one who finds the following excerpt from the The old way vs. the net way article hilarious

In its July issue, it compared two ways of keeping the dog well-fed. The Old Way involved checking with the local feed store and a Petco superstore to price out a 40-lb. bag of Nutra Adult Maintenance dog food. The effort involved four minutes of calling and a half-hour of shopping.

The Net Way involved electronically searching for pet supplies. The reporter found lots of sites for toys and dog beds, but no dog food. An electronic search specifically for dog food found a "cool Dog Food Comparison Chart" but no online purveyor of dog chow. Not even Petco's Web site offered a way to order and purchase online. The reporter surfed for 30 minutes, without any luck. Thus, the magazine declared the "old way" the winner and suggested that selling dog food online is a business waiting to be exploited.

Yeah, somebody needs to jump on that opportunity. :)

Categories: Technology

February 27, 2007

@ 07:09 PM

Comments [4]

Facebook Announces FQL for Developers

Today on the Facebook blog I spotted a post entitled FQL which contains the following excerpt

Two and a half months ago, a few of us were hanging out in the Facebook TV room, laying on the Fatboys and geeking out about how to move forward with the API for the Facebook Platform. We had a beta version that was fully functional, but we kept wishing that the interface were cleaner, more concise, and more consistent. Suddenly it occurred to me – this problem had been solved over 30 years earlier by database developers who came up with SQL – the Structured Query Language. What if we could use the same time-tested interface as the way for developers to access Facebook's data?
...
This isn't a simple problem – with millions of users and billions of friend connections, photos, tags, etc., Facebook's data doesn't exactly fit into your average database. And, even if it did, we still have to carefully apply all of those complicated privacy rules. Facebook Query Language would have to take those SQL-style queries from developers, figure out what data they're actually looking for, figure out if they're allowed to actually see the data, figure out where the data is stored, and then finally go and get the data to return back to the developer. I knew building FQL would be hard, but that's why I couldn't wait to do it.

This is one of those things I used to think was a great idea when I was on the XML team at Microsoft. Instead of exposing your data using APIs, why not expose your data as XML and then allow people to perform XQuery operations over the data? In reality, this often isn't feasible because you don't want people performing arbitrary queries over your data store that may request too much data (SELECT * FROM blog_posts) or are computationally expensive.

The FQL developer's guide states that typical queries look like

SELECT name, pic FROM user WHERE uid=211031 OR uid=4801660

SELECT name, affiliations FROM user
WHERE uid IN (SELECT uid2 FROM friend WHERE uid1=211031)
      AND "Facebook" IN affiliations.name AND uid < 10

SELECT src, caption, 1+2*3/4, caption, 10*(20 + 1) FROM photo
WHERE pid IN (SELECT pid FROM photo_tag WHERE subject=211031) AND
      pid IN (SELECT pid FROM photo_tag WHERE subject=204686) AND
      caption

and return results as XML. I've assumed that what is supported is a simple subset of SQL, perhaps written with Lex & Yacc or ANTLR, but it still seems somewhat problematic to move away from the constrained interface of an API and provide access via a query language. It is definitely a lot cooler and more consistent to work with a query language than an API though. Later on when I have some free time, I'll see if I can deduce the grammar for FQL by trying out queries in the Facebook API test console. It looks like there goes one of my evenings this week.

Nice work.

Categories: Competitors/Web Companies | XML Web Services

February 27, 2007

@ 06:15 PM

Comments [2]

Created my first Yahoo! Pipe

With the hubbub now settling down I decided to go back and try out Yahoo! Pipes. For a while, I've wanted a feed for articles by Chris Kelly over on Huffington Post so I decided to build that. After a couple of false starts I created the feed which currently doesn't have any items because there aren't any posts by Chris Kelly in the Huffington Post feed.

Now that I've actually used the service I'm pretty surprised that anyone thinks that this is a service that non-geeks will use. Programming with flowcharts to process RSS feeds seems even geekier than having a Star Trek wedding which was my previous bar for geekiest thing ever.

Categories: Syndication Technology

February 27, 2007

@ 03:50 AM

Comments [1]

How Big is the Health Related Search Market?

From the Microsoft press release Microsoft Demonstrates Further Commitment to Healthcare Market With Planned Acquisition of Web Search Company we learn

NEW ORLEANS — Feb. 26, 2007 — Microsoft Corp. today announced that it has agreed to acquire Medstory Inc., a privately held company based in Foster City, Calif., that develops intelligent Web search technology specifically for health information. The acquisition represents a strategic move for Microsoft in the consumer health search arena and signals a long-term commitment toward the development of a broader consumer health strategy. Medstory employees will join the Health Solutions Group, a recently formed division at Microsoft that will manage product development and delivery. Financial terms were not disclosed, as part of the agreement between the organizations.

This reminds me of the post Thoughts on health care, continued from Google's Adam Bosworth which stated

As I indicated in my post last week, I've been interested in the issue of health care and health information for a while. I just spoke at a conference about some of the challenges in the health care system that we at Google want to tackle. The conference, called Connecting Americans to Their Health Care, is a gathering focused on how consumers are transforming health care through the use of personal health technologies.

This speech will give you some insight into the problems that we believe need our attention.

It is also interesting that Adam Bosworth had been billed with the title Architect, Google Health for a while. I'd once heard that the market for medical related keywords is one of the most lucrative for search engines, which may explain the interest. However if you look at the list of most expensive AdWords it would seem that building a vertical search engine targeted at debt consolidation is the real goldmine. :)

Categories: Competitors/Web Companies

February 26, 2007

@ 03:09 PM

Comments [4]

Entropy in Tagging Systems, Google's Office Killer and Conference Diversity

I'm almost caught up on my blog reading since getting back from vacation and I've spotted a couple of items I'd have blogged responses to if I had been around. Since I don't have the time to write full blog posts on each of these items, here are links to the posts and brief outlines of what I thought about them:

Harish Mallipeddi has a blog post entitled Measuring efficiency of tagging with Entropy which links to the paper Understanding Navigability of Social Tagging Systems by Ed Chi and Todd Mytkowicz of Xerox PARC and excerpts its key findings. One result of their research, which seems obvious in hindsight and shows one of the issues that social software has to deal with as its community of users grows, was
The way he does that is to measure entropy (yup that same old same old Claude Shannon’s information theory which you learned in one of the CS courses) of entities like documents (D), users (U) and tags (T). His research group crawled the entire del.icio.us archive and then calculated the entropies. Here’s what they found:
• H(D|T) specifies the social navigation efficiency. How efficient is it for us to specify a set of tags to find a set of specific documents? We found that in del.icio.us that it is getting less and less efficient.
This makes sense when you think about it. Let's say the first set of users of del.icio.us came from a homogenous software development background and started applying the tag "xml" to mean items about the eXtensible Markup Language. Later on as the community grew, a number of gamers joined the site and they now use the tag "xml" to refer to items about the game X-Men Legends. Now if you are one of the original geek users of the site, the URL http://del.icio.us/tag/xml no longer is just about markup languages but also about video games. To actually find items strictly about the eXtensible Markup Language you may have to add other tags as refinements such as http://del.icio.us/tag/xml+programming.
What this means is that to the oldest users of the site, the quality of the tagging system will seem to degrade over time even though this is a natural consequence of the site growing and diversifying its user base. Of course, this is only a problem if a lot of people use del.icio.us to find all items about a topic (i.e. browsing by tags) as opposed to just storing their individual bookmarks or subscribing to the bookmarks of people they know and trust.
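To make the "xml" example concrete, here is a rough sketch of how H(D|T) could be estimated from a list of (tag, document) bookmarks, using the textbook definition H(D|T) = -Σ p(t) Σ p(d|t) log2 p(d|t). The TagEntropy class and the toy data mentioned afterwards are made up for illustration; this is not how the paper's authors processed the del.icio.us crawl.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class TagEntropy
{
    // Each tuple is one bookmark: Item1 is the tag, Item2 is the document.
    // Returns H(D|T) in bits, estimated from the empirical frequencies.
    public static double ConditionalEntropy(IEnumerable<Tuple<string, string>> bookmarks)
    {
        List<Tuple<string, string>> all = bookmarks.ToList();
        double total = all.Count;
        double entropy = 0.0;

        foreach (var byTag in all.GroupBy(b => b.Item1))
        {
            double tagCount = byTag.Count();
            double pTag = tagCount / total;            // p(t)

            foreach (var byDoc in byTag.GroupBy(b => b.Item2))
            {
                double pDocGivenTag = byDoc.Count() / tagCount;   // p(d|t)
                entropy -= pTag * pDocGivenTag * Math.Log(pDocGivenTag, 2);
            }
        }
        return entropy;
    }
}
```

If "xml" is spread evenly over two markup documents, the estimate comes out at about one bit. Add two X-Men Legends pages under the same tag and it rises to about two bits: the tag now narrows things down less, which is the loss of navigation efficiency the excerpt describes.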
It seems Google announced some sort of Microsoft Office killer last week. You can read Don Dodge's Why Microsoft will not fall into the Innovators Dilemma and Robert Scoble's Microsoft has no innovator’s dillema? for two conflicting opinions on how this affects Microsoft. Personally, I think I've overdosed on the number of times I've read the words innovator's dilemma in association with this announcement while catching up on email and blogs. What is funny about this situation is that almost everyone I've seen who throws the term around doesn't seem to have read the book. It is quite interesting to see Don Dodge write sentences like
Microsoft will do everything possible to preserve these businesses while transitioning to the new Live strategy.
and then follow that up with "No Innovators Dilemma here" without seeing the obvious contradiction in his words. Lots of doublethink at work it seems.
A side effect of reading this set of blog posts is that I found Don Dodge's Innovate or Imitate...Fame or Fortune? which praises being a fast follower as being more valuable than being an innovator. I've found that a lot of people at Microsoft point to past and recent successes such as Xbox, Microsoft Office and Internet Explorer as proof that being a "fast follower" is the best strategy for Microsoft. There are three key problems with this kind of thinking:
1. It assumes your competitors are incompetent. This may have worked in the old days but with competitors like Google and Apple Inc, it isn't the case anymore.
2. It requires that you have an ace up your sleeve that significantly one-ups the competitors when you ship your knock-off (e.g. integrating disparate applications into an Office Suite and pricing it lower than competitors, integrating the product into the operating system, integrating a rich and social online experience into what was previously a solitary experience etc).
3. It ignores the fact that "first mover advantage" is actually true for applications that have network effects which is definitely the case for social software which a lot of software has become today.
The "diversity in conferences" recurring debate was kicked off again by a blog post by Jason Kottke entitled Gender Diversity at Web Conferences which prompted interesting responses from folks like Eric Meyer, Anil Dash and Shelley Powers. They are all good posts with stuff I agree and disagree with in them but I wasn't moved to write until I read the post Why are smart people still stuck on gender and skin-color blinders? by Tantek Çelik where he wrote

Why is it that gender (and less often race, nay, skin-color, see below) are the only physical characteristics that lots of otherwise smart people appear to chime in support for diversity of?

E.g. as long as we are trying for greater diversity in superficial physical characteristics (superficial because what do such characteristics have to do with the stated directly relevant criteria of "technical expertise, speaking skills, professional stature, brand appropriateness, and marketability" - though perhaps I can see a tenuous link with "rainbow" marketing), why not ask about other such characteristics?

Where are all the green-eyed folks?

Where are all the folks with facial tattoos?

Where are all the redheads?

Where are the speakers with non-ear facial piercings?

Surely such speakers would help with "hipness" marketing.

I found this post to be disingenuous and wondered how anybody could downplay the gender and racial bias in the "Web 2.0" technology conference scene by equating it to a preference for green eyed speakers. So I decided to throw in my $0.02 on this topic...again.

After the last ETech, I realized I was seeing the same faces and hearing the same things over and over again. More importantly, I noticed that the demographics of the speaker lists for these conferences don't match the software industry as a whole let alone the users who we are supposed to be building the software for.
There were lots of little bits of ignorance by the speakers and audience which added up in a way that rubbed me wrong. For example, at the 2005 Web 2.0 conference a lot of people were ignorant of Skype except as 'that startup that got a bunch of money from eBay'. Given that there are a significant number of foreigners in the U.S. software industry who use Skype to keep in touch with folks back home, it was surprising to see so much ignorance about it at a supposedly leading edge technology conference. The same thing goes for how surprised people were by how teenagers used the Web and computers. Additionally, there are just as many women using social software such as photo sharing, instant messaging, social networking, etc as men yet you rarely see their perspectives presented at any of these conferences.
When I think of diversity, I expect diversity of perspectives. People's perspectives are often shaped by their background and experiences. When you have a conference about an industry which is filled with people of diverse backgrounds building software for people of diverse backgrounds, it is a disservice to have the conversation and perspectives be homogenous. The software industry isn't just young white males in their mid-20s to mid-30s nor is that the primary demographic of Web users.

Personally, I've gotten tired of attending conferences where we heard more about technologies and sites that the homogenous demographic of young to middle aged, white, male computer geeks find interesting (e.g. del.icio.us and tagging) and less about what Web users actually use regularly or find interesting (hint: it isn't del.icio.us and it sure ~~as fuck~~ isn't tagging).

Categories: Competitors/Web Companies | Life in the B0rg Cube | Social Software

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
