For the current release of RSS Bandit we decided to forgo our homegrown solution for providing search over a user's subscribed feeds and go with Lucene.NET. The search capabilities are pretty cool but the provided APIs leave a lot to be desired. The only major problem we encountered with Lucene.NET is that concurrency issues are commonplace. Since a lot of problems seemed to occur when multiple threads were trying to modify the search index, we decided to protect against this by having only one thread that modifies the Lucene index.

This is where programming with Lucene.NET turns into a journey of Daily WTF proportions.

WTF #1: There are two classes used for modifying the Lucene index. This means you can't just create a singleton and protect access to it from multiple threads. Instead one must keep instances of two different types around and make sure that if one instance is open the other is closed.

WTF #2: Although the classes are called IndexReader and IndexWriter, they are both used for editing the search index. There's a fricking Delete() method on a class named IndexReader.

Code Taken from Lucene Examples

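public void DeleteDocument(int docNum)
{
    lock (directory)
    {
        AssureOpen();
        CreateIndexReader();
        indexReader.DeleteDocument(docNum);
    }
}

void CreateIndexReader()
{
    if (indexReader == null)
    {
        // The writer must be closed before a reader can modify the index.
        if (indexWriter != null)
        {
            indexWriter.Close();
            indexWriter = null;
        }

        indexReader = IndexReader.Open(directory);
    }
}


void AddDocument(Document doc)
{
    lock (directory)
    {
        AssureOpen();
        CreateIndexWriter();
        indexWriter.AddDocument(doc);
    }
}

void CreateIndexWriter()
{
    if (indexWriter == null)
    {
        // The reader must be closed before a writer can modify the index.
        if (indexReader != null)
        {
            indexReader.Close();
            indexReader = null;
        }

        // The line that opens the writer is missing from this excerpt. It is presumably
        // something like the following, assuming 'analyzer' is a field of the enclosing class.
        indexWriter = new IndexWriter(directory, analyzer, false);
    }
}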

As lame as this is, Lucene.NET is probably the best way to add desktop search capabilities to your .NET Framework application. I've heard they've created an IndexModifier class in newer versions of the API so some of this ugliness is hidden from application developers. How anyone thought it was OK to ship with this kind of API ugliness in the first place is beyond me.
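For what it's worth, the "only one thread modifies the index" approach mentioned at the top of this post can be sketched roughly as follows. This is a minimal sketch and not the actual RSS Bandit code: SingleWriterQueue and its members are hypothetical names, it relies on BlockingCollection from newer versions of the .NET Framework, and the queued actions stand in for whatever Lucene.NET calls need to be serialized.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: funnels every index modification through one thread.
public sealed class SingleWriterQueue : IDisposable
{
    private readonly BlockingCollection<Action> pending = new BlockingCollection<Action>();
    private readonly Thread worker;

    public SingleWriterQueue()
    {
        worker = new Thread(Run) { IsBackground = true, Name = "IndexWriterThread" };
        worker.Start();
    }

    // Callable from any thread; the action itself runs later on the worker thread.
    public void Enqueue(Action modification)
    {
        if (modification == null) throw new ArgumentNullException("modification");
        pending.Add(modification);
    }

    private void Run()
    {
        // Blocks until work arrives; the loop ends once CompleteAdding() has been
        // called and the queue has been drained.
        foreach (Action modification in pending.GetConsumingEnumerable())
        {
            try { modification(); }
            catch (Exception ex) { Console.Error.WriteLine("Index update failed: " + ex.Message); }
        }
    }

    // Stops accepting work, lets the worker finish what is queued, then waits for it.
    public void Dispose()
    {
        pending.CompleteAdding();
        worker.Join();
        pending.Dispose();
    }
}
```

A caller on any thread would then write something like queue.Enqueue(() => AddDocument(doc)) instead of touching the index directly. Since only the worker thread ever runs the queued actions, the reader/writer juggling above never happens on two threads at once.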


 

Categories: Programming

Every once in a while someone asks me about software companies to work for in the Seattle area that aren't Microsoft, Amazon or Google. This is the first in a series of weekly posts about startups in the Seattle area that I often mention to people when they ask me this question.

The iLike service from GarageBand.com is one of a new breed of "social" music services, a category popularized by Last.fm. The service consists of two primary parts:

  1. A website where you can create a profile, add friends, view stats about the music you listen to and see what music is popular among iLike users.
  2. An iTunes plugin which recommends songs from signed and unsigned artists based on what you are listening to and also allows you to see what your friends are currently listening to.

I tried the service and definitely like the concept of getting music recommendations from directly within iTunes. The only downside is that you get samples of the recommended songs (probably the same snippets from the iTunes music store) instead of having the entire recommended song streamed to you. I guess that makes sense since it is a free service and likely makes money via an affiliate program. The company recently got a bunch of funding from Ticketmaster so I expect that they will soon start integrating concert ticket recommendations into their user experience which would explain why they require a zip code when signing up for the service.

The president of iLike is Hadi Partovi who recently left Microsoft for the second time after a stint as a General Manager at MSN where he greenlighted start.com which eventually morphed into the live.com personalized page. One of the key developers of iLike is Steve Rider who was the original developer of start.com.

Press: Seattle Times on iLike

Number of Employees: 25

Location: Seattle, WA (Capitol Hill)

Jobs: jobs@iLike-inc.com, current open positions are for a Web / Server (Ruby) engineer, Software Development Engineer in Test, Web/DHTML engineer, Database engineer, and desktop client engineer


 

Although this has taken much longer than I expected, the Jubilee release of RSS Bandit is now done and available for all. Besides the new features there are a number of performance improvements, especially with regard to the responsiveness of the application.

Major differences between v1.5.0.10 and v1.3.0.42 are listed below.

Translations
This release is available in the following languages: English, German, Polish, French, Simplified Chinese, Russian, Brazilian Portuguese, Turkish, Dutch, Italian, Serbian and Bulgarian.

Installer
Download the installer from RssBandit1.5.0.10_installer.zip. A snapshot of the source code will be available later in the week as a source code release.

New Features Major Bug Fixes
 

Categories: RSS Bandit

March 2, 2007
@ 12:23 AM

I just got a phone call from an RSS Bandit user whose daily workflow had been derailed by a bug in the application. It seems that we were crashing with an ArgumentException stating "Argument already exists in collection" when she tried to import an OPML file. This seemed weird because I always make sure to check if a feed URL exists in the table of currently subscribed URIs before adding it. Looking at the code made me even more confused:


if(!_feedsTable.ContainsKey(f1.link)){
	f1.lastretrievedSpecified = true;
	f1.lastretrieved = dta[count % dtaCount];
	_feedsTable.Add(f1.link, f1); /* exception thrown here */
}

So I looked at the implementations of ContainsKey() and Add() in my data structure, which led me to the conclusion that we need better unit tests:


public virtual bool ContainsKey(String key) {			
  return (IndexOfKey(key) >= 0);
}

public virtual void Add(String key, feedsFeed value) {
	if ((object) key == null)
		throw new ArgumentNullException("key");

	/* convert the URI to a canonicalized absolute URI */ 
	try{
		Uri uri = new Uri(key); 
		key = uri.AbsoluteUri;
		value.link = key; 
	}catch {}

	int index = IndexOfKey(key);

	if (index >= 0)
		throw new ArgumentException(
			"Argument already exists in collection.", "key");

	Insert(~index, key, value);
}
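The mismatch is that Add() converts the key to a canonicalized absolute URI before looking it up, while ContainsKey() looks up the raw string it was handed. So a feed URL like http://Example.com can pass the ContainsKey() check and still blow up in Add() because http://example.com/ is already in the table. One way to avoid this is to route every lookup through the same canonicalization step. The following is a minimal sketch of that idea rather than the actual fix that shipped: FeedTable and Canonicalize are hypothetical names, a plain Dictionary stands in for the sorted collection, and it leaves out the step where Add() also updates value.link.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the feeds table: every operation canonicalizes first.
public class FeedTable<TValue>
{
    private readonly Dictionary<string, TValue> items = new Dictionary<string, TValue>();

    // Same conversion Add() performs above: absolute URIs are replaced by their
    // canonical AbsoluteUri form, anything else is left untouched.
    public static string Canonicalize(string key)
    {
        if (key == null) throw new ArgumentNullException("key");
        Uri uri;
        return Uri.TryCreate(key, UriKind.Absolute, out uri) ? uri.AbsoluteUri : key;
    }

    public bool ContainsKey(string key)
    {
        return items.ContainsKey(Canonicalize(key));
    }

    public void Add(string key, TValue value)
    {
        string canonical = Canonicalize(key);
        if (items.ContainsKey(canonical))
            throw new ArgumentException("Argument already exists in collection.", "key");
        items.Add(canonical, value);
    }
}
```

With both methods agreeing on what the key is, the ContainsKey() check in the calling code actually guards the Add() call, and a unit test that adds http://Example.com and then asks ContainsKey() about http://example.com/ would catch any regression.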

My apologies to any of our users who have been hit by this problem. It'll be fixed in the final release of Jubilee.


 

Categories: Programming | RSS Bandit

From the blog post entitled The i'm Initiative and new secret emoticon on the Windows Live Messenger team's blog we learn

Not everyone has the financial ability to give money to the causes they care about. That is where the i'm Initiative steps in - it enables Windows Live Messenger users to make a difference by directing a portion of Messenger's advertising revenue to a cause of their choosing.
...
Wonderful! How does it work?

  1. Use Messenger 8.1
  2. Add the i'm emoticon to your display name by entering the code of the cause you would like to support 
  3. Send and receive IMs
  4. A portion of the advertising revenue generated by your usage of Messenger will be donated to your cause. So the more IMs you send and receive the more money will be donated to your cause.
How does Messenger even generate revenue\money anyway?

Windows Live Messenger is a free service to users. We do include advertisements in the client that help pay for the service and our salaries. With the i'm Initiative you get to decide where a portion of the revenue goes.

The codes for creating the emoticon are listed in the blog post. I'm using *9mil in my IM handle. This trend of tying charitable donations to the usage of Windows Live services is interesting. It's kinda cool for our users to feel like they are contributing to the betterment of the world simply by using our software the same way they have every day. Good stuff.


 

Categories: Windows Live

While I was house hunting a couple of weeks ago, I saw a house for sale that had a sign announcing that there was an "Open House" that weekend. I had no idea what an "Open House" was so I asked a real estate agent about it. I learned that during an "Open House", a real estate agent sits in an empty house that is for sale and literally has the door open so that people interested in the house can look around and ask questions about it. The agent pointed out that with the existence of the Internet, this practice has now become outdated because people can get answers to most of their questions, including pictures of the interior of houses for sale, on real estate listing sites.

This got me to thinking about the Old Way vs. Net Way column that used to run in the Yahoo! Internet Life magazine back in the day. The column used to compare the "old" way of performing a task such as buying a birthday gift from a store with the "net" way of performing the same task on the Web.

We're now at the point in the Web's existence where some of the "old" ways to do things are clearly obsolete, in the same way that the horse & buggy is obsolete thanks to the automobile. After looking at my own habits, I thought it would be interesting to put together a list of the top five industries that have been hurt the most by the Web. From my perspective they are:

  1. Map Makers: Do you remember having to buy a map of your city so you could find your way to the address of a friend or coworker when you'd never visited the neighborhood? That sucked, didn't it? When was the last time you did that versus using MapQuest or one of the other major mapping sites?

  2. Travel Agents: There used to be a time when if you wanted to get a good deal on a plane ticket, hotel stay or vacation package you had to call or visit some middle man who would then talk to the hotels and airlines for you. Thanks to sites like Expedia the end result may be the same but the process is a lot less cumbersome.

  3. Yellow Pages: When I can find businesses near me via sites like http://maps.live.com and then go to sites like Judy's Book or City Search to get reviews, the giant yellow page books that keep getting left at my apartment every year are nothing but giant doorstops.

  4. CD Stores: It's no accident that Tower Records is going out of business. Between Amazon and the iTunes Music Store you can get a wider selection of music, customer reviews and instant gratification. Retail stores can't touch that.

  5. Libraries: When I was a freshman in college I went to the library a lot. By the time I got to my senior year most of my research was done on the Web. Libraries may not be dead but their usefulness has significantly declined with the advent of the Web.

I feel like I missed something obvious with this list but it escapes me at the moment. I wonder how many more industries will be killed by the Internet when all is said and done. I suspect real estate agents and movie theaters will also go the way of the dodo within the next decade.

PS: I suspect I'm not the only one who finds the following excerpt from the article The old way vs. the net way hilarious

In its July issue, it compared two ways of keeping the dog well-fed. The Old Way involved checking with the local feed store and a Petco superstore to price out a 40-lb. bag of Nutra Adult Maintenance dog food. The effort involved four minutes of calling and a half-hour of shopping.

The Net Way involved electronically searching for pet supplies. The reporter found lots of sites for toys and dog beds, but no dog food. An electronic search specifically for dog food found a "cool Dog Food Comparison Chart" but no online purveyor of dog chow. Not even Petco's Web site offered a way to order and purchase online. The reporter surfed for 30 minutes, without any luck. Thus, the magazine declared the "old way" the winner and suggested that selling dog food online is a business waiting to be exploited.

Yeah, somebody needs to jump on that opportunity. :)


 

Categories: Technology

Today on the Facebook blog I spotted a post entitled FQL which contains the following excerpt

Two and a half months ago, a few of us were hanging out in the Facebook TV room, laying on the Fatboys and geeking out about how to move forward with the API for the Facebook Platform. We had a beta version that was fully functional, but we kept wishing that the interface were cleaner, more concise, and more consistent. Suddenly it occurred to me – this problem had been solved over 30 years earlier by database developers who came up with SQL – the Structured Query Language. What if we could use the same time-tested interface as the way for developers to access Facebook's data?
...
This isn't a simple problem – with millions of users and billions of friend connections, photos, tags, etc., Facebook's data doesn't exactly fit into your average database. And, even if it did, we still have to carefully apply all of those complicated privacy rules. Facebook Query Language would have to take those SQL-style queries from developers, figure out what data they're actually looking for, figure out if they're allowed to actually see the data, figure out where the data is stored, and then finally go and get the data to return back to the developer. I knew building FQL would be hard, but that's why I couldn't wait to do it.

This is one of those things I used to think was a great idea when I was on the XML team at Microsoft. Instead of exposing your data using APIs, why not expose your data as XML then allow people to perform XQuery operations over the data? In reality, this often isn't feasible because you don't want people performing arbitrary queries over your data store that may request too much data (SELECT * FROM blog_posts) or are computationally expensive.

The FQL developers guide states that typical queries look like

SELECT name, pic FROM user WHERE uid=211031 OR uid=4801660

SELECT name, affiliations FROM user
WHERE uid IN (SELECT uid2 FROM friend WHERE uid1=211031)
AND "Facebook" IN affiliations.name AND uid < 10

SELECT src, caption, 1+2*3/4, caption, 10*(20 + 1) FROM photo
WHERE pid IN (SELECT pid FROM photo_tag WHERE subject=211031) AND
pid IN (SELECT pid FROM photo_tag WHERE subject=204686) AND
caption

and return results as XML. I assume that what is supported is a simple subset of SQL, perhaps parsed with something built on Lex & Yacc or ANTLR, but it still seems somewhat problematic to move away from the constrained interface of an API and provide access via a query language. It is definitely a lot cooler and more consistent to work with a query language than an API though. Later on when I have some free time, I'll see if I can deduce the grammar for FQL by trying out queries in the Facebook API test console. It looks like there goes one of my evenings this week.

Nice work.


 

February 27, 2007
@ 06:15 PM

With the hubbub now settling down I decided to go back and try out Yahoo! Pipes. For a while, I've wanted a feed for articles by Chris Kelly over on Huffington Post so I decided to build that. After a couple of false starts I created the feed, which currently doesn't have any items because there aren't any posts by Chris Kelly in the Huffington Post feed.

Now that I've actually used the service I'm pretty surprised that anyone thinks that this is a service that non-geeks will use. Programming with flowcharts to process RSS feeds seems even geekier than having a Star Trek wedding which was my previous bar for geekiest thing ever.


 

From the Microsoft press release Microsoft Demonstrates Further Commitment to Healthcare Market With Planned Acquisition of Web Search Company we learn

NEW ORLEANS — Feb. 26, 2007 — Microsoft Corp. today announced that it has agreed to acquire Medstory Inc., a privately held company based in Foster City, Calif., that develops intelligent Web search technology specifically for health information. The acquisition represents a strategic move for Microsoft in the consumer health search arena and signals a long-term commitment toward the development of a broader consumer health strategy. Medstory employees will join the Health Solutions Group, a recently formed division at Microsoft that will manage product development and delivery. Financial terms were not disclosed, as part of the agreement between the organizations.

This reminds me of the post Thoughts on health care, continued from Google's Adam Bosworth which stated

As I indicated in my post last week, I've been interested in the issue of health care and health information for a while. I just spoke at a conference about some of the challenges in the health care system that we at Google want to tackle. The conference, called Connecting Americans to Their Health Care, is a gathering focused on how consumers are transforming health care through the use of personal health technologies.

This speech will give you some insight into the problems that we believe need our attention.

It is also interesting that Adam Bosworth had been billed with the title Architect, Google Health for a while. I'd once heard that the market for medical related keywords is one of the most lucrative for search engines which may explain the interest. However if you look at the list of most expensive AdWords it would seem that building a vertical search engine targeted at debt consolidation is the real goldmine. :)


 

I'm almost caught up on my blog reading since getting back from vacation and I've spotted a couple of items I'd have blogged responses to if I had been around. Since I don't have the time to write full blog posts on each of these items, here are links to the posts and brief outlines of what I thought about them

  • Harish Mallipeddi has a blog post entitled Measuring efficiency of tagging with Entropy which links to the paper Understanding Navigability of Social Tagging Systems by Ed Chi and Todd Mytkowicz of Xerox PARC and excerpts the key findings from the paper. One result of their research, which seems obvious in hindsight and shows one of the issues that social software has to deal with as its community of users grows, was

    The way he does that is to measure entropy (yup that same old same old Claude Shannon’s information theory which you learned in one of the CS courses) of entities like documents (D), users (U) and tags (T). His research group crawled the entire del.icio.us archive and then calculated the entropies. Here’s what they found:

    • H(D|T) specifies the social navigation efficiency. How efficient is it for us to specify a set of tags to find a set of specific documents? We found that in del.icio.us that it is getting less and less efficient.

    This makes sense when you think about it. Let's say the first set of users of del.icio.us came from a homogeneous software development background and started applying the tag "xml" to mean items about the eXtensible Markup Language. Later on as the community grew, a number of gamers joined the site and they now use the tag "xml" to refer to items about the game X-Men Legends. Now if you are one of the original geek users of the site, the URL http://del.icio.us/tag/xml is no longer just about markup languages but also about video games. To actually find items strictly about the eXtensible Markup Language you may have to add other tags as refinements such as http://del.icio.us/tag/xml+programming.

    What this means is that to the oldest users of the site, the quality of the tagging system will seem to degrade over time even though this is a natural consequence of the site growing and diversifying its user base. Of course, this is only a problem if a lot of people use del.icio.us to find all items about a topic (i.e. browsing by tags) as opposed to just storing their individual bookmarks or subscribing to the bookmarks of people they know and trust.

  • It seems Google announced some sort of Microsoft Office killer last week. You can read Don Dodge's Why Microsoft will not fall into the Innovators Dilemma and Robert Scoble's Microsoft has no innovator’s dillema? for two conflicting opinions on how this affects Microsoft. Personally, I think I've overdosed on the number of times I've read the words innovator's dilemma in association with this announcement while catching up on email and blogs. What is funny about this situation is that almost everyone I've seen who throws the term around doesn't seem to have read the book. It is quite interesting to see Don Dodge write sentences like

    Microsoft will do everything possible to preserve these businesses while transitioning to the new Live strategy.
    and then follow that up with "No Innovators Dilemma here" without seeing the obvious contradiction in his words. Lots of doublethink at work it seems.

    A side effect of reading this set of blog posts is that I found Don Dodge's Innovate or Imitate...Fame or Fortune? which praises being a fast follower as being more valuable than being an innovator. I've found that a lot of people at Microsoft point to past and recent successes such as Xbox, Microsoft Office and Internet Explorer as proof that being a "fast follower" is the best strategy for Microsoft. There are three key problems with this kind of thinking:

    1. It assumes your competitors are incompetent. This may have worked in the old days but with competitors like Google and Apple Inc, it isn't the case anymore.
    2. It requires that you have an ace up your sleeve that significantly one-ups the competitors when you ship your knockoff (e.g. integrating disparate applications into an Office Suite and pricing it lower than competitors, integrating the product into the operating system, integrating a rich and social online experience into what was previously a solitary experience etc).
    3. It ignores the fact that "first mover advantage" is actually true for applications that have network effects which is definitely the case for social software which a lot of software has become today.

  • The recurring "diversity in conferences" debate was kicked off again by a blog post by Jason Kottke entitled Gender Diversity at Web Conferences which prompted interesting responses from folks like Eric Meyer, Anil Dash and Shelley Powers. They are all good posts with stuff I agree and disagree with in them but I wasn't moved to write until I read the post Why are smart people still stuck on gender and skin-color blinders? by Tantek Çelik where he wrote

    Why is it that gender (and less often race, nay, skin-color, see below) are the only physical characteristics that lots of otherwise smart people appear to chime in support for diversity of?

    E.g. as long as we are trying for greater diversity in superficial physical characteristics (superficial because what do such characteristics have to do with the stated directly relevant criteria of "technical expertise, speaking skills, professional stature, brand appropriateness, and marketability" - though perhaps I can see a tenuous link with "rainbow" marketing), why not ask about other such characteristics?

    Where are all the green-eyed folks?

    Where are all the folks with facial tattoos?

    Where are all the redheads?

    Where are the speakers with non-ear facial piercings?

    Surely such speakers would help with "hipness" marketing.

    I found this post to be disingenuous and wondered how anybody could downplay the gender and racial bias in the "Web 2.0" technology conference scene by equating it to a preference for green-eyed speakers. So I decided to throw in my $0.02 on this topic...again.

    After the last ETech, I realized I was seeing the same faces and hearing the same things over and over again. More importantly, I noticed that the demographics of the speaker lists for these conferences don't match the software industry as a whole let alone the users who we are supposed to be building the software for.

    There were lots of little bits of ignorance by the speakers and audience which added up in a way that rubbed me wrong. For example, at the 2005 Web 2.0 conference a lot of people were ignorant of Skype except as 'that startup that got a bunch of money from eBay'. Given that there are a significant number of foreigners in the U.S. software industry who use Skype to keep in touch with folks back home, it was surprising to see so much ignorance about it at a supposedly leading edge technology conference. The same thing goes for how surprised people were by how teenagers used the Web and computers. Additionally, there are just as many women as men using social software such as photo sharing, instant messaging and social networking, yet you rarely see their perspectives presented at any of these conferences.

    When I think of diversity, I expect diversity of perspectives. People's perspectives are often shaped by their background and experiences. When you have a conference about an industry which is filled with people of diverse backgrounds building software for people of diverse backgrounds, it is a disservice to have the conversation and perspectives be homogeneous. The software industry isn't just young white males in their mid-20s to mid-30s nor is that the primary demographic of Web users.

    Personally, I've gotten tired of attending conferences where we hear more about technologies and sites that the homogeneous demographic of young to middle-aged, white, male computer geeks find interesting (e.g. del.icio.us and tagging) and less about what Web users actually use regularly or find interesting (hint: it isn't del.icio.us and it sure as fuck isn't tagging).