Last month Clemens Vasters wrote a blog post entitled Autonomy isn't Autonomy - and a few words about Caching where he talks about "autonomous" services and data caching. He wrote

A question that is raised quite often in the context of "SOA" is that of how to deal with data.  Specifically, people are increasingly interested in (and concerned about) appropriate caching strategies
...
By autonomous computing principles the left shape of the service is "correct". The service is fully autonomous and protects its state. That’s a model that’s strictly following the Fiefdoms/Emissaries idea that Pat Helland formulated a few years back. Very many applications look like the shape on the right. There are a number of services sticking up that share a common backend store. That’s not following autonomous computing principles. However, if you look across the top, you'll see that the endpoints (different colors, different contracts) look precisely alike from the outside for both pillars. That’s the split: Autonomous computing talks very much about how things are supposed to look behind your service boundary (which is not and should not be anyone’s business but yours) and service orientation really talks about you being able to hide any kind of such architectural decision behind a loosely coupled network edge. The two ideas compose well, but they are not the same, at all.

..
However, I digress. Coming back to the data management issue, it’s clear that a stringent autonomous computing design introduces quite a few challenges in terms of data management. Data consolidation across separate stores for the purposes of reporting requires quite a bit of special consideration and so does caching of data. When the data for a system is dispersed across a variety of stores and comes together only through service channels without the ability to freely query across the data stores and those services are potentially “far” away in terms of bandwidth and latency, data management becomes considerably more difficult than in a monolithic app with a single store. However, this added complexity is a function of choosing to make the service architecture follow autonomous computing principles, not one of how to shape the service edge and whether you use service orientation principles to implement it.
...
Generally, my advice with respect to data management in distributed systems is to handle all data explicitly as part of the application code and not hide data management in some obscure interception layer. There are a lot of approaches that attempt to hide complex caching scenarios away from application programmers by introducing caching magic on the call/message path. That is a reasonable thing to do, if the goal is to optimize message traffic and the granularity that that gives you is acceptable. I had a scenario where that was just the right fit in one of my last newtelligence projects. Be that as it may, proper data management, caching included, is somewhat like the holy grail of distributed computing and unless people know what they’re doing, it’s dangerous to try to hide it away.

That said, I believe that it is worth a thought to make caching a first-class consideration in any distributed system where data flows across boundaries. If it’s known at the data source that a particular record or set of records won’t be updated until 1200h tomorrow (many banks, for instance, still do accounting batch runs just once or twice daily) then it is helpful to flow that information alongside the data to allow any receiver to determine the caching strategy for the particular data item(s).

Service autonomy is one topic where I still have difficulty in striking the right balance. In an ideal SOA world, you have a mesh of interconnected services which depend on each other to perform their set tasks. The problem with this SOA ideal is that it introduces dependencies. If you are building an online service, dependencies mean that sometimes you'll be woken up by your pager at 3 AM and it's somebody else's fault, not yours. This may encourage people who build services to shun dependencies and build self-contained web applications which reinvent the wheel instead of utilizing external services. I'm still trying to decide if this is a bad thing or not.

As for Clemens' comments on caching and services, I find it interesting how even WS-* gurus inadvertently end up articulating the virtues of HTTP's design and the REST architectural style when talking about best practices for building services. I wonder if we will one day see WS-* equivalents of ETags and If-Modified-Since. WS-Caching anyone? :)
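That machinery already exists in HTTP, which is part of the joke: a server attaches validators (ETag, Last-Modified) and expiry hints (Expires or Cache-Control: max-age, which is the natural home for "this data won't change until 1200h tomorrow"), and any client or intermediary can act on them. Here's a minimal sketch of a conditional GET in Python; the URL is a placeholder, not a real service:

    import urllib.request, urllib.error

    url = "http://example.com/account-balances.xml"  # placeholder URL

    # First fetch: the server returns the data plus caching metadata.
    response = urllib.request.urlopen(url)
    cached_body = response.read()
    etag = response.headers.get("ETag")
    last_modified = response.headers.get("Last-Modified")

    # Later fetch: hand the validators back. If nothing has changed,
    # the server answers 304 Not Modified with no body and we keep
    # using the cached copy instead of re-downloading it.
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        response = urllib.request.urlopen(request)
        cached_body = response.read()  # data changed; refresh the cache
    except urllib.error.HTTPError as error:
        if error.code != 304:
            raise  # a 304 just means the cached copy is still good

The nice part is that none of this needs an "obscure interception layer"; the caching decision is visible to the application, which is exactly what Clemens argues for.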


 

Categories: XML Web Services

I was chatting with Kurt Weber yesterday and asked when Windows Live Expo would be getting out of beta. He asked me to check out the team blog later in the day and when I did I saw his blog post entitled Official U.S. Launch of Windows Live Expo. It turns out that yesterday was launch day and below is an excerpt of his blog post describing some of the new features for the launch

 Some of the new features for our latest release include:
  • New Look - A brand new look & feel for the site which includes the official Windows Live look and integration, accessibility, scaling, and ease of use.
  • Comments on a listing – Similar to comments on a blog; this feature will allow users to discuss issues in the soapbox area or ask the seller for more details about an item.
  • APIs – Developers can now access all of our listings using a variety of parameters in order to create cool mash-ups (such as http://www.blockrocker.com). Full details about the API are available at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnlive/html/winliveexpo.asp
  • Driving directions – Users can now easily get driving directions to whatever listing they are viewing (courtesy of our friends at Live Local) by simply clicking a button.

For those keeping score, Expo is the fifth Windows Live service to come out of beta.

Update: Thanks to Szajd for reminding me that there have been five Windows Live services to come out of beta: Windows Live OneCare, Windows Live Favorites, Windows Live Messenger, Windows Live Custom Domains and Windows Live Expo.
 

Categories: Windows Live

Now that a bunch of Windows Live services are coming out of beta (e.g. Windows Live Messenger, Windows Live Favorites) and a couple more MSN properties are about to make the switch (e.g. MSN Spaces to Windows Live Spaces), there has been a bit more marketing effort around Windows Live. The marketing teams have created a number of websites that explain the value proposition of Windows Live and take you behind the scenes. Check them out:

  1. discoverspaces.live.com: This website gives a preview of Windows Live Spaces including some new features such as the Friends list.

  2. inside.live.com: Interviews with members of Windows Live product teams like Leah Pearlman (Windows Live Messenger) and Reeves Little (Windows Live Mail).

  3. wire.live.com: An aggregation of news stories, blog posts and message board postings about Windows Live. Think of it as Microsoft Presspass on crack.

  4. experience.live.com: This site aggregates the above sites and has placeholders for a couple of other upcoming promotional sites about Windows Live.
This is pretty hot; for once I have to say our marketing guys are kicking ass.
 

Categories: Windows Live

Tim O'Reilly has a blog post entitled Operations: The New Secret Sauce where he summarizes an interview he had with Debra Chrapaty, the VP of Operations for Windows Live. He writes

People talk about "cloud storage" but Debra points out that that means servers somewhere, hundreds of thousands of them, with good access to power, cooling, and bandwidth. She describes how her "strategic locations group" has a "heatmap" rating locations by their access to all these key limiting factors, and how they are locking up key locations and favorable power and bandwidth deals. And as in other areas of real estate, getting the good locations first can matter a lot. She points out, for example, that her cost of power at her Quincy, WA data center, soon to go online, is 1.9 cents per kWh, versus about 8 cents in CA. And she says, "I've learned that when you multiply a small number by a big number, the small number turns into a big number." Once Web 2.0 becomes the norm, the current demands are only a small foretaste of what's to come. For that matter, even server procurement is "not pretty" and there will be economies of scale that accrue to the big players. Her belief is that there's going to be a tipping point in Web 2.0 where the operational environment will be a key differentiator.
...
Internet-scale applications are really the ones that push the envelope with regard not only to performance but also to deployment and management tools. And the Windows Live team works closely with the Windows Server group to take their bleeding edge learning back into the enterprise products. By contrast, one might ask, where is the similar feedback loop from sites like Google and Yahoo! back into Linux or FreeBSD?
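Debra's "small number times a big number" line is easy to make concrete with some back-of-the-envelope arithmetic (the server count and wattage below are my own illustrative assumptions, not figures from the interview). Take a facility with 50,000 servers drawing 250 W each: that's 12.5 MW of continuous load, or about 109.5 million kWh per year. At the Quincy rate of 1.9 cents per kWh the annual power bill is roughly $2.1 million; at the California rate of 8 cents it's roughly $8.8 million. A difference of six cents per kWh turns into a difference of about $6.7 million a year, per facility.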

This is one of those topics I've been wanting to blog about for a while. I think somewhere along the line at MSN (now Windows Live) we realized there was more bang for the buck in optimizing operational characteristics such as power consumption per server, number of servers per data center, and cost per server than in whatever improvements we could make in code or via database optimizations. Additionally, it's been quite eye-opening how much stuff we had to roll on our own that isn't just a standard part of a "platform". I remember talking to a coworker about all the changes we were making so that MSN Spaces could be deployed in multiple data centers and he asked why we didn't get this for free from "the platform". I jokingly responded "It isn't like the .NET Framework has a RouteThisUserToTheRightDataCenterBasedOnTheirGeographicalLocation() API, does it?"
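Obviously no framework ships that API, but it's worth sketching what even the trivial core of such a thing looks like, if only to show how little of the real problem the easy part covers. A toy sketch in Python; the region names and data center labels are hypothetical, and this is emphatically not how Spaces does it:

    # Toy geo-routing table. A real system would derive the region from
    # a GeoIP lookup on the request, keep this mapping in configuration
    # rather than code, and solve the genuinely hard problems (data
    # replication, failover, keeping a user pinned to the data center
    # that already holds their data) entirely outside this function.
    DATA_CENTERS = {
        "north-america": "dc-quincy",     # hypothetical labels
        "europe": "dc-dublin",
        "asia": "dc-singapore",
    }
    DEFAULT_DATA_CENTER = "dc-quincy"

    def route_user_to_data_center(user_region: str) -> str:
        """Pick the data center that should serve this user's request."""
        return DATA_CENTERS.get(user_region, DEFAULT_DATA_CENTER)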

I now also give mad props to some of our competitors for what used to seem like quirkiness but is now clearly a great deal of operational savvy. There is a reason why Google builds their own servers. When I read things like "One-third of the electricity running through a typical power supply leaks out as heat", I get quite upset and now see it as totally reasonable to build your own power supplies to get around such waste. Unfortunately, there doesn't seem to be a lot of knowledge out there about building and managing a large-scale, globally distributed server infrastructure. However, as Debra states, we are feeding a lot of our learnings back to the folks building enterprise products at Microsoft (e.g. our team now collaborates a lot with the Windows Communication Foundation team), which is great for developers building on Microsoft platforms.


 

About two weeks ago, Greg Reinacker wrote about NewsGator's past, present and future in two blog posts entitled NewsGator platform roadmap - Part I (a look back) and NewsGator platform roadmap - Part II (a look forward). The blog posts are a good look at the achievements of a company that has grown from a one-man shop building an RSS reading plugin for Outlook into the dominant syndication platform company, with products on almost every platform from Windows & Mac to the Web & mobile phones. If you are interested in XML syndication, then Greg's posts are bookmark-worthy since they describe the future plans of a company that probably employs the best minds building RSS/Atom applications today. Below are some excerpts from his posts in my areas of interest.

NewsGator Online

As I said 16 months ago, the proposed feature list is long and distinguished - and it still is.  There is so much to do here...some of the short-term planned additions range from more interactive feed discovery mechanisms (based on the larger community of users and their subscriptions), to completely different user interface paradigms (where a user could potentially select from different options, each catering to a different kind of user).

A larger initiative is around the whole paradigm. Techies aside, users don't want to think about feeds, and subscriptions, and searching for content...Given all that, we're really rethinking the way we present information to the user, and the way users discover new information.  We're designing ways for people to participate in a larger community if they wish, and get more value out of the content they consume, at the point they discover it.  While we all have our own set of feeds, and we all participate to some extent in the larger ecosystem, there is a lot of potential in linking people with similar interests to each other.  Some users will continue to use our system as they always have - and others will use it in completely different ways.  We're testing a couple of approaches on this right now - I think it's truly a game-changer.

NewsGator Inbox, FeedDemon, NetNewsWire

As I mentioned before, the enthusiasm around these products has continued to grow - people obviously see the value in a rich, synchronized, offline-capable user experience for consuming content.  Moving forward, online integration will get tighter, and more complete - ranging from the low hanging fruit like FeedDemon "News Bins" becoming Clippings (and thus synchronize with the entire platform), to more involved features like analytics-related features (recommendations, interest-based surfacing, etc.) and community-related features.
...
NewsGator core platform

This is the heart of our entire product line (with the exception of NewsGator Enterprise Server).  Moving forward, we're investing a lot in the platform.  We're building out more support for deep analytics (which we can use to deliver different kinds of user experience), and building out a much deeper metadata engine (which means if a client retrieves content from our system, they'll get much richer data than they otherwise would).  We'll have other ways to "slice" our data to get what you need, without having to subscribe to hundreds of feeds.

The API has been very successful, and we process millions of API calls per day from client applications, web services, and private label clients.  This traffic actually makes up a large percentage of our overall system traffic - which I think is a testament to the popularity and utility of the API.  Moving forward here, we're obviously very committed to the API story, and we'll continue to enhance it as we add platform capabilities.

There's lots of good stuff here. The first thing that pops out at me is that while a bunch of startups these days tend to proclaim the death of desktop software, NewsGator is actually seeing the best of both worlds and improving the quality of the desktop experience by harnessing a Web-based platform. It's not Web-based software replacing desktop software; it's desktop software becoming better by working in tandem with APIs and applications on the Web. When Ray Ozzie talks about "live software", NewsGator is the company that leaps most readily to my mind.

I like the idea of making discovery of new content more of a social experience. It'd be interesting to see what would happen if NewsGator Online had a del.icio.us-inspired interface for browsing and subscribing to people's feeds. I notice that Gordon Weakliem who works on the NewsGator API recently wrote a post entitled Needles in Haystacks where he talks about serendipitous discovery of new websites by browsing bookmarks of people with similar interests to him in del.icio.us. I'm sure it's just a matter of time before NewsGator adds these features to their platform.

I also like the idea of exposing richer metadata in the NewsGator API, especially if it relates to the social features that they plan to unveil in the next couple of months. Unfortunately, I've never been able to get the NewsGator API to work quite right with RSS Bandit, but I'll be revisiting that code later in the summer.


 

Since my girlfriend has kids, I spend a lot more time around kids than I expected to at this age. One of the things I've realized is that I'll probably end up as one of those dads that shows strangers his baby pictures. Since I don't have baby pictures to show y'all, you get the next best thing

  1. Scene: On Our Way To Dinner

    Kids: What Does Your Shirt Say?

    Me: I Only Date Crack Whores [see the T-shirt here]

    Kids: Mommy Isn't A Crack Whore.

    Me: I'll Go Change My Shirt

    This explains why my girlfriend made me throw out my "I don't have a girlfriend. But I do know a woman who'd be mad at me for saying that" T-shirt. I'm guessing she forgot about this one.

  2. Scene: Playing Video Games with one of Their Friends

    Me: I'm too old to play games with you guys

    Kids: You're not old, you're only 28.

    Kids Friend: You're 28? My mom is 28 and she likes black guys. You should marry my mom.

    Me to girlfriend: Should I tell her mom she said that?

    My Girlfriend: No. Dummy!


 

Categories: Personal

Greg Linden has a blog post entitled Yahoo building a Google FS clone? where he writes

The Hadoop open source project is building a clone of the powerful Google cluster tools Google File System and MapReduce.

I was curious to see how much Yahoo appears to be involved in Hadoop. Doug Cutting, the primary developer of Lucene, Nutch, and Hadoop, is now working for Yahoo but, at the time, that hiring was described as supporting an independent open source project.

Digging further, it seems Yahoo's role is more complicated. Browsing through the Hadoop developers mailing list, I can see that more than a dozen people from Yahoo appear to be involved in Hadoop. In some cases, the involvement is deep. One of the Yahoo developers, Konstantin Shvachko, produced a detailed requirement document for Hadoop. The document appears to lay out what Yahoo needs from Hadoop, including such tidbits as handling 10k+ nodes, 100k simultaneous clients, and 10 petabytes in a cluster.

Also noteworthy is Eric Baldeschwieler, a director of software development at Yahoo, who recently talked about direct support from Yahoo for Hadoop. Eric said, "How we are going to establish a testing / validation regime that will support innovation ... We'll be happy to help staff / fund such a testing policy."

I find this effort by Yahoo! to be rather interesting given that platform pieces like GFS, BigTable, MapReduce and Sawzall give Google quite the edge in building mega-scale services and, in Greg Linden's words, are 'major force multipliers' that enable them to pump out new online services at a rapid pace. I'd expect Google's competitors to build similar systems and keep them close to their chests, not give them away. I suspect that the reason Yahoo! is going this route is that they don't have enough folks to build this in-house and have thus collaborated with the Hadoop project to get some help. This could potentially backfire since there is nothing stopping small or large competitors from reusing their efforts, especially if the project uses a traditional Open Source license.
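For anyone who hasn't read the Google papers: the appeal of MapReduce is that a developer writes two small functions and the framework takes care of partitioning, scheduling and fault tolerance across thousands of machines. Here's a single-process sketch of the programming model in Python (word count, the canonical example); the real systems run these same two functions distributed over a GFS- or HDFS-style file system:

    from collections import defaultdict

    def map_fn(document):
        # Emit an intermediate (key, value) pair for each word.
        for word in document.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        # Combine all intermediate values for a single key.
        return (word, sum(counts))

    def run_mapreduce(documents):
        # The "shuffle" phase: group intermediate values by key. In
        # Google's MapReduce or Hadoop this grouping happens across
        # the cluster, not in one in-memory dictionary.
        groups = defaultdict(list)
        for doc in documents:
            for key, value in map_fn(doc):
                groups[key].append(value)
        return [reduce_fn(key, values) for key, values in groups.items()]

    print(run_mapreduce(["the quick fox", "the lazy dog"]))
    # [('the', 2), ('quick', 1), ('fox', 1), ('lazy', 1), ('dog', 1)]

The force-multiplier effect comes from everything this sketch leaves out: the framework, not the developer, worries about machine failures, stragglers and data locality.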

On a related note, Greg also posted a link to an article by David F. Carr entitled How Google Works, which has the following interesting quote

Google has a split personality when it comes to questions about its back-end systems. To the media, its answer is, "Sorry, we don't talk about our infrastructure."

Yet, Google engineers crack the door open wider when addressing computer science audiences, such as rooms full of graduate students whom it is interested in recruiting.

As a result, sources for this story included technical presentations available from the University of Washington Web site, as well as other technical conference presentations, and papers published by Google's research arm, Google Labs.

I do think it is cool that Google developers publish so much about the stuff they are working on. One of the things I miss from being on the XML team at Microsoft is being around people with a culture of publishing research, like Erik Meijer and Michael Rys. I even got a research paper on XML query languages published while on the team. I'd definitely like to publish research-quality papers on some of the stuff I'm working on now. I've done MSDN articles and a ThinkWeek paper in the past few years; it's probably about time I start thinking about writing a research paper again.

PS: If you work on online services and you don't read Greg Linden's blog, you are missing out. Subscribed. 


 

Over the weekend, I had a few hours to spend and finally added comment watching to RSS Bandit. The feature is pretty straightforward: users can mark an item as 'Watched', and once an item is in this state an indication is made when there are new comments on it. Determining whether there are new comments uses a number of mechanisms, including polling the comment feed and checking the values of RSS/Atom extensions such as slash:comments and thr:count. I'm already getting a lot of use out of the feature to passively notify me of new comments to my blog.
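For the curious, those extension elements carry a per-item comment count, so most of the detection boils down to comparing the advertised count against the count recorded when the item was marked as watched. A rough sketch of that check in Python (RSS Bandit itself is C#, and the threading extension's element name varied across drafts, so this checks both spellings):

    import xml.etree.ElementTree as ET

    # Namespaces for the comment-count extensions. The slash URI is the
    # standard one; the thr URI is the Atom threading extension's.
    NS = {
        "slash": "http://purl.org/rss/1.0/modules/slash/",
        "thr": "http://purl.org/syndication/thread/1.0",
    }

    def advertised_comment_count(item):
        """Return the comment count an RSS/Atom item advertises, or None."""
        for path in ("slash:comments", "thr:total", "thr:count"):
            elem = item.find(path, NS)
            if elem is not None and elem.text and elem.text.isdigit():
                return int(elem.text)
        return None

    def has_new_comments(item, count_when_watched):
        count = advertised_comment_count(item)
        return count is not None and count > count_when_watched

When a feed doesn't advertise a count at all, the fallback is the other mechanism mentioned above: poll the item's comment feed and diff the entries.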

The only issue now is that there is a disagreement between Torsten and me as to what the menu interaction should be for the feature. I've currently implemented the menu option as a submenu where you can select 'Watch Comments->On' or 'Watch Comments->Off' depending on whether comments are currently being watched for that item or not. See the screenshot below.

Torsten would prefer a menu option more like Outlook Express's, where the menu option is a checkbox, as shown in the OE screenshot below.

If you're an RSS Bandit user, can you chime in with your opinion?


 

Categories: RSS Bandit

The Office team continues to impress me with how savvy they are about the changing software landscape. In his blog post entitled Open XML Translator project announced (ODF support for Office), Brian Jones writes

Today we are announcing the creation of the Open XML Translator project that will help translate between the Office Open XML formats and the OpenDocument format. We've talked a lot about the value the Open XML formats bring, and one of them of course is the ability to filter it down into other formats. While we still aren't seeing a strong demand for ODF support from our corporate or consumer customers, it's now a bit different with governments. We've had some governments request that we help build solutions so they can use ODF for certain situations, so that's why we are creating the Open XML Translator project. I think it's going to be really beneficial to a number of folks and for a number of reasons.

There has been a push in Microsoft for better interoperability and this is another great step in that direction. We already have the PDF and XPS support for Office 2007 users that unfortunately had to be separated out of the product and instead offered as a free download. There will be a menu item in the Office applications that will point people to the downloads for XPS, PDF, and now ODF. So you'll have the ability to save to and open ODF files directly within Office (just like any other format).

For me, one of the really cool parts of this project is that it will be open source and located up on SourceForge, which means everyone will have the ability to see how to leverage the open architectures of both the Office Open XML formats and ODF. We're developing the tools with the help of Clever Age (based in France) and a few other folks like Aztecsoft (based in India) and Dialogika (based in Germany). There should actually be a prototype of the first translator (for Word 2007) posted up on SourceForge later on today (http://sourceforge.net/projects/odf-converter). It's going to be made available under the BSD license, and anyone can provide feedback, submit bugs, and of course directly contribute to the project. The Word tool should be available by the end of this year, with the Excel and PPT versions following in 2007.

This announcement is cool on so many levels. The coolest being that the projects will not only be Open Source but will be hosted on SourceForge. That is sweet. It is interesting to note that it is government customers and not businesses that are interested in ODF support in Office. I guess that makes sense if you consider which parties have been expressing interest in Open Office.

There are already some great analyst responses to this move, such as from Stephen O'Grady of Redmonk, who in his post Microsoft Office to Support ODF: The Q&A has some great insights. My favorite insight is excerpted below

Q: How about Microsoft's competitors?
A: Well, this is a bittersweet moment for them. For those like Corel that have eschewed ODF support, it's a matter of minor importance - at least until Microsoft is able to compete in public sector markets that mandate ODF and they are not.

But for those vendors that have touted ODF support as a differentiator, this is a good news/bad news deal. The good news is that they can and almost certainly will point to Microsoft's support as validation of further ODF traction and momentum; the bad news is that they will now be competing - at least in theory, remember the limitation - with an Office suite that is frankly the most capable on the market. I've said for years that packages like OpenOffice.org are more than good enough for the majority of users, and that's been validated by our own usage of the product over the past few years; but Microsoft's suite is better than good enough. I'm interested to see if there's any fallout from the UI overhaul, but for now Office remains the undisputed champ of the Office arena. This means that commercial packages like StarOffice and Workplace, not to mention open source projects such as Abiword, KOffice, and OpenOffice.org will have to compete more on features and innovation and less on their support for formats such as ODF or PDF.

It'll be good to see the debate migrate away from support for file formats and back to exactly which product's features provide the best value for customers. Everybody wins. Mad props to the Office team for making this decision. Rock on.


 

Categories: XML

Dave Winer has a blog post where he responds to a post entitled SOAP, REST and XML-RPC by Randy Charles Morin. He writes

I wonder if it'd be possible for me to disagree with Randy Morin without getting flamed. I never said XML-RPC is better than SOAP or REST, or more perfect or pure, or better documented. I don't care if the others have better websites, or more advocates posting on mail lists. The reason I advise would-be platform developers to support XML-RPC is because at least for some developers (including me) it's so much faster to implement, so we spend less time creating glue and get to building applications sooner. I've learned that the sooner developers get to the fun part, the more likely they are to deploy. And if that's the goal, why not support it? BTW, I never said they shouldn't support SOAP or REST, in fact I often provide multiple interfaces to my would-be platforms, because I've learned that if you want uptake for new ideas, you shouldn't argue over small things like this, you should say yes whenever you can.

I agree 100% with Dave Winer. If you are building a service on the Web, then you shouldn't discriminate against any platform, application or device. This means you can't pick just one approach or one technology for building your service, because different platforms have different levels of support for the various approaches. A developer using Visual Studio will find SOAP easier than REST or XML-RPC, while on the flip side a developer using Python or Perl is likely more at home dealing with XML-RPC than with SOAP. Choosing one technology over the others is choosing to discriminate against one platform or set of developers.
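Dave's "faster to implement" point is easy to illustrate: in most scripting languages an XML-RPC call is a couple of lines, with no WSDL, proxy generation or toolkit involved. A sketch in Python against a hypothetical blog endpoint (the URL, blog ID and credentials are placeholders; metaWeblog.getRecentPosts is the widely implemented MetaWeblog API method):

    import xmlrpc.client

    # Hypothetical endpoint; on the wire this is just an HTTP POST
    # carrying a small XML payload.
    server = xmlrpc.client.ServerProxy("http://example.com/blog/xmlrpc")
    posts = server.metaWeblog.getRecentPosts(
        "myblog", "username", "password", 5  # blogid, user, password, count
    )
    for post in posts:
        print(post["title"])

That low barrier to entry is exactly why supporting it alongside SOAP and REST costs a platform little and buys it a whole class of developers.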

In some cases this is necessary to keep maintenance costs down by supporting a small set of protocols, but in general if you are building a service on the Web, you want it to be inclusive, not exclusive. Arguments of technological superiority be damned.


 

Categories: XML Web Services