As a regular user of Twitter I've felt the waves of frustration wash over me these past couple of weeks as the service has been hit by one outage after another. This led me to start pondering the problem space [especially as it relates to what I'm currently working on at work] and to deduce that the service must have some serious architectural flaws, flaws which have nothing to do with the reason usually thrown about by non-technical pundits (i.e. that Ruby on Rails is to blame).

Some of my suspicions were confirmed by a recent post on the Twitter developer blog entitled Twittering about architecture, which contains the following excerpt:

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we've tried to make our system behave like a messaging system as much as possible, but that's introduced a great deal of complexity and unpredictability. When we're in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.

Our direction going forward is to replace our existing system, component-by-component, with parts that are designed from the ground up to meet the requirements that have emerged as Twitter has grown.

Given that Twitter has some unique requirements that would put to the test a number of off-the-shelf and even custom messaging applications, it is shocking that it isn't even architected as a messaging system. Then again, this makes sense when you consider that the founders' background was in blogging tools and that they originally intended Twitter to be a "micro" blogging service.

If Twitter were simply a micro-content publishing tool with push notifications for SMS and IM then the team couldn't be faulted for designing it as a Content Management System (CMS). In that case you'd just need three data structures (a rough sketch of these follows the list):

  • a persistent store of each user's tweets
  • a cache of their tweets in memory to improve read performance
  • a persistent list of [IM and SMS] end points subscribed to each user's tweets, plus an asynchronous job (i.e. a daemon) which publishes each new post to that user's subscribers
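
As a very rough illustration, here's a minimal in-memory sketch of that CMS-style design. Every class and method name below is invented for the example; in a real deployment the store would be a database, the cache something like memcached, and the outbox a proper job queue rather than Python objects.

    from collections import defaultdict, deque

    class MicroCms:
        """Toy model of the three structures above: store, cache, subscriber list plus daemon."""

        def __init__(self, cache_size=20):
            self.tweet_store = defaultdict(list)   # persistent store of each user's tweets
            self.tweet_cache = defaultdict(lambda: deque(maxlen=cache_size))  # hot read cache
            self.subscribers = defaultdict(set)    # user -> IM/SMS endpoints subscribed to them
            self.outbox = deque()                  # pending notifications for the async job

        def post(self, user, text):
            self.tweet_store[user].append(text)    # durable write
            self.tweet_cache[user].append(text)    # keeps reads off the persistent store
            self.outbox.append((user, text))       # hand off to the publisher daemon

        def run_publisher_once(self):
            """The 'daemon': drain the outbox and push each post to its subscribers."""
            while self.outbox:
                user, text = self.outbox.popleft()
                for endpoint in self.subscribers[user]:
                    endpoint.send(f"{user}: {text}")   # would be an SMS/IM gateway call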

Unfortunately, Twitter isn't just a blogging tool that allows people to subscribe to my posts via SMS & IM instead of just RSS. It also has the notion of followers. That's when things get hairy. Isreal over at AssetBar had a great post about this entitled Twitter-proxy: Any Interest? where he wrote:

Consider the messaging problem:

Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has about two choices of how they can dispatch that message:

  1. PUSH the message to the queues of each of his 6,864 (now 24,875) followers, or
  2. Wait for the 6,864 (now 24,875) followers to log in, then PULL the message.

The trouble with #2 is that people like Robert also follow 6,800 (now 21,146) people. And it’s unacceptable for him to login and then have to wait for the system to open records on 6,800 (now 21,146) people (across multiple db shards), then sort the records by date and finally render the data. Users would be hating on the HUGE latency.

So, the twitter model is almost certainly #1. Robert’s message is copied (or pre-fetched) to 6,864 users, so when those users open their page/client, Scoble’s message is right there, waiting for them. The users are loving the speed, but Twitter is hating on the writes. All of the writes.

How many writes?

A 6000X multiplication factor:

Do you see a scaling problem with this scenario?

Scoble writes something–boom–6,800 (now 21,146) writes are kicked off. 1 for each follower.

Michael Arrington replies–boom–another 6,600 (now 17,918) writes.

Jason Calacanis jumps in–boom–another 6,500 (now 25,972) writes.

Beyond the 19,900 (now 65,036) writes, there’s a lot of additional overhead too. You have to hit a DB to figure out who the 19,900 (now 65,036) followers are. Read, read, read. Then possibly hit another DB to find out which shard they live on. Read, read, read. Then you make a connection and write to that DB host, and on success, go back and mark the update as successful. Depending on the details of their messaging system, all the overhead of lookup and accounting could be an even bigger task than the 19,900 (now 65,036) reads + 19,900 (now 65,036) writes. Do you even want to think about the replication issues (multiply by 2 or 3)? Watch out for locking, too.

And here’s the kicker: that giant processing & delivery effort–possibly a combined 100K (now 130K) disk IOs–was caused by 3 users, each just sending one, tiny, 140 char message. How innocent it all seemed.

Not only does Isreal's post accurately describe the problem with the logical model for Twitter's "followers" feature, it looks like he may have also nailed the details of their implementation, which would explain the significant issues they've had scaling the site. The problem is that if you naively implement a design that simply reflects the problem statement then you will be in disk I/O hell. It won't matter if you are using Ruby on Rails, Cobol on Cogs, C++ or hand coded assembly; the read & write load will kill you.
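
To make the write amplification concrete, here's a back-of-the-envelope fan-out-on-write loop using the original follower counts from the quoted post. The data structures and names are invented; this is just the arithmetic, not a claim about Twitter's actual code.

    from collections import defaultdict

    followers = defaultdict(set)   # author -> follower ids
    inboxes = defaultdict(list)    # follower id -> tweet references pushed at them

    def post_tweet(author, tweet_id):
        """Naive push model: one logical post becomes one inbox write per follower."""
        for follower in followers[author]:                # read: who follows the author?
            inboxes[follower].append((author, tweet_id))  # write: one copy per follower
        return len(followers[author])

    # Three high-follower accounts each send a single 140-character message.
    for author, count in [("scoble", 6_864), ("arrington", 6_600), ("calacanis", 6_500)]:
        followers[author] = set(range(count))

    total = sum(post_tweet(author, 1) for author in ("scoble", "arrington", "calacanis"))
    print(total)   # 19,964 inbox writes (the roughly 19,900 in the quote), before any
                   # lookup, sharding, replication or locking overhead is counted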

This leads me to my new mantra, which I've stolen from Jim Gray via Pat Helland: DISK IS THE NEW TAPE.

In addition, the fact that the Twitter folks decided not to cap the number of followers or the number of people a user can follow may have saved them from the kind of flames Robert Scoble sends at Facebook for its 5000 friend limit, but it also means they have to support both users with thousands of followers and users who follow thousands of other people [both of which would be optimized in completely different ways; see the sketch below]. Clearly a lot of feature decisions have been made on that product without thought to how they impact the scalability of the service.
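
As a sketch of how the two cases could be optimized differently, here's one common hybrid approach: fan out on write for ordinary accounts and fan out on read for accounts above some follower threshold. The threshold, names and data structures are all invented for illustration and are not a description of what Twitter actually does.

    from collections import defaultdict
    import heapq

    PUSH_THRESHOLD = 10_000              # above this many followers, stop fanning out writes

    followers = defaultdict(set)         # author -> follower ids
    following = defaultdict(set)         # reader -> authors they follow
    inboxes = defaultdict(list)          # reader -> pre-materialized (timestamp, author, text)
    author_timelines = defaultdict(list) # author -> that author's own (timestamp, author, text)

    def post(author, ts, text):
        author_timelines[author].append((ts, author, text))
        if len(followers[author]) <= PUSH_THRESHOLD:       # ordinary account: push
            for reader in followers[author]:
                inboxes[reader].append((ts, author, text))
        # big account: push nothing extra; readers pull from author_timelines instead

    def read_timeline(reader, limit=20):
        pulled = []
        for author in following[reader]:
            if len(followers[author]) > PUSH_THRESHOLD:    # pull only from the big accounts
                pulled.extend(author_timelines[author][-limit:])
        return heapq.nlargest(limit, inboxes[reader] + pulled)   # newest first by timestamp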

PS: If this problem space sounds interesting to you, we're hiring. I'm specifically looking for good operations folks. Shoot me a resume at dareo@msft.com - replace msft.com with microsoft.com

Now Playing: Rick Ross - Maybach Music (featuring Jay-Z)

Friday, 23 May 2008 05:13:57 (GMT Daylight Time, UTC+01:00)
Israel's post seems pretty kooky to me. It's almost certainly true that (1) when Scoble writes a message it is not copied to all his followers in real-time, instead the copy operation is placed on a background queue, (2) the entire message isn't copied, just a reference, and (3) you can further optimize by chunking message delivery, that is, using a combination of push and pull so that if a user subscribes to 20,000 topics you're not constantly sending messages to him one at a time. I've worked on messaging systems in finance that have much more severe constraints than Twitter and it's common knowledge that you never touch disk or db unless you absolutely have to (and even then you can cheat and hire human auditors to balance the books at the end of the month if messages are lost). Frankly the entire situation just reflects the "get it done and toss up the webapp" culture of startups and that's the real problem. It's good that Twitter has finally bitten the bullet and admitted they're rewriting from scratch, so that's that.
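
Very roughly, the kind of thing I mean, in toy Python (every name here is invented): store the body once, put only references on followers' queues, and do the fan-out on a background worker rather than on the request path.

    import queue
    import threading

    messages = {}                    # message id -> body, stored exactly once
    inboxes = {}                     # follower id -> list of message ids (references only)
    delivery_queue = queue.Queue()   # fan-out work handed off to a background worker

    def publish(message_id, body, follower_ids):
        messages[message_id] = body                      # the 140 chars are written once
        delivery_queue.put((message_id, follower_ids))   # request path returns immediately

    def fanout_worker():
        while True:
            message_id, follower_ids = delivery_queue.get()
            for fid in follower_ids:
                inboxes.setdefault(fid, []).append(message_id)   # a reference, not a copy
            delivery_queue.task_done()

    threading.Thread(target=fanout_worker, daemon=True).start()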
Friday, 23 May 2008 05:32:44 (GMT Daylight Time, UTC+01:00)
Back in 2000, I was using this web app developed in part by a guy named Ev Williams (blogger.com, anyone?). As it grew in popularity, the site started slowing to a crawl, and frequently went down. It simply wasn't built to scale.

I forgave Ev back then. But to have the exact same issue with one of his other apps (twitter, this time) eight years later? I admit to being miffed. Figure it out, Ev. Build your apps to scale!

(...grumble grumble grumble...)
Friday, 23 May 2008 05:41:19 (GMT Daylight Time, UTC+01:00)
I think it's absolutely impossible to deduce exactly what the problems are with the Twitter architecture if you've never worked on it. And from the sound of it, only about half a dozen people in the world have worked on Twitter's code.

Anywho, Twitter doesn't rely solely on disk I/O. They use starling, an asynchronous messaging server:

http://dev.twitter.com/2008/01/announcing-starling.html

"It's fast, it's stable, it speaks the memcache protocol so it doesn't need a new client library, and it's disk-backed for persistence."

Of course, no one outside Twitter knows exactly how much the disk-backed persistence comes into play. All we can do as users of Twitter is either wait for them to get their act together, go somewhere else, or build our own thing.
Friday, 23 May 2008 05:42:02 (GMT Daylight Time, UTC+01:00)
I wanted to look at these numbers to see roughly how much writing each of the people I was following were doing, over time, and rank them (of course) so I produced a "spewage" report.

http://twitter.scripting.com/spewage.html

Enjoy!
Friday, 23 May 2008 09:18:08 (GMT Daylight Time, UTC+01:00)
I'm no architecture scaling expert but it seems to me like a decent replicated queuing mechanism is what is required. Multiple daemons across multiple machines pushing and popping to their heart's content.

The inherent latency of queues isn't a problem for something like Twitter.

Is the Live Spaces newsfeed built on a queueing architecture, Dare?

-Jamie
Friday, 23 May 2008 12:32:40 (GMT Daylight Time, UTC+01:00)
The problem sounds like the one Exchange has and has solved with email. Single-instance-storage per message store, transport queues, etc.

Maybe they ought to use ESE ;)
Oren Novotny
Friday, 23 May 2008 13:32:02 (GMT Daylight Time, UTC+01:00)
Assuming that speed of operation is most important to active twitterers, and assuming that this is only a small percentage of the userspace, would it help to use push (1) for the hardcore and pull (2) for the rest?
Saturday, 24 May 2008 11:07:36 (GMT Daylight Time, UTC+01:00)
Actually given that a user's attention span is limited, it may be strategic to have different user profiles, for instance

1) Followers who follow all the time
2) Followers who follow occasionally
3) Phantom users who follow everyone but are unlikely to read others' posts
Monday, 26 May 2008 20:14:19 (GMT Daylight Time, UTC+01:00)
Definitely seems like a scaling issue, which is one that hits most systems when they get to a certain size.

Not having seen the architecture, I'd imagine a multi-message queue system would be the most suitable: some form of central directory which indicates where copies of a message need to be sent, then dispatchers pushing updates out to their targets, with single instancing on each endpoint DB. You could then look at having some form of multiplexing front end to serve out from whichever DB your account and associated tweets sit on.

The key would then be looking at how best to load balance the servers together. You'd need some form of load function and user migration between databases to create the most efficient message queue and storage solution. By looking at the clumps of users (I would guess that, in the same way other social networks have clusters of people, Twitter does too) you could structure the DBs to group clumps of users together and minimise the cross-server queries and message dispatch. This would have the effect of reducing the number of writes to the DBs, as the clumps of users could be looked up in real time.

So when user x kicks off a message it only copies 1 message/write to each of the DB servers that has some of their followers. Any responses would similarly only kick off 1 write back to the original server and the other associated DB instances. If you have 6,000 or 21,000 followers you would then need the analysis/migration algorithms to cut down the inter-DB message load. By creating indexes on the correct fields in the DB you could easily get queries to return the last 20 or so messages you have subscribed to. This would reduce the problem of generating the display/RSS feeds in real time for someone who follows everyone.
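
In rough toy Python (all names invented), the write path I'm describing would look something like this, with one write per shard that hosts followers instead of one write per follower:

    from collections import defaultdict

    user_shard = {}                      # central directory: user id -> shard id
    shard_messages = defaultdict(list)   # shard id -> messages stored on that DB
    followers = defaultdict(set)         # author -> follower ids

    def post(author, text):
        # Directory lookup: which shards actually host this author's followers?
        target_shards = {user_shard[f] for f in followers[author]}
        for shard in target_shards:
            shard_messages[shard].append((author, text))   # one write per shard, not per follower
        return len(target_shards)

    # If 21,000 followers cluster onto a dozen shards, a post costs about 12 writes
    # instead of 21,000; reads then merge the latest messages from the local shards.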
Dave Mc
Comments are closed.