Biz Stone, Twitter’s , recently wrote in a blog post entitled The Replies Kerfuffle that
We removed a setting that 3% of all accounts had ever touched but for those folks it was beloved. … 97% of all accounts were not affected at all by this change—the default setting is that you only see replies by people you follow to people you follow. For the 3% who wanted to see replies to people they don't follow, we cannot turn this setting back on in its original form for technical reasons and we won't rebuild it exactly the same for product design reasons. … Even though only 3% of all Twitter accounts ever changed this setting away from the default, it was causing a strain and impacting other parts of the system. Every time someone wrote a reply Twitter had to check and see what each of their followers' reply setting was and then manifest that tweet accordingly in their timeline—this was the most expensive work the database was doing and it was causing other features to degrade which lead to SMS delays, inconsistencies in following, fluctuations in direct message counts, and more. Ideally, we would redesign and rebuild this feature but there was no time, hence the sudden deploy.
As someone whose day job is working on a system for distributing a user’s updates and activities to their social network in real-time across Web and desktop applications, I’m always interested in reading about the implementation choices of others who have built similar systems. However when it comes to Twitter I tend to be more confused than enlightened whenever something is revealed about their architecture.
So let’s look at what we know. When Ashton Kutcher posts an update on Twitter such as
it has to be delivered to all 1.75 million of his followers. On the other hand, when Ashton Kutcher posts an update directed to one of his celebrity friends such as
then Twitter needs to decide how to deliver it based on the Replies settings of users.
One option would to check each of the 1.75 million followers of aplusk’s setting to decide whether they need have @replies restricted to only people they are following. Since this will be true for 97% of his followers (i.e. 1.7 million people) then there would need to be a 1.7 million checks to see if the intended recipient are also friends of John Mayer before delivering the message to each of them. On the other hand, it would be pretty straightforward to deliver the message to the 3% of users who want to see all replies. Now this seems to be what Biz Stone is describing as how Twitter works but in that case the default setting should be more expensive than the feature that is only used by a minority of their user base.
In that case I’d expect Twitter to argue that the feature they want to remove for engineering reasons is filtering out some of the tweets you see based on whether you are a follower of the person the message is directed to not the other way around.
What have I missed here?
Update: A comment on Hacker News put me on track to what I probably missed in analyzing this problem. In the above example, if the default case was the only case they had to support then all they have to do to determine who should receive Ashton Kutcher’s reply to John Meyer is perform an intersection of both user’s follower lists. Given that both lists need to be in memory for the system to be anywhere near responsive, performing the intersection isn’t as expensive as it sounds.
However with the fact that 3% of users will want to have received the update even though they aren’t John Mayer’s friends means Twitter needs to do a second pass over whoever was not found in both follower lists and check what their @reply delivery settings are. In the above example, even if every follower of John Mayer was a follower of Ashton Kutcher, it would still require 750,000 settings checks. Given that it sounds like they keep this setting in their database instead of in some sort of cache, it is no surprise that this is a feature they’d like to eliminate.