In the past year, both Google and Facebook have released the remote procedure call (RPC) technologies they use for communication between servers within their data centers as Open Source projects.

Facebook Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages. It supports the following programming languages: C++, Java, Python, PHP and Ruby.
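
For example, a Thrift definition file looks something like this (a hypothetical user profile service; the syntax follows the examples in the Thrift whitepaper):

    struct UserProfile {
      1: i32 uid,
      2: string name,
      3: string blurb
    }

    service UserStorage {
      void store(1: UserProfile user),
      UserProfile retrieve(1: i32 uid)
    }

Running the Thrift compiler over this file generates the client and server stubs for each target language.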

Google Protocol Buffers allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages. It supports the following programming languages: C++, Java and Python.
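
The equivalent Protocol Buffers definition file looks much the same (a hypothetical search service; the message and service names here are made up, but the syntax is standard .proto syntax):

    message SearchRequest {
      required string query = 1;
      optional int32 result_limit = 2;
    }

    message SearchResponse {
      repeated string url = 1;
    }

    service SearchService {
      rpc Search (SearchRequest) returns (SearchResponse);
    }

The numbered tags identify each field on the wire, which is part of what makes the binary encoding so much more compact than the equivalent XML.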

That’s interesting. Didn’t Steve Vinoski recently claim that RPC and its descendants are "fundamentally flawed"? If so, why are Google and Facebook not only using RPC but proud enough of yet more distributed object RPC technologies based on binary protocols to Open Source them? Didn’t they get the memo that everyone is now on the REST + JSON/XML bandwagon (preferably AtomPub)?

In truth, Google is on the REST + XML bandwagon. Google has the Google Data APIs (GData), which is a consistent set of RESTful APIs for accessing data from Google's services based on the Atom Publishing Protocol aka RFC 5023. And even Facebook has a set of plain old XML over HTTP APIs (POX/HTTP) which they incorrectly refer to as the Facebook REST API.

So what is the story here?

It is all about coupling and how much control you have over the distributed end points. On the Web, where you have little to no control over who talks to your servers or what technology they use, you want to utilize flexible technologies that make no assumptions about either end of the communication. This is where RESTful XML-based Web services shine. However when you have tight control over the service end points (e.g. if they are all your servers running in your data center) then you can use more optimized communications technologies that add a layer of tight coupling to your system. An example of the kind of tight coupling you have to live with is that Facebook Thrift requires specific versions of g++ and Java if you plan to talk to it using code written in either language, and you can’t talk to it from a service written in C#.

In general, the Web is about openness and loose coupling. Binary protocols that require specific programming languages and runtimes are the exact opposite of this. However, inside your Web service, where you control both ends of the pipe, you can optimize the interaction between your services and simplify development by going with a binary RPC-based technology. More than likely, different parts of your system are already doing this anyway (e.g. memcached uses a binary protocol to talk between cache instances, SQL Server uses TDS as the communications protocol between the database and its clients, etc.).

Always remember to use the right tool for the job. One size doesn’t fit all when it comes to technology decisions.

FURTHER READING

  • Exposing a WCF Service With Multiple Bindings and Endpoints – Keith Elder describes how Windows Communication Foundation (WCF) supports multiple bindings that enable developers to expose their services in a variety of ways. A developer can create a service once and then expose it to support net.tcp:// or http:// and various versions of http:// (SOAP 1.1, SOAP 1.2, WS-*, JSON, etc.). This can be useful if a service crosses boundaries between the intranet and the Internet.
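
As a rough sketch of what that looks like in a WCF configuration file (the service and contract names here are hypothetical), the same service can be exposed over both HTTP and TCP simply by declaring two endpoints:

    <system.serviceModel>
      <services>
        <service name="MyApp.CalculatorService">
          <endpoint address="" binding="basicHttpBinding"
                    contract="MyApp.ICalculator" />
          <endpoint address="net.tcp://localhost:8523/calculator"
                    binding="netTcpBinding"
                    contract="MyApp.ICalculator" />
        </service>
      </services>
    </system.serviceModel>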

Now Playing: Pink - Family Portrait

Thursday, 10 July 2008 16:50:00 (GMT Daylight Time, UTC+01:00)
Actually, Thrift does support C# now. See http://svn.facebook.com/svnroot/thrift/trunk/lib/ for the full list of supported languages.

Thursday, 10 July 2008 17:23:12 (GMT Daylight Time, UTC+01:00)
"However when you have tight control over the service end points (e.g. if they are all your servers running in your data center) then you can use more optimized communications technologies that add a layer of tight coupling to your system."

This is just flat out wrong. What about when those data centers are geographically distributed in different countries with different managers etc... Do you expect them all to be upgraded simultaneously? This is just one of the reasons RPC and tight coupling sucks! As for why Facebook would publicize thrift? Their marketing people are not distributed systems designers maybe.

Also, while Google Protocol Buffers may have support for code generation, it is mainly a serialization format, similar to ASN.1.
-Andrew
Thursday, 10 July 2008 17:32:23 (GMT Daylight Time, UTC+01:00)
Memcached instances do not talk to each other at all. The client figures out which of the servers in your memcached pool should have your information for a given key and talks to it directly. The servers are pretty "dumb".

But you're right in that the wire protocol between MC client and MC server is a simple text protocol that's not self-describing.
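
For illustration, a session with the text protocol looks roughly like this (a sketch based on the documented set/get command syntax; the line after the set command is the raw value bytes):

    set greeting 0 0 5
    hello
    STORED
    get greeting
    VALUE greeting 0 5
    hello
    END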

-Tim
Tim Gebhardt
Thursday, 10 July 2008 17:49:53 (GMT Daylight Time, UTC+01:00)

"However when you have tight control over the service end points (e.g. if they are all your servers running in your data center) then you can use more optimized communications technologies that add a layer of tight coupling to your system."

"This is just flat out wrong. What about when those data centers are geographically distributed in different countries with different managers etc... Do you expect them all to be upgraded simultaneously?"

Then you don't have tight control over the service end points.
Alejandro Izaguirre
Thursday, 10 July 2008 18:04:36 (GMT Daylight Time, UTC+01:00)
"However when you have tight control over the service end points (e.g. if they are all your servers running in your data center) then you can use more optimized communications technologies that add a layer of tight coupling to your system."

"This is just flat out wrong. What about when those data centers are geographically distributed in different countries with different managers etc... Do you expect them all to be upgraded simultaneously?"

"Then you don't have tight control over the service end points."

OK. But does Google really have tight control? Is that even a realistic assumption? If you have tight control, your service probably isn't very interesting to anyone but you and your company.
Thursday, 10 July 2008 18:05:30 (GMT Daylight Time, UTC+01:00)
"However when you have tight control over the service end points (e.g. if they are all your servers running in your data center) then you can use more optimized communications technologies that add a layer of tight coupling to your system."

Protobufs are no more tightly coupled than XML-based systems. The only reason XML is more interoperable at the moment is because it is ubiquitous. From a purely technical standpoint, the level of coupling is identical.

With XML:
- you get a file in a known format (XML) which can be parsed with a generic parser
- the data therein is completely meaningless and useless without some application-specific documentation about what it means.

With Protobufs:
- you get a file in a known format (Protobuf) which can be parsed with a generic parser.
- the data therein is completely meaningless and useless without an application-specific .proto file and documentation for what each of the fields means.

Protobufs no more require a "specific language and runtime" than XML does: in both cases you need software capable of parsing the format. The only difference between the two is that with Protobufs, you need to grab a .proto file in addition to reading the documentation for how to interpret the fields.

It so happens that the existing implementation of Protobufs generates code in specific languages for performance reasons, but this is not a requirement -- it is easy to write a parser that does not generate any code.
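
To make that concrete, here is a rough sketch (untested, and assuming only the publicly documented varint/tag wire format) of walking a Protocol Buffers byte stream in Python without any generated code:

    # Sketch: decode the Protocol Buffers wire format without a .proto file.
    # This recovers (field number, wire type, raw value) triples; what the
    # fields *mean* still requires external documentation.

    def read_varint(buf, pos):
        """Decode a base-128 varint starting at pos; return (value, new pos)."""
        result, shift = 0, 0
        while True:
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, pos
            shift += 7

    def parse_fields(buf):
        """Yield every top-level field in a Protobuf-encoded byte string."""
        pos = 0
        while pos < len(buf):
            key, pos = read_varint(buf, pos)
            field_number, wire_type = key >> 3, key & 0x07
            if wire_type == 0:      # varint
                value, pos = read_varint(buf, pos)
            elif wire_type == 1:    # fixed 64-bit
                value, pos = buf[pos:pos + 8], pos + 8
            elif wire_type == 2:    # length-delimited: string, bytes, nested message
                length, pos = read_varint(buf, pos)
                value, pos = buf[pos:pos + length], pos + length
            elif wire_type == 5:    # fixed 32-bit
                value, pos = buf[pos:pos + 4], pos + 4
            else:
                raise ValueError("unsupported wire type %d" % wire_type)
            yield field_number, wire_type, value

    # Example: list(parse_fields(b'\x08\x96\x01')) == [(1, 0, 150)]
    # i.e. field 1, varint, value 150 -- readable with no schema at all.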
Thursday, 10 July 2008 18:36:14 (GMT Daylight Time, UTC+01:00)
Agree w/Joshua. Except the final sentence, kinda. While it's not a stretch to write a Protobufs parser to read the encoded bytes into dynamic data structures, it's not really possible to GENERATE encoded bytes from a similar dynamic data structure, since you need to provide the 'field indices'. You could only get that from a 'schema' of some sort. ie, trivial to read protobufs and turn them into JSON (in some fashion), not really possible to go the other way though.
Friday, 11 July 2008 06:40:00 (GMT Daylight Time, UTC+01:00)
Protobufs aren't even close to being the revenge of CORBA, though. You're simply sending around packaged data. There are no method calls. It's all network-centric.

There's also explicit support for evolving object definitions over time.
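
For example (a hypothetical message definition): a later revision of a message can add a field under a fresh tag number, and parsers built against the old definition simply skip the unknown tag:

    message UserProfile {
      required int32 uid = 1;
      required string name = 2;
      optional string email = 3;  // added in v2; old readers skip tag 3
    }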
Friday, 11 July 2008 09:59:04 (GMT Daylight Time, UTC+01:00)
A small clarification: Google's Protocol Buffer stuff optionally provides the ability to define service interfaces. One can stop short of that and merely use it to define message formats/protocol.

Andrew Stone said: "This is just flat out wrong. What about when those data centers are geographically distributed in different countries with different managers etc... Do you expect them all to be upgraded simultaneously? This is just one of the reasons RPC and tight coupling sucks! As for why Facebook would publicize thrift? Their marketing people are not distributed systems designers maybe."

No, Google doesn't expect to upgrade everything simultaneously, but they've managed to work that into their RPC stuff: http://citeseer.ist.psu.edu/759360.html