[erlang-questions] rpc is bad? (was Re: facebook chat server)

Steve Vinoski <>
Thu May 22 18:55:06 CEST 2008

On 5/22/08, Ulf Wiger (TN/EAB) <> wrote:
> Steve Vinoski skrev:
> >
> > What all those years of CORBA taught me, BTW, is that RPC,
> >
>  > for a number of reasons, is generally A Really Bad Idea.
>  > Call it a hard-won lesson. The Erlang flavor of RPC is
>  > great because the entire Erlang system has distribution
>  > fundamentally designed and built into it, but for normal
>  > languages, RPC creates more problems than it solves.
>  This is certainly something that I'd love to see you expand
>  in a blog article. Specifically, I'd like to hear your views
>  on what the better alternative would be.
>  (No touch of sarcasm - I'm genuinely interested.)

Well, if you had time you could dig through my various IEEE Internet
Computing columns from the past 6 years and find many reasons listed
there. For example, "RPC Under Fire" from the Sep/Oct 2005 issue lists a
number of problems:

<http://steve.vinoski.net/pdf/IEEE-RPC_Under_Fire.pdf> (note that it's PDF)

Also, pretty much any of my columns that cover REST to any degree
mention RPC's shortcomings. All the columns can be found here:


But if you don't have the time or energy, the fundamental problem is
that RPC tries to make a distributed invocation look like a local one.
This can't work because the failure modes in distributed systems are
quite different from those in local systems, so you find yourself
having to introduce more and more infrastructure that tries to hide
all the hard details and problems that lurk beneath. That's how we got
Apollo NCS and Sun RPC and DCE and CORBA and DSOM and DCOM and EJB and
SOAP and JAX-RPC, to name a few off the top of my head, each better
than what came before in some ways but worse in other ways, especially
footprint and complexity. But it's all for naught because no amount of
infrastructure can ever hide those problems of distribution. Network
partitions are real, timeouts are real, remote host and service
crashes are real, the need for piecemeal system upgrade and handling
version differences between systems is real, etc. The distributed
systems programmer *must* deal with these and other issues because
they affect different applications very differently; no amount of
hiding or abstraction can make these problems disappear. As I said
about such systems in a recent column:

"The layers of complexity required to maintain the resulting leaky
illusion of local/remote transparency are reminiscent of the
convoluted equations that pre-Copernican astronomers used to explain
how the Sun and other planets revolved around the Earth." (from
"Serendipitous Reuse"

RPC systems in C++, Java, etc. also tend to introduce higher degrees
of coupling than one would like in a distributed system. Typically you
have some sort of IDL that's used to generate stubs/proxies/skeletons
-- code that turns the local calls into remote ones, which nobody
wants to write or maintain by hand. The IDL is often simple, but the
generated code is usually not. That code is normally compiled into
each app in the system. Change the IDL and you have to regenerate the
code, recompile it, and then retest and redeploy your apps, and you
typically have to do that atomically, either all apps or none, because
versioning is not accounted for. In an already-deployed production
system, it can be pretty hard to do atomic upgrades across the entire
system. Overall, this approach makes for brittle, tightly-coupled
systems.

Such systems also have problems with impedance mismatch between the
IDL and whatever languages you're translating it to. If the IDL is
minimal so that it can be used with a wide variety of programming
languages, it means advanced features of well-stocked languages like
Java and C++ can't be used. OTOH if you make the IDL more powerful so
that it's closer to such languages, then translating it to C or other
more basic languages becomes quite difficult. On top of all that, no
matter how you design the IDL type system, all the types won't --
indeed, can't -- map cleanly into every desired programming language.
This turns into the need for non-idiomatic programming in one or more
of the supported languages, and developers using those languages tend
to complain about that. If you turn the whole process around by using
a programming language like Java for your RPC IDL in an attempt to
avoid the mismatch problems, you find it works only for that language,
and that translating that language into other languages is quite
difficult.

There's also the need with these systems to have the same or similar
infrastructure on both ends of the wire. Earlier posters to this
thread complained about this, for example, when they mentioned having
to have CORBA ORBs underneath all their participating applications. If
you can't get the exact same infrastructure under all endpoints, then
you need to use interoperable infrastructure, which obviously relies
on interoperability standards. These, unfortunately, are often
problematic as well. CORBA interoperability, for example, eventually
became pretty good, but it took about a decade. CORBA started out with
no interoperability protocol at all (in fact, it originally specified
no network protocol at all), and then we suffered with interop
problems for a few years once IIOP came along and both the protocol
itself and implementations of it matured.

Ultimately, RPC is a leaky abstraction. It can't hide what it tries to
hide, and because of that, it can easily make the overall problem more
difficult to deal with by adding a lot of accidental complexity.

In my previous message I specifically mentioned Erlang as having
gotten it right. I believe that to be true not only because the
handling of distribution is effectively built in and dealt with
directly, but also because Erlang makes no attempt to hide those hard
problems from the developer. Rather, it makes them known to the
developer by providing facilities for dealing with timeouts, failures,
versioning, etc. I think what Erlang gives us goes a very long way and
is well beyond anything I've experienced before. Erlang really doesn't
provide RPC according to the strict definition of the term, BTW,
because remote calls don't actually look like local ones.
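For instance, Erlang's rpc module puts failure right in the return
value rather than behind a local-looking call. A small sketch
(handle_failure/1 and the Node binding are placeholders):

```erlang
%% rpc:call/5 returns {badrpc, Reason} instead of pretending the
%% remote call is local; the caller must pattern-match on failure.
case rpc:call(Node, lists, reverse, [[1, 2, 3]], 5000) of
    {badrpc, Reason} -> handle_failure(Reason);  % timeout, nodedown, ...
    Reversed         -> {ok, Reversed}
end
```

The caller cannot even compile a version of this that ignores the
possibility of failure without consciously dropping the badrpc clause.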

(BTW, this is the kind of stuff I'll be talking about at Erlang
eXchange next month.)

>  One of these days, I'll try to write some understandable
>  documentation for my VCC code (version-controlled channels,
>  http://svn.ulf.wiger.net/vcc/trunk/vcc/SPECIFICATION.txt).
>  This is basically a more complex version of Joe's lib_chan,
>  with support for automatic transformation of data between
>  nodes running different versions, and at least the
>  ambition to also support non-Erlang peers.
>  The non-Erlang allowances are a 32-bit aligned framing protocol,
>  per-channel encode/decode and an XML-based meta protocol.
>  For Erlang-to-Erlang, it could be used to cleanly separate
>  communication for different applications, and could offer
>  a reliable way to perform redundancy upgrade, install
>  contract checkers, etc.
>  The channels have asynchronous semantics, but - partly inspired
>  by Haskell - you can send a message on a temporary channel
>  (using an existing channel as template), and optionally have
>  it automatically removed when the other side has sent a
>  reply.
>  I'd love to receive feedback, or even better: volunteers to
>  help finish it. It's been gathering dust for a while, but
>  I keep coming back to the conclusion that it would be very
>  useful.

That sounds pretty interesting. Have you looked at BEEP (Blocks
Extensible Exchange Protocol, <http://beepcore.org/>) at all? Some of
what you describe above sounds similar, on the surface at least, to
what BEEP does, and even uses similar terminology.
