Extending Erlang

Sun Feb 23 08:23:47 CET 2003

A person doesn't read much email for a couple of days, and all sorts
of interesting discussions take place.

Vlad Dumitrescu said:

> Oh yes, I did read it. Even if EDTK makes things simpler, it's still
> cumbersome. I feel the programming model is heavier than really
> needed. I might be wrong and maybe it can't be simpler than that.

Never having been an Ericsson employee, I've had to learn much of
Erlang's history from this list, conversations at conferences, and
looking at the Erlang source and its evolution (at least since R6).  I
put a general history of the port mechanism in my EDTK paper.  As for
the port mechanism's "heaviness"...

... You don't need to use all of the mechanisms that are there:
port_command/2, port_control/3, and port_call/3 on the Erlang side;
driver_output_term(), driver_output_binary(), driver_output(),
driver_outputv(), driver_output2(), driver_send_term(), and perhaps a
few others I missed.

It seems to me that a lot of it is cruft that comes from maintaining
backward compatibility.  The Trustees of the Erlang/OTP Sacred Flame
appear to be very reluctant to cause backward compatibility problems.
That usually translates to, "Don't yank features out."  There was a
similar (in my mind) discussion on this list not long ago about
records: not a great idea, but since they're in there, they're a pain
to remove.

With the exception of driver_send_term(), everything in the driver
interface appears geared to maintaining the view of communicating
*sequentially* with an external process.  As has been noted by many,
this isn't very fun for linked-in drivers.(*)

In defense of pipe drivers, there is a *lot* to be said for them.
Chanting the Erlang robustness mantra:

1. Full fault isolation from the VM.
2. Relatively fast communication mechanism (for most OSes, when
   compared to sockets ... I hear that pipes under Windows suck.
   Bummer.)
3. Full fault isolation from the VM.
4. The VM has full control over restarting a dead pipe driver process:
   just pipe(), fork() and exec().  Communicating remotely via TCP to
   another network host makes port startup more difficult.
5. Full fault isolation from the VM.
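
For what it's worth, here is a minimal C sketch of the pipe(), fork()
and exec() sequence from item 4.  It is not the VM's actual spawn
code: the pipe_port struct and the pipe_port_open() and
pipe_port_close() names are made up for illustration, and a real
driver would want better error reporting and non-blocking I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for one external "pipe driver" process. */
typedef struct {
    pid_t pid;        /* child process id */
    int   to_child;   /* we write here; the child reads it as stdin */
    int   from_child; /* we read here; the child writes it as stdout */
} pipe_port;

/* Start argv[0] with its stdin/stdout connected to us through two pipes.
 * Returns 0 on success, -1 on failure. */
int pipe_port_open(pipe_port *p, char *const argv[])
{
    int in[2], out[2];
    pid_t pid;

    assert(p != NULL && argv != NULL && argv[0] != NULL);

    if (pipe(in) < 0)
        return -1;
    if (pipe(out) < 0) {
        close(in[0]); close(in[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wire the pipe ends to stdin/stdout, then exec. */
        if (dup2(in[0], STDIN_FILENO) < 0 || dup2(out[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: keep only the ends we use. */
    close(in[0]);
    close(out[1]);
    p->pid = pid;
    p->to_child = in[1];
    p->from_child = out[0];
    return 0;
}

/* Close our pipe ends (the child sees EOF on stdin) and reap the child.
 * Returns the child's exit status, or -1 if it did not exit normally. */
int pipe_port_close(pipe_port *p)
{
    int status;

    close(p->to_child);
    close(p->from_child);
    if (waitpid(p->pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child dies, its crash stays in its own address space, and
restarting it is just another pipe_port_open() call.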

Having said all those things, if you want to trade robustness for speed,
the existing serial interface is icky.  I haven't written a "real" BIF
since the R6 days, but it's nice (speed-wise) to be able to look at
the internal guts of an Erlang term as well as nice (logic-wise) to be
able to see that term's structure.  It's quite annoying (logic-wise
and speed-wise) to have to take a complex Erlang term, serialize it,
"send" it to the linked-in driver, then unserialize it.
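
To make the round trip concrete, here is a hand-rolled C sketch that
packs and unpacks the term {ok, 42} as external-term-format bytes.
The tag values (131, 104, 100, 97) are the version, SMALL_TUPLE_EXT,
ATOM_EXT and SMALL_INTEGER_EXT tags; the encode_pair() and
decode_pair() helpers are invented for this example and handle only
{Atom, SmallInt}.  Real code would use the ei library instead, and
newer VMs may pick different atom tags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tags from the Erlang external term format. */
enum {
    ETF_VERSION       = 131,
    SMALL_INTEGER_EXT = 97,
    ATOM_EXT          = 100,
    SMALL_TUPLE_EXT   = 104
};

/* Hypothetical helper: encode {Atom, N} with 0 <= N <= 255.
 * Returns the number of bytes written, or 0 if it does not fit. */
size_t encode_pair(unsigned char *buf, size_t cap,
                   const char *atom, unsigned char n)
{
    size_t len, i = 0;

    assert(buf != NULL && atom != NULL);
    len = strlen(atom);
    if (len > 255 || 8 + len > cap)
        return 0;

    buf[i++] = ETF_VERSION;
    buf[i++] = SMALL_TUPLE_EXT;
    buf[i++] = 2;                   /* tuple arity */
    buf[i++] = ATOM_EXT;
    buf[i++] = 0;                   /* atom length, 16-bit big-endian */
    buf[i++] = (unsigned char)len;
    memcpy(buf + i, atom, len);
    i += len;
    buf[i++] = SMALL_INTEGER_EXT;
    buf[i++] = n;
    return i;
}

/* Hypothetical helper: decode what encode_pair() produced.
 * Returns 0 on success, -1 on malformed input or a too-small atom buffer. */
int decode_pair(const unsigned char *buf, size_t size,
                char *atom, size_t atom_cap, unsigned char *n)
{
    size_t len;

    assert(buf != NULL && atom != NULL && n != NULL);
    if (size < 8 || buf[0] != ETF_VERSION || buf[1] != SMALL_TUPLE_EXT ||
        buf[2] != 2 || buf[3] != ATOM_EXT)
        return -1;

    len = ((size_t)buf[4] << 8) | buf[5];
    if (size != 8 + len || len + 1 > atom_cap)
        return -1;
    if (buf[6 + len] != SMALL_INTEGER_EXT)
        return -1;

    memcpy(atom, buf + 6, len);     /* yet another copy of the data */
    atom[len] = '\0';
    *n = buf[7 + len];
    return 0;
}
```

Even for a two-element tuple, that is a walk over the term, a copy
into a buffer, a second walk, and a copy back out.  A BIF would just
look at the term in place.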

Vlad Dumitrescu also said:

> Not to diminish the Erlang Driver Toolkit's value, but the two tools
> want to do the same thing, in principle. I don't think an IDL
> specification is too restrictive, but for this particular purpose we
> could tweak it to use Erlang types if needed.

IMHO, the OMG's IDL doesn't allow you to specify an interface with the
amount of detail that a reasonably-efficient Erlang <-> foreign
language interface requires.  If Erlang didn't have single-assignment
semantics, we could use a tool like SWIG with slight modification and
call the job done.

I intentionally threw a lot of stuff into the EDTK XML specification
because creating a foreign-language-interface-like driver for Erlang
seemed to require it.  I chose some weird libraries as my example
drivers because I wanted a wide variety of "real world" C code.  The
result was complex ... because it needed to be.  If I ever get off my
hobby-time butt to extend EDTK further, it will become a bit more
complex.  And I don't see many opportunities for simplification.(**)

James Hague:

> I think the interoperability tutorial does a great job of explaining
> things, as long as you're writing a vanilla driver.  It would be good
> to have a concrete example that involves port_control() and directly
> creating Erlang terms.

FWIW, EDTK-generated drivers only use port_command/2 on the Erlang
side because it has the greatest chance to avoid extra serialized data
copies inside the VM.  (The speed freak in me made me do it that
way.(***)) Similarly, they only use driver_output_term() on the C
side.

FWIW #2, my driver wishlist is:

1. No serialization for linked-in drivers: access to Erlang terms,
like real BIFs do.

2. The async thread pool currently in the VM is too naive: its
scheduling algorithm can block the VM when it's not supposed to, and
port shutdown when you have async threads doing stuff for the
to-be-closed port is just evil.

3. A flexible way to extend I/O-handling drivers, e.g. file_drv,
inet_drv.  Examples: adding compression & encryption to file_drv
operations, adding protocol handlers to inet_drv (and moving all the
HTTP parsing stuff currently there into the new scheme).

-Scott

(*) The docs are usually clear about *how* to use these functions.
As James Hague and others point out, knowing *when* to use them isn't
as clear.

(**) I have very mixed feelings about the XML specification file format,
but that's not what I'm referring to.  Single assignment,
serialization (until serialization requirements truly are dropped),
linked-in driver thread management, custom error checking, "value
maps", and other EDTK specification knobs are either mandatory or too
useful to give up, IMHO.

(***) Perhaps I did it that way to atone for a driver I once wrote
that used three (!) UNIX processes:

     Erlang VM <-pipe-> Expect <-pipe-> Sendmail

... because I knew it would be quicker for me to parse "sendmail"'s
output using Expect and give nice pipe-style driver I/O to Erlang than
it would be to write a parser in Erlang to parse sendmail's output
directly.  Sad, no?