[erlang-questions] Keeping massive concurrency when interfacing with C
Alceste Scalas
alceste@REDACTED
Wed Oct 5 17:55:59 CEST 2011
On Wed, 5 Oct 2011 09:24:19 +0200, Peer Stritzinger <peerst@REDACTED>
wrote:
> I have to admit I was not aware of this. OTOH it seems not to
> be available, can't find anything except the paper and EEP7
> which is the foreign function interface to the external number
> crunching libraries they invented.
Hi, I'm one of the authors of the "HPTC with Erlang" work.
You're right, nothing was publicly released except for the FFI
implementation described in EEP 7 --- and since the project ended
last year, I believe that nothing else will be released in the
future. The last bits were a prototype NIF-based FFI
implementation [1], together with a request to withdraw EEP 7
(since it was clearly superseded by NIFs) [2].
[1] http://muvara.org/hg/erlang-ffi/
[2] http://erlang.org/pipermail/eeps/2010-July/000292.html
> The price you have to pay for the slapped on heavyweight
> library is that these usually don't scale up to the number of
> processes Erlang can handle.
IMHO how well this scales mostly depends on:
1. the size of the operands you're working on;
2. the complexity of the foreign functions you're going to
call.
Our project was primarily focused on real-time numerical
computing, and thus we needed a method for quickly calling
"simple" numerical foreign functions (such as multiplications of
relatively small (portions of) matrices). Those functions, taken
alone, would usually return almost immediately: in other words,
their execution time was similar to that of regular BIFs. We
used BLAS because its optimized implementations are usually
"fast enough", but (if necessary) we could have developed
our own optimized C code.
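To give an idea of what "simple" means here: the Erlang side of
such a binding is little more than a NIF stub. The module and
library names below are made up, but the shape is the usual one:

    -module(small_blas).
    -export([dgemm/2]).
    -on_load(init/0).

    %% Load the (hypothetical) NIF library that wraps a BLAS
    %% dgemm call.
    init() ->
        erlang:load_nif("./small_blas_nif", 0).

    %% Multiply two small row-major float64 matrices given as
    %% binaries.  The real work happens in the C NIF; this Erlang
    %% stub is only reached if the NIF library failed to load.
    dgemm(_A, _B) ->
        erlang:nif_error(nif_not_loaded).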
When more complicated formulas are assembled from repeated FFI
calls to those simple functions, the Erlang scheduler can kick in
several times before the final result is obtained, thus
preserving VM responsiveness (albeit at the cost of some overall
numerical throughput).
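A hedged example of what such a composition looks like on the
Erlang side (small_blas:dgemm/2 and small_blas:add/2 are the
hypothetical wrappers from the stub above):

    %% Composite formula built from small calls: each dgemm/add
    %% returns quickly, so the scheduler can preempt the calling
    %% process between the individual NIF invocations.
    weighted_sum(A, B, C, D) ->
        AB = small_blas:dgemm(A, B),
        CD = small_blas:dgemm(C, D),
        small_blas:add(AB, CD).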
> Keeping a pool of numerical processes to keep the cores busy
> but not too many of them that the OS is upset. Having work
> queues that adapt these to the 20k processes.
If the native calls performed by those 20k Erlang processes are
not "heavy" enough, then introducing work queues may actually
increase the Erlang VM load and internal lock contention, thus
decreasing responsiveness compared with plain NIF calls. I
suspect that some comparative benchmarking would be useful.
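A very rough way to start such a comparison, assuming the
hypothetical small_blas NIF above and a registered blas_worker
gen_server acting as the work queue:

    %% Microseconds for N calls each way (timer:tc/1).
    bench(N, A, B) ->
        {Direct, _} = timer:tc(fun() ->
            [small_blas:dgemm(A, B) || _ <- lists:seq(1, N)]
        end),
        {Queued, _} = timer:tc(fun() ->
            [gen_server:call(blas_worker, {dgemm, A, B})
             || _ <- lists:seq(1, N)]
        end),
        {Direct, Queued}.

Measuring from a single process like this only shows per-call
latency; to expose lock contention you would want many processes
(ideally those 20k) running the same loop concurrently.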
> The suggested n-dim matrix type (e.g. a record containing the
> metadata and a binary for the data) combined with some NIFs on
> these that speed up the parts where Erlang is not so fast.
> Keeping in mind not to do too much work in the NIFs at one time
> not to block the scheduler.
This is exactly what we did for interfacing BLAS and other
numerical routines (except that we used our FFI, since NIFs were
not yet available).
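For the record-plus-binary representation, something along these
lines should work (the record layout and the dgemm_raw NIF are
just illustrative):

    %% Metadata lives in the record, the raw row-major float64
    %% data lives in a binary that the NIFs read and write
    %% directly.
    -record(ndmatrix, {shape :: [pos_integer()],  %% e.g. [Rows, Cols]
                       data  :: binary()}).

    multiply(#ndmatrix{shape = [M, K], data = DA},
             #ndmatrix{shape = [K, N], data = DB}) ->
        #ndmatrix{shape = [M, N],
                  data  = small_blas:dgemm_raw(M, K, N, DA, DB)}.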
Maybe a next-generation, general-purpose numerical computing
module for Erlang could adopt different strategies depending on
the size of the operands passed to its functions (a rough
dispatch sketch follows the list):
1. if the vectors/matrices are "small enough", then the native
code could be called directly using NIFs;
2. otherwise, the operands could be passed to a separate worker
thread, which will later send back its result to the waiting
Erlang process (using enif_send()).
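A minimal sketch of that dispatch, assuming the operands are
plain binaries and a hypothetical numerics_nif module providing
both entry points:

    -define(SMALL, 64 * 1024).  %% bytes; threshold would need tuning

    %% Small inputs use a synchronous NIF call; large ones are
    %% handed to a native worker thread that replies via
    %% enif_send().
    multiply(A, B) when byte_size(A) =< ?SMALL,
                        byte_size(B) =< ?SMALL ->
        numerics_nif:multiply_sync(A, B);
    multiply(A, B) ->
        Ref = numerics_nif:multiply_async(self(), A, B),
        receive
            {Ref, Result} -> Result
        end.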
In the second case, the future NIF extensions planned by OTP
folks may be very useful --- see Rickard Green's talk at the SF
Bay Area Erlang Factory 2011: http://bit.ly/eH61tX
> For real heavy numerical stuff I think the best way is to do
> this in systems that are built for this and interface them
> somehow to erlang with ports or sockets.
Sure, but the problem with this approach is that you may need to
constantly (de)serialize and transfer large numerical arrays
between the Erlang VM and the external number-crunching systems,
thus wasting processor cycles and memory/network bandwidth.
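Just to illustrate where the cycles go, a typical port round trip
(the external solver and its protocol are hypothetical, the port
is assumed to have been opened with the binary option) looks
like this:

    %% Pack the data into a binary, copy it out over the port,
    %% wait for the reply, unpack it again.  Every call pays the
    %% packing/copying cost in both directions.
    solve_remotely(Port, Values) when is_list(Values) ->
        Packed = << <<X/float>> || X <- Values >>,  %% 64-bit floats
        port_command(Port, Packed),
        receive
            {Port, {data, Reply}} ->
                [X || <<X/float>> <= Reply]
        end.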
Regards,
--
Alceste Scalas <alceste@REDACTED>