[erlang-questions] Keeping massive concurrency when interfacing with C
Alceste Scalas
alceste@REDACTED
Wed Oct 5 17:55:59 CEST 2011
On Wed, 5 Oct 2011 09:24:19 +0200, Peer Stritzinger <peerst@REDACTED>
wrote:
> I have to admit I was not aware of this. OTOH it seems not to
> be available, can't find anything except the paper and EEP7
> which is the foreign function interface to the external number
> crunching libraries they invented.
Hi, I'm one of the authors of the "HPTC with Erlang" work.
You're right, nothing was publicly released except for the FFI
implementation described in EEP 7 --- and since the project ended
last year, I believe that nothing else will be released in the
future. The last bits were a prototype NIF-based FFI
implementation [1], together with a request to withdraw EEP 7
(since it was clearly superseded by NIFs) [2].
[1] http://muvara.org/hg/erlang-ffi/
[2] http://erlang.org/pipermail/eeps/2010-July/000292.html
> The price you have to pay for the slapped on heavyweight
> library is that these usually don't scale up to the number of
> processes Erlang can handle.
IMHO how well this scales mostly depends on:
1. the size of the operands you're working on;
2. the complexity of the foreign functions you're going to
call.
Our project was primarily focused on real-time numerical
computing, and thus we needed a method for quickly calling
"simple" numerical foreign functions (such as multiplications of
relatively small (portions of) matrices). Those functions, taken
alone, would usually return almost immediately: in other words,
their execution time was similar to that of regular BIFs. We
used BLAS because its optimized implementations are usually
"fast enough", but (if necessary) we could have developed
our own optimized C code.
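To give an idea of what "simple" means here: the Erlang side of
such a binding is little more than a NIF stub. The module and
library names below are made up, but the shape is the usual one:

    -module(small_blas).
    -export([dgemm/2]).
    -on_load(init/0).

    %% Load the (hypothetical) NIF library that wraps a BLAS
    %% dgemm call.
    init() ->
        erlang:load_nif("./small_blas_nif", 0).

    %% Multiply two small row-major float64 matrices given as
    %% binaries.  The real work happens in the C NIF; this Erlang
    %% stub is only reached if the NIF library failed to load.
    dgemm(_A, _B) ->
        erlang:nif_error(nif_not_loaded).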
When more complicated formulas are assembled from repeated FFI
calls to those simple functions, the Erlang scheduler can kick in
several times before the final result is obtained, thus
preserving VM responsiveness (albeit at the cost of some overall
numerical throughput).
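A hedged example of what such a composition looks like on the
Erlang side (small_blas:dgemm/2 and small_blas:add/2 are the
hypothetical wrappers from the stub above):

    %% Composite formula built from small calls: each dgemm/add
    %% returns quickly, so the scheduler can preempt the calling
    %% process between the individual NIF invocations.
    weighted_sum(A, B, C, D) ->
        AB = small_blas:dgemm(A, B),
        CD = small_blas:dgemm(C, D),
        small_blas:add(AB, CD).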
> Keeping a pool of numerical processes to keep the cores busy
> but not too many of them that the OS is upset. Having work
> queues that adapt these to the 20k processes.
If the native calls performed by those 20k Erlang processes are
not "heavy" enough, then introducing work queues may actually
increase the Erlang VM load and internal lock contention, thus
decreasing responsiveness compared with plain NIF calls. I
suspect that some comparative benchmarking would be useful.
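A very rough way to start such a comparison, assuming the
hypothetical small_blas NIF above and a registered blas_worker
gen_server acting as the work queue:

    %% Microseconds for N calls each way (timer:tc/1).
    bench(N, A, B) ->
        {Direct, _} = timer:tc(fun() ->
            [small_blas:dgemm(A, B) || _ <- lists:seq(1, N)]
        end),
        {Queued, _} = timer:tc(fun() ->
            [gen_server:call(blas_worker, {dgemm, A, B})
             || _ <- lists:seq(1, N)]
        end),
        {Direct, Queued}.

Measuring from a single process like this only shows per-call
latency; to expose lock contention you would want many processes
(ideally those 20k) running the same loop concurrently.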
> The suggested n-dim matrix type (e.g. a record containing the
> metadata and a binary for the data) combined with some NIFs on
> these that speed up the parts where Erlang is not so fast.
> Keeping in mind not to do too much work in the NIFs at one time
> not to block the scheduler.
This is exactly what we did for interfacing BLAS and other
numerical routines (except that we used our FFI, since NIFs were
not yet available).
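For the record-plus-binary representation, something along these
lines should work (the record layout and the dgemm_raw NIF are
just illustrative):

    %% Metadata lives in the record, the raw row-major float64
    %% data lives in a binary that the NIFs read and write
    %% directly.
    -record(ndmatrix, {shape :: [pos_integer()],  %% e.g. [Rows, Cols]
                       data  :: binary()}).

    multiply(#ndmatrix{shape = [M, K], data = DA},
             #ndmatrix{shape = [K, N], data = DB}) ->
        #ndmatrix{shape = [M, N],
                  data  = small_blas:dgemm_raw(M, K, N, DA, DB)}.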
Maybe a next-generation, general-purpose numerical computing
module for Erlang could adopt different strategies depending on
the size of the operands passed to its functions (a rough
dispatch sketch follows the list):
1. if the vectors/matrices are "small enough", then the native
code could be called directly using NIFs;
2. otherwise, the operands could be passed to a separate worker
thread, which will later send back its result to the waiting
Erlang process (using enif_send()).
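A minimal sketch of that dispatch, assuming the operands are
plain binaries and a hypothetical numerics_nif module providing
both entry points:

    -define(SMALL, 64 * 1024).  %% bytes; threshold would need tuning

    %% Small inputs use a synchronous NIF call; large ones are
    %% handed to a native worker thread that replies via
    %% enif_send().
    multiply(A, B) when byte_size(A) =< ?SMALL,
                        byte_size(B) =< ?SMALL ->
        numerics_nif:multiply_sync(A, B);
    multiply(A, B) ->
        Ref = numerics_nif:multiply_async(self(), A, B),
        receive
            {Ref, Result} -> Result
        end.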
In the second case, the future NIF extensions planned by OTP
folks may be very useful --- see Rickard Green's talk at the SF
Bay Area Erlang Factory 2011: http://bit.ly/eH61tX
> For real heavy numerical stuff I think the best way is to do
> this in systems that are built for this and interface them
> somehow to erlang with ports or sockets.
Sure, but the problem with this approach is that you may need to
constantly (de)serialize and transfer large numerical arrays
between the Erlang VM and the external number-crunching systems,
thus wasting processor cycles and memory/network bandwidth.
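Just to illustrate where the cycles go, a typical port round trip
(the external solver and its protocol are hypothetical, the port
is assumed to have been opened with the binary option) looks
like this:

    %% Pack the data into a binary, copy it out over the port,
    %% wait for the reply, unpack it again.  Every call pays the
    %% packing/copying cost in both directions.
    solve_remotely(Port, Values) when is_list(Values) ->
        Packed = << <<X/float>> || X <- Values >>,  %% 64-bit floats
        port_command(Port, Packed),
        receive
            {Port, {data, Reply}} ->
                [X || <<X/float>> <= Reply]
        end.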
Regards,
--
Alceste Scalas <alceste@REDACTED>