Experiences with Multithreaded drivers

Thu Aug 24 15:34:09 CEST 2006

Hi all,

I have written about multithreaded drivers before and want to share my
experience.

I have been part of developing a system where a linked-in C driver is
used to make blocking calls, potentially blocking for a very long time.
One Erlang process (A) could lock a resource, and another Erlang process
(B) would be blocked if it tried to lock the same resource, until it was
released by the first process (A). When this was executed in the main
thread, the whole beam would be blocked, so A could never run to release
the lock, and we had a pretty clear deadlock situation. So, the first
thing we did was to make sure that two processes could not make a
blocking call at the same time. Fair enough, but if the resource was
locked from the beginning (by some external process), even A's call
would block, though without any deadlock. However, blocking the beam
emulator for an unknown amount of time wasn't acceptable, so we looked
into an asynchronous driver.

By using driver_async we could make the blocking call go into an OS
thread while the beam kept running. However, the deadlock would still
occur if the unlocking call was queued after the blocked call in the
same thread.
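
To make the queueing problem concrete, here is a small simulation (no
real threads, no erl_driver calls) of one worker that runs its jobs
strictly in order. The names run_fifo, JOB_LOCK and JOB_UNLOCK are made
up for this sketch, and it assumes a single non-recursive lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job kinds for this sketch only. */
enum job { JOB_LOCK, JOB_UNLOCK };

/* Run jobs strictly in FIFO order on one simulated worker thread.
 * Returns how many jobs complete before the worker blocks for good
 * (equal to n when nothing blocks). */
size_t run_fifo(const enum job *jobs, size_t n)
{
    int held = 0;      /* 1 while the resource is locked */
    size_t done = 0;

    assert(jobs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (jobs[i] == JOB_LOCK) {
            if (held)
                break; /* worker blocks here; later jobs never run */
            held = 1;
        } else {
            held = 0;
        }
        done++;
    }
    return done;
}
```

With the queue { JOB_LOCK, JOB_LOCK, JOB_UNLOCK } (A locks, B tries to
lock, A unlocks) only the first job completes: the unlock sits behind
the blocked lock and is never reached.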

By using a key in driver_async, it is possible to ensure that all calls
with the same key go through the same thread, but it does not seem
possible to make sure that two different keys get different OS threads
(from the Erlang thread pool), even if there are enough threads in the
pool (+A). Because of this, the deadlock could still occur even with a
huge thread pool and keys in the driver_async call.
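
One way distinct keys could end up on the same thread is if the pool
simply reduces the key modulo the pool size. That is an assumption for
the sake of illustration, not something confirmed in this post, and
pick_thread and keys_collide are hypothetical names, not part of
erl_driver.

```c
#include <assert.h>

/* Hypothetical model: map a key to a worker index.
 * Assumption: the index is the key modulo the pool size. */
unsigned int pick_thread(unsigned int key, unsigned int pool_size)
{
    assert(pool_size > 0);
    return key % pool_size;
}

/* Returns 1 if two keys would share a worker, and thus a job queue. */
int keys_collide(unsigned int a, unsigned int b, unsigned int pool_size)
{
    return pick_thread(a, pool_size) == pick_thread(b, pool_size);
}
```

Under that assumption, keys 3 and 11 share a thread in a pool of 8,
while keys 3 and 4 do not. The caller could avoid collisions only by
knowing the pool size and choosing keys accordingly.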

Also, the round-robin scheduling of the Erlang thread pool could be
expected to choose a non-busy thread if one exists, but that does not
seem to happen. Even if it did, I would not trust this behaviour to
avoid the deadlock, since any other driver (e.g. io) could potentially
be using any number of threads...

So, by still synchronising in Erlang, so that two (possibly) blocking
calls could not be made at the same time, it was possible to make
blocking calls that only block the calling process rather than the whole
beam emulator. When the blocking call returns, however, data needs to be
passed back to Erlang (sent to the port owner). First we tried to just
call driver_output in the thread, but the emulator didn't seem too happy
about that. Sometimes it seemed to cause a crash (segfault), and
sometimes the messages were lost. I would guess that there are some data
structures that expect all driver_output calls to be made by the main
thread, am I right? So, by using ready_async (driver_entry), which is
called by the main thread, we _seem_ to have created a more stable way
of sending data back.

To be able to use more than one thread at the same time (not all
requests to this resource are of a locking nature) we decided to spawn a
thread (plain pthread) every time a port was opened, forcing all calls
to go through that (every port has its own thread that no one else can
touch). However, when we used driver_output in this thread, we had to
hit the keyboard for the message to arrive, and a large enough message
would crash the emulator (I'm guessing for the same reason as above). I
guess the way to return from this (non-thread-pool) thread is to signal
the main thread using driver_select and pass the data through a shared
data structure.
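
A rough sketch of that last idea, which we did not try, using only
pthreads and a pipe. The worker publishes its result in a
mutex-protected structure and writes one byte to a pipe. The main
thread waits on the read end, which in a real driver would presumably
be the descriptor handed to driver_select. Here poll() stands in for
the emulator's event loop, and shared_result, worker and run_once are
hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical state shared between the worker and the main thread. */
struct shared_result {
    pthread_mutex_t lock;
    char data[64];
    int wake_fd[2]; /* pipe: [0] read end (main), [1] write end (worker) */
};

/* Worker: publish the result, then wake the main thread. */
static void *worker(void *arg)
{
    struct shared_result *s = arg;
    char byte = 1;

    pthread_mutex_lock(&s->lock);
    strcpy(s->data, "resource locked"); /* stands in for the real result */
    pthread_mutex_unlock(&s->lock);
    if (write(s->wake_fd[1], &byte, 1) != 1)
        return (void *)1;
    return NULL;
}

/* Main thread: start the worker, wait for the wake-up byte, copy the
 * result into out. Returns 0 on success, -1 on failure or timeout. */
int run_once(char *out, size_t out_len)
{
    struct shared_result s;
    struct pollfd pfd;
    pthread_t tid;
    char byte;
    int rc = -1;

    assert(out != NULL && out_len > 0);
    memset(s.data, 0, sizeof s.data);
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;
    if (pipe(s.wake_fd) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    if (pthread_create(&tid, NULL, worker, &s) == 0) {
        /* A driver would do this part in its ready_input callback. */
        pfd.fd = s.wake_fd[0];
        pfd.events = POLLIN;
        if (poll(&pfd, 1, 5000) == 1 && read(s.wake_fd[0], &byte, 1) == 1) {
            pthread_mutex_lock(&s.lock);
            strncpy(out, s.data, out_len - 1);
            out[out_len - 1] = '\0';
            pthread_mutex_unlock(&s.lock);
            rc = 0;
        }
        pthread_join(tid, NULL);
    }
    close(s.wake_fd[0]);
    close(s.wake_fd[1]);
    pthread_mutex_destroy(&s.lock);
    return rc;
}
```

The point of the design is that only the main thread ever touches the
output path. The worker never calls into the emulator, it only writes
to memory it shares with the main thread and to the pipe.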

We never tried the latter part, but rewrote it as a port driver instead,
spawning a new port for every "thread", which seems to work very well.

While this experience (read: fighting with OS threads/erl_driver) has
increased my love for Erlang processes and message passing, it is still
frustrating not to be able to link in multithreaded drivers without
trying to trick the beam emulator.

Regards
-- 
Oscar Hellström, oscar@REDACTED
web: personal.oscarh.net
jid: oscar@REDACTED