[erlang-questions] Strange BEAM slowdown

Mon Feb 22 22:00:49 CET 2016

If this is caused by scheduler threads locking up because a call to 
erlzmq:recv spends too long (roughly more than 1-2 ms) inside the 
erlzmq2 NIF, you can create your ZeroMQ sockets in active mode instead 
of passive mode.  Incoming messages are then delivered to the owning 
Erlang process as ordinary Erlang messages, with no call to 
erlzmq:recv.  I don't quite understand the need to rewrite the NIF, 
since erlzmq2 (https://github.com/zeromq/erlzmq2) already uses a 
background thread for the receive.  An example of using active mode 
for recv is at 
https://github.com/CloudI/cloudi_service_zeromq/blob/master/src/cloudi_service_zeromq.erl 
.
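
As a rough illustration of active mode, here is a minimal sketch rather 
than a drop-in fix.  It assumes a SUB socket (your socket type may 
differ) and that erlzmq2 delivers active-mode data to the process that 
created the socket as {zmq, Socket, Msg, Flags} tuples.  The module 
name, Endpoint and Handler are hypothetical names used only for this 
example.

```erlang
-module(zmq_active_example).
-export([start/2, loop/2]).

%% Endpoint (e.g. "tcp://127.0.0.1:5555") and Handler (a fun/1) are
%% hypothetical parameters for this sketch.  Call start/2 from the
%% process that should receive the data, since active-mode messages
%% go to the process that created the socket.
start(Endpoint, Handler) ->
    {ok, Context} = erlzmq:context(),
    %% {active, true} requests delivery to the mailbox instead of
    %% requiring a call to erlzmq:recv/1.  The 'sub' type is an assumption.
    {ok, Socket} = erlzmq:socket(Context, [sub, {active, true}]),
    ok = erlzmq:setsockopt(Socket, subscribe, <<>>),
    ok = erlzmq:connect(Socket, Endpoint),
    ok = loop(Socket, Handler),
    ok = erlzmq:close(Socket),
    erlzmq:term(Context).

%% Plain receive loop: each ZeroMQ message arrives as an Erlang message.
loop(Socket, Handler) ->
    receive
        {zmq, Socket, Msg, _Flags} ->
            Handler(Msg),
            loop(Socket, Handler);
        stop ->
            ok
    end.
```

With this shape the receiving process never blocks inside the NIF, so 
it can be inspected like any other process, for example by checking 
its message queue length with erlang:process_info/2.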

On 02/22/2016 09:14 AM, Timothy Legant wrote:
> Hello,
>
> We have an application where we read a huge volume of small messages
> from ZMQ sockets and distribute them to Erlang processes.  We are
> seeing strange behavior where, after a short while, beam.smp's load
> drops quite a bit and then the data begins queuing, eating memory
> until we either stop the program or the Linux OOM killer does it for
> us.
>
> DETAILS
> -------
> CentOS release 6.6 (Final)
> Erlang/OTP 17 [erts-6.4] [source-2e19e2f] [64-bit] [smp:56:56] [async-threads:20] [hipe] [kernel-poll:true]
>
> beam.smp is started with the flags: +sbt db +sub true
>
> We have 60+ data sources (TCP/ZMQ sockets), each of which feeds an
> independent set of processes; there is no interaction between the
> processes handling the data from one socket and the processes handling
> data from other sockets.
>
> Our first implementation used the erlzmq2 library to read the socket.
> We then parsed the messages in Erlang and sent Erlang terms to the
> data handling processes.
>
> After seeing the problem behavior we suspected that the repeated calls
> to erlzmq:recv() and parsing in Erlang might be the cause of the
> backup so we rewrote that code as a NIF (background thread + several
> API calls).  Our NIF implementation reads the ZMQ socket, parses the
> data and then sends it to the data handling processes.  We (obviously,
> I suppose) create one of these background threads for each of the 60+
> data source sockets.
>
> Despite the entirely different implementation of ZMQ handling, parsing
> and dispatch of the data, we are seeing the same issue: first the load
> drops off precipitously and then the data starts queuing in the ZMQ
> socket buffers and the program is unusable.
>
>
> We are curious if anyone has seen this sort of behavior with BEAM or
> might have suggestions on where to look for the issue.
>
>
> Thanks,
>
> Tim