[erlang-questions] enif_send() and overrun heap

Guilherme Andrade g@REDACTED
Wed Jun 20 15:30:20 CEST 2018

Hello list,

I'm bumping into a weird issue in OTP 20.3 (macOS) whereby calling
enif_send() a few dozen times a second from a dirty scheduler (CPU bound),
with msg_env=NULL, results in heap overrun.

This is the flow that sooner or later results in a heap overrun:

1) Single Erlang process makes a specific NIF call ~25 times per second
(dirty scheduler, CPU bound)
2) This call will receive a few network packets (non-blocking)
3) Each of these packets gets wrapped in a tuple (allocated in process env)
4) For each of these wrapped packets, a lookup is made in a map, passed as
a NIF argument, for a process dedicated to handling this particular packet
5.a) If the lookup succeeds, enif_send() is called to dispatch the wrapped
packet to said dedicated process (with msg_env=NULL) - this is what happens
to most packets
5.b) If the lookup fails, the wrapped packet is accumulated and later
returned to the NIF caller

Now, when total packets per second increase to a few dozen, sooner or later
(sometimes as soon as after ~10 seconds) the VM stops abruptly with this
error message:

> hend=0x0000000013655fa0
> stop=0x0000000013655ef8
> htop=0x00000000136562c8
> heap=0x0000000013652db0
> beam/erl_gc.c, line 708: <0.506.0>: Overrun stack and heap

(The pid mentioned above corresponds to the NIF caller.)
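
For reference, the four pointers in the crash message can be turned into word counts. The sketch below is only an illustration of that arithmetic: it assumes a 64-bit VM (8-byte words), and the helper names words_between() and print_report() are made up for this example, not part of ERTS. The message itself already shows htop above both stop and hend, which is presumably what the sanity check in erl_gc.c trips on.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-bit VM, so one Erlang word is 8 bytes. */
#define WORD_BYTES 8

/* Pointer values copied verbatim from the crash message. */
static const uint64_t HEAP = 0x0000000013652db0ULL; /* start of heap block   */
static const uint64_t HTOP = 0x00000000136562c8ULL; /* next free heap word   */
static const uint64_t STOP = 0x0000000013655ef8ULL; /* top of the stack      */
static const uint64_t HEND = 0x0000000013655fa0ULL; /* end of the heap block */

/* Hypothetical helper: distance from `from` to `to` in words. */
static long long words_between(uint64_t from, uint64_t to)
{
    return ((long long)to - (long long)from) / WORD_BYTES;
}

/* Hypothetical helper: print the derived sizes. */
static void print_report(void)
{
    printf("heap block size : %lld words\n", words_between(HEAP, HEND));
    printf("stack in use    : %lld words\n", words_between(STOP, HEND));
    printf("htop past stop  : %lld words\n", words_between(STOP, HTOP));
    printf("htop past hend  : %lld words\n", words_between(HEND, HTOP));
}
```

Calling print_report() from a main() gives a heap block of 1598 words, 21 words of stack in use, and htop sitting 122 words past stop (101 words past hend). Under the 8-byte-word assumption, something wrote roughly a hundred words beyond the end of the caller's heap block.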

I tried three things (independently) that prevent the overrun from
happening under this load:
A) Increasing the NIF caller heap size from the default (233 words) to
23300 words
B) Not running the NIF under a dirty scheduler
C) Not calling enif_send

Any ideas on why the overrun happens? Am I missing some very obvious
transgression in the way enif_send() or dirty schedulers are supposed to be
