[erlang-questions] enif_send() and overrun heap

Guilherme Andrade g@REDACTED
Wed Jun 20 16:18:05 CEST 2018


On Wed, 20 Jun 2018, 14:42 Daniel Goertzen, <daniel.goertzen@REDACTED>
wrote:

> I wonder if running under a dirty scheduler counts as "running from a
> created thread" in which case the first parameter to enif_send() should be
> NULL.
>

Good point! That seemed to do it, thanks.

May someone from OTP  team confirm dirty schedulers count as created
threads for enif_send? (It makes perfect sense they would, but it's best to
check.)

If confirmed, maybe I'll do a PR to make documentation more explicit about
this corner case.



> On Wed, 20 Jun 2018 at 08:30 Guilherme Andrade <g@REDACTED> wrote:
>
>> Hello list,
>>
>> I'm bumping into a weird issue in OTP 20.3 (macOS) whereby calling
>> enif_send() a few dozen times a second from a dirty scheduler (CPU bound),
>> with msg_env=NULL, results in heap overrun.
>>
>> These is the flow that sooner or later results in heap overrun:
>>
>> 1) Single Erlang process makes a specific NIF call ~25 times per second
>> (dirty scheduler, CPU bound)
>> 2) This call will receive a few network packets (non-blocking)
>> 3) Each of these packets gets wrapped in a tuple (allocated in process
>> env)
>> 4) For each of these wrapped packets, a lookup is made in a map, passed
>> as a NIF argument, for a process dedicated to handling this particular
>> packet
>> 5.a) If the lookup succeeds, enif_send() is called to dispatch the
>> wrapped packet to said dedicated process (with msg_env=NULL) - this is what
>> happens to most packets
>> 5.b) If the lookup fails, the wrapped packet is accumulated and later
>> returned to the NIF caller
>>
>> Now, when total packets per second increase to a few dozen, sooner or
>> later (sometimes as soon as after ~10 seconds) the VM stops abruptly with
>> this error message:
>>
>> > hend=0x0000000013655fa0
>> > stop=0x0000000013655ef8
>> > htop=0x00000000136562c8
>> > heap=0x0000000013652db0
>> > beam/erl_gc.c, line 708: <0.506.0>: Overrun stack and heap
>>
>> (The pid mentioned above corresponds to the NIF caller.)
>>
>> I tried three things (independently) that prevent the overrun from
>> happening under this load:
>> A) Increasing the NIF caller heap size from the default (233 words) to
>> 23300 words
>> B) Not running the NIF under a dirty scheduler
>> C) Not calling enif_send
>>
>> Any ideas on why the overrun happens? Am I missing some very obvious
>> transgression in the way enif_send() or dirty schedulers are supposed to be
>> used?
>>
>>
>> --
>> Guilherme
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20180620/2b3704d8/attachment.htm>


More information about the erlang-questions mailing list