[erlang-questions] enif_send() and overrun heap

Daniel Goertzen daniel.goertzen@REDACTED
Wed Jun 20 17:17:32 CEST 2018


Hmm, I think there's more to it.  The source shows that dirty schedulers
*are* specifically handled in enif_send(), so something else is up.



On Wed, 20 Jun 2018 at 09:18 Guilherme Andrade <g@REDACTED> wrote:

>
> On Wed, 20 Jun 2018, 14:42 Daniel Goertzen, <daniel.goertzen@REDACTED>
> wrote:
>
>> I wonder if running under a dirty scheduler counts as "running from a
>> created thread" in which case the first parameter to enif_send() should be
>> NULL.
>>
>
> Good point! That seemed to do it, thanks.
>
> Could someone from the OTP team confirm that dirty schedulers count as
> created threads for enif_send()? (It makes perfect sense that they would,
> but it's best to check.)
>
> If confirmed, maybe I'll do a PR to make the documentation more explicit
> about this corner case.
>
>
>
>> On Wed, 20 Jun 2018 at 08:30 Guilherme Andrade <g@REDACTED> wrote:
>>
>>> Hello list,
>>>
>>> I'm bumping into a weird issue in OTP 20.3 (macOS) whereby calling
>>> enif_send() a few dozen times a second from a dirty scheduler (CPU bound),
>>> with msg_env=NULL, results in heap overrun.
>>>
>>> This is the flow that sooner or later results in heap overrun:
>>>
>>> 1) Single Erlang process makes a specific NIF call ~25 times per second
>>> (dirty scheduler, CPU bound)
>>> 2) This call will receive a few network packets (non-blocking)
>>> 3) Each of these packets gets wrapped in a tuple (allocated in process
>>> env)
>>> 4) For each of these wrapped packets, a lookup is made in a map, passed
>>> as a NIF argument, for a process dedicated to handling this particular
>>> packet
>>> 5.a) If the lookup succeeds, enif_send() is called to dispatch the
>>> wrapped packet to said dedicated process (with msg_env=NULL) - this is what
>>> happens to most packets
>>> 5.b) If the lookup fails, the wrapped packet is accumulated and later
>>> returned to the NIF caller
>>>
>>> Now, when total packets per second increase to a few dozen, sooner or
>>> later (sometimes as soon as after ~10 seconds) the VM stops abruptly with
>>> this error message:
>>>
>>> > hend=0x0000000013655fa0
>>> > stop=0x0000000013655ef8
>>> > htop=0x00000000136562c8
>>> > heap=0x0000000013652db0
>>> > beam/erl_gc.c, line 708: <0.506.0>: Overrun stack and heap
>>>
>>> (The pid mentioned above corresponds to the NIF caller.)
>>>
>>> I tried three things (independently) that prevent the overrun from
>>> happening under this load:
>>> A) Increasing the NIF caller heap size from the default (233 words) to
>>> 23300 words
>>> B) Not running the NIF under a dirty scheduler
>>> C) Not calling enif_send
>>>
>>> Any ideas on why the overrun happens? Am I missing some very obvious
>>> transgression in the way enif_send() or dirty schedulers are supposed to be
>>> used?
>>>
>>>
>>> --
>>> Guilherme
>>>
>>