Repeated reference in select messages from the new socket module

Guilherme Andrade g@REDACTED
Wed Oct 21 17:14:01 CEST 2020


I've finally found the chance to look into this matter again.

On Thu, 8 Oct 2020 at 09:46, Raimo Niskanen <
raimo+erlang-questions@REDACTED> wrote:

> On Sat, Oct 03, 2020 at 03:00:10AM +0100, Guilherme Andrade wrote:
> > Hello list,
> >
> > Today I found a peculiar situation when using the new socket[1] module.
> >
> > Upon a `socket:send/3` call with `nowait` as the timeout returning
> > `{select, SelectInfo}`, the controlling process will sometimes receive
> > the same reference in two asynchronous select messages; the second
> > message comes later, /when/ the socket is closed by a separate process
> > (not the controlling one) at just the right time. The conditions are
> > hard to replicate.
> >
> > That is:
> > - the controlling process first gets a `{'$socket', socket(), select,
> > SelectRef}` message when the socket is available for writing
> > - the controlling process then gets a second `{'$socket', socket(),
> > abort, {SelectRef, closed}}` message
> > ...and `SelectRef` is the same for both.
> >
> > I looked for the root cause within `prim_socket_nif.c` (OTP 22.3.4.10)
> > and, if I'm interpreting it correctly, this may happen upon 1) the
> > socket becoming available for writing and dispatching the message
> > earlier passed onto `enif_select_write`[2] and 2) a secondary process
> > closing the socket and dispatching the abort message while the
> > controlling process is still registered as a writer.
> >
> > However, the C NIF code responsible for handling `socket` stuff is quite
> > a lot to take in in an afternoon, and I may have misunderstood it.
> >
> > Is my theory correct? Can select messages with a duplicate reference be
> > dispatched to the same process? And is this expected, or possibly a bug?
>
> It is not supposed to happen.  I'd call it a bug.  We will definitely
> look into this.
>

Here's an escript that quickly reproduces the issue on my machine (OTP
22.3.4.10, macOS):
https://gist.github.com/g-andrade/42ee10e5e1fc97c157ce0dc627cbf2b7


>
> >
> > I worked around it by flushing the controlling process's message queue
> > with `receive`, generically consuming any remaining `select` messages
> > related to that socket so that the unexpected second message is
> > discarded.
> >
> > I can distill the code that replicates it (on OTP 22.3.4.10, macOS) but
> > I'm asking about it first, just in case this is known or expected
> > somehow. The duplicate reference did catch me off guard.
>
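
For reference, a minimal sketch of that kind of flush, assuming the two
message shapes described above (`select` and `abort`). The module and
function names (`select_flush`, `flush/1`) are hypothetical and not taken
from the gist or from OTP:

```erlang
-module(select_flush).
-export([flush/1]).

%% Discard every '$socket' select/abort message for Socket that is
%% already in the calling process's mailbox, whatever its reference.
%% Returns the number of messages discarded.
flush(Socket) ->
    flush(Socket, 0).

flush(Socket, Count) ->
    receive
        {'$socket', Socket, select, _SelectRef} ->
            flush(Socket, Count + 1);
        {'$socket', Socket, abort, {_SelectRef, _Reason}} ->
            flush(Socket, Count + 1)
    after 0 ->
        %% Nothing (more) queued for this socket right now.
        Count
    end.
```

Since it uses `after 0`, this only removes messages that have already
arrived; an abort dispatched after the call would still be delivered, so
it is best invoked once the socket is known to be closed.
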
> There have been lots of rewrites on the master branch, for instance
> regarding lock handling, so it would be nice to know if this bug is
> still present on master.
>

I ran the same script under OTP 23.1.1 and could not reproduce the issue,
so it's very possibly been fixed somewhere along the way.

Should I create a ticket at bugs.erlang.org?


>
> / Raimo
>
>
>
> >
> > Cheers!
> >
> > [1]: https://erlang.org/doc/man/socket.html
> > [2]: http://erlang.org/doc/man/erl_nif.html#enif_select_write
> >
> > --
> > Guilherme
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>


-- 
Guilherme