Repeated reference in select messages from the new socket module

Raimo Niskanen raimo+erlang-questions@REDACTED
Thu Oct 8 10:46:00 CEST 2020


On Sat, Oct 03, 2020 at 03:00:10AM +0100, Guilherme Andrade wrote:
> Hello list,
> 
> Today I found a peculiar situation when using the new socket[1] module.
> 
> Upon a `:send/3` call with `nowait` as the timeout returning `{select,
> SelectInfo}`, the controlling process will sometimes receive a duplicate
> reference within two asynchronous select messages; the second message comes
> later /when/ the socket is closed by a separate process (not the
> controlling one) just at the right time - the conditions are hard to
> replicate.
> 
> That is:
> - the controlling process first gets a `{'$socket', socket(), select,
> SelectRef}` message when the socket is available for writing
> - the controlling process then gets a second `{'$socket', socket(), abort,
> {SelectRef, closed}}` message
> ...and `SelectRef` is the same for both.
> 
> I looked for the root cause within `prim_socket_nif.c` (OTP 22.3.4.10) and,
> if I'm interpreting it correctly, this may happen upon 1) the socket
> becoming available for writing and dispatching the message earlier passed
> onto `enif_select_write`[2] and 2) a secondary process closing the socket
> and dispatching the abort message while the controlling process is still
> registered as a writer.
> 
> However, the C NIF code responsible for handling `socket` stuff is quite a
> lot to digest in an afternoon, and I may have misunderstood it.
> 
> Is my theory correct? Can select messages with a duplicate reference be
> dispatched to the same process? And is this expected, or possibly a bug?
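
For illustration, the two messages described in the quoted report could be
matched roughly as follows. This is a minimal sketch that assumes only the
message shapes quoted above; the module name `select_wait` and the function
`await/3` are hypothetical and not part of OTP.

```erlang
-module(select_wait).
-export([await/3]).

%% Wait for the outcome of one pending select on Sock, identified by Ref.
%% Hypothetical helper: only the message shapes come from the report.
await(Sock, Ref, Timeout) ->
    receive
        {'$socket', Sock, select, Ref} ->
            %% The socket became writable; the caller may retry the send.
            ready;
        {'$socket', Sock, abort, {Ref, Reason}} ->
            %% The pending select was aborted, e.g. Reason =:= closed.
            {aborted, Reason}
    after Timeout ->
        timeout
    end.
```

With both messages in the mailbox, two consecutive calls with the same
`Ref` would each match one of them, which is the duplicate being reported.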

It is not supposed to happen.  I'd call it a bug.  We will definitely
look into this.

> 
> I worked around it by flushing the controlling process's message queue with
> `receive`, generically consuming any remaining `select` messages related to
> that socket so that the unexpected second message is dropped.
> 
> I can distill the code that replicates it (on OTP 22.3.4.10, macOS) but
> I'm asking about it first, just in case this is known or expected somehow.
> The duplicate reference did catch me off guard.
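
The flush workaround mentioned in the quoted text might look something like
the sketch below. It is not the poster's actual code: the module name
`socket_flush` and the function `flush/1` are hypothetical, and as an
assumption it drops every pending `'$socket'` message for the socket
(both `select` and `abort`), not only `select` ones.

```erlang
-module(socket_flush).
-export([flush/1]).

%% Drop every pending '$socket' message for Sock from the caller's mailbox
%% and return how many were dropped. Other messages are left untouched.
flush(Sock) ->
    flush(Sock, 0).

flush(Sock, Dropped) ->
    receive
        {'$socket', Sock, _Tag, _Info} ->
            flush(Sock, Dropped + 1)
    after 0 ->
        %% Nothing more queued for this socket right now.
        Dropped
    end.
```

Note that `after 0` only drains what is already queued; a message that
arrives after the flush returns would still have to be ignored elsewhere.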

There have been lots of rewrites on the master branch, for instance
regarding lock handling, so it would be nice to know if this bug is
still present on master.

/ Raimo



> 
> Cheers!
> 
> [1]: https://erlang.org/doc/man/socket.html
> [2]: http://erlang.org/doc/man/erl_nif.html#enif_select_write
> 
> -- 
> Guilherme

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

