Repeated reference in select messages from the new socket module

Raimo Niskanen raimo+erlang-questions@REDACTED
Fri Oct 23 11:13:03 CEST 2020


Since socket is experimental we are not very keen on fixing bugs
on the 22 track.  If you can reproduce on maint or master, then
we could make an effort to fix it.  Probably on maint, absolutely
on master.

Thank you for investigating!
/ Raimo


On Wed, Oct 21, 2020 at 04:14:01PM +0100, Guilherme Andrade wrote:
> I've finally found the chance to look into this matter again.
> 
> On Thu, 8 Oct 2020 at 09:46, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> 
> > On Sat, Oct 03, 2020 at 03:00:10AM +0100, Guilherme Andrade wrote:
> > > Hello list,
> > >
> > > Today I found a peculiar situation when using the new socket[1] module.
> > >
> > > Upon a `socket:send/3` call with `nowait` as the timeout returning
> > > `{select, SelectInfo}`, the controlling process will sometimes receive
> > > the same reference in two asynchronous select messages; the second
> > > message arrives later, /when/ the socket is closed by a separate process
> > > (not the controlling one) at just the right time. The conditions are
> > > hard to replicate.
> > >
> > > That is:
> > > - the controlling process first gets a
> > >   `{'$socket', socket(), select, SelectRef}` message when the socket is
> > >   available for writing
> > > - the controlling process then gets a second
> > >   `{'$socket', socket(), abort, {SelectRef, closed}}` message
> > > ...and `SelectRef` is the same for both.
> > >
> > > I looked for the root cause within `prim_socket_nif.c` (OTP 22.3.4.10)
> > > and, if I'm interpreting it correctly, this may happen upon 1) the socket
> > > becoming available for writing and dispatching the message earlier passed
> > > onto `enif_select_write`[2] and 2) a secondary process closing the socket
> > > and dispatching the abort message while the controlling process is still
> > > registered as a writer.
> > >
> > > However, the C NIF code responsible for handling `socket` stuff is quite
> > > a lot to take in in an afternoon, and I may have misunderstood it.
> > >
> > > Is my theory correct? Can select messages with a duplicate reference be
> > > dispatched to the same process? And is this expected, or possibly a bug?
> >
> > It is not supposed to happen.  I'd call it a bug.  We will definitely
> > look into this.
> >
> 
> Here's an escript that quickly reproduces the issue on my machine (OTP
> 22.3.4.10, macOS):
> https://gist.github.com/g-andrade/42ee10e5e1fc97c157ce0dc627cbf2b7
> 
> 
> >
> > >
> > > I worked around it by flushing the controlling process's message queue
> > > with `receive`, generically consuming any remaining `select` messages
> > > related to that socket so that the unexpected second message is
> > > discarded.
> > >
> > > I can distill the code that replicates it (on OTP 22.3.4.10, macOS) but
> > > I'm asking about it first, just in case this is known or expected
> > > somehow. The duplicate reference did catch me off guard.
> >
> > There have been lots of rewrites on the master branch, for instance
> > regarding lock handling, so it would be nice to know if this bug is
> > still present on master.
> >
> 
> I ran the same script under OTP 23.1.1 and could not reproduce the issue,
> so it has quite possibly been fixed somewhere along the way.
> 
> Should I create a ticket at bugs.erlang.org?
> 
> 
> >
> > / Raimo
> >
> >
> >
> > >
> > > Cheers!
> > >
> > > [1]: https://erlang.org/doc/man/socket.html
> > > [2]: http://erlang.org/doc/man/erl_nif.html#enif_select_write
> > >
> > > --
> > > Guilherme
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >
> 
> 
> -- 
> Guilherme

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

