OTP socket.erl, unexpected interaction when receiving from errqueue

Wed Jan 13 17:35:06 CET 2021

We have sketched a possible implementation that works with the VM's
select machinery.

This is how it supposedly would behave from the socket user's point of view.

An application that wishes to distinguish between input and error events
sets the socket option {otp,select_error} to 'true'.  If the platform does
not support poll() and epoll(), setting the option fails.

When calling socket:recv*(..., nowait | Ref), if you get a
SelectInfo return you will later get one or possibly both of these
process messages:
    {'$socket', Socket, select, Ref}
    {'$socket', Socket, select_error, Ref}
The socket is registered for both POLLIN and POLLERR.
socket:cancel(Socket, SelectInfo) clears both.
Calling socket:recv*(...) also clears both.
Any 'select' or 'select_error' message in the process mailbox
from the select operation in progress is also removed.
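
As a rough sketch, a receive loop under this proposal could look something
like the following.  Note that {otp,select_error} and the 'select_error'
message are only what is proposed in this mail and do not exist in any
release; the module name and the io:format/2 calls are placeholders.

```erlang
-module(select_error_sketch).
-export([start/1]).

%% HYPOTHETICAL: {otp,select_error} is the option proposed in this mail,
%% so on a current OTP this setopt call is expected to fail.
start(Socket) ->
    ok = socket:setopt(Socket, {otp,select_error}, true),
    loop(Socket).

loop(Socket) ->
    case socket:recvfrom(Socket, 0, [], nowait) of
        {ok, {Source, Data}} ->
            io:format("data from ~p: ~p~n", [Source, Data]),
            loop(Socket);
        {select, {select_info, _Tag, Ref}} ->
            wait(Socket, Ref);
        {error, Reason} ->
            {error, Reason}
    end.

%% One or possibly both messages may arrive for the same Ref.
wait(Socket, Ref) ->
    receive
        {'$socket', Socket, select, Ref} ->
            loop(Socket);
        {'$socket', Socket, select_error, Ref} ->   % proposed message
            drain_errqueue(Socket),
            loop(Socket)
    end.

%% Poll the error queue (Timeout =:= 0) until nothing more is returned.
drain_errqueue(Socket) ->
    case socket:recvmsg(Socket, [errqueue], 0) of
        {ok, Msg} ->
            io:format("error queue: ~p~n", [Msg]),
            drain_errqueue(Socket);
        {error, _} ->
            ok
    end.
```

The loop acts on whichever message arrives first and relies on the next
socket:recvfrom/4 call to clear the registration and any leftover message,
as described above.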

It is the socket user's responsibility to handle the socket option
{ip,recverr} and the receive flag 'errqueue' appropriately for
this use case.

socket:send*(..., nowait | Ref) does not generate any 'select_error'
process message, and does not register for POLLERR.

Another possibility might be to squeeze in a flag/option/switch for all
socket:recv*() operations, to change them into only using POLLERR, so it
will be the socket user's responsibility to combine the recv flag
'errqueue' with this flag/option/switch, and then the normal 'select'
process message would mean that POLLERR has triggered.

This sounds simpler but would require parallel POLLIN and POLLERR
operations from different processes, and therefore another job queue in
the NIF to handle queueing POLLERR.  Also, the user's code would probably
need to handle POLLERR in a different process than POLLIN.  I think it
would be impossible to handle both POLLIN and POLLERR in the same process.

Is this a feasible solution, or should we stay with the suggested
workarounds associated with not being able to distinguish between POLLIN
and POLLERR...?
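
For comparison, the workaround discussed in the quoted mails below could be
sketched roughly like this with the existing API.  This is only a sketch:
the module name and the printing are placeholders, and it assumes that an
{error,_} return from the Timeout =:= 0 read means the error queue is empty.

```erlang
-module(errqueue_workaround_sketch).
-export([loop/1]).

loop(Socket) ->
    loop(Socket, false).

%% AfterSelect is true when we were just woken up by a 'select' message.
loop(Socket, AfterSelect) ->
    case socket:recvfrom(Socket, 0, [], nowait) of
        {ok, {Source, Data}} ->
            io:format("data from ~p: ~p~n", [Source, Data]),
            loop(Socket, false);
        {select, SelectInfo} when AfterSelect ->
            %% Woken up, but no data: take it as a hint that the
            %% error queue has something, and poll it.
            _ = socket:cancel(Socket, SelectInfo),
            drain_errqueue(Socket),
            loop(Socket, false);
        {select, {select_info, _Tag, Ref}} ->
            receive
                {'$socket', Socket, select, Ref} ->
                    loop(Socket, true)
            end;
        {error, Reason} ->
            %% An error return is also a cue to read the error queue.
            drain_errqueue(Socket),
            {error, Reason}
    end.

%% Poll the error queue (Timeout =:= 0) until nothing more is returned.
drain_errqueue(Socket) ->
    case socket:recvmsg(Socket, [errqueue], 0) of
        {ok, Msg} ->
            io:format("error queue: ~p~n", [Msg]),
            drain_errqueue(Socket);
        {error, _} ->
            ok
    end.
```

The error queue is only read when a wakeup produced no data, so with
continuously incoming data the error messages could be starved out.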

/ Raimo

On Mon, Jan 11, 2021 at 11:02:26AM +0100, Raimo Niskanen wrote:
> On Fri, Jan 08, 2021 at 06:26:19PM +0100, Andreas Schultz wrote:
> > On Fri, 8 Jan 2021 at 16:44, Raimo Niskanen <
> > raimo+erlang-questions@REDACTED> wrote:
> > 
> > > The VM guys ran into problems when testing this feature.
> > > It seems to be not very portable, e.g. on platforms
> > > with only select there is no notion of an error condition,
> > > nor with epoll.  It is only poll that can supply this info,
> > > and what renders a POLLERR differs between platforms.
> > >
> > 
> > IP_RECVERR is Linux specific, it is therefore not surprising that other
> > OSes behave differently.
> 
> The problem is with closing the far end of a pipe, then Linux returns
> POLLERR, but FreeBSD does not.  We were hoping to find a platform
> independent way to verify that POLLERR works.
> 
> > From the Linux man page, I would expect that EPOLLERR works,
> 
> On Ubuntu 18.04, epoll(7) only mentions EPOLLIN | EPOLLOUT,
> not EPOLLERR.  But it is present in sys/epoll.h, as is EPOLLHUP,
> which corresponds to POLLHUP, also not documented in epoll(7).
> 
> I keep forgetting that one can not trust Linux documentation
> to be complete. :-(
> 
> > 
> > > Furthermore I have done some manual testing of the socket
> > > option {ip,recverr}:
> :
> > The sequence of events very much depends on the state of the socket when
> > you call the recvfrom/recvmsg, leading to an unstable behavior.
> > Consider this example:
> > 
> > Erlang/OTP 24 [DEVELOPMENT] [erts-11.1.4] [source-a348f5a237] [64-bit]
> > [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
> > 
> > Eshell V11.1.4  (abort with ^G)
> > 1> {ok, S} = socket:open(inet, dgram).
> > {ok,{'$socket',#Ref<0.1380161320.4108451842.177694>}}
> > 2> socket:setopt(S, {ip,recverr}, true).
> > ok
> > 3> socket:sendto(S, "hello", #{family => inet, addr => {127, 0, 0, 1}, port
> > => 44444}).
> > ok
> > 4> socket:sendto(S, "hello", #{family => inet, addr => {127, 0, 0, 1}, port
> > => 44444}).
> > {error,econnrefused}
> 
> The possible workaround was that here you got an error return value from
> the socket, and you have set {ip,recverr} so therefore you should read
> the error queue now.
> 
> 
> > 5> socket:recvfrom(S, 0, [], nowait).
> > {select,{select_info,recvfrom,
> >                      #Ref<0.1380161320.4108320780.177852>}}
> > 6> flush().
> > Shell got {'$socket',{'$socket',#Ref<0.1380161320.4108451842.177694>},
> >                      select,#Ref<0.1380161320.4108320780.177852>}
> > ok
> > 7> socket:recvfrom(S, 0, [], nowait).
> > {select,{select_info,recvfrom,
> >                      #Ref<0.1380161320.4108320780.177867>}}
> > 8> flush().
> > Shell got {'$socket',{'$socket',#Ref<0.1380161320.4108451842.177694>},
> >                      select,#Ref<0.1380161320.4108320780.177867>}
> > 
> > We get an endless stream of select_infos without any hint that there might
> > be something hiding in the error queue.
> > 
> > > I think this approach might be a good one.  When getting an error;
> > > read the error queue with timeout 0, maybe twice to give it higher
> > > effective priority, then back to normal reading.
> > >
> > 
> > To avoid the endless select loop from above, the user would have to do a
> > *speculative / blind* read of the error queue.
> 
> I agree that we want to avoid having to do a speculative read of the error
> queue for every read operation because that would hurt performance.
> 
> There is a maybe useful indication here: after getting a select message,
> reading with 'nowait' returns no data but instead a new
> select continuation.
> 
> That could be taken as a "safe hint" to next time first do a not so
> speculative read of the error queue.  If you get data you do not need to
> read the error queue.
> 
> This speculative read of the error queue should also very rarely be needed
> since when there is an error condition on the socket the normal read
> returns an error; then you should read the error queue.
> 
> It is only when some other socket operation gets the error value, such as
> the send in your example, which might be done outside the read loop, that
> this speculative read would be triggered.
> 
> 
> > 
> > The IP_RECVERR flag has to be set explicitly, such a situation can
> > therefore not sneak up unexpectedly on a user. So this would IMHO be
> > acceptable, but needs very explicit documentation somewhere (maybe near the
> > recverr flag?)
> 
> That is the least we should do.  But if documenting peculiarities turns out
> to be insufficient, we have the possibility to add something not 100%
> backwards compatible to the socket interface for OTP-24.0, if that would be
> a better technical solution than something backwards compatible.
> 
> 
> > 
> > I still would prefer to get the POLLERR vs. POLLIN indication in the select
> > message. Even when it only works on Linux and not on others.
> > The MSG_ERRQUEUE recvmsg flag and the whole IP_RECVERR mechanism to get the
> > payload of ICMP errors for UDP sockets is not portable and only usable on
> > Linux anyhow.
> > The classical approach on e.g. FreeBSD seems to involve an additional RAW
> > socket to read the ICMPs. Not sure how/if that is supposed to work on
> > Windows or OSX.
> > 
> > The problem I see with reporting the poll flags is that they are a
> > bitfield. Having both POLLIN and POLLERR set is therefore possible.
> > Representing this as a map or proplist seems a bit wasteful.
> > Since we always only request `POLLIN | POLLERR` and `POLLOUT | POLLERR`,
> 
> Another problem is that you can not request POLLERR (the flag is ignored);
> you get it anyway, and as you say maybe in combination with either, both
> or none of POLLIN | POLLOUT.
> 
> > maybe using a select_info tuple of `{select_info, select_tag(),
> > select_ref(), select_opt()}` with `select_opt() :: {ReadWrite :: boolean(),
> > Error :: boolean()}` could work ???
> 
> We have to fit it into the VM's enif_select machinery, and currently you
> set the message when requesting select_read or select_write.  How the
> error flag should be returned is a tricky question since that would require
> changing the message that you set.
> 
> / Raimo
> 
> 
> > 
> > Regards,
> > Andreas
> > 
> > 
> > > What do you think?
> > >
> > > / Raimo
> > >
> > >
> > > On Fri, Dec 04, 2020 at 10:45:03AM +0100, Raimo Niskanen wrote:
> > > > > The VM team has now planned a task to extend the NIF API to handle
> > > > > select/poll on error, i.e. enif_select_error() and ERL_NIF_SELECT_ERROR.
> > > >
> > > > When that is done we can use that in the socket API, probably exactly as
> > > > you first suggested.
> > > >
> > > > A question is if we should just add that, and all code using
> > > > the socket API has to be prepared for a new message, or if we need
> > > > an option on the 'otp' protocol level.
> > > >
> > > > The Experimental status of the socket API allows for such a
> > > > backwards-incompatible change, but that does not mean that
> > > > we need to do one...
> > > >
> > > > Until then I have just merged, into 'master', an optimization of
> > > > recv with Timeout =:= 0 so it skips the select/cancel select dance.
> > > >
> > > > Cheers
> > > > / Raimo
> > > >
> > > >
> > > > On Wed, Nov 25, 2020 at 02:45:22PM +0100, Raimo Niskanen wrote:
> > > > > On Wed, Nov 25, 2020 at 01:42:49PM +0100, Andreas Schultz wrote:
> > > > > > > On Wed, 25 Nov 2020 at 11:37, Raimo Niskanen <
> > > > > > > raimo+erlang-questions@REDACTED> wrote:
> > > > > >
> > > > > > > Is it so that recvmsg(fd, &msg, MSG_ERRQUEUE) only receives from
> > > > > > > the error queue, and never any regular data?
> > > > > > >
> > > > > >
> > > > > > > That is my understanding from the man page. Experiments also confirm
> > > > > > > this.
> > > > >
> > > > > > The man page is not entirely unambiguous.  But I found a stackoverflow
> > > > > > thread that also confirms this (also referring to experiments).
> > > > > >
> > > > > > https://stackoverflow.com/questions/17326913/linux-udp-socket-recvmsg-with-msg-errqueue
> > > > >
> > > > > >
> > > > > > > Reading the errqueue is actually quite hard to test. The behavior for
> > > > > > > local errors and remote errors (e.g. reception of ICMP errors) is
> > > > > > > sometimes different.
> > > > > > Small sample:
> > > > > >
> > > > > > 1> {ok, Socket} = socket:open(inet, dgram, udp).
> > > > > > {ok,{'$socket',#Ref<0.1577644963.140640257.162512>}}
> > > > > > 2> ok = socket:setopt(Socket, ip, recverr, true).
> > > > > > ok
> > > > > > 3> Dest = #{family => inet, addr => {127,0,0,1}, port => 1234}.
> > > > > > #{addr => {127,0,0,1},family => inet,port => 1234}
> > > > > > 4> socket:sendto(Socket, <<"Data">>, Dest, nowait).
> > > > > > ok
> > > > > > 5> socket:sendto(Socket, <<"Data">>, Dest, nowait).
> > > > > > {error,{econnrefused,[<<"Data">>]}}
> > > > >
> > > > > Weird to get econnrefused from sendto() on an unconnected dgram socket.
> > > > >
> > > > > >
> > > > > >
> > > > > > > The first sendto returns `ok`, the error can be read in subsequent
> > > > > > > recvmsg.
> > > > > > > The second sendto returns the error immediately because the kernel has
> > > > > > > learned that the local endpoint does not exist.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > Today there is no VM support for a NIF to distinguish between POLLIN
> > > > > > > > and POLLERR.  I have asked the VM guys to have a look at that.
> > > > > > >
> > > > > > > > Without that you can, as your response to receiving
> > > > > > > > {'$socket',Socket,select,SelectHandle},
> > > > > > > > call socket:recvmsg(Socket, [errqueue], 0) to poll, and then,
> > > > > > > > if the poll gave {error,timeout}, call
> > > > > > > > socket:recvmsg(Socket, 0, nowait).
> > > > > > >
> > > > > >
> > > > > > > I was actually thinking about doing
> > > > > > > `socket:recvmsg(Socket, [errqueue], nowait)`
> > > > > > > if
> > > > > > >    a) just received a select message and
> > > > > > >    b) socket:recvfrom returned a new select info instead of reading
> > > > > > >       any data
> > > > > >
> > > > > > That should capture the situation where only data in the errqueue is
> > > > > > present without having to
> > > > > > use the 0 timeout.
> > > > >
> > > > > > Might there not be a possibility to starve out error messages in the
> > > > > > face of continuously incoming data with this approach?
> > > > >
> > > > > >
> > > > > > > > Then we will have to optimize Timeout =:= 0, and maybe introduce
> > > > > > > Timeout =:= poll with a nicer return value for no data.
> > > > > > >
> > > > > >
> > > > > > > I like the general idea, but the `nowait` option now feels a bit
> > > > > > > wrong. We end up with `nowait == do a select` and `poll == just check
> > > > > > > if there is data`.
> > > > > > > It might be too late for changing `nowait`, but what about adding
> > > > > > > `select` as an alias to `nowait`?
> > > > >
> > > > > > I think it is too late to change 'nowait' now, and it is not entirely
> > > > > > off.
> > > > > It states that the call will not wait, but not what happens instead.
> > > > > While 'select' states what to do, but only sometimes.
> > > > >
> > > > > 'poll' also states what to do.  We could add 'select' as an alias for
> > > > > 'nowait', some might find it more appropriate.
> > > > >
> > > > > We thought about having only polling operations and a dedicated select
> > > > > operation, but that would need minimum two NIF calls also when handling
> > > > > much data, so we decided on doing automatic select depending on the
> > > > > result of the operation.
> > > > >
> > > > > / Raimo
> > > > >
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Andreas
> > > > > >
> > > > > > > > Timeout =:= 0 today causes quite some overhead, but it should work.
> > > > > > >
> > > > > > > Cheers
> > > > > > > / Raimo
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 23, 2020 at 05:42:23PM +0100, Andreas Schultz wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > > I set up a socket with:
> > > > > > > > >
> > > > > > > > >     socket:setopt(Socket, ip, recverr, true)
> > > > > > > > >
> > > > > > > > > I then started an asynchronous recvmsg with:
> > > > > > > >
> > > > > > > >     {select, SelectInfo} = socket:recvfrom(Socket, 0, [], nowait)
> > > > > > > >
> > > > > > > > > When something now arrives in the error queue, I'll get a select
> > > > > > > > > info message with:
> > > > > > > >
> > > > > > > >     {'$socket', Socket, select, SelectInfo}
> > > > > > > >
> > > > > > > > > The problem is, nothing in there tells me to read from the error
> > > > > > > > > queue. The underlying OS poll/epoll call would have this
> > > > > > > > > information, but it is lost in the Erlang message.
> > > > > > > >
> > > > > > > > When I now try to read from the socket with:
> > > > > > > >
> > > > > > > >    socket:recvfrom(Socket, 0, [], nowait)
> > > > > > > >
> > > > > > > > > all I get is another `{select, SelectInfo}` tuple, followed by
> > > > > > > > > another `{'$socket', Socket, select, SelectInfo}` message.
> > > > > > > > > This can actually end up in an endless busy loop.
> > > > > > > >
> > > > > > > > > To actually clear this I would have to do a:
> > > > > > > >
> > > > > > > >     socket:recvmsg(Socket, [errqueue], nowait)
> > > > > > > >
> > > > > > > > > On a POSIX socket, I would have to actually poll for
> > > > > > > > > POLLIN | POLLERR to get a similar behavior. But the return of the
> > > > > > > > > poll would tell me whether it was POLLIN or POLLERR (similar for
> > > > > > > > > epoll).
> > > > > > > > > For the Erlang API it might make sense to always poll for both
> > > > > > > > > conditions, but we then should get an indication what exactly it
> > > > > > > > > was.
> > > > > > > >
> > > > > > > > > Would it be possible to change the $socket message to something
> > > > > > > > > like:
> > > > > > > >
> > > > > > > >     {'$socket', Socket, error, SelectInfo}
> > > > > > > >
> > > > > > > > for POLLERR/EPOLLERR ???
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Andreas
> > > > > > > > --
> > > > > > > >
> > > > > > > > Andreas Schultz
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Andreas Schultz
> > > > >
> > > > > --
> > > > >
> > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > >
> > > > --
> > > >
> > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > >
> > 
> > 
> > -- 
> > 
> > Andreas Schultz
> > 
> > -- 
> > 
> > Principal Engineer
> > 
> > t: +49 391 819099-224
> > 
> > ------------------------------- enabling your networks
> > -----------------------------
> > 
> > Travelping GmbH
> > Roentgenstraße 13
> > 39108 Magdeburg
> > Germany
> > 
> > t: +49 391 819099-0
> > f: +49 391 819099-299
> > 
> > e: info@REDACTED
> > w: https://www.travelping.com/
> > Company registration: Amtsgericht Stendal
> > Managing Director: Holger Winkelmann
> > Reg. No.: HRB 10578
> > VAT ID: DE236673780
> 
> -- 
> 
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB