OTP socket.erl, unexpected interaction when receiving from errqueue

Andreas Schultz andreas.schultz@REDACTED
Fri Jan 8 18:26:19 CET 2021


On Fri, 8 Jan 2021 at 16:44, Raimo Niskanen <
raimo+erlang-questions@REDACTED> wrote:

> The VM guys ran into problems when testing this feature.
> It seems not to be very portable, e.g. on platforms
> with only select there is no notion of an error condition,
> nor for epoll.  It is only poll that can supply this info,
> and what triggers a POLLERR differs between platforms.
>

IP_RECVERR is Linux-specific, so it is not surprising that other
OSes behave differently.
From the Linux man page, I would expect EPOLLERR to work.

> Furthermore, I have done some manual testing of the socket
> option {ip,recverr}:
>
> Erlang/OTP 24 [DEVELOPMENT] [erts-11.1.3] [source-aaa3fc53a3] [64-bit]
> [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]
>
> Eshell V11.1.3  (abort with ^G)
> 1> {ok, S} = socket:open(inet, dgram).
> {ok,{'$socket',#Ref<0.204328729.875954179.14364>}}
> 2> socket:setopt(S, {ip,recverr}, true).
> ok
> 3> socket:recvmsg(S, [], nowait).
> {select,{select_info,recvmsg,
>                      #Ref<0.204328729.875823107.14380>}}
> 4> flush().
> ok
> 5> socket:sendto(S, "hello", #{family => inet, addr => {127,0,0,1}, port => 44444}).
> ok
> 6>
> 6> flush().
> Shell got {'$socket',{'$socket',#Ref<0.204328729.875954179.14364>},
>                      select,#Ref<0.204328729.875823107.14380>}
> ok
> 7> socket:recvmsg(S, [], nowait).
> {error,econnrefused}
> 8> socket:recvmsg(S, [errqueue], 0).
> {ok,#{addr =>
>           #{addr => {127,0,0,1},family => inet,port => 44444},
>       ctrl =>
>           [#{data =>
>                  <<111,0,0,0,2,3,3,0,0,0,0,0,0,0,0,0,2,0,0,0,127,0,0,1,...>>,
>              level => ip,type => recverr,
>              value =>
>                  #{code => port_unreach,data => 0,error => econnrefused,
>                    info => 0,
>                    offender => #{addr => {127,0,0,1},family => inet,port => 0},
>                    origin => icmp,type => dest_unreach}}],
>       flags => [errqueue],
>       iov => [<<"hello">>]}}
> 9> socket:recvmsg(S, [errqueue], 0).
> {error,timeout}
> 10> socket:recvmsg(S, [], nowait).
> {select,{select_info,recvmsg,
>                      #Ref<0.204328729.875823107.14427>}}
> 11> flush().
> ok
>
>
> We get a select message, do a normal read.  It returns an error,
> so we read the error queue to get a verbose error message.
>

The sequence of events depends very much on the state of the socket when
you call recvfrom/recvmsg, which leads to inconsistent behavior.
Consider this example:

Erlang/OTP 24 [DEVELOPMENT] [erts-11.1.4] [source-a348f5a237] [64-bit]
[smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V11.1.4  (abort with ^G)
1> {ok, S} = socket:open(inet, dgram).
{ok,{'$socket',#Ref<0.1380161320.4108451842.177694>}}
2> socket:setopt(S, {ip,recverr}, true).
ok
3> socket:sendto(S, "hello", #{family => inet, addr => {127, 0, 0, 1}, port => 44444}).
ok
4> socket:sendto(S, "hello", #{family => inet, addr => {127, 0, 0, 1}, port => 44444}).
{error,econnrefused}
{error,econnrefused}
5> socket:recvfrom(S, 0, [], nowait).
{select,{select_info,recvfrom,
                     #Ref<0.1380161320.4108320780.177852>}}
6> flush().
Shell got {'$socket',{'$socket',#Ref<0.1380161320.4108451842.177694>},
                     select,#Ref<0.1380161320.4108320780.177852>}
ok
7> socket:recvfrom(S, 0, [], nowait).
{select,{select_info,recvfrom,
                     #Ref<0.1380161320.4108320780.177867>}}
8> flush().
Shell got {'$socket',{'$socket',#Ref<0.1380161320.4108451842.177694>},
                     select,#Ref<0.1380161320.4108320780.177867>}

We get an endless stream of select_infos without any hint that there might
be something hiding in the error queue.

> I think this approach might be a good one.  When getting an error,
> read the error queue with timeout 0, maybe twice to give it higher
> effective priority, then go back to normal reading.
>

To avoid the endless select loop from above, the user would have to do a
*speculative / blind* read of the error queue.
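A minimal Erlang sketch of such a speculative read, assuming OTP 24 or later on Linux and a dgram socket that already has {ip,recverr} set. The module name errqueue_poll and its functions are made up for illustration and are not an OTP-provided helper; the order (drain the error queue with timeout 0, then do the normal nowait read) follows the workaround Raimo suggests further down in this thread.

```erlang
%% Hypothetical helper module, for illustration only (not part of OTP).
-module(errqueue_poll).
-export([drain_errqueue/1, on_select/1]).

%% Read the error queue until it is empty, without blocking.
%% With Timeout 0, socket:recvmsg/3 returns {error, timeout} when
%% nothing is queued. Returns the dequeued messages, oldest first.
drain_errqueue(Socket) ->
    drain_errqueue(Socket, []).

drain_errqueue(Socket, Acc) ->
    case socket:recvmsg(Socket, [errqueue], 0) of
        {ok, Msg} ->
            drain_errqueue(Socket, [Msg | Acc]);
        {error, _Reason} ->
            %% {error, timeout} means the queue is empty;
            %% any other error also stops the drain here.
            lists:reverse(Acc)
    end.

%% Call after receiving {'$socket', Socket, select, _SelectHandle}
%% for an earlier nowait read. The error queue is read first, then the
%% normal non-blocking read is retried. Returns {Errors, ReadResult},
%% where ReadResult is {ok, Msg} | {select, SelectInfo} | {error, Reason}.
on_select(Socket) ->
    Errors = drain_errqueue(Socket),
    {Errors, socket:recvmsg(Socket, [], nowait)}.
```

If the normal read returns {ok, Msg}, no new select has been armed, so the caller has to read again before waiting; only {select, SelectInfo} means it is safe to wait for the next '$socket' message. Draining the whole error queue first also sidesteps the starvation concern raised below, at the cost of at least one extra NIF call per select message.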

The IP_RECVERR flag has to be set explicitly, so such a situation cannot
sneak up on a user unexpectedly. IMHO this would therefore be acceptable,
but it needs very explicit documentation somewhere (maybe near the
recverr flag?).

I would still prefer to get the POLLERR vs. POLLIN indication in the select
message, even if it only works on Linux and not on other platforms.
The MSG_ERRQUEUE recvmsg flag and the whole IP_RECVERR mechanism for getting
the payload of ICMP errors on UDP sockets are not portable and only usable on
Linux anyhow.
The classical approach on e.g. FreeBSD seems to involve an additional RAW
socket to read the ICMPs. I am not sure how, or if, that is supposed to work
on Windows or OSX.

The problem I see with reporting the poll flags is that they are a
bitfield, so having both POLLIN and POLLERR set is possible.
Representing this as a map or proplist seems a bit wasteful.
Since we only ever request `POLLIN | POLLERR` or `POLLOUT | POLLERR`,
maybe a select_info tuple of `{select_info, select_tag(),
select_ref(), select_opt()}` with `select_opt() :: {ReadWrite :: boolean(),
Error :: boolean()}` could work?
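To make the proposal concrete, here is a small Erlang sketch of how a receiver might act on such a select_opt(). It is purely hypothetical: neither the type nor a select message carrying it exists in OTP's socket module, and the module name select_opt_sketch is made up. It shows that a two-boolean tuple covers every combination of the bitfield, including both flags set at once.

```erlang
%% HYPOTHETICAL: models the select_opt() proposed in this mail;
%% no such type or message exists in OTP's socket module.
-module(select_opt_sketch).
-export([reads_for/1]).

%% ReadWrite mirrors POLLIN (or POLLOUT), Error mirrors POLLERR.
-type select_opt() :: {ReadWrite :: boolean(), Error :: boolean()}.

%% Which reads a receiver would issue for a given indication.
%% Both flags can be set at once, as in a poll bitfield;
%% the error queue is read first so errors are not starved.
-spec reads_for(select_opt()) -> [errqueue | data].
reads_for({true, true})   -> [errqueue, data];
reads_for({false, true})  -> [errqueue];
reads_for({true, false})  -> [data];
reads_for({false, false}) -> [].
```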

Regards,
Andreas


> What do you think?
>
> / Raimo
>
>
> On Fri, Dec 04, 2020 at 10:45:03AM +0100, Raimo Niskanen wrote:
> > The VM team has now planned a task to extend the NIF API to handle
> > select/poll on error, i.e. enif_select_error() and ERL_NIF_SELECT_ERROR.
> >
> > When that is done we can use that in the socket API, probably exactly as
> > you first suggested.
> >
> > The question is whether we should just add that, so that all code using
> > the socket API has to be prepared for a new message, or whether we need
> > an option on the 'otp' protocol level.
> >
> > The Experimental status of the socket API allows for such a
> > backwards-incompatible change, but that does not mean that
> > we need to do one...
> >
> > Until then I have just merged, into 'master', an optimization of
> > recv with Timeout =:= 0 so it skips the select/cancel select dance.
> >
> > Cheers
> > / Raimo
> >
> >
> > On Wed, Nov 25, 2020 at 02:45:22PM +0100, Raimo Niskanen wrote:
> > > On Wed, Nov 25, 2020 at 01:42:49PM +0100, Andreas Schultz wrote:
> > > > On Wed, 25 Nov 2020 at 11:37, Raimo Niskanen <
> > > > raimo+erlang-questions@REDACTED> wrote:
> > > >
> > > > > Is it so that recvmsg(fd, &msg, MSG_ERRQUEUE) only receives from
> > > > > the error queue, and never any regular data?
> > > > >
> > > >
> > > > That is my understanding from the man page. Experiments also confirm this.
> > >
> > > The man page is not entirely unambiguous, but I found a Stack Overflow
> > > thread that also confirms this (also referring to experiments).
> > >
> > >
> > > https://stackoverflow.com/questions/17326913/linux-udp-socket-recvmsg-with-msg-errqueue
> > >
> > > >
> > > > Reading the errqueue is actually quite hard to test. The behavior for
> > > > local errors and remote errors (e.g. reception of ICMP errors) is
> > > > sometimes different.
> > > > Small sample:
> > > >
> > > > 1> {ok, Socket} = socket:open(inet, dgram, udp).
> > > > {ok,{'$socket',#Ref<0.1577644963.140640257.162512>}}
> > > > 2> ok = socket:setopt(Socket, ip, recverr, true).
> > > > ok
> > > > 3> Dest = #{family => inet, addr => {127,0,0,1}, port => 1234}.
> > > > #{addr => {127,0,0,1},family => inet,port => 1234}
> > > > 4> socket:sendto(Socket, <<"Data">>, Dest, nowait).
> > > > ok
> > > > 5> socket:sendto(Socket, <<"Data">>, Dest, nowait).
> > > > {error,{econnrefused,[<<"Data">>]}}
> > >
> > > Weird to get econnrefused from sendto() on an unconnected dgram socket.
> > >
> > > >
> > > >
> > > > The first sendto returns `ok`; the error can be read with a subsequent
> > > > recvmsg. The second sendto returns the error immediately because the
> > > > kernel has learned that the local endpoint does not exist.
> > > >
> > > >
> > > >
> > > > > Today there is no VM support for a NIF to differentiate between
> > > > > POLLIN and POLLERR.  I have asked the VM guys to have a look at that.
> > > > >
> > > > > Without that, you can respond to receiving
> > > > > {'$socket',Socket,select,SelectHandle}
> > > > > by calling socket:recvmsg(Socket, [errqueue], 0) to poll, and then,
> > > > > if the poll gave {error,timeout}, call socket:recvmsg(Socket, 0, nowait).
> > > > >
> > > >
> > > > I was actually thinking about doing
> > > > `socket:recvmsg(Socket, [errqueue], nowait)` if
> > > >    a) I just received a select message, and
> > > >    b) socket:recvfrom returned a new select info instead of reading any data.
> > > >
> > > > That should capture the situation where only data in the errqueue is
> > > > present, without having to use the 0 timeout.
> > >
> > > Might there not be a possibility of starving out error messages in the
> > > face of continuously incoming data with this approach?
> > >
> > > >
> > > > > Then we will have to optimize Timeout =:= 0, and maybe introduce
> > > > > Timeout =:= poll with a nicer return value for no data.
> > > > >
> > > >
> > > > I like the general idea, but the `nowait` option now feels a bit
> > > > wrong. We end up with `nowait == do a select` and
> > > > `poll == just check if there is data`.
> > > > It might be too late to change `nowait`, but what about adding `select`
> > > > as an alias for `nowait`?
> > >
> > > I think it is too late to change 'nowait' now, and it is not entirely off:
> > > it states that the call will not wait, but not what happens instead,
> > > while 'select' states what to do, but only sometimes.
> > >
> > > 'poll' also states what to do.  We could add 'select' as an alias for
> > > 'nowait'; some might find it more appropriate.
> > >
> > > We thought about having only polling operations and a dedicated select
> > > operation, but that would need at least two NIF calls even when handling
> > > a lot of data, so we decided on doing an automatic select depending on the
> > > result of the operation.
> > >
> > > / Raimo
> > >
> > >
> > > >
> > > > Regards,
> > > > Andreas
> > > >
> > > > > Timeout =:= 0 today causes quite a lot of overhead, but it should work.
> > > > >
> > > > > Cheers
> > > > > / Raimo
> > > > >
> > > > >
> > > > > On Mon, Nov 23, 2020 at 05:42:23PM +0100, Andreas Schultz wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I set up a socket with:
> > > > > >
> > > > > >     socket:setopt(Socket, ip, recverr, true)
> > > > > >
> > > > > > I then started an asynchronous recvfrom with:
> > > > > >
> > > > > >     {select, SelectInfo} = socket:recvfrom(Socket, 0, [], nowait)
> > > > > >
> > > > > > When something now arrives in the error queue, I get a select
> > > > > > message:
> > > > > >
> > > > > >     {'$socket', Socket, select, SelectInfo}
> > > > > >
> > > > > > The problem is that nothing in there tells me to read from the error
> > > > > > queue. The underlying OS poll/epoll call would have this information,
> > > > > > but it is lost in the Erlang message.
> > > > > >
> > > > > > When I now try to read from the socket with:
> > > > > >
> > > > > >    socket:recvfrom(Socket, 0, [], nowait)
> > > > > >
> > > > > > all I get is another `{select, SelectInfo}` tuple, followed by another
> > > > > > `{'$socket', Socket, select, SelectInfo}` message.
> > > > > > This can actually end up in an endless busy loop.
> > > > > >
> > > > > > To actually clear this, I would have to do a:
> > > > > >
> > > > > >     socket:recvmsg(Socket, [errqueue], nowait)
> > > > > >
> > > > > > On a POSIX socket, I would actually have to poll for POLLIN | POLLERR
> > > > > > to get similar behavior. But the result of the poll would tell me
> > > > > > whether it was POLLIN or POLLERR (similarly for epoll).
> > > > > > For the Erlang API it might make sense to always poll for both
> > > > > > conditions, but we should then get an indication of what exactly it was.
> > > > > >
> > > > > > Would it be possible to change the $socket message to something like:
> > > > > >
> > > > > >     {'$socket', Socket, error, SelectInfo}
> > > > > >
> > > > > > for POLLERR/EPOLLERR ???
> > > > > >
> > > > > > Regards
> > > > > > Andreas
> > > > > > --
> > > > > >
> > > > > > Andreas Schultz
> > > > >
> > > > > --
> > > > >
> > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Andreas Schultz
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>


-- 

Andreas Schultz

-- 

Principal Engineer

t: +49 391 819099-224

------------------------------- enabling your networks -----------------------------

Travelping GmbH
Roentgenstraße 13
39108 Magdeburg
Germany

t: +49 391 819099-0
f: +49 391 819099-299

e: info@REDACTED
w: https://www.travelping.com/
Company registration: Amtsgericht Stendal
Managing Director: Holger Winkelmann
Reg. No.: HRB 10578
VAT ID: DE236673780