The missing UDP packets

Raimo Niskanen raimo+erlang-questions@REDACTED
Tue Feb 18 16:05:07 CET 2020


The patch is now merged to 'maint' and 'master for the upcoming releases
23.0-rc1 and 22.3.

/ Raimo


On Fri, Feb 14, 2020 at 02:33:45PM +0100, Raimo Niskanen wrote:
> On Fri, Feb 14, 2020 at 04:23:15PM +0300, Alexander Petrovsky wrote:
> > Hello!
> > 
> > We have tested this fix in the idle state when there is not so much traffic
> > on the UDP socket and also under the load. In both cases, the problem has
> > disappeared, so I guess we could say that the fix works correctly, thanks!
> 
> Good to know.  Thank you for informing!
> / Raimo
> 
> 
> > 
> > ср, 12 февр. 2020 г. в 17:13, Raimo Niskanen <
> > raimo+erlang-questions@REDACTED>:
> > 
> > > On Wed, Feb 12, 2020 at 02:31:10PM +0300, Alexander Petrovsky wrote:
> > > > Awesome, thanks a lot :)
> > > >
> > > > I'll test this fix and in case of some problems, I will let you know!
> > >
> > > Great!
> > >
> > > I would apprecite if you could verify also if the fix works,
> > > so I can merge it into ourt maint and master.
> > >
> > > / Raimo
> > >
> > >
> > > >
> > > > вт, 11 февр. 2020 г. в 21:01, Valentin Micic <v@REDACTED>:
> > > >
> > > > > Thank you Raimo, much obliged.
> > > > > I am sure this will make Alexander very happy :-)
> > > > >
> > > > > V/
> > > > >
> > > > > > On 11 Feb 2020, at 17:21, Raimo Niskanen <
> > > > > raimo+erlang-questions@REDACTED> wrote:
> > > > > >
> > > > > > On Mon, Feb 10, 2020 at 04:23:37PM +0100, Raimo Niskanen wrote:
> > > > > >> I can reproduce it on a month old 'master'.
> > > > > >> We'll take a look at it...
> > > > > >>
> > > > > >> Thank you for a pinpointed bug report!
> > > > > >> / Raimo
> > > > > >
> > > > > > I have a patch that fixes the problem:
> > > > > >
> > > > > > @@ -12609,12 +12609,8 @@ static ErlDrvSSizeT
> > > packet_inet_ctl(ErlDrvData
> > > > > e,
> > > > > > unsigned int cmd, char* buf,
> > > > > >               return ctl_error(EALREADY, rbuf, rsize);
> > > > > >
> > > > > >           if (packet_inet_input(udesc, desc->event) == 0) {
> > > > > > -             if (timeout == 0)
> > > > > > -                 async_error_am(desc, am_timeout);
> > > > > > -             else {
> > > > > > -                 if (timeout != INET_INFINITY)
> > > > > > -                     driver_set_timer(desc->port, timeout);
> > > > > > -             }
> > > > > > +                if (timeout != INET_INFINITY)
> > > > > > +                    driver_set_timer(desc->port, timeout);
> > > > > >           }
> > > > > >           return ctl_reply(INET_REP_OK, tbuf, 2, rbuf, rsize);
> > > > > >       }
> > > > > >
> > > > > > It should be possible to apply to any version.  The code lines I
> > > removed
> > > > > > have been around since we entered GitHub.
> > > > > >
> > > > > > The shortcut for timeout == 0 was flawed.  Right before this an async
> > > > > event
> > > > > > was enqueued, which is not cleared when taking this short way out,
> > > and
> > > > > the
> > > > > > packet_inet_input function had put the socket in the poll set, which
> > > the
> > > > > > shortcut also ignored.  So when the UDP packet arrives it is
> > > received and
> > > > > > delivery to the caller fails since there is no longer any caller -
> > > the
> > > > > caller
> > > > > > was already told {error, timeout}.  The packet gets wasted.
> > > > > >
> > > > > > This fix removes the shortcut for timeout == 0.  It would be
> > > possible to
> > > > > > instead fix the shortcut, which could improve latency slightly.  With
> > > > > this
> > > > > > fix code sets a timer for timeout == 0, which immediately times out
> > > > > > and then all internal state gets updated as it should.  The same code
> > > > > path
> > > > > > is used for all timeouts including 0.
> > > > > >
> > > > > > Again, thank you for the bug report and pinpointing.
> > > > > >
> > > > > > I will now create an internal ticket, write a test case, and prepare
> > > > > > for a patch...  We would prefer to just include this fix in the
> > > upcoming
> > > > > > OTP 22.3.
> > > > > >
> > > > > > Best regards
> > > > > > / Raimo Niskanen
> > > > > >
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Feb 07, 2020 at 04:50:54PM +0300, Alexander Petrovsky wrote:
> > > > > >>>>
> > > > > >>>> From: Valentin Micic
> > > > > >>>>
> > > > > >>>> I’ve run a couple of tests on my own, and managed to reproduce
> > > your
> > > > > >>>> “second! and only second!!” packet problem.
> > > > > >>>>
> > > > > >>>> It happens under following circumstances:
> > > > > >>>>
> > > > > >>>> 1) Socket (Rx) is instructed to provide a single packet as it
> > > arrives
> > > > > ( as
> > > > > >>>> per {active, once} );
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> 2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0
> > > ) is
> > > > > >>>> executed before another packet has ARRIVED to the kernel buffer;
> > > > > >>>>
> > > > > >>>> Then the following diagram illustrates the problem:
> > > > > >>>>
> > > > > >>>> Note:
> > > > > >>>>    - Tx is sending socket;
> > > > > >>>>    - Rx is receiving socket;
> > > > > >>>>
> > > > > >>>> *Please note that sequence diagrams below are not completely
> > > > > accurate, but
> > > > > >>>> it is reasonably illustrative of what may have happened.*
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> The next sequence diagram depicts a situation when a non-zero
> > > value
> > > > > for
> > > > > >>>> gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, *10* ):
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Thus, it appears that gen_udp:recv( *Socket, Length*, *Timeout* )
> > > is
> > > > > >>>> broken when *Timeout* is set to *0*.
> > > > > >>>> For all positive non-zero values for Timeout, it behaves properly
> > > (or
> > > > > as
> > > > > >>>> intuitively expected).
> > > > > >>>>
> > > > > >>>> Therefore, this should either be fixed, or it should be
> > > documented in
> > > > > >>>> order to indicate that 0 must *not* be used as a value for the
> > > > > Timeout in
> > > > > >>>> gen_udp:recv/3 call.
> > > > > >>>>
> > > > > >>>> Aside, I fail to see a benefit of using Timeout of value 0 when
> > > > > combined
> > > > > >>>> with {active, once}. The only time when zero value may appear to
> > > be
> > > > > useful
> > > > > >>>> is when server does need to process other messages, e.g. messages
> > > not
> > > > > >>>> originated by the socket, which would be the case when gen_server
> > > is
> > > > > used
> > > > > >>>> to support the socket. In my view (which is supported by
> > > substantial
> > > > > >>>> empirical evidence), gen_server is not the most optimal behaviour
> > > to
> > > > > use
> > > > > >>>> when implementing support for sockets.
> > > > > >>>>
> > > > > >>>> Kind regards
> > > > > >>>>
> > > > > >>>> V/
> > > > > >>>>
> > > > > >>>>
> > > > > >>> So, seems like there is some buggy behavior in the erlang VM for
> > > > > reading
> > > > > >>> UDP messages between switching from {active, once} to
> > > > > gen_udp:recv(socket,
> > > > > >>> 0, 0)? Lukas, Raimo could you help with that?
> > > > > >>>
> > > > > >>> --
> > > > > >>> Петровский Александр / Alexander Petrovsky,
> > > > > >>>
> > > > > >>> Skype: askjuise
> > > > > >>> Phone: +7 931 9877991
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >>
> > > > > >> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > > >
> > > > > > --
> > > > > >
> > > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > >
> > > > >
> > > >
> > > > --
> > > > Alexander Petrovsky
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > >
> > 
> > 
> > -- 
> > Alexander Petrovsky
> 
> -- 
> 
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


More information about the erlang-questions mailing list