The missing UDP packets

Alexander Petrovsky askjuise@REDACTED
Thu Feb 6 11:39:32 CET 2020


вт, 4 февр. 2020 г. в 21:07, Valentin Micic <v@REDACTED>:

>
>
> On 04 Feb 2020, at 16:53, Alexander Petrovsky <askjuise@REDACTED> wrote:
>
>
>
> вт, 4 февр. 2020 г. в 16:52, Valentin Micic <v@REDACTED>:
>
>>
>> On 04 Feb 2020, at 10:31, Alexander Petrovsky <askjuise@REDACTED> wrote:
>>
>> Hello!
>>
>> The first of all, my erlang version OTP-21.3 (without any minor patches)
>>
>> I've got the strange floating problem with UDP to receive, which I
>> couldn't debug by myself. Let me describe briefly what I have: There is UDP
>> socket which used for send packets to many peers and receives responses
>> from them (rate ~1pps):
>>
>> send request: SIP options
>> receive a response: 100
>> receive a response: 200
>>
>> In erlang it's implemented like gen_server which receives messages, the
>> sockets options are: binary, {active, once}.
>>
>> Every time, when the message is received from the mailbox, code try to do
>> some job, after that try to handle message bursts by calling in the loop
>> (by example 5 times):
>> gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active,
>> once} back.
>>
>> So, here I'm observing floating strange behavior, from time to time the
>> second!! and always the second!! response is absent. I see this UDP missing
>> packets in tcpdump, I don't see any drops in UDP stack at all.
>>
>> When I'm trying to add some logs into my erlang code, the problem
>> disappears, when I'm run strace to recvfrom syscall the problem
>> disappears... I't try to debug VM with gdb and make some breakpoints on
>> sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears
>> (((
>>
>> I can't find the problem root cause, is it in my code or is it in VM?
>> Maybe someone could help me with that?
>>
>> --
>> Петровский Александр / Alexander Petrovsky,
>>
>> Skype: askjuise
>> Phone: +7 931 9877991
>>
>>
>> So, I suppose this is related to SIP communication, and I’ll go out on a
>> limb here…
>>
>> Considering that SIP message, when transported using UDP, must fit into a
>> single UDP message, then, your statement that second and only second
>> response packet is missing, may imply that you are sending response using
>> two UDP packets.
>>
>> (A) AFAIK, SIP over UDP does not have any continuation, hence, must fit
>> into a single packet.
>> If you (by any chance) send SIP message over multiple UDP packets, this
>> shouldl not work — saying this just in case.
>>
>> (B) OTOH, if tcpdump reveals UDP fragmentation (where a single UDP
>> datagram is fragmented over multiple IP packets), then maybe your UDP
>> buffer it to small to assemble the whole message, in which case message
>> might be truncated, and your SIP parser may see it as the part of the
>> message (second!! and always second!!), that’s gone missing.
>>
>> (C) Finally, when your server processing is too slow for the size of the
>> buffer, then, there’s a chance that receive buffer may get full, in which
>> case it would drop the packet that cannot be added to the receive buffer.
>>
>> However, judging from your “second!! and always second!!” statement, I’d
>> say that (A) or (B), are more likely, barring something completely
>> different (e.g some other bug, perhaps ;-))
>>
>> Kind regards
>>
>> V/
>>
>>
>> .
>>
>>
>>
> Accordingly to tcpdump the size on packets from 300 to 400 bytes... also,
> I've double-checked, there are no fragmented packets at all, my ss -anump
> for beam shows:
>
> skmem:(r0,rb16384,t0,tb212992,f4096,w0,o0,bl0,d0)
>
> where:
>  - rb16384 - rx buffer;
>  - tb212992 - tx buffer;
>  - d0 - drop counter.
>
> So, I guess it's not an (A) nor (B). My internal erlang buffer *{buffer, *
> *Size}* (https://erlang.org/doc/man/inet.html) is 8192. It's could be
> (C), but the incoming packet rate is really low. Also, it's not a SIP
> parser for messages received over UDP, due to I don't see and packets in my
> receive process...
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Phone: +7 931 9877991
>
>
> We (and by that I include myself as well) have been using {active, once},
> followed by gen_udp:recv/3 construct quite often, and I do not recall ever
> loosing a packet when rates were less than, say,  80,000 UDP
> messages/second — provided that the server processing  is fast enough (but
> that's a different conversation).
>
> Transition between {active, once} and gen_udp:recv/3 will *never* loose
> any packets, as the packet will be in the UDP buffer until recalled by
> gen_udp:recv or another {active, once}
>
> Maybe you should  consider a situation when the second packet is *not* in
> the buffer yet (e.g. enroute).  In that case gen_udp:recv(sock, 0, 0 ) will
> fail with {error, timeout}, and you’ll pick up the “second” UDP message
> only next time when you call {active once}.
>
> So, why don’t you go and change this timeout value to something other than
> zero, say, 1000 (one second) just to see if the behaviour changes in any
> way. Then you could reduce it to something smaller… say 10, or even 5 ms.
>
>
> V/
>
>
Using systemtap, thanks to Andreas Schultz for a good idea, I find out,
that all UDP packets are read by erlang VM from Linux kernel buffers
without any losses.

Maybe you should  consider a situation when the second packet is *not* in
> the buffer yet (e.g. enroute).  In that case gen_udp:recv(sock, 0, 0 ) will
> fail with {error, timeout}, and you’ll pick up the “second” UDP message
> only next time when you call {active once}.


Yeah, it seems like you are right, the problem in pair {active, once}
and gen_udp:recv(sock,
0, 0). After packet delivered and read from the mailbox, gen_udp:recv(sock,
0, 0) is called (socket state is {active, false}) and it will fail with {error,
timeout}. It's doesn't matter how many times I'll call gen_udp:recv(sock,
0, 0) one time, one million times, or one million times with 1-second
interval - the result always will be {error, timeout}. But if I'll
call gen_udp:recv(sock,
0, 1) (timeout is 1ms by example) the result will be {error, timeout} too,
but! after that, each of N calls of gen_udp:recv(sock, 0, 0) (pay
attention, the timeout is 0ms) will gives UDP packet.

So, what I found next: I create UDP socket and set the {active, once} on
it, after the message is delivered, just call gen_udp:recv(sock, 0, 0) by
hand several times and see what happens in systemtap and in gdb.

a) the data evenly comes into UDP socket from network...
b) socket in state {active, once} and systemtap shows me that VM thread
read data from the socket (into the internal buffer) with recvfrom syscall
"in the loop";
c) socket in state {active, false} and  systemtap shows me that VM thread
read data from the socket (into the internal buffer) with recvfrom syscall
"in the loop";
d) call gen_udp:recv(sock, 0, 0) several times by hand, got {error, timeout}
; systemtap still shows me that VM thread read data from the socket (into
the internal buffer) with recvfrom syscall;
e) call gen_udp:recv(sock, 0, 0) with gdb breakpoint on
https://github.com/erlang/otp/blob/OTP-21.3/erts/emulator/drivers/common/inet_drv.c#L12528,
I see that call VM try to read from socket (in my mind it's strange, there
is data in the internal buffer...), recvfrom syscall return -1, the erlang
code got {error, timeout};
f) systemtap still shows me that VM thread read data from the socket (into
the internal buffer) with recvfrom syscall "in the loop";

g) call gen_udp:recv(sock, 0, 1) with gdb breakpoint once again, the
recvfrom syscall return -1, but in this case VM calls
https://github.com/erlang/otp/blob/OTP-21.3/erts/emulator/drivers/common/inet_drv.c#L12533
and
thread don't read data from the socket "in the loop" anymore. Next N
calls gen_udp:recv(sock, 0, 0) will return data, and each will call recvfrom
syscall. At this point, just for note, if incoming rate it more than
reading rate, I see UDP packet drops in the kernel buffers...
h) when N+1...N+M call of gen_udp:recv(sock, 0, 0) returns {error, timeout},
VM thread start read data from the socket "in the loop" once again...

So, this behaviour (strange in my mind) shed the light a little bit on my
missing UDP packets... I'm still wondering, why gen_udp:recv(sock, 0, 1)
moves socket into "manual control", why gen_udp:recv(sock, 0, 1) don't read
the data from the internal buffer?

-- 
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20200206/5d3b40bd/attachment.htm>


More information about the erlang-questions mailing list