The missing UDP packets

Valentin Micic v@REDACTED
Tue Feb 4 19:07:18 CET 2020



> On 04 Feb 2020, at 16:53, Alexander Petrovsky <askjuise@REDACTED> wrote:
> 
> 
> 
> вт, 4 февр. 2020 г. в 16:52, Valentin Micic <v@REDACTED <mailto:v@REDACTED>>:
> 
>> On 04 Feb 2020, at 10:31, Alexander Petrovsky <askjuise@REDACTED <mailto:askjuise@REDACTED>> wrote:
>> 
>> Hello!
>> 
>> The first of all, my erlang version OTP-21.3 (without any minor patches)
>> 
>> I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):
>> 
>> send request: SIP options
>> receive a response: 100
>> receive a response: 200
>> 
>> In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.
>> 
>> Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
>> gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.
>> 
>> So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.
>> 
>> When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((
>> 
>> I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that? 
>> 
>> -- 
>> Петровский Александр / Alexander Petrovsky,
>> 
>> Skype: askjuise
>> Phone: +7 931 9877991
>> 
> 
> 
> So, I suppose this is related to SIP communication, and I’ll go out on a limb here…
> 
> Considering that SIP message, when transported using UDP, must fit into a single UDP message, then, your statement that second and only second response packet is missing, may imply that you are sending response using two UDP packets.
> 
> (A) AFAIK, SIP over UDP does not have any continuation, hence, must fit into a single packet.
> If you (by any chance) send SIP message over multiple UDP packets, this shouldl not work — saying this just in case.
> 
> (B) OTOH, if tcpdump reveals UDP fragmentation (where a single UDP datagram is fragmented over multiple IP packets), then maybe your UDP buffer it to small to assemble the whole message, in which case message might be truncated, and your SIP parser may see it as the part of the message (second!! and always second!!), that’s gone missing.
> 
> (C) Finally, when your server processing is too slow for the size of the buffer, then, there’s a chance that receive buffer may get full, in which case it would drop the packet that cannot be added to the receive buffer.
> 
> However, judging from your “second!! and always second!!” statement, I’d say that (A) or (B), are more likely, barring something completely different (e.g some other bug, perhaps ;-))
> 
> Kind regards
> 
> V/
> 
> 
> .
> 
> 
> 
> Accordingly to tcpdump the size on packets from 300 to 400 bytes... also, I've double-checked, there are no fragmented packets at all, my ss -anump for beam shows:
> 
> skmem:(r0,rb16384,t0,tb212992,f4096,w0,o0,bl0,d0)
> 
> where:
>  - rb16384 - rx buffer;
>  - tb212992 - tx buffer;
>  - d0 - drop counter.
> 
> So, I guess it's not an (A) nor (B). My internal erlang buffer {buffer, Size} (https://erlang.org/doc/man/inet.html <https://erlang.org/doc/man/inet.html>) is 8192. It's could be (C), but the incoming packet rate is really low. Also, it's not a SIP parser for messages received over UDP, due to I don't see and packets in my receive process...
> 
> -- 
> Петровский Александр / Alexander Petrovsky,
> 
> Skype: askjuise
> Phone: +7 931 9877991
> 

We (and by that I include myself as well) have been using {active, once}, followed by gen_udp:recv/3 construct quite often, and I do not recall ever loosing a packet when rates were less than, say,  80,000 UDP messages/second — provided that the server processing  is fast enough (but that's a different conversation).

Transition between {active, once} and gen_udp:recv/3 will *never* loose any packets, as the packet will be in the UDP buffer until recalled by gen_udp:recv or another {active, once} 

Maybe you should  consider a situation when the second packet is *not* in the buffer yet (e.g. enroute).  In that case gen_udp:recv(sock, 0, 0 ) will fail with {error, timeout}, and you’ll pick up the “second” UDP message only next time when you call {active once}.

So, why don’t you go and change this timeout value to something other than zero, say, 1000 (one second) just to see if the behaviour changes in any way. Then you could reduce it to something smaller… say 10, or even 5 ms.


V/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20200204/bf687d6c/attachment.htm>


More information about the erlang-questions mailing list