strange packet loss (solved, dtrace is your friend!)
Joel Reymont
joelr1@REDACTED
Wed Aug 12 13:43:59 CEST 2009
I solved this issue with the help of dtrace and thought I'd log the
troubleshooting process for posterity. As I suspected, it was all my
own fault.
I'm on Mac OSX Snow Leopard but it should work on Leopard just as
well. Solaris may require a few tweaks.
I used tcpdump at first to make sure "lost" packets were getting to
the receiver, e.g.
---
17:04:19.009795 IP (tos 0x0, ttl 64, id 40175, offset 0, flags [DF],
proto TCP (6), length 57, bad cksum 0 (->9fcd)!)
localhost.us-cli > localhost.47835: Flags [P.], cksum 0xfe2d
(incorrect -> 0x7df6), seq 1:6, ack 40, win 65535, options [nop,nop,TS
val 960438746 ecr 960438746], length 5
0x0000: 4500 0039 9cef 4000 4006 0000 7f00 0001 E..9..@REDACTED@.......
0x0010: 7f00 0001 1f92 badb 7b0a ad1e fc66 b940 ........{....f.@
0x0020: 8018 ffff fe2d 0000 0101 080a 393f 21da .....-......9?!.
0x0030: 393f 21da 0003 4143 4b 9?!...ACK
---
I then used dtruss to look at the system calls and noticed that Erlang
used recvfrom to receive the packets. I proceeded to use dapptrace to
look at the function calls and discovered tcp_recv. A simple grep
located tcp_recv in the inet driver (erts/emulator/drivers/common/
inet_drv.c).
I whipped out a custom dtrace script to dump the buffer populated by
recvfrom as well as to look at the i_remain field of the
tcp_descriptor structure. i_remain is what controls how much data
there's to read.
Notice in this output that there are 0 bytes remaining to read.
---
0 -> tcp_recv tcp_recv(0x1006E69B0,
0x0, 0x0), i_remain = 0
0 -> recvfrom recvfrom(0xB,
0x1006AD470, 0x5B4)
0 <- recvfrom recvfrom = 17
0 1 2 3 4 5 6 7 8 9 a b c d e f
0123456789abcdef
0: 00 03 41 43 4b 7f 00 00 00 00 00 00 00 00 00
00 ..ACK...........
---
There are bytes remaining to read when a "lost" packet gets into Erlang.
---
0 -> tcp_recv tcp_recv(0x1025C42E8,
0x0, 0x0), i_remain = 16706
0 -> recvfrom recvfrom(0x75B,
0x1006AD473, 0x4142)
0 <- recvfrom recvfrom = 17
0 1 2 3 4 5 6 7 8 9 a b c d e f
0123456789abcdef
0: 00 03 41 43 4b 00 00 00 00 00 00 00 00 e5 73
c1 ..ACK.........s.
---
Searching dtrace output from receives on the same socket (0x75B), I
saw an ACK previously received without the 2-byte length prefix.
---
-> tcp_recv tcp_recv(0x1025C42E8, 0x0, 0x0), i_remain = 0
1
->
recvfrom recvfrom(0x75B, 0x1006AD470, 0x5B4)
1
<
- recvfrom recvfrom = 17
0 1 2 3 4 5 6 7 8 9 a b c d e f
0123456789abcdef
0: 41 43 4b 43 4b 7f 00 00 00 00 00 00 00 00 00 00
ACKCK...........
---
My server sent ACK without the size prefix, i.e. {packet, 0} and
Erlang on the client size interpreted "AC" as the 2-byte packet size
since the client is in {packet, 2} mode. Erlang then proceeded to
gather network packets to reach the 16707 byte target and this looked
like packet loss on the client side.
---
<<Size:16>> = <<16#41, 16#43>>.
<<"AC">>
Size.
16707
---
Finally, here's the dtrace script that helped me take my foot out of
my own mouth.
---
#pragma D option flowindent
typedef struct {
char inet[400];
int high;
int low;
int send_timeout;
int send_timeout_close;
int busy_on_send;
int i_bufsz;
char* i_buf;
char* i_ptr;
char* i_ptr_start;
int i_remain;
int tcp_add_flags;
int http_state;
void *multi_first;
void *multi_last;
void *mtd;
} tcp_descriptor;
pid$target::recvfrom:entry {
self->ptr = arg1;
printf("%s(0x%X, 0x%X, 0x%X)\n", probefunc, arg0, arg1, arg2);
}
pid$target::recvfrom:return {
printf("%s = %d\n", probefunc, (int)arg0);
tracemem(copyin(self->ptr, (int)arg0), 16);
}
pid$target::tcp_recv:entry {
self->desc = copyin(arg0, sizeof(tcp_descriptor));
printf("%s(0x%X, 0x%X, 0x%X), i_remain = %d\n", probefunc, arg0,
arg1, arg2, ((tcp_descriptor *)self->desc)->i_remain);
}
---
I cheated by grabbing the size of 'struct inet_descriptor' in gdb (400
bytes on x86-64) and defining unessential pointers as 'void *'. Note
my use of 'tracemem' to dump memory, 'copyin' to move memory into the
local memory space (dtrace runs in the kernel) and, finally, the
casting to 'tcp_descriptor *' to grab the i_remain field.
Why the mucking about with packet size?
This particular server handles flash connections. On the one hand,
Flash doesn't know about {packet, 2} and on the other hand the only
thing that Flash does that's out of programmer control is request the
policy file.
The server starts with {packet, 0} and looks for <<"<policy-file-
request/>", 0>>, sent automatically by browser Flash. Once Flash
control is back to the programmer, <<255, 255, 255, 255>> is sent to
switch the server into {packet, 2}.
I use a gen_fsm to distinguish between packet length of 65535 (<<255,
255>>) and the <<255, 255, 255, 255>> switching sequence above.
---
Mac hacker with a performance bent
http://www.linkedin.com/in/joelreymont
More information about the erlang-questions
mailing list