[erlang-questions] how to flush a gen_tcp socket on close?

Fri Apr 6 12:39:03 CEST 2012

Matthias Lang <matthias@REDACTED> wrote:
>
>On Tuesday, April 03, Andreas Schultz wrote:
>
>> In my case the receiver is too slow to process all the data: the sender
>> sends 10k packets of 1k each, and the receiver only gets the first 2000
>> packets. It might well be that I hit the close timeout and inet
>> discards the rest of the send queue.
>
>Ok, that's something to go on.

And I note that we have moved from the initial suspicion of "buffered
data is not flushed out, but simply discarded" to an observation of the
flushing doing a less than stellar job in specific circumstances. But
I'll admit to being unaware that it did such a bad job in circumstances
that aren't extremely unusual. Those circumstances being roughly "our
sending is so far ahead of the receiver's consumption of data that a)
the TCP window is closed, b) the kernel-level socket send buffer is
full, and c) inet_drv has written data into its user-level queue".
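Condition c) can be observed from Erlang. The helper below is a hypothetical sketch. It assumes that the `send_pend` counter returned by `inet:getstat/2` reflects the bytes inet_drv is still holding in its user-level queue for the socket.

```erlang
-module(queue_probe).
-export([pending_bytes/1]).

%% Hypothetical helper: bytes still queued in inet_drv for this socket
%% (condition c) above). Assumes send_pend reports that driver queue.
-spec pending_bytes(inet:socket()) -> non_neg_integer().
pending_bytes(Socket) ->
    {ok, [{send_pend, N}]} = inet:getstat(Socket, [send_pend]),
    N.
```

Sampling this while a fast sender outruns a slow receiver shows whether data is piling up above the kernel send buffer.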

>I had a bit of a dig in prim_inet.erl. It sounds like you've looked
>there too. That code looks like it's intended to loop 'forever' trying
>to send the queued data, as long as some progress is made every so often
>(always sends at least something in every 5s timeout period). But running
>my program suggests that isn't happening as intended.

Yes, that code looks sort-of reasonable (modulo the bug you possibly
found), but it isn't really. I think we want to define "progress" as
"the receiver read *something* off his end of the connection". But the
check for progress is the size of the user-level queue, i.e. c) above,
and that is pretty far removed from this definition of progress. The
receiver reading data is not immediately reflected in the opening of the
TCP window (see "silly window syndrome"; this is true for any non-broken
TCP implementation). And when the kernel sends data from the socket send
buffer once the window *has* opened, that may not immediately trigger
poll(POLLOUT) (this is true at least for the Linux version I experimented
with, but is probably common). Yet POLLOUT is what is required for the
user-level queue to shrink.
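To make the argument concrete, here is a hypothetical and much simplified model of the progress rule as Matthias describes it. It is not the actual prim_inet code. The queue size is sampled when close is called and once per timeout period. Close gives up the first time a sample is not smaller than the previous one.

```erlang
-module(close_model).
-export([outcome/1]).

%% Hypothetical, simplified model of the close-time drain loop.
%% Samples: user-level send queue sizes in bytes, taken when close/1 is
%% called and then once per timeout period (5 s in this thread).
-spec outcome([non_neg_integer(), ...]) ->
          drained | {discarded, pos_integer()} | {still_waiting, pos_integer()}.
outcome([0 | _]) ->
    drained;                        % queue emptied: everything reached the kernel
outcome([Prev, Next | Rest]) when Next < Prev ->
    outcome([Next | Rest]);         % some progress this period: keep waiting
outcome([_Prev, Next | _]) ->
    {discarded, Next};              % no shrink in one period: give up, drop the rest
outcome([Last]) ->
    {still_waiting, Last}.          % no more samples yet, still waiting
```

With samples `[4096000, 2048000, 2048000]` the model returns `{discarded, 2048000}`. The receiver may well have read data during that last period, but those reads need not reopen the window or trigger POLLOUT in time to shrink the queue.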

Finally, requiring progress at all is a rather arbitrary decision. We
might want to keep trying as long as we have evidence that the receiver
is even alive and possibly willing to read more at some point in the
future. This is what the kernel's TCP stack does when you close(2) the
OS-level TCP socket with unsent data - even if the TCP window remains closed, it will
retain the buffer and keep sending window probes, as long as those
probes elicit ACKs from the other end. In fact this behavior is pretty
much spelled out in the venerable RFC 793, so it could be argued that
gen_tcp violates the TCP spec when it "gives up" on the close.

>> The fix should be simple, limit the send queue size.
>
>To what?
>
>Zero seems to be the only value that will work even for arbitrarily slow
>clients. And that defeats the point of having a send queue.

Exactly - which raises the question, what *is* the point of having a
send queue? I.e. the user-level queue that inet_drv maintains, and which
is causing these problems. I don't know, but since Tony has joined the
discussion, maybe he can answer. :-)
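For reference, inet does expose knobs for the size of that user-level queue: the `high_watermark` and `low_watermark` socket options. When more than the high watermark is queued, the port goes busy and `gen_tcp:send/2` blocks until the queue drops below the low watermark. The sketch below uses a hypothetical module name and arbitrary byte values. It only bounds how much can be lost at close. It does not reach the zero that Matthias points out would be needed for arbitrarily slow receivers.

```erlang
-module(small_queue).
-export([connect/2]).

%% Sketch: connect with a deliberately small inet_drv send queue, so
%% gen_tcp:send/2 blocks early instead of queueing lots of data.
%% The byte values are arbitrary examples, not recommendations.
connect(Address, Port) ->
    gen_tcp:connect(Address, Port,
                    [binary,
                     {active, false},
                     {high_watermark, 2048},
                     {low_watermark, 1024}]).
```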

>It's late, I might have outsmarted myself, but my current feeling is
>that erlang is quietly tossing data and it shouldn't be.

I agree.

>Waiting for as long as it takes in close() seems like the right thing,
>though Per might disagree.

Well, as I wrote earlier, I don't expect close() to block until all data
is sent - in fact I don't expect it to block at all. We already have a
potentially-blocking send() call, with an optional timeout even (unlike
close()), why shouldn't that be enough? Your suggestion should get the
job done, but it would block until the user-level queue has drained,
which may in principle be forever.
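The optional timeout on send mentioned above is set through the `send_timeout` socket option. Below is a sketch of a sender that relies on it, with hypothetical names. An `ok` from `gen_tcp:send/2` only means the data was accepted by the driver, so some data may still be queued when close is called.

```erlang
-module(timed_send).
-export([send_all/3]).

%% Sketch: send a list of packets, relying on send_timeout so that a
%% stuck receiver makes gen_tcp:send/2 return {error, timeout} instead of
%% blocking forever. Stops at the first error.
-spec send_all(gen_tcp:socket(), [iodata()], pos_integer()) -> ok | {error, term()}.
send_all(Socket, Packets, TimeoutMs) ->
    ok = inet:setopts(Socket, [{send_timeout, TimeoutMs}]),
    send_each(Socket, Packets).

send_each(_Socket, []) ->
    ok;
send_each(Socket, [Packet | Rest]) ->
    case gen_tcp:send(Socket, Packet) of
        ok -> send_each(Socket, Rest);
        {error, _Reason} = Error -> Error    % e.g. timeout or closed
    end.
```

After `{error, timeout}` it is unknown how much of the packet was queued, so closing the socket is usually the only sensible next step. The `send_timeout_close` option can do that automatically.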

> Waiting for N seconds in close() and then
>returning an error if the queue didn't empty would also be better than
>just quietly tossing it.

Maybe - but when you close(2) the OS-level socket (without SO_LINGER),
you are in any case saying that you don't want to be informed about the
final outcome: if the receiver dies before reading all the data, you
won't be told. I don't think it makes a lot of sense to report
specifically the failure to drain the user-level queue - it might even
give the false impression that you will always be told about failures.

>(And: yes, I know, application-level ACKs would avoid this
>problem. But I'm not quite ready to say that this problem can't be
>fixed.)

I'm tempted to refer to the FAQ entry about Erlang message passing where
you quote other ramblings of mine - how much reliability do you want?
If someone's life depends on the receiver consuming all the data and
acting on it, you'd better have application-level ACKs despite the fact
that TCP (like Erlang message passing) is "reliable". If it's just a
question about whether a jpeg gets displayed in a web browser, probably
not.
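Here is a minimal sketch of the application-level ACK approach. It assumes both ends use passive sockets with `{packet, 4}` framing, so that each `gen_tcp:recv/3` returns one whole message. The module name and the `<<"ack">>` reply are hypothetical.

```erlang
-module(acked_transfer).
-export([send_and_wait/3, receive_and_ack/2]).

%% Assumes passive ({active, false}) sockets with {packet, 4} framing,
%% so one gen_tcp:recv/3 returns one complete message.

%% Sender: close only after the receiver confirms it has the message.
send_and_wait(Socket, Payload, TimeoutMs) ->
    Result =
        case gen_tcp:send(Socket, Payload) of
            ok ->
                case gen_tcp:recv(Socket, 0, TimeoutMs) of
                    {ok, <<"ack">>} -> ok;
                    {ok, Other} -> {error, {unexpected_reply, Other}};
                    {error, _} = RecvError -> RecvError
                end;
            {error, _} = SendError ->
                SendError
        end,
    ok = gen_tcp:close(Socket),
    Result.

%% Receiver: read one message, then acknowledge it. A real application
%% would ACK after acting on the data, not merely after reading it.
receive_and_ack(Socket, TimeoutMs) ->
    case gen_tcp:recv(Socket, 0, TimeoutMs) of
        {ok, Data} ->
            ok = gen_tcp:send(Socket, <<"ack">>),
            {ok, Data};
        {error, _} = Error ->
            Error
    end.
```

Once `send_and_wait/3` returns `ok`, the inet_drv queue necessarily drained before the close, because the ACK could only arrive after the whole message was read.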

But it's reasonable to expect that if the networks, hosts, and
applications keep running, and the user doesn't close the tab in his
browser, that jpeg *should* eventually be displayed in full even if the
user is on a slow dialup and the sender has long since completed his
close() call and gone on to other business (or even called it a day).
gen_tcp:close/1 doesn't meet this expectation.

--Per