[erlang-questions] how to flush a gen_tcp socket on close?

Fri Apr 6 16:01:28 CEST 2012

On 6 apr 2012, at 12:39, Per Hedeland wrote:

> Matthias Lang <matthias@REDACTED> wrote:
>> 
>> On Tuesday, April 03, Andreas Schultz wrote:
>> 
>>> In my case the receiver is too slow to process all the data, sender
>>> does 10k packets of 1k size, the receiver only gets the first 2000
>>> packets. It might well be that I hit the close timeout and inet
>>> discards the rest of the send queue.
>> 
>> Ok, that's something to go on.
> 
> And I note that we have moved from the initial suspicion of "buffered
> data is not flushed out, but simply discarded" to an observation of the
> flushing doing a less than stellar job in specific circumstances. But
> I'll admit to being unaware that it did such a bad job in circumstances
> that aren't extremely unusual. Those circumstances being roughly "our
> sending is so far ahead of the receiver's consumption of data that a)
> the TCP window is closed, b) the kernel-level socket send buffer is
> full, and c) inet_drv has written data into its user-level queue".
> 
I had to do some digging myself to realize the same fact :-)

>> I had a bit of a dig in prim_inet.erl. It sounds like you've looked
>> there too. That code looks like it's intended to loop 'forever' trying
>> to send the queued data, as long as some progress is made every so often
>> (always sends at least something in every 5s timeout period). But running
>> my program suggests that isn't happening as intended.
> 
> Yes, that code looks sort-of reasonable (modulo the bug you possibly
> found), but it isn't really. I think we want to define "progress" as
> "the receiver read *something* off his end of the connection". But the
> check for progress is the size of the user-level queue, i.e. c) above,
> and that is pretty far removed from this definition of progress: Reading
> of data is not immediately reflected in opening of the TCP window (see
> "silly window syndrome" - this is true for any non-broken TCP
> implementation), and sending of data from the socket send buffer, once
> the window *has* opened, may not immediately trigger poll(POLLOUT) (this
> is true at least for the Linux version I experimented with, but is
> probably common) - and this is what is required for the user-level queue
> to shrink.
> 
That is pretty much what I found as well.

> Finally, requiring progress at all is a rather arbitrary decision. We
> might want to keep trying as long as we have evidence that the receiver
> is even alive and possibly willing to read more at some point in the
> future. This is what TCP/kernel does when you close(2) the OS-level TCP
> socket with unsent data - even if the TCP window remains closed, it will
> retain the buffer and keep sending window probes, as long as those
> probes elicit ACKs from the other end. In fact this behavior is pretty
> much spelled out in the venerable RFC 793, so it could be argued that
> gen_tcp violates the TCP spec when it "gives up" on the close.
> 

Maybe reading some stats about the socket to mimic the TCP behavior is
needed then? 
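
A minimal sketch of the "read some stats" idea, assuming a recent OTP
(for erlang:monotonic_time/1 with the millisecond unit). It polls the
send_pend counter from inet:getstat/2, which reports the bytes still
waiting in the driver's user-level queue, i.e. c) above, and closes only
once that reaches zero or a deadline passes. The module name drain_close,
the 50 ms poll interval and the {unsent, N} error shape are made up for
illustration; none of this is what inet_drv does today, and it says
nothing about data already handed to the kernel.

```erlang
-module(drain_close).
-export([close/2]).

%% Hypothetical helper, not part of OTP: wait for the user-level send queue
%% to drain before closing, giving up after TimeoutMs milliseconds.
close(Socket, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    wait(Socket, Deadline).

wait(Socket, Deadline) ->
    case inet:getstat(Socket, [send_pend]) of
        {ok, [{send_pend, 0}]} ->
            %% Nothing left in the driver queue; the kernel owns the rest.
            gen_tcp:close(Socket);
        {ok, [{send_pend, Pending}]} ->
            case erlang:monotonic_time(millisecond) >= Deadline of
                true ->
                    %% Deadline passed: close anyway, but report what was queued.
                    gen_tcp:close(Socket),
                    {error, {unsent, Pending}};
                false ->
                    timer:sleep(50),   % arbitrary poll interval
                    wait(Socket, Deadline)
            end;
        {error, _} = Error ->
            Error
    end.
```

Calling drain_close:close(Socket, 30000) instead of gen_tcp:close(Socket)
at least turns the silent discard into a visible {error, {unsent, N}}.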

>>> The fix should be simple, limit the send queue size.
>> 
>> To what?
>> 
>> Zero seems to be the only value that will work even for arbitrarily slow
>> clients. And that defeats the point of having a send queue.
> 
> Exactly - which raises the question, what *is* the point of having a
> send queue? I.e. the user-level queue that inet_drv maintains, and which
> is causing these problems. I don't know, but since Tony has joined the
> discussion, maybe he can answer. :-)
> 
I remember us hacking a proxy more than 10 years ago; back then we had to tweak the sndbuf size
in order to keep Windows from killing buffers with RST. If I remember correctly we also had
to keep a low memory footprint per connection in order to have plenty of them
(too many dialup clients at that time).

Remember that inet_drv is not only used by gen_tcp but is also used by distribution.
The internal port queues are a great place to push stuff when you should be blocking,
but are not really ready to do that yet :-)

Pushing wouldblock back to Erlang, and letting Erlang get the POLLOUT signal etc.,
could be a way of handling the problem. But WE did not design it that way :-)

There is plenty of history around inet_drv. I am not saying it is doing the correct thing,
or even close, but it is an explanation.
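
For reference, a sketch of the knobs that bound the two buffers discussed
here, using standard inet options: sndbuf for the kernel send buffer,
high_watermark/low_watermark for the driver's user-level queue, and
send_timeout so that a suspended gen_tcp:send/2 eventually returns. The
module name bounded_send and all the numbers are illustrative assumptions,
not recommendations, and bounding the queue only limits how much can be
lost on close; it does not prevent the loss.

```erlang
-module(bounded_send).
-export([connect/2]).

%% Hypothetical wrapper, not part of OTP; the option values are arbitrary.
connect(Host, Port) ->
    gen_tcp:connect(Host, Port,
        [binary,
         {active, false},
         {sndbuf, 16384},             % kernel-level socket send buffer (the OS may adjust it)
         {high_watermark, 8192},      % socket goes busy, senders suspend, at this queue size
         {low_watermark, 4096},       % socket leaves the busy state below this queue size
         {send_timeout, 5000},        % gen_tcp:send/2 returns {error, timeout} after 5 s
         {send_timeout_close, true}]).% and the socket is closed when that happens
```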

>> It's late, I might have outsmarted myself, but my current feeling is
>> that erlang is quietly tossing data and it shouldn't be.
> 
> I agree.
> 
>> Waiting for as long as it takes in close() seems like the right thing,
>> though Per might disagree.
> 
> Well, as I wrote earlier, I don't expect close() to block until all data
> is sent - in fact I don't expect it to block at all. We already have a
> potentially-blocking send() call, with an optional timeout even (unlike
> close()), why shouldn't that be enough? Your suggestion should get the
> job done, but it would block until the user-level queue has drained,
> which may in principle be forever.
> 
While data is queued in user land and someone stops the node (init:stop or ^C ...),
this data will be discarded, whereas data that made it into the kernel will not be.
I do not see how this can be fixed (except using queues in user land).

>> Waiting for N seconds in close() and then
>> returning an error if the queue didn't empty would also be better than
>> just quietly tossing it.
> 
> Maybe - but when you close(2) the OS-level socket (without SO_LINGER),
> you are anyway saying that you don't want to be informed about the final
> outcome, and if the receiver dies before reading all the data, you won't
> be told. I don't think it makes a lot of sense to inform specifically
> about the failure to drain the user-level queue - it might even give the
> false impression that you will always be told about failures.
> 
>> (And: yes, I know, application-level ACKs would avoid this
>> problem. But I'm not quite ready to say that this problem can't be
>> fixed.)
> 
> I'm tempted to refer to the FAQ entry about Erlang message passing where
> you quote other ramblings of mine - how much reliability do you want?
> If someone's life depends on the receiver consuming all the data and
> acting on it, you'd better have application-level ACKs despite the fact
> that TCP (like Erlang message passing) is "reliable". If it's just a
> question about whether a jpeg gets displayed in a web browser, probably
> not.
> 
> But it's reasonable to expect that if the networks, hosts, and
> applications keep running, and the user doesn't close the tab in his
> browser, that jpeg *should* eventually be displayed in full even if the
> user is on a slow dialup and the sender has long since completed his
> close() call and gone on to other business (or even called it a day).
> gen_tcp:close/1 doesn't meet this expectation.
> 
Agreed.
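
Until then, one simple form of the application-level acknowledgement
mentioned above is a close handshake: half-close the sending side and wait
for the peer to close its end. A minimal sketch, assuming a passive-mode
socket and a cooperative peer that reads to end-of-stream before closing.
The module name acked_close is made up, and with default options a peer
reset is also reported as {error, closed}, so this is evidence rather than
proof that everything was read.

```erlang
-module(acked_close).
-export([close/2]).

%% Hypothetical helper, not part of OTP. Assumes {active, false}.
close(Socket, TimeoutMs) ->
    %% Half-close: per the gen_tcp docs, the shutdown is postponed until data
    %% buffered in the port has been written to the kernel send buffer.
    case gen_tcp:shutdown(Socket, write) of
        ok ->
            await_peer_close(Socket, TimeoutMs);
        {error, _} = Error ->
            gen_tcp:close(Socket),
            Error
    end.

%% Read and discard until the peer closes; the timeout applies to each recv.
await_peer_close(Socket, TimeoutMs) ->
    case gen_tcp:recv(Socket, 0, TimeoutMs) of
        {ok, _Ignored} ->
            await_peer_close(Socket, TimeoutMs);
        {error, closed} ->
            %% Peer closed its end, taken here as the acknowledgement.
            gen_tcp:close(Socket);
        {error, Reason} ->
            gen_tcp:close(Socket),
            {error, Reason}
    end.
```

This does block the caller, but with a timeout the caller chooses, and a
slow receiver shows up as {error, timeout} instead of silently lost data.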

/Tony

> --Per
