[erlang-questions] {error,closed} vs. {error,econnreset}

Bekes, Andras G Andras.Bekes@REDACTED
Fri Aug 31 12:43:29 CEST 2018


Hi Lukas, All,

I do not have a TCP dump, but according to strace, the error is:
<... writev resumed> ) = -1 ENOTCONN (Transport endpoint is not connected)
The error indeed happens on a write, and on the Erlang side it is reported as a normal close at the next gen_tcp:recv call.

Let us not go into discussing what can cause this error on a socket (Unix domain, BTW). It is irrelevant here.
The important point here is that this socket error, and any other socket error, must not be confused with an orderly shutdown of the socket.
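
To illustrate the distinction at the system-call level, here is a minimal C sketch, assuming a POSIX system. recv() returning 0 is the orderly shutdown; -1 with errno set is a socket error. The names classify_recv and demo_orderly_shutdown are made up for this illustration and do not come from inet_drv.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative outcome type; not taken from the Erlang sources. */
enum outcome { GOT_DATA, ORDERLY_SHUTDOWN, SOCKET_ERROR };

/* Perform one recv() and keep "peer closed cleanly" apart from "error". */
static enum outcome classify_recv(int fd, int *saved_errno)
{
    char buf[64];
    ssize_t n;

    assert(saved_errno != NULL);
    n = recv(fd, buf, sizeof buf, 0);
    if (n > 0)
        return GOT_DATA;
    if (n == 0)
        return ORDERLY_SHUTDOWN;   /* end of stream: the peer shut down */
    *saved_errno = errno;          /* keep the real error for the caller */
    return SOCKET_ERROR;
}

/* Demo helper: the peer of a Unix domain socket pair closes without error. */
static enum outcome demo_orderly_shutdown(void)
{
    int sv[2];
    int err = 0;
    enum outcome result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return SOCKET_ERROR;
    close(sv[1]);                  /* orderly close of the peer end */
    result = classify_recv(sv[0], &err);
    close(sv[0]);
    return result;
}
```

A caller that keeps these two outcomes separate can pass the saved errno upward instead of folding it into a close event.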

I support the proposal mentioned below: make reporting all errors the default behavior, and make the masking behavior something that is explicitly selected with an option.

Regards,
   Andras

From: Lukas Larsson [mailto:lukas@REDACTED]
Sent: Thursday, August 16, 2018 2:37 PM
To: Bekes, Andras G (IST)
Cc: Erlang Questions
Subject: Re: [erlang-questions] {error,closed} vs. {error,econnreset}

Hello,
On Mon, 6 Aug 2018, 10:49 Bekes, Andras G <Andras.Bekes@REDACTED> wrote:
Hi All,

Reviving this old thread again, because I am getting more and more convinced that we need further changes.
We're still observing connection close events in cases where an error should be reported at the gen_tcp level.
It could be a reset error that is somehow still not reported as 'econnreset', but I suspect it is some other error.

I suppose that you have not managed to catch the error in a tcp dump as Rory asked for?


I checked the code in inet_drv.c. The function
static int tcp_recv(tcp_descriptor* desc, int request_len)
seems to work properly -- a reset is reported either as closed or as econnreset, depending on show_econnreset; all other errors are reported as errors.

However, the function
static int tcp_send_or_shutdown_error(tcp_descriptor* desc, int err)
hides errors. Connection reset errors are properly handled (either reported as closed or econnreset, depending on show_econnreset), but all other errors are just reported as closed.
Active and passive modes have independent code paths, but I think both do the same: all errors are reported as normal close -- except for econnreset.

There is also a merge of error codes in tcp_send_error, so some other errors get mapped to econnreset before tcp_send_or_shutdown_error is called.
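
The send-path behaviour described above can be modelled in a few lines of C. This is a simplified sketch of the description in this message, not the inet_drv.c source. The enum and the function name report_send_error are hypothetical, and the error-code merge done in tcp_send_error is deliberately left out.

```c
#include <assert.h>
#include <errno.h>

/* What the Erlang caller ends up seeing. Illustrative names only. */
enum reported { REPORT_CLOSED, REPORT_ECONNRESET };

/*
 * Model of the described behaviour: a failed send surfaces as econnreset
 * only when the error is a reset AND show_econnreset is set. Every other
 * error collapses into a plain close.
 */
static enum reported report_send_error(int err, int show_econnreset)
{
    assert(err > 0);   /* only meaningful after a failed send */
    if (err == ECONNRESET && show_econnreset)
        return REPORT_ECONNRESET;
    return REPORT_CLOSED;   /* ENOTCONN, EPIPE, ... all end up here */
}
```

Under this model, the ENOTCONN seen in strace is indistinguishable from an orderly shutdown by the time it reaches the caller.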


Apparently I need to detect all errors.
Is it possible to implement a show_errors (or show_all_errors) flag, too?

Actually, this new flag could replace the current show_econnreset flag.
Having two separate flags for econnreset and for the other errors requires more complex code, whereas a single show_errors flag would simplify the current code, which provides special treatment for econnreset.
I am not sure if it makes much sense to expose connection reset errors but still mask all other errors as normal close events.
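
To make the comparison concrete, here is a small C sketch of the two designs. It is only an illustration under stated assumptions: show_errors and show_other_errors are proposed or hypothetical flags, not existing options, and both function names are invented. A return value of 0 stands for "report closed"; any other value is the errno to report.

```c
#include <assert.h>
#include <errno.h>

/* Two-flag design: econnreset keeps its own switch, so the code branches. */
static int report_with_two_flags(int err, int show_econnreset,
                                 int show_other_errors)
{
    assert(err > 0);   /* only called after a failed socket operation */
    if (err == ECONNRESET)
        return show_econnreset ? err : 0;
    return show_other_errors ? err : 0;
}

/* Single-flag design: no special case, every error is treated alike. */
static int report_with_show_errors(int err, int show_errors)
{
    assert(err > 0);
    return show_errors ? err : 0;
}
```

With the single flag, the special case for econnreset disappears and ENOTCONN is reported exactly like a reset.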

Taking a step back, it seems there are network-programming tasks (at least one!) for which Erlang is not suitable. This is rather sad.
Luckily the fix doesn't seem difficult.

What do you think?

I agree that it should be possible to get the original error from the TCP stack. Given the discussions at https://github.com/erlang/otp/pull/731, maybe it is time to reverse the options, so that returning the original error becomes the default and you have to set an option to get the backwards-compatible behaviour?

We are currently in the process of a major overhaul of gen_tcp and friends, so maybe this can be changed while doing that, as we are bound to break backwards compatibility in various ways during that rewrite anyways.

Lukas
