[erlang-bugs] Follow up: [BUG] gen_tcp:connect/3,4 returns socket for closed port

Raimo Niskanen
Mon Sep 15 09:56:08 CEST 2008


On Tue, Sep 09, 2008 at 01:32:07PM +0200, Raimo Niskanen wrote:
> On Tue, Sep 09, 2008 at 06:47:25AM -0400, Edwin Fine wrote:
> > Raimo,
> > 
> > Yes, it must be a port that is open in the firewall but have no listening
> > socket.
> > I have not tried it on other targets (I only have Windows and Linux, and I
> > tried connecting from Linux to Windows XP).
> > 
> > Hope this helps.
> 
> I can reproduce the bug on a SLES 10 SP 1 x86_64 
> "Erlang (BEAM) emulator version 5.6.4 [source] [64-bit] [smp:4] [async-threads:0] [hipe] [kernel-poll:false]\n"
> both towards an XP machine and towards another SLES 10 machine,
> but oddly enough not against the machine itself neither over
> the loopback interface nor the external interface. It probably
> suggests badass timing is involved. I hope debug compiled
> still shows the symptom.
> 
> I'll be back...
> 

Well, it was not even a clear-cut problem.

It turned out to be a known problem. We ran into it a few months ago,
and the solution then was to ignore the problem, i.e. to work around it
in the test cases. It was assumed we had found a Linux kernel bug.

We do what we are supposed to. If connect() for a non-blocking socket
fails with EINPROGRESS, we put it in the poll() set and call poll().
Later poll() returns with POLLERR|POLLHUP on the socket.
We call getsockopt(,SOL_SOCKET,SO_ERROR,,) to check whether
the connect succeeded. So far all is as in the manual, but sometimes
the check reports success even though the socket is unusable: all
recv(), sendmsg(), etc. calls fail.
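
For reference, the sequence described above can be sketched in plain C
roughly as follows. This is a minimal single-threaded illustration, not
the inet_drv code: the function name connect_nonblocking(), its
parameters, and the final getpeername() check are assumptions added for
the example. The getpeername() call is only an extra defensive test;
whether it would catch the case described here has not been established.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd without losing the errno value that explains the failure. */
static int fail_and_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: returns a connected fd, or -1 with errno set. */
int connect_nonblocking(const char *ip, unsigned short port, int timeout_ms)
{
    assert(ip != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return fail_and_close(fd);

    /* Step 1: connect(); EINPROGRESS means "wait for the result". */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;
    if (errno != EINPROGRESS)
        return fail_and_close(fd);

    /* Step 2: poll() for writability; errors also wake us up. */
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        return fail_and_close(fd);

    /* Step 3: SO_ERROR holds the outcome of the pending connect(). */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return fail_and_close(fd);
    if (err != 0) {
        errno = err;
        return fail_and_close(fd);
    }

    /* Extra check (assumption, not from the original code): a socket
     * that is not really connected has no peer address. */
    struct sockaddr_in peer;
    socklen_t plen = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &plen) < 0)
        return fail_and_close(fd);

    return fd;
}
```

A caller would treat -1 as a failed connect and inspect errno, for
example ECONNREFUSED when the target answers the SYN with an RST.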

The symptoms were also not that bad. Any subsequent use of the
socket fails, which a real application has to be
prepared for anyway.

But taking a closer look with strace reveals that we sometimes call
connect() in one thread, poll() in another and getsockopt()
in a third, and sometimes all in the same thread.
The task wanders between the schedulers in our SMP VM.

And when the problem starts, it seems poll() returns with
POLLOUT|POLLHUP for the socket before we call connect()
in another thread, which is temporally impossible.
I have seen this in one strace and cannot reproduce it.
While strace is running, the bug does not show itself.
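
When comparing such traces against the program's own logging, it can
help to print the revents mask symbolically, the same way strace does.
The helper below is a small sketch for that purpose; the name
revents_to_string() is made up for this example and is not part of
any existing code base.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a poll() revents mask such as
 * POLLOUT|POLLHUP into buf and return buf. */
const char *revents_to_string(short revents, char *buf, size_t size)
{
    static const struct { short bit; const char *name; } names[] = {
        { POLLIN,   "POLLIN"   },
        { POLLPRI,  "POLLPRI"  },
        { POLLOUT,  "POLLOUT"  },
        { POLLERR,  "POLLERR"  },
        { POLLHUP,  "POLLHUP"  },
        { POLLNVAL, "POLLNVAL" },
    };
    size_t used = 0;

    assert(buf != NULL);
    if (size == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(revents & names[i].bit))
            continue;
        int n = snprintf(buf + used, size - used, "%s%s",
                         used ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= size - used)
            break;              /* output truncated, stop here */
        used += (size_t)n;
    }
    if (used == 0)
        snprintf(buf, size, "0");   /* no known bits set */
    return buf;
}
```

Logging this string together with a thread id and a timestamp right
after each poll() return gives a second timeline to hold against the
one strace reports.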

So, we have these possibilities:
1) A bug of ours where we mess up with the locking
   and loading of data for the poll set.
2) A Linux kernel bug in this rare case of tossing
   the task between threads.
3) An strace bug for SMP. Its view of the timeline
   is not necessarily correct.

I'll dig further.

I might write a small C program to try to provoke the Linux
kernel bug, and if it does not provoke it, it is our bug.

> > 
> > On Tue, Sep 9, 2008 at 3:12 AM, Raimo Niskanen wrote:
> > 
> > > On Mon, Sep 08, 2008 at 08:43:22AM -0400, Edwin Fine wrote:
> > > > Raimo,
> > > >
> > > > Thanks for the response. Good luck finding the bug. I just confirmed that
> > > it
> > > > is still present on R12B-4. Please note that you need to connect to a
> > > port
> > > > that is open but with no program using it (e.g. one could try port 80
> > > > without httpd running). Sorry to state the obvious, it's a bad habit of
> > > > mine.
> > >
> > > No, no, please state the obvious. People often leave out what
> > > they think is obvious, and it turns out to be non-obvious.
> > >
> > > But on the other hand it may be confusing too. Do you mean
> > > that it must be a port that is open in the firewall
> > > but has no listening socket, so you get the RST response
> > > from the TCP stack on the target machine (that is supposed
> > > to be Windows XP)? Have you tried other targets? (Since you
> > > report having seen SYN in, RST out on the target for all
> > > connection attempts, it should not matter.)
> > >
> > > >
> > > > Regards,
> > > > Edwin Fine
> > > >
> > > > On Mon, Sep 8, 2008 at 6:00 AM, Raimo Niskanen wrote:
> > > >
> > > > > On Sun, Sep 07, 2008 at 03:23:21PM -0400, Edwin Fine wrote:
> > > > > > Hi OTP Team,
> > > > > >
> > > > > > I realize you have been very busy with the R12B-4 release, and this
> > > is
> > > > > not a
> > > > > > complaint or criticism, just a request for info.
> > > > >
> > > > > Perhaps it should be...
> > > > >
> > > > > >
> > > > > > I reported this bug some weeks ago and have not received an
> > > > > acknowledgment.
> > > > > > I simply want to know if you accepted it, rejected it, or fixed it
> > > > > already
> > > > >
> > > > > You are right, we have been busy with the release.
> > > > >
> > > > > Your problem (as we say in swedish) fell between the chairs.
> > > > > If it is an inet_drv bug it is one guys problem, an SMP bug
> > > > > another guys problem. But enough excuses...
> > > > > we will look into it now. It sounds serious.
> > > > >
> > > > > > (and if so, in which release the fix appears). I have had to code
> > > around
> > > > > > this and would like to know if I can remove that code.
> > > > > >
> > > > > > Link to original bug report:
> > > > > > http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
> > > > > >
> > > > > > Best regards,
> > > > > > Edwin Fine
> > > > >
> > > > > > _______________________________________________
> > > > > > erlang-bugs mailing list
> > > > > > 
> > > > > > http://www.erlang.org/mailman/listinfo/erlang-bugs
> > > > >
> > > > > --
> > > > >
> > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > >
> > > > >
> > >
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > >
> > >
> 
> 
> -- 
> 
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


