[erlang-bugs] prim_inet:close/1 race condition

Dmitry Belyaev
Thu Oct 11 23:06:52 CEST 2012


Some days ago we found that we have thousands of leaked sockets in our project.

These sockets were ports with state like this:
[{name,"tcp_inet"},
 {links,[]},
 {connected,<0.54.0>},...]
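
A minimal sketch of how such ports can be listed on a running node. The module and function names (leak_scan, unlinked_tcp_ports/0) are hypothetical; only erlang:ports/0, erlang:port_info/2 and is_process_alive/1 are standard calls. It assumes that a "tcp_inet" port with an empty links list is suspicious, which matches the state shown above.

```erlang
-module(leak_scan).
-export([unlinked_tcp_ports/0]).

%% Hypothetical helper: list tcp_inet ports that have no links, together
%% with their connected pid and whether that pid is still alive.
unlinked_tcp_ports() ->
    lists:filtermap(fun describe/1, erlang:ports()).

describe(Port) ->
    case {erlang:port_info(Port, name),
          erlang:port_info(Port, links),
          erlang:port_info(Port, connected)} of
        {{name, "tcp_inet"}, {links, []}, {connected, Pid}} ->
            {true, {Port, Pid, is_process_alive(Pid)}};
        _ ->
            false   % other driver, still linked, or port already closed
    end.
```

Calling leak_scan:unlinked_tcp_ports() returns a list of {Port, ConnectedPid, IsAlive} tuples; an empty list means no such port was found at that moment.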

We investigated and found the cause of the leaks.

We use the inet option {exit_on_close, false} so that we can read statistics from the socket after it has been closed by the peer. The process that controls the socket does not trap exits and is linked to another process.
At the end of a connection the controller calls gen_tcp:close/1, and sometimes the linked process dies at that same moment. We found that gen_tcp:close/1 calls prim_inet:close/1, whose first action is to unlink the port from the controlling process. So, if the controller is killed by the exit signal after it has been unlinked from the port but before the port is actually closed, the port stays in the system because of the exit_on_close setting.

I've made a module that can sometimes reveal the problem: https://gist.github.com/3875485
On my system, half a dozen calls to close_bug:start(1000) do find such leaked ports.
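
For readers who cannot open the gist, here is a minimal sketch of what such a reproducer might look like. It is not the code from the gist: the module name close_race_sketch, run/1 and the helper functions are hypothetical, and the loopback listener is an assumption made only to have a peer to connect to. Whether the race actually triggers depends on timing and on the OTP release.

```erlang
-module(close_race_sketch).
-export([run/1]).

%% Hypothetical reproducer sketch (not the code from the gist).
%% Runs N close attempts against a local listener and returns ok.
run(N) when is_integer(N), N >= 0 ->
    {ok, Listen} = gen_tcp:listen(0, [{active, false}, {ip, {127,0,0,1}}]),
    {ok, PortNo} = inet:port(Listen),
    Acceptor = spawn(fun() -> accept_loop(Listen) end),
    lists:foreach(fun(_) -> attempt(PortNo) end, lists:seq(1, N)),
    exit(Acceptor, kill),
    gen_tcp:close(Listen),
    ok.

%% Accept connections and close them at once, so the peer sees a close.
accept_loop(Listen) ->
    case gen_tcp:accept(Listen) of
        {ok, Sock} -> gen_tcp:close(Sock), accept_loop(Listen);
        {error, _} -> ok
    end.

%% One attempt: wait until the controller has exited, for any reason.
attempt(PortNo) ->
    {Pid, Ref} = spawn_monitor(fun() -> controller(PortNo) end),
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end.

%% The controller does not trap exits and is linked to a helper that
%% dies abnormally right around the gen_tcp:close/1 call.
controller(PortNo) ->
    Helper = spawn_link(fun() -> receive die -> exit(boom) end end),
    {ok, Sock} = gen_tcp:connect({127,0,0,1}, PortNo,
                                 [{active, false}, {exit_on_close, false}]),
    Helper ! die,
    gen_tcp:close(Sock).
```

After close_race_sketch:run(1000) returns, any "tcp_inet" port in erlang:ports() whose erlang:port_info(Port, links) is {links, []} is a candidate leak.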
I haven't found the right solution for the problem yet, so no patches at the moment.

Thank you for your attention.

-- 
Dmitry Belyaev