[erlang-questions] ssl_esock leaking file descriptors

Gordon Guthrie gordon@REDACTED
Wed Aug 31 16:35:50 CEST 2011


We get an intermittent ssl_esock problem which I have never successfully
reproduced. It goes to 100% CPU and the process needs to be killed manually.

Richard Andrews also reported a problem with it going to 100% CPU in 2009:
http://erlang.2086793.n4.nabble.com/ssl-esock-spinning-out-of-control-in-poll-td2117067.html

He has a patch for that.

It is on my 'long list' of things to fix, but more frequent/reproducible ones
always get in the way.

Gordon

On 31 August 2011 15:13, Justin Milam <jsmilam@REDACTED> wrote:

> I've started to notice a slow leak of file descriptors in the ssl_esock
> port. I'm running Erlang R14B and using SSL to encrypt traffic over the
> Erlang distribution protocol. The cluster has 10 nodes minimum with
> transient nodes joining and leaving the cluster regularly. From checking the
> ssl_esock process with lsof it appears to be slowly leaking file
> descriptors. The number of open file descriptors seems to increase after a
> node joins the cluster and then leaves. Eventually ssl_esock holds open
> enough file descriptors to hit the ulimit (currently 8192) in which case
> ssl_esock goes into an infinite loop using near 100% of one of the CPUs.
>
> I've been able to reproduce the issue by lowering the ulimit and
> continually connecting/disconnecting a remote shell to a local running node
> until the ulimit is reached. When ssl_esock is running in debug mode I see
> the following being logged continually:
>
> ==========LOOP=============
> MASKS SET FOR FD: 27 (read) 26 (read) 25 (read) 24 (read) 19 (read) 18
> (read) 17 (read) 16 (read) 12 (read) 11 (read) 10 (read) 9 (read) 8 (read) 7
> (read) 6 (read)
> CONNECTIONS:
>  - DEFUNCT [0x8772978] (fd = 29)
>  - DEFUNCT [0x86f9950] (fd = 28)
>  - JOINED [0x875ae30] (origin = accept)
>        (fd = 26, eof = 0, wq = 0, bp = 0)
>        (proxyfd = 27, eof = 0, wq = 0, bp = 0)
>  - JOINED [0x86fa970] (origin = accept)
>        (fd = 24, eof = 0, wq = 0, bp = 0)
>        (proxyfd = 25, eof = 0, wq = 0, bp = 0)
>  - DEFUNCT [0x8733600] (fd = 21)
>  - DEFUNCT [0x8732c38] (fd = 20)
>  - JOINED [0x8733958] (origin = accept)
>        (fd = 18, eof = 0, wq = 0, bp = 0)
>        (proxyfd = 19, eof = 0, wq = 0, bp = 0)
>  - JOINED [0x8734f78] (origin = accept)
>        (fd = 16, eof = 0, wq = 0, bp = 0)
>        (proxyfd = 17, eof = 0, wq = 0, bp = 0)
>  - CONNECTED [0x87134a8] (fd = 15)
>  - DEFUNCT [0x871f220] (fd = 13)
>  - JOINED [0x87147d0] (origin = accept)
>        (fd = 11, eof = 0, wq = 0, bp = 0)
>        (proxyfd = 12, eof = 0, wq = 0, bp = 0)
>  - JOINED [0x87083d0] (origin = connect)
>        (fd = 9, eof = 0, wq = 0, bp = 0)
>        (proxyfd = 10, eof = 0, wq = 0, bp = 0)
>  - JOINED [0x86f29e8] (origin = connect)
>        (fd = 7, eof = 0, wq = 0, bp = 0)
>        (proxyfd = 8, eof = 0, wq = 0, bp = 0)
>  - ACTIVE_LISTENING [0x86f2258] (fd = 6, acceptors = 1)
> Before poll/select: 15 descriptors (total 29)
> Error calling accept()
> accept error (proxy_listensock): emfile
>
> Has anyone else experienced such behavior?
>
> Thanks
>
> -justin
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
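
The "accept error (proxy_listensock): emfile" line repeating in the log above
is consistent with a general failure mode of poll/select loops: when accept()
fails with EMFILE, the pending connection stays in the listen backlog, so the
listening socket is reported readable again on the very next iteration and
the loop never sleeps. The following is a minimal Python sketch of that
mechanism only, not ssl_esock's actual code. The function name
emfile_accept_spin and the limit of 64 descriptors are invented for the
illustration, and it assumes a Unix-like system.

```python
import errno
import os
import resource
import select
import socket


def emfile_accept_spin(rounds=3, fd_limit=64):
    """Return the errno accept() fails with on each of `rounds` select() wakeups.

    Hypothetical helper for illustration; it does not touch ssl_esock.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    listener = client = None
    hogs = []
    seen = []
    try:
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(("127.0.0.1", 0))
        listener.listen(8)
        # The kernel completes the handshake; the connection now waits in the backlog.
        client = socket.create_connection(listener.getsockname())

        # Lower the soft limit (the equivalent of lowering ulimit -n).
        if soft == resource.RLIM_INFINITY or soft > fd_limit:
            resource.setrlimit(resource.RLIMIT_NOFILE, (fd_limit, hard))

        # Stand-in for the leak: consume every remaining descriptor slot.
        while True:
            try:
                hogs.append(os.dup(listener.fileno()))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break

        for _round in range(rounds):
            readable, _w, _x = select.select([listener], [], [], 1.0)
            if not readable:
                seen.append(None)  # select timed out, no spin observed
                continue
            try:
                conn, _addr = listener.accept()
                conn.close()
                seen.append(0)
            except OSError as exc:
                # accept() failed, but the connection is still pending,
                # so select() will report the listener readable again.
                seen.append(exc.errno)
    finally:
        for fd in hogs:
            os.close(fd)
        if client is not None:
            client.close()
        if listener is not None:
            listener.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return seen


if __name__ == "__main__":
    codes = emfile_accept_spin()
    print([errno.errorcode.get(c, c) for c in codes])
```

Each round wakes up immediately and fails the same way, which is what a
tight loop at 100% CPU looks like from the inside. Freeing a descriptor, or
backing off before retrying accept(), would break the cycle in this sketch;
whether either applies to ssl_esock itself is not something the log settles.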


-- 
Gordon Guthrie
CEO hypernumbers

http://hypernumbers.com
t: hypernumbers
+44 7776 251669