<div>I've started to notice a slow leak of file descriptors in the ssl_esock port. I'm running Erlang R14B and using SSL to encrypt traffic over the Erlang distribution protocol. The cluster has a minimum of 10 nodes, with transient nodes joining and leaving regularly. Checking the ssl_esock process with lsof, the number of open file descriptors seems to increase each time a node joins the cluster and then leaves. Eventually ssl_esock holds enough open file descriptors to hit the ulimit (currently 8192), at which point it goes into an infinite loop using nearly 100% of one CPU.</div>
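<div><br></div><div>To track the descriptor count over time instead of reading lsof output by hand, a minimal sketch along these lines could be used. It assumes a Linux host with /proc mounted and enough privileges to read the target process's fd directory; <code>count_open_fds</code> is a hypothetical helper name, not part of Erlang or ssl_esock.</div>

```python
import os


def count_open_fds(pid):
    """Return the number of open file descriptors held by process `pid`.

    Assumption: Linux, where /proc/<pid>/fd has one entry per open descriptor.
    When called on the current process, the count includes the temporary
    descriptor used to list the directory itself, so compare differences
    rather than absolute values in that case.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))
```

<div>Calling it with the ssl_esock pid before a node joins and again after it leaves would show whether descriptors are being retained.</div>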
<div><br></div><div>I've been able to reproduce the issue by lowering the ulimit and repeatedly connecting a remote shell to a locally running node and disconnecting it until the ulimit is reached. With ssl_esock running in debug mode, I see the following logged continually:</div>
<div><br></div><div><div>==========LOOP=============</div><div>MASKS SET FOR FD: 27 (read) 26 (read) 25 (read) 24 (read) 19 (read) 18 (read) 17 (read) 16 (read) 12 (read) 11 (read) 10 (read) 9 (read) 8 (read) 7 (read) 6 (read) </div>
<div>CONNECTIONS:</div><div> - DEFUNCT [0x8772978] (fd = 29)</div><div> - DEFUNCT [0x86f9950] (fd = 28)</div><div> - JOINED [0x875ae30] (origin = accept)</div><div> (fd = 26, eof = 0, wq = 0, bp = 0)</div><div> (proxyfd = 27, eof = 0, wq = 0, bp = 0)</div>
<div> - JOINED [0x86fa970] (origin = accept)</div><div> (fd = 24, eof = 0, wq = 0, bp = 0)</div><div> (proxyfd = 25, eof = 0, wq = 0, bp = 0)</div><div> - DEFUNCT [0x8733600] (fd = 21)</div><div> - DEFUNCT [0x8732c38] (fd = 20)</div>
<div> - JOINED [0x8733958] (origin = accept)</div><div> (fd = 18, eof = 0, wq = 0, bp = 0)</div><div> (proxyfd = 19, eof = 0, wq = 0, bp = 0)</div><div> - JOINED [0x8734f78] (origin = accept)</div><div> (fd = 16, eof = 0, wq = 0, bp = 0)</div>
<div> (proxyfd = 17, eof = 0, wq = 0, bp = 0)</div><div> - CONNECTED [0x87134a8] (fd = 15)</div><div> - DEFUNCT [0x871f220] (fd = 13)</div><div> - JOINED [0x87147d0] (origin = accept)</div><div> (fd = 11, eof = 0, wq = 0, bp = 0)</div>
<div> (proxyfd = 12, eof = 0, wq = 0, bp = 0)</div><div> - JOINED [0x87083d0] (origin = connect)</div><div> (fd = 9, eof = 0, wq = 0, bp = 0)</div><div> (proxyfd = 10, eof = 0, wq = 0, bp = 0)</div><div>
- JOINED [0x86f29e8] (origin = connect)</div><div> (fd = 7, eof = 0, wq = 0, bp = 0)</div><div> (proxyfd = 8, eof = 0, wq = 0, bp = 0)</div><div> - ACTIVE_LISTENING [0x86f2258] (fd = 6, acceptors = 1)</div><div>
Before poll/select: 15 descriptors (total 29)</div><div>Error calling accept()</div><div>accept error (proxy_listensock): emfile</div></div><div><br></div><div>Has anyone else experienced such behavior?</div><div><br></div>
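<div>The repeated "accept error ... emfile" in the log above looks consistent with a general pattern: once a process is out of descriptors, a listening socket with a pending connection stays readable, accept() keeps failing, and a select loop spins. The sketch below only illustrates that general pattern in Python on a POSIX system; it is not taken from the ssl_esock source, and <code>demo_emfile_spin</code> and the limit of 64 are assumptions made for the demonstration.</div>

```python
import errno
import os
import resource
import select
import socket


def demo_emfile_spin(rounds=3):
    """Show that a listener stays readable while accept() fails with EMFILE."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    client = socket.socket()
    client.connect(listener.getsockname())  # leaves one connection pending

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = 64  # arbitrary small limit so exhaustion is quick
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    hogs = []
    results = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        # Use up every remaining descriptor slot.
        while True:
            try:
                hogs.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break
        for _ in range(rounds):
            readable, _, _ = select.select([listener], [], [], 1.0)
            err = None
            if readable:
                try:
                    conn, _addr = listener.accept()
                    conn.close()
                except OSError as exc:
                    err = exc.errno  # pending connection is still queued
            results.append((bool(readable), err))
    finally:
        for fd in hogs:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        client.close()
        listener.close()
    return results
```

<div>Each round reports the listener as readable and accept() failing with EMFILE, so a loop with no backoff would retry immediately and burn CPU, which matches the symptom described above.</div><div><br></div>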
<div>Thanks</div><div><br></div><div>-justin</div>