[erlang-questions] hanging nodes

Vlad Dumitrescu vladdu55@REDACTED
Wed Aug 20 15:20:21 CEST 2014

Hi Lukas,

I was about to send this anyway. Maybe this is enough from the dump? If not,
I will send the whole thing. You can also see it's R16B03-1, but the behaviour
is the same on R15 and R17.

The stacktrace is

0523f644 0f528254 02b40e88 000140cb 02b40d40 beam_smp!fd_stop+0x5a
@ 2351]
0523f660 0f529fe5 fffffffb 02b40d40 000140cb beam_smp!terminate_port+0x64
@ 3417]
0523f678 0f52ad53 0000001b 00014607 000140cb
@ 3651]
0523f694 0f5cdd99 02b40d40 0f64099c 000000e8
@ 6948]
0523f6bc 0f543c5d 00000000 0000027c 03700580 beam_smp!ready_output+0xa9
@ 2850]
0523f700 0f53dba4 03700580 0370719c 000007d0
@ 1689]
0523f740 0f5a9255 03700580 000007d1 02723fe0 beam_smp!schedule+0xb24
@ 7065]
0523f790 77bf38aa 00000017 03707100 00000f4b beam_smp!process_main+0x125

And the code, from sys.c, is

static void fd_stop(ErlDrvData data)
{
  DriverData * dp = (DriverData *) data;
  /*
   * There's no way we can terminate an fd port in a consistent way.
   * Instead we let it live until it's opened again (which it is,
   * as the only FD-drivers are for 0,1 and 2 adn the only time they
   * get closed is by init:reboot).
   * So - just deselect them and let everything be as is.
   * They get woken up in fd_start again, where the DriverData is
   * remembered. /PaN
   */
  if (dp->in.ov.hEvent != NULL) {
      (void) driver_select(dp->port_num,
    ERL_DRV_READ, 0);
  }
  if (dp->out.ov.hEvent != NULL) {
      (void) driver_select(dp->port_num,
    ERL_DRV_WRITE, 0);
      do {
      } while (WaitForSingleObject(dp->out.flushReplyEvent, 10) ==
       || !(dp->out.flags & DF_THREAD_FLUSHED)); // this is line 2351
  }
}


The code is not really hung, but it looks like the do/while loop never
exits. (It's possible that there's another loop at a higher level; in any
case, looking at the stacktrace at different moments, it is most often like
the one above.)
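
To make the suspected failure mode concrete, here is a minimal, portable
sketch of the same loop shape. It is not the code from sys.c: fake_writer,
wait_for_reply and stop_with_limit are hypothetical names, the 10 ms wait is
simulated by a plain function call, and the retry cap is only an illustration
of how such a loop could be bounded, not a proposed patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these names exist in sys.c. */
enum wait_result { WAIT_SIGNALED, WAIT_TIMED_OUT };

struct fake_writer {
    int  polls_until_flushed; /* negative: the writer thread never answers */
    bool flushed;             /* plays the role of the DF_THREAD_FLUSHED flag */
};

/* Simulates one timed wait on the flush-reply event. */
static enum wait_result wait_for_reply(struct fake_writer *w)
{
    if (w->polls_until_flushed < 0)
        return WAIT_TIMED_OUT;        /* nobody will ever signal the event */
    if (w->polls_until_flushed > 0) {
        w->polls_until_flushed--;
        return WAIT_TIMED_OUT;        /* not flushed yet, try again */
    }
    w->flushed = true;
    return WAIT_SIGNALED;
}

/*
 * Same loop condition as in fd_stop, plus a cap on the number of polls.
 * Returns the number of polls made, or -1 if the cap was reached first.
 * Without the cap, a writer that never answers keeps this loop running
 * forever, which matches the stacktrace always showing fd_stop.
 */
static int stop_with_limit(struct fake_writer *w, int max_polls)
{
    int polls = 0;
    do {
        if (polls == max_polls)
            return -1;                /* give up instead of spinning */
        polls++;
    } while (wait_for_reply(w) == WAIT_TIMED_OUT || !w->flushed);
    return polls;
}
```

With a writer that answers after three timeouts, stop_with_limit returns 4.
With one that never answers (polls_until_flushed < 0), it gives up and
returns -1 instead of spinning.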

The use case is that the Erlang node is started from Java, and the problem
happens when the Java process is killed mercilessly. I will try to create a
simple test to reproduce this.


On Wed, Aug 20, 2014 at 3:08 PM, Lukas Larsson <lukas@REDACTED> wrote:

> Hello Vlad,
> This is most probably a Windows-only issue, as the code in the stacktrace
> runs through a lot of Windows-specific code. I guess the easiest way would
> be if you are able to write a testcase that can reproduce the error so that
> I can debug it here. If that is not possible, maybe you can attach to the
> process and send me a Windows "core file"?
> To do this you have to install WinDbg[1] and then do "File->Attach to a
> Process". After this you open the WinDbg console and type ".dump /ma
> c:\beam.smp.dmp" and then upload that file on a server where I can download
> it. Also I assume you have used one of the installers on erlang.org?
> Which version of Windows and Erlang/OTP are you using?
> Lukas
> On Wed, Aug 20, 2014 at 11:18 AM, Vlad Dumitrescu <vladdu55@REDACTED>
> wrote:
>> Some details: the beam process seems to have a thread that sits and does
>> something. I don't know if the following example stack helps.
>> best regards,
>> Vlad
>> wow64cpu.dll!TurboDispatchJumpAddressEnd+0x63b
>> wow64.dll!Wow64SystemServiceEx+0x1ce
>> wow64.dll!Wow64LdrpInitialize+0x42a
>> ntdll.dll!RtlIsDosDeviceName_U+0x23a27
>> ntdll.dll!LdrInitializeThunk+0xe
>> kernel32.dll!SetEvent+0x2
>> beam.smp.dll!fd_stop+0x4f
>> beam.smp.dll!terminate_port+0x64
>> beam.smp.dll!erts_deliver_port_exit+0x225
>> beam.smp.dll!driver_failure_atom+0x93
>> beam.smp.dll!ready_output+0xa9
>> beam.smp.dll!erts_port_task_execute+0x28d
>> beam.smp.dll!schedule+0xb24
>> beam.smp.dll!_process_main+0x125
>> ntdll.dll!RtlImageNtHeader+0x716
>> beam.smp.dll!do_erts_alcu_free+0x5f
>> beam.smp.dll!erts_alcu_free_thr_spec+0x58
>> beam.smp.dll!thr_wrapper+0xa6
>> kernel32.dll!BaseThreadInitThunk+0x12
>> ntdll.dll!RtlInitializeExceptionChain+0x63
>> ntdll.dll!RtlInitializeExceptionChain+0x36
>> On Wed, Aug 20, 2014 at 10:26 AM, Vlad Dumitrescu <vladdu55@REDACTED>
>> wrote:
>>> Hi!
>>> I have some nodes that don't want to terminate and that I can't connect
>>> to. Most of the time the nodes unregister from epmd, but even when they
>>> don't, they are still inaccessible.
>>> I was hoping that by connecting to the node I can see why it is hanging,
>>> but I get the feeling that the hanging is not inside user code (because
>>> epmd detects the socket being closed, so the shutdown process is started).
>>> This is on Windows, but I have some reports for something that looks
>>> similar from Linux too.
>>> I'm not used to debugging this kind of situation, so I hope that
>>> someone has some advice for me.
>>> best regards,
>>> Vlad