[erlang-questions] hanging nodes

Lukas Larsson lukas@REDACTED
Wed Aug 20 15:25:54 CEST 2014


I think the dump would be very helpful. I need to know what the overlapped
I/O threads are doing and what values are set in the dp->out struct.
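
To make concrete why the dp->out values matter, here is a minimal sketch of
the exit condition of the do/while loop in the fd_stop code quoted below. It
is a standalone model, not code from sys.c: the MODEL_* names and both
function names are made up for illustration, and the value used for the
DF_THREAD_FLUSHED bit is an assumption (all that matters here is that it is a
flag tested with &).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins so the sketch builds without the Win32 or ERTS headers.
 * The two wait values match the documented Win32 constants; the flag
 * bit value is hypothetical and not taken from sys.c. */
#define MODEL_WAIT_OBJECT_0     0x000u  /* the wait was satisfied */
#define MODEL_WAIT_TIMEOUT      0x102u  /* the wait timed out */
#define MODEL_DF_THREAD_FLUSHED 0x1u    /* hypothetical bit value */

/* Mirrors the do/while condition in fd_stop: true means "loop again". */
bool fd_stop_keeps_looping(unsigned wait_result, unsigned out_flags)
{
    return wait_result == MODEL_WAIT_TIMEOUT
        || !(out_flags & MODEL_DF_THREAD_FLUSHED);
}

/* Walks the four combinations of wait result and flag state. */
void fd_stop_model_selfcheck(void)
{
    /* Timed out: loops again whether or not the flag is set. */
    assert(fd_stop_keeps_looping(MODEL_WAIT_TIMEOUT, 0u));
    assert(fd_stop_keeps_looping(MODEL_WAIT_TIMEOUT, MODEL_DF_THREAD_FLUSHED));
    /* Reply event signalled but flag not set: still loops. */
    assert(fd_stop_keeps_looping(MODEL_WAIT_OBJECT_0, 0u));
    /* Only this combination leaves the loop. */
    assert(!fd_stop_keeps_looping(MODEL_WAIT_OBJECT_0, MODEL_DF_THREAD_FLUSHED));
}
```

So the loop only ends when, in the same iteration, the wait on
flushReplyEvent does not time out and DF_THREAD_FLUSHED is set in
dp->out.flags. If the thread that is supposed to set that flag never does,
the loop keeps calling SetEvent, which would be consistent with the stack
traces in this thread.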

Lukas


On Wed, Aug 20, 2014 at 3:20 PM, Vlad Dumitrescu <vladdu55@REDACTED> wrote:

> Hi Lukas,
>
> I was about to send this anyway. Maybe this is enough from the dump? If
> not, I will send the whole of it. You can also see it's R16B03-1, but it's
> the same for 15 and 17.
>
> The stacktrace is
>
> 0523f644 0f528254 02b40e88 000140cb 02b40d40 beam_smp!fd_stop+0x5a
> [c:\cygwin\ldisk\daily_build\r16b03-1_opu_c.2014-01-23_17\otp_src_r16b03-1\erts\emulator\sys\win32\sys.c
> @ 2351]
> 0523f660 0f529fe5 fffffffb 02b40d40 000140cb beam_smp!terminate_port+0x64
> [c:\cygwin\ldisk\daily_build\r16b03-1_opu_c.2014-01-23_17\otp_src_r16b03-1\erts\emulator\beam\io.c
> @ 3417]
> 0523f678 0f52ad53 0000001b 00014607 000140cb
> beam_smp!erts_deliver_port_exit+0x225
> [c:\cygwin\ldisk\daily_build\r16b03-1_opu_c.2014-01-23_17\otp_src_r16b03-1\erts\emulator\beam\io.c
> @ 3651]
> 0523f694 0f5cdd99 02b40d40 0f64099c 000000e8
> beam_smp!driver_failure_atom+0x93
> [c:\cygwin\ldisk\daily_build\r16b03-1_opu_c.2014-01-23_17\otp_src_r16b03-1\erts\emulator\beam\io.c
> @ 6948]
> 0523f6bc 0f543c5d 00000000 0000027c 03700580 beam_smp!ready_output+0xa9
> [c:\cygwin\ldisk\daily_build\r16b03-1_opu_c.2014-01-23_17\otp_src_r16b03-1\erts\emulator\sys\win32\sys.c
> @ 2850]
> 0523f700 0f53dba4 03700580 0370719c 000007d0
> beam_smp!erts_port_task_execute+0x28d
> [c:\cygwin\ldisk\daily_build\r16b03-1_opu_c.2014-01-23_17\otp_src_r16b03-1\erts\emulator\beam\erl_port_task.c
> @ 1689]
> 0523f740 0f5a9255 03700580 000007d1 02723fe0 beam_smp!schedule+0xb24
> [c:\cygwin\ldisk\daily_build\r16b03-1_opu_c.2014-01-23_17\otp_src_r16b03-1\erts\emulator\beam\erl_process.c
> @ 7065]
> 0523f790 77bf38aa 00000017 03707100 00000f4b beam_smp!process_main+0x125
>
> And the code, from sys.c, is:
>
> static void fd_stop(ErlDrvData data)
> {
>   DriverData * dp = (DriverData *) data;
>   /*
>    * There's no way we can terminate an fd port in a consistent way.
>    * Instead we let it live until it's opened again (which it is,
>    * as the only FD-drivers are for 0,1 and 2 and the only time they
>    * get closed is by init:reboot).
>    * So - just deselect them and let everything be as is.
>    * They get woken up in fd_start again, where the DriverData is
>    * remembered. /PaN
>    */
>   if (dp->in.ov.hEvent != NULL) {
>       (void) driver_select(dp->port_num,
>                            (ErlDrvEvent)dp->in.ov.hEvent,
>                            ERL_DRV_READ, 0);
>   }
>   if (dp->out.ov.hEvent != NULL) {
>       (void) driver_select(dp->port_num,
>                            (ErlDrvEvent)dp->out.ov.hEvent,
>                            ERL_DRV_WRITE, 0);
>       do {
>           ASSERT(dp->out.flushEvent);
>           SetEvent(dp->out.flushEvent);
>       } while (WaitForSingleObject(dp->out.flushReplyEvent, 10) == WAIT_TIMEOUT
>                || !(dp->out.flags & DF_THREAD_FLUSHED)); // this is line 2351
>   }
>
> }
>
> The code is not really hung, but it looks like the do/while loop is never
> exited. (It's possible that there's another loop at a higher level; in any
> case, looking at the stack trace at different moments, it is most often
> like the one above.)
>
> The use case is that the erlang node is started from java and the problem
> happens when the java process is killed mercilessly. I will try to create a
> simple test to reproduce this.
>
> regards,
> Vlad
>
>
>
> On Wed, Aug 20, 2014 at 3:08 PM, Lukas Larsson <lukas@REDACTED> wrote:
>
>> Hello Vlad,
>>
>> This is most probably a Windows-only issue, as the code in the stack trace
>> runs through a lot of Windows-specific code. I guess the easiest way would
>> be for you to write a test case that reproduces the error so that I can
>> debug it here. If that is not possible, maybe you can attach to the
>> process and send me a Windows "core file"?
>>
>> To do this you have to install WinDbg[1] and then do "File->Attach to a
>> Process". After this you open the WinDbg console and type ".dump /ma
>> c:\beam.smp.dmp" and then upload that file to a server where I can download
>> it. Also, I assume you have used one of the installers on erlang.org?
>> Which versions of Windows and Erlang/OTP are you using?
>>
>> Lukas
>>
>>
>> On Wed, Aug 20, 2014 at 11:18 AM, Vlad Dumitrescu <vladdu55@REDACTED>
>> wrote:
>>
>>> Some details: the beam process seems to have a thread that sits and does
>>> something. I don't know if the following example stack helps.
>>>
>>> best regards,
>>> Vlad
>>>
>>> wow64cpu.dll!TurboDispatchJumpAddressEnd+0x63b
>>> wow64.dll!Wow64SystemServiceEx+0x1ce
>>> wow64.dll!Wow64LdrpInitialize+0x42a
>>> ntdll.dll!RtlIsDosDeviceName_U+0x23a27
>>> ntdll.dll!LdrInitializeThunk+0xe
>>>
>>> kernel32.dll!SetEvent+0x2
>>> beam.smp.dll!fd_stop+0x4f
>>> beam.smp.dll!terminate_port+0x64
>>> beam.smp.dll!erts_deliver_port_exit+0x225
>>> beam.smp.dll!driver_failure_atom+0x93
>>> beam.smp.dll!ready_output+0xa9
>>> beam.smp.dll!erts_port_task_execute+0x28d
>>> beam.smp.dll!schedule+0xb24
>>> beam.smp.dll!_process_main+0x125
>>> ntdll.dll!RtlImageNtHeader+0x716
>>> beam.smp.dll!do_erts_alcu_free+0x5f
>>> beam.smp.dll!erts_alcu_free_thr_spec+0x58
>>> beam.smp.dll!thr_wrapper+0xa6
>>> kernel32.dll!BaseThreadInitThunk+0x12
>>> ntdll.dll!RtlInitializeExceptionChain+0x63
>>> ntdll.dll!RtlInitializeExceptionChain+0x36
>>>
>>>
>>>
>>> On Wed, Aug 20, 2014 at 10:26 AM, Vlad Dumitrescu <vladdu55@REDACTED>
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> I have some nodes that don't want to terminate and that I can't connect
>>>> to. Most of the time the nodes unregister from epmd, but even when they
>>>> don't, they are still inaccessible.
>>>>
>>>> I was hoping that by connecting to the node I can see why it is
>>>> hanging, but I get the feeling that the hanging is not inside user code
>>>> (because epmd detects the socket being closed, so the shutdown process is
>>>> started).
>>>>
>>>> This is on Windows, but I have some reports for something that looks
>>>> similar from Linux too.
>>>> I'm not used to debugging this kind of situation, so I hope that
>>>> someone has some advice for me.
>>>>
>>>> best regards,
>>>> Vlad
>>>>
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>>
>>
>