[erlang-patches] erlang node crashes in erts_gc_after_bif_call

adam chan 114999420@REDACTED
Fri Oct 19 10:42:51 CEST 2012


Hi!


I don't think the crypto NIF library was reloaded, though the MySQL client (author: Magnus Ahltorp <ahltorp@REDACTED>) in my project does use the crypto library.


The crash happens when I upgrade a big data file (600K) that contains nothing but thousands of clauses like:
     get(xxx) -> #record{a = xx, b = xx};
I assume this data file has nothing to do with the crypto NIF library?


In particular, when I execute
     c:l(data_file_name)
repeatedly and in quick succession in the screen shell, the crash shows up frequently.
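
The rapid reloading described above could be approximated with a small helper like the one below. This is only a sketch: the module name reload_stress and the function run/2 are made up for illustration, and it does nothing more than call c:l/1 in a tight loop. It has not been confirmed to trigger the crash.

```erlang
-module(reload_stress).
-export([run/2]).

%% Hypothetical helper: reload Mod N times back to back and return the
%% distinct results of the c:l/1 calls.
%% c:l/1 purges the old code and then loads the current object file, so
%% any process still running the old version is killed by the purge.
run(Mod, N) when is_atom(Mod), is_integer(N), N >= 0 ->
    lists:usort([c:l(Mod) || _ <- lists:seq(1, N)]).
```

For example, reload_stress:run(data_file_name, 1000) would reload the data module a thousand times; any result other than a single {module, data_file_name} entry means at least one load failed.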


Yesterday I found that the stack size for my application had never been set, which means it was running with the Linux default stack size (10M). After I set the stack size to 500M using the 'ulimit -s' command and split the big data file into smaller files, the situation improved. Maybe the small stack size is the culprit, but I am not sure. :(


In any case, is there a way to detect whether the crypto NIF library has been reloaded?
I've found a discussion about "fix native code crash when calling unloaded module with on_load function":
     http://erlang.2086793.n4.nabble.com/fix-native-code-crash-when-calling-unloaded-module-with-on-load-function-td2273502.html
I did suspect the crypto module earlier, since it has an on_load attribute.
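
One thing that might help is to look at the load state of the module itself. The sketch below assumes that reloading the crypto module is what runs its on_load function and therefore loads the NIF library again. The module name nif_reload_check and the function status/1 are hypothetical; code:is_loaded/1 and erlang:check_old_code/1 are standard calls. Note the limitation: old_code is only true while the previous version has not been purged, so a reload followed by a purge would not show up.

```erlang
-module(nif_reload_check).
-export([status/1]).

%% Hypothetical helper: snapshot of a module's load state.
%%   loaded   - {file, Path} if current code is loaded, otherwise false
%%   old_code - true if an old version is still present, i.e. the module
%%              was loaded again and the old code has not been purged yet
status(Mod) when is_atom(Mod) ->
    [{loaded, code:is_loaded(Mod)},
     {old_code, erlang:check_old_code(Mod)}].
```

Calling nif_reload_check:status(crypto) before and after an upgrade would at least show whether a second version of crypto is sitting in the node.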


Cheers,
[Adam Chan]




------------------ Original ------------------
From:  "Patrik Nyblom"<pan@REDACTED>;
Date:  Thu, Oct 18, 2012 08:08 PM
To:  "erlang-patches"<erlang-patches@REDACTED>; 

Subject:  Re: [erlang-patches] erlang node crashes in erts_gc_after_bif_call



Hi!

Is the crypto NIF library reloaded during upgrade? That causes havoc 
unfortunately, due to the behaviour of the OpenSSL crypto memory 
allocation callbacks. We're working on that one.

Have you reloaded the crypto NIF library, directly or indirectly, when 
this happens?

Cheers,
/Patrik

On 10/17/2012 03:47 AM, adam chan wrote:
> Hello list,
>
>      I hit two random crashes this month, and each of them happened more
> than twice. The cause was "Program terminated with signal 11, Segmentation
> fault", and they most likely happened while I was hot-updating some module
> code using code:soft_purge/1 and code:load_file/1.
>      Although they occur in different code, the information from the core
> files shows that erts_gc_after_bif_call/4 was being called when the crash
> happened, so I guess it is related to garbage collection.
>      I am using otp_src_R15B02 in SMP mode.
>      Here is the information from the core files (I am not familiar with
> gdb; I hope it is useful):
>
>      [First One]
> Reading symbols from /lib64/libutil.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /usr/lib64/libncurses.so.5...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libncurses.so.5
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/librt.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from
> /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so...done.
> Loaded symbols for /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so
> Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libcrypto.so.6
> Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libz.so.1
> Core was generated by `/usr/local/lib/erlang/erts-5.9.2/bin/beam.smp -P 1024000 -K true -- -root /usr/'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000541b9a in check_process_code (A__p=0x1a8c5b40, BIF__ARGS=<value optimized out>) at beam/beam_bif_load.c:487
> 487                 if (INSIDE((BeamInstr *) funp->fe->address)) {
> (gdb) bt
> #0  0x0000000000541b9a in check_process_code (A__p=0x1a8c5b40, BIF__ARGS=<value optimized out>) at beam/beam_bif_load.c:487
> #1  check_process_code_2 (A__p=0x1a8c5b40, BIF__ARGS=<value optimized out>) at beam/beam_bif_load.c:205
> #2  0x0000000000530782 in process_main () at beam/beam_emu.c:3392
> #3  0x00000000004a0b4f in sched_thread_func (vesdp=<value optimized out>) at beam/erl_process.c:5184
> #4  0x00000000005a4f14 in thr_wrapper (vtwd=<value optimized out>) at pthread/ethread.c:110
> #5  0x000000393fc0673d in start_thread () from /lib64/libpthread.so.0
> #6  0x000000393f4d3f6d in clone () from /lib64/libc.so.6
> (gdb) p funp
> $1 = <value optimized out>
> (gdb) p funp->fe
> Cannot access memory at address 0x8
>
>      [Second One]
> Reading symbols from /lib64/libutil.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /usr/lib64/libncurses.so.5...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libncurses.so.5
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/librt.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from
> /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so...done.
> Loaded symbols for /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so
> Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libcrypto.so.6
> Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libz.so.1
> Core was generated by `/usr/local/lib/erlang/erts-5.9.2/bin/beam.smp -P 1024000 -K true -- -root /usr/'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00000000004f8bac in sweep_off_heap (p=0x2aaabc644b90, fullsweep=0) at beam/erl_gc.c:2302
> 2302                ptr = ptr->next;
> (gdb) bt
> #0  0x00000000004f8bac in sweep_off_heap (p=0x2aaabc644b90, fullsweep=0) at beam/erl_gc.c:2302
> #1  0x00000000004fabb8 in do_minor (p=0x2aaabc644b90, need=0, objv=0x44719e00, nobj=1, recl=0x44719db8) at beam/erl_gc.c:1133
> #2  minor_collection (p=0x2aaabc644b90, need=0, objv=0x44719e00, nobj=1, recl=0x44719db8) at beam/erl_gc.c:827
> #3  0x00000000004fc40d in erts_garbage_collect (p=0x2aaabc644b90, need=0, objv=0x44719e00, nobj=1) at beam/erl_gc.c:405
> #4  0x00000000004fcdcf in erts_gc_after_bif_call (p=0x2aaabc644b90, result=46912734893994, regs=<value optimized out>, arity=<value optimized out>) at beam/erl_gc.c:335
> #5  0x00000000005309c1 in process_main () at beam/beam_emu.c:2600
> #6  0x00000000004a0b4f in sched_thread_func (vesdp=<value optimized out>) at beam/erl_process.c:5184
> #7  0x00000000005a4f14 in thr_wrapper (vtwd=<value optimized out>) at pthread/ethread.c:110
> #8  0x000000393fc0673d in start_thread () from /lib64/libpthread.so.0
> #9  0x000000393f4d3f6d in clone () from /lib64/libc.so.6
> (gdb) p ptr
> $1 = (struct erl_off_heap_header *) 0x2aab02d63980
> (gdb) x/x 0x2aab02d63980
> 0x2aab02d63980: 0x000000f0
> (gdb) p ptr->next
> $2 = (struct erl_off_heap_header *) 0x2aab02d5ee80
> (gdb) x/x 0x2aab02d5ee80
> 0x2aab02d5ee80: 0x00000160
>
>      Any ideas? Thanks a lot.
>
>
>
> --
> View this message in context: http://erlang.2086793.n4.nabble.com/erlang-node-crashes-in-erts-gc-after-bif-call-tp4655148.html
> Sent from the Erlang Patches mailing list archive at Nabble.com.
> _______________________________________________
> erlang-patches mailing list
> erlang-patches@REDACTED
> http://erlang.org/mailman/listinfo/erlang-patches
