[erlang-patches] erlang node crashes in erts_gc_after_bif_call

Patrik Nyblom pan@REDACTED
Fri Oct 19 17:08:37 CEST 2012


Hi!

On 10/19/2012 10:42 AM, adam chan wrote:
> Hi!
>
> I don't think the crypto NIF library was reloaded, though the mysql
> client (author: Magnus Ahltorp <ahltorp@REDACTED>) in my project
> does use the crypto library.
>
> When I upgrade a big data file (600K) that contains only thousands of
> lines like:
>      get(xxx) -> #record{a = xx, b = xx};
> the crash happens. I guess this data file has no relationship with the
> crypto NIF library?
>
Oops, that sure does not look like a crypto thing!
> In particular, when I execute
>      c:l(data_file_name)
> repeatedly and quickly in the shell (running in screen), the crash shows up frequently.
Could you send me the file and an example program, plus your distro/OS
environment, so I can reproduce it?
Send it in a "private" mail to pan@REDACTED; you might not want
to spread details of your system publicly...
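A reproducer along those lines could be as small as a loop that reloads the data module the way repeated c:l/1 calls from the shell do. This is a hypothetical sketch: the module name reload_loop and the function run/2 are illustrative, not part of OTP or of this thread, and it assumes that c:l/1 amounts to a hard code:purge/1 followed by code:load_file/1.

```erlang
%% reload_loop: hypothetical stress loop for reproducing the crash.
%% It mimics calling c:l(Mod) N times in a row from the shell.
-module(reload_loop).
-export([run/2]).

%% run(Mod, N) purges and reloads Mod N times. It fails with a
%% badmatch if Mod cannot be loaded from the code path.
run(_Mod, 0) ->
    ok;
run(Mod, N) when is_integer(N), N > 0 ->
    %% Like c:l/1: hard-purge the old code (processes still running
    %% it are killed), then load the .beam file from the code path.
    code:purge(Mod),
    {module, Mod} = code:load_file(Mod),
    run(Mod, N - 1).
```

For example, reload_loop:run(data_file_name, 1000) from the shell. To follow the soft-upgrade path from the original report instead, code:soft_purge/1 could replace code:purge/1; code:load_file/1 then returns {error, not_purged} while old code is still in use, and the match fails.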
>
> Yesterday, I found that the stack size of my application had not
> been set, which means it was running with the Linux default stack
> size (10M). After I set the stack size to 500M using the 'ulimit -s'
> command and split the big data file into smaller sub-files, the
> situation improved. Maybe the small stack size is the culprit,
> but I am not sure.   : (
>
Nah, that should not be a problem as long as you do not use external
libraries that go crazy on the C stack. We usually don't (except for
re, which relies on the PCRE library, and PCRE has a rather aggressive
approach to the C stack when compiling regexps).
> Anyway, is there any way to detect whether the crypto NIF library
> has been reloaded or not?
> I've found a discussion about "fix native code crash when calling
> unloaded module with on_load function":
>
>  http://erlang.2086793.n4.nabble.com/fix-native-code-crash-when-calling-unloaded-module-with-on-load-function-td2273502.html
> I did suspect the crypto module before, since it has an on_load
> attribute.
In a few days there will be a fix for this in the master branch that
should remove the problem. But if you have an example that you can run
from the shell and that reproduces it for me, things would go much
faster. You could of course add a printout to the crypto module to track
the calls of the on_load handler. If it's called more than twice, there
will also be unloading (because of code purging). You could also trace
calls to erlang:purge_module/1.
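Such tracing can be done with the dbg module from runtime_tools. The following is a minimal sketch, not an official tool: the module name reload_watch is hypothetical, and instead of editing crypto it assumes that crypto's on_load handler (re)loads its library through erlang:load_nif/2, as NIF modules normally do, so tracing that BIF stands in for the suggested printout.

```erlang
%% reload_watch: hypothetical helper that uses dbg to print every call
%% to erlang:purge_module/1 and erlang:load_nif/2 on this node.
-module(reload_watch).
-export([start/0, stop/0]).

start() ->
    %% Start dbg's default tracer, which prints trace messages to the console.
    {ok, _Tracer} = dbg:tracer(),
    %% Enable call tracing in all current and future processes.
    {ok, _} = dbg:p(all, c),
    %% Purging old code goes through erlang:purge_module/1.
    {ok, _} = dbg:tp(erlang, purge_module, 1, []),
    %% Assumption: a NIF module's on_load handler (e.g. crypto's)
    %% loads its shared library by calling erlang:load_nif/2.
    {ok, _} = dbg:tp(erlang, load_nif, 2, []),
    ok.

stop() ->
    %% Remove all call trace patterns, then stop the tracer.
    {ok, _} = dbg:ctp(),
    dbg:stop().
```

Run reload_watch:start() in the node's shell before an upgrade and reload_watch:stop() afterwards. Each purge and each NIF load shows up as a trace line with its arguments; repeated load_nif calls with a path into crypto's priv directory would suggest that its NIF library is being reloaded.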
>
> Cheers,
> [Adam Chan]
Cheers,
/Patrik
>
> ------------------ Original ------------------
> *From: * "Patrik Nyblom"<pan@REDACTED>;
> *Date: * Thu, Oct 18, 2012 08:08 PM
> *To: * "erlang-patches"<erlang-patches@REDACTED>;
> *Subject: * Re: [erlang-patches] erlang node crashes in 
> erts_gc_after_bif_call
>
> Hi!
>
> Is the crypto NIF library reloaded during upgrade? That causes havoc
> unfortunately, due to the behaviour of the OpenSSL crypto memory
> allocation callbacks. We're working on that one.
>
> Have you reloaded the crypto NIF library, directly or indirectly, when
> this happens?
>
> Cheers,
> /Patrik
>
> On 10/17/2012 03:47 AM, adam chan wrote:
> > hello list,
> >
> >      I have hit two random crashes this month; each has happened more
> > than twice. The cause was "Program terminated with signal 11,
> > Segmentation fault", and they most likely happened while I was
> > hot-updating some module code using code:soft_purge/1 and
> > code:load_file/1.
> >      Though they take place in different code, the information from
> > the core files points out that the function erts_gc_after_bif_call/4
> > was called when the crash happened, so I guess it is related to GC.
> >      I am using otp_src_R15B02, SMP mode.
> >      Here is the information from the core files (I am not familiar
> > with gdb; I hope the information is useful):
> >
> >      [First One]
> > Reading symbols from /lib64/libutil.so.1...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libutil.so.1
> > Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libdl.so.2
> > Reading symbols from /lib64/libm.so.6...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libm.so.6
> > Reading symbols from /usr/lib64/libncurses.so.5...(no debugging symbols
> > found)...done.
> > Loaded symbols for /usr/lib64/libncurses.so.5
> > Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libpthread.so.0
> > Reading symbols from /lib64/librt.so.1...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/librt.so.1
> > Reading symbols from /lib64/libc.so.6...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libc.so.6
> > Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/ld-linux-x86-64.so.2
> > Reading symbols from
> > /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so...done.
> > Loaded symbols for
> > /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so
> > Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libcrypto.so.6
> > Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols
> > found)...done.
> > Loaded symbols for /usr/lib64/libz.so.1
> > Core was generated by `/usr/local/lib/erlang/erts-5.9.2/bin/beam.smp -P
> > 1024000 -K true -- -root /usr/'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x0000000000541b9a in check_process_code (A__p=0x1a8c5b40,
> > BIF__ARGS=<value optimized out>) at beam/beam_bif_load.c:487
> > 487                 if (INSIDE((BeamInstr *) funp->fe->address)) {
> > (gdb) bt
> > #0  0x0000000000541b9a in check_process_code (A__p=0x1a8c5b40,
> > BIF__ARGS=<value optimized out>) at beam/beam_bif_load.c:487
> > #1  check_process_code_2 (A__p=0x1a8c5b40,
> > BIF__ARGS=<value optimized out>) at beam/beam_bif_load.c:205
> > #2  0x0000000000530782 in process_main () at beam/beam_emu.c:3392
> > #3  0x00000000004a0b4f in sched_thread_func (vesdp=<value optimized out>)
> > at beam/erl_process.c:5184
> > #4  0x00000000005a4f14 in thr_wrapper (vtwd=<value optimized out>) at
> > pthread/ethread.c:110
> > #5  0x000000393fc0673d in start_thread () from /lib64/libpthread.so.0
> > #6  0x000000393f4d3f6d in clone () from /lib64/libc.so.6
> > (gdb) p funp
> > $1 =<value optimized out>
> > (gdb) p funp->fe
> > Cannot access memory at address 0x8
> >
> >      [Second One]
> > Reading symbols from /lib64/libutil.so.1...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libutil.so.1
> > Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libdl.so.2
> > Reading symbols from /lib64/libm.so.6...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libm.so.6
> > Reading symbols from /usr/lib64/libncurses.so.5...(no debugging symbols
> > found)...done.
> > Loaded symbols for /usr/lib64/libncurses.so.5
> > Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libpthread.so.0
> > Reading symbols from /lib64/librt.so.1...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/librt.so.1
> > Reading symbols from /lib64/libc.so.6...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libc.so.6
> > Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/ld-linux-x86-64.so.2
> > Reading symbols from
> > /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so...done.
> > Loaded symbols for
> > /usr/local/lib/erlang/lib/crypto-2.2/priv/lib/crypto.so
> > Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols
> > found)...done.
> > Loaded symbols for /lib64/libcrypto.so.6
> > Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols
> > found)...done.
> > Loaded symbols for /usr/lib64/libz.so.1
> > Core was generated by `/usr/local/lib/erlang/erts-5.9.2/bin/beam.smp -P
> > 1024000 -K true -- -root /usr/'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x00000000004f8bac in sweep_off_heap (p=0x2aaabc644b90,
> > fullsweep=0) at beam/erl_gc.c:2302
> > 2302                ptr = ptr->next;
> > (gdb) bt
> > #0  0x00000000004f8bac in sweep_off_heap (p=0x2aaabc644b90,
> > fullsweep=0) at beam/erl_gc.c:2302
> > #1  0x00000000004fabb8 in do_minor (p=0x2aaabc644b90, need=0,
> > objv=0x44719e00, nobj=1, recl=0x44719db8) at beam/erl_gc.c:1133
> > #2  minor_collection (p=0x2aaabc644b90, need=0, objv=0x44719e00, nobj=1,
> > recl=0x44719db8) at beam/erl_gc.c:827
> > #3  0x00000000004fc40d in erts_garbage_collect (p=0x2aaabc644b90,
> > need=0, objv=0x44719e00, nobj=1) at beam/erl_gc.c:405
> > #4  0x00000000004fcdcf in erts_gc_after_bif_call (p=0x2aaabc644b90,
> > result=46912734893994, regs=<value optimized out>,
> > arity=<value optimized out>) at beam/erl_gc.c:335
> > #5  0x00000000005309c1 in process_main () at beam/beam_emu.c:2600
> > #6  0x00000000004a0b4f in sched_thread_func (vesdp=<value optimized out>)
> > at beam/erl_process.c:5184
> > #7  0x00000000005a4f14 in thr_wrapper (vtwd=<value optimized out>) at
> > pthread/ethread.c:110
> > #8  0x000000393fc0673d in start_thread () from /lib64/libpthread.so.0
> > #9  0x000000393f4d3f6d in clone () from /lib64/libc.so.6
> > (gdb) p ptr
> > $1 = (struct erl_off_heap_header *) 0x2aab02d63980
> > (gdb) x/x 0x2aab02d63980
> > 0x2aab02d63980: 0x000000f0
> > (gdb) p ptr->next
> > $2 = (struct erl_off_heap_header *) 0x2aab02d5ee80
> > (gdb) x/x 0x2aab02d5ee80
> > 0x2aab02d5ee80: 0x00000160
> >
> >      Any ideas? Thanks a lot.
> >
> >
> >
> > --
> > View this message in context:
> > http://erlang.2086793.n4.nabble.com/erlang-node-crashes-in-erts-gc-after-bif-call-tp4655148.html
> > Sent from the Erlang Patches mailing list archive at Nabble.com.
> > _______________________________________________
> > erlang-patches mailing list
> > erlang-patches@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-patches
