<div dir="ltr">Hello,<div><br></div><div>99% of the time, when you see a memory corruption error in the Erlang allocators, it is because a NIF or linked-in driver is mismanaging memory it got from driver_alloc or enif_alloc. To track down the error, I would build either a debug or a valgrind-enabled emulator and see if that shows anything. For a description of how to build a debug emulator, see the INSTALL[1] howto. If you want to build a valgrind emulator, just replace debug with valgrind in the guide and it should work.</div>
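<div><br></div><div>To make "mismanaging memory" concrete, here is a minimal sketch of the kind of overrun check that a debug or valgrind build performs far more thoroughly. It is an illustration under stated assumptions, not emulator code: malloc and free stand in for driver_alloc and driver_free, and the names guarded_alloc, guarded_check and guarded_free are hypothetical helpers, not part of any Erlang API.</div>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration only. malloc/free stand in for
 * driver_alloc/driver_free so the sketch builds without erl_driver.h. */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Block layout: [size_t user_size][user bytes ...][GUARD_LEN guard bytes].
 * The returned pointer is only size_t-aligned, which is enough for the
 * byte buffers this sketch is meant for. */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(size_t) + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof(size_t));                         /* remember the user size */
    memset(raw + sizeof(size_t) + size, GUARD_BYTE, GUARD_LEN); /* plant the guard bytes */
    return raw + sizeof(size_t);
}

/* Returns 1 if the guard bytes after the user area are intact, 0 if
 * something wrote past the end of the buffer. */
int guarded_check(const void *ptr)
{
    const unsigned char *user = ptr;
    size_t size;
    size_t i;

    memcpy(&size, user - sizeof(size_t), sizeof(size_t));
    for (i = 0; i < GUARD_LEN; i++) {
        if (user[size + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Checks the guard before releasing the block, so an overrun is reported
 * at free time instead of surfacing later inside the allocator. */
void guarded_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(guarded_check(ptr) && "write past end of buffer detected");
    free((unsigned char *)ptr - sizeof(size_t));
}
```

<div>A write one byte past the requested size lands in the guard area and makes guarded_check return 0. Without such a guard, the same write would silently damage the allocator's own bookkeeping, and the crash would show up later in an unrelated allocation, much like the backtrace quoted below.</div>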
<div><br></div><div>Lukas</div><div><br></div><div> [1]: <a href="https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system">https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system</a></div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jun 30, 2014 at 12:08 PM, Puneet Ahuja <span dir="ltr">&lt;<a href="mailto:puneet@octro.com" target="_blank">puneet@octro.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
<br>
We are frequently (every two days or so) encountering an issue where an Erlang node crashes with the following backtrace. The same crash, with the same backtrace, has been reproduced on more than one machine:<br>
<br>
Core was generated by `/usr/local/lib/erlang/erts-5.10.4/bin/beam.smp -zdbbl 20000 -swt very_low -sbt'.<br>
Program terminated with signal 11, Segmentation fault.<br>
#0 replace (allctr=0x130d400, size=296, cand_blk=&lt;value optimized out&gt;, cand_size=&lt;value optimized out&gt;) at beam/erl_bestfit_alloc.c:287<br>
287 else if (x == x-&gt;parent-&gt;left)<br>
#0 replace (allctr=0x130d400, size=296, cand_blk=&lt;value optimized out&gt;, cand_size=&lt;value optimized out&gt;) at beam/erl_bestfit_alloc.c:287<br>
#1 bf_unlink_free_block (allctr=0x130d400, size=296, cand_blk=&lt;value optimized out&gt;, cand_size=&lt;value optimized out&gt;) at beam/erl_bestfit_alloc.c:797<br>
#2 bf_get_free_block (allctr=0x130d400, size=296, cand_blk=&lt;value optimized out&gt;, cand_size=&lt;value optimized out&gt;) at beam/erl_bestfit_alloc.c:857<br>
#3 0x0000000000443106 in mbc_alloc_block (allctr=0x130d400, size=287) at beam/erl_alloc_util.c:1956<br>
#4 mbc_alloc (allctr=0x130d400, size=287) at beam/erl_alloc_util.c:2085<br>
#5 0x0000000000448963 in erts_alcu_alloc_thr_pref (type=170, extra=&lt;value optimized out&gt;, size=287) at beam/erl_alloc_util.c:4932<br>
#6 0x00000000005099a8 in erts_alloc (type=&lt;value optimized out&gt;, size=287) at beam/erl_alloc.h:223<br>
#7 0x0000000000509b7e in erts_bin_nrml_alloc (c_p=0x7ff5e3d85e48, reg=&lt;value optimized out&gt;, live=&lt;value optimized out&gt;, build_size_term=&lt;value optimized out&gt;, extra_words=&lt;value optimized out&gt;, unit=&lt;value optimized out&gt;) at beam/erl_binary.h:260<br>
#8 erts_bs_append (c_p=0x7ff5e3d85e48, reg=&lt;value optimized out&gt;, live=&lt;value optimized out&gt;, build_size_term=&lt;value optimized out&gt;, extra_words=&lt;value optimized out&gt;, unit=&lt;value optimized out&gt;) at beam/erl_bits.c:1373<br>
#9 0x000000000053d510 in process_main () at beam/beam_emu.c:3843<br>
#10 0x000000000049dc8b in sched_thread_func (vesdp=0x7ff5e30168c0) at beam/erl_process.c:5801<br>
#11 0x00000000005bda96 in thr_wrapper (vtwd=0x7fffc11bd510) at pthread/ethread.c:106<br>
#12 0x0000003883c079d1 in start_thread () from /lib64/libpthread.so.0<br>
#13 0x00000038838e8b6d in clone () from /lib64/libc.so.6<br>
<br>
<br>
The Erlang/OTP version is R16B03-1, built from source without any custom configure script parameters. The exmpp library we depend on uses port drivers. We don’t encounter this problem on the nodes running R15.<br>
The source code at the crash point seems to indicate some kind of corruption of the memory that Erlang manages in erl_bestfit_alloc. Any pointers on how to resolve this problem would help.<br>
<br>
Linux Kernel Version: 2.6.32-431.3.1.el6.x86_64<br>
Linux Flavor: CentOS 6.5<br>
<br>
The crash occurs during minimal load conditions.<br>
<br>
Additionally, gdb shows the following libraries from the core dump:<br>
<br>
*<br>
* Libraries<br>
*<br>
From To Syms Read Shared Object Library<br>
0x0000003885400e10 0x0000003885401688 Yes (*) /lib64/libutil.so.1<br>
0x0000003dd0c00de0 0x0000003dd0c01998 Yes (*) /lib64/libdl.so.2<br>
0x0000003884403e70 0x0000003884443fb8 Yes (*) /lib64/libm.so.6<br>
0x00000039cac06a30 0x00000039cac1cf88 Yes (*) /lib64/libncurses.so.5<br>
0x0000003883c05760 0x0000003883c110c8 Yes (*) /lib64/libpthread.so.0<br>
0x0000003884002140 0x00000038840054f8 Yes (*) /lib64/librt.so.1<br>
0x000000388381ea60 0x000000388394024c Yes (*) /lib64/libc.so.6<br>
0x00000039cc00c840 0x00000039cc015c08 Yes (*) /lib64/libtinfo.so.5<br>
0x0000003883000b00 0x00000038830198eb Yes (*) /lib64/ld-linux-x86-64.so.2<br>
0x00007ff5d9a873c0 0x00007ff5d9a8e2c8 Yes /usr/local/lib/erlang/lib/crypto-3.2/priv/lib/crypto.so<br>
0x00007ff5d970abc0 0x00007ff5d97fd9a8 Yes (*) /usr/lib64/libcrypto.so.10<br>
0x0000003884802120 0x000000388480d3a8 Yes (*) /lib64/libz.so.1<br>
0x00007ff5d94a1900 0x00007ff5d94a1bf8 Yes /usr/local/lib/erlang/lib/crypto-3.2/priv/lib/crypto_callback.so<br>
0x00007ff5d9272f00 0x00007ff5d9276708 Yes /usr/local/lib/erlang/lib/exmpp-0.9.9-10-g3d17ff4/priv/lib/exmpp_stringprep.so<br>
0x00007ff5d8c38fa0 0x00007ff5d8c3dbf8 Yes /usr/local/lib/erlang/lib/exmpp-0.9.9-10-g3d17ff4/priv/lib/exmpp_xml_expat.so<br>
0x0000003886403cd0 0x000000388641cc88 Yes (*) /lib64/libexpat.so.1<br>
0x00007ff5d8a2ef00 0x00007ff5d8a33948 Yes /usr/local/lib/erlang/lib/exmpp-0.9.9-10-g3d17ff4/priv/lib/exmpp_xml_expat_legacy.so<br>
0x00007ff5d8824f50 0x00007ff5d8829a28 Yes /usr/local/lib/erlang/lib/exmpp-0.9.9-10-g3d17ff4/priv/lib/exmpp_xml_libxml2.so<br>
0x0000003cc562c7c0 0x0000003cc5709498 Yes (*) /usr/lib64/libxml2.so.2<br>
0x00007ff5d3979950 0x00007ff5d397db18 Yes /usr/local/lib/erlang/lib/exmpp-0.9.9-10-g3d17ff4/priv/lib/exmpp_tls_openssl.so<br>
0x00007ff5d34d0180 0x00007ff5d350a938 Yes (*) /usr/lib64/libssl.so.10<br>
0x0000003cc3a0ac30 0x0000003cc3a38728 Yes (*) /lib64/libgssapi_krb5.so.2<br>
0x0000003cc4e1b430 0x0000003cc4e94a78 Yes (*) /lib64/libkrb5.so.3<br>
0x00007ff5d32b53f0 0x00007ff5d32b5fc8 Yes (*) /lib64/libcom_err.so.2<br>
0x0000003cc36043d0 0x0000003cc361d5a8 Yes (*) /lib64/libk5crypto.so.3<br>
0x0000003cc4a02a40 0x0000003cc4a080c8 Yes (*) /lib64/libkrb5support.so.0<br>
0x0000003cc3e00bf0 0x0000003cc3e011d8 Yes (*) /lib64/libkeyutils.so.1<br>
0x0000003886003930 0x0000003886012938 Yes (*) /lib64/libresolv.so.2<br>
0x0000003cc1a05850 0x0000003cc1a15cc8 Yes (*) /lib64/libselinux.so.1<br>
0x00007ff5d30af5d0 0x00007ff5d30b2a28 Yes /usr/local/lib/erlang/lib/exmpp-0.9.9-10-g3d17ff4/priv/lib/exmpp_compress_zlib.so<br>
<br>
Regards,<br>
Puneet<br>
</blockquote></div><br></div>