<div dir="ltr">Hello Gene,<div><br></div><div>Without a full core file, I cannot do much more than guess at what might be wrong. Most likely, something corrupted the heap of that process. Normally I would blame a badly written NIF or linked-in driver, but since you say you don't have anything like that, we can rule that out. Without more information (i.e. a full core file) it is extremely hard to tell what has gone wrong.</div><div><br></div><div>This is most likely not due to the VM running out of memory; you would see a different error if that were the case.</div><div><br></div><div>Lukas</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 11, 2015 at 11:54 PM, Gene Sher <span dir="ltr">&lt;<a href="mailto:corticalcomputer@gmail.com" target="_blank">corticalcomputer@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello List,<div><br></div><div>Hardware: E5-Xeon 2697 v2, 32GB of RAM.</div><div>OSes tried: Xubuntu 14.04.1 LTS, CentOS 7, Ubuntu 12.04 LTS</div><div>Erlang versions the code was tried on: Erlang/OTP 17, R16, &amp; R14</div><div><br></div><div>I have an issue where every time I use processes which contain within themselves large data structures (large deep learning single-process nodes), after just a minute or so Erlang core dumps. The amount of RAM used is only about 2GB, so it can't be the system running out of memory, and it's only using about 10 cores, since I'm only running 10 such processes. The same code, the same program, on the same platform, functions without a problem when I keep these processes small (a substantially smaller monolithic NN-module in each process). Everything is written purely in Erlang (no NIFs were involved in this particular NN code).</div><div><br></div><div>What exactly is happening? Is something running out of space? 
Can anyone recommend which Erlang startup option I should modify to alleviate the issue?</div><div><br></div><div>There are no crash dump files that I can find, but I did get a core backtrace from one of these crashes while running erts-5.10.4. Here is a partial paste of it:</div><div><br></div><div><div>ccpp-2015-02-11-09\:18\:41-2273/core_backtrace:</div><div>{   "signal": 11</div><div>,   "executable": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>,   "stacktrace":</div><div>      [ {   "crash_thread": true</div><div>        ,   "frames":</div><div>              [ {   "address": 5352736</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1158432</div><div>                ,   "function_name": "sweep_one_area"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 5367589</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1173285</div><div>                ,   "function_name": "erts_garbage_collect"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 5369251</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1174947</div><div>                ,   "function_name": "erts_gc_after_bif_call"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 5871217</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1676913</div><div>                ,   "function_name": "nbif_3_gc_after_bif"</div><div>             
   ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                } ]</div><div>        }</div><div>      , {   "frames":</div><div>              [ {   "address": 1101651978</div><div>                ,   "build_id_offset": 1101651978</div><div>                } ]</div><div>        }</div><div>      , {   "frames":</div><div>              [ {   "address": 139994957883141</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                ,   "build_id_offset": 46853</div><div>                ,   "function_name": "pthread_cond_wait@@GLIBC_2.3.2"</div><div>                ,   "file_name": "/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 6128777</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1934473</div><div>                ,   "function_name": "ethr_cond_wait"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 4665919</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 471615</div><div>                ,   "function_name": "sys_msg_dispatcher_func"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 6134325</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1940021</div><div>                ,   "function_name": "thr_wrapper"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 139994957868531</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                
,   "build_id_offset": 32243</div><div>                ,   "function_name": "start_thread"</div><div>                ,   "file_name": "/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 139994952778157</div><div>                ,   "build_id": "23d9f6f74c80c45a602094e5016f047bfc4d046c"</div><div>                ,   "build_id_offset": 1008045</div><div>                ,   "function_name": "__clone"</div><div>                ,   "file_name": "/lib64/libc.so.6"</div><div>                } ]</div><div>        }</div><div>      , {   "frames":</div><div>              [ {   "address": 139994957894237</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                ,   "build_id_offset": 57949</div><div>                ,   "function_name": "read"</div><div>                ,   "file_name": "/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 5741674</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1547370</div><div>                ,   "function_name": "signal_dispatcher_thread_func"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 6134325</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1940021</div><div>                ,   "function_name": "thr_wrapper"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 139994957868531</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                ,   "build_id_offset": 32243</div><div>                ,   "function_name": "start_thread"</div><div>                ,   "file_name": 
"/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 139994952778157</div><div>                ,   "build_id": "23d9f6f74c80c45a602094e5016f047bfc4d046c"</div><div>                ,   "build_id_offset": 1008045</div><div>                ,   "function_name": "__clone"</div><div>                ,   "file_name": "/lib64/libc.so.6"</div><div>                } ]</div><div>        }</div></div><div>...</div><div><br></div><div>Thanks in advance for any suggestions and help,</div><div>-Gene</div></div>
<br>_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
<br></blockquote></div><br></div>