<div dir="ltr">Hello List,<div><br></div><div>Hardware: E5-Xeon 2697 v2, 32GB of RAM.</div><div>OSes tried: Xubuntu 14.04.1 LTS, CentOS 7, Ubuntu 12.04 LTS</div><div>Erlang versions the code was tried on: Erlang/OTP 17, R16, and R14</div><div><br></div><div>I have an issue where every time I use processes that hold large data structures (large deep learning nodes, each a single process), Erlang core dumps after just a minute or so. The amount of RAM used is only about 2GB, so it can't be the system running out of memory, and it's only using about 10 cores, since I'm only running 10 such processes. The same code, the same program, on the same platform runs without a problem when I keep these processes small (a substantially smaller monolithic NN module in each process). Everything is written purely in Erlang (no NIFs are involved in this particular NN code).</div><div><br></div><div>What exactly is happening? Is something running out of space? Can anyone recommend which Erlang startup option I should modify to alleviate the issue?</div><div><br></div><div>There are no crash dump files that I can find, but I did get a core backtrace produced during one of these crashes when I was running erts-5.10.4. Here is a partial paste of it:</div><div><br></div><div><div>ccpp-2015-02-11-09\:18\:41-2273/core_backtrace:</div><div>{   "signal": 11</div><div>,   "executable": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>,   "stacktrace":</div><div>      [ {   "crash_thread": true</div><div>        ,   "frames":</div><div>              [ {   "address": 5352736</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1158432</div><div>                ,   "function_name": "sweep_one_area"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 5367589</div><div>      
          ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1173285</div><div>                ,   "function_name": "erts_garbage_collect"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 5369251</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1174947</div><div>                ,   "function_name": "erts_gc_after_bif_call"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 5871217</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1676913</div><div>                ,   "function_name": "nbif_3_gc_after_bif"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                } ]</div><div>        }</div><div>      , {   "frames":</div><div>              [ {   "address": 1101651978</div><div>                ,   "build_id_offset": 1101651978</div><div>                } ]</div><div>        }</div><div>      , {   "frames":</div><div>              [ {   "address": 139994957883141</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                ,   "build_id_offset": 46853</div><div>                ,   "function_name": "pthread_cond_wait@@GLIBC_2.3.2"</div><div>                ,   "file_name": "/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 6128777</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1934473</div><div>                ,   "function_name": "ethr_cond_wait"</div><div>                ,   "file_name": 
"/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 4665919</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 471615</div><div>                ,   "function_name": "sys_msg_dispatcher_func"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 6134325</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1940021</div><div>                ,   "function_name": "thr_wrapper"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 139994957868531</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                ,   "build_id_offset": 32243</div><div>                ,   "function_name": "start_thread"</div><div>                ,   "file_name": "/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 139994952778157</div><div>                ,   "build_id": "23d9f6f74c80c45a602094e5016f047bfc4d046c"</div><div>                ,   "build_id_offset": 1008045</div><div>                ,   "function_name": "__clone"</div><div>                ,   "file_name": "/lib64/libc.so.6"</div><div>                } ]</div><div>        }</div><div>      , {   "frames":</div><div>              [ {   "address": 139994957894237</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                ,   "build_id_offset": 57949</div><div>                ,   "function_name": "read"</div><div>                ,   "file_name": "/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 5741674</div><div>                ,  
 "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1547370</div><div>                ,   "function_name": "signal_dispatcher_thread_func"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 6134325</div><div>                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"</div><div>                ,   "build_id_offset": 1940021</div><div>                ,   "function_name": "thr_wrapper"</div><div>                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"</div><div>                }</div><div>              , {   "address": 139994957868531</div><div>                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"</div><div>                ,   "build_id_offset": 32243</div><div>                ,   "function_name": "start_thread"</div><div>                ,   "file_name": "/lib64/libpthread.so.0"</div><div>                }</div><div>              , {   "address": 139994952778157</div><div>                ,   "build_id": "23d9f6f74c80c45a602094e5016f047bfc4d046c"</div><div>                ,   "build_id_offset": 1008045</div><div>                ,   "function_name": "__clone"</div><div>                ,   "file_name": "/lib64/libc.so.6"</div><div>                } ]</div><div>        }</div></div><div>...</div><div><br></div><div>Thanks in advance for any suggestions and help,</div><div>-Gene</div></div>