[erlang-questions] Tracking down the reason for my Segmentation Fault (Core dump) problems.

Gene Sher corticalcomputer@REDACTED
Wed Feb 11 23:54:28 CET 2015


Hello List,

Hardware: Intel Xeon E5-2697 v2, 32 GB of RAM.
OSes tried: Xubuntu 14.04.1 LTS, CentOS 7, Ubuntu 12.04 LTS
Erlang versions the code was tried on: Erlang/OTP 17, R16, & R14

Whenever I run processes that hold large data structures (large deep
learning nodes, each implemented as a single process), Erlang core dumps
after about a minute. Only about 2 GB of RAM is in use, so the system is
not running out of memory, and only about 10 cores are busy, since I'm
running just 10 such processes. The same code, on the same platforms, runs
without a problem when I keep these processes small (a substantially
smaller monolithic NN module in each process). Everything is written
purely in Erlang; no NIFs are involved in this particular NN code.

What exactly is happening? Is something running out of space? Can anyone
recommend an Erlang startup option I could change to alleviate the issue?

I can't find any Erlang crash dump (erl_crash.dump) files, but I did get a
core backtrace from one of these crashes while running erts-5.10.4. Here is
a partial paste of it:

ccpp-2015-02-11-09\:18\:41-2273/core_backtrace:
{   "signal": 11
,   "executable": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
,   "stacktrace":
      [ {   "crash_thread": true
        ,   "frames":
              [ {   "address": 5352736
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1158432
                ,   "function_name": "sweep_one_area"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 5367589
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1173285
                ,   "function_name": "erts_garbage_collect"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 5369251
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1174947
                ,   "function_name": "erts_gc_after_bif_call"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 5871217
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1676913
                ,   "function_name": "nbif_3_gc_after_bif"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                } ]
        }
      , {   "frames":
              [ {   "address": 1101651978
                ,   "build_id_offset": 1101651978
                } ]
        }
      , {   "frames":
              [ {   "address": 139994957883141
                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"
                ,   "build_id_offset": 46853
                ,   "function_name": "pthread_cond_wait@@GLIBC_2.3.2"
                ,   "file_name": "/lib64/libpthread.so.0"
                }
              , {   "address": 6128777
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1934473
                ,   "function_name": "ethr_cond_wait"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 4665919
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 471615
                ,   "function_name": "sys_msg_dispatcher_func"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 6134325
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1940021
                ,   "function_name": "thr_wrapper"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 139994957868531
                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"
                ,   "build_id_offset": 32243
                ,   "function_name": "start_thread"
                ,   "file_name": "/lib64/libpthread.so.0"
                }
              , {   "address": 139994952778157
                ,   "build_id": "23d9f6f74c80c45a602094e5016f047bfc4d046c"
                ,   "build_id_offset": 1008045
                ,   "function_name": "__clone"
                ,   "file_name": "/lib64/libc.so.6"
                } ]
        }
      , {   "frames":
              [ {   "address": 139994957894237
                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"
                ,   "build_id_offset": 57949
                ,   "function_name": "read"
                ,   "file_name": "/lib64/libpthread.so.0"
                }
              , {   "address": 5741674
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1547370
                ,   "function_name": "signal_dispatcher_thread_func"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 6134325
                ,   "build_id": "69494bd95d056f5549e80b6fe507e55af574137f"
                ,   "build_id_offset": 1940021
                ,   "function_name": "thr_wrapper"
                ,   "file_name": "/usr/lib64/erlang/erts-5.10.4/bin/beam.smp"
                }
              , {   "address": 139994957868531
                ,   "build_id": "18562ee0363bc9bd7101610bd86469aa426d0c44"
                ,   "build_id_offset": 32243
                ,   "function_name": "start_thread"
                ,   "file_name": "/lib64/libpthread.so.0"
                }
              , {   "address": 139994952778157
                ,   "build_id": "23d9f6f74c80c45a602094e5016f047bfc4d046c"
                ,   "build_id_offset": 1008045
                ,   "function_name": "__clone"
                ,   "file_name": "/lib64/libc.so.6"
                } ]
        }
...
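To pull just the crashing thread out of a full core_backtrace file like this one, a small Python script could help. This is a sketch that assumes the file ABRT wrote is plain JSON with the "stacktrace", "crash_thread" and "frames" fields seen above. The leading-comma layout is still valid JSON. The helper names are hypothetical.

```python
import json
import sys


def crash_thread_functions(backtrace: dict) -> list[str]:
    """Return the function names of the thread marked "crash_thread"."""
    for thread in backtrace.get("stacktrace", []):
        if thread.get("crash_thread"):
            # Frames without symbols (like the bare-address one above)
            # have no "function_name", so show "??" for them.
            return [f.get("function_name", "??") for f in thread.get("frames", [])]
    return []  # no thread was flagged as the crashing one


def main(path: str) -> None:
    with open(path) as fh:
        bt = json.load(fh)  # assumes the whole file is valid JSON
    print(f"signal {bt.get('signal')} in {bt.get('executable')}")
    for name in crash_thread_functions(bt):
        print("   ", name)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python3 crash_frames.py core_backtrace`, where the script name is hypothetical. On the untruncated version of the file above it should list sweep_one_area, erts_garbage_collect, erts_gc_after_bif_call and nbif_3_gc_after_bif.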

Thanks in advance for any suggestions and help,
-Gene