# How to Interpret the Erlang Crash Dumps

This section describes the `erl_crash.dump` file generated upon abnormal exit of the Erlang runtime system.

> #### Note {: .info }
>
> The Erlang crash dump had a major facelift in Erlang/OTP R9C. The information
> in this section is therefore not directly applicable for older dumps. However,
> if you use `m:crashdump_viewer` on older dumps, the crash dumps are translated
> into a format similar to this.

The system writes the crash dump in the current directory of the emulator or in the file named by the environment variable `ERL_CRASH_DUMP` (whatever that means on the current operating system). For a crash dump to be written, a writable file system must be mounted.

Crash dumps are written mainly for one of two reasons: either the built-in function `erlang:halt/1` is called explicitly with a string argument from running Erlang code, or the runtime system has detected an error that cannot be handled. The most common reason that the system cannot handle the error is an external limitation, such as running out of memory. A crash dump caused by an internal error can result from the system reaching limits in the emulator itself, such as the number of atoms in the system or too many simultaneous ETS tables. Usually the emulator or the operating system can be reconfigured to avoid the crash, which is why interpreting the crash dump correctly is important.

On systems that support OS signals, it is also possible to stop the runtime system and generate a crash dump by sending the `SIGUSR1` signal.

The Erlang crash dump is a readable text file, but it can be difficult to read. Using the Crashdump Viewer tool in the `Observer` application simplifies the task. This is a wx-widget-based tool for browsing Erlang crash dumps.
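
Because the dump is plain text organized into tagged sections (lines starting with `=`, such as `=memory` or `=proc:<0.0.0>`), it can also be post-processed with a small script. The Python sketch below is not part of OTP: `crash_dump_path` and `split_sections` are hypothetical helpers. It assumes that `ERL_CRASH_DUMP`, when set, holds a plain file name, and that every line starting with `=` opens a new section.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# A section heading is a line such as "=memory" or "=proc:<0.0.0>": a tag,
# optionally followed by ":" and an identifier.
HEADING = re.compile(r"^=(?P<tag>[A-Za-z_]+)(?::(?P<ident>.*))?$")


def crash_dump_path(cwd: str = ".") -> Path:
    """Return where the emulator is expected to write its crash dump.

    Assumption: ERL_CRASH_DUMP, when set, is a plain file name; otherwise
    the dump is erl_crash.dump in the emulator's current directory (cwd).
    """
    override = os.environ.get("ERL_CRASH_DUMP")
    if override:
        return Path(override)
    return Path(cwd) / "erl_crash.dump"


def split_sections(text: str) -> list[tuple[str, str, list[str]]]:
    """Split dump text into (tag, identifier, body_lines) triples, in order.

    Lines before the first heading are ignored.
    """
    sections: list[tuple[str, str, list[str]]] = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group("tag"), match.group("ident") or "", []))
        elif sections:
            # Body lines belong to the most recently opened section.
            sections[-1][2].append(line)
    return sections
```

For example, `split_sections(crash_dump_path().read_text(errors="replace"))` returns one `(tag, identifier, lines)` triple per section, in file order.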
## General Information

The first part of the crash dump shows the following:

- The creation time for the dump
- A slogan indicating the reason for the dump
- The system version of the node from which the dump originates
- The number of atoms in the atom table
- The runtime system thread that caused the crash dump

### Reasons for Crash Dumps (Slogan)

The reason for the dump is shown in the beginning of the file as:

```text
Slogan: <reason>
```

If the system is halted by the BIF `erlang:halt/1`, the slogan is the string parameter passed to the BIF; otherwise it is a description generated by the emulator or the (Erlang) kernel. Normally the message is enough to understand the problem, but some messages are described here. Notice that the suggested reasons for the crash are _only suggestions_. The exact reasons for the errors can vary depending on the local applications and the underlying operating system.

- **_`<A>`: Cannot allocate `<N>` bytes of memory (of type "`<T>`")_** - The system has run out of memory. `<A>` is the allocator that failed to allocate memory, `<N>` is the number of bytes that `<A>` tried to allocate, and `<T>` is the memory block type that the memory was needed for. The most common case is that a process stores huge amounts of data. In this case `<T>` is most often `heap`, `old_heap`, `heap_frag`, or `binary`. For more information on allocators, see [`erts_alloc(3)`](erts_alloc.md).
- **_`<A>`: Cannot reallocate `<N>` bytes of memory (of type "`<T>`")_** - Same as above, except that memory was being reallocated instead of allocated when the system ran out of memory.
- **_Unexpected op code `<N>`_** - Error in compiled code, a damaged `beam` file, or an error in the compiler.
- **_Module `<Name>` undefined `|` Function `<Name>` undefined `|` No function `<Name>`:`<Name>`/1 `|` No function `<Name>`:start/2_** - The Kernel/STDLIB applications are damaged or the start script is damaged.
- **_Driver_select called with too large file descriptor `N`_** - The number of file descriptors for sockets exceeds 1024 (Unix only).
  The limit on file descriptors in some Unix flavors can be set to over 1024, but only 1024 sockets/pipes can be used simultaneously by Erlang (because of limitations in the Unix `select` call). The number of open regular files is not affected by this.
- **_Received SIGUSR1_** - Sending the `SIGUSR1` signal to an Erlang machine (Unix only) forces a crash dump. This slogan reflects that the Erlang machine crash-dumped because it received that signal.
- **_Kernel pid terminated (`<Who>`) (`<Why>`)_** - The kernel supervisor has detected a failure, usually that the `application_controller` has shut down (`Who` = `application_controller`, `Why` = `shutdown`). The application controller can have shut down for many reasons, the most common being that the node name of the distributed Erlang node is already in use. A complete supervisor tree "crash" (that is, the top supervisors have exited) gives about the same result. This message comes from the Erlang code and not from the virtual machine itself. It is always caused by some failure in an application, either within OTP or a "user-written" one. Looking at the error log for your application is probably the first step to take.
- **_Init terminating in do_boot ()_** - The primitive Erlang boot sequence was terminated, most probably because the boot script has errors or cannot be read. This is usually a configuration error; the system can have been started with a faulty `-boot` parameter or with a boot script from the wrong OTP version.
- **_Could not start kernel pid () ()_** - One of the kernel processes could not start. This is probably because of faulty arguments (like errors in a `-config` argument) or faulty configuration files. Check that all files are in their correct location and that the configuration files (if any) are not damaged. Usually messages explaining what is wrong are also written to the controlling terminal and/or the error log.

Errors other than these can occur, as the `erlang:halt/1` BIF can generate any message.
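
When triaging many dumps, it can help to bucket slogans by the messages listed above. The Python sketch below is a hypothetical helper, not an OTP tool. Its patterns are assumptions based on the wording in this section; they are matched case-insensitively because capitalization in real dumps can differ from the list, and any slogan that matches none of them (for example, an arbitrary `erlang:halt/1` string) is reported as `other`.

```python
from __future__ import annotations

import re

# Hypothetical slogan categories, one per message listed above. Each pattern
# is matched case-insensitively against the start of the slogan.
SLOGAN_PATTERNS = [
    ("out_of_memory", re.compile(
        r"^(?P<allocator>\S+): Cannot (?:re)?allocate (?P<bytes>\d+) "
        r'bytes of memory \(of type "(?P<type>[^"]*)"\)', re.IGNORECASE)),
    ("bad_op_code", re.compile(r"^Unexpected op code", re.IGNORECASE)),
    ("kernel_or_stdlib_damaged", re.compile(
        r"^(?:(?:Module|Function) \S+ undefined|No function \S+:\S+/\d+)",
        re.IGNORECASE)),
    ("too_large_fd", re.compile(
        r"^Driver_select called with too large file descriptor", re.IGNORECASE)),
    ("sigusr1", re.compile(r"^Received SIGUSR1", re.IGNORECASE)),
    ("kernel_pid_terminated", re.compile(r"^Kernel pid terminated", re.IGNORECASE)),
    ("boot_failed", re.compile(r"^Init terminating in do_boot", re.IGNORECASE)),
    ("kernel_start_failed", re.compile(r"^Could not start kernel pid", re.IGNORECASE)),
]


def read_slogan(text: str) -> str | None:
    """Return the text after "Slogan: " on the first such line, if any."""
    for line in text.splitlines():
        if line.startswith("Slogan: "):
            return line[len("Slogan: "):]
    return None


def classify_slogan(slogan: str) -> tuple[str, dict[str, str]]:
    """Return (category, captured fields); unknown slogans give ("other", {})."""
    for category, pattern in SLOGAN_PATTERNS:
        match = pattern.match(slogan)
        if match:
            return category, match.groupdict()
    return "other", {}
```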
If the message is not generated by the BIF and does not occur in the list above, it can be because of an error in the emulator. There can, however, be unusual messages, not mentioned here, that are still connected to an application failure. There is much more information available, so a thorough reading of the crash dump can reveal the crash reason. The size of processes, the number of ETS tables, and the Erlang data on each process stack can be useful to find the problem.

### Number of Atoms

The number of atoms in the system at the time of the crash is shown as `Atoms: <number>`. Some tens of thousands of atoms is perfectly normal, but more can indicate that the BIF `erlang:list_to_atom/1` is used to generate many _different_ atoms dynamically, which is never a good idea.

## Scheduler Information

Under the tag _=scheduler_ is shown information about the current state and statistics of the schedulers in the runtime system. On operating systems that allow suspension of other threads, the data within this section reflects what the runtime system looks like when a crash occurs.

The following fields can exist for a scheduler:

- **_=scheduler:id_** - Heading. States the scheduler identifier.
- **_Scheduler Sleep Info Flags_** - If empty, the scheduler was doing some work. If not empty, the scheduler is either in some state of sleep, or suspended.
- **_Scheduler Sleep Info Aux Work_** - If not empty, scheduler-internal auxiliary work is scheduled to be done.
- **_Current Port_** - The port identifier of the port that is currently executed by the scheduler.
- **_Current Process_** - The process identifier of the process that is currently executed by the scheduler. If there is such a process, this entry is followed by the _State_, _Internal State_, _Program Counter_, and _CP_ of that same process. The entries are described in section [Process Information](crash_dump.md#process-information).
  Notice that this is a snapshot of the entries exactly when the crash dump starts to be generated. They are therefore most likely different from (and more telling than) the entries for the same processes found in the _=proc_ section. If there is no currently running process, only the _Current Process_ entry is shown.
- **_Current Process Limited Stack Trace_** - This entry is shown only if there is a current process. It is similar to [_=proc_stack_](crash_dump.md#process-data), except that only the function frames are shown (that is, the stack variables are omitted). Also, only the top and bottom parts of the stack are shown. If the stack is small (< 512 slots), the entire stack is shown. Otherwise the entry _skipping ## slots_ is shown, where `##` is replaced by the number of slots that have been skipped.
- **_Run Queue_** - Shows statistics about how many processes and ports of different priorities are scheduled on this scheduler.
- **\*\*\* crashed \*\*\*** - This entry is normally not shown. It signifies that getting the rest of the information about this scheduler failed for some reason.

## Memory Information

Under the tag _=memory_ is shown information similar to what can be obtained on a living node with [`erlang:memory()`](`erlang:memory/0`).

## Internal Table Information

Under the tags _=hash_table:_ and _=index_table:_ are shown internal tables. These are mostly of interest for runtime system developers.

## Allocated Areas

Under the tag _=allocated_areas_ is shown information similar to what can be obtained on a living node with [`erlang:system_info(allocated_areas)`](`m:erlang#system_info_allocated_areas`).

## Allocator

Under the tag _=allocator:`<A>`_ is shown various information about allocator `<A>`. The information is similar to what can be obtained on a living node with [`erlang:system_info({allocator, <A>})`](`m:erlang#system_info_allocator_tuple`). For more information, see also [`erts_alloc(3)`](erts_alloc.md).
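
The _=memory_ section can be turned into numbers, for example to compare a crashed node with `erlang:memory()` output from a healthy one. This Python sketch is hypothetical; it assumes the section consists of `key: value` lines holding byte counts, like the values returned by `erlang:memory()`, and skips any line that does not fit that shape.

```python
from __future__ import annotations


def parse_memory(text: str) -> dict[str, int]:
    """Collect the numeric "key: value" lines of the =memory section.

    Assumption: values are byte counts, as with erlang:memory().
    """
    memory: dict[str, int] = {}
    in_memory = False
    for line in text.splitlines():
        if line.startswith("="):
            # Any heading ends the previous section; only "=memory" starts ours.
            in_memory = line.strip() == "=memory"
            continue
        if in_memory and ": " in line:
            key, value = line.split(": ", 1)
            if value.strip().isdigit():
                memory[key.strip()] = int(value)
    return memory
```

With `mem = parse_memory(text)`, the ratio `mem["processes"] / mem["total"]` shows how much memory was held by processes; a large share suggests looking at the per-process fields next.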
## Process Information

The Erlang crash dump contains a listing of each living Erlang process in the system. The following fields can exist for a process:

- **_=proc:_** - Heading. States the process identifier.
- **_State_** - The state of the process. This can be one of the following:
  - **_Scheduled_** - The process was scheduled to run but is currently not running ("in the run queue").
  - **_Waiting_** - The process was waiting for something (in `receive`).
  - **_Running_** - The process was currently running. If the BIF `erlang:halt/1` was called, this was the process calling it.
  - **_Exiting_** - The process was on its way to exit.
  - **_Garbing_** - This is bad luck: the process was garbage collecting when the crash dump was written. The rest of the information for this process is limited.
  - **_Suspended_** - The process is suspended, either by the BIF `erlang:suspend_process/1` or because it tries to write to a busy port.
- **_Registered name_** - The registered name of the process, if any.
- **_Spawned as_** - The entry point of the process, that is, what function was referenced in the `spawn` or `spawn_link` call that started the process.
- **_Last scheduled in for | Current call_** - The current function of the process. These fields do not always exist.
- **_Spawned by_** - The parent of the process, that is, the process that executed `spawn` or `spawn_link`.
- **_Started_** - The date and time when the process was started.
- **_Message queue length_** - The number of messages in the process's message queue.
- **_Number of heap fragments_** - The number of allocated heap fragments.
- **_Heap fragment data_** - Size of fragmented heap data, in words. This is data created either by messages sent to the process or by the Erlang BIFs. This amount depends on so many things that this field is usually uninteresting.
- **_Link list_** - Process IDs of processes linked to this one. Can also contain ports.
  If process monitoring is used, this field also tells in which direction the monitoring is in effect. That is, a link "to" a process tells you that the "current" process was monitoring the other, and a link "from" a process tells you that the other process was monitoring the current one.
- **_Reductions_** - The number of reductions consumed by the process.
- **_Stack+heap_** - The size of the stack and heap, in words (they share a memory segment).
- **_OldHeap_** - The size of the "old heap", in words. The Erlang virtual machine uses generational garbage collection with two generations. There is one heap for new data items and one for the data that has survived two garbage collections. The assumption (which is almost always correct) is that data surviving two garbage collections can be "tenured" to a heap that is garbage collected less often, as it will live for a long period. This is a common technique in virtual machines. The heaps and the stack together constitute most of the allocated memory of the process.
- **_Heap unused, OldHeap unused_** - The amount of unused memory on each heap, in words. This information is usually useless.
- **_Memory_** - The total memory used by this process, in bytes. This includes call stack, heap, and internal structures. Same as [`erlang:process_info(Pid,memory)`](`erlang:process_info/2`).
- **_Program counter_** - The current instruction pointer. This is only of interest for runtime system developers. The function into which the program counter points is the current function of the process.
- **_CP_** - The continuation pointer, that is, the return address for the current call. Usually useless for anyone other than runtime system developers. This can be followed by the function into which the CP points, which is the function calling the current function.
- **_Arity_** - The number of live argument registers. The argument registers, if any are live, follow.
  These can contain the arguments of the function if they have not yet been moved to the stack.
- **_Internal State_** - A more detailed internal representation of the state of this process.

See also section [Process Data](crash_dump.md#process-data).

## Port Information

This section lists the open ports, their owners, any linked processes, and the name of their driver or external process.

## ETS Tables

This section contains information about all the ETS tables in the system. The following fields are of interest for each table:

- **_=ets:_** - Heading. States the table owner (a process identifier).
- **_Table_** - The identifier for the table. If the table is a `named_table`, this is the name.
- **_Name_** - The table name, regardless of whether it is a `named_table` or not.
- **_Hash table, Buckets_** - The number of buckets. Shown only if the table is a hash table, that is, if it is not an `ordered_set`.
- **_Hash table, Chain Length_** - Shown only if the table is a hash table. Contains statistics about the table, such as the maximum, minimum, and average chain length. A maximum much larger than the average, and a standard deviation much larger than the expected standard deviation, is a sign that the hashing of the terms behaves badly for some reason.
- **_Ordered set (AVL tree), Elements_** - Shown only if the table is an `ordered_set`. (The number of elements is the same as the number of objects in the table.)
- **_Fixed_** - Whether the table is fixed using `ets:safe_fixtable/2` or some internal mechanism.
- **_Objects_** - The number of objects in the table.
- **_Words_** - The number of words allocated to data in the table.
- **_Type_** - The table type, that is, `set`, `bag`, `duplicate_bag`, or `ordered_set`.
- **_Compressed_** - Whether the table was compressed.
- **_Protection_** - The protection of the table.
- **_Write Concurrency_** - Whether `write_concurrency` was enabled for the table.
- **_Read Concurrency_** - Whether `read_concurrency` was enabled for the table.
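
The process listing and the ETS listing share the same shape: a heading followed by `Key: value` fields. The hypothetical Python helper `rank_sections` below relies on that to list the largest processes by `Memory` (bytes) or the largest tables by `Words`. It assumes the field names appear in the dump exactly as named in this section; check them against your own dump, as the format changes between releases.

```python
from __future__ import annotations

import re

# A heading is a line such as "=proc:<0.1.0>" or "=ets:<0.1.0>".
HEADING = re.compile(r"^=(?P<tag>[A-Za-z_]+)(?::(?P<ident>.*))?$")


def rank_sections(text: str, tag: str, field: str,
                  top: int = 5) -> list[tuple[int, str, dict[str, str]]]:
    """Return up to `top` sections tagged `tag`, largest numeric `field` first.

    Each result is (field value, heading identifier, all "Key: value" fields
    of that section). Sections without a numeric `field` are skipped.
    """
    found: list[tuple[str, dict[str, str]]] = []
    current: dict[str, str] | None = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Every heading closes the previous section.
            current = None
            if match.group("tag") == tag:
                current = {}
                found.append((match.group("ident") or "", current))
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value.strip()
    ranked = [
        (int(fields[field]), ident, fields)
        for ident, fields in found
        if fields.get(field, "").isdigit()
    ]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[:top]
```

For example, `rank_sections(text, "proc", "Memory")` lists the biggest processes, `rank_sections(text, "proc", "Message queue length")` lists those with the longest queues, and `rank_sections(text, "ets", "Words")` lists the biggest tables (the `=ets:` identifier is the owner, so the table name is in the returned `Name` field).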
## Timers

This section contains information about all the timers started with the BIFs `erlang:start_timer/3` and `erlang:send_after/3`. The following fields exist for each timer:

- **_=timer:_** - Heading. States the timer owner (a process identifier), that is, the process to receive the message when the timer expires.
- **_Message_** - The message to be sent.
- **_Time left_** - Number of milliseconds left until the message would have been sent.

## Distribution Information

If the Erlang node was alive, that is, set up for communicating with other nodes, this section lists the connections that were active. The following fields can exist:

- **_=node:_** - The node name.
- **_no_distribution_** - If the node was not distributed.
- **_=visible_node:_** - Heading for a visible node, that is, an alive node with a connection to the node that crashed. States the channel number for the node.
- **_=hidden_node:_** - Heading for a hidden node. A hidden node is the same as a visible node, except that it is started with the `"-hidden"` flag. States the channel number for the node.
- **_=not_connected:_** - Heading for a node that was connected to the crashed node earlier. References (that is, process or port identifiers) to the not connected node existed at the time of the crash. States the channel number for the node.
- **_Name_** - The name of the remote node.
- **_Controller_** - The port controlling communication with the remote node.
- **_Creation_** - An integer (1-3) that together with the node name identifies a specific instance of the node.
- **_Remote monitoring: `<local_proc>` `<remote_proc>`_** - The local process was monitoring the remote process at the time of the crash.
- **_Remotely monitored by: `<local_proc>` `<remote_proc>`_** - The remote process was monitoring the local process at the time of the crash.
- **_Remote link: `<local_proc>` `<remote_proc>`_** - A link existed between the local process and the remote process at the time of the crash.

## Loaded Module Information

This section contains information about all loaded modules.
First, the memory use by the loaded code is summarized:

- **_Current code_** - Code that is the latest version of the modules.
- **_Old code_** - Code for which a newer version exists in the system, but the old version is not yet purged.

Then, all loaded modules are listed. The following fields exist:

- **_=mod:_** - Heading. States the module name.
- **_Current size_** - Memory use for the loaded code, in bytes.
- **_Old size_** - Memory use for the old code, in bytes.
- **_Current attributes_** - Module attributes for the current code. This field is decoded when looked at by the Crashdump Viewer tool.
- **_Old attributes_** - Module attributes for the old code, if any. This field is decoded when looked at by the Crashdump Viewer tool.
- **_Current compilation info_** - Compilation information (options) for the current code. This field is decoded when looked at by the Crashdump Viewer tool.
- **_Old compilation info_** - Compilation information (options) for the old code, if any. This field is decoded when looked at by the Crashdump Viewer tool.

## Fun Information

This section lists all funs. The following fields exist for each fun:

- **_=fun_** - Heading.
- **_Module_** - The name of the module where the fun was defined.
- **_Uniq, Index_** - Identifiers.
- **_Address_** - The address of the fun's code.
- **_Refc_** - The number of references to the fun.

## Process Data

For each process there is at least one _=proc_stack_ and one _=proc_heap_ tag, followed by the raw memory information for the stack and heap of the process.

For each process there is also a _=proc_messages_ tag if the process message queue is non-empty, and a _=proc_dictionary_ tag if the process dictionary (the [`put/2`](`put/2`) and [`get/1`](`get/1`) thing) is non-empty.

The raw memory information can be decoded by the Crashdump Viewer tool. You can then see the stack dump, the message queue (if any), and the dictionary (if any).

The stack dump is a dump of the Erlang process stack.
Most of the live data (that is, variables currently in use) is placed on the stack, which makes the stack dump interesting. One has to "guess" what is what, but as the information is symbolic, a thorough reading of it can be useful. For example, the state variable of the Erlang primitive loader can be found on lines `(5)` and `(6)` in the following excerpt:

```text
(1)  3cac44   Return addr 0x13BF58 ()
(2)  y(0)     ["/view/siri_r10_dev/clearcase/otp/erts/lib/kernel/ebin",
(3)            "/view/siri_r10_dev/clearcase/otp/erts/lib/stdlib/ebin"]
(4)  y(1)     <0.1.0>
(5)  y(2)     {state,[],none,#Fun,undefined,#Fun,
(6)            #Fun,#Port<0.2>,infinity,#Fun}
(7)  y(3)     infinity
```

When interpreting the data for a process, it is helpful to know that anonymous function objects (funs) are given the following:

- A name constructed from the name of the function in which they are created
- A number (starting with 0) indicating the number of that fun within that function

## Atoms

This section presents all the atoms in the system. This is only of interest if one suspects that dynamic generation of atoms can be a problem; otherwise this section can be ignored.

Notice that the last created atom is shown first.

## Disclaimer

The format of the crash dump evolves between OTP releases. Some information described here may not apply to your version. A description like this will never be complete; it is meant as an explanation of the crash dump in general and as a help when trying to find application errors, not as a complete specification.