[erlang-bugs] Segmentation Fault in check_process_code / erts_garbage_collect

Soup zachary.hueras@REDACTED
Thu May 21 18:48:54 CEST 2015


This topic, or one very similar, appears to have been discussed before in
the erlang-patches mailing list thread titled "erlang node crashes in
erts_gc_after_bif_call" from October 2012
(http://erlang.org/pipermail/erlang-patches/2012-October/003072.html). No
clear resolution was reached in that thread, and I am currently dealing
with the problem in production systems, so I have decided to raise it on
this mailing list.

Please see the bottom of the email for system specifications, which I
believe to be largely irrelevant (except possibly for multithreading).

Please feel free to request any pertinent information I may have left out,
or to make suggestions to improve future bug reports. I don't often submit
bug reports, and am not at all familiar with Erlang/OTP's particular
practices in this regard.


*## Scenario and Error ##*
The error is a segmentation fault arising out of the erts_garbage_collect
and check_process_code functions.

The scenario is as follows:
1) You must be hot-loading a module (in my case, this module is dynamically
generated) periodically.
2) You must have non-suspended processes active in the module you are
hot-loading while it is being loaded (though not necessarily executing
*in* the code of the module; they may be using terms from the module or
holding function references to the module).
3) Purging of the *old* version of the module must be happening at the same
time as garbage collection. (in my case, the garbage collection is explicit
because of the use of large binary terms with relatively few reductions;
that does not appear to be the case in the situation laid out in the
previously mentioned thread).
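
The three conditions above can be sketched in Erlang roughly as follows.
This is a minimal illustration only, not the code from the test
application linked in this email: the module names hotload_sketch and
hotload_sketch_gen, the run/2 entry point, and the 1 KB binary literal are
all assumptions made for the example, and the sketch is not claimed to
trigger the fault on any particular OTP release.

```erlang
-module(hotload_sketch).
-export([run/2]).

%% Hypothetical name for the dynamically generated module.
-define(GEN, hotload_sketch_gen).

%% Start Workers processes, then reload the generated module Rounds times.
run(Workers, Rounds) ->
    ok = load_version(0),
    Pids = [spawn(fun worker/0) || _ <- lists:seq(1, Workers)],
    lists:foreach(fun load_version/1, lists:seq(1, Rounds)),
    [exit(P, kill) || P <- Pids],
    ok.

%% Condition 1: generate and hot-load a new version of the module.
%% The module exports data/0, which returns a binary literal.
load_version(N) ->
    Blob = binary:copy(<<N:8>>, 1024),
    Forms = [{attribute, 1, module, ?GEN},
             {attribute, 1, export, [{data, 0}]},
             {function, 1, data, 0,
              [{clause, 1, [], [], [erl_parse:abstract(Blob)]}]}],
    {ok, ?GEN, Beam} = compile:forms(Forms),
    %% Condition 3: purge the *old* version while workers are collecting.
    %% soft_purge/1 is used here so that no worker is killed by the purge.
    _ = code:soft_purge(?GEN),
    case code:load_binary(?GEN, "hotload_sketch_gen.erl", Beam) of
        {module, ?GEN} -> ok;
        {error, not_purged} -> ok   % old code still in use; skip this round
    end.

%% Condition 2: a non-suspended process that holds a term from the module
%% and garbage-collects explicitly while the term is still live.
worker() ->
    Term = ?GEN:data(),
    true = erlang:garbage_collect(),
    _ = byte_size(Term),            % keeps Term live across the collection
    worker().
```

Calling hotload_sketch:run(4, 20) from a shell would run four workers
against twenty reloads. The sketch uses code:soft_purge/1 rather than
code:purge/1 so that workers survive each round; which of the two a real
system uses is a separate design choice.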

It appears, at least to my untrained eye, that garbage collection sweeps
can occur at the same time as code purging, and that this seems to happen
without multithreading protection. My reason for this suspicion is that my
production systems began hitting one of two segmentation faults: one
occurring in the function check_process_code (of
erts/emulator/beam/beam_bif_load.c) and the other in erts_garbage_collect
(of erts/emulator/beam/erl_gc.c). Most of the time *in production*, the
segmentation fault occurred in the check_process_code function. Only
sometimes did it appear to come from erts_garbage_collect.

*## Reproducing the Error ##*

It took a while, but I did ultimately manage to create an app which
reliably produces this error (insofar as I can tell). Please see the app
here: https://github.com/fauxsoup/erlang-sigsegv

There are some apparent differences from what I was observing in
production, but this could possibly be related to differences between my
production environment and my testing environment (which are non-trivial),
and potentially differences between my minimal test case and the production
service. Please see the bottom of this email for pertinent details about
both environments.

For testing, and because my production deployment of Erlang does not
include debug symbols, I recompiled Erlang/OTP 17.4 with the flags "-g -O2"
to produce debug symbols and prevent aggressive optimizations which may
distort the stacktrace.

The primary difference between the *results* of the error in production
versus testing is that the segmentation fault in testing *always* comes
from erts_garbage_collect. I have not been able to produce a single test
result in which the segmentation fault occurred in check_process_code using
the minimal test case code.

Another difference, which I believe to be caused by the inclusion of debug
symbols, is that erts_garbage_collect appears earlier in the backtrace in
testing, and that the actual segmentation fault appears to come from the
function sweep_one_area (erl_gc.c again). My assumption is that the
optimization and lack of debug symbols in the production system merely
obfuscated the origin of the segmentation fault there.



*## The Backtrace ##*
Included here for your convenience (also available in test case README):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff3b3e700 (LWP 26743)]
sweep_one_area (n_hp=0x7fffe8862028, n_htop=0x7fffe8862c48,
src=src@entry=0x7fffe9ec2028 "", src_size=src_size@entry=600224) at
beam/erl_gc.c:1816
1816 mb->base = binary_bytes(*origptr);
(gdb) bt
#0  sweep_one_area (n_hp=0x7fffe8862028, n_htop=0x7fffe8862c48,
src=src@entry=0x7fffe9ec2028 "", src_size=src_size@entry=600224) at
beam/erl_gc.c:1816
#1  0x0000000000527ea0 in do_minor (nobj=1, objv=0x7ffff3b3dd50,
new_sz=121536, p=0x7ffff5c80800) at beam/erl_gc.c:1160
#2  minor_collection (recl=<synthetic pointer>, nobj=1,
objv=0x7ffff3b3dd50, need=0, p=0x7ffff5c80800) at beam/erl_gc.c:876
#3  erts_garbage_collect (p=0x7ffff5c80800, need=need@entry=0,
objv=objv@entry=0x7ffff3b3dd50, nobj=nobj@entry=1) at beam/erl_gc.c:450
#4  0x000000000052877b in erts_gc_after_bif_call (p=0x7ffff5c80800,
result=140736302308346, regs=<optimized out>, arity=<optimized out>) at
beam/erl_gc.c:370
#5  0x0000000000571951 in process_main () at beam/beam_emu.c:2787
#6  0x00000000004a9a70 in sched_thread_func (vesdp=0x7ffff51cc8c0) at
beam/erl_process.c:7743
#7  0x00000000006056fb in thr_wrapper (vtwd=0x7fffffffd9a0) at
pthread/ethread.c:106
#8  0x00007ffff704d374 in start_thread () from /usr/lib/libpthread.so.0
#9  0x00007ffff6b8327d in clone () from /usr/lib/libc.so.6

*## The Systems ##*


*PRODUCTION*
Erlang/OTP 17.4 (also observed on Erlang R15B01)
Amazon EC2 c3.8xlarge (32 Virtual CPUs, ~64 GB Memory)
Debian Wheezy
uname -a: Linux rtb0.ec2.chitika.net 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2
x86_64 GNU/Linux

*TESTING*
Erlang/OTP 17.4
Intel Core i5 760 @ 2.80GHz (4 Logical CPUs, 2 cores IIRC), ~16GB Memory
Arch Linux (up-to-date)
uname -a: Linux diogenes 4.0.1-1-ARCH #1 SMP PREEMPT Wed Apr 29 12:00:26
CEST 2015 x86_64 GNU/Linux