[erlang-questions] Segfault in do_allocate_logger_message

Mikael Pettersson mikpelinux@REDACTED
Sat Apr 21 10:10:04 CEST 2018


On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <vincefoley@REDACTED> wrote:
> Hi folks, I recently encountered a pretty strange segfault, and I was
> wondering if anyone could provide any insight...
>
> The process is running just fine and then disappears. I grabbed the coredump
> and opened it up and found this output:
>
> ```
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128)
>     at beam/utils.c:1958
> ```
>
> There's a little more context in the backtrace...
>
> ```
> (gdb) backtrace
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128)
>     at beam/utils.c:1958
> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
> #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
> args=140289643116560) at beam/utils.c:2187

First, you're clearly on a 64-bit machine; I'm going to assume Linux/x86-64.

What I think happened here is that you generated an error term with an output
representation larger than 2GB.  In erts_send_error_term_to_logger() there is an
unchecked narrowing conversion from size_t to int as dsbufp->str_len is passed
to send_error_term_to_logger().  In the stack dump above, you see a large but
negative 32-bit value in the "len" parameters -- that's a common sign of this
kind of problem.  If you can, please re-run gdb, "up" to the
erts_send_error_term_to_logger() frame, and "print *dsbufp"; I suspect
dsbufp->str_len will be >=2GB.
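
To make the narrowing concrete, here is a minimal C sketch.  It is not the
ERTS code; narrow_unchecked() is a made-up name, and the 3328614410 used
below is only an assumed length, picked because it truncates to the
len=-966352886 seen in the backtrace.  The real dsbufp->str_len is unknown
until someone prints it.

```c
#include <stddef.h>

/* Mimics an unchecked size_t -> int narrowing (hypothetical helper, not
 * an ERTS function).  Converting an out-of-range value to int is
 * implementation-defined in C; GCC and Clang on x86-64 keep the low
 * 32 bits, so lengths of 2GB or more can come out negative. */
static int narrow_unchecked(size_t str_len)
{
    return (int) str_len;
}
```

With GCC or Clang on x86-64, narrow_unchecked((size_t) 3328614410ULL)
evaluates to -966352886, while small lengths pass through unchanged.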

Then in do_send_term_to_logger() the negative len is used to compute the size
of the message buffer, which results in sz=280577276613695, which is bogus
(read as a byte count, it's about 255 TB).
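
As a rough illustration of how a negative int turns into a huge unsigned
size, consider the sketch below.  The formula is invented for the example
and is NOT the computation in utils.c, so it will not reproduce the exact
sz from the backtrace; sketch_message_words() and fixed_overhead are
hypothetical names.

```c
#include <stdint.h>

/* Hypothetical size formula, NOT the real one in utils.c.  It only shows
 * how a negative int poisons an unsigned 64-bit size computation. */
static uint64_t sketch_message_words(int len, uint64_t fixed_overhead)
{
    /* len is converted to uint64_t before the addition, so a negative
     * value wraps around modulo 2^64 instead of staying negative. */
    return (uint64_t) len + fixed_overhead;
}
```

A sane len such as 10 with an overhead of 32 gives 42, but
len=-966352886 gives a value just below 2^64, which no allocator can
satisfy.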

The actual SIGSEGV probably comes from the new_message_buffer() call on
line 1956 returning NULL for this bogus size.

Fixing this will require code changes in utils.c and maybe elsewhere too.
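
I have not written the actual fix, but the general shape of a guard would
be something like the sketch below.  The names SKETCH_MAX_LOGGER_LEN and
sketch_checked_len() are hypothetical, and the cap of INT_MAX / 2 is an
assumption picked to stay just under 1GB; a real patch would have to
decide whether to truncate or drop oversized messages.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical cap (not an ERTS constant): just under 1GB. */
#define SKETCH_MAX_LOGGER_LEN ((size_t) INT_MAX / 2)

/* Returns 0 and stores the length if it is safe to pass on as an int,
 * or -1 if the message is too large and must be truncated or dropped. */
static int sketch_checked_len(size_t str_len, int *out_len)
{
    assert(out_len != NULL);
    if (str_len > SKETCH_MAX_LOGGER_LEN)
        return -1;
    *out_len = (int) str_len;   /* cannot truncate: value fits in int */
    return 0;
}
```

The point is only that the range check happens before the cast, so a
negative or truncated length never reaches the size computation.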

What you can do meanwhile is to try to limit the sizes of terms sent to the
error logger so their output representation is less than 2GB.  (As I type this I
spot another bug in do_send_term_to_logger(), so you'll want to halve that
limit to be less than 1GB.)

/Mikael

> #4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,
> arg=140289845403658) at beam/erl_bits.c:850
> #5  0x00007f97bfb79898 in ?? ()
> #6  0x000000000000001e in ?? ()
> #7  0x00007f97b5c675a8 in ?? ()
> #8  0x00007f97bb25b5d8 in ?? ()
> #9  0x00007f97b5c25ad8 in ?? ()
> #10 0x00007f97c1f7b282 in ?? ()
> #11 0x8450345d9c222796 in ?? ()
> #12 0x00007f97bfc47008 in ?? ()
> #13 0x00007f97bc8c3338 in ?? ()
> #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,
> f_reg_array=0x7f97c666a00a) at x86_64-pc-linux-musl/opt/smp/beam_hot.h:943
> #15 0x00007f97bba80100 in ?? ()
> #16 0x0000000000000000 in ?? ()
> ```
>
> I do have an error_handler module added to the error_logger, although there
> doesn't appear to be any noticeable memory growth or message queue backup in
> the error_logger process before it dies.
>
> I can't quite trigger it myself but it does happen on a regular basis.
>
> Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]
> [async-threads:10] [hipe] [kernel-poll:false]
> Elixir 1.6.4 (compiled with OTP 20)
> Alpine 3.7
>
> Anyone have any clues?