[erlang-questions] Segfault in do_allocate_logger_message

Mon Apr 23 11:14:29 CEST 2018

On Sat, Apr 21, 2018 at 10:51 PM, Vince Foley <vincefoley@REDACTED> wrote:
> Thanks for taking a look and such a quick PR! Here's the output from the gdb
> commands:
>
> ```
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128) at beam/utils.c:1958
> 1958    *hp = (*bp)->mem;
> [Current thread is 1 (LWP 511)]
> (gdb) up
> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> 2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
> (gdb) up
> #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
> 2106    return do_send_term_to_logger(am_error, gleader, buf, len, args);
> (gdb) up
> #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
> args=140289643116560) at beam/utils.c:2187
> 2187    res = send_error_term_to_logger(gleader, dsbufp->str,
> dsbufp->str_len, args);
> (gdb) list
> 2182
> 2183 int
> 2184 erts_send_error_term_to_logger(Eterm gleader, erts_dsprintf_buf_t
> *dsbufp, Eterm args)
> 2185 {
> 2186    int res;
> 2187    res = send_error_term_to_logger(gleader, dsbufp->str,
> dsbufp->str_len, args);
> 2188    destroy_logger_dsbuf(dsbufp);
> 2189    return res;
> 2190 }
> 2191
> (gdb) print *dsbufp
> $1 = {str = 0x80 <error: Cannot access memory at address 0x80>, str_len =
> 1506955, size = 401995, grow = 0x30000000160}
> ```
>
> I'm not sure if that confirms your hypothesis or not...

The values look weird, I'm pretty sure that *dsbufp got overwritten at
some point.
(My analysis about potential misbehaviour still holds though, just not confirmed
by your core.)

>
> I did find some huge size numbers in `args` though:

"args" is an Eterm encoding a pointer to a list in the heap, so it's
expected to be large.

>
>
> ```
> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> 2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
> (gdb) list
> 2053    args_sz = size_object(args);
> 2054    sz = len * 2 /* format */ + args_sz
> 2055 + 3 /*outer 2-tuple*/ + 4 /* middle 3-tuple */ + 4 /*inner 3-tuple */;
> 2056
> 2057    /* gleader size is accounted and allocated next */
> 2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
> 2059
> 2060    if(is_nil(gl)) {
> 2061       /* buf *always* points to a null terminated string */
> 2062       erts_fprintf(stderr, "(no error logger present) %T: \"%s\" %T\n",
> (gdb) print args_sz
> $1 = 140289566202896
> (gdb) print sz
> $2 = 140287633497135
> ```
>
>
> Side question.. I am trying to use the etp commands to print out the Erlang
> terms
> (https://github.com/erlang/otp/blob/master/erts/etc/unix/etp-commands.in)
>
> I keep getting this error:
> ```
> (gdb) etp *bp
> Cannot access memory at address 0xb974e0
> ```
>
> And this kind of thing:
> ```
> (gdb) etp-processes
> No processes, since system isn't initialized!
> ```
>
> Am I doing something wrong, or is this just not possible with my coredump?
>
>
> On Sat, Apr 21, 2018 at 1:10 AM, Mikael Pettersson <mikpelinux@REDACTED>
> wrote:
>>
>> On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <vincefoley@REDACTED> wrote:
>> > Hi folks, I recently encountered a pretty strange segfault, and I was
>> > wondering if anyone could provide any insight...
>> >
>> > The process is running just fine and then disappears. I grabbed the
>> > coredump
>> > and opened it up and found this output:
>> >
>> > ```
>> > Program terminated with signal SIGSEGV, Segmentation fault.
>> > #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic
>> > pointer>,
>> > sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
>> > hp=0x7f97ba57f7f0, gleader=128)
>> >     at beam/utils.c:1958
>> > ```
>> >
>> > There's a little more context in the backtrace...
>> >
>> > ```
>> > (gdb) backtrace
>> > #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic
>> > pointer>,
>> > sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
>> > hp=0x7f97ba57f7f0, gleader=128)
>> >     at beam/utils.c:1958
>> > #1  do_send_term_to_logger (tag=843, args=140289643116560,
>> > len=-966352886,
>> > buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
>> > #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
>> > buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
>> > #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
>> > args=140289643116560) at beam/utils.c:2187
>>
>> First, you're clearly on a 64-bit machine, I'm going to assume
>> Linux/x86-64.
>>
>> What I think happened here is that you generated an error term with an
>> output
>> representation larger than 2GB.  In erts_send_error_term_to_logger() there
>> is an
>> unchecked narrowing conversion from size_t to int as dsbufp->str_len is
>> passed
>> to send_error_term_to_logger().  In the stack dump above, you see a large
>> but
>> negative 32-bit value in the "len" parameters -- that's a common sign
>> of this kind
>> of problem.  If you can, please re-run gdb, "up" to the
>> erts_send_error_term_to_logger()
>> frame, and "print *dsbufp"; I suspect dsbufp->str_len will be >=2GB.
>>
>> Then in do_send_term_to_logger() the negative len is use to compute the
>> size
>> of the message buffer, which results in sz=280577276613695 which is bogus
>> (it's about 261 GB).
>>
>> The actual SIGSEGV probably comes from the new_message_buffer() call on
>> line 1956 returning NULL for this bogus size.
>>
>> Fixing this will require code changes in util.c and maybe elsewhere too.
>>
>> What you can do meanwhile is to try to limit the sizes of terms sent to
>> the
>> error logger so their output representation is less than 2GB.  (As I type
>> this I
>> spot another bug in do_send_term_to_logger(), so you'll want to halve that
>> limit to be less than 1GB.)
>>
>> /Mikael
>>
>> > #4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,
>> > arg=140289845403658) at beam/erl_bits.c:850
>> > #5  0x00007f97bfb79898 in ?? ()
>> > #6  0x000000000000001e in ?? ()
>> > #7  0x00007f97b5c675a8 in ?? ()
>> > #8  0x00007f97bb25b5d8 in ?? ()
>> > #9  0x00007f97b5c25ad8 in ?? ()
>> > #10 0x00007f97c1f7b282 in ?? ()
>> > #11 0x8450345d9c222796 in ?? ()
>> > #12 0x00007f97bfc47008 in ?? ()
>> > #13 0x00007f97bc8c3338 in ?? ()
>> > #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,
>> > f_reg_array=0x7f97c666a00a) at
>> > x86_64-pc-linux-musl/opt/smp/beam_hot.h:943
>> > #15 0x00007f97bba80100 in ?? ()
>> > #16 0x0000000000000000 in ?? ()
>> > ```
>> >
>> > I do have an error_handler module added to the error_logger. Although
>> > there
>> > doesn't appear to be any noticable memory growth or message queue backup
>> > in
>> > the error_logger process before it dies.
>> >
>> > I can't quite trigger it myself but it does happen on a regular basis.
>> >
>> > Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]
>> > [async-threads:10] [hipe] [kernel-poll:false]
>> > Elixir 1.6.4 (compiled with OTP 20)
>> > Alpine 3.7
>> >
>> > Anyone have any clues?
>> >
>> >
>> > _______________________________________________
>> > erlang-questions mailing list
>> > erlang-questions@REDACTED
>> > http://erlang.org/mailman/listinfo/erlang-questions
>> >
>
>