[erlang-questions] Segfault in do_allocate_logger_message

Vince Foley vincefoley@REDACTED
Sat Apr 21 22:51:38 CEST 2018


Thanks for taking a look and such a quick PR! Here's the output from the
gdb commands:

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic
pointer>, sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic
pointer>, hp=0x7f97ba57f7f0, gleader=128) at beam/utils.c:1958
1958    *hp = (*bp)->mem;
[Current thread is 1 (LWP 511)]
(gdb) up
#1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
(gdb) up
#2  send_error_term_to_logger (args=140289643116560, len=-966352886,
buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
2106    return do_send_term_to_logger(am_error, gleader, buf, len, args);
(gdb) up
#3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
args=140289643116560) at beam/utils.c:2187
2187    res = send_error_term_to_logger(gleader, dsbufp->str,
dsbufp->str_len, args);
(gdb) list
2182
2183 int
2184 erts_send_error_term_to_logger(Eterm gleader, erts_dsprintf_buf_t
*dsbufp, Eterm args)
2185 {
2186    int res;
2187    res = send_error_term_to_logger(gleader, dsbufp->str,
dsbufp->str_len, args);
2188    destroy_logger_dsbuf(dsbufp);
2189    return res;
2190 }
2191
(gdb) print *dsbufp
$1 = {str = 0x80 <error: Cannot access memory at address 0x80>, str_len =
1506955, size = 401995, grow = 0x30000000160}
```

I'm not sure if that confirms your hypothesis or not...

I did find some huge size numbers in `args` though:


```
#1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
(gdb) list
2053    args_sz = size_object(args);
2054    sz = len * 2 /* format */ + args_sz
2055 + 3 /*outer 2-tuple*/ + 4 /* middle 3-tuple */ + 4 /*inner 3-tuple */;
2056
2057    /* gleader size is accounted and allocated next */
2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
2059
2060    if(is_nil(gl)) {
2061       /* buf *always* points to a null terminated string */
2062       erts_fprintf(stderr, "(no error logger present) %T: \"%s\" %T\n",
(gdb) print args_sz
$1 = 140289566202896
(gdb) print sz
$2 = 140287633497135
```


Side question.. I am trying to use the etp commands to print out the Erlang
terms (
https://github.com/erlang/otp/blob/master/erts/etc/unix/etp-commands.in)

I keep getting this error:
```
(gdb) etp *bp
Cannot access memory at address 0xb974e0
```

And this kind of thing:
```
(gdb) etp-processes
No processes, since system isn't initialized!
```

Am I doing something wrong, or is this just not possible with my coredump?


On Sat, Apr 21, 2018 at 1:10 AM, Mikael Pettersson <mikpelinux@REDACTED>
wrote:

> On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <vincefoley@REDACTED> wrote:
> > Hi folks, I recently encountered a pretty strange segfault, and I was
> > wondering if anyone could provide any insight...
> >
> > The process is running just fine and then disappears. I grabbed the
> coredump
> > and opened it up and found this output:
> >
> > ```
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic
> pointer>,
> > sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> > hp=0x7f97ba57f7f0, gleader=128)
> >     at beam/utils.c:1958
> > ```
> >
> > There's a little more context in the backtrace...
> >
> > ```
> > (gdb) backtrace
> > #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic
> pointer>,
> > sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> > hp=0x7f97ba57f7f0, gleader=128)
> >     at beam/utils.c:1958
> > #1  do_send_term_to_logger (tag=843, args=140289643116560,
> len=-966352886,
> > buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> > #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
> > buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
> > #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
> > args=140289643116560) at beam/utils.c:2187
>
> First, you're clearly on a 64-bit machine, I'm going to assume
> Linux/x86-64.
>
> What I think happened here is that you generated an error term with an
> output
> representation larger than 2GB.  In erts_send_error_term_to_logger()
> there is an
> unchecked narrowing conversion from size_t to int as dsbufp->str_len is
> passed
> to send_error_term_to_logger().  In the stack dump above, you see a large
> but
> negative 32-bit value in the "len" parameters -- that's a common sign
> of this kind
> of problem.  If you can, please re-run gdb, "up" to the
> erts_send_error_term_to_logger()
> frame, and "print *dsbufp"; I suspect dsbufp->str_len will be >=2GB.
>
> Then in do_send_term_to_logger() the negative len is use to compute the
> size
> of the message buffer, which results in sz=280577276613695 which is bogus
> (it's about 261 GB).
>
> The actual SIGSEGV probably comes from the new_message_buffer() call on
> line 1956 returning NULL for this bogus size.
>
> Fixing this will require code changes in util.c and maybe elsewhere too.
>
> What you can do meanwhile is to try to limit the sizes of terms sent to the
> error logger so their output representation is less than 2GB.  (As I type
> this I
> spot another bug in do_send_term_to_logger(), so you'll want to halve that
> limit to be less than 1GB.)
>
> /Mikael
>
> > #4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,
> > arg=140289845403658) at beam/erl_bits.c:850
> > #5  0x00007f97bfb79898 in ?? ()
> > #6  0x000000000000001e in ?? ()
> > #7  0x00007f97b5c675a8 in ?? ()
> > #8  0x00007f97bb25b5d8 in ?? ()
> > #9  0x00007f97b5c25ad8 in ?? ()
> > #10 0x00007f97c1f7b282 in ?? ()
> > #11 0x8450345d9c222796 in ?? ()
> > #12 0x00007f97bfc47008 in ?? ()
> > #13 0x00007f97bc8c3338 in ?? ()
> > #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,
> > f_reg_array=0x7f97c666a00a) at x86_64-pc-linux-musl/opt/smp/
> beam_hot.h:943
> > #15 0x00007f97bba80100 in ?? ()
> > #16 0x0000000000000000 in ?? ()
> > ```
> >
> > I do have an error_handler module added to the error_logger. Although
> there
> > doesn't appear to be any noticable memory growth or message queue backup
> in
> > the error_logger process before it dies.
> >
> > I can't quite trigger it myself but it does happen on a regular basis.
> >
> > Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]
> > [async-threads:10] [hipe] [kernel-poll:false]
> > Elixir 1.6.4 (compiled with OTP 20)
> > Alpine 3.7
> >
> > Anyone have any clues?
> >
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20180421/6bb3218d/attachment.htm>


More information about the erlang-questions mailing list