<div dir="ltr">Thanks for taking a look and such a quick PR! Here's the output from the gdb commands:<div><br></div><div>```</div><div><div>Program terminated with signal SIGSEGV, Segmentation fault.</div><div>#0 0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>, sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>, hp=0x7f97ba57f7f0, gleader=128) at beam/utils.c:1958</div><div>1958<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> *hp = (*bp)->mem;</div><div>[Current thread is 1 (LWP 511)]</div><div>(gdb) up<br></div><div>#1 do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058</div><div>2058<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);</div><div>(gdb) up</div><div>#2 send_error_term_to_logger (args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106</div><div>2106<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> return do_send_term_to_logger(am_error, gleader, buf, len, args);</div><div>(gdb) up</div><div>#3 erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70, args=140289643116560) at beam/utils.c:2187</div><div>2187<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> res = send_error_term_to_logger(gleader, dsbufp->str, dsbufp->str_len, args);</div><div>(gdb) list</div><div>2182</div><div>2183<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>int</div><div>2184<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>erts_send_error_term_to_logger(Eterm gleader, erts_dsprintf_buf_t *dsbufp, Eterm args)</div><div>2185<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>{</div><div>2186<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> int res;</div><div>2187<span class="gmail-Apple-tab-span" 
style="white-space:pre"> </span> res = send_error_term_to_logger(gleader, dsbufp->str, dsbufp->str_len, args);</div><div>2188<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> destroy_logger_dsbuf(dsbufp);</div><div>2189<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> return res;</div><div>2190<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>}</div><div>2191</div><div>(gdb) print *dsbufp</div><div>$1 = {str = 0x80 <error: Cannot access memory at address 0x80>, str_len = 1506955, size = 401995, grow = 0x30000000160}</div></div><div>```</div><div><br></div><div>I'm not sure if that confirms your hypothesis or not...</div><div><br></div><div>I did find some huge size numbers in `args` though:</div><div><br></div><div><br></div><div>```</div><div><div>#1 do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058</div><div>2058<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);</div><div>(gdb) list</div><div>2053<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> args_sz = size_object(args);</div><div>2054<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> sz = len * 2 /* format */ + args_sz</div><div>2055<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>+ 3 /*outer 2-tuple*/ + 4 /* middle 3-tuple */ + 4 /*inner 3-tuple */;</div><div>2056</div><div>2057<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> /* gleader size is accounted and allocated next */</div><div>2058<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);</div><div>2059</div><div>2060<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> if(is_nil(gl)) {</div><div>2061<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> /* buf 
*always* points to a null terminated string */</div><div>2062<span class="gmail-Apple-tab-span" style="white-space:pre"> </span> erts_fprintf(stderr, "(no error logger present) %T: \"%s\" %T\n",</div><div>(gdb) print args_sz</div><div>$1 = 140289566202896</div><div>(gdb) print sz</div><div>$2 = 140287633497135</div></div><div>```</div><div><br></div><div><br></div><div>Side question.. I am trying to use the etp commands to print out the Erlang terms (<a href="https://github.com/erlang/otp/blob/master/erts/etc/unix/etp-commands.in">https://github.com/erlang/otp/blob/master/erts/etc/unix/etp-commands.in</a>)</div><div><br></div><div>I keep getting this error:</div><div>```</div><div><div>(gdb) etp *bp</div><div>Cannot access memory at address 0xb974e0</div></div><div>```</div><div><br></div><div>And this kind of thing:</div><div>```</div><div><div>(gdb) etp-processes</div><div>No processes, since system isn't initialized!</div></div><div>```</div><div><br></div><div>Am I doing something wrong, or is this just not possible with my coredump?</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Apr 21, 2018 at 1:10 AM, Mikael Pettersson <span dir="ltr"><<a href="mailto:mikpelinux@gmail.com" target="_blank">mikpelinux@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <<a href="mailto:vincefoley@gmail.com">vincefoley@gmail.com</a>> wrote:<br>
> Hi folks, I recently encountered a pretty strange segfault, and I was<br>
> wondering if anyone could provide any insight...<br>
><br>
> The process is running just fine and then disappears. I grabbed the coredump<br>
> and opened it up and found this output:<br>
><br>
> ```<br>
> Program terminated with signal SIGSEGV, Segmentation fault.<br>
> #0 0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,<br>
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,<br>
> hp=0x7f97ba57f7f0, gleader=128)<br>
> at beam/utils.c:1958<br>
> ```<br>
><br>
> There's a little more context in the backtrace...<br>
><br>
> ```<br>
> (gdb) backtrace<br>
> #0 0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,<br>
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,<br>
> hp=0x7f97ba57f7f0, gleader=128)<br>
> at beam/utils.c:1958<br>
> #1 do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,<br>
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058<br>
> #2 send_error_term_to_logger (args=140289643116560, len=-966352886,<br>
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106<br>
> #3 erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,<br>
> args=140289643116560) at beam/utils.c:2187<br>
<br>
</span>First, you're clearly on a 64-bit machine; I'm going to assume Linux/x86-64.<br>
<br>
What I think happened here is that you generated an error term with an output<br>
representation larger than 2GB. In erts_send_error_term_to_logger() there is an<br>
unchecked narrowing conversion from size_t to int as dsbufp->str_len is passed<br>
to send_error_term_to_logger(). In the stack dump above, you see a large but<br>
negative 32-bit value in the "len" parameters -- that's a common sign of this<br>
kind of problem. If you can, please re-run gdb, "up" to the<br>
erts_send_error_term_to_logger() frame, and "print *dsbufp"; I suspect<br>
dsbufp->str_len will be >=2GB.<br>
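<br>
A minimal sketch of that narrowing, in C. The helper name narrow_len() and the<br>
input value mentioned below are hypothetical; the value is simply one that would<br>
wrap to the len=-966352886 seen in the backtrace, not something read from the<br>
core. Converting an out-of-range value to int is implementation-defined; gcc and<br>
clang on x86-64 wrap modulo 2^32:<br>
<pre>
```c
/* Hypothetical stand-in for the implicit conversion that happens when a
 * 64-bit length is passed to a parameter declared as int. */
int narrow_len(unsigned long long str_len)
{
    /* Out-of-range conversion to int is implementation-defined in C;
     * gcc and clang keep the low 32 bits (two's complement). */
    return (int)str_len;
}
```
</pre>
Under those assumptions, narrow_len(3328614410ULL) gives -966352886, and any<br>
length from 2GB up to just under 4GB comes out negative.<br>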
<br>
Then in do_send_term_to_logger() the negative len is used to compute the size<br>
of the message buffer, which results in sz=280577276613695 which is bogus<br>
(read as a byte count, that is about 255 TiB, or roughly 280 TB).<br>
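<br>
The arithmetic can be replayed outside the VM. This is a sketch, assuming len is<br>
an int and that sz and args_sz are unsigned 64-bit values; compute_sz() is a<br>
hypothetical name, and unsigned long long stands in for the VM's own integer<br>
type. The expression is copied from lines 2054-2055 of the utils.c listing<br>
quoted at the top of this message:<br>
<pre>
```c
/* Hypothetical replay of the size expression from utils.c:2054-2055.
 * Assumes len is an int and the sizes are unsigned 64-bit values. */
unsigned long long compute_sz(int len, unsigned long long args_sz)
{
    /* len * 2 is evaluated as int; a negative result is then converted
     * to unsigned 64-bit, so the sum wraps modulo 2^64 instead of being
     * rejected. For len of 1GB or more, len * 2 itself overflows int. */
    return len * 2 /* format */ + args_sz
        + 3 /* outer 2-tuple */ + 4 /* middle 3-tuple */ + 4 /* inner 3-tuple */;
}
```
</pre>
With len=-966352886 and the args_sz=140289566202896 printed by gdb,<br>
compute_sz() gives 140287633497135, the same value gdb shows for sz in frame #1.<br>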
<br>
The actual SIGSEGV probably comes from the new_message_buffer() call on<br>
line 1956 returning NULL for this bogus size.<br>
<br>
Fixing this will require code changes in utils.c and maybe elsewhere too.<br>
<br>
What you can do meanwhile is to try to limit the sizes of terms sent to the<br>
error logger so their output representation is less than 2GB. (As I type this I<br>
spot another bug in do_send_term_to_logger(), so you'll want to halve that<br>
limit to be less than 1GB.)<br>
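<br>
As an illustration of that limit, here is a size check of the kind a caller<br>
could apply before handing text to the logger. It is only a sketch:<br>
fits_logger_limit() and LOGGER_TEXT_LIMIT are hypothetical names, not ERTS APIs,<br>
and it is written in C to match the VM source; in practice the check would live<br>
wherever the message is produced:<br>
<pre>
```c
/* Hypothetical bound: keep formatted output below 1GB so that both
 * len and len * 2 stay within the range of a 32-bit int. */
#define LOGGER_TEXT_LIMIT (1ULL * 1024 * 1024 * 1024)

/* Returns 1 when output_bytes is strictly below the limit, else 0. */
int fits_logger_limit(unsigned long long output_bytes)
{
    return LOGGER_TEXT_LIMIT > output_bytes;
}
```
</pre>
A message that fails the check would need to be truncated or summarized before<br>
it is logged.<br>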
<br>
/Mikael<br>
<span class=""><br>
> #4 0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,<br>
> arg=140289845403658) at beam/erl_bits.c:850<br>
> #5 0x00007f97bfb79898 in ?? ()<br>
> #6 0x000000000000001e in ?? ()<br>
> #7 0x00007f97b5c675a8 in ?? ()<br>
> #8 0x00007f97bb25b5d8 in ?? ()<br>
> #9 0x00007f97b5c25ad8 in ?? ()<br>
> #10 0x00007f97c1f7b282 in ?? ()<br>
> #11 0x8450345d9c222796 in ?? ()<br>
> #12 0x00007f97bfc47008 in ?? ()<br>
> #13 0x00007f97bc8c3338 in ?? ()<br>
> #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,<br>
> f_reg_array=0x7f97c666a00a) at x86_64-pc-linux-musl/opt/smp/beam_hot.h:943<br>
> #15 0x00007f97bba80100 in ?? ()<br>
> #16 0x0000000000000000 in ?? ()<br>
> ```<br>
><br>
> I do have an error_handler module added to the error_logger, although there<br>
> doesn't appear to be any noticeable memory growth or message queue backup in<br>
> the error_logger process before it dies.<br>
><br>
> I can't quite trigger it myself but it does happen on a regular basis.<br>
><br>
> Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]<br>
> [async-threads:10] [hipe] [kernel-poll:false]<br>
> Elixir 1.6.4 (compiled with OTP 20)<br>
> Alpine 3.7<br>
><br>
> Anyone have any clues?<br>
><br>
><br>
</span>
</blockquote></div><br></div>