<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD><!-- Template generated by Exclaimer Template Editor on 11:35:02 Wednesday, 21 November 2012 -->
<STYLE type=text/css>P.cd987f72-e700-448b-b4e1-7fb38b81e891 {
MARGIN: 0cm 0cm 0pt
}
LI.cd987f72-e700-448b-b4e1-7fb38b81e891 {
MARGIN: 0cm 0cm 0pt
}
DIV.cd987f72-e700-448b-b4e1-7fb38b81e891 {
MARGIN: 0cm 0cm 0pt
}
TABLE.cd987f72-e700-448b-b4e1-7fb38b81e891Table {
MARGIN: 0cm 0cm 0pt
}
DIV.Section1 {
page: Section1
}
</STYLE>
<meta http-equiv="Content-Type" content="text/html; charset=koi8-r" />
<meta content="MSHTML 6.00.6000.21316" name="GENERATOR" />
</HEAD>
<BODY text="#000000" bgcolor="#ffffff">
<P>
<div dir="ltr" align="left"><span class="516193016-21112012"><font face="Arial" color="#0000ff" size="2">Something my team just noticed: our segv consistently occurs right after the box is rebooted. After that, beam appears to work fine. We are trying
to narrow down which code is triggering it, but it may take some time.</font></span></div>
<br />
<div class="OutlookMessageHeader" lang="en-us" dir="ltr" align="left">
<hr tabindex="-1" />
<font face="Tahoma" size="2"><b>From:</b> Patrik Nyblom [mailto:pan@erlang.org] <br />
<b>Sent:</b> Wednesday, November 21, 2012 6:09 AM<br />
<b>To:</b> Denis Titoruk<br />
<b>Cc:</b> Musumeci, Antonio S (Enterprise Infrastructure); erlang-bugs@erlang.org<br />
<b>Subject:</b> Re: [erlang-bugs] beam core'ing<br />
</font><br />
</div>
<div></div>
<div class="moz-cite-prefix">Hi again :)<br />
<br />
Another thing that would be helpful: could you create a crash dump instead of calling fprintf when the binary is wrongly moved, i.e. call erl_exit(ERTS_DUMP_EXIT, "erts_current_bin != (pb->bytes)"); instead of the fprintf? Then you could isolate the Erlang code
snippet that exercises the bug, and I could maybe create a smaller test case... A simple test case would be really helpful when diving into the GC :)<br />
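<div>A minimal standalone sketch of the change being asked for: the check itself stays the same, but a mismatch is routed to a fatal handler instead of only being logged. <code>check_current_bin</code>, <code>fatal_handler</code> and <code>record_fatal</code> are hypothetical names, not ERTS code; the handler is injectable only so that the sketch can run outside the emulator.</div>

```c
#include <stdio.h>
#include <assert.h>

/* Hypothetical handler type. In the emulator, per the suggestion above,
 * the handler would be the erl_exit(ERTS_DUMP_EXIT, ...) call itself
 * rather than a plain fprintf. It is injectable here so the check can
 * be exercised standalone. */
typedef void (*fatal_handler)(const char *msg);

/* Illustrative handler that only records that it was invoked. */
int fatal_calls = 0;

void record_fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    fatal_calls++;
}

/* Compares the cached write pointer with the binary's current bytes.
 * On a mismatch it calls on_fatal and returns 1; otherwise returns 0. */
int check_current_bin(const unsigned char *cached,
                      const unsigned char *actual,
                      fatal_handler on_fatal)
{
    if (cached != actual) {
        on_fatal("erts_current_bin != (pb->bytes)");
        return 1;
    }
    return 0;
}
```

<div>In the emulator there is no need for the indirection: calling erl_exit(ERTS_DUMP_EXIT, ...) at the point of the mismatch produces the crash dump, which, as noted above, should make it possible to isolate the Erlang code involved.</div>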
<br />
Cheers,<br />
/Patrik<br />
<br />
On 11/21/2012 11:21 AM, Denis Titoruk wrote:<br />
</div>
<blockquote cite="mid:8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com" type="cite">
<br />
<div>
<div>On 21.11.2012, at 13:44, Patrik Nyblom wrote:</div>
<br class="Apple-interchange-newline" />
<blockquote type="cite">
<div text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi!<br />
On 11/20/2012 10:40 PM, Denis Titoruk wrote:<br />
</div>
<blockquote cite="mid:79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com" type="cite">
<div>Hi,</div>
<div><br />
</div>
We've got the same error on R15B01 and R15B02.
<div>I finished my investigation of this issue today, and here is the result:</div>
<div><br />
</div>
<div>Let's assume we have the code:</div>
<div>encode_formats(Columns) -><br />
encode_formats(Columns, 0, <<>>).<br />
<br />
encode_formats([], Count, Acc) -><br />
<<Count:?int16, Acc/binary>>;<br />
<br />
encode_formats([#column{format = Format} | T], Count, Acc) -><br />
encode_formats(T, Count + 1, <<Acc/binary, Format:?int16>>).<br />
</div>
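<div>To make the byte layout concrete, here is a rough C equivalent of what encode_formats/1 produces, assuming ?int16 expands to a 16-bit signed big-endian integer segment (which matches the {integer,16} size and [signed,big] flags in the BEAM instructions). The names are illustrative, and unlike the Erlang code it sizes the buffer up front instead of appending, so it does not go through anything like the bs_append path.</div>

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

/* Writes v as a 16-bit big-endian two's-complement integer. */
static void put_int16_be(unsigned char *p, int16_t v)
{
    uint16_t u = (uint16_t)v;
    p[0] = (unsigned char)(u >> 8);
    p[1] = (unsigned char)u;
}

/* Builds <<Count:16, Format1:16, ..., FormatN:16>> into a malloc'd
 * buffer of 2 + 2*n bytes; the caller frees it. Returns NULL if the
 * allocation fails. */
unsigned char *encode_formats(const int16_t *formats, size_t n,
                              size_t *out_len)
{
    size_t len = 2 + 2 * n;
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    put_int16_be(buf, (int16_t)n);           /* the Count prefix */
    for (size_t i = 0; i != n; i++)
        put_int16_be(buf + 2 + 2 * i, formats[i]);
    *out_len = len;
    return buf;
}
```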
<div><br />
</div>
<div>So, <<Acc/binary, Format:?int16>> compiles to these BEAM instructions:</div>
<div><br />
</div>
<div> {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}.<br />
{bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}.<br />
</div>
<div><br />
</div>
<div>bs_append may run a garbage collection, which can reallocate the binary, but it does not reassign erts_current_bin, which bs_put_integer then uses.</div>
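<div>The failure mode described here is a stale cached pointer. A minimal standalone model, assuming a GC that relocates the binary by copying it: <code>Bin</code>, <code>current_bin</code>, <code>gc_move</code> and <code>put_byte</code> are hypothetical stand-ins for the ProcBin, erts_current_bin, the GC inside erts_bs_append and bs_put_integer, not the real ERTS definitions. To keep the sketch well defined, the old buffer is handed back instead of freed.</div>

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Hypothetical model of a binary; not the ERTS ProcBin layout. */
typedef struct {
    unsigned char *bytes;
    size_t size;
} Bin;

/* Models a cached write pointer like erts_current_bin. */
unsigned char *current_bin;

/* Models a GC that moves the binary: copies the data to a new buffer
 * and updates bin->bytes, but does NOT touch current_bin. The old
 * buffer is returned instead of freed so that the stale write in the
 * usage stays defined; in a real emulator the old block would have been
 * freed and the write would corrupt allocator memory. */
unsigned char *gc_move(Bin *bin)
{
    unsigned char *old = bin->bytes;
    unsigned char *fresh = malloc(bin->size);
    if (fresh == NULL)
        abort();
    memcpy(fresh, old, bin->size);
    bin->bytes = fresh;
    return old;
}

/* Models bs_put_integer: writes only through the cached pointer. */
void put_byte(size_t offset, unsigned char v)
{
    current_bin[offset] = v;
}
```

<div>Writing after gc_move without refreshing current_bin modifies the old buffer and leaves bin.bytes untouched; re-reading current_bin from bin.bytes after the call that may collect, as the proposed fix does with pb->bytes, makes the write land in the live buffer. With a freed old block, such a write would damage the allocator's free-list state, which would be consistent with a crash later surfacing inside the allocator.</div>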
<div><br />
</div>
<div>Fix:</div>
<div><br />
</div>
<div>erl_bits.c:<br />
Eterm<br />
erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term,<br />
Uint extra_words, Uint unit)<br />
…<br />
if (c_p->stop - c_p->htop < heap_need) {<br />
(void) erts_garbage_collect(c_p, heap_need, reg, live+1);<br />
}<br />
sb = (ErlSubBin *) c_p->htop;<br />
c_p->htop += ERL_SUB_BIN_SIZE;<br />
sb->thing_word = HEADER_SUB_BIN;<br />
sb->size = BYTE_OFFSET(used_size_in_bits);<br />
sb->bitsize = BIT_OFFSET(used_size_in_bits);<br />
sb->offs = 0;<br />
sb->bitoffs = 0;<br />
sb->is_writable = 1;<br />
sb->orig = reg[live];<br />
<br />
///////////////////////////////////////////////////////////////////</div>
<div>// add these lines</div>
<div>
<div>///////////////////////////////////////////////////////////////////</div>
</div>
<div> pb = (ProcBin *) boxed_val(sb->orig);</div>
<div> erts_current_bin = pb->bytes;<br />
erts_writable_bin = 1;<br />
///////////////////////////////////////////////////////////////////<br />
<br />
return make_binary(sb);<br />
…<br />
</div>
<div><br />
</div>
</blockquote>
Can you reproduce the bug and verify that this fix really works? The thing is that the GC should *only* reallocate a binary if it has no active writers, and here there obviously is one (pb->flags |= PB_ACTIVE_WRITER is set a few lines earlier). So if this code change actually removes the crash, the bug
would be in the GC's detection of active writers.<br />
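<div>A minimal sketch of the invariant this argument rests on, assuming a simplified writable binary with a hypothetical <code>ACTIVE_WRITER</code> flag standing in for PB_ACTIVE_WRITER; it is not the ERTS GC code. The GC may shrink (and thereby possibly move) the storage only when no writer is active.</div>

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical flag and struct; the real ProcBin and PB_ACTIVE_WRITER
 * are defined in the ERTS sources and may differ. */
#define ACTIVE_WRITER 0x1u

typedef struct {
    unsigned flags;
    unsigned char *bytes;
    size_t size;      /* bytes in use */
    size_t capacity;  /* bytes allocated */
} WritableBin;

/* Returns 1 if the buffer was passed to realloc (and may have moved),
 * 0 if it was left in place. */
int gc_maybe_shrink(WritableBin *b)
{
    if (b->flags & ACTIVE_WRITER)
        return 0;   /* a writer may hold a raw pointer: never move */
    if (b->capacity == b->size)
        return 0;   /* nothing to reclaim */
    unsigned char *p = realloc(b->bytes, b->size);
    if (p == NULL)
        return 0;   /* keep the old block if realloc fails */
    b->bytes = p;
    b->capacity = b->size;
    return 1;
}
```

<div>If a check like the first one were bypassed while a writer still held its cached pointer, the shrink branch would move the bytes underneath it, which is the situation the added lines in erts_bs_append paper over.</div>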
</div>
</blockquote>
<div><br />
</div>
<div>Yes, it works in my case. I don't have a simple test case that reproduces this bug (I actually run a few processes that send requests to pgsql).</div>
<div><br />
</div>
<div>
<div> pb = (ProcBin *) boxed_val(sb->orig);</div>
<div> if (erts_current_bin != (pb->bytes)) {</div>
<div> fprintf(stderr, "erts_current_bin != (pb->bytes)\n");</div>
<div> fflush(stderr);</div>
<div> }</div>
<div> erts_current_bin = pb->bytes;</div>
<div> erts_writable_bin = 1;</div>
</div>
<div><br />
</div>
<div><br />
</div>
<div>
<div>(jskit@siden)1> f(F), F = fun() -> postgresql:equery('echo-customers', write, <<"some query here">>, []) end.</div>
<div>#Fun<erl_eval.20.82930912></div>
<div>(jskit@siden)2> perftest:comprehensive(1000, F).</div>
<div>Sequential 100 cycles in ~1 seconds (100 cycles/s)</div>
<div>Sequential 200 cycles in ~2 seconds (106 cycles/s)</div>
<div>Sequential 1000 cycles in ~12 seconds (85 cycles/s)</div>
<div>Parallel 2 1000 cycles in ~8 seconds (132 cycles/s)</div>
<div>Parallel 4 1000 cycles in ~8 seconds (121 cycles/s)</div>
<div>Parallel 10 1000 cycles in ~8 seconds (119 cycles/s)</div>
<div>Parallel 100 1000 cycles in ~13 seconds (74 cycles/s)</div>
<div>[85,132,121,119,74]</div>
<div>(jskit@siden)3> perftest:comprehensive(1000, F). </div>
<div>Sequential 100 cycles in ~1 seconds (83 cycles/s) </div>
<div>Sequential 200 cycles in ~2 seconds (83 cycles/s) </div>
<div>Sequential 1000 cycles in ~14 seconds (71 cycles/s) </div>
<div>Parallel 2 1000 cycles in ~11 seconds (95 cycles/s) </div>
<div>Parallel 4 1000 cycles in ~10 seconds (105 cycles/s) </div>
<div>Parallel 10 1000 cycles in ~11 seconds (91 cycles/s)</div>
<div>Parallel 100 1000 cycles in ~13 seconds (76 cycles/s)</div>
<div>"G_i[L"</div>
<div>(jskit@siden)4> perftest:comprehensive(1000, F).</div>
<div>Sequential 100 cycles in ~1 seconds (88 cycles/s)</div>
<div>Sequential 200 cycles in ~2 seconds (85 cycles/s)</div>
<div>Sequential 1000 cycles in ~13 seconds (74 cycles/s)</div>
<div>Parallel 2 1000 cycles in ~9 seconds (109 cycles/s)</div>
<div>Parallel 4 1000 cycles in ~10 seconds (101 cycles/s)</div>
<div>Parallel 10 1000 cycles in ~11 seconds (95 cycles/s)</div>
<div>erts_current_bin != (pb->bytes)</div>
<div>Parallel 100 1000 cycles in ~13 seconds (77 cycles/s)</div>
<div>"Jme_M"</div>
</div>
<br />
<blockquote type="cite">
<div text="#000000" bgcolor="#FFFFFF"><br />
<blockquote cite="mid:79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com" type="cite">
<div><br />
</div>
<div>--</div>
<div>Cheers,</div>
<div>Denis</div>
</blockquote>
Cheers,<br />
/Patrik<br />
<blockquote cite="mid:79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com" type="cite">
<div><br />
<div>
<div>On 20.11.2012, at 19:37, Musumeci, Antonio S wrote:</div>
<br class="Apple-interchange-newline" />
<blockquote type="cite">
<div text="#000000" bgcolor="#ffffff">
<div dir="ltr" align="left"><font face="Arial" color="#0000ff" size="2">
<p align="left"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2">I've got lots of cores... but they are all from optimized builds.</font></font></font></p>
<font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2">
<p dir="ltr" align="left">Has this been seen in other versions? We are keen to solve this because it's causing us pain in production. We hit another, older memory bug (the 32-bit values used in the 64-bit build)... and now this.</p>
</font></font></font>
<p dir="ltr" align="left"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2">I'm going to be building and trying R15B01 to see if we hit it as well. I'll send any additional information
I can.</font></font></font><font face="Times New Roman" color="#000000" size="3"> <span class="403263615-20112012"><font face="Arial" color="#0000ff" size="2">Any suggestions for debugging beam (compile options, etc.) would be appreciated.</font></span></font></p>
<p dir="ltr" align="left">Thanks.</p>
</font>
<p dir="ltr" align="left"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2"><font face="Arial" color="#0000ff" size="2">-antonio</font></font></font></font><br />
</p>
</div>
<div class="OutlookMessageHeader" lang="en-us" dir="ltr" align="left">
<hr tabindex="-1" />
<font face="Tahoma" size="2"><b>From:</b><span class="Apple-converted-space"> </span><a href="mailto:erlang-bugs-bounces@erlang.org" moz-do-not-send="true">erlang-bugs-bounces@erlang.org</a><span class="Apple-converted-space"> </span>[<a class="moz-txt-link-freetext" href="mailto:erlang-bugs-bounces@erlang.org" moz-do-not-send="true">mailto:erlang-bugs-bounces@erlang.org</a>]<span class="Apple-converted-space"> </span><b>On
Behalf Of<span class="Apple-converted-space"> </span></b>Patrik Nyblom<br />
<b>Sent:</b><span class="Apple-converted-space"> </span>Monday, November 19, 2012 8:55 AM<br />
<b>To:</b><span class="Apple-converted-space"> </span><a href="mailto:erlang-bugs@erlang.org" moz-do-not-send="true">erlang-bugs@erlang.org</a><br />
<b>Subject:</b><span class="Apple-converted-space"> </span>Re: [erlang-bugs] beam core'ing<br />
</font><br />
</div>
<div class="moz-cite-prefix">On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote:<br />
</div>
<blockquote cite="mid:51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com" type="cite">
<div><span lang="EN">
<p><span class="483463912-19112012"><font face="Arial" size="2">I'm just starting to debug this but figured I'd send it along in case anyone has seen this before.</font></span></p>
<p><span class="483463912-19112012"><font face="Arial" size="2">64-bit RHEL 5.0.1</font></span></p>
<p><span class="483463912-19112012"><font face="Arial" size="2">beam.smp R15B02, built from source</font></span></p>
<p><span class="483463912-19112012"><font face="Arial" size="2">It happens consistently when we try to start our app, and then just stops after a while. It happens across a few boxes. Oddly, we have an identical cluster (hardware and software) where it never happens.</font></span></p>
</span></div>
</blockquote>
<font size="2"><font face="Arial">Yes! I've seen it before and have tried for several months to get a reproducible example and a core I can analyze here. I've had one core that was somewhat readable, but I had no luck locating the beam code that triggered this. If you could try narrowing it down, I would be really grateful!<br />
<br />
Please email me any findings, theories, core dumps, anything! I really want to find this! The most interesting would be to find the snippet of Erlang code that makes this happen (probably intermittently).<br />
<br />
The problem is that when the allocators crash, the error is usually somewhere else: access of freed memory, a double free, or something else doing horrid things to memory. Obviously none of our test suites exercise this bug, as neither our debug builds nor our valgrind runs find it. It happens on both SMP and non-SMP builds and is always in the context of erts_bs_append, so I'm pretty sure this has a connection to the other users seeing the crash in the allocators...</font></font><br />
<br />
Cheers,<br />
Patrik<br />
<blockquote cite="mid:51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com" type="cite">
<div><span lang="EN">
<p>#0 bf_unlink_free_block (flags=<optimized out>, block=0x6f00, allctr=<optimized out>) at beam/erl_bestfit_alloc.c:789<br />
#1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, cand_size=<optimized out>, flags=0) at beam/erl_bestfit_alloc.c:869<br />
#2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=<optimized out>, blk_szp=<optimized out>, size=<optimized out>, allctr=<optimized out>) at beam/erl_alloc_util.c:1198<br />
#3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345<br />
#4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442<br />
#5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, extra=<optimized out>, size=287) at beam/erl_alloc_util.c:3520<br />
#6 0x0000000000511463 in erts_alloc (size=287, type=<optimized out>) at beam/erl_alloc.h:208<br />
#7 erts_bin_nrml_alloc (size=<optimized out>) at beam/erl_binary.h:260<br />
#8 erts_bs_append (c_p=0x69fba60, reg=<optimized out>, live=<optimized out>, build_size_term=<optimized out>, extra_words=0, unit=8)<span class="483463912-19112012"><span class="Apple-converted-space"> </span></span>at beam/erl_bits.c:1327<br />
#9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858<span class="Apple-converted-space"> </span><br />
#10 0x00000000004ae853 in sched_thread_func (vesdp=<optimized out>) at beam/erl_process.c:5184<span class="483463912-19112012"><span class="Apple-converted-space"> </span><br />
</span>#11 0x00000000005c17e9 in thr_wrapper (vtwd=<optimized out>) at pthread/ethread.c:106<span class="483463912-19112012"><span class="Apple-converted-space"> </span><br />
</span>#12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0<span class="483463912-19112012"><span class="Apple-converted-space"> </span><br />
</span>#13 0x00002b430f890f6d in clone () from /lib64/libc.so.6<span class="483463912-19112012"><span class="Apple-converted-space"> </span><br />
</span>#14 0x0000000000000000 in ?? ()</p>
</span></div>
<br />
<br />
<hr id="HR1" />
<br />
<br />
<fieldset class="mimeAttachmentHeader"></fieldset> <br />
<pre wrap="">_______________________________________________
erlang-bugs mailing list
<a class="moz-txt-link-abbreviated" href="mailto:erlang-bugs@erlang.org" moz-do-not-send="true">erlang-bugs@erlang.org</a>
<a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-bugs" moz-do-not-send="true">http://erlang.org/mailman/listinfo/erlang-bugs</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br />
</div>
</blockquote>
<br />
</div>
</blockquote>
</div>
<br />
</blockquote>
<br />
<P></P></P></BODY>
</HTML>