I knew it! =)<div><br><div>Ever since I first saw that GC in bs_append, I felt it was trouble.</div><div><br></div><div>I will get someone, probably me, to look over this fix tomorrow.</div><div><br></div><div>// Björn-Egil<br>
<br><div class="gmail_quote">2012/11/20 Denis Titoruk <span dir="ltr"><<a href="mailto:sidentdv@gmail.com" target="_blank">sidentdv@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>Hi,</div><div><br></div>We've got the same error on R15B01 and R15B02.<div>I finished my investigation of this issue today, and here is the result:</div><div><br></div><div>Let's assume we have the following code:</div>
<div>encode_formats(Columns) -><br> encode_formats(Columns, 0, <<>>).<br><br>encode_formats([], Count, Acc) -><br> <<Count:?int16, Acc/binary>>;<br><br>encode_formats([#column{format = Format} | T], Count, Acc) -><br>
encode_formats(T, Count + 1, <<Acc/binary, Format:?int16>>).<br></div><div><br></div><div>So, <<Acc/binary, Format:?int16>> translates to</div><div><br></div><div> {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}.<br>
{bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}.<br></div><div><br></div><div>bs_append can trigger a GC, and the GC can reallocate the binary, but erts_current_bin, which bs_put_integer then writes through, is not reassigned afterwards.</div>
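<div><br></div><div>The failure mode described above can be sketched in plain C. This is only an illustration under stated assumptions, not the ERTS code: Buf, buf_reserve, buf_append_begin, put_int16_be and this C encode_formats are hypothetical names, and the buffer is deliberately moved on every growth to model the worst case of a GC relocating the binary. The point it shows is that the append step must refresh the cached write pointer before the put step uses it.</div>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable binary; not the real ERTS types. */
typedef struct {
    unsigned char *bytes;   /* backing storage, may move when it grows */
    size_t size;            /* bytes in use */
    size_t capacity;        /* bytes allocated */
} Buf;

/* Cached write position, playing the role of erts_current_bin. */
static unsigned char *current_bin;
static size_t current_offs;

/* Make room for `extra` more bytes. The storage is always moved to a new
 * block, which models a GC or reallocation relocating the data. */
static int buf_reserve(Buf *b, size_t extra)
{
    size_t need = b->size + extra;
    unsigned char *moved = malloc(need ? need : 1);
    if (moved == NULL)
        return -1;
    if (b->size > 0)
        memcpy(moved, b->bytes, b->size);
    free(b->bytes);
    b->bytes = moved;
    b->capacity = need;
    return 0;
}

/* Append step: reserve room, then refresh the cached pointer.
 * Skipping the refresh leaves current_bin pointing at freed memory. */
static int buf_append_begin(Buf *b, size_t extra)
{
    if (buf_reserve(b, extra) != 0)
        return -1;
    current_bin = b->bytes;     /* the kind of reassignment the patch adds */
    current_offs = b->size;
    b->size += extra;
    return 0;
}

/* Put step: writes a big-endian 16-bit value through the cached pointer. */
static void put_int16_be(int value)
{
    unsigned v = (unsigned) value & 0xFFFFu;
    assert(current_bin != NULL);
    current_bin[current_offs++] = (unsigned char) (v >> 8);
    current_bin[current_offs++] = (unsigned char) (v & 0xFFu);
}

/* Append each format code as a 16-bit value, one append/put pair per item. */
static int encode_formats(Buf *b, const int *formats, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf_append_begin(b, 2) != 0)
            return -1;
        put_int16_be(formats[i]);
    }
    return 0;
}
```

<div>Starting from an empty Buf (all fields zero) and calling encode_formats with a small array of format codes exercises one relocation per element. If the assignment to current_bin in buf_append_begin is removed, every write after the first relocation goes through a dangling pointer, which is the kind of heap corruption that tends to surface later inside the allocator rather than at the faulty write.</div>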
<div><br></div><div>Fix:</div><div><br></div><div>erl_bits.c:<br>Eterm<br>erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term,<br> Uint extra_words, Uint unit)<br>…<br> if (c_p->stop - c_p->htop < heap_need) {<br>
(void) erts_garbage_collect(c_p, heap_need, reg, live+1);<br> }<br> sb = (ErlSubBin *) c_p->htop;<br> c_p->htop += ERL_SUB_BIN_SIZE;<br> sb->thing_word = HEADER_SUB_BIN;<br> sb->size = BYTE_OFFSET(used_size_in_bits);<br>
sb->bitsize = BIT_OFFSET(used_size_in_bits);<br> sb->offs = 0;<br> sb->bitoffs = 0;<br> sb->is_writable = 1;<br> sb->orig = reg[live];<br><br>///////////////////////////////////////////////////////////////////</div>
<div>// add these lines</div><div><div>///////////////////////////////////////////////////////////////////</div></div><div> pb = (ProcBin *) boxed_val(sb->orig);</div><div> erts_current_bin = pb->bytes;<br> erts_writable_bin = 1;<br>
///////////////////////////////////////////////////////////////////<br><br> return make_binary(sb);<br>…<br></div><div><br></div><div><br></div><div>--</div><div>Cheers,</div><div>Denis</div><div><br><div><div>On 20.11.2012, at 19:37, Musumeci, Antonio S wrote:</div>
<div><div class="h5"><br><blockquote type="cite"><span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium"><div text="#000000" bgcolor="#ffffff">
<div><br></div><div dir="ltr" align="left"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"></font></font></font><p align="left"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff">I've got lots of cores... but they are all from optimized builds.</font></font></font><font face="Times New Roman" color="#000000" size="3"></font></p>
<font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><p dir="ltr" align="left">Has this been seen in other versions? We are keen to solve this because it's causing us pain in production. We hit another, older, memory bug (the 32bit values used in 64bit build)... and now this.</p>
</font></font></font><p dir="ltr" align="left"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff">I'm going to be building and trying R15B01 to see if we hit it as well. I'll send any additional information I can.</font></font></font><font face="Times New Roman" color="#000000" size="3"> <span><font face="Arial" color="#0000ff">Any suggestions on debugging beam would be appreciated. Compile options, etc.</font></span></font></p>
<p dir="ltr" align="left">Thanks.<font face="Times New Roman" color="#000000" size="3"></font></p><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"></font></font></font></font><p dir="ltr" align="left">
<font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff"><font face="Arial" color="#0000ff">-antonio</font></font></font></font><br></p></div><div lang="en-us" dir="ltr" align="left">
<hr><font face="Tahoma"><b>From:</b><span> </span><a href="mailto:erlang-bugs-bounces@erlang.org" target="_blank">erlang-bugs-bounces@erlang.org</a><span> </span>[mailto:<a href="mailto:erlang-bugs-bounces@erlang.org" target="_blank">erlang-bugs-bounces@erlang.org</a>]<span> </span><b>On Behalf Of<span> </span></b>Patrik Nyblom<br>
<b>Sent:</b><span> </span>Monday, November 19, 2012 8:55 AM<br><b>To:</b><span> </span><a href="mailto:erlang-bugs@erlang.org" target="_blank">erlang-bugs@erlang.org</a><br><b>Subject:</b><span> </span>Re: [erlang-bugs] beam core'ing<br>
</font><br></div><div></div><div>On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote:<br></div><blockquote type="cite"><div><br></div><div><span lang="EN"><p><span><font face="Arial">I'm just starting to debug this but figured I'd send it along in case anyone has seen this before.</font></span></p>
<p><span><font face="Arial">64bit RHEL 5.0.1</font></span></p><p><span><font face="Arial">built from source beam.smp R15B02</font></span></p><p><span><font face="Arial">Happens consistently when trying to start our app and then just stops after a time. Across a few boxes. Oddly we have an identical cluster (hw and sw) and it never happens.</font></span></p>
</span></div></blockquote><font><font face="Arial">Yes! I've seen it before and have tried for several months to get a<font><span> </span>reproducible example and a<font><span> </span></font>core I can analyze here. I've had one core that was<font><span> </span>somewhat readable but had no luck in locating the beam code that triggered this. If you could try narrowing it down, I would be really grateful!<br>
<br><font>Please email me any findings, theories, core dumps<font><span> </span>- anything! I really want to find this! The most interesting would be to find the snippet of Erlang code that makes this happen (probably intermittently).<br>
<br><font>The problem is<span> </span><font>that<span> </span><font>when the allocators crash, the error is usually somewhere else<font>.</font><span> </span><font>A</font>ccess of freed memory, double free or something else doing horrid things to memory. Ob<font>viously none of our testsui<font>tes e<font>xercise this bug as<span> </span><font>neither our debug builds, nor our valgrind runs find it. It happens on both SMP and non SMP and is always in the context of the er<font>ts</font>_bs_append</font></font></font></font></font></font></font></font></font></font></font></font></font>, so I'm pretty sure this has a connection to the other users seeing the crash in the allocat<font>ors<font>...</font></font><span> </span><br>
<br>Cheers,<br>Patrik<br><blockquote type="cite"><div><span lang="EN"><p>#0 bf_unlink_free_block (flags=<optimized out>, block=0x6f00, allctr=<optimized out>) at beam/erl_bestfit_alloc.c:789<br>#1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, cand_size=<optimized out>, flags=0) at beam/erl_bestfit_alloc.c:869<br>
#2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=<optimized out>, blk_szp=<optimized out>, size=<optimized out>, allctr=<optimized out>) at beam/erl_alloc_util.c:1198<br>#3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345<br>
#4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442<br>#5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, extra=<optimized out>, size=287) at beam/erl_alloc_util.c:3520<br>
#6 0x0000000000511463 in erts_alloc (size=287, type=<optimized out>) at beam/erl_alloc.h:208<br>#7 erts_bin_nrml_alloc (size=<optimized out>) at beam/erl_binary.h:260<br>#8 erts_bs_append (c_p=0x69fba60, reg=<optimized out>, live=<optimized out>, build_size_term=<optimized out>, extra_words=0, unit=8)<span><span> </span></span>at beam/erl_bits.c:1327<br>
#9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858<span> </span><br>#10 0x00000000004ae853 in sched_thread_func (vesdp=<optimized out>) at beam/erl_process.c:5184<span><span> </span><br></span>#11 0x00000000005c17e9 in thr_wrapper (vtwd=<optimized out>) at pthread/ethread.c:106<span><span> </span><br>
</span>#12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0<span><span> </span><br></span>#13 0x00002b430f890f6d in clone () from /lib64/libc.so.6<span><span> </span><br></span>#14 0x0000000000000000 in ?? ()</p>
</span></div><br><br><hr><br>
<br><fieldset></fieldset><br><pre>_______________________________________________
erlang-bugs mailing list
<a href="mailto:erlang-bugs@erlang.org" target="_blank">erlang-bugs@erlang.org</a>
<a href="http://erlang.org/mailman/listinfo/erlang-bugs" target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a>
</pre></blockquote><br><br><br><hr><br>
<div><br></div><div><br></div><div><br></div></div>
</span></blockquote></div></div></div><br></div></div><br>
<br></blockquote></div><br></div></div>