[erlang-bugs] beam core'ing

Patrik Nyblom pan@REDACTED
Wed Nov 21 10:44:26 CET 2012


Hi!
On 11/20/2012 10:40 PM, Denis Titoruk wrote:
> Hi,
>
> We've got the same error on R15B01, R15B02
> I've finished my investigation of this issue today & here is result:
>
> Let's assume we have the code:
> encode_formats(Columns) ->
>    encode_formats(Columns, 0, <<>>).
>
> encode_formats([], Count, Acc) ->
>    <<Count:?int16, Acc/binary>>;
>
> encode_formats([#column{format = Format} | T], Count, Acc) ->
>    encode_formats(T, Count + 1, <<Acc/binary, Format:?int16>>).
>
> So, <<Acc/binary, Format:?int16>> translates to
>
>  {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}.
>    {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}.
>
> There is GC execution in bs_append and it can reallocate binary but 
> there isn't reassigning erts_current_bin which used in bs_put_integer.
>
> Fix:
>
> erl_bits.c:
> Eterm
> erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term,
>        Uint extra_words, Uint unit)
>>    if (c_p->stop - c_p->htop < heap_need) {
>        (void) erts_garbage_collect(c_p, heap_need, reg, live+1);
>    }
>    sb = (ErlSubBin *) c_p->htop;
>    c_p->htop += ERL_SUB_BIN_SIZE;
>    sb->thing_word = HEADER_SUB_BIN;
>    sb->size = BYTE_OFFSET(used_size_in_bits);
>    sb->bitsize = BIT_OFFSET(used_size_in_bits);
>    sb->offs = 0;
>    sb->bitoffs = 0;
>    sb->is_writable = 1;
>    sb->orig = reg[live];
>
> ///////////////////////////////////////////////////////////////////
> // add this lines
> ///////////////////////////////////////////////////////////////////
>    pb = (ProcBin *) boxed_val(sb->orig);
>    erts_current_bin = pb->bytes;
>    erts_writable_bin = 1;
> ///////////////////////////////////////////////////////////////////
>
>    return make_binary(sb);
>>
Can you reproduce the bug and verify that this fix really works? The 
thing is that binaries should *only* be reallocated in the gc if there 
are no active writers, which there obviously is here ( pb->flags |= 
PB_ACTIVE_WRITER  a few lines earlier), so the bug would be in the 
detection of active writers in the gc if this code change actually 
removes the crash.

>
> --
> Cheers,
> Denis
Cheers,
/Patrik
>
> 20.11.2012, в 19:37, Musumeci, Antonio S написал(а):
>
>>
>> I've got lots of cores... but they are all from optimized builds.
>>
>> Has this been seen in other versions? We are keen to solve this 
>> because it's causing us pain in production. We hit another, older, 
>> memory bug (the 32bit values used in 64bit build)... and now this.
>>
>> I'm going to be building and trying R15B01 to see if we hit it as 
>> well. I'll send any additional information I can.Any suggestions on 
>> debugging beam would be appreciated. Compile options, etc.
>>
>> Thanks.
>>
>> -antonio
>>
>> ------------------------------------------------------------------------
>> *From:*erlang-bugs-bounces@REDACTED 
>> <mailto:erlang-bugs-bounces@REDACTED>[mailto:erlang-bugs-bounces@REDACTED]*On 
>> Behalf Of*Patrik Nyblom
>> *Sent:*Monday, November 19, 2012 8:55 AM
>> *To:*erlang-bugs@REDACTED <mailto:erlang-bugs@REDACTED>
>> *Subject:*Re: [erlang-bugs] beam core'ing
>>
>> On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote:
>>>
>>> I'm just starting to debug this but figured I'd send it along in 
>>> case anyone has seen this before.
>>>
>>> 64bit RHEL 5.0.1
>>>
>>> built from source beam.smp R15B02
>>>
>>> Happens consistently when trying to start our app and then just 
>>> stops after a time. Across a few boxes. Oddly we have an identical 
>>> cluster (hw and sw) and it never happens.
>>>
>> Yes! I've seen it before and have tried for several months to get 
>> areproducable example and acore i can analyze here. I've had one core 
>> that wassomewhat readable but had no luck in locating the beam code 
>> that triggered this. If you could try narrowing it down, I would be 
>> really grateful!
>>
>> Please email me any findings, theories, cores dumps- anything! I 
>> really want to find this! The most interesting would be to find the 
>> snippet of erlang code that makes this happen (intermittently probably).
>>
>> The problem isthatwhen the allocators crash, the error is usually 
>> somewhere else.Access of freed memory, double free or something else 
>> doing horrid things to memory. Obviously none of our testsuites 
>> exercise this bug asneither our debug builds, nor our valgrind runs 
>> find it. It happens on both SMP and non SMP and is always in the 
>> context of the erts_bs_append, so I'm pretty sure this has a 
>> connection to the other users seeing the crash in the allocators...
>>
>> Cheers,
>> Patrik
>>>
>>> #0 bf_unlink_free_block (flags=<optimized out>, block=0x6f00, 
>>> allctr=<optimized out>) at beam/erl_bestfit_alloc.c:789
>>> #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, 
>>> cand_size=<optimized out>, flags=0) at beam/erl_bestfit_alloc.c:869
>>> #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=<optimized 
>>> out>, blk_szp=<optimized out>, size=<optimized out>, 
>>> allctr=<optimized out>) at beam/erl_alloc_util.c:1198
>>> #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345
>>> #4 0x000000000045398d in do_erts_alcu_alloc (type=164, 
>>> extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442
>>> #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, 
>>> extra=<optimized out>, size=287) at beam/erl_alloc_util.c:3520
>>> #6 0x0000000000511463 in erts_alloc (size=287, type=<optimized out>) 
>>> at beam/erl_alloc.h:208
>>> #7 erts_bin_nrml_alloc (size=<optimized out>) at beam/erl_binary.h:260
>>> #8 erts_bs_append (c_p=0x69fba60, reg=<optimized out>, 
>>> live=<optimized out>, build_size_term=<optimized out>, 
>>> extra_words=0, unit=8)at beam/erl_bits.c:1327
>>> #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858
>>> #10 0x00000000004ae853 in sched_thread_func (vesdp=<optimized out>) 
>>> at beam/erl_process.c:5184
>>> #11 0x00000000005c17e9 in thr_wrapper (vtwd=<optimized out>) at 
>>> pthread/ethread.c:106
>>> #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0
>>> #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6
>>> #14 0x0000000000000000 in ?? ()
>>>
>>>
>>> _______________________________________________
>>> erlang-bugs mailing list
>>> erlang-bugs@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-bugs
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED <mailto:erlang-bugs@REDACTED>
>> http://erlang.org/mailman/listinfo/erlang-bugs
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20121121/f2cb189a/attachment.htm>


More information about the erlang-bugs mailing list