<div dir="ltr">Since OTP 20, the <span class="gmail-code" style=""><b>-emu_type</b> flag might also work, e.g.:</span><div><span class="gmail-code" style="">  erl -emu_type debug</span></div><div><span class="gmail-code" style=""><br></span></div><div><span class="gmail-code" style="">You can put it in the vm.args file too.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 29, 2018 at 2:45 PM, Lukas Larsson <span dir="ltr"><<a href="mailto:lukas@erlang.org" target="_blank">lukas@erlang.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I don't know how to make rebar3 run the debug emulator, but a quick-and-dirty trick I use when all else fails is to copy the beam.debug.smp file over the beam.smp file.<div><br></div><div>You probably also have to copy the erl_child_setup.debug file; that file should, however, keep its .debug suffix. So:</div><div><br></div><div>cp bin/`erts/autoconf/config.guess`/beam.debug.smp path/to/release/erts-v.s.n/bin/beam.smp</div><div>cp bin/`erts/autoconf/config.guess`/erl_child_setup.debug path/to/release/erts-v.s.n/bin/<div><div class="h5"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 29, 2018 at 1:30 PM, Igor Clark <span dir="ltr"><<a href="mailto:igor.clark@gmail.com" target="_blank">igor.clark@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    Thanks very much, Lukas; I think the debug emulator could be what I'm
    looking for. The NIF only sometimes crashes in lists:member/2 (those
    log lines are all from different crashes, with only one crashed
    thread each time); other times it just crashes in process_main. So I
    think I might need the debug emulator to trace further.<br>
    <br>
    However, I have a lot to learn about integrating C tooling with
    something this complex. When I run the debug emulator, does it just
    show more detailed information in stack traces, or will I need to
    attach gdb/lldb etc. to find out what's going on? Is there any more
    information on how to set all this up?<br>
    <br>
    Also, I'm not 100% sure how to run it: during development I run my
    app with "rebar3 shell" from a release layout, or do the same inside
    the NIF-specific app when trying to track down problems there. The
    doc you linked says:<br>
    <br>
    <blockquote type="cite">
      <p>To start the debug enabled runtime
        system execute:</p>
      <pre><code>$ $ERL_TOP/bin/cerl -debug</code></pre>
    </blockquote>
    <br>
    I realise these are more rebar3 questions than Erlang ones, but I
    can't find much about them in the rebar3 docs:<br>
    <br>
    - How should I specify that rebar3 should run "cerl" instead of
    "erl"?<br>
    <br>
    - Should I just add "-debug" to my "config/vm.args", or is there
    another way to do this?<br>
    <br>
    Thank you for your help!<span class="m_-8160118314680844088gmail-HOEnZb"><font color="#888888"><br>
    i</font></span><div><div class="m_-8160118314680844088gmail-h5"><br>
    <br>
    <div class="m_-8160118314680844088gmail-m_-599898042086351337moz-cite-prefix">On 29/05/2018 11:30, Lukas Larsson
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">Have you tried running your code in a debug
        emulator? <a href="https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system" target="_blank">https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system</a>
        <div><br>
        </div>
        <div>Since it seems to be segfaulting in lists:member/2, I would
          guess that your NIF somehow builds an invalid list that is
          later used by lists:member/2.</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Tue, May 29, 2018 at 11:04 AM, Igor
          Clark <span dir="ltr"><<a href="mailto:igor.clark@gmail.com" target="_blank">igor.clark@gmail.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Thanks,
            Sergej. That's where I got the thread reports I pasted
            below, e.g. from 'beam.smp_2018-05-28-212735_Igor-Clarks-iMac.crash'.<br>
            <br>
            Each log says the only crashed thread was a scheduler
            thread, for example "8_scheduler" running "process_main" in
            the case of the first one below. This is how I tracked down
            a bunch of errors in my own code, but according to the
            Console crash logs, the only crashes that still happen are
            in the scheduler.<br>
            <br>
            The thing is, it seems really unlikely that a VM running my
            NIF code would just happen to crash in the scheduler rather
            than in my code(!). So that's what I'm trying to work out:
            how to find out what's actually going on, given that the log
            tells me the crashed thread is running "process_main" or
            "lists_member_2".<br>
            <br>
            Any suggestions welcome!<br>
            <br>
            Cheers,<br>
            Igor
            <div class="m_-8160118314680844088gmail-m_-599898042086351337HOEnZb">
              <div class="m_-8160118314680844088gmail-m_-599898042086351337h5"><br>
                <br>
                On 29/05/2018 04:16, Sergej Jurečko wrote:<br>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  On macOS, there is a quick way to get a stack trace if
                  you compiled with debug symbols.<br>
                  Open /Applications/Utilities/Console<br>
                  Go to: User Reports<br>
                  <br>
                  You will see beam.smp in there if it crashed. Click on
                  it and you get a report of what every thread was calling
                  at the time of the crash.<br>
                  <br>
                  <br>
                  Regards,<br>
                  Sergej<br>
                  <br>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    On 28 May 2018, at 23:46, Igor Clark <<a href="mailto:igor.clark@gmail.com" target="_blank">igor.clark@gmail.com</a>>
                    wrote:<br>
                    <br>
                    Hi folks, hope all's well,<br>
                    <br>
                    I have a NIF which very occasionally segfaults,
                    intermittently and apparently unpredictably,
                    bringing down the VM. I've spent a bunch of time
                    tracing allocation and dereferencing problems in my
                    NIF code, and I've got rid of what seems like 99%+
                    of the problems, but it still occasionally happens.
                    I'm having trouble tracing further, because the
                    crash logs show the crashed threads doing things
                    like the following (each one taken from a separate
                    log in which it's the only crashed thread):<br>
                    <br>
                    <br>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      Thread 40 Crashed:: 8_scheduler<br>
                      0   beam.smp   0x000000001c19980b process_main + 1570<br>
                      <br>
                      Thread 5 Crashed:: 3_scheduler<br>
                      0   beam.smp   0x000000001c01d80b process_main + 1570<br>
                      <br>
                      Thread 7 Crashed:: 5_scheduler<br>
                      0   beam.smp   0x000000001baff0b8 lists_member_2 + 63<br>
                      <br>
                      Thread 3 Crashed:: 1_scheduler<br>
                      0   beam.smp   0x000000001d4b780b process_main + 1570<br>
                      <br>
                      Thread 5 Crashed:: 3_scheduler<br>
                      0   beam.smp   0x000000001fcf280b process_main + 1570<br>
                      <br>
                      Thread 6 Crashed:: 4_scheduler<br>
                      0   beam.smp   0x000000001ae290b8 lists_member_2 + 63<br>
                    </blockquote>
                    <br>
                    I'm very confident that the problems are in my code,
                    not in the scheduler ;-) But without more detail, I
                    don't know how to trace where they're happening.
                    When they do happen, other threads are sometimes
                    running my code (maybe 20% of the time), but mostly
                    not, and on the occasions when they are, I haven't
                    been able to see what the problem might be on the
                    referenced lines.<br>
                    <br>
                    It seems like it's some kind of cross-thread data
                    access issue, but I don't know how to track it down.<br>
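                    If the problem is cross-thread data access, one common way to test that theory is to route every read and write of the shared state through a single lock, and to have NIF functions copy data out under that lock instead of keeping pointers into it. The following is a hypothetical, standalone sketch of that pattern, not the code described in this thread: shared_state, on_message, snapshot and producer are made-up names, and plain pthreads stand in for erl_nif's enif_mutex_create/enif_mutex_lock/enif_mutex_unlock so that it compiles without the Erlang headers.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical state shared between the hardware callback thread and NIF
   calls running on scheduler threads. In a real NIF the lock would come
   from enif_mutex_create(); pthreads keep this sketch self-contained. */
typedef struct {
    pthread_mutex_t lock;
    unsigned long msg_count;      /* messages received so far */
    unsigned char last_msg[64];   /* copy of the most recent message */
    size_t last_len;
} shared_state;

static void shared_state_init(shared_state *s) {
    memset(s, 0, sizeof *s);
    pthread_mutex_init(&s->lock, NULL);
}

/* Callback-thread side: copy the message in while holding the lock. */
static void on_message(shared_state *s, const unsigned char *msg, size_t len) {
    if (len > sizeof s->last_msg)
        len = sizeof s->last_msg;  /* never write past the buffer */
    pthread_mutex_lock(&s->lock);
    memcpy(s->last_msg, msg, len);
    s->last_len = len;
    s->msg_count++;
    pthread_mutex_unlock(&s->lock);
}

/* NIF side: take a consistent copy under the same lock; terms would then be
   built from the copy after unlocking, never from s->last_msg directly. */
static size_t snapshot(shared_state *s, unsigned char *out, size_t cap,
                       unsigned long *count) {
    assert(out != NULL && count != NULL);
    pthread_mutex_lock(&s->lock);
    size_t n = s->last_len < cap ? s->last_len : cap;
    memcpy(out, s->last_msg, n);
    *count = s->msg_count;
    pthread_mutex_unlock(&s->lock);
    return n;
}

/* Simulated callback thread: delivers 1000 small messages. */
static void *producer(void *arg) {
    shared_state *s = arg;
    for (int i = 0; i < 1000; i++)
        on_message(s, (const unsigned char *)"ping", 4);
    return NULL;
}
```

Holding the lock only for the copy keeps the callback thread from blocking while a NIF builds terms.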
                    <br>
                    Some more context about what's going on: my NIF
                    load() function starts a thread, which passes a
                    callback function to a library that talks to some
                    hardware; the library calls the callback when it has
                    a message. It's a separate thread because the library
                    only calls back to the thread that initialized it:
                    when I ran it directly in NIF load(), it didn't call
                    back, but in the VM-managed thread it works as
                    expected. The thread sits and waits for things to
                    happen, and the callbacks arrive when they should.<br>
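                    A hypothetical sketch of the thread layout described above follows. hw_lib, hw_init, hw_poll and on_hw_message are invented stand-ins for the hardware library (which, as described, only calls back on the thread that initialized it), and pthreads stand in for enif_thread_create/enif_thread_join. It also adds a stop flag and a join, on the assumption that the thread should be shut down (for example from the NIF's unload callback) before the library is unloaded or reloaded. In a real NIF the callback would typically forward data with enif_send(NULL, &pid, msg_env, term), using an environment from enif_alloc_env(), rather than touching any scheduler's env.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Invented stand-in for the hardware library: callbacks are only delivered
   on the thread that called hw_init(), from inside hw_poll(). */
typedef void (*hw_callback)(int msg, void *ctx);
typedef struct { hw_callback cb; void *ctx; pthread_t owner; int next; } hw_lib;

static void hw_init(hw_lib *hw, hw_callback cb, void *ctx) {
    hw->cb = cb;
    hw->ctx = ctx;
    hw->owner = pthread_self();
    hw->next = 0;
}

static void hw_poll(hw_lib *hw) {
    assert(pthread_equal(hw->owner, pthread_self()));  /* same-thread rule */
    hw->cb(hw->next++, hw->ctx);                       /* one fake message */
}

typedef struct {
    atomic_bool stop;     /* set by whoever shuts the thread down */
    atomic_int received;  /* messages seen by the callback */
    pthread_t tid;
} worker;

static void on_hw_message(int msg, void *ctx) {
    worker *w = ctx;
    (void)msg;
    /* Real NIF (assumption): build the term in a process-independent env
       and enif_send() it. This sketch only counts messages. */
    atomic_fetch_add(&w->received, 1);
}

static void *worker_main(void *arg) {
    worker *w = arg;
    hw_lib hw;
    hw_init(&hw, on_hw_message, w);  /* initialize on the polling thread */
    while (!atomic_load(&w->stop)) {
        hw_poll(&hw);
        nanosleep(&(struct timespec){.tv_sec = 0, .tv_nsec = 1000000L}, NULL);
    }
    return NULL;
}

static int worker_start(worker *w) {
    atomic_init(&w->stop, false);
    atomic_init(&w->received, 0);
    return pthread_create(&w->tid, NULL, worker_main, w);
}

/* Stop the loop and wait for the thread before its code or state goes away. */
static void worker_stop(worker *w) {
    atomic_store(&w->stop, true);
    pthread_join(w->tid, NULL);
}
```

The point of the join is that nothing the thread uses (its code, the library handle, the state struct) is released while the thread can still run.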
                    <br>
                    I use enif_thread_create/enif_thread_opts_create
                    to start the thread, and use enif_alloc/enif_free
                    everywhere. I keep a static pointer in the NIF to a
                    couple of members of the state struct, as that seems
                    to be the only way to reference them in the callback
                    function. The struct is kept in NIF private data: I
                    pass **priv from load() to the thread_main function,
                    allocate the state struct using enif_alloc in
                    thread_main, and set priv to point to the state
                    struct, also in the thread. Other NIF functions do
                    access members of the state struct, but only ever
                    through enif_priv_data( env ).<br>
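                    One thing that may be worth checking in this setup is initialization order. As described, load() can return before thread_main has allocated the state and written it through priv; the erl_nif documentation describes load() itself setting *priv_data, so setting it later from another thread is at best a race with scheduler threads calling enif_priv_data(). A hypothetical alternative, sketched below with pthreads and calloc standing in for enif_thread_create, enif_mutex_*, enif_cond_* and enif_alloc, is to allocate the state in load(), pass it to the thread, and have load() wait on a condition variable until the thread reports that the library is initialized. nif_state, load_sketch and unload_sketch are invented names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical NIF private data, allocated by the loader, not the thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t ready_cv;
    bool ready;        /* set by the thread once library init has finished */
    int init_status;   /* 0 = ok; any other value is a hypothetical error */
    pthread_t tid;
} nif_state;

static void *thread_main(void *arg) {
    nif_state *st = arg;  /* already allocated and owned by the loader */
    int status = 0;       /* ...initialize the hardware library here... */
    pthread_mutex_lock(&st->lock);
    st->init_status = status;
    st->ready = true;
    pthread_cond_signal(&st->ready_cv);
    pthread_mutex_unlock(&st->lock);
    /* ...the real thread would now wait for callbacks until told to stop... */
    return NULL;
}

static void destroy_state(nif_state *st) {
    pthread_cond_destroy(&st->ready_cv);
    pthread_mutex_destroy(&st->lock);
    free(st);
}

/* Modelled on load(): *priv is set on the loading thread, before returning,
   and only after the worker thread has finished initializing. */
static int load_sketch(void **priv) {
    nif_state *st = calloc(1, sizeof *st);
    if (st == NULL)
        return 1;
    pthread_mutex_init(&st->lock, NULL);
    pthread_cond_init(&st->ready_cv, NULL);
    if (pthread_create(&st->tid, NULL, thread_main, st) != 0) {
        destroy_state(st);
        return 1;
    }
    pthread_mutex_lock(&st->lock);
    while (!st->ready)  /* loop guards against spurious wakeups */
        pthread_cond_wait(&st->ready_cv, &st->lock);
    int status = st->init_status;
    pthread_mutex_unlock(&st->lock);
    if (status != 0) {
        pthread_join(st->tid, NULL);
        destroy_state(st);
        return status;
    }
    *priv = st;
    return 0;
}

/* Modelled on unload(): join the thread first, then free what it used. */
static void unload_sketch(void *priv) {
    nif_state *st = priv;
    assert(st != NULL);
    pthread_join(st->tid, NULL);
    destroy_state(st);
}
```

With this ordering, any NIF that calls enif_priv_data() after a successful load sees a fully initialized struct.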
                    <br>
                    The vast majority of the time it all works
                    perfectly, humming along very nicely, but every now
                    and then, without any pattern I can see, it just
                    segfaults and the VM comes down. It's only happened
                    3 times in the last 20+ hours of working on the app,
                    testing and running it all the while, with VM
                    starts, stops, code reloads, etc. But when it
                    happens, it's kind of a showstopper, and I'd really
                    like to nail it down.<br>
                    <br>
                    This is all happening in Erlang 20.3.4 on macOS
                    10.12.6 / Apple LLVM version 9.0.0 (clang-900.0.38).<br>
                    <br>
                    Any ideas on how/where to look next to track this
                    down? I hope it's not something structural in the
                    above that just won't work.<br>
                    <br>
                    Cheers,<br>
                    Igor<br>
                    <br>
                    <br>
                    ______________________________<wbr>_________________<br>
                    erlang-questions mailing list<br>
                    <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>
                    <a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" target="_blank">http://erlang.org/mailman/list<wbr>info/erlang-questions</a><br>
                  </blockquote>
                </blockquote>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

<br></blockquote></div><br></div></div></div></div></div>
<br></blockquote></div><br></div>