<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Thanks very much, Lukas. I think the debug emulator could be what I'm
    looking for. The crash only sometimes shows up in lists:member/2;
    those log lines are all from different crashes (there's only one
    crashed thread each time), and sometimes it just crashes in
    process_main. So I think I might need the debug emulator to trace
    further.<br>

    <br>

    However, I have a lot to learn about integrating C tooling with
    something this complex. When I run the debug emulator, does it just
    show more detailed information in stack traces, or will I need to
    attach gdb/lldb etc. to find out what's going on? Is there any more
    information on how to set this all up?<br>

    <br>

    Also, I'm not 100% sure how to run it, as during development I run my
    app with "rebar3 shell" from a release layout, or do the same inside
    the NIF-specific app when trying to track problems down there. The
    doc you linked says:<br>

    <br>

    <blockquote type="cite">

      <p>To start the debug enabled runtime system execute:</p>
      <pre><code>$ $ERL_TOP/bin/cerl -debug</code></pre>

    </blockquote>

    <br>

    I realise these are more rebar3 than erlang questions, but I can't

    find much in the rebar3 docs about them:<br>

    <br>

    - How should I tell rebar3 to run "cerl" instead of "erl"?<br>

    <br>

    - Should I just add "-debug" in my "config/vm.args" or is there

    another way to do this?<br>

    <br>

    Thank you for your help!<br>

    i<br>

    <br>

    <div class="moz-cite-prefix">On 29/05/2018 11:30, Lukas Larsson

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAP3zBqO5fDqe6iic75BM4TB=ipgJ1NbEidafh_snkOSxURpwww@mail.gmail.com">

      <div dir="ltr">Have you tried to run your code in a debug

        emulator? <a

href="https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system"

          moz-do-not-send="true">https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system</a>

        <div><br>

        </div>

        <div>Since it seems to be segfaulting in lists:member/2, I would

          guess that your nif somehow builds an invalid list that later

          is used by lists:member/2.</div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Tue, May 29, 2018 at 11:04 AM, Igor
          Clark <span dir="ltr">&lt;<a
              href="mailto:igor.clark@gmail.com" target="_blank"
              moz-do-not-send="true">igor.clark@gmail.com</a>&gt;</span>
          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks,
            Sergej - that's where I got the thread reports I pasted
            below, from e.g. 'beam.smp_2018-05-28-212735_Igor-Clarks-iMac.crash'.<br>

            <br>

            Each log says the only crashed thread was a scheduler

            thread, for example "8_scheduler" running "process_main" in

            the case of the first one below. This is how I tracked down

            a bunch of errors in my own code, but the only ones that

            still happen are in the scheduler, according to the Console

            crash logs.<br>

            <br>

            The thing is, it seems really unlikely that a VM running my
            NIF code would just happen to be crashing in the scheduler
            rather than in my code(!), so that's what I'm trying to work
            out: how to find out what's actually going on, given that
            the log tells me the crashed thread is running
            'process_main' or 'lists_member_2'.<br>

            <br>

            Any suggestions welcome!<br>

            <br>

            Cheers,<br>

            Igor

            <div class="HOEnZb">

              <div class="h5"><br>

                <br>

                On 29/05/2018 04:16, Sergej Jurečko wrote:<br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  On macOS there is a quick way to get a stack trace if
                  you compiled with debug symbols.<br>
                  Open /Applications/Utilities/Console<br>
                  Go to: User Reports<br>
                  <br>
                  You will see beam.smp in there if it crashed. Click on
                  it and you get a report of what every thread was
                  calling at the time of the crash.<br>

                  <br>

                  <br>

                  Regards,<br>

                  Sergej<br>

                  <br>

                  <blockquote class="gmail_quote" style="margin:0 0 0

                    .8ex;border-left:1px #ccc solid;padding-left:1ex">

                    On 28 May 2018, at 23:46, Igor Clark &lt;<a
                      href="mailto:igor.clark@gmail.com" target="_blank"
                      moz-do-not-send="true">igor.clark@gmail.com</a>&gt;
                    wrote:<br>

                    <br>

                    Hi folks, hope all well,<br>

                    <br>

                    I have a NIF which very occasionally segfaults,

                    intermittently and apparently unpredictably,

                    bringing down the VM. I've spent a bunch of time

                    tracing allocation and dereferencing problems in my

                    NIF code, and I've got rid of what seems like 99%+

                    of the problems - but it still occasionally happens,

                    and I'm having trouble tracing further, because the

                    crash logs show the crashed threads doing things
                    like the following (each one taken from a separate
                    log in which it's the only crashed thread):<br>

                    <br>

                    <br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      Thread 40 Crashed:: 8_scheduler<br>

                      0   beam.smp                         

                      0x000000001c19980b process_main + 1570<br>

                      <br>

                      Thread 5 Crashed:: 3_scheduler<br>

                      0   beam.smp                         

                      0x000000001c01d80b process_main + 1570<br>

                      <br>

                      Thread 7 Crashed:: 5_scheduler<br>

                      0   beam.smp                         

                      0x000000001baff0b8 lists_member_2 + 63<br>

                      <br>

                      Thread 3 Crashed:: 1_scheduler<br>

                      0   beam.smp                         

                      0x000000001d4b780b process_main + 1570<br>

                      <br>

                      Thread 5 Crashed:: 3_scheduler<br>

                      0   beam.smp                         

                      0x000000001fcf280b process_main + 1570<br>

                      <br>

                      Thread 6 Crashed:: 4_scheduler<br>

                      0   beam.smp                         

                      0x000000001ae290b8 lists_member_2 + 63<br>

                    </blockquote>

                    <br>

                    I'm very confident that the problems are in my code,

                    not in the scheduler ;-) But without more detail, I

                    don't know how to trace where they're happening.

                    When they do, there are sometimes other threads

                    doing things in my code (maybe 20% of the time) -

                    but mostly not, and on the occasions when they are,

                    I've been unable to see what the problem might be on

                    the lines referenced.<br>

                    <br>

                    It seems like it's some kind of cross-thread data

                    access issue, but I don't know how to track it down.<br>

                    <br>

                    Some more context about what's going on: my NIF
                    load() function starts a thread which passes a
                    callback function to a library that talks to some
                    hardware; the library calls the callback when it has
                    a message. It's a separate thread because the library
                    only calls back to the thread that initialized it;
                    when I ran it directly in NIF load(), it didn't call
                    back, but in the VM-managed thread it works as
                    expected. The thread sits and waits for stuff to
                    happen, and callbacks come when they should.<br>
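                    <br>
                    A minimal model of the arrangement described above, to make the threading constraint concrete. It is a sketch under assumptions, not the actual NIF: plain POSIX threads stand in for enif_thread_create/enif_thread_join, and hwlib_init, hwlib_poll and the callback signature are hypothetical stand-ins for the hardware library, which isn't named in this thread. Because the library is initialised on the dedicated thread, every callback runs on that thread, never on a scheduler thread.<br>

```c
/* Model of a library that only calls back on the thread that initialised it.
 * hwlib_* are HYPOTHETICAL stand-ins for the real hardware library; POSIX
 * threads stand in for enif_thread_create/enif_thread_join. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*hwlib_callback)(int message, void *user_data);

/* Hypothetical library globals: the owning thread and the registered callback. */
static pthread_t hwlib_owner;
static hwlib_callback hwlib_cb;
static void *hwlib_user;

static void hwlib_init(hwlib_callback cb, void *user_data) {
    hwlib_owner = pthread_self();   /* the library binds itself to the caller */
    hwlib_cb = cb;
    hwlib_user = user_data;
}

/* Hypothetical event pump: delivers a message only when called on the owning
 * thread, mimicking "only calls back to the thread that initialized it". */
static bool hwlib_poll(int message) {
    if (hwlib_cb == NULL || !pthread_equal(pthread_self(), hwlib_owner))
        return false;
    hwlib_cb(message, hwlib_user);
    return true;
}

typedef struct {
    int received;                /* number of callbacks delivered */
    pthread_t callback_thread;   /* thread the last callback ran on */
} listener_state;

static void on_message(int message, void *user_data) {
    listener_state *st = user_data;
    (void)message;
    /* Invariant of this design: callbacks arrive on the library's owner thread. */
    assert(pthread_equal(pthread_self(), hwlib_owner));
    st->received++;
    st->callback_thread = pthread_self();
}

/* Body of the dedicated thread: initialise the library here, then pump events. */
static void *listener_main(void *arg) {
    listener_state *st = arg;
    hwlib_init(on_message, st);
    for (int i = 0; i < 3; i++)
        hwlib_poll(i);   /* a real loop would block waiting for the hardware */
    return NULL;
}

/* Starts the dedicated thread, waits for it, returns the callback count (-1 on error). */
static int run_listener(listener_state *st, pthread_t *tid) {
    st->received = 0;
    if (pthread_create(tid, NULL, listener_main, st) != 0)
        return -1;
    pthread_join(*tid, NULL);
    return st->received;
}
```

                    Since callbacks run on a thread the VM doesn't schedule, any data they share with NIF functions needs synchronisation. If a callback also forwards messages to Erlang, the erl_nif documentation says enif_send must be called with a NULL caller env from a created thread, with the message built in an env obtained from enif_alloc_env; terms are only valid inside the env that created them, so a term kept and reused outside its env is one plausible way to hand the VM an invalid list.<br>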

                    <br>

                    I use enif_thread_create/enif_thread_opts_create

                    to start the thread, and use enif_alloc/enif_free

                    everywhere. I keep a static pointer in the NIF to a

                    couple of members of the state struct, as that seems

                    the only way to reference them in the callback

                    function. The struct is kept in NIF private data: I

                    pass **priv from load() to the thread_main function,

                    allocate the state struct using enif_alloc in

                    thread_main, and set priv pointing to the state

                    struct, also in the thread. Other NIF functions do

                    access members of the state struct, but only ever

                    through enif_priv_data( env ).<br>
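                    <br>
                    One way to close the window in which a scheduler thread can see a half-built state struct is to build it completely in load(), publish it through *priv_data there, set the static pointer, and only then start the thread, guarding every field that both the callback thread and NIF functions touch with a mutex. This is a sketch of that pattern, not a diagnosis of these crashes, and it rests on assumptions: malloc/free stand in for enif_alloc/enif_free, pthread_mutex_* for enif_mutex_*, pthread_create for enif_thread_create, and nif_load_model, on_message and read_count are hypothetical names.<br>

```c
/* Model of building NIF state before the callback thread exists.
 * Assumptions: malloc/free stand in for enif_alloc/enif_free, pthread_mutex_*
 * for enif_mutex_*, pthread_create for enif_thread_create; nif_load_model,
 * on_message and read_count are HYPOTHETICAL names. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t lock;   /* guards the fields below */
    int last_message;
    long message_count;
} nif_state;

/* Static pointer the callback uses; assigned once, before the thread starts. */
static nif_state *g_state = NULL;

/* Runs on the callback thread: shared fields are only touched under the lock. */
static void on_message(int message) {
    assert(g_state != NULL);   /* published before the thread was created */
    pthread_mutex_lock(&g_state->lock);
    g_state->last_message = message;
    g_state->message_count++;
    pthread_mutex_unlock(&g_state->lock);
}

static void *listener_main(void *arg) {
    (void)arg;
    for (int i = 1; i <= 1000; i++)
        on_message(i);   /* stands in for the library invoking the callback */
    return NULL;
}

/* Plays the role of load(): fully initialise the state, publish it through
 * *priv_data, and only then start the thread. Returns 0 on success. */
static int nif_load_model(void **priv_data, pthread_t *tid) {
    nif_state *st = malloc(sizeof *st);
    if (st == NULL)
        return 1;
    st->last_message = 0;
    st->message_count = 0;
    if (pthread_mutex_init(&st->lock, NULL) != 0) {
        free(st);
        return 1;
    }
    g_state = st;
    *priv_data = st;
    if (pthread_create(tid, NULL, listener_main, NULL) != 0) {
        g_state = NULL;
        *priv_data = NULL;
        pthread_mutex_destroy(&st->lock);
        free(st);
        return 1;
    }
    return 0;
}

/* Plays the role of a NIF function that gets the state via enif_priv_data(env). */
static long read_count(void *priv_data) {
    nif_state *st = priv_data;
    pthread_mutex_lock(&st->lock);
    long n = st->message_count;
    pthread_mutex_unlock(&st->lock);
    return n;
}
```

                    In the real NIF the lock would come from enif_mutex_create, and read_count's body would sit in a NIF function that gets the pointer with enif_priv_data(env).<br>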

                    <br>

                    The vast majority of the time it all works

                    perfectly, humming along very nicely, but every now

                    and then, without any real pattern I can see, it

                    just segfaults and the VM comes down. It's only

                    happened 3 times in the last 20+ hours of working on

                    the app, testing & running all the while, doing

                    VM starts, stops, code reloads, etc. But when it

                    happens, it's kind of a showstopper, and I'd really

                    like to nail it down.<br>
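                    <br>
                    Since the crashes span VM starts, stops and code reloads, shutdown ordering may be worth checking too: if the NIF library is unloaded (for example when old module code is purged) while the dedicated thread is still running, or still registered with the hardware library for callbacks, that thread can end up executing unloaded code or touching freed state. A minimal model of an orderly shutdown, assuming POSIX threads and condition variables in place of enif_thread_join and enif_cond_*, with hypothetical start_listener/stop_listener functions where stop_listener plays the role of the unload() callback: signal the thread, join it, and only then free what it was using. The real hardware library would presumably also need its own shutdown call before the join; that isn't shown.<br>

```c
/* Model of an orderly shutdown for the dedicated thread.
 * Assumptions: POSIX threads/condvars stand in for enif_thread_* and
 * enif_cond_*; start_listener/stop_listener are HYPOTHETICAL, with
 * stop_listener playing the role of the NIF unload() callback. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool stop;      /* set by stop_listener, read by the thread */
    bool running;   /* true while the thread is inside its loop */
    pthread_t tid;
} listener;

static int listener_exits = 0;   /* counts thread bodies that ran to completion */

static void *listener_main(void *arg) {
    listener *l = arg;
    pthread_mutex_lock(&l->lock);
    l->running = true;
    while (!l->stop)   /* a real loop would also wake for hardware events */
        pthread_cond_wait(&l->wake, &l->lock);
    l->running = false;
    listener_exits++;
    pthread_mutex_unlock(&l->lock);
    return NULL;
}

static listener *start_listener(void) {
    listener *l = calloc(1, sizeof *l);   /* stop and running start as false */
    if (l == NULL)
        return NULL;
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->wake, NULL);
    if (pthread_create(&l->tid, NULL, listener_main, l) != 0) {
        pthread_cond_destroy(&l->wake);
        pthread_mutex_destroy(&l->lock);
        free(l);
        return NULL;
    }
    return l;
}

/* Unload ordering: ask the thread to stop, wait until it has exited, and only
 * then destroy the primitives and free the memory it was using. */
static void stop_listener(listener *l) {
    pthread_mutex_lock(&l->lock);
    l->stop = true;
    pthread_cond_signal(&l->wake);
    pthread_mutex_unlock(&l->lock);
    pthread_join(l->tid, NULL);
    assert(!l->running);   /* the thread has left its loop for good */
    pthread_cond_destroy(&l->wake);
    pthread_mutex_destroy(&l->lock);
    free(l);
}
```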

                    <br>

                    This is all happening in Erlang 20.3.4 on macOS

                    10.12.6 / Apple LLVM version 9.0.0 (clang-900.0.38).<br>

                    <br>

                    Any ideas on how/where to look next to try to track

                    this down? Hope it's not something structural in the

                    above which just won't work.<br>

                    <br>

                    Cheers,<br>

                    Igor<br>

                    <br>

                    <br>

                    ______________________________<wbr>_________________<br>

                    erlang-questions mailing list<br>

                    <a href="mailto:erlang-questions@erlang.org"

                      target="_blank" moz-do-not-send="true">erlang-questions@erlang.org</a><br>

                    <a

                      href="http://erlang.org/mailman/listinfo/erlang-questions"

                      rel="noreferrer" target="_blank"

                      moz-do-not-send="true">http://erlang.org/mailman/list<wbr>info/erlang-questions</a><br>

                  </blockquote>

                </blockquote>


              </div>

            </div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </body>

</html>