<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Thanks Lukas and Peti, that's great. "erl -emu_type debug"
definitely works: I haven't made the debug build yet, so I get
"erlexec: The emulator
'/usr/local/Cellar/erlang/20.3.4/lib/erlang/erts-9.3/bin/beam.debug.smp'
does not exist", which is what I'd expect at this point. I'll get onto
the debug build and see what I can find out.<br>
<br>
In case anyone else wants to use that in rebar3 shell, I found
<a class="moz-txt-link-freetext" href="http://www.rebar3.org/v3.0/discuss/5745fb105528582000dfb47f">http://www.rebar3.org/v3.0/discuss/5745fb105528582000dfb47f</a>, which
shows you can either put -emu_type directly in ERL_FLAGS, or
point ERL_FLAGS at a vm.args file and set it in there, e.g.:<br>
<br>
<blockquote type="cite">ERL_FLAGS=" -args_file config/vm.args
-config config/sys.config" rebar3 shell</blockquote>
<br>
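A minimal sketch of the first variant, passing -emu_type straight through ERL_FLAGS. The helper name erl_flags_for is hypothetical (it is not part of rebar3 or OTP), and the config/vm.args path is only an assumption carried over from the example above. The script just prints the flags; the rebar3 line is left as a comment because it needs rebar3 and a debug build of the emulator to be installed:<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper (not part of rebar3 or OTP): build an ERL_FLAGS value
# that selects an emulator type and, optionally, an extra args file.
erl_flags_for() {
    emu_type="$1"
    args_file="${2:-}"
    flags="-emu_type ${emu_type}"
    if [ -n "${args_file}" ]; then
        flags="${flags} -args_file ${args_file}"
    fi
    printf '%s\n' "${flags}"
}

# Print what would be passed to the VM in each variant.
erl_flags_for debug
erl_flags_for debug config/vm.args

# Actual use (assumes rebar3 is on PATH and the debug emulator is built):
# ERL_FLAGS="$(erl_flags_for debug)" rebar3 shell
```
</pre>
Going by the erlexec message above, either variant should make erlexec look for beam.debug.smp rather than beam.smp once the debug build exists.<br>
<br>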
Cheers,<br>
Igor<br>
<br>
<div class="moz-cite-prefix">On 29/05/2018 14:09, Peti Gömöri wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAEhaAyGNh6aT7VwY0bnG32W2xoLTfTk-Vm9zXnGXT-R0H4AidQ@mail.gmail.com">
<div dir="ltr">Since OTP 20 the <span class="gmail-code" style=""><b>-emu_type</b> flag
might also work, e.g.:</span>
<div><span class="gmail-code" style=""> erl -emu_type debug</span></div>
<div><span class="gmail-code" style=""><br>
</span></div>
<div><span class="gmail-code" style="">and you can put it in the
vm.args file too</span></div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 29, 2018 at 2:45 PM, Lukas
Larsson <span dir="ltr">&lt;<a href="mailto:lukas@erlang.org"
target="_blank" moz-do-not-send="true">lukas@erlang.org</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">I don't know how to make rebar3 run the debug
emulator, but a quick and dirty trick that I do when all
else fails is to copy the beam.debug.smp file over the
beam.smp file.
<div><br>
</div>
<div>You probably also have to copy the
erl_child_setup.debug file; that file, however, should
keep its .debug suffix. So:</div>
<div><br>
</div>
<div>cp bin/`erts/autoconf/config.<wbr>guess`/beam.debug.smp
path/to/release/erts-v.s.n/<wbr>bin/beam.smp</div>
<div>cp bin/`erts/autoconf/config.<wbr>guess`/erl_child_setup.debug
path/to/release/erts-v.s.n/<wbr>bin/
<div>
<div class="h5"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 29, 2018 at
1:30 PM, Igor Clark <span dir="ltr">&lt;<a
href="mailto:igor.clark@gmail.com"
target="_blank" moz-do-not-send="true">igor.clark@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Thanks very much
Lukas, I think the debug emulator could be
what I'm looking for. The NIF only sometimes
crashes on lists:member/2 - those log lines
are all from different crashes (there's only
one crashed thread each time), and sometimes
it just crashes on process_main. So I think
I might need the debug emulator to trace
further.<br>
<br>
However I have a lot to learn about how to
integrate C tooling with something so
complex. When I run the debug emulator, does
it just show more detailed info in stack
traces, or will I need to attach gdb/lldb
etc to find out what's going on? Is there
any more info on how to set this all up?<br>
<br>
Also, not 100% sure how to run it, as I run
my app with "rebar3 shell" from a release
layout during development, or the same
inside the NIF-specific app when trying to
track problems down there. The doc you
linked says:<br>
<br>
<blockquote type="cite">
<p>To start the debug enabled runtime
system execute:</p>
<pre><code>$ $ERL_TOP/bin/cerl -debug</code></pre>
</blockquote>
<br>
I realise these are more rebar3 than erlang
questions, but I can't find much in the
rebar3 docs about them:<br>
<br>
- How should I specify that rebar3 should
run "cerl" instead of "erl" ?<br>
<br>
- Should I just add "-debug" in my
"config/vm.args" or is there another way to
do this?<br>
<br>
Thank you for your help!<span
class="m_-8160118314680844088gmail-HOEnZb"><font
color="#888888"><br>
i</font></span>
<div>
<div
class="m_-8160118314680844088gmail-h5"><br>
<br>
<div
class="m_-8160118314680844088gmail-m_-599898042086351337moz-cite-prefix">On
29/05/2018 11:30, Lukas Larsson wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Have you tried to run
your code in a debug emulator? <a
href="https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system"
target="_blank"
moz-do-not-send="true">https://github.com/e<wbr>rlang/otp/blob/master/HOWTO/IN<wbr>STALL.md#how-to-build-a-debug-<wbr>enabled-erlang-runtime-system</a>
<div><br>
</div>
<div>Since it seems to be
segfaulting in lists:member/2, I
would guess that your nif somehow
builds an invalid list that later
is used by lists:member/2.</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May
29, 2018 at 11:04 AM, Igor Clark <span
dir="ltr">&lt;<a
href="mailto:igor.clark@gmail.com"
target="_blank"
moz-do-not-send="true">igor.clark@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Thanks
Sergej - that's where I got the
thread reports I pasted in
below, from e.g.
'beam.smp_2018-05-28-212735_Ig<wbr>or-Clarks-iMac.crash'.<br>
<br>
Each log says the only crashed
thread was a scheduler thread,
for example "8_scheduler"
running "process_main" in the
case of the first one below.
This is how I tracked down a
bunch of errors in my own code,
but the only ones that still
happen are in the scheduler,
according to the Console crash
logs.<br>
<br>
The thing is, it seems really
unlikely that a VM running my
NIF code would just happen to be
crashing in the scheduler rather
than my code(!) - so that's what
I'm trying to work out, how to
find out what's actually going
on, given that the log tells me
the crashed thread is running
"process_main" or
'lists_member_2'.<br>
<br>
Any suggestions welcome!<br>
<br>
Cheers,<br>
Igor
<div
class="m_-8160118314680844088gmail-m_-599898042086351337HOEnZb">
<div
class="m_-8160118314680844088gmail-m_-599898042086351337h5"><br>
<br>
On 29/05/2018 04:16, Sergej
Jurečko wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
On macOS there is a quick
way to get a stack trace
if you compiled with debug
symbols.<br>
Open
/Applications/Utilities/Consol<wbr>e<br>
Go to: User Reports<br>
<br>
You will see beam.smp in
there if it crashed. Click
on it and you get a report
of what every thread was
calling at the time of the
crash.<br>
<br>
<br>
Regards,<br>
Sergej<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
On 28 May 2018, at
23:46, Igor Clark &lt;<a
href="mailto:igor.clark@gmail.com" target="_blank"
moz-do-not-send="true">igor.clark@gmail.com</a>&gt;
wrote:<br>
<br>
Hi folks, hope all well,<br>
<br>
I have a NIF which very
occasionally segfaults,
intermittently and
apparently
unpredictably, bringing
down the VM. I've spent
a bunch of time tracing
allocation and
dereferencing problems
in my NIF code, and I've
got rid of what seems
like 99%+ of the
problems - but it still
occasionally happens,
and I'm having trouble
tracing further, because
the crash logs show the
crashed threads as doing
things like these: (each
one taken from a
separate log where it's
the only crashed thread)<br>
<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
Thread 40 Crashed::
8_scheduler<br>
0 beam.smp
0x000000001c19980b
process_main + 1570<br>
<br>
Thread 5 Crashed::
3_scheduler<br>
0 beam.smp
0x000000001c01d80b
process_main + 1570<br>
<br>
Thread 7 Crashed::
5_scheduler<br>
0 beam.smp
0x000000001baff0b8
lists_member_2 + 63<br>
<br>
Thread 3 Crashed::
1_scheduler<br>
0 beam.smp
0x000000001d4b780b
process_main + 1570<br>
<br>
Thread 5 Crashed::
3_scheduler<br>
0 beam.smp
0x000000001fcf280b
process_main + 1570<br>
<br>
Thread 6 Crashed::
4_scheduler<br>
0 beam.smp
0x000000001ae290b8
lists_member_2 + 63<br>
</blockquote>
<br>
I'm very confident that
the problems are in my
code, not in the
scheduler ;-) But
without more detail, I
don't know how to trace
where they're happening.
When they do, there are
sometimes other threads
doing things in my code
(maybe 20% of the time)
- but mostly not, and on
the occasions when they
are, I've been unable to
see what the problem
might be on the lines
referenced.<br>
<br>
It seems like it's some
kind of cross-thread
data access issue, but I
don't know how to track
it down.<br>
<br>
Some more context about
what's going on. My NIF
load() function starts a
thread which passes a
callback function to a
library that talks to
some hardware, which
calls the callback when
it has a message. It's a
separate thread because
the library only calls
back to the thread that
initialized it; when I
ran it directly in NIF
load(), it didn't call
back, but in the
VM-managed thread, it
works as expected. The
thread sits and waits
for stuff to happen, and
callbacks come when they
should.<br>
<br>
I use
enif_thread_create/enif_thread<wbr>_opts_create
to start the thread, and
use enif_alloc/enif_free
everywhere. I keep a
static pointer in the
NIF to a couple of
members of the state
struct, as that seems
the only way to
reference them in the
callback function. The
struct is kept in NIF
private data: I pass
**priv from load() to
the thread_main
function, allocate the
state struct using
enif_alloc in
thread_main, and set
priv pointing to the
state struct, also in
the thread. Other NIF
functions do access
members of the state
struct, but only ever
through enif_priv_data(
env ).<br>
<br>
The vast majority of the
time it all works
perfectly, humming along
very nicely, but every
now and then, without
any real pattern I can
see, it just segfaults
and the VM comes down.
It's only happened 3
times in the last 20+
hours of working on the
app, testing &amp;
running all the while,
doing VM starts, stops,
code reloads, etc. But
when it happens, it's
kind of a showstopper,
and I'd really like to
nail it down.<br>
<br>
This is all happening in
Erlang 20.3.4 on MacOS
10.12.6 / Apple LLVM
version 9.0.0
(clang-900.0.38).<br>
<br>
Any ideas on how/where
to look next to try to
track this down? Hope
it's not something
structural in the above
which just won't work.<br>
<br>
Cheers,<br>
Igor<br>
<br>
<br>
______________________________<wbr>_________________<br>
erlang-questions mailing
list<br>
<a
href="mailto:erlang-questions@erlang.org"
target="_blank"
moz-do-not-send="true">erlang-questions@erlang.org</a><br>
<a
href="http://erlang.org/mailman/listinfo/erlang-questions"
rel="noreferrer"
target="_blank"
moz-do-not-send="true">http://erlang.org/mailman/list<wbr>info/erlang-questions</a><br>
</blockquote>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
<br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
<br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>