<div dir="ltr">Have you tried running your code in a debug emulator? <a href="https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system">https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md#how-to-build-a-debug-enabled-erlang-runtime-system</a><div><br></div><div>Since it seems to be segfaulting in lists:member/2, I would guess that your NIF somehow builds an invalid list that is later used by lists:member/2.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 29, 2018 at 11:04 AM, Igor Clark <span dir="ltr"><<a href="mailto:igor.clark@gmail.com" target="_blank">igor.clark@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks Sergej, that's where I got the thread reports pasted below, e.g. from 'beam.smp_2018-05-28-212735_Igor-Clarks-iMac.crash'.<br>
<br>
Each log says the only crashed thread was a scheduler thread, e.g. "8_scheduler" running "process_main" in the first one below. That's how I tracked down a bunch of errors in my own code, but according to the Console crash logs, the only crashes that still happen are in the scheduler.<br>
<br>
The thing is, it seems really unlikely that a VM running my NIF code would just happen to crash in the scheduler rather than in my code(!). So what I'm trying to work out is how to find out what's actually going on, given that the log says the crashed thread is running "process_main" or "lists_member_2".<br>
<br>
Any suggestions welcome!<br>
<br>
Cheers,<br>
Igor<div class="HOEnZb"><div class="h5"><br>
<br>
On 29/05/2018 04:16, Sergej Jurečko wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On macOS there is a quick way to get a stack trace if you compiled with debug symbols.<br>
Open /Applications/Utilities/Console<br>
Go to: User Reports<br>
<br>
You will see beam.smp in there if it crashed. Click on it and you get a report of what every thread was calling at the time of the crash.<br>
<br>
<br>
Regards,<br>
Sergej<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 28 May 2018, at 23:46, Igor Clark <<a href="mailto:igor.clark@gmail.com" target="_blank">igor.clark@gmail.com</a>> wrote:<br>
<br>
Hi folks, hope all well,<br>
<br>
I have a NIF that very occasionally segfaults, intermittently and apparently unpredictably, bringing down the VM. I've spent a lot of time tracing allocation and dereferencing problems in my NIF code and have got rid of what seems like 99%+ of them, but it still happens occasionally. I'm having trouble tracing it further, because the crash logs show the crashed threads doing things like the following (each taken from a separate log in which it is the only crashed thread):<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thread 40 Crashed:: 8_scheduler<br>
0 beam.smp 0x000000001c19980b process_main + 1570<br>
<br>
Thread 5 Crashed:: 3_scheduler<br>
0 beam.smp 0x000000001c01d80b process_main + 1570<br>
<br>
Thread 7 Crashed:: 5_scheduler<br>
0 beam.smp 0x000000001baff0b8 lists_member_2 + 63<br>
<br>
Thread 3 Crashed:: 1_scheduler<br>
0 beam.smp 0x000000001d4b780b process_main + 1570<br>
<br>
Thread 5 Crashed:: 3_scheduler<br>
0 beam.smp 0x000000001fcf280b process_main + 1570<br>
<br>
Thread 6 Crashed:: 4_scheduler<br>
0 beam.smp 0x000000001ae290b8 lists_member_2 + 63<br>
</blockquote>
<br>
I'm very confident that the problems are in my code, not in the scheduler ;-) But without more detail, I don't know how to trace where they're happening. When a crash does happen, other threads are sometimes running my code (maybe 20% of the time), but mostly not; and on the occasions when they are, I haven't been able to see what the problem might be on the lines referenced.<br>
<br>
It seems like it's some kind of cross-thread data access issue, but I don't know how to track it down.<br>
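<br>
A debug-only ownership check is one way to narrow down a suspected cross-thread access: record which thread creates a piece of state and flag any access from a different thread, so the failure points at the offending caller rather than at whatever trips over the damage later. The sketch below is a self-contained, hypothetical illustration using plain pthreads (owned_t, owned_require, owned_touch and touch_from_new_thread are made-up names); in a NIF the same check could be built on enif_thread_self() and enif_equal_tids(). It only suits data that a single thread should ever touch, such as state private to the library thread; fields that NIF calls on scheduler threads also read need a lock instead.<br>
<br>

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical debug wrapper for state that only one thread should touch. */
typedef struct {
    pthread_t owner;  /* creating thread; enif_thread_self() in a NIF */
    int counter;      /* example payload */
} owned_t;

static void owned_init(owned_t *o) {
    o->owner = pthread_self();
    o->counter = 0;
}

/* True if the calling thread is the owner; enif_equal_tids() in a NIF. */
static bool owned_check(const owned_t *o) {
    return pthread_equal(o->owner, pthread_self()) != 0;
}

/* Debug-build guard: aborts at the offending access, so the crash report
 * shows the caller that broke the rule rather than a later victim. */
static void owned_require(const owned_t *o) {
    assert(owned_check(o) && "state touched from the wrong thread");
    (void)o; /* silence unused-parameter warnings when NDEBUG is set */
}

/* Non-fatal variant: refuses the access and reports it via the return value. */
static bool owned_touch(owned_t *o) {
    if (!owned_check(o))
        return false;
    o->counter++;
    return true;
}

/* Helper that attempts an access from a freshly created thread. */
typedef struct {
    owned_t *o;
    bool ok;
} probe_t;

static void *probe_main(void *arg) {
    probe_t *p = arg;
    p->ok = owned_touch(p->o);
    return NULL;
}

/* Returns the probe's result (also false if the thread could not be created). */
static bool touch_from_new_thread(owned_t *o) {
    probe_t p = { o, false };
    pthread_t t;
    if (pthread_create(&t, NULL, probe_main, &p) != 0)
        return false;
    pthread_join(t, NULL);
    return p.ok;
}
```

For example, owned_require() could go at the top of every function that touches the library thread's private state; in a build without NDEBUG, a violation then aborts at the culprit.<br>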
<br>
Some more context about what's going on. My NIF's load() function starts a thread that passes a callback function to a library that talks to some hardware; the library calls the callback whenever it has a message. It's a separate thread because the library only calls back on the thread that initialized it: when I ran it directly in load(), it never called back, but in the VM-managed thread it works as expected. The thread sits and waits for things to happen, and the callbacks arrive when they should.<br>
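<br>
A minimal sketch of that thread lifecycle, assuming a hypothetical hardware library (hw_init and hw_poll, simulated here) that only delivers callbacks to the thread that initialized it. Plain pthreads stand in for the NIF API: start_worker() plays the role of the enif_thread_create call in load(), and stop_worker() the role of an unload (or upgrade) callback that signals the thread and waits for it with enif_thread_join. Stopping and joining the thread before the library's code or state goes away may matter here, since the crashes span VM stops and code reloads.<br>
<br>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the hardware library: the callback is stored
 * per thread, so only the thread that called hw_init() ever receives it. */
typedef void (*hw_callback)(int msg);
static _Thread_local hw_callback registered_cb = NULL;
static void hw_init(hw_callback cb) { registered_cb = cb; }
static void hw_poll(int msg) { if (registered_cb) registered_cb(msg); }

static atomic_int messages_seen = 0;
static atomic_bool stop_requested = false;
static pthread_t worker_tid;

/* Runs on the worker thread, because that is where hw_init() was called. */
static void on_message(int msg) {
    (void)msg;
    atomic_fetch_add(&messages_seen, 1);
}

static void *thread_main(void *arg) {
    (void)arg;
    hw_init(on_message);                 /* initialize on the thread that gets callbacks */
    int msg = 0;
    while (!atomic_load(&stop_requested)) {
        hw_poll(msg++);                  /* a real library would block here for a message */
        struct timespec delay = {0, 1000000};  /* 1 ms */
        nanosleep(&delay, NULL);
    }
    return NULL;                         /* exits cleanly once asked to stop */
}

/* Roughly what load() does: start the thread (enif_thread_create in a NIF). */
static int start_worker(void) {
    atomic_store(&stop_requested, false);
    return pthread_create(&worker_tid, NULL, thread_main, NULL);
}

/* Roughly what unload()/upgrade() should do: ask the thread to stop and wait
 * for it (enif_thread_join in a NIF) before freeing anything it uses. */
static void stop_worker(void) {
    atomic_store(&stop_requested, true);
    int rc = pthread_join(worker_tid, NULL);
    assert(rc == 0);
    (void)rc;
}
```

Usage: call start_worker() once, let callbacks arrive, and call stop_worker() before tearing down any state the callback touches.<br>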
<br>
I use enif_thread_create/enif_thread_opts_create to start the thread, and use enif_alloc/enif_free everywhere. I keep a static pointer in the NIF to a couple of members of the state struct, as that seems to be the only way to reference them in the callback function. The struct is kept in NIF private data: I pass **priv from load() to the thread_main function, allocate the state struct using enif_alloc in thread_main, and set priv to point to the state struct, also in the thread. Other NIF functions do access members of the state struct, but only ever through enif_priv_data(env).<br>
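<br>
One ordering in this setup may be worth revisiting. erl_nif describes *priv_data as something load() sets; if it is only set later by the spawned thread, a NIF call can run enif_priv_data(env) before the thread has allocated the struct (seeing NULL or a half-initialized struct), and nothing synchronizes the two threads. A common alternative is to allocate and initialize the whole state struct in load(), set *priv_data there, and only then start the thread, passing it the state pointer. The sketch below shows that ordering with plain pthreads and calloc standing in for enif_thread_create, enif_alloc and enif_mutex_create; state_t, load_like, read_count and unload_like are hypothetical names, and the fields shared between the callback and NIF calls are guarded by a mutex.<br>
<br>

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical state struct. In the NIF: enif_alloc for the struct,
 * an ErlNifMutex (enif_mutex_create) for the lock, ErlNifTid for the thread. */
typedef struct {
    pthread_mutex_t lock;   /* guards last_message and message_count */
    int last_message;
    long message_count;
    pthread_t tid;
} state_t;

/* Static pointer for the library callback, which has no user-data argument. */
static state_t *g_state = NULL;

/* Library callback: runs on the worker thread, so it takes the lock. */
static void on_message(int msg) {
    state_t *st = g_state;
    pthread_mutex_lock(&st->lock);
    st->last_message = msg;
    st->message_count++;
    pthread_mutex_unlock(&st->lock);
}

static void *thread_main(void *arg) {
    assert(arg == g_state);  /* state was fully set up before the thread started */
    (void)arg;
    for (int i = 1; i <= 100; i++)
        on_message(i);       /* simulated callbacks from the hardware library */
    return NULL;
}

/* Analogue of load(env, priv_data, load_info): allocate, initialize and
 * publish the state first; start the thread last. Non-zero means failure. */
static int load_like(void **priv_data) {
    state_t *st = calloc(1, sizeof *st);
    if (st == NULL) return 1;
    if (pthread_mutex_init(&st->lock, NULL) != 0) { free(st); return 1; }
    g_state = st;
    *priv_data = st;         /* set here, never from the worker thread */
    if (pthread_create(&st->tid, NULL, thread_main, st) != 0) {
        g_state = NULL;
        *priv_data = NULL;
        pthread_mutex_destroy(&st->lock);
        free(st);
        return 1;
    }
    return 0;
}

/* Analogue of a NIF function reading shared state via enif_priv_data(env). */
static long read_count(void *priv) {
    state_t *st = priv;
    pthread_mutex_lock(&st->lock);
    long n = st->message_count;
    pthread_mutex_unlock(&st->lock);
    return n;
}

/* Analogue of unload(): wait for the thread before freeing what it uses. */
static void unload_like(void *priv) {
    state_t *st = priv;
    pthread_join(st->tid, NULL);
    g_state = NULL;
    pthread_mutex_destroy(&st->lock);
    free(st);
}
```

Here the simulated thread simply exits after 100 messages; a real library thread would loop until unload() signals it to stop, and only then be joined.<br>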
<br>
The vast majority of the time it all works perfectly, humming along very nicely, but every now and then, without any real pattern I can see, it just segfaults and the VM comes down. It's only happened 3 times in the last 20+ hours of working on the app, testing & running all the while, doing VM starts, stops, code reloads, etc. But when it happens, it's kind of a showstopper, and I'd really like to nail it down.<br>
<br>
This is all happening in Erlang 20.3.4 on macOS 10.12.6 / Apple LLVM version 9.0.0 (clang-900.0.38).<br>
<br>
Any ideas on how or where to look next to track this down? I hope it's not something structural in the above that just won't work.<br>
<br>
Cheers,<br>
Igor<br>
<br>
<br>
_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
</blockquote></blockquote>
</div></div></blockquote></div><br></div>