[erlang-questions] Scheduling issues with Erlang on RTEMS

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Tue Feb 13 13:52:53 CET 2018


The "old" way of debugging something like this is to create a ring buffer
in the binary which tracks the latest K events, and then have a way to grab
that ring buffer (a UNIX signal, etc.). My bet is that you have some kind of
deadlock situation which stems from an assumption about threads/mutexes in
Erlang and how RTEMS implements the abstraction, leading to a leaky
abstraction. A way to inspect the reduction counter would also be good to
have.
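
A minimal sketch of such a ring buffer in C, assuming C11 atomics are
available in the RTEMS toolchain. None of these names exist in the Erlang VM;
`trace_record`, `trace_snapshot`, `trace_dump` and the event fields are
hypothetical and only illustrate the idea:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_SLOTS 256u /* K, the number of events kept; must be a power of two */

/* One recorded event. All fields are hypothetical; pick whatever you need. */
struct trace_event {
    uint32_t  seq;    /* global sequence number of this event */
    uint16_t  id;     /* event kind, e.g. "mutex locked" or "scheduler woke up" */
    uint16_t  thread; /* small index identifying the calling thread */
    uintptr_t arg;    /* free-form payload, e.g. the address of a mutex */
};

static struct trace_event trace_buf[TRACE_SLOTS];
static atomic_uint trace_next; /* total number of events recorded so far */

/* Record one event, overwriting the oldest slot once the buffer is full. */
void trace_record(uint16_t id, uint16_t thread, uintptr_t arg)
{
    unsigned seq = atomic_fetch_add_explicit(&trace_next, 1u,
                                             memory_order_relaxed);
    struct trace_event *e = &trace_buf[seq & (TRACE_SLOTS - 1u)];
    e->seq = seq;
    e->id = id;
    e->thread = thread;
    e->arg = arg;
}

/* Copy the latest events, oldest first, into out; returns how many were copied.
 * This is only reliable while no thread is recording, e.g. when the system
 * is hung or stopped in a hardware debugger. */
unsigned trace_snapshot(struct trace_event *out, unsigned max)
{
    unsigned end = atomic_load_explicit(&trace_next, memory_order_relaxed);
    unsigned n = end < TRACE_SLOTS ? end : TRACE_SLOTS;
    if (n > max)
        n = max;
    for (unsigned i = 0; i < n; i++)
        out[i] = trace_buf[(end - n + i) & (TRACE_SLOTS - 1u)];
    return n;
}

/* Print the latest events, oldest first. */
void trace_dump(FILE *fp)
{
    static struct trace_event snap[TRACE_SLOTS];
    unsigned n = trace_snapshot(snap, TRACE_SLOTS);
    for (unsigned i = 0; i < n; i++)
        fprintf(fp, "%u id=%u thread=%u arg=%#lx\n",
                (unsigned)snap[i].seq, (unsigned)snap[i].id,
                (unsigned)snap[i].thread, (unsigned long)snap[i].arg);
}
```

Recording is one atomic increment plus a few stores, so it should disturb
timing far less than printing would. On a target without signals, `trace_dump`
could be called from a shell command, or `trace_buf` could be read directly
with the hardware debugger once the system is stuck.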

In general, write down what you assume in the VM state and start sprinkling
assertions in. The goal is to be scientific, so verify your assumptions.
The bugs often lurk where your intuition is leading you astray and you take
a giant leap of faith where minute details matter and turn out to be
different from what you expect.
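
As a rough illustration of "sprinkling assertions", here is a small sketch in
C. The `ASSUME` macro, `assume_failed` and `checked_wait_result` are
hypothetical names, not part of the VM, and the checked condition (a timed
wait returning only 0 or `ETIMEDOUT`) is just an example of an assumption one
might write down:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of assumptions that turned out to be wrong so far. */
unsigned long assumption_failures;

/* Report a failed assumption; abort only if built with -DASSUME_FATAL,
 * so the default build keeps running and keeps collecting evidence. */
void assume_failed(const char *expr, const char *file, int line)
{
    assumption_failures++;
    fprintf(stderr, "assumption failed: %s (%s:%d)\n", expr, file, line);
#ifdef ASSUME_FATAL
    abort();
#endif
}

/* State an assumption explicitly at the point where the code relies on it. */
#define ASSUME(cond) \
    do { if (!(cond)) assume_failed(#cond, __FILE__, __LINE__); } while (0)

/* Hypothetical example: the caller expects a timed wait to either succeed
 * or time out, and wants to hear about any other return code. */
int checked_wait_result(int rc)
{
    ASSUME(rc == 0 || rc == ETIMEDOUT);
    return rc;
}
```

Wrapping the return value of each threading primitive in a check like this
is cheap, and a failure points directly at the spot where the platform
behaves differently from what the code expects.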

The VM can be built in several debug modes, but I'm not sure they verify
the underlying fabric is as expected.

On Tue, Feb 13, 2018 at 1:04 PM Sébastien Merle <s.merle@REDACTED> wrote:

> Hi,
>
> We are working on GRiSP (grisp.org) and we are porting the Erlang VM to
> PowerPC/RTEMS. Everything works fine with 19.3.6 without threading (
> `--disable-threads`). But with PLAIN or SMP build of either Erlang 19.3.6
> or 20.2 we found a strange scheduling issue. Any hints and ideas on how to
> debug it would be so greatly appreciated!
>
> We have a very simple project to test the issue: it has a single
> supervisor starting a `proc_lib` worker that stays in a busy loop after
> calling `proc_lib:init_ack`, and a second worker that is a normal
> `gen_server` doing nothing. The symptom is that the supervisor starts the
> first worker and never gets to start the second one, so we never get to the
> Erlang console. When tracing the supervisor module (with `dbg`) we can
> see it "blocks" on `supervisor:do_start_child`, and when enabling verbose
> logging with the debug build we can see that no processes get started. This
> happens all the time; it is 100% reproducible.
>
> What makes us think it is a scheduling issue is that adding `receive after
> 1 -> ok end` in the busy loop seems to fix the issue: the second process
> starts properly and we get to the Erlang console.
>
> Our port of the same code to ARM/RTEMS is working fine; we only have this
> issue on PowerPC.
>
> We cannot use VM probes because we don't have DTrace on RTEMS, printing
> debug output in `erl_process.c` is probably not a good idea, and there is no
> clear place to start debugging from with a hardware debugger.
>
> Any guidelines, hints or ideas on how to debug this?
>
> Thank you very much.
>
> Regards,
> Sebastien Merle.