[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module

Tuncer Ayaz tuncer.ayaz@REDACTED
Mon Oct 4 14:53:47 CEST 2010


On Mon, Oct 4, 2010 at 2:28 PM, Mikael Pettersson wrote:
> Paul Guyot writes:
>  > >> --- otp_src_R14B/lib/kernel/src/error_handler.erl.~1~   2010-09-13 19:00:22.000000000 +0200
>  > >> +++ otp_src_R14B/lib/kernel/src/error_handler.erl       2010-09-24 13:44:09.000000000 +0200
>  > >> @@ -17,6 +17,7 @@
>  > >>  %% %CopyrightEnd%
>  > >>  %%
>  > >>  -module(error_handler).
>  > >> +-compile(no_native).
>  > >>
>  > >>  %% A simple error handler.
>  > >
>  > > Any objections to applying this fix to dev? Otherwise let's
>  > > include it as a trivial workaround that makes --enable-native-libs
>  > > usable.
>  >
>  >
>  > This is no trivial workaround and it works by sheer luck.
>
> The loops always involve a native-mode error_handler, so eliminating
> that eliminates the loops. That's not "by sheer luck".
>
>> What this bug reveals is that there is a major design flaw with
>> on_load and HiPE, as illustrated by the fact that the emulator
>> goes into an infinite loop if error_handler is natively compiled:
>> every time there is a remote call to a function in a module that
>> has an on_load attribute, the emulator goes through
>> error_handler:undefined_function! I think it doesn't loop infinitely
>> when error_handler is not native because it goes through the apply
>> BIF, which works around the linkage issue.
>
> The short explanation is that BEAM-mode code doesn't loop in this
> case because BEAM has an additional level of indirection between
> caller and callee. The BEAM code loader updates the middle level
> (the code address in Export entries) which allows callers to
> immediately see the updates.
>
>  > A quick benchmark shows that (on dev, without --enable-native-libs)
>  > calling a remote function in a module with an on_load attribute
>  > currently takes 100+ times longer from HiPE than from BEAM.
>
> Modules with on_load attributes are supposed to be prevented from
> being compiled to native code, so the "from HiPE" case shouldn't
> exist.
>
> If you meant to write that a call from native code in module N1 to
> BEAM code in module B2, where B2 also has an on_load attribute, is
> slow, then yes, I know that, and it's because of the extra round trip
> through error_handler. In the long term, my change to how native code
> links to BEAM code will fix that. In the short term, you can work
> around it by splitting modules into on_load and non-on_load bits.

Speaking of short-term fixes, which would you prefer to see in the dev
branch?
 S) split error_handler into on_load/non-on_load bits
 D) assuming you see no drawback, add a no_native directive to
    error_handler
 N) neither of the above

>  > The way things work for BEAM is that final_touch disables
>  > export entries if there is an on_load handler, and the
>  > finish_after_on_load_2 BIF, which is called from
>  > init/code_server, fixes them using the saved address.
>  > The same should be done for HiPE, but of course it's more
>  > complicated because dynamic linking works differently in HiPE.
>  >
>  > I can only see three solutions:
>  > - fix the pointers in each caller (which doesn't sound doable);
>
> Not doable in the short term.
>
>  > - change the way dynamic linking works in HiPE as Mikael suggests;
>  > - or add a small stub, as implemented in my patch -- this is
>  >   still three to five times slower than BEAM.
>  >
>  > Simply ignoring the problem will not make it go away.
>
> It's not being ignored.


More information about the erlang-patches mailing list