[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module
Mon Oct 4 14:28:44 CEST 2010
Paul Guyot writes:
> >> --- otp_src_R14B/lib/kernel/src/error_handler.erl.~1~ 2010-09-13 19:00:22.000000000 +0200
> >> +++ otp_src_R14B/lib/kernel/src/error_handler.erl 2010-09-24 13:44:09.000000000 +0200
> >> @@ -17,6 +17,7 @@
> >> %% %CopyrightEnd%
> >> %%
> >> -module(error_handler).
> >> +-compile(no_native).
> >> %% A simple error handler.
> > Any objections to applying this fix to dev? Otherwise, let's include it
> > as a trivial workaround that makes --enable-native-libs usable.
> This is no trivial workaround and it works by sheer luck.
The loops always involve a native-mode error_handler, so by eliminating
that the loops are eliminated. That's not "by sheer luck".
> What this bug reveals is that there is a major design flaw with on_load and HiPE, as illustrated by the fact that the emulator goes into an infinite loop if error_handler is natively compiled: every time there is a remote call to a function in a module that has an on_load attribute, the emulator goes through error_handler:undefined_function! I think it doesn't loop infinitely when error_handler is not native because it goes through the apply BIF, which works around the linkage issue.
The short explanation is that BEAM-mode code doesn't loop in this case
because BEAM has an additional level of indirection between caller and
callee. The BEAM code loader updates the middle level (the code address
in Export entries) which allows callers to immediately see the updates.
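A toy Python model of the extra indirection described above. It assumes nothing about the emulator's real data structures: `ExportEntry`, `beam_style_call` and `native_style_call` are hypothetical names. A BEAM-style caller reads the code address from the export entry on every call, so an update made by the loader is visible immediately. A caller that resolved the address once at link time, as directly linked native code does in this model, keeps its old target.

```python
from typing import Callable


class ExportEntry:
    """Toy stand-in for an export entry: a mutable cell holding a code address."""

    def __init__(self, address: Callable[[], str]) -> None:
        self.address = address


def v1() -> str:
    return "old code"


def v2() -> str:
    return "new code"


entry = ExportEntry(v1)


def beam_style_call(e: ExportEntry) -> str:
    # BEAM-style caller: loads the address from the export entry on every call.
    return e.address()


# Native-style caller: resolved the address once, at "link time".
linked_address = entry.address


def native_style_call() -> str:
    return linked_address()


# The loader installs new code by updating only the export entry.
entry.address = v2

print(beam_style_call(entry))  # new code
print(native_style_call())     # old code
```

In this model only the middle level is updated, which is why the caller that skips it never sees the change.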
> A quick benchmark shows that (on dev, without --enable-native-libs) calling a remote function in a module with an on_load attribute currently takes 100+ times longer from HiPE than from BEAM.
Modules with on_load attributes are supposed to be prevented from
being compiled to native code, so the "from HiPE" case shouldn't exist.
If you meant to write that a call from native code in module N1 to BEAM
code in module B2, where B2 also has an on_load, is slow, then yes, I
know that and it's because of the extra roundtrip through error_handler.
In the long term my change to how native code links to BEAM code will
fix that. In the short term you can work around it by splitting modules
into on_load and non-on_load bits.
> The way things work for BEAM is that final_touch disables export entries if there is an on_load handler, and the finish_after_on_load_2 BIF, which is called from init/code_server, fixes them by using the saved address. The same should be done for HiPE, but of course it's more complicated because dynamic linking works differently in HiPE.
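A toy Python model of the sequence described in the paragraph above, written to show where the extra roundtrip through error_handler comes from. `final_touch` and `finish_after_on_load` here are hypothetical stand-ins named after the emulator's routines and do not reproduce their actual code. `undefined_function` stands in for error_handler:undefined_function/3, and `link_directly` is an assumed, simplified picture of native linkage. The entry is disabled at load time and later restored from the saved address. A caller that resolved its target while the entry was disabled stays bound to the error-handler path.

```python
from typing import Callable, List, Optional


class ExportEntry:
    """Toy export entry: current address plus the address saved during on_load."""

    def __init__(self) -> None:
        self.address: Optional[Callable[[], str]] = None  # None means "disabled"
        self.saved: Optional[Callable[[], str]] = None


trace: List[str] = []  # records every detour through the error handler


def real_f() -> str:
    return "f result"


def call_via_entry(e: ExportEntry) -> str:
    # BEAM-style call: the export entry is read at call time.
    if e.address is None:
        return undefined_function(e)
    return e.address()


def undefined_function(e: ExportEntry) -> str:
    # Stand-in for error_handler:undefined_function/3: note the detour, then
    # retry through the export entry, roughly like an apply.
    trace.append("error_handler")
    return call_via_entry(e)


def final_touch(e: ExportEntry, code: Callable[[], str], has_on_load: bool) -> None:
    if has_on_load:
        e.saved = code    # keep the real address for later
        e.address = None  # disable the entry until on_load succeeds
    else:
        e.address = code


def finish_after_on_load(e: ExportEntry) -> None:
    e.address, e.saved = e.saved, None  # re-enable using the saved address


def link_directly(e: ExportEntry) -> Callable[[], str]:
    # Native-style linkage in this model: resolve the target once. If the
    # entry is disabled right now, bind to the error-handler path for good.
    target = e.address
    if target is None:
        return lambda: undefined_function(e)
    return target


entry = ExportEntry()
final_touch(entry, real_f, has_on_load=True)
native_target = link_directly(entry)  # linked while the entry is disabled
finish_after_on_load(entry)

for _ in range(3):
    call_via_entry(entry)  # no detour: the entry has been fixed
print(trace)  # []

for _ in range(3):
    native_target()  # every call detours, although the entry is fixed
print(trace)  # ['error_handler', 'error_handler', 'error_handler']
```

The native-style caller still gets the right result, but only after a detour on every call. This matches the "extra roundtrip through error_handler" mentioned earlier in the thread.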
> I can only see three solutions:
> - fix the pointers in each caller (which doesn't sound doable);
Not doable in the short term.
> - change the way dynamic linking works in HiPE as Mikael suggests;
> - or add a small stub, as implemented in my patch -- this is still three to five times slower than BEAM.
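A minimal sketch of the general idea behind the third option, under stated assumptions. This is not the code of the patch referred to above. `ExportEntry` and `make_stub` are hypothetical names. The native-style caller links to a stub that is fixed at link time, and the stub reads the export entry on each call. It therefore sees the address restored after on_load, at the cost of one extra hop per call.

```python
from typing import Callable, Optional


class ExportEntry:
    """Toy export entry holding the current code address."""

    def __init__(self, address: Optional[Callable[[], str]]) -> None:
        self.address = address  # None means "disabled"


def make_stub(e: ExportEntry) -> Callable[[], str]:
    # The stub itself never changes, but it dereferences the export entry
    # every time it is called.
    def stub() -> str:
        target = e.address
        if target is None:
            # A fuller model would divert to an error-handler path here;
            # this toy model simply fails.
            raise RuntimeError("export entry disabled: on_load has not finished")
        return target()

    return stub


def real_f() -> str:
    return "f result"


entry = ExportEntry(None)         # disabled while on_load is pending
native_target = make_stub(entry)  # the caller links to the stub, not to real_f

entry.address = real_f            # on_load finished; entry restored
print(native_target())            # f result
```

The trade-off is that every call pays for the stub's indirection, even long after on_load has completed.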
> Simply ignoring the problem will not make it go away.
It's not being ignored.
More information about the erlang-patches