[PATCH] Fix hang when calling functions in a module with an on_load attribute from a native module
Tuncer Ayaz
tuncer.ayaz@REDACTED
Mon Oct 4 14:53:47 CEST 2010
On Mon, Oct 4, 2010 at 2:28 PM, Mikael Pettersson wrote:
> Paul Guyot writes:
> > >> --- otp_src_R14B/lib/kernel/src/error_handler.erl.~1~ 2010-09-13
> > >> 19:00:22.000000000 +0200
> > >> +++ otp_src_R14B/lib/kernel/src/error_handler.erl 2010-09-24
> > >> 13:44:09.000000000 +0200
> > >> @@ -17,6 +17,7 @@
> > >> %% %CopyrightEnd%
> > >> %%
> > >> -module(error_handler).
> > >> +-compile(no_native).
> > >>
> > >> %% A simple error handler.
> > >
> > > Any objections to applying this fix to dev? Otherwise let's
> > > include it as a trivial workaround that makes enable-native-libs
> > > useable.
> >
> >
> > This is no trivial workaround and it works by sheer luck.
>
> The loops always involve a native-mode error_handler, so by
> eliminating that the loops are eliminated. That's not "by sheer
> luck".
>
> > What this bug reveals is that there is a major design flaw with
> > on_load and HiPE, as illustrated by the fact that the emulator
> > goes into an infinite loop if error_handler is natively compiled:
> > every time there is a remote call to a function in a module that
> > has an on_load attribute, the emulator goes through
> > error_handler:undefined_function! I think it doesn't loop forever
> > when error_handler is not native because it goes through the apply
> > BIF which works around the linkage issue.
>
> The short explanation is that BEAM-mode code doesn't loop in this
> case because BEAM has an additional level of indirection between
> caller and callee. The BEAM code loader updates the middle level
> (the code address in Export entries) which allows callers to
> immediately see the updates.
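To make the indirection argument concrete, here is a toy model in plain Erlang. It is not emulator code: the orddict merely stands in for the Export entries, funs stand in for code addresses, and the module name export_indirection is made up. The "direct" caller is an assumption added for contrast, showing what happens when a caller keeps the address it saw at link time.

```erlang
%% Toy model only: this is NOT emulator code. The orddict stands in
%% for the Export entries; funs stand in for code addresses.
-module(export_indirection).
-export([demo/0]).

demo() ->
    Key = {m, f, 0},
    Exports0 = orddict:store(Key, fun() -> old end, orddict:new()),
    %% Indirect caller: remembers only the key and looks up the
    %% current code address at every call.
    IndirectCall = fun(Exports) -> (orddict:fetch(Key, Exports))() end,
    %% Direct caller (assumed, for contrast): captured the address once.
    DirectTarget = orddict:fetch(Key, Exports0),
    %% The "loader" updates the entry in the middle level.
    Exports1 = orddict:store(Key, fun() -> new end, Exports0),
    %% The indirect caller sees the update, the direct one does not.
    {IndirectCall(Exports1), DirectTarget()}.
```

Calling export_indirection:demo() returns {new, old}: only the caller that goes through the middle level picks up the loader's update.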
>
> > A quick benchmark shows that (on dev, without --enable-native-libs)
> > calling a remote function in an on_load attribute currently takes
> > 100+ times longer from HiPE than from BEAM.
>
> Modules with on_load attributes are supposed to be prevented from
> being compiled to native code, so the "from HiPE" case shouldn't
> exist.
>
> If you meant to write that a call from native code in module N1 to
> BEAM code in module B2, where B2 also has an on_load, is slow, then
> yes I know that and it's because of the extra roundtrip through
> error_handler. In the long term my change to how native code links
> to BEAM code will fix that. In the short term you can work around it
> by splitting modules into on_load and non-on_load bits.
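For reference, a minimal sketch of the on_load half of such a split. The module names b2_setup and b2_core are hypothetical, and the hook does nothing here; the assumption is that the frequently called functions move to a second module without an on_load attribute, so that calls to them no longer take the error_handler roundtrip.

```erlang
%% b2_setup.erl (hypothetical name): the on_load half of the split.
%% The frequently called functions would live in a separate module,
%% e.g. b2_core (also hypothetical), that has no on_load attribute.
-module(b2_setup).
-export([ready/0]).
-on_load(init/0).

%% Runs once when the module is loaded and must return ok for the
%% load to succeed. A real module would do its one-time setup here,
%% for example a call to erlang:load_nif/2.
init() ->
    ok.

%% Trivial exported function so the module has a callable API.
ready() ->
    true.
```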
Speaking of short-term fixes, what would you prefer to see in the dev
branch?
S) split error_handler into on_load/non-on_load bits
D) assuming you see no drawback, add the no_native directive to
error_handler
N) neither of the above
> > The way things work for BEAM is that final_touch disables
> > export entries if there is an on_load handler, and
> > finish_after_on_load_2 BIF, which is called from
> > init/code_server, fixes them by using the saved address.
> > The same should be done for HiPE, but of course it's more
> > complicated because dynamic linking works differently in HiPE.
> >
> > I can only see three solutions:
> > - fix the pointers in each caller (which doesn't sound doable);
>
> Not doable in the short term.
>
> > - change the way dynamic linking works in HiPE as Mikael suggests;
> > - or add a small stub, as implemented in my patch -- this is
> > still three to five times slower than BEAM.
> >
> > Simply ignoring the problem will not make it go away.
>
> It's not being ignored.