[erlang-bugs] hipe crash with compiler modules

Wed Nov 4 09:18:18 CET 2009

Hello Mikael,

Thank you for your reply.

> 64-bit native code on OSX has not been validated by the HiPE group,
> so it is unsupported. 32-bit native code on OSX 10.5 seems to work,
> but has been only very lightly tested by us.

I have been using the patches from MacPorts
(http://trac.macports.org/browser/trunk/dports/lang/erlang/files/),
which I authored, so I realize they're not supported :)

>> The compiler modules have been recompiled with some code that goes
>> like this :
>>
>>             {_, Beam, Path} = code:get_object_code(Module),
>>             {ok, _, Chunks} = beam_lib:all_chunks(Beam),
>>             {ok, {Target, HipeBinary}} = hipe:compile(Module),
>>             ChunkName = hipe_unified_loader:chunk_name(Target),
>>             {ok, NewBeam} = beam_lib:build_module(Chunks ++
>>                 [{ChunkName, HipeBinary}]),
>
> The proper way to compile modules is to pass 'native' as
> an option to the BEAM compiler. I do not consider hipe:compile
> or hipe_unified_loader:chunk_name to be public APIs.
>
> So why do you do it in this awkward way?

These lines were inspired by what dialyzer does. My first goal was
to avoid repeating the 1 or 2 minutes dialyzer spends whenever it has
to process more than 20 modules and decides to natively recompile "key
modules" (by calling hipe:compile/1). It seems such a waste to
recompile those modules over and over, so I wrote some code that
recompiles them once and for all and saves the altered beam. These are
the 5 lines above, and indeed, I call hipe_unified_loader:chunk_name/1
to avoid hard-coding a constant there, so the code works on all
development and continuous integration machines. I did it this way
because it seemed easier than recompiling OTP modules in an OTP binary
deployment. Of course, I realize this doesn't use a public API.
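
As a minimal sketch of the chunk-appending part of those lines, the
module below uses only code and beam_lib calls and does not involve HiPE
at all. The module name chunk_sketch, the function add_chunk/3 and the
chunk name "Demo" are hypothetical and chosen for illustration; the real
code stores the HiPE binary under the name returned by
hipe_unified_loader:chunk_name/1.

```erlang
-module(chunk_sketch).
-export([add_chunk/3]).

%% Append one extra chunk to the object code of an already available
%% module and return {ok, NewBeamBinary}.
%% ChunkName must be a 4-character string (hypothetical name here).
add_chunk(Module, ChunkName, Data) when is_atom(Module), is_binary(Data) ->
    %% Fetch the .beam binary as found in the code path.
    {Module, Beam, _Path} = code:get_object_code(Module),
    %% Split it into its {ChunkName, ChunkData} pairs.
    {ok, Module, Chunks} = beam_lib:all_chunks(Beam),
    %% Rebuild a beam binary with the extra chunk at the end.
    beam_lib:build_module(Chunks ++ [{ChunkName, Data}]).
```

The returned binary could then be written to disk with
file:write_file/2, which is roughly what "saves the altered beam" refers
to, and the extra chunk can be read back with beam_lib:chunks/2.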

I thought I could natively recompile more modules than those selected  
by dialyzer. This is how I ended up recompiling all compiler modules.  
It seems useless to recompile several key OTP modules (e.g. lists)  
because they are loaded before HiPE is actually loaded, but compiler  
modules are a good target.

Everything went fine as long as the process consisted of running erlc
for each of our modules and then dialyzer. Then we moved to a new
toolchain that calls compile:file/2 and dialyzer from a single VM,  
with all calls to compile:file/2 going through rpc:pmap, and this is when
we started to observe those crashes.
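
For illustration, the single-VM pattern described above might look like
the sketch below. It assumes the call in question is the standard
rpc:pmap/3; the module name pmap_compile_sketch and the function
compile_all/2 are hypothetical, and this is not the actual toolchain
code.

```erlang
-module(pmap_compile_sketch).
-export([compile_all/2]).

%% Compile every file in Files concurrently from the current VM.
%% rpc:pmap/3 evaluates compile:file(File, Opts) once per element of
%% Files, in parallel, and returns the results in the order of Files.
compile_all(Files, Opts) when is_list(Files), is_list(Opts) ->
    rpc:pmap({compile, file}, [Opts], Files).
```

A call such as pmap_compile_sketch:compile_all(["a.erl", "b.erl"], [])
runs several instances of the compiler at once in the same VM, which is
the situation that differs from running erlc once per module.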

>> In this case, the code that gets compiled natively is just part of
>> OTP. Do you have any hint about what can be done to track down the
>> bug?
>
> There is a known problem with concurrent invocations of the HiPE
> compiler. It looks like the serialization of code loading that the
> BEAM loader is supposed to do isn't happening, or it is bypassed.
> This corrupts certain runtime system data structures causing crashes
> during GC. I'm currently trying to debug this problem.

Great. I was just asking how we could help fix this bug. I realize
a VM crash is high priority. We're not observing this crash in
production (since it's purely related to compiling), and we definitely
don't use unsupported HiPE patches, such as the ones for Mac OS X
10.6/64-bit, on production servers.
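
Since the known problem involves concurrent invocations, one possible
stopgap on the caller's side is to funnel the compile calls through a
single process so that they run one at a time. The sketch below is only
an assumption about what such a wrapper could look like: the module name
serial_sketch and the function run/1 are hypothetical, and whether this
actually avoids the crash has not been verified.

```erlang
-module(serial_sketch).
-behaviour(gen_server).
-export([start_link/0, run/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the single registered process that executes all jobs.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Run Fun inside the server process. Because a gen_server handles
%% one call at a time, concurrent callers are served sequentially.
run(Fun) when is_function(Fun, 0) ->
    gen_server:call(?MODULE, {run, Fun}, infinity).

init([]) ->
    {ok, nostate}.

handle_call({run, Fun}, _From, State) ->
    %% Catch failures so that one bad job does not kill the server.
    Reply = try {ok, Fun()}
            catch Class:Reason -> {error, {Class, Reason}}
            end,
    {reply, Reply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

After serial_sketch:start_link(), each worker would call something like
serial_sketch:run(fun() -> compile:file(File, Opts) end). This trades
the parallel speed-up for strictly sequential execution.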

Thanks again,

Paul
-- 
Semiocast                       http://titema.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris