I haven't reviewed the code in detail, but if it improves performance and is correct, it is something we would like to have. I will let Björn know (I think this was handed to him earlier this week). He has a lot of other things on his plate at the moment as well, though. He was thinking of separating them into multiple BIFs.<div>
<br></div><div>Regarding the ~1s delays: I assume this is under load. It is probably because we still need to wait for the other schedulers, which can take up to 2000 (or was it 4000?) reductions. Other factors that increase this delay might be a GC starting at the wrong moment (one scheduler just began one before the block), NIFs, or something else that doesn't take reductions into account. </div>
<div><br></div><div>Ultimately, we probably want to separate code loading completely, but that would come after the R15 release. </div><div><br></div><div>// Björn-Egil</div><div><br><br><div class="gmail_quote">On 8 November 2011 at 18:59, Bob Ippolito <span dir="ltr"><<a href="mailto:bob@redivi.com">bob@redivi.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Would it help if we wrote some patches to make more of this work<br>
happen before the block? Ideally there would be no block at all, but I<br>
don't know enough about the internals to really make that happen.<br>
<br>
Here's a branch (from R14B04) where I've moved most of the heavy<br>
lifting (especially decoding the literal table) to before the block:<br>
<br>
(tree)<br>
<a href="https://github.com/etrepum/otp/tree/async-load_module-R14B04-20872" target="_blank">https://github.com/etrepum/otp/tree/async-load_module-R14B04-20872</a><br>
<br>
(diff)<br>
<a href="https://github.com/etrepum/otp/compare/OTP_R14B04...async-load_module-R14B04-20872" target="_blank">https://github.com/etrepum/otp/compare/OTP_R14B04...async-load_module-R14B04-20872</a><br>
<br>
Most of the work is just putting the LoaderState on the heap instead<br>
of the stack so it can be made part of the API without moving all of<br>
the types out as well.<br>
<br>
This helps in some tests I've done, but it seems that merely blocking<br>
and unblocking can introduce a significant pause (~1 sec) under some<br>
circumstances.<br>
<br>
2011/11/5 Björn-Egil Dahlberg <<a href="mailto:wallentin.dahlberg@gmail.com">wallentin.dahlberg@gmail.com</a>>:<br>
<div class="HOEnZb"><div class="h5">> There is no locking for code loading other than blocking. This is, of<br>
> course, an optimization, since locking overhead is removed from the<br>
> equation. Code loading is not used all that often in the normal case,<br>
> apart from startup and upgrades.<br>
> That being said, there are plans to remove this "stop-the-world" strategy,<br>
> since it is blocking other strategies and optimizations. Also, we are well<br>
> aware that blocking degrades performance when loading new modules and<br>
> does not agree with our concurrency policy.<br>
> I think we can lessen the time blocked in the current implementation but the<br>
> blocking strategy should (and probably will) be removed. Nothing planned as<br>
> of yet though.<br>
> Regards,<br>
> Björn-Egil<br>
><br>
> 2011/11/5 Bob Ippolito <<a href="mailto:bob@redivi.com">bob@redivi.com</a>><br>
>><br>
>> We've found a bottleneck in some of our systems: when we load large<br>
>> new modules, there is a noticeable pause (1+ seconds) that blocks all<br>
>> of the schedulers. It looks like this is because the<br>
>> erlang:load_binary/2 BIF blocks SMP before it does anything at all.<br>
>><br>
>> It would be a big win for us if more of this happened without blocking<br>
>> the VM; there's a lot of busy work in loading a module that shouldn't<br>
>> need any locking. For example, decompressing and decoding the literal<br>
>> table is probably where our code spends almost all of its time.<br>
>><br>
>> There aren't many comments explaining why it needs to lock the VM,<br>
>> especially for the whole of load_binary. Are there any hidden gotchas<br>
>> in here that I should know about before giving it a try? I'm unable to<br>
>> find many places where the block is actually necessary, but I am not<br>
>> very familiar with the BEAM implementation yet.<br>
>><br>
>> I expect that the erts_export_consolidate, insert_new_code and<br>
>> final_touch are really the only things that need so much<br>
>> serialization, and maybe the set_default_trace_pattern… is there<br>
>> anything big that I'm missing? It seems that breaking up<br>
>> erts_load_module into two functions (one to do all the decoding<br>
>> without the erts_smp_block_system(0), and the other to do the<br>
>> integration work with the block) would be straightforward.<br>
>><br>
>> -bob<br>
>> _______________________________________________<br>
>> erlang-questions mailing list<br>
>> <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
>> <a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
><br>
><br>
</div></div></blockquote></div><br></div>