[erlang-questions] Non-blocking BEAM code loading?

Tino Breddin tino.breddin@REDACTED
Mon Nov 7 09:48:31 CET 2011


As Paolo mentioned, there is an optimization of the code upgrade strategy
on the roadmap for R15B, AFAIR. Currently, not only is the actual code
upgrade performed by a single core, but so is every other task in the
system, meaning that on a heavily loaded multi-core system all load must
be handled by that single core for the duration of the upgrade. This
might cause the delay you are seeing.

T

On Nov 6, 2011, at 4:33 PM, Paolo Negri wrote:

> We run an application with thousands of long-lived processes, and we
> see the system blocking on code purge during code updates.
> I remember that Kenneth Lundin announced at the recent Erlang User
> Conference that a code-loading optimization is on the Erlang roadmap;
> hopefully the slides will be published soon [1]. If I remember
> correctly, the change was about spreading the code purge across all
> the available cores, whereas currently a single core performs the
> whole operation.
> 
> We also use the trick of compiling data into modules in order to push
> it into the constant pool, but we have thousands of small terms
> (rendered as one function clause per term), and loading these modules
> doesn't seem to block. In our case, though, I guess the overall size
> is much less than 60MB.
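> 
> For illustration, a minimal sketch of that trick (the module and the
> terms here are made up): each term becomes a literal in one function
> clause, so it ends up in the module's constant pool and is shared
> read-only by all processes instead of being copied onto their heaps:
> 
>     -module(static_terms).   %% hypothetical generated module
>     -export([lookup/1]).
> 
>     %% One clause per term; the tuples are literals in the module's
>     %% constant pool, so lookups return them without copying.
>     lookup(user_1) -> {user_1, <<"alice">>, 42};
>     lookup(user_2) -> {user_2, <<"bob">>, 17};
>     lookup(_)      -> undefined.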
> 
> [1] http://www.erlang-factory.com/conference/ErlangUserConference2011/speakers/KennethLundin
> 
> Paolo
> 
> On Sun, Nov 6, 2011 at 5:02 AM, Bob Ippolito <bob@REDACTED> wrote:
>> Normally just a few hundred; purge isn't the slow part for us, and I
>> don't believe that it blocks at all (not that I've noticed).
>> 
>> On Saturday, November 5, 2011, Robert Virding
>> <robert.virding@REDACTED> wrote:
>>> If you have many processes then code loading can take a noticeable
>>> time. The code server must purge old versions of a module, which it
>>> does by going through all processes, checking whether each one is
>>> running the old code and, if so, killing it. I don't know whether this
>>> blocks all the schedulers, or why it would, but it can take a
>>> noticeable time.
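>>>
>>> Roughly, the walk described above looks like this (a simplified
>>> sketch with a made-up function name; the real code server also
>>> supports a soft purge, which reports failure instead of killing):
>>>
>>>     purge_old_code(Module) ->
>>>         %% Kill every process still executing the module's old code...
>>>         [exit(Pid, kill) || Pid <- erlang:processes(),
>>>                             erlang:check_process_code(Pid, Module)],
>>>         %% ...then drop the old code from the system.
>>>         erlang:purge_module(Module).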
>>> 
>>> Robert
>>> 
>>> 
>>> ________________________________
>>> 
>>> ETS is no good for our use case: we have ~60MB worth of uncompressed
>>> serialized terms (nested gb_trees mostly) that we need live during a
>>> given request. We traverse it very quickly and end up with a very
>>> small list of terms as the result (essentially a filter on a nested
>>> structure). A no-copy ETS would work, but since the work is so
>>> short-lived and the code is tightly coupled to this structure, I think
>>> our current solution is appropriate as long as we can fix the
>>> blocking.
>>> 
>>> "declare constant" may also work, but I think it is more practical to just
>>> make code loading better in the short term (which has other benefits). You
>>> could implement "declare constant" on top of the code loader, we have a
>>> mochiglobal module in mochiweb that basically serves that purpose.
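>>>
>>> A sketch of "declare constant" built on top of the code loader, in
>>> the spirit of what mochiglobal does (the function name is made up):
>>> compile the term into a generated module at runtime, so it becomes a
>>> shared literal:
>>>
>>>     %% Store Term as the body of Mod:value/0 and load the module.
>>>     put_constant(Mod, Term) ->
>>>         Forms = [{attribute, 1, module, Mod},
>>>                  {attribute, 2, export, [{value, 0}]},
>>>                  {function, 3, value, 0,
>>>                   [{clause, 3, [], [], [erl_parse:abstract(Term)]}]}],
>>>         {ok, Mod, Bin} = compile:forms(Forms),
>>>         code:purge(Mod),
>>>         {module, Mod} = code:load_binary(Mod, atom_to_list(Mod), Bin),
>>>         ok.
>>>
>>> After put_constant(my_config, BigTerm), any process can call
>>> my_config:value() and read the term without copying it.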
>>> 
>>> Using a module is a convenient way to give concurrent access to the data
>>> to hundreds of simultaneous processes with minimal serialization.
>>> 
>>> -bob
>>> 
>>> On Saturday, November 5, 2011, Björn-Egil Dahlberg
>>> <wallentin.dahlberg@REDACTED> wrote:
>>>> Yes, it is a simple (and currently the only) way to push data into
>>>> the constant pool. You could use ETS instead; that would also remove
>>>> the data from the heap and reduce GC copy strain, but it introduces a
>>>> copy on every read.
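>>>>
>>>> For comparison, the ETS route looks roughly like this (table and key
>>>> names are made up); note that every lookup copies the stored term
>>>> onto the calling process's heap:
>>>>
>>>>     Tab = ets:new(lookup_tab, [set, protected,
>>>>                                {read_concurrency, true}]),
>>>>     true = ets:insert(Tab, {big_term, BigTerm}),
>>>>     %% Copy is a fresh heap copy of BigTerm, made on every lookup.
>>>>     [{big_term, Copy}] = ets:lookup(Tab, big_term).
>>>>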
>>>> Björn Gustavsson talked about introducing a "declare constant"
>>>> function earlier, but I don't know if he has done any work on it. The
>>>> use case was the same as yours: pushing lookup structures built from
>>>> gb_trees and gb_sets. But solving code loading would probably be a
>>>> better priority.
>>>> I would like to think that the garbage collector should solve this:
>>>> data sets which are read-only and live would be tenured to a
>>>> generational heap and not included in minor GC phases. Putting the
>>>> data in a constant removes it from GC altogether, of course, but I
>>>> would like the garbage collector to identify and handle this case
>>>> with generational strategies. The trade-off is that generational
>>>> heaps linger and may hold dead data longer than necessary.
>>>> 
>>>> 
>>>> On 5 November 2011 21:30, Bob Ippolito <bob@REDACTED> wrote:
>>>>> 
>>>>> We abuse code-loading "upgrades" so that we can share memory and
>>>>> reduce GC pressure for large data structures that change slowly
>>>>> (once every few minutes). Works great, except for all the blocking!
>>>>> 
>>>>> On Saturday, November 5, 2011, Björn-Egil Dahlberg
>>>>> <wallentin.dahlberg@REDACTED> wrote:
>>>>>> There is no locking for code loading other than blocking. This is
>>>>>> an optimization, of course, since locking-mechanism overhead is
>>>>>> removed from the equation. Code loading is not used all that often
>>>>>> in normal operation, besides startups and upgrades.
>>>>>> That being said, there are plans to remove this "stop-the-world"
>>>>>> strategy, since it is blocking other strategies and optimizations.
>>>>>> Also, we are well aware that blocking degrades performance when
>>>>>> loading new modules and does not agree with our concurrency policy.
>>>>>> I think we can lessen the time spent blocked in the current
>>>>>> implementation, but the blocking strategy should (and probably
>>>>>> will) be removed. Nothing is planned as of yet, though.
>>>>>> Regards,
>>>>>> Björn-Egil
>>>>>> Regards,
>>>>>> Björn-Egil
>>>>>> 
>>>>>> 2011/11/5 Bob Ippolito <bob@REDACTED>
>>>>>>> 
>>>>>>> We've found a bottleneck in some of our systems: when we load in
>>>>>>> large new modules there is a noticeable pause (1+ seconds) that
>>>>>>> blocks all of the schedulers. It looks like this is because the
>>>>>>> erlang:load_binary/2 BIF blocks SMP before it does anything at
>>>>>>> all.
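>>>>>>>
>>>>>>> For illustration, one way to observe such a pause (the module,
>>>>>>> forms, and file name here are hypothetical): time the load of a
>>>>>>> module carrying a large literal while the schedulers are otherwise
>>>>>>> busy:
>>>>>>>
>>>>>>>     %% Forms is abstract code embedding the large literal term.
>>>>>>>     {ok, Mod, Bin} = compile:forms(Forms),
>>>>>>>     {Micros, {module, Mod}} =
>>>>>>>         timer:tc(code, load_binary, [Mod, "big_lits.beam", Bin]),
>>>>>>>     io:format("load took ~p ms~n", [Micros div 1000]).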
>>>>>>> 
>>>>>>> It would be a big win for us if more of this happened without
>>>>>>> blocking the VM; there's a lot of busy work in loading a module
>>>>>>> that shouldn't need any locking. For example, decompressing and
>>>>>>> decoding the literal table is probably where our code spends
>>>>>>> almost all of its time.
>>>>>>>
>>>>>>> There aren't a lot of comments about why it needs to lock the VM,
>>>>>>> especially for the whole of load_binary. Are there any hidden
>>>>>>> gotchas in here that I should know about before giving it a try?
>>>>>>> I'm unable to find much where the block is actually necessary, but
>>>>>>> I am not very
>>> 
> 
> 
> 
> -- 
> Engineering
> http://www.wooga.com | phone +49-30-8962 5058  | fax +49-30-8964 9064
> 
> wooga GmbH | Saarbruecker Str. 38 | 10405 Berlin | Germany
> Sitz der Gesellschaft: Berlin; HRB 117846 B
> Registergericht Berlin-Charlottenburg
> Geschaeftsfuehrung: Jens Begemann, Philipp Moeser



