[erlang-questions] Max heap size

Thu Apr 28 10:31:45 CEST 2016

On Thu, Apr 28, 2016 at 1:27 AM, Richard A. O'Keefe <ok@REDACTED>
wrote:

>
>
> On 27/04/16 7:12 PM, Lukas Larsson wrote:
>
>>
>> https://github.com/erlang/otp/pull/1032
>>
>
> Where is the documentation?
>

The documentation is part of the pull request, the options and behavior are
described here:
https://github.com/erlang/otp/pull/1032/commits/629c6c0a4aea094bea43a74ca1c1664ec1041e43#diff-0fc9fb0d3e12721dd1574a543916e8c6R4349

>    - what are the units in which 'size' is measured?
>     It would be *very* nasty if a program that worked fine in a 32-bit
>     environment stopped working in a 64-bit environment because the
>     size was in bytes.
>
>
It is the same unit as min_heap_size and all the other options that affect
the process heap: the internal word size,
erlang:system_info({wordsize,internal}).
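
To make the unit concrete, here is a minimal sketch that converts a limit
given in words into bytes on the running emulator. The module and function
names are made up for illustration; only erlang:system_info/1 is a real call.

```erlang
-module(heap_units).
-export([words_to_bytes/1]).

%% Convert a heap size given in words into bytes for the running emulator.
%% The internal word size is 4 on a 32-bit emulator and 8 on a 64-bit one.
words_to_bytes(Words) when is_integer(Words), Words >= 0 ->
    Words * erlang:system_info({wordsize, internal}).
```

Because the limit is counted in words, the same size value allows the same
number of terms on 32-bit and 64-bit; only the byte footprint differs.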

>   - are size, kill, and error_logger the only suboptions?
>
>
At the moment, yes.
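
As a sketch of how the three suboptions might be passed when spawning a
process, assuming the map form described in the pull request's documentation
is what ends up being released. The module name, the size value, and the
trivial receive loop are only placeholders.

```erlang
-module(heap_opts).
-export([start/0]).

%% Spawn a process with all three suboptions set explicitly.
%% size is in words; 1000000 is an arbitrary example value.
start() ->
    MaxHeap = #{size => 1000000,      % limit in words
                kill => true,         % kill the process when the limit is hit
                error_logger => true  % also report it to the error logger
               },
    spawn_opt(fun() -> receive stop -> ok end end,
              [{max_heap_size, MaxHeap}]).
```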

>   - what is the performance cost of this?
>

If not enabled, it costs one branch per GC. If enabled, most of the
calculations have to be done anyway in order to allocate the new to-space
for the collector, so not many extra calculations are needed there either. It
also costs one machine word of memory per process.

>    (Presumably it gets checked only after a garbage collection,
>    prior to increasing the size of a process.)
>

It gets checked after what may be called the initialization phase of the
GC. The first thing the GC does is to calculate how large a to-space is
needed for the GC to do its job. After this calculation is done, the new
code checks whether the total heap size during collection will exceed the
max heap size. If it does, the appropriate action is taken before the
collector starts. If that action is that the process should be killed, the
collection does not start at all.
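
The order of events above can be modelled roughly as follows. This is a toy
model in Erlang, not the emulator's C code: the function, its arguments, and
the returned atoms are all hypothetical, and treating "total heap size during
collection" as current heap plus to-space is a simplifying assumption.

```erlang
-module(gc_check_model).
-export([check/3]).

%% HeapWords:    words used by the process heap before the collection.
%% ToSpaceWords: size of the to-space the collector calculated it needs.
%% Opts:         the configured limit; size 0 is taken to mean "no limit".
check(_HeapWords, _ToSpaceWords, #{size := 0}) ->
    collect;
check(HeapWords, ToSpaceWords, #{size := Max, kill := Kill})
  when HeapWords + ToSpaceWords > Max ->
    case Kill of
        true  -> kill_without_collecting;  % the collection never starts
        false -> collect_after_action      % e.g. log, then collect as usual
    end;
check(_HeapWords, _ToSpaceWords, _Opts) ->
    collect.
```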

>   - I get twitchy when a parameter that can result in the death of
>    a process is defined as the sum of something that *is* under
>    the process's control and something that is *not*, and further,
>    how much of that uncontrolled stuff is counted is determined
>    by yet another flag that wasn't there in 18.2.  As it is, the
>    effect is that a process can be killed because *another* process
>    is doing something bad.  What, if anything, can be done to
>    prevent that needs to be explained.
>
>
Yes indeed. This is one of the reasons that I'm skeptical about the
usefulness of the option. It can be used to effectively protect against
heap growth caused by bad code in the process, for instance doing
binary_to_term on something unexpectedly large that someone on the internet
sent you. It may catch some cases when the message queue grows huge, but if
you have processes that may grow to have huge message queues you probably
want to use the new `off_heap` message queue data option anyway. With that
option the messages in the queue are guaranteed not to be part of the
heap, which in turn means that they are not counted towards the
max_heap_size.
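
A minimal sketch of combining the two options, so that queued messages are
kept outside the heap and only the process's own data counts towards the
limit. It assumes both options are accepted by spawn_opt in the form shown;
the module name, the limit, and the loop are placeholders.

```erlang
-module(queue_offheap).
-export([start/0]).

%% Spawn a receiver whose message queue is stored off-heap and whose
%% heap is limited to an arbitrary example value of 1000000 words.
start() ->
    spawn_opt(fun loop/0,
              [{message_queue_data, off_heap},
               {max_heap_size, 1000000}]).

%% Discard everything except the stop request.
loop() ->
    receive
        stop -> ok;
        _Msg -> loop()
    end.
```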

>  - Guidelines about how to choose sensible sizes would be valuable.
>    No, wait. They are *indispensable*.
>
> Of *course* this is useful, but it's starting to smell like
> pthread_attr_setstacksize(), where there is *no* way, given even
> complete knowledge of the intended behaviour of the code and
> sensible bounds on the amount of data to be processed, that
> you can determine a *safe* stack size by anything other than
> experiment.  You are entirely at the mercy of the implementation,
> and the C implementation has no mercy.
>
>
The main reason that setstacksize is so hard to use is that you don't want
to put the limit too high, as you would then waste that memory. So you want
to put it as close as possible to your actual max stack size, but have to
make very sure that you don't give too little. The analogy to ulimit seems
more appropriate, as you can put the limit well above (one or two orders of
magnitude) what you expect the process to use and still catch it before the
VM is brought down due to out of memory.
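
One way to apply that rule of thumb is to measure a process under a
representative load and multiply. This is only a sketch: the module and
function names are invented, and a single sample of total_heap_size is of
course not a guarantee of the process's real peak.

```erlang
-module(heap_limit).
-export([suggest/2]).

%% Suggest a max_heap_size in words: Factor times the total heap size
%% currently observed for Pid. Use a Factor of 10 to 100 for "one or two
%% orders of magnitude". Fails with badmatch if Pid is no longer alive.
suggest(Pid, Factor) when is_pid(Pid), is_integer(Factor), Factor >= 1 ->
    {total_heap_size, Words} = erlang:process_info(Pid, total_heap_size),
    Words * Factor.
```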

> I'd personally be happier with something like ulimit(), where there
> is a hard limit (the one where you kill the process) and a soft
> limit (where you raise() an exception in the process to let it know
> there's going to be a problem soon).
>
>
We talked quite a lot about having the option of raising an exception when
the max heap size is reached, but decided against it as it would mean that
all the code you write and all the libraries you use have to expect the
exception. Any old code that has a catch-all would catch the max_heap_size
exception and possibly hide that the error happened. The semantics also
became very convoluted once we started looking at the details of how such
an exception might work.
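
To illustrate the catch-all problem: the wrapper below is the kind of
defensive code that already exists in many libraries. The module is made up,
and erlang:error(max_heap_size) is only a stand-in for the exception that was
discussed; no such exception is raised by the emulator.

```erlang
-module(catch_all_demo).
-export([run/1]).

%% A typical catch-all: every exception is turned into the same
%% generic error tuple, so the caller cannot tell what went wrong.
run(Fun) ->
    try Fun() of
        Result -> {ok, Result}
    catch
        _:_ -> {error, failed}
    end.
```

Calling catch_all_demo:run(fun() -> erlang:error(max_heap_size) end) returns
{error, failed}, and the fact that the heap limit was hit is lost.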

I'm unsure how useful a soft limit that sends a message would be. It
is however something that we may add in the future if a good use case for
it is presented.

Lukas