Author:
Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
Status:
Draft
Type:
Standards Track
Created:
07-Feb-2013
Erlang-Version:
OTP_R16A

EEP 42: setrlimit(2) analogue for Erlang

Abstract

A new function erlang:set_process_info_limit/2 is added, allowing a process to set limits on its own memory use.

Specification

erlang:set_process_info_limit(Item, Limit :: integer()) ->
    Old_Limit :: integer()

An Old_Limit of 0 means that no limit has been set. A Limit of 0 means that no limit is to be set.
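The return convention can be illustrated without any runtime support. The following is a minimal sketch, not the proposed BIF: `set_limit/2` is a hypothetical stand-in that only records the limit in the process dictionary and enforces nothing, and it leaves out memory_words for brevity.

```erlang
-module(limit_table_sketch).
-export([set_limit/2]).

%% Hypothetical stand-in for erlang:set_process_info_limit/2.
%% It records the limit in the caller's process dictionary and
%% returns the previous limit; it does NOT enforce anything.
set_limit(Item, Limit)
  when (Item =:= memory orelse Item =:= message_queue_len),
       is_integer(Limit), Limit >= 0 ->
    Old = case get({limit, Item}) of
              undefined -> 0;       % 0 means "no limit has been set"
              Value     -> Value
          end,
    case Limit of
        0 -> erase({limit, Item});  % 0 means "no limit is to be set"
        _ -> put({limit, Item}, Limit)
    end,
    Old.
```

Because the old limit is returned, a caller could tighten a limit for one phase of its work and restore the previous value afterwards.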

The Items that can be set are

  • memory, the number of bytes that may be used for stack, heap, and internal structures.

  • memory_words, the same value as memory, but expressed in words to be consistent with heap_size, total_heap_size, and stack_size in erlang:process_info/[1,2]. Those functions should also be revised to accept this Item.

  • message_queue_len, the number of unprocessed messages. (Aside.) The documentation refers to message_queue_len but the system generates and recognises message_queue_len. (End aside.)
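The relationship between memory and memory_words can be sketched with existing calls. The module and function names below are illustrative assumptions. The sketch assumes the word in question is the Erlang term word reported by `erlang:system_info(wordsize)`, and it rounds down when converting bytes to words, which is the conservative direction for a limit.

```erlang
-module(memory_words_sketch).
-export([bytes_to_words/1, words_to_bytes/1, memory_words/1]).

%% Size of one word in bytes, as reported by the running emulator.
word_size() ->
    erlang:system_info(wordsize).

words_to_bytes(Words) when is_integer(Words), Words >= 0 ->
    Words * word_size().

%% Rounds down, so a converted limit is never looser than the original.
bytes_to_words(Bytes) when is_integer(Bytes), Bytes >= 0 ->
    Bytes div word_size().

%% Current memory use of Pid in words, or undefined if Pid is not alive.
memory_words(Pid) ->
    case erlang:process_info(Pid, memory) of
        {memory, Bytes} -> bytes_to_words(Bytes);
        undefined       -> undefined
    end.
```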

The value actually set may be less than Limit, provided that nothing physically realisable in the node as configured could exceed the value used.

A non-zero memory limit is checked after each garbage collection or other memory restructuring; setting the memory limit to a non-zero value less than the current process_info(self(), 'memory') forces an immediate garbage collection.

If the memory required by a process exceeds its limit, the process is exited with reason memory.
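The memory rule can be approximated from inside a process with existing calls. This is a sketch under stated assumptions, not the proposed mechanism: `check_memory/1` is a hypothetical helper that the process must call itself, whereas the proposal has the runtime perform the check after every garbage collection.

```erlang
-module(memory_check_sketch).
-export([check_memory/1]).

%% Limit is in bytes; 0 means "no limit", as in the specification.
check_memory(0) ->
    ok;
check_memory(Limit) when is_integer(Limit), Limit > 0 ->
    case erlang:process_info(self(), memory) of
        {memory, Before} when Before =< Limit ->
            ok;
        {memory, _Before} ->
            %% Over the limit: collect garbage first, then look again.
            true = erlang:garbage_collect(),
            case erlang:process_info(self(), memory) of
                {memory, Now} when Now > Limit -> exit(memory);
                {memory, _Now}                 -> ok
            end
    end.
```

Calling `check_memory/1` at the top of a server loop would give a coarse version of the behaviour described above, at the cost of one `process_info/2` call per iteration.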

A non-zero message queue length limit is checked whenever the message queue length is about to be incremented, or when such a limit is set.

If the message queue length limit is exceeded, the process is exited with reason message_queue_len.
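Until runtime support exists, both checks can be roughly emulated by a separate watchdog process. The sketch below is an approximation with assumed names (`watch/3` and its argument format are not part of the proposal). It polls on a timer instead of checking at the exact moment a limit is crossed, and it uses `exit/2`, so a target that traps exits receives a message instead of terminating.

```erlang
-module(limit_watchdog_sketch).
-export([watch/3]).

%% Spawn a watchdog that polls Pid every IntervalMs milliseconds.
%% Limits is a list such as [{memory, Bytes}, {message_queue_len, N}];
%% a limit of 0 means "no limit", as in the specification.
watch(Pid, Limits, IntervalMs) ->
    spawn(fun() -> loop(Pid, Limits, IntervalMs) end).

loop(Pid, Limits, IntervalMs) ->
    case exceeded(Pid, Limits) of
        dead ->
            ok;                         % target is gone, stop watching
        none ->
            timer:sleep(IntervalMs),
            loop(Pid, Limits, IntervalMs);
        Item ->
            exit(Pid, Item)             % exit reason is the limit's name
    end.

%% Returns the first Item whose limit is exceeded, none, or dead.
exceeded(_Pid, []) ->
    none;
exceeded(Pid, [{_Item, 0} | Rest]) ->
    exceeded(Pid, Rest);
exceeded(Pid, [{Item, Limit} | Rest]) ->
    case erlang:process_info(Pid, Item) of
        undefined                      -> dead;
        {Item, Used} when Used > Limit -> Item;
        {Item, _Used}                  -> exceeded(Pid, Rest)
    end.
```

A short polling interval narrows the window in which the target can overshoot, but it cannot close that window, which is one reason the proposal asks for checks inside the runtime system.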

Motivation

It is currently possible for an Erlang process to grab memory without limit and eventually take down the whole node. This problem is frequently reported on the Erlang mailing list.

It is also possible for the message queue of an Erlang process to grow without limit. This problem too is reported often enough to be recognised.

This is an EEP rather than a library change request because it requires low level runtime system changes to support it. For example, the limits have to be stored somewhere, suggesting a change in the data structure holding information about a process. The limits have to be checked, meaning that changes to the garbage collector and the message sending core are required. The limits have to be enforced, meaning that two new exit reasons have to be supported. And the new exit reasons and the new function have to be documented, requiring changes to more than one document.

The need is long-standing, and at least the presence of this EEP may provoke discussion leading to some resolution.

Rationale

The function that sets a limit should have a name based on the name of the function that reports current use. That function is process_info, hence the name I’ve chosen here is set_process_info_limit.

The erlang:process_info/[1,2] functions can report on any Erlang process. But it is clearly dangerous to let a process set another process’s limits. All we actually need is for a process to be able to set its own limits in a startup phase. That could be done by adding {'memory',Size} and {'message_queue_len',Count} options to spawn_opt/[2,3]. However,

spawn_opt(Fun, [{memory,128*1024}])

can be mimicked by

spawn(fun () ->
    erlang:set_process_info_limit(memory, 128*1024),
    Fun()
end)

and set_process_info_limit/2 allows a process to set different limits at different times. If you are trying to protect the system from malice, setting limits in spawn_opt/[2,3] is the way to go. If you are trying to protect the system from accident, letting a process set its own limits is the way to go.

The names for the Items to be set clearly must be identical to the names used for the same thing elsewhere.

The memory_words item is introduced because in a world where some of my programs run 32-bit and some run 64-bit it is just too confusing to count bytes, especially as stack size and so on are measured in words, not bytes.

I considered allowing total_heap_size and stack_size and so on to be given their own limits, but on finding that total_heap_size “currently includes the stack of the process”, decided that it was “currently” a bad idea.

I would have liked to allow setting a limit on the number of reductions, so that a process that doesn’t intend to live forever can ensure its own eventual death. However, the documentation warns that erlang:bump_reductions/1 “might be removed”, so presumably reduction counting per se might well disappear.

The point of letting the value actually set for a limit be smaller than the one requested is to allow an implementation to use only values that will fit in a size_t, so that low level code does not need to deal with bignums. On a POSIX-compliant operating system, an Erlang implementation may use the RLIMIT_DATA value for the UNIX process if the memory limit is bigger, and may use RLIMIT_DATA divided by the minimum size of a message for the message queue length. Or it might use other appropriate system limits.
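The clamping step might look like the following sketch. It assumes that size_t is as wide as a pointer in the running emulator, which is what `erlang:system_info({wordsize, external})` reports. The module and function names are illustrative only, and the RLIMIT_DATA variant is not shown because it needs operating system support.

```erlang
-module(limit_clamp_sketch).
-export([clamp/1, max_size_t/0]).

%% Assumption: size_t has the same width as a pointer in this emulator.
max_size_t() ->
    (1 bsl (8 * erlang:system_info({wordsize, external}))) - 1.

%% 0 keeps its special meaning of "no limit"; anything larger than a
%% size_t can hold is reduced to the largest representable value.
clamp(0) ->
    0;
clamp(Limit) when is_integer(Limit), Limit > 0 ->
    min(Limit, max_size_t()).
```

Since no process on such a node can use more than `max_size_t()` bytes, the clamped value is indistinguishable from the requested one as far as enforcement is concerned.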

The biggest question is “what happens if a limit is exceeded?”

For memory, we could exit with system_limit as reason, but that wouldn’t make clear what limit had been exceeded. It seems advisable to introduce a new reason, and making the reason the same as the name of the limit seems the least confusing approach.

For message queue length, I would prefer it if the process that sends the message were the one to get a run time error. However, the Erlang documentation guarantees that sending a message to a pid always succeeds, whether the process is live or dead, and I don’t want to change too much. A message queue might build up because one or more processes are sending junk; we would prefer exiting junk senders to exiting junk receivers. However, if the junk receiver isn’t cleaning out the junk, that junk is never going away, so it’s always going to be costing time to skip over as well as memory to store. That argument applies to another option that I considered: just discarding messages that would make the queue too long. The Erlang Way is to let subsystems that get into trouble crash. So let it be.

Backwards Compatibility

The new function is not auto-imported, so it must be called with the erlang: module prefix and cannot be called accidentally.

Reference Implementation

None in this draft.

Copyright

This document has been placed in the public domain.