binaries vs lists
Raimo Niskanen
raimo@REDACTED
Wed Nov 12 10:13:01 CET 2003
About stack space vs heap space.
There is always room on the stack in Erlang, at least compared to the
heap. They lie in the same memory block, grow towards each other, and
when they collide the garbage collector gets them a larger memory block
to grow in. So the only way to run out of stack or heap space is that
the virtual machine runs out of memory. One single Erlang process can
not run out of memory alone.
If you have a large stack, it might slow down the garbage collector. It
it rarely a problem, but I think this is why:
When garbage collecting, the garbage collector takes a new clean memory
block and copies all live data from the old one to the new. It finds all
live data by starting from the root set, and the root set is chiefly the
stack. The grabage collector scans the stack linearily and when it finds
a term that is complex, i.e lies on the heap, it traverses the term and
copies it over to the new heap. So, if there is much data on the stack
that is not complex terms, for example integers and atoms, there is much
of the stack that is scanned in vain - wasting time.
An old truth in the Erlang community is that you always should build a
list reversed and then reverse it at the end - lists:reverse() is a BIF,
it is fast.
Benchmarks I have made indicates that this truth is not quite as true
nowdays. Building the list on the stack often wins. Use the solution
that gives most readable code, and change if it becomes a problem. There
are certainly extreme cases that are bad for any strategy.
Some warnings then.
Appending to a list is expensive. The whole list must be traversed to
find the tail. If you do this in a tight loop it is absolutely better to
build the list backwards and reverse at the end. You will go from
quadratic time complexity down to linear.
Concatenating binaries is really expensive. To create the result binary
all data must be copied. If you append data to a binary by concatenating
it is absolutely better to build a list of bytes (it can also contain
sublists or binaries to any depth) and do list_to_binary/1 at the end.
This is also an improvement from quadratic to linear time complexity (at
least!).
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
Bengt Kleberg wrote:
> Serge Aleynikov wrote:
>
>> Bengt Kleberg wrote:
>>
>>> Serge Aleynikov wrote:
>>>
> ...deleted
>
>>>> I have an Erlang TCP client that processes a binary stream which
>>>> needs to be post-processed by removing escaped bytes. Let's say,
>>>> that byte 16$FF is escaped as <<16#FE, 16$01>>, and the 16#FF value
>>>> is used as a message separator. The variable hex messages are
>>>> within 512 bytes each.
>>>>
>>>
>>> if i understood the 512 bytes statement correctly, it is ok to build
>>> the list on the stack, thus avoiding lists:reverse/1 in solution 1.
>>
>>
>>
>> Do I have a control over where the list is being built (heap/stack)?
>> Or this is purely dependent on the list size? If so, are you implying
>> that I should append items at the end instead of doing it in front,
>> and using lists:reverse/1 ?
>>
> if you build a list in an accumulator this is (usually :-) done using
> heap space:
> fn([], Acc) ->
> lists:reverse(Acc);
> fn([H | T], Acc) ->
> fn(T, [H | Acc]).
>
>
> if you build a list using 'cons' this is ''usually'' done using stack
> space:
> fn([]) ->
> [];
> fn([H | T]) ->
> [H | fn(T)]).
>
> you can crash your process if it uses too much space. stack space is
> usually smaller than heap space. however, in this case you seemed to say
> that there was a maximum of 512 bytes to a list. which usually is
> available on your stack.
>
>
> bengt
>
More information about the erlang-questions
mailing list