binaries vs lists

Wed Nov 12 10:13:01 CET 2003

About stack space vs heap space.

There is always room on the stack in Erlang, at least compared to the 
heap. They lie in the same memory block, grow towards each other, and 
when they collide the garbage collector gets them a larger memory block 
to grow in. So the only way to run out of stack or heap space is that 
the virtual machine runs out of memory. One single Erlang process can 
not run out of memory alone.

If you have a large stack, it might slow down the garbage collector. It 
it rarely a problem, but I think this is why:

When garbage collecting, the garbage collector takes a new clean memory 
block and copies all live data from the old one to the new. It finds all 
live data by starting from the root set, and the root set is chiefly the 
stack. The grabage collector scans the stack linearily and when it finds 
a term that is complex, i.e lies on the heap, it traverses the term and 
copies it over to the new heap. So, if there is much data on the stack 
that is not complex terms, for example integers and atoms, there is much 
of the stack that is scanned in vain - wasting time.

An old truth in the Erlang community is that you always should build a 
list reversed and then reverse it at the end - lists:reverse() is a BIF, 
it is fast.

Benchmarks I have made indicates that this truth is not quite as true 
nowdays. Building the list on the stack often wins. Use the solution 
that gives most readable code, and change if it becomes a problem. There 
are certainly extreme cases that are bad for any strategy.

Some warnings then.

Appending to a list is expensive. The whole list must be traversed to 
find the tail. If you do this in a tight loop it is absolutely better to 
build the list backwards and reverse at the end. You will go from 
quadratic time complexity down to linear.

Concatenating binaries is really expensive. To create the result binary 
all data must be copied. If you append data to a binary by concatenating 
it is absolutely better to build a list of bytes (it can also contain 
sublists or binaries to any depth) and do list_to_binary/1 at the end. 
This is also an improvement from quadratic to linear time complexity (at 
least!).

-- 
/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Bengt Kleberg wrote:
> Serge Aleynikov wrote:
> 
>> Bengt Kleberg wrote:
>>
>>> Serge Aleynikov wrote:
>>>
> ...deleted
> 
>>>> I have an Erlang TCP client that processes a binary stream which 
>>>> needs to be post-processed by removing escaped bytes.  Let's say, 
>>>> that byte 16$FF is escaped as <<16#FE, 16$01>>, and the 16#FF value 
>>>> is used as a message separator.  The variable hex messages are 
>>>> within 512 bytes each.
>>>>
>>>
>>> if i understood the 512 bytes statement correctly, it is ok to build 
>>> the list on the stack, thus avoiding lists:reverse/1 in solution 1.
>>
>>
>>
>> Do I have a control over where the list is being built (heap/stack)?  
>> Or this is purely dependent on the list size?  If so, are you implying 
>> that I should append items at the end instead of doing it in front, 
>> and using lists:reverse/1 ?
>>
> if you build a list in an accumulator this is (usually :-) done using 
> heap space:
> fn([], Acc) ->
>     lists:reverse(Acc);
> fn([H | T], Acc) ->
>     fn(T, [H | Acc]).
> 
> 
> if you build a list using 'cons' this is ''usually'' done using stack 
> space:
> fn([]) ->
>     [];
> fn([H | T]) ->
>     [H | fn(T)]).
> 
> you can crash your process if it uses too much space. stack space is 
> usually smaller than heap space. however, in this case you seemed to say 
> that there was a maximum of 512 bytes to a list. which usually is 
> available on your stack.
> 
> 
> bengt
>