[erlang-questions] Speeding up text file I/O

Fri Jan 18 18:26:06 CET 2008

Dmitrii 'Mamut' Dimandt wrote:
> Per Gustafsson wrote:
>> Christian S wrote:
>>> Do you know if this has been benchmarked against the more attractive
>>> code that R12B makes more efficient? Because those functions make me
>>> want to change profession to something where you get to hurt people.
>>>
>>
>>
>>
>> I found that the following function is about 10% faster than the
>> unrolled function when using BEAM R12B:
>>
>> find_8(Buffer, Char, Pos) ->
>>   case Buffer of
>>     << Char, _/bits >> -> Pos;
>>     << _, Rest/bits >> ->
>>       find_8(Rest, Char, Pos+1);
>>     _ ->
>>       not_found
>>   end.
>>
>> It might depend a little on the input, though, and when both functions
>> were compiled to native code there was no major difference between them.
>>
>> Per
> 
> Here's the reply:
> 
> """start quote"""
> 
> This is incorrect. My unrolled function is twice as fast as the
> one-liner when it is compiled with the native flag (I didn't translate
> the flag part - my bad, D.). The number of lines (unrolled clauses) was
> carefully tuned through various tests.
> 
> Without the native flag there shouldn't be a significant difference in
> speed. That's because unrolling only helps machine code running on
> superpipelined processors. Since the VM isn't superpipelined, there's
> not much point in unrolling the loop.
> 
> The only simplification that can be done for R12 is:
> 
> find_8(Buffer, Char, Pos) ->
>     case Buffer of
>         << Char:8, _/bytes >> -> Pos;
>         << _:1/bytes, Char:8, _/bytes >> -> Pos + 1;
>         << _:2/bytes, Char:8, _/bytes >> -> Pos + 2;
>         ...
>         << _:32/bytes, Rest/bytes >> -> find_8(Rest, Char, Pos + 32);
>         _ -> not_found
>     end.
> 
> That's it.
> 
> 
> """end quote"""
> 
> Sorry for missing the "compile with the native flag" part in my translation.
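
The unrolled function quoted above elides most of its clauses with "...", so it cannot be compiled as written. Below is a minimal runnable sketch, assuming a hypothetical module `find_demo` that holds the one-liner from the thread as `find_8/3` next to a hypothetical `find_8_unroll4/3`. The second function unrolls only four steps instead of the 32 in the quoted code; that depth is an illustrative assumption, not the tuned value mentioned in the thread.

```erlang
%% Hypothetical module for illustration; the names are not from the thread.
-module(find_demo).
-export([find_8/3, find_8_unroll4/3]).

%% One-liner from the thread: returns Pos plus the offset of the first
%% byte equal to Char in Buffer, or not_found if there is none.
find_8(Buffer, Char, Pos) ->
    case Buffer of
        <<Char, _/bits>> -> Pos;
        <<_, Rest/bits>> -> find_8(Rest, Char, Pos + 1);
        _ -> not_found
    end.

%% Unrolled variant: checks up to four bytes per call, then skips four
%% bytes and recurses. Buffers shorter than four bytes are still fully
%% covered by the first three clauses, so the catch-all means "absent".
find_8_unroll4(Buffer, Char, Pos) ->
    case Buffer of
        <<Char:8, _/bytes>> -> Pos;
        <<_:1/bytes, Char:8, _/bytes>> -> Pos + 1;
        <<_:2/bytes, Char:8, _/bytes>> -> Pos + 2;
        <<_:3/bytes, Char:8, _/bytes>> -> Pos + 3;
        <<_:4/bytes, Rest/bytes>> -> find_8_unroll4(Rest, Char, Pos + 4);
        _ -> not_found
    end.
```

In the shell, after `c(find_demo).`, both `find_demo:find_8(<<"hello world">>, $o, 0)` and `find_demo:find_8_unroll4(<<"hello world">>, $o, 0)` return `4`.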

I've been away for a while so sorry for being so slow to respond.

It is true that the unrolled version is faster when compiling with the
native flag. However, this has nothing to do with processor architecture;
it is caused by a performance problem in the new binary optimizations
when compiling to native code.

I've fixed this problem in the HiPE repository, and the fix will be
included in the next release of Erlang/OTP. With this fix, the shorter
version of the program is significantly faster than the unrolled one
(about 50% faster in that specific part of the program, and the whole
program becomes about 20-25% faster).

Per
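
The timings quoted in this thread come from measurements that are not shown. A minimal sketch of how such a comparison could be reproduced is below, assuming a hypothetical module `find_bench` that times a search function with `timer:tc/1` over a synthetic buffer whose only newline is the last byte. The buffer contents, size, and repeat count are arbitrary assumptions. Any other arity-3 search function, such as an unrolled variant, can be passed as `Find` for a side-by-side run. On releases that still shipped HiPE (it was removed in OTP 24), native code was requested with the `native` compiler option, for example `c(find_bench, [native])`.

```erlang
%% Hypothetical benchmark module; not from the original thread.
-module(find_bench).
-export([find_8/3, run/3]).

%% The one-liner from the thread, repeated so this module is self-contained.
find_8(Buffer, Char, Pos) ->
    case Buffer of
        <<Char, _/bits>> -> Pos;
        <<_, Rest/bits>> -> find_8(Rest, Char, Pos + 1);
        _ -> not_found
    end.

%% Calls Find(Buffer, $\n, 0) Reps times on a Size-byte buffer whose only
%% newline is the last byte. Returns {Microseconds, LastResult}.
run(Find, Size, Reps) when is_function(Find, 3), Size >= 1, Reps >= 1 ->
    Buffer = list_to_binary([lists:duplicate(Size - 1, $a), $\n]),
    timer:tc(fun() -> repeat(Find, Buffer, Reps, undefined) end).

%% Tail-recursive loop that keeps only the most recent result.
repeat(_Find, _Buffer, 0, Last) ->
    Last;
repeat(Find, Buffer, N, _Last) ->
    repeat(Find, Buffer, N - 1, Find(Buffer, $\n, 0)).
```

For example, `find_bench:run(fun find_bench:find_8/3, 1000, 100)` returns `{Micros, 999}`. A single run is noisy, so repeating it a few times and comparing the lower figures gives a fairer picture.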