[erlang-questions] Speeding up text file I/O

Dmitrii 'Mamut' Dimandt dmitriid@REDACTED
Sat Jan 19 08:22:30 CET 2008


Per Gustafsson wrote:
> Dmitrii 'Mamut' Dimandt wrote:
>> Per Gustafsson wrote:
>>> Christian S wrote:
>>>> Do you know if this has been benchmarked against the more 
>>>> attractive code
>>>> that R12B makes more efficient? Because those functions make me 
>>>> want to
>>>> change profession to something where you get to hurt people.
>>>>
>>>
>>>
>>>
>>> I found that the following function is about 10% faster than the 
>>> unrolled function when using BEAM R12B:
>>>
>>> find_8(Buffer, Char, Pos) ->
>>>   case Buffer of
>>>     << Char, _/bits >> -> Pos;
>>>     << _, Rest/bits >> ->
>>>       find_8(Rest, Char, Pos+1);
>>>     _ ->
>>>       not_found
>>>   end.
>>>
>>> It might depend a little on the input though and when both functions 
>>> were native compiled there was no major difference between them.
>>>
>>> Per
>>
>> Here's the reply:
>>
>> """start quote"""
>>
>> This is incorrect. My unrolled function is twice as fast as the 
>> one-liner when it is compiled with the native flag (I didn't 
>> translate the flag part - my bad, D.) The number of lines has been 
>> carefully measured through various tests.
>>
>> Without the native flag there shouldn't be a significant difference 
>> in speed. That's because unrolling only helps the machine code for 
>> superpipelined processors. Since the VM isn't superpipelined there's 
>> not much point in unrolling the loop.
>>
>> The only simplification that can be done for R12 is:
>>
>> find_8( Buffer, Char, Pos ) ->
>>   case Buffer of
>>     << Char:8, _/bytes >> -> Pos;
>>     << _:1/bytes, Char:8, _/bytes >> -> Pos + 1;
>>     << _:2/bytes, Char:8, _/bytes >> -> Pos + 2;
>>     ...
>>     << _:32/bytes, Rest/bytes >> -> find_8( Rest, Char, Pos + 32 );
>>     _ -> not_found
>>   end.
>>
>> That's it.
>>
>>
>> """end quote"""
>>
>> Sorry for missing the "compile with the native flag" in my translation
>
> I've been away for a while so sorry for being so slow to respond.
>
> It is true that the unrolled version is faster when compiling with the 
> native flag. It does not however have to do with processor 
> architecture but with a performance problem for the new binary 
> optimizations when compiling to native code.
>
> I've fixed this problem in the HiPE-repository and the fix will be 
> included in the next release of Erlang/OTP. With this fix the shorter 
> version of the program is significantly faster than the unrolled one 
> (about 50 % faster in that specific part of the program and the whole 
> program becomes about 20-25 % faster)
>
> Per
>

Nice!

Actually, the author of the blog post I translated said he's going to 
propose an EEP that includes both the fast input and the fast output 
functions, so that they could be included in Erlang. He said that if the 
functions he proposes are implemented at a low level, you could get even 
faster read/write speeds. Though 20 MB/s is already fine by me :)
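
For anyone who wants to try the comparison from this thread on their own
machine, here is a rough sketch. The module name find_sketch, the
function name find_8_unrolled and the time_both/1 helper are my own
placeholders and appear nowhere in the blog post. The unrolled variant
uses only 4 clauses to keep the example short; it is not the 32-clause
function quoted above. A single timer:tc call is noisy, so treat the
numbers as a rough indication at best.

```erlang
-module(find_sketch).
-export([find_8/3, find_8_unrolled/3, time_both/1]).

%% Short version, as posted by Per: examine one byte per recursion step.
find_8(Buffer, Char, Pos) ->
    case Buffer of
        <<Char, _/bits>> -> Pos;
        <<_, Rest/bits>> -> find_8(Rest, Char, Pos + 1);
        _ -> not_found
    end.

%% Illustrative unrolled version: examine up to 4 bytes per step.
%% The factor 4 is an assumption made for brevity.
find_8_unrolled(Buffer, Char, Pos) ->
    case Buffer of
        <<Char:8, _/bytes>> -> Pos;
        <<_:1/bytes, Char:8, _/bytes>> -> Pos + 1;
        <<_:2/bytes, Char:8, _/bytes>> -> Pos + 2;
        <<_:3/bytes, Char:8, _/bytes>> -> Pos + 3;
        <<_:4/bytes, Rest/bytes>> -> find_8_unrolled(Rest, Char, Pos + 4);
        _ -> not_found
    end.

%% Time both functions on a zero-filled buffer of Size bytes whose only
%% newline is the final byte, so each call has to scan the whole binary.
%% Returns {Position, Timings} with the timings in microseconds.
time_both(Size) ->
    Buffer = <<0:(Size * 8), $\n>>,
    {T1, Pos} = timer:tc(?MODULE, find_8, [Buffer, $\n, 0]),
    %% Matching on the already bound Pos checks that both agree.
    {T2, Pos} = timer:tc(?MODULE, find_8_unrolled, [Buffer, $\n, 0]),
    {Pos, [{short_us, T1}, {unrolled_us, T2}]}.
```

Compile it with c(find_sketch). in the shell and call something like
find_sketch:time_both(1000000). To repeat the native-code comparison you
need a release that still ships HiPE, since it was removed in OTP 24.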


