[erlang-questions] interesting I/O bottleneck

Tony Rogvall
Tue Jun 1 16:59:27 CEST 2010


Interesting stuff ;-)

I tried some tricks, and I was a bit surprised.

Replace 
	"when is_integer(C), C > 0, C =< 255"
with 
	"when (C band -256) =:= 0, C =/= 0"

This speeds up the function by a bit more than 50%!
Without the check for zero, the code is even a bit faster.

The code checks that each character is in the range 1-255, to avoid passing zeros to the C driver layer.
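
To make the equivalence of the two guards concrete, here is a minimal sketch. The module and function names (guard_demo, in_range_cmp, in_range_band, agree) are made up for illustration and are not part of file.erl; only the two guard expressions come from the text above.

```erlang
-module(guard_demo).
-export([in_range_cmp/1, in_range_band/1, agree/0]).

%% Original style: a type test plus two comparisons.
in_range_cmp(C) when is_integer(C), C > 0, C =< 255 -> true;
in_range_cmp(_) -> false.

%% Bit-mask style: -256 has every bit set except the low eight, so the
%% result is 0 only for integers 0..255. For a non-integer, `band` fails,
%% which in a guard just means the guard fails.
in_range_band(C) when (C band -256) =:= 0, C =/= 0 -> true;
in_range_band(_) -> false.

%% Check that both guards agree on a sample of inputs, including
%% negatives, non-integers and a bignum.
agree() ->
    Samples = lists:seq(-300, 300) ++ [a, 1.0, "x", <<1>>, 1 bsl 70],
    lists:all(fun(X) -> in_range_cmp(X) =:= in_range_band(X) end, Samples).
```

Calling guard_demo:agree() compares the two guards over the sample list; it says nothing about speed, which has to be measured separately.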

side note:
 One problem with this code is that it accepts atoms of the form 'hello\0world'. Either the code
 must be updated to check the atoms, or the code may be sped up a bit further ;-)
 Test this:
	file:write_file("hello\0world", <<1,2,3,4>>).
  and then:
	file:write_file('hello\0world', <<1,2,3,4>>).	
 The first case generates a badarg while the second case generates a file "hello" with the content <<1,2,3,4>>.
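
A minimal sketch of the first option (checking the atoms), assuming the same 1-255 rule should apply to atoms as to lists. The module and function names (atom_name_demo, atom_file_name) are hypothetical; this is not the actual fix in file.erl.

```erlang
-module(atom_name_demo).
-export([atom_file_name/1]).

%% Hypothetical replacement for the is_atom/1 clause of file_name_1/1:
%% convert the atom to a string, then apply the same 1..255 range check
%% that the list clause applies to each character.
atom_file_name(N) when is_atom(N) ->
    Chars = atom_to_list(N),
    case lists:all(fun(C) -> C > 0 andalso C =< 255 end, Chars) of
        true  -> Chars;
        false -> throw(badarg)
    end.
```

With this clause, 'hello\0world' would throw badarg just like the string "hello\0world", at the cost of one extra pass over the characters of every atom.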


Another approach to speed it up is to check for the common case without building a new string:

file_name(N) ->
   try
       case is_flat_file_name(N) of
           true -> N;
           false -> file_name_1(N)
       end
   catch Reason ->
       {error, Reason}
   end.

This speeds up the case where flat strings are passed to the file_name function by more than 150%.
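
The helper is_flat_file_name is not shown above. One possible definition, given here only as an assumption, walks the list once and allocates nothing (the module name flat_name_demo is made up for the example):

```erlang
-module(flat_name_demo).
-export([is_flat_file_name/1]).

%% Assumed definition: true only for a proper, flat list of integers
%% in the range 1..255. Nothing is built; the list is only traversed.
is_flat_file_name([C|T]) when is_integer(C), C > 0, C =< 255 ->
    is_flat_file_name(T);
is_flat_file_name([]) ->
    true;
%% Deep lists, atoms, improper lists and bad characters all end up here,
%% so that file_name_1/1 can flatten them or throw badarg as before.
is_flat_file_name(_) ->
    false.
```

With this definition a flat string is returned as-is, and everything else takes the original, slower path, so the error behaviour of file_name_1 is left unchanged.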

My guess at the reason, without having peeked at the generated code, is that "C band -256" compiles to code
that executes without any function calls inside the VM. Maybe something for the compiler writers to think about?

/Tony




On 1 jun 2010, at 15.31, James Hague wrote:

> I've got an application which reads through directory trees, compares file
> dates, sorts lists of files, that sort of thing. I'm not loading files so
> much as calling file:list_dir and file:read_file_info. It's slower than I
> expected it to be, so I ran it through eprof. The result is that over 55% of
> the time is spent in file:file_name. Even functions I expected to be
> slightly expensive, like building a dict of all the filenames in a tree, are
> irrelevant in comparison.
> 
> file:file_name looks like this:
> 
> file_name(N) ->
>    try
>        file_name_1(N)
>    catch Reason ->
>        {error, Reason}
>    end.
> 
> file_name_1([C|T]) when is_integer(C), C > 0, C =< 255 ->
>    [C|file_name_1(T)];
> file_name_1([H|T]) ->
>    file_name_1(H) ++ file_name_1(T);
> file_name_1([]) ->
>    [];
> file_name_1(N) when is_atom(N) ->
>    atom_to_list(N);
> file_name_1(_) ->
>    throw(badarg).
> 
> I didn't realize until looking at the source that a filename can be a deep
> list of characters and atoms. If it was an iolist, then the entire function
> could just go away, but that wouldn't handle atoms. As it stands, this
> function is surprisingly expensive.


