[erlang-questions] interesting I/O bottleneck
Tony Rogvall
tony@REDACTED
Tue Jun 1 16:59:27 CEST 2010
Interesting stuff ;-)
I tried some tricks, and I was a bit surprised.
Replace
"when is_integer(C), C > 0, C =< 255"
with
"when (C band -256) =:= 0, C =/= 0"
This speeds up the function by a bit more than 50%!
Without the check for zero, the code is even a bit faster.
The guard checks that each character is in the range 1-255, which avoids passing zeros to the C driver layer.
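To see that the two guards accept the same values, here is a minimal sketch. The module and function names are made up for illustration and are not part of file.erl. It relies on -256 having every bit set except the low eight, and on a failing "band" inside a guard making the guard fail rather than raise.

```erlang
-module(guard_sketch).
-export([valid_char_orig/1, valid_char_band/1, agree/0]).

%% Original guard: an integer in the range 1..255.
valid_char_orig(C) when is_integer(C), C > 0, C =< 255 -> true;
valid_char_orig(_) -> false.

%% Bit-mask guard: (C band -256) =:= 0 holds only for integers 0..255,
%% and C =/= 0 then excludes zero. If C is not an integer, band fails
%% inside the guard, so the guard is false and the next clause is tried.
valid_char_band(C) when (C band -256) =:= 0, C =/= 0 -> true;
valid_char_band(_) -> false.

%% Compare both guards on a handful of boundary and non-integer values.
agree() ->
    Samples = [-257, -256, -1, 0, 1, 65, 255, 256, 1 bsl 70,
               a, 1.5, [], <<1>>],
    lists:all(fun(X) -> valid_char_orig(X) =:= valid_char_band(X) end,
              Samples).
```

Calling guard_sketch:agree() should return true if the two guards are equivalent on these samples; it says nothing about speed, which has to be measured separately.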
side note:
One problem with this code is that it accepts atoms of the form 'hello\0world'. Either the code
must be updated to check the atoms, or (if zeros need not be rejected) the code may be sped up a bit further ;-)
Test this:
file:write_file("hello\0world", <<1,2,3,4>>).
and then:
file:write_file('hello\0world', <<1,2,3,4>>).
The first case generates a badarg while the second case generates a file "hello" with the content <<1,2,3,4>>.
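If the atom case were to be checked as well, one possible shape is sketched below. This is only an illustration: atom_file_name/1 is a hypothetical helper, not something that exists in the file module, and it assumes the same 1-255 range as the list case.

```erlang
-module(atom_name_sketch).
-export([atom_file_name/1]).

%% Hypothetical helper: convert an atom to a file name string, but
%% reject atoms whose characters fall outside 1..255 (including NUL).
atom_file_name(A) when is_atom(A) ->
    L = atom_to_list(A),
    case lists:all(fun(C) -> C > 0 andalso C =< 255 end, L) of
        true  -> L;
        false -> throw(badarg)
    end.
```

With this check, atom_file_name('hello\0world') throws badarg just as the string version does, at the cost of one extra pass over the atom's characters.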
Another approach to speed it up is to check for the common case without building a new string:
file_name(N) ->
    try
        case is_flat_file_name(N) of
            true -> N;
            false -> file_name_1(N)
        end
    catch Reason ->
        {error, Reason}
    end.
This speeds up the case where flat strings are passed to the file_name function by more than 150%.
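The is_flat_file_name function is not shown above. A minimal sketch of what it could look like follows; this is an assumed implementation, not the actual code, and the module name is made up. It walks the list once and returns true only for a proper flat list of characters in the range 1-255, so the original list can be returned without copying.

```erlang
-module(flat_name_sketch).
-export([file_name/1, is_flat_file_name/1]).

%% Assumed implementation: true only for a proper, flat list of
%% characters in 1..255. Nested lists, atoms, zeros and improper
%% tails all give false and fall back to the slow path.
is_flat_file_name([C|T]) when (C band -256) =:= 0, C =/= 0 ->
    is_flat_file_name(T);
is_flat_file_name([]) ->
    true;
is_flat_file_name(_) ->
    false.

%% Fast path: return N itself when it is already flat.
file_name(N) ->
    try
        case is_flat_file_name(N) of
            true -> N;
            false -> file_name_1(N)
        end
    catch Reason ->
        {error, Reason}
    end.

%% Slow path, as in the quoted file.erl code but with the band guard.
file_name_1([C|T]) when (C band -256) =:= 0, C =/= 0 ->
    [C|file_name_1(T)];
file_name_1([H|T]) ->
    file_name_1(H) ++ file_name_1(T);
file_name_1([]) ->
    [];
file_name_1(N) when is_atom(N) ->
    atom_to_list(N);
file_name_1(_) ->
    throw(badarg).
```

For deep or invalid names the list is scanned twice (once by the check, once by file_name_1), so the gain depends on flat strings really being the common case.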
The reason, without having peeked at the generated code, is probably that "C band -256" compiles to code that is executed without
any function calls inside the VM. Maybe something for the compiler writers to think about?
/Tony
On 1 jun 2010, at 15.31, James Hague wrote:
> I've got an application which reads through directory trees, compares file
> dates, sorts lists of files, that sort of thing. I'm not loading files so
> much as calling file:list_dir and file:read_file_info. It's slower than I
> expected it to be, so I ran it through eprof. The result is that over 55% of
> the time is spent in file:file_name. Even functions I expected to be
> slightly expensive, like building a dict of all the filenames in a tree, are
> irrelevant in comparison.
>
> file:file_name looks like this:
>
> file_name(N) ->
>     try
>         file_name_1(N)
>     catch Reason ->
>         {error, Reason}
>     end.
>
> file_name_1([C|T]) when is_integer(C), C > 0, C =< 255 ->
>     [C|file_name_1(T)];
> file_name_1([H|T]) ->
>     file_name_1(H) ++ file_name_1(T);
> file_name_1([]) ->
>     [];
> file_name_1(N) when is_atom(N) ->
>     atom_to_list(N);
> file_name_1(_) ->
>     throw(badarg).
>
> I didn't realize until looking at the source that a filename can be a deep
> list of characters and atoms. If it was an iolist, then the entire function
> could just go away, but that wouldn't handle atoms. As it stands, this
> function is surprisingly expensive.