<div dir="auto">Stop using filelib functions. Use file:read_file_info and file:list_dir.<div dir="auto"><br></div><div dir="auto">Sergej</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Dec 10, 2016 9:29 AM, "Frank Muller" <<a href="mailto:frank.muller.erl@gmail.com">frank.muller.erl@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:UICTFontTextStyleBody;font-size:17px">Hi Stanislaw</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">First, I don't care if I've to use documented/undocumented calls as long as I can achieve my goal: faster dir walking.</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">And you're right, here is a detailed comparison with other scripting languages:</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">In my /usr/share, there’s:</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">2580 directories</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">28953 files</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">1. 
Erlang (no io:format/2, just recurse):</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>walk(Dir) -></div><div> {ok, Files} = file:list_dir(Dir),</div><div> walk(Dir, Files).</div><div><br></div><div>walk(Dir, [ Basename | Rest ]) -></div><div> Path = filename:join([ Dir, Basename ]),</div><div> case filelib:is_dir(Path) of</div><div> true -></div><div> walk(Path);</div><div> false -></div><div> %% io:format("~s~n", [Path]),</div><div> filelib:file_size(Path)</div><div> end,</div><div> walk(Dir, Rest);</div><div>walk(_, []) -></div><div> ok.</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>timer:tc(fun() -> dir:walk("/usr/share") end).</div><div>{4662361,ok}</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">2. 
Python (this code even counts the size of the dir):</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">From: <a href="http://stackoverflow.com/questions/1392413/calculating-a-directory-size-using-python" target="_blank">http://stackoverflow.<wbr>com/questions/1392413/<wbr>calculating-a-directory-size-<wbr>using-python</a></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">import os</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">def get_size(start_path = '.'):</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"> total_size = 0</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"> for dirpath, dirnames, filenames in os.walk(start_path):</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"> for f in filenames:</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"> fp = os.path.join(dirpath, f)</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"> total_size += os.path.getsize(fp)</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"> return total_size</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">print(get_size())</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">$ cd /usr/share</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">$ time dir_walker.py</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>432034130</div><div>0.25 real 0.13 user 0.10 sys</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>3. 
Perl (same, counts dir size):</div><div><a href="http://www.perlmonks.org/?node_id=168974" target="_blank">http://www.perlmonks.org/?<wbr>node_id=168974</a></div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>use File::Find; </div><div>my $size = 0; </div><div>find(sub { $size += -s if -f $_ }, "/usr/share");</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>$ time perl dir_walker.pl</div><div>432034130</div><div>0.13 real 0.05 user 0.08 sys</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">4. Ruby (same, counts dir size):</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>def directory_size(path)</div><div> path << '/' unless path.end_with?('/')</div><div> raise RuntimeError, "#{path} is not a directory" unless File.directory?(path)</div><div> total_size = 0</div><div> Dir["#{path}**/*"].each do |f|</div><div> total_size += File.size(f) if File.file?(f) && File.size?(f)</div><div> end</div><div> total_size</div><div>end</div><div>puts directory_size '/usr/share'</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>$ time walker.rb</div><div>432028422</div><div>0.21 real 0.09 user 0.11 sys</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">5. 
Lua:</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">From: <a href="http://lua-users.org/wiki/DirTreeIterator" target="_blank">http://lua-users.org/<wbr>wiki/DirTreeIterator</a></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>require "lfs"</div><div><br></div><div>function dirtree(dir)</div><div> assert(dir and dir ~= "", "directory parameter is missing or empty")</div><div> if string.sub(dir, -1) == "/" then</div><div> dir=string.sub(dir, 1, -2)</div><div> end</div><div><br></div><div> local function yieldtree(dir)</div><div> for entry in lfs.dir(dir) do</div><div> if entry ~= "." and entry ~= ".." then</div><div> entry=dir.."/"..entry</div><div><span class="m_480222179950307766Apple-tab-span" style="white-space:pre-wrap"> </span>local attr=lfs.attributes(entry)</div><div><span class="m_480222179950307766Apple-tab-span" style="white-space:pre-wrap"> </span>coroutine.yield(entry,attr)</div><div><span class="m_480222179950307766Apple-tab-span" style="white-space:pre-wrap"> </span>if attr.mode == "directory" then</div><div><span class="m_480222179950307766Apple-tab-span" style="white-space:pre-wrap"> </span> yieldtree(entry)</div><div><span class="m_480222179950307766Apple-tab-span" style="white-space:pre-wrap"> </span>end</div><div> end</div><div> end</div><div> end</div><div><br></div><div> return coroutine.wrap(function() yieldtree(dir) end)</div><div>end</div><div><br></div><div>for filename, attr in dirtree("/usr/share") do</div><div> print(attr.mode, filename)</div><div>end</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">$ luarocks install luafilesystem</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">$ time lua walker.lua > /dev/null</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><div>0.30 real 0.16 user 0.14 
sys</div></div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">Do you need more?</div><div style="font-family:UICTFontTextStyleBody;font-size:17px"><br></div><div style="font-family:UICTFontTextStyleBody;font-size:17px">Thanks for your help.</div><div style="font-family:UICTFontTextStyleBody;font-size:17px">/Frank</div><div class="gmail_quote"><div>On Sat, Dec 10, 2016 at 00:51, Stanislaw Klekot &lt;<a href="mailto:erlang.org@jarowit.net" target="_blank">erlang.org@jarowit.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Fri, Dec 09, 2016 at 11:15:58PM +0000, Frank Muller wrote:<br class="m_480222179950307766gmail_msg"><br>> I would like to improve the speed of my directory walker.<br class="m_480222179950307766gmail_msg"><br>><br class="m_480222179950307766gmail_msg"><br>> walk(Dir) -><br class="m_480222179950307766gmail_msg"><br>> {ok, Files} = prim_file:list_dir(Dir),<br class="m_480222179950307766gmail_msg"><br>> walk(Dir, Files).<br class="m_480222179950307766gmail_msg"><br><br class="m_480222179950307766gmail_msg"><br>Why prim_file:list_dir() instead of file:list_dir()? 
The former is an<br class="m_480222179950307766gmail_msg"><br>undocumented internal function.<br class="m_480222179950307766gmail_msg"><br><br class="m_480222179950307766gmail_msg"><br>[...]<br class="m_480222179950307766gmail_msg"><br>> Compared to almost anything I found on the web, it’s still very slow:<br class="m_480222179950307766gmail_msg"><br>> > timer:tc(fun() -> dir:walk("/usr/share") end).<br class="m_480222179950307766gmail_msg"><br>> {4662361,ok}<br class="m_480222179950307766gmail_msg"><br><br class="m_480222179950307766gmail_msg"><br>What is this "anything you found on the web"? And how did you run<br class="m_480222179950307766gmail_msg"><br>your comparisons? There's a large difference between the first and<br class="m_480222179950307766gmail_msg"><br>subsequent runs caused by the OS's directory cache, and there's a large<br class="m_480222179950307766gmail_msg"><br>difference between simply walking through the directory and walking with<br class="m_480222179950307766gmail_msg"><br>printing something to the screen for every file.<br class="m_480222179950307766gmail_msg"><br><br class="m_480222179950307766gmail_msg"><br>Then there's also your use of filelib:is_dir() followed by<br class="m_480222179950307766gmail_msg"><br>filelib:file_size(), which means two stat(2) calls, while you only need<br class="m_480222179950307766gmail_msg"><br>to do it once per file (file:read_file_info()).<br class="m_480222179950307766gmail_msg"><br><br class="m_480222179950307766gmail_msg"><br>--<br class="m_480222179950307766gmail_msg"><br>Stanislaw Klekot<br class="m_480222179950307766gmail_msg"><br></blockquote></div></div>
<br></blockquote></div></div>
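<div dir="auto">For reference, here is a minimal sketch of the advice in this thread: list each directory with file:list_dir/1 and call file:read_file_info/1 once per entry, instead of filelib:is_dir/1 followed by filelib:file_size/1. The module name dir_walker and the helper names walk_dir/2 and visit/2 are only illustrative, and returning the total size (rather than ok) is my assumption so the result can be compared with the other scripts. I have not benchmarked it against the numbers quoted here.</div>

```erlang
-module(dir_walker).
-export([walk/1]).

%% Provides the #file_info{} record returned by file:read_file_info/1.
-include_lib("kernel/include/file.hrl").

%% Returns the total size in bytes of all regular files below Dir.
walk(Dir) ->
    walk_dir(Dir, 0).

%% List one directory and fold over its entries, threading the running total.
walk_dir(Dir, Acc) ->
    case file:list_dir(Dir) of
        {ok, Names} ->
            lists:foldl(fun(Name, Sum) ->
                                visit(filename:join(Dir, Name), Sum)
                        end, Acc, Names);
        {error, _Reason} ->
            %% Unreadable or vanished directory: skip it.
            Acc
    end.

%% A single read_file_info/1 call yields both the type and the size.
%% Like filelib:is_dir/1, it follows symlinks; file:read_link_info/1
%% could be used instead if links should not be followed.
visit(Path, Acc) ->
    case file:read_file_info(Path) of
        {ok, #file_info{type = directory}} ->
            walk_dir(Path, Acc);
        {ok, #file_info{type = regular, size = Size}} ->
            Acc + Size;
        _Other ->
            %% Devices, sockets, broken links, permission errors: ignore.
            Acc
    end.
```

<div dir="auto">It can be timed the same way as the original, e.g. timer:tc(fun() -> dir_walker:walk("/usr/share") end).</div>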