[erlang-questions] widefinder update

Hynek Vychodil <>
Sun Oct 28 18:32:52 CET 2007


Hi Anders,
I rewrote your code a little. I removed all the remaining binary bindings,
and it is noticeably faster again. Try wf_pichi3.erl.

It requires:
chunk_reader - http://www.erlang.org/pipermail/erlang-questions/attachments/20071028/16fc8af3/attachment-0002.obj
nlt_reader - http://www.erlang.org/pipermail/erlang-questions/attachments/20071028/16fc8af3/attachment-0003.obj
file_map_reduce - http://www.erlang.org/pipermail/erlang-questions/attachments/20071028/207cb882/attachment.obj

Have fun
--Hynek (Pichi) Vychodil

On 10/26/07, Anders Nygren <> wrote:
> On 10/23/07, Anders Nygren <> wrote:
> > To summarize my progress on the widefinder problem:
> > A few days ago I started with Steve Vinoski's tbray16.erl.
> > As a baseline, on my 1.66 GHz dual-core Centrino
> > laptop running Linux:
> > tbray16
> > real    0m7.067s
> > user    0m12.377s
> > sys     0m0.584s
> >
> > I removed the dict used for the shift table
> > and changed the min_heap_size.
> > That gave:
> > real    0m2.713s
> > user    0m4.168s
> > sys     0m0.412s
> >
> > (see tbray_tuple.erl and wfbm4_tuple.erl)
> > Steve reported that it ran in ~1.9 s on his 8-core server.
> >
> > Then I replaced the dicts that were used for collecting the
> > matches with ets, which gave some improvement
> > on my dual-core laptop:
> > real    0m2.220s
> > user    0m3.252s
> > sys     0m0.344s
> >
> > (see tbray_ets.erl and wfbm4_ets.erl)
> >
> > Interestingly, Steve reported that it actually performed
> > worse on his 8-core server.
> >
> > These versions all read the whole file into memory at the start.
> > On my laptop that takes ~400 ms (when the file is already cached
> > by the OS).
> >
> > So I changed it to read the file in chunks and spawn a worker
> > as soon as each chunk has been read.
> >
> > tbray_blockread with 4 processes
> > real    0m1.992s
> > user    0m3.176s
> > sys     0m0.420s
> >
> > (see tbray_blockread.erl and wfbm4_ets.erl)
> >
> > Running it in the Erlang shell takes ~1.8 s.
> >
> > Just starting and stopping the VM takes:
> > time erl -pa ../../bfile/ebin/ -smp -noshell -run init stop
> >
> > real    0m1.229s
> > user    0m0.208s
> > sys     0m0.020s
> >
> > It would be interesting to see how it runs on other machines,
> > with more cores.
> >
> > /Anders
> >
> >
>
> So I have a new version that I think will break the 1-second barrier
> on Steve's 8-core box.
> The best I have seen on my dual-core laptop is:
> real:  0m1.689s
> user: 0m2.2756s
> sys:  0m0.396s
>
> The changes relative to my latest posted tbray_blockread.erl are:
> - the file is read in a separate process
> - never bind variables to sub binaries unless absolutely necessary
> - only a limited number of worker processes run at any time
>
> One lesson from this exercise is that binding variables to sub binaries
> can be bad for performance. The effect of changing the code to avoid
> such bindings can be seen in the garbage collection statistics.
>
> wfinder (an unreleased version that ran in 1.050 s on Steve's 8-core box):
> garbage collections: 46302
> words reclaimed: 501768347
>
> wfinder1
> garbage collections: 13917
> words reclaimed: 384561741
>
> /Anders
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wf_pichi3.erl
Type: application/octet-stream
Size: 3858 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20071028/814bacad/attachment.obj>
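For readers following the shift-table change above: a minimal sketch (not the code in tbray_tuple.erl or wfbm4_tuple.erl) of a Boyer-Moore-Horspool shift table stored in a 256-element tuple, so each lookup is a constant-time element/2 call instead of a dict lookup. Horspool is a simplified Boyer-Moore variant chosen here for brevity; the module wf_shift_sketch and its functions are hypothetical. It counts possibly overlapping occurrences of a pattern in a binary.

```erlang
%% Hypothetical module: Horspool search with a tuple-based shift table.
-module(wf_shift_sketch).
-export([shift_table/1, count/2]).

%% element(Byte + 1, Table) is the shift for Byte. Bytes that do not occur
%% in the pattern (ignoring its last byte) shift by the full pattern length.
shift_table(Pattern) when byte_size(Pattern) > 0 ->
    M = byte_size(Pattern),
    Init = erlang:make_tuple(256, M),
    Prefix = binary:bin_to_list(Pattern, 0, M - 1),
    %% Later (rightmost) occurrences overwrite earlier ones, giving the
    %% smallest safe shift for each byte.
    {Table, _} = lists:foldl(
                   fun(Byte, {T, I}) ->
                           {setelement(Byte + 1, T, M - 1 - I), I + 1}
                   end,
                   {Init, 0}, Prefix),
    Table.

%% Count occurrences of Pattern in Text.
count(Text, Pattern) ->
    scan(Text, Pattern, shift_table(Pattern), byte_size(Pattern), 0, 0).

scan(Text, Pattern, Table, M, Pos, Acc) when Pos + M =< byte_size(Text) ->
    %% Pattern is already bound, so this case clause is an equality test.
    Acc1 = case binary:part(Text, Pos, M) of
               Pattern -> Acc + 1;
               _ -> Acc
           end,
    %% Shift by the table entry for the byte under the pattern's last slot.
    Last = binary:at(Text, Pos + M - 1),
    scan(Text, Pattern, Table, M, Pos + element(Last + 1, Table), Acc1);
scan(_Text, _Pattern, _Table, _M, _Pos, Acc) ->
    Acc.
```

For example, wf_shift_sketch:count(<<"GET /a GET /b">>, <<"GET ">>) returns 2.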
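The switch from dicts to ets for collecting matches can be illustrated with a minimal sketch, assuming OTP 18 or later for ets:update_counter/4. The module wf_count_sketch is hypothetical and is not the code in tbray_ets.erl or wfbm4_ets.erl. It only shows the idea: workers bump counters in one shared public table instead of each building a dict that must later be merged.

```erlang
%% Hypothetical module: tally matched keys in a shared ets table.
-module(wf_count_sketch).
-export([new/0, add_matches/2, top/2]).

%% A public set table, so any worker process may update it directly.
new() ->
    ets:new(wf_counts, [set, public, {write_concurrency, true}]).

%% Increment the counter for each key; ets:update_counter/4 inserts the
%% default {Key, 0} first when the key is not yet present.
add_matches(Tab, Keys) ->
    lists:foreach(
      fun(Key) -> ets:update_counter(Tab, Key, {2, 1}, {Key, 0}) end,
      Keys),
    ok.

%% Return the N most frequent keys, highest count first.
top(Tab, N) ->
    Sorted = lists:reverse(lists:keysort(2, ets:tab2list(Tab))),
    lists:sublist(Sorted, N).
```

Because the table is owned by the process that called new/0, that process has to stay alive until all workers have finished updating it.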
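The chunked-reading structure described in the thread (read the file in chunks, spawn a worker per chunk, keep only a limited number of workers alive) can be sketched as follows. This is a minimal illustration under stated assumptions, not tbray_blockread.erl or wfinder1: the module wf_chunk_sketch, its ChunkSize and MaxWorkers parameters, and the newline-counting worker are hypothetical placeholders, and here the calling process itself acts as the reader. A real widefinder worker would scan for the log pattern and must handle lines split across chunk boundaries, which counting single newline bytes conveniently avoids.

```erlang
%% Hypothetical module: chunked reads with a bounded number of workers.
-module(wf_chunk_sketch).
-export([count_lines/3]).

count_lines(File, ChunkSize, MaxWorkers)
  when is_integer(ChunkSize), ChunkSize > 0,
       is_integer(MaxWorkers), MaxWorkers >= 1 ->
    {ok, Fd} = file:open(File, [read, raw, binary]),
    try loop(Fd, ChunkSize, MaxWorkers, 0, 0)
    after file:close(Fd)
    end.

%% Active = workers started but not yet reported; Acc = running total.
loop(Fd, ChunkSize, Max, Active, Acc) when Active >= Max ->
    %% Too many workers in flight: wait for one to finish before reading on.
    receive
        {chunk_done, N} -> loop(Fd, ChunkSize, Max, Active - 1, Acc + N)
    end;
loop(Fd, ChunkSize, Max, Active, Acc) ->
    %% Any {error, Reason} from file:read/2 crashes with case_clause.
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} ->
            Parent = self(),
            spawn_link(fun() -> Parent ! {chunk_done, count_nl(Chunk)} end),
            loop(Fd, ChunkSize, Max, Active + 1, Acc);
        eof ->
            collect(Active, Acc)
    end.

%% Wait for the remaining workers after the last chunk has been read.
collect(0, Acc) -> Acc;
collect(Active, Acc) ->
    receive {chunk_done, N} -> collect(Active - 1, Acc + N) end.

%% Placeholder worker: count newline bytes in one chunk.
count_nl(Bin) -> length(binary:matches(Bin, <<"\n">>)).
```

Bounding the number of in-flight workers keeps the reader from racing ahead and holding many unprocessed chunks in memory at once.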

