[erlang-questions] widefinder update
Sun Oct 28 08:21:24 CET 2007
These results are interesting, but I have objections to this kind of
solution. Your approach and Steve's both have some caveats.
1/ The whole file can end up in memory. In principle, this happens
whenever the workers are much slower than the reader. 200MB of Tim
Bray's data is not a problem on your 8-CPU box, but what if the file
is bigger? What about 1GB? No problem? And 1TB? Still no problem? I
know that current I/O hardware can't deliver data fast enough to cause
a problem in this simple exercise of Tim Bray's (and you don't flush
caches between measurements, and the workers on an 8-CPU box are still
fast enough), but it is a problem in principle.
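To make point 1 concrete, here is a minimal sketch of reading a file in
fixed-size chunks so that the reader never holds more than one chunk at
a time. The module name chunk_fold and the function fold/4 are
hypothetical, not taken from any of the posted tbray_*.erl files, and
the sketch ignores log lines that span a chunk boundary.

```erlang
-module(chunk_fold).
-export([fold/4]).

%% Fold Fun(Chunk, Acc) over the file at Path, reading at most
%% ChunkSize bytes per call. Hypothetical helper for illustration.
fold(Path, ChunkSize, Fun, Acc0) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        loop(Fd, ChunkSize, Fun, Acc0)
    after
        file:close(Fd)
    end.

%% Read one chunk, hand it to Fun, and continue until end of file.
loop(Fd, ChunkSize, Fun, Acc) ->
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} -> loop(Fd, ChunkSize, Fun, Fun(Chunk, Acc));
        eof -> Acc
    end.
```

Memory stays bounded only as long as Fun does not keep the chunks
around. A reader that spawns one worker per chunk would still need a
limit on the number of chunks in flight.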
2/ The workers share a resource (the ets table), which is bad in
principle. If you have a task that is more CPU-intensive, so that you
need more CPUs than the current task does to keep up with your input
data bandwidth, and that at the same time produces more results, you
run into trouble again.
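One way to avoid the shared table in point 2, sketched under
assumptions: each worker counts into its own private map and sends the
result to the parent, which merges the partial counts. The module name
local_counts, its functions, and the use of plain key lists as a
stand-in for parsed log chunks are all hypothetical. Maps and
maps:update_with/4 need a much newer OTP release (19 or later) than
the one these measurements were made on.

```erlang
-module(local_counts).
-export([run/1]).

%% Chunks is a list of key lists (stand-in for parsed log chunks).
%% One worker per chunk; no state is shared between workers.
run(Chunks) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, count(Chunk)} end),
                Ref
            end || Chunk <- Chunks],
    %% Collect one partial result per worker and merge it in.
    lists:foldl(fun(Ref, Acc) ->
                        receive {Ref, Counts} -> merge(Counts, Acc) end
                end, #{}, Refs).

%% Count occurrences of each key in a private map.
count(Keys) ->
    lists:foldl(fun(K, M) ->
                        maps:update_with(K, fun(N) -> N + 1 end, 1, M)
                end, #{}, Keys).

%% Add the counts in A onto the counts in B.
merge(A, B) ->
    maps:fold(fun(K, N, M) ->
                      maps:update_with(K, fun(X) -> X + N end, N, M)
              end, B, A).
```

The merge step is sequential, so this trades contention on the shared
table for message traffic that grows with the size of the results.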
In conclusion, I think your solution scales badly at both ends. With a
small number of CPUs, you run out of memory on larger datasets. With
more CPUs, you hit the bottleneck of your shared resource. Of course,
Tim Bray's exercise is more CPU-intensive than result-intensive, so
you don't fall into the bottleneck trap. Also, file reading on current
hardware must be sequential and I/O performance is so bad that 8 CPUs
are enough to consume data faster than I/O can produce it, so you
don't run out of memory. But I think Tim Bray's exercise is not about
tuning a solution for this one task; I think it is about the multicore
crisis and solutions that work in principle.
--Hynek (Pichi) Vychodil
On 10/26/07, Anders Nygren wrote:
> On 10/23/07, Anders Nygren wrote:
> > To summarize my progress on the widefinder problem
> > A few days ago I started with Steve Vinoski's tbray16.erl
> > As a baseline on my 1.66 GHz dual core Centrino
> > laptop, Linux,
> > tbray16
> > real 0m7.067s
> > user 0m12.377s
> > sys 0m0.584s
> > I removed the dict used for the shift table,
> > and changed the min_heap_size.
> > That gave
> > real 0m2.713s
> > user 0m4.168s
> > sys 0m0.412s
> > (see tbray_tuple.erl and wfbm4_tuple.erl)
> > Steve reported that it ran in ~1.9 s on his 8 core server.
> > Then I removed the dicts that were used for collecting the
> > matches and used ets instead, and got some improvement
> > on my dual core laptop.
> > real 0m2.220s
> > user 0m3.252s
> > sys 0m0.344s
> > (see tbray_ets.erl and wfbm4_ets.erl)
> > Interestingly Steve reported that it actually performed
> > worse on his 8 core server.
> > These versions all read the whole file into memory at the start.
> > On my laptop that takes ~400ms (when the file is already cached
> > in the OS).
> > So I changed it to read the file in chunks and spawn the worker
> > after each chunk is read.
> > tbray_blockread with 4 processes
> > real 0m1.992s
> > user 0m3.176s
> > sys 0m0.420s
> > (see tbray_blockread.erl and wfbm4_ets.erl)
> > Running it in the erlang shell it takes ~1.8s.
> > Just starting and stopping the VM takes
> > time erl -pa ../../bfile/ebin/ -smp -noshell -run init stop
> > real 0m1.229s
> > user 0m0.208s
> > sys 0m0.020s
> > It would be interesting to see how it runs on other machines,
> > with more cores.
> > /Anders
> So I have a new version that I think will break the 1 second barrier
> on Steve's 8-core.
> The best I have seen on my dual core laptop is
> real: 0m1.689s
> user: 0m2.2756s
> sys: 0m0.396s
> The changes relative to my latest posted tbray_blockread.erl are
> - reading the file is in a separate process
> - never bind variables to sub binaries unless absolutely necessary
> - only have a limited number of worker processes at any time
> One lesson from this exercise is that binding variables to sub
> binaries can be bad for performance. The result of changing the code
> to not bind variables to sub binaries can be seen in the garbage
> collection statistics.
> wfinder, (an unreleased version that ran in 1.050s on Steve's 8-core)
> garbage collections: 46302
> words reclaimed: 501768347
> garbage collections: 13917
> words reclaimed: 384561741