[erlang-questions] widefinder update

Anders Nygren anders.nygren@REDACTED
Wed Oct 24 00:11:12 CEST 2007


On 10/23/07, Steve Vinoski <vinoski@REDACTED> wrote:
> On 10/23/07, Anders Nygren <anders.nygren@REDACTED> wrote:
> > To summarize my progress on the widefinder problem
> > A few days ago I started with Steve Vinoski's tbray16.erl
> > As a baseline on my 1.66 GHz dual core Centrino
> > laptop, Linux,
> > tbray16
> > real    0m7.067s
> > user    0m12.377s
> > sys     0m0.584s
>
> Anders, thanks for collecting and posting these. I've just performed a set
> of new timings for all of them, as listed below. For each, I just ran this
> command:
>
> time erl -smp -noshell -run <test_case> main o1000k.ap >/dev/null
>
> where "<test_case>" is the name of the tbray test case file. All were looped
> ten times, and I took the best timing for each. All tests were done on my
> 8-core 2.33 GHz dual Intel Xeon with 2 GB RAM Linux box, in a local
> (non-NFS) directory.
>

I don't keep track of the finer details of different CPUs, but I have
a vague memory that the 8-core Xeon is really two 4-core CPUs in one
package. Is that correct?

The reason I am asking is that I cannot figure out why your
measurements have shorter real times than mine, but more than
twice the user time.

It also does not seem to scale so well up to 8 cores: Steve's best
time is 0m1.546s and mine was 0m1.992s.


Steve, can you also do some tests on tbray_blockread using
different numbers of worker processes? A smaller block size means
that we start using all the cores earlier (see the sketch below).
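
What I mean is roughly the following (a simplified sketch, not the
actual tbray_blockread.erl; scan/1 below is a placeholder for the
Boyer-Moore matcher in wfbm4_ets.erl, and the fixup of matches that
straddle chunk boundaries is left out):

%% Spawn one worker per BlockSize chunk so scanning can start on the
%% other cores while the file is still being read. A smaller
%% BlockSize gives earlier parallelism but more spawn/message
%% overhead.
drive(File, BlockSize) ->
    {ok, F} = file:open(File, [read, raw, binary]),
    N = read_loop(F, BlockSize, self(), 0),
    ok = file:close(F),
    %% wait for all N workers to report back
    [receive {scanned, _} -> ok end || _ <- lists:seq(1, N)].

read_loop(F, BlockSize, Parent, N) ->
    case file:read(F, BlockSize) of
        {ok, Chunk} ->
            spawn(fun() -> Parent ! {scanned, scan(Chunk)} end),
            read_loop(F, BlockSize, Parent, N + 1);
        eof ->
            N
    end.

scan(Chunk) -> byte_size(Chunk).  %% placeholder for the real matcher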

> My original tbray16 runs in
>
>
> real    0m3.162s
> user    0m16.513s
> sys     0m1.762s
> > I removed the dict used for the shift table,
> > and changed the min_heap_size.
> > That gave
> > real    0m2.713s
> > user    0m4.168s
> > sys     0m0.412s
> >
> > (see tbray_tuple.erl and wfbm4_tuple.erl)
> > Steve reported that it ran in ~1.9 s on his 8 core server.
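
In case it helps anyone reproduce this, the two changes amount to
roughly the following (simplified, with made-up names and an example
heap size; the real code is in wfbm4_tuple.erl):

%% Shift table as a 256-slot tuple indexed by byte value instead of
%% a dict; element/2 is a constant-time BIF.
shift(Tab, Byte) -> element(Byte + 1, Tab).

%% Workers get a larger initial heap so they do not spend time
%% growing it while scanning (20000 words is just an example value).
start_worker(Chunk) ->
    spawn_opt(fun() -> scan(Chunk) end, [{min_heap_size, 20000}]).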
>
>
> What I get for tbray_tuple is:
>
> real    0m2.285s
> user    0m8.615s
> sys     0m0.988s
>
>
> > Then I removed the dicts that were used for collecting the
> > matches and used ets instead, and got some improvement
> > on my dual core laptop.
> > real    0m2.220s
> > user    0m3.252s
> > sys     0m0.344s
> >
> > (see tbray_ets.erl and wfbm4_ets.erl)
> >
> > Interestingly, Steve reported that it actually performed
> > worse on his 8 core server.
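
The ets change amounts to roughly this (again simplified, with
assumed names; see wfbm4_ets.erl for the real thing):

%% One public table, created by the parent and shared by all workers.
new_table() -> ets:new(matches, [set, public]).

%% Each worker bumps a per-key counter directly, so there are no
%% per-process dicts to merge at the end. Note that two workers can
%% race on the first insert of a key; the real code has to account
%% for that.
bump(Tab, Key) ->
    try ets:update_counter(Tab, Key, 1)
    catch error:badarg -> ets:insert(Tab, {Key, 1})
    end.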
>
> The discrepancy seems to be gone. With your new file that you supplied in
> your message, the official timing for tbray_ets on the 8-core is:
>
>
> real    0m1.868s
> user    0m7.416s
> sys     0m0.509s
>
>
> > These versions all read the whole file into memory at the start.
> > On my laptop that takes ~400ms (when the file is already cached
> > in the OS).
> >
> > So I changed it to read the file in chunks and spawn a worker
> > after each chunk is read.
> >
> > tbray_blockread with 4 processes
> > real    0m1.992s
> > user    0m3.176s
> > sys     0m0.420s
> >
> > (see tbray_blockread.erl and wfbm4_ets.erl)
> >
> > Running it in the Erlang shell, it takes ~1.8s.
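
For the curious: a shell timing that skips VM startup can be done
with timer:tc/3 (this assumes main/1 returns instead of calling
halt() at the end):

1> {Micros, _} = timer:tc(tbray_blockread, main, [["o1000k.ap"]]).
2> Micros / 1000000.

timer:tc/3 reports microseconds, so the second line gives seconds.
The nested list is because -run passes its arguments to main/1 as a
list of strings.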
>
>
> Interestingly, some of my earlier attempts tried to overlap block reads and
> worker spawning, but the results were always worse, so that's why I went to
> reading in the whole file. This blockread approach may very well be The
> Ultimate Wide Finder.
>
> Timing for tbray_blockread on the 8-core:
>
> real    0m1.546s
> user    0m7.337s
> sys     0m0.662s
>
>
> > Just starting and stopping the VM takes
> > time erl -pa ../../bfile/ebin/ -smp -noshell -run init stop
> >
> > real    0m1.229s
> > user    0m0.208s
> > sys     0m0.020s
>
> On the 8-core this takes:
>
> real    0m1.093s
> user    0m0.072s
> sys     0m0.012s
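
So roughly 1.2s of my 1.992s blockread time, and 1.1s of Steve's
1.546s, is just starting and stopping the VM; the actual processing
is ~0.8s on my 2 cores versus ~0.45s on the 8 cores.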
>
> > It would be interesting to see how it runs on other machines,
> > with more cores.
>
> Tim Bray is traveling at the moment, but he told me by email that he hopes
> to get back to measuring these on the T5120 next week.
>
> thanks,
> --steve


