[erlang-questions] widefinder update

Sun Oct 28 14:12:14 CET 2007

On 10/28/07, Thomas Lindgren <thomasl_erlang@REDACTED> wrote:
>
> --- Hynek Vychodil <vychodil.hynek@REDACTED> wrote:
>
> > Hello,
> > These results are interesting, but I have doubts about this
> > kind of solution. Both your approach and Steve's have some
> > caveats.
> >
> > 1/ The whole file is read into memory.
>
> Hynek,
>
> This is true for some versions, but not all. The
> 'block read' version reads the file in chunks.

It is still "sort of" true for the block read and later versions:
since there is no flow control, when the file is already cached
by the OS, reading is faster than processing and almost all of
the file ends up in memory.
I am aware of this but have not bothered to add flow control yet.
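A minimal sketch of the missing flow control, assuming one reader and one worker: Python's bounded `queue.Queue` makes the reader block once `MAX_BUFFERED` chunks are waiting, so memory use stays at a few blocks no matter how fast the OS cache delivers data. In Erlang the equivalent would typically be the reader waiting for a message from a worker before reading the next block; the Python version is a stand-in, not the code used in the benchmarks. `CHUNK_SIZE`, `MAX_BUFFERED`, `reader`, `consumer` and `process` are hypothetical names, and the sketch ignores log lines that straddle block boundaries.

```python
import io
import queue
import threading

CHUNK_SIZE = 4      # hypothetical block size in bytes (tiny, for demonstration)
MAX_BUFFERED = 2    # hypothetical cap on blocks waiting in the queue
_DONE = object()    # sentinel marking end of input


def reader(stream, q):
    """Read fixed-size blocks; q.put() blocks while the queue is full."""
    while True:
        block = stream.read(CHUNK_SIZE)
        if not block:
            break
        q.put(block)  # flow control: waits until the worker catches up
    q.put(_DONE)


def consumer(q, results):
    """Take blocks off the queue until the sentinel arrives."""
    while True:
        block = q.get()
        if block is _DONE:
            break
        results.append(len(block))  # stand-in for the real per-block work


def process(stream):
    q = queue.Queue(maxsize=MAX_BUFFERED)
    results = []
    t_read = threading.Thread(target=reader, args=(stream, q))
    t_work = threading.Thread(target=consumer, args=(q, results))
    t_read.start()
    t_work.start()
    t_read.join()
    t_work.join()
    return results
```

For example, `process(io.BytesIO(b"0123456789"))` returns `[4, 4, 2]`.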

>
> > 2/ Workers share a resource (the ets table), which is bad in
> > principle. If you have a more CPU-consuming task, so that you
> > need more CPUs than the current task to keep up with your
> > input data bandwidth, and at the same time a task that
> > produces more results, you run into trouble again.
>
> Note that the ets table in all proposals but one is
> managed by a single process. It is just used as a more
> efficient data structure. So the potential problem
> here is really if this process becomes a bottleneck.
>
> So, we have so far looked at two extremes:
>
> 1. Every worker maintains a local count, these are
> then merged into a global count.
>
> 2. A single process maintains the global count,
> workers send it updates.
>
> But if this becomes problematic, one could also
> combine the two by having 1 to N centralized counting
> processes to trade off the cost of merging versus the
> cost of incrementally sending all counts to a
> 'master'. (And one could batch the sending of updates
> too, come to think of it.)
>

I have not seen this as a problem yet, since there is a relatively
small number of concurrent workers. However, as the number of
cores grows, it may become a problem.
An alternative is for each worker to keep its own ets table for its
counters and send its results to the central ets table on termination.
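The counting strategies discussed above (private per-worker tables merged on termination, and the middle ground of 1 to N counting processes) can be sketched in Python, with `collections.Counter` standing in for an ets table and a thread pool standing in for Erlang worker processes. The function names are hypothetical, and routing keys to shards by `hash(key)` is an assumed way to choose among the N counters, not something proposed in the thread.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_local(keys):
    # Each worker counts into its own private table: no shared state.
    return Counter(keys)


def merge(tables):
    # Done once, when the workers terminate and send in their tables.
    total = Counter()
    for t in tables:
        total.update(t)  # Counter.update adds counts rather than replacing
    return total


def count_sharded(tables, n_shards):
    # Hypothetical middle ground: each key always goes to the same one of
    # n_shards counters, so no single counter receives every update.
    shards = [Counter() for _ in range(n_shards)]
    for t in tables:
        for key, n in t.items():
            shards[hash(key) % n_shards][key] += n
    return shards


def run(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tables = list(pool.map(count_local, chunks))
    return merge(tables)
```

Merging at termination costs one pass over each worker's table. Sharding spreads the updates over several counters, and because each key lives in exactly one shard, the final result is just the union of the shards.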

> > In conclusion, I think your solution scales badly at both
> > ends. When you have a small number of CPUs, you run out of
> > memory on larger datasets.
>
> Not necessarily. With the block read solution, it
> doesn't seem like you run that risk.

See above.

>
> The use of file:read_file/1 just showed that you
> _could_ do fast I/O in Erlang, at a time when people
> thought Erlang file I/O was very slow indeed. Showing
> this was done by switching to a more suitable API
> call. But you can be even more sophisticated than
> that, e.g., by using file:pread.
>
> > When
> > you have more CPUs, you hit the bottleneck of your
> > shared resource.
>
> Do you mean that the problem becomes I/O bound? Do
> note that all sufficiently fast solutions will
> ultimately be limited by a hardware bottleneck of some
> sort: CPU, I/O, network ...
>
> In this particular case, you could increase I/O
> performance by, say, striping the disk. And you can
> increase CPU performance by, say, distributing the
> work to multiple hosts/nodes (fairly straightforward
> with Erlang, by the way). But with these problems,
> even with infinite hardware you will eventually run
> into some sequential portion of the code, and that
> will limit the speedup as per Amdahl's Law.
>

Currently that sequential part takes ~0.5 s on my 1.66 GHz
dual-core laptop, and the part of the work that can run in
parallel takes ~2.254 s, so in theory we would get:
Cores  Real time (s)  Speedup  Rel. speedup from doubling cores
    1          2.754
    2          1.627    1.693  1.693
    4          1.064    2.590  1.530
    8          0.782    3.523  1.360
   16          0.641    4.297  1.220
   32          0.570    4.828  1.123
   64          0.535    5.146  1.066
  128          0.518    5.321  1.034
  256          0.509    5.413  1.017

That is not very good beyond 8 cores.
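The table follows Amdahl's Law with the two measured times: T(n) = 0.5 + 2.254/n seconds, speedup = T(1)/T(n), and the relative column is T(n/2)/T(n). The short Python sketch below recomputes it (the function name `amdahl_table` is hypothetical) and also prints the asymptotic limit T(1)/0.5 = 5.508, which is why doubling the number of cores beyond 8 buys so little.

```python
SEQUENTIAL = 0.5   # seconds: sequential part, as measured above
PARALLEL = 2.254   # seconds: parallelizable part, as measured above


def amdahl_table(max_doublings=8):
    """Return (cores, real_time, speedup, rel_speedup) for 1, 2, 4, ... cores.

    real_time(n)   = SEQUENTIAL + PARALLEL / n
    speedup(n)     = real_time(1) / real_time(n)
    rel_speedup(n) = real_time(n / 2) / real_time(n)   (1.0 for n = 1)
    """
    base = SEQUENTIAL + PARALLEL
    rows = []
    prev = base
    for k in range(max_doublings + 1):
        cores = 2 ** k
        t = SEQUENTIAL + PARALLEL / cores
        rows.append((cores, t, base / t, prev / t))
        prev = t
    return rows


for cores, t, s, rel in amdahl_table():
    print(f"{cores:5d}  {t:9.3f}  {s:7.3f}  {rel:7.3f}")

# Upper bound on speedup with unlimited cores: T(1) / sequential part.
print(f"limit: {(SEQUENTIAL + PARALLEL) / SEQUENTIAL:.3f}")
```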

So I am now looking at making this a 'real' distributed solution instead.

/Anders