[erlang-questions] widefinder update

Tue Nov 6 04:50:58 CET 2007

Hynek Vychodil wrote:
> On 10/29/07, David Hopwood <david.hopwood@REDACTED> wrote:
>> Anders Nygren wrote:
>>> Currently that sequential part is ~0.5 s on my 1.66 GHz
>>> dual-core laptop. The part of the work that can be run in parallel
>>> takes ~2.254 s, so theoretically we would get
>>> Cores  Real time  Speedup  Rel. speedup by doubling #cores
>> [...]
>>>   8      0.782     3.523      1.360
>> [...]
>>> 256      0.509     5.413      1.017
>>>
>>> Which is not very good after 8 cores.
>> 0.5 s is not very long, in human terms. For me to be convinced that
>> there is any need for further optimization, the problem would have
>> to be scaled to a point where the total run-time is something that
>> a human might conceivably get impatient waiting for. At that point,
>> the sequential part would likely be a smaller proportion of the
>> run-time anyway.
>
> It's a bad point of view. What if you would like to do it 100 times?

Then you would only load the Erlang VM once, not 100 times. And cache
effects would also mean that the runtime would not be multiplied by 100.

> And what about 20GB instead of 200MB?

That is "scaling the problem to a point where the total run-time is
something that a human might conceivably get impatient waiting for".

> The serial part will be 50s, [...]

No, there is no a priori reason to believe that the scaling will be
linear. For a start, the in-memory solutions can't be used for 20 GB,
on typical hardware.

> [...] And what about web service?

In that case the problem statement should be refined to specify the
distribution of queries (if the query were fixed, we could cache the
answer and then the computation time would be independent of the number
of clients).

In the original problem, the data comes from a logfile; maybe we should
be indexing that logfile, and updating the index incrementally as it is
extended.
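
One way incremental indexing might look, again as a Python sketch
under stated assumptions: LogIndex is a hypothetical name, and the
"GET <path>" pattern is a simplified illustration rather than the
exact expression from the exercise. The index remembers how many bytes
it has already seen and only reads what was appended since.

```python
import re
from collections import Counter

# Illustrative pattern only; not the exercise's exact regular expression.
REQUEST = re.compile(rb"GET (\S+)")


class LogIndex:
    def __init__(self, path):
        self.path = path
        self.offset = 0          # number of bytes already indexed
        self.counts = Counter()  # request path -> hit count

    def update(self):
        """Index only the bytes appended since the last call."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        # Stop at the last newline so a half-written final line
        # is picked up by the next update instead of being miscounted.
        end = chunk.rfind(b"\n") + 1
        for line in chunk[:end].splitlines():
            m = REQUEST.search(line)
            if m:
                self.counts[m.group(1)] += 1
        self.offset += end

    def top(self, n=10):
        return self.counts.most_common(n)
```

With this shape, the cost of a query after the first pass depends on
how much the log has grown, not on its total size.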

> You can use three times cheaper HW to serve the same number of users.
> Is that not enough? Tim Bray's exercise is not only 200MB ~ 1Mrec, but
> also 1GB ~ 5Mrec, and in that case 2.5s is a long time in human terms
> if it is to be a Web/UI response.

Only if the answer has not been cached.

My main point is, you can't reasonably assume that the sequential part
will still be the same proportion of the run-time when the problem has
been scaled. That will depend on *how* it is scaled.
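
To make that concrete, here is a small Python sketch using Anders'
quoted figures (0.5 s sequential, 2.254 s parallelizable). The first
part reproduces the quoted table rows with the usual Amdahl-style
estimate. The second part compares two scaling assumptions that are
mine, purely for illustration: (A) both parts grow linearly with input
size, and (B) the sequential part is a fixed cost and only the
parallel part grows.

```python
SEQ, PAR = 0.5, 2.254  # seconds, from the quoted measurements


def run_time(seq, par, cores):
    """Sequential part is fixed; parallel part divides evenly over cores."""
    return seq + par / cores


def sequential_fraction(seq, par):
    """Share of the single-core run-time spent in the sequential part."""
    return seq / (seq + par)


# Columns: cores, real time, speedup, relative speedup from doubling cores.
for cores in (8, 256):
    t = run_time(SEQ, PAR, cores)
    speedup = run_time(SEQ, PAR, 1) / t
    doubling = run_time(SEQ, PAR, cores // 2) / t
    print(f"{cores:4d}  {t:.3f}  {speedup:.3f}  {doubling:.3f}")

K = 100  # 200 MB -> 20 GB

# Assumption A: both parts scale linearly, so the fraction is unchanged.
print(sequential_fraction(SEQ * K, PAR * K))

# Assumption B: the sequential part is a fixed cost, so its share shrinks.
print(sequential_fraction(SEQ, PAR * K))
```

Under assumption A the sequential share stays at about 18%; under
assumption B it drops to about 0.2%, and the speedup limit rises
accordingly.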

-- 
David Hopwood