[erlang-questions] Faster ets:lookups

Sat Aug 14 21:52:25 CEST 2010

With that scenario, even if you were to rewrite the whole thing in C++,
you'd find that the lookup into your hash bag, or whatever structure you
use, takes the most time, as it does in Erlang. Short of having an O(1)
lookup, any language will exhibit this.
Perhaps you would have better luck tuning your algorithm differently (e.g.
64k blocks instead of 4k) or simply trying a new one altogether. You could
also look into parallelizing the whole thing by splitting your ets table
and merging the results later, removing duplicates.
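
To make the parallel idea concrete, here is a rough sketch in Erlang. It is
an illustration only, and it makes a simplifying assumption: the work is
split by file offset (each worker scans one range of offsets), not by
splitting the table itself. The module name pscan, the functions run/3 and
ranges/2, and the ScanFun callback are hypothetical names, not anything
from the original code.

```erlang
-module(pscan).
-export([run/3, ranges/2]).

%% Split the offsets 0..Size-1 into at most Parts contiguous ranges.
%% Each range is {From, To} with To exclusive.
ranges(Size, Parts) when Size >= 0, Parts > 0 ->
    Step = (Size + Parts - 1) div Parts,
    [{From, erlang:min(From + Step, Size)}
     || From <- lists:seq(0, Size - 1, erlang:max(Step, 1))].

%% Run ScanFun(From, To) for every range in its own process.
%% ScanFun (an assumption of this sketch) must return a list of hits.
%% The per-range lists are merged and duplicates are removed.
run(ScanFun, Size, Parts) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, ScanFun(From, To)} end),
                Ref
            end || {From, To} <- ranges(Size, Parts)],
    %% Collect the results in range order, one message per worker.
    Hits = [receive {Ref, Result} -> Result end || Ref <- Refs],
    lists:usort(lists:append(Hits)).
```

Any process can read a protected ets table, so the workers could share one
table for lookups. If the workers also insert new blocks, each would need
its own table (or a public one), which is where merging and removing
duplicates comes in. Whether this wins anything depends on the number of
cores and would have to be measured.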

Nicholas

On Sat, Aug 14, 2010 at 2:46 PM, tom kelly <ttom.kelly@REDACTED> wrote:

> Hi Matthew,
>
> Maybe I should have said more about my scenario.
>
> I'm passing a ~30MB test file through my cache. For each byte I calculate a
> rolling adler for the 4K block that it is the start of and check my ets
> table to see if I already have it. If I get a hit I calculate the MD5 to
> make sure it's a genuine hit and if it is I can send a reference instead of
> the complete block. The table is a bag in case different blocks have the
> same adler.
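
The scan described above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the original code: the
module name adler_scan and all function names are hypothetical, the table
holds {{A, B}, MD5} entries keyed by the two Adler-32 sums, and the sketch
checks every offset without skipping ahead after a hit or inserting new
blocks on a miss.

```erlang
-module(adler_scan).
-export([new_table/0, add_block/2, sums/1, roll/4, scan/3]).

-define(MOD, 65521).   %% Adler-32 modulus

%% Bag keyed by {A, B}: several blocks may share the same sums.
new_table() ->
    ets:new(block_cache, [bag, protected]).

%% Store one cached block as {{A, B}, MD5}.
add_block(Tab, Block) ->
    ets:insert(Tab, {sums(Block), erlang:md5(Block)}).

%% Adler-32 sums of a whole binary, computed from scratch.
sums(Bin) -> sums(Bin, 1, 0).

sums(<<Byte, Rest/binary>>, A, B) ->
    A1 = (A + Byte) rem ?MOD,
    sums(Rest, A1, (B + A1) rem ?MOD);
sums(<<>>, A, B) ->
    {A, B}.

%% Slide a window of Len bytes one position: drop Out, append In.
roll({A, B}, Len, Out, In) ->
    A1 = wrap(A - Out + In),
    {A1, wrap(B - Len * Out + A1 - 1)}.

%% Erlang's rem keeps the sign of the left operand, so normalise.
wrap(X) -> ((X rem ?MOD) + ?MOD) rem ?MOD.

%% Return [{Offset, MD5}] for every Len-byte window of Data whose sums
%% and MD5 both match a cached block.
scan(Tab, Data, Len) when Len > 0, byte_size(Data) >= Len ->
    <<First:Len/binary, _/binary>> = Data,
    scan(Tab, Data, Len, 0, sums(First), []);
scan(_Tab, _Data, _Len) ->
    [].

scan(Tab, Data, Len, Off, Sums, Acc) ->
    Acc1 = case ets:lookup(Tab, Sums) of
               [] -> Acc;    %% the common case in this thread: a miss
               Candidates -> confirm(Candidates, Data, Len, Off, Acc)
           end,
    Next = Off + Len,
    if Next < byte_size(Data) ->
           Out = binary:at(Data, Off),
           In = binary:at(Data, Next),
           scan(Tab, Data, Len, Off + 1, roll(Sums, Len, Out, In), Acc1);
       true ->
           lists:reverse(Acc1)
    end.

%% The MD5 is only computed when the cheap sums already matched.
confirm(Candidates, Data, Len, Off, Acc) ->
    MD5 = erlang:md5(binary:part(Data, Off, Len)),
    case lists:member(MD5, [M || {_Key, M} <- Candidates]) of
        true -> [{Off, MD5} | Acc];
        false -> Acc
    end.
```

With a 4096-byte block this would be called as adler_scan:scan(Tab, Data,
4096). The usual 32-bit Adler value is (B bsl 16) bor A, which is what
erlang:adler32/1 returns.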
>
> For the case I'm trying to optimise the file is not already in cache so I
> end up calculating ~30 million adlers and doing ~30 million lookups, which
> will all fail, on a table that grows to over 7,000 entries. Each entry
> contains an adler and MD5 of a 4k block of data stored elsewhere.
>
> Before I measured it I would have bet my house on the adler calculation
> taking most of the execution time, but it seems that the ets lookup takes
> just under three times as much! 4.5 seconds for the adler calculation, 12
> seconds for the ets lookup.
>
> By now I don't really expect to improve on ets, I know it's just that our
> algorithm uses it so intensively. I'm just hoping beyond hope that maybe
> someone has a suggestion for a better way to keep and check this data?
>
> Our CTO isn't sold on Erlang yet and is considering re-writing the whole
> thing in C. I just can't let him win!! ;-)
>
> //Tom.
>
>
> On Fri, Aug 13, 2010 at 7:13 PM, Evans, Matthew <mevans@REDACTED>
> wrote:
>
> > What sort of results are you getting? In most tests that I have been
> > doing, ETS is as fast as, if not faster than, a C++ map. I do know that
> > a table of type bag is a bit slower than set or ordered set.
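
If the bag type itself turns out to matter, one option is a set table that
keeps a list of MD5s per key. The sketch below only illustrates that
layout. The module name set_cache and its functions are hypothetical, and
nothing in this thread measures whether it is actually faster for this
workload.

```erlang
-module(set_cache).
-export([new/0, add/3, candidates/2]).

%% A set table: one object per key, holding all MD5s seen for that key.
new() ->
    ets:new(block_cache_set, [set, protected]).

%% Add MD5 under Key, keeping the list sorted and free of duplicates.
add(Tab, Key, MD5) ->
    Old = candidates(Tab, Key),
    ets:insert(Tab, {Key, lists:usort([MD5 | Old])}).

%% Return the MD5s stored under Key, or [] on a miss.
candidates(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{_Key, MD5s}] -> MD5s;
        [] -> []
    end.
```

A miss still costs one ets:lookup/2, so the layout only changes which
table type serves that lookup.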
> >
> >
> >
> > -----Original Message-----
> > From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
> > Behalf Of tom kelly
> > Sent: Friday, August 13, 2010 6:20 AM
> > To: erlang-questions@REDACTED
> > Subject: [erlang-questions] Faster ets:lookups
> >
> > Hi List,
> > I've just been reading Scott Lystig Fritchie's paper "A Study of Erlang
> > ETS Table Implementations and Performance" and was wondering whatever
> > became of the "judy" option for ets table types. It's not in any OTP
> > release I've worked with (R11 onwards) even though the research was done
> > on R9. I assume this was considered but not accepted by OTP; does anyone
> > know the reasons?
> > I found this when looking for ways to improve our cache system. We use
> > ets:lookup very intensively (the table contains ~7000 entries indexed by
> > a tuple of two integers representing an adler, and is of type "bag") and
> > our profiling has shown that the lookup uses the largest proportion of
> > our processing time.
> > Does anyone have any suggestions on how to optimise the lookup?
> > Thanks in advance!
> > //Tom.
> >
>