[erlang-questions] Faster ets:lookups
Sat Aug 14 21:52:25 CEST 2010
With that scenario, even if you were to rewrite the whole thing in C++,
you'd find that the lookup into your hash bag, or whatever structure you
use, takes the most time, just as it does in Erlang. Short of having an
O(1) lookup, any language will exhibit this.
Perhaps you would have better luck tuning your algorithm differently (e.g.
64k blocks instead of 4k) or simply trying a new one altogether. You could
also look into parallelizing the whole thing: split your ets table, run the
lookups concurrently, and merge the results later, removing duplicates.
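A rough sketch of the parallel idea, untested against your workload. Note
that it splits the list of keys across workers rather than splitting the
table itself, which is a simplification on my part. The module name
par_lookup and the functions run/3 and demo/0 are made up for illustration.
It relies on the default "protected" access mode, which lets any process
read the table.

```erlang
-module(par_lookup).
-export([run/3, demo/0]).

%% Look up every key in Keys against the ets table Tab using at most N
%% worker processes, then merge the hits and drop duplicates.
%% Assumes Tab is readable by other processes (protected or public).
run(Tab, Keys, N) when is_integer(N), N > 0 ->
    Parent = self(),
    Chunks = split(Keys, N),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                    Hits = lists:append([ets:lookup(Tab, K) || K <- Chunk]),
                    Parent ! {Ref, Hits}
                end),
                Ref
            end || Chunk <- Chunks],
    %% Collect one reply per worker, matching on that worker's reference.
    Results = [receive {Ref, Hits} -> Hits end || Ref <- Refs],
    %% usort/1 sorts and removes duplicate objects in one pass.
    lists:usort(lists:append(Results)).

%% Split a list into at most N contiguous chunks of roughly equal size.
split(Keys, N) ->
    Size = erlang:max(1, (length(Keys) + N - 1) div N),
    chunk(Keys, Size).

chunk([], _Size) ->
    [];
chunk(L, Size) when length(L) =< Size ->
    [L];
chunk(L, Size) ->
    {Head, Tail} = lists:split(Size, L),
    [Head | chunk(Tail, Size)].

%% Tiny example: a bag keyed by {A, B} adler tuples, as in the thread.
%% The md5_* atoms stand in for real MD5 binaries.
demo() ->
    Tab = ets:new(blocks, [bag, protected]),
    ets:insert(Tab, [{{1,2}, md5_a}, {{1,2}, md5_b}, {{3,4}, md5_c}]),
    Keys = [{1,2}, {9,9}, {3,4}, {1,2}],
    Hits = run(Tab, Keys, 2),
    ets:delete(Tab),
    Hits.
```

In a real run each worker would presumably compute the rolling adler over
its own slice of the file, with slices overlapping by one block length so
that no block start is missed. Whether this wins anything depends on how
many cores you have and how well concurrent reads scale on your release.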
On Sat, Aug 14, 2010 at 2:46 PM, tom kelly <> wrote:
> Hi Matthew,
> Maybe I should have said more about my scenario.
> I'm passing a ~30MB test file through my cache. For each byte I calculate a
> rolling adler for the 4K block that it is the start of and check my ets
> table to see if I already have it. If I get a hit I calculate the MD5 to
> make sure it's a genuine hit and if it is I can send a reference instead of
> the complete block. The table is a bag in case different blocks have the
> same adler.
> For the case I'm trying to optimise the file is not already in cache so I
> end up calculating ~30 million adlers and doing ~30 million lookups, which
> will all fail, on a table that grows to over 7,000 entries. Each entry
> contains an adler and MD5 of a 4k block of data stored elsewhere.
> Before I measured it I would have bet my house on the adler calculation
> taking most of the execution time, but it seems that the ets lookup takes
> just under three times as much! 4.5 seconds for the adler calculation, 12
> seconds for the ets lookup.
> By now I don't really expect to improve on ets, I know it's just that our
> algorithm uses it so intensively. I'm just hoping beyond hope that maybe
> someone has a suggestion for a better way to keep and check this data?
> Our CTO isn't sold on Erlang yet and is considering re-writing the whole
> thing in C. I just can't let him win!! ;-)
> On Fri, Aug 13, 2010 at 7:13 PM, Evans, Matthew <> wrote:
> > What sort of results are you getting? In most tests that I have been doing,
> > ETS is as fast as, if not faster than, a C++ map. I do know that a table
> > of type bag is a bit slower than set or ordered set.
> > -----Original Message-----
> > From: [mailto:]
> > Behalf Of tom kelly
> > Sent: Friday, August 13, 2010 6:20 AM
> > To:
> > Subject: [erlang-questions] Faster ets:lookups
> > Hi List,
> > I've just been reading Scott Lystig Fritchie's paper "A Study of Erlang
> > Table Implementations and Performance" and was wondering whatever became
> > of the "judy" option for ets table types. It's not in any OTP release I've
> > worked with (R11 onwards) even though the research was done on R9. I assume
> > this was considered but not accepted by OTP; does anyone know the reasons?
> > I found this when looking for ways to improve our cache system. We use
> > ets:lookup very intensively (table contains ~7000 entries indexed by a
> > tuple of two integers representing an adler, table is type "bag") and our
> > profiling has shown that the lookup uses the largest proportion of our
> > processing time.
> > Does anyone have any suggestions on how to optimise the lookup?
> > Thanks in advance!
> > //Tom.