[erlang-questions] Faster ets:lookups

Sat Aug 14 20:46:53 CEST 2010

Hi Matthew,

Maybe I should have said more about my scenario.

I'm passing a ~30MB test file through my cache. For each byte offset I
calculate a rolling adler for the 4K block that starts at that byte and check
my ets table to see if I already have it. If I get a hit I calculate the MD5
to make sure it's a genuine hit, and if it is I can send a reference instead
of the complete block. The table is a bag in case different blocks have the
same adler.
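For readers following along, the scheme above might look roughly like the sketch below. It is hypothetical, not Tom's code: the module name `block_cache`, the `{Adler, MD5}` object layout and the function names are assumptions. The Adler key is kept as an `{A, B}` pair, as in the original description, and rolled one byte at a time with the standard Adler-32 update (A' = A - Out + In, B' = B - N*Out + A' - 1, mod 65521). Only `ets:lookup/2`, `erlang:md5/1` and the `binary` module are real library calls.

```erlang
-module(block_cache).
-export([adler/1, roll/4, new_table/0, insert_block/2,
         find_block/3, scan/3]).

%% Largest prime below 2^16, the Adler-32 modulus.
-define(MOD, 65521).

%% Adler-32 of a binary as an {A, B} pair (used as the ets key).
adler(Bin) ->
    lists:foldl(fun(Byte, {A0, B0}) ->
                        A1 = (A0 + Byte) rem ?MOD,
                        {A1, (B0 + A1) rem ?MOD}
                end, {1, 0}, binary_to_list(Bin)).

%% Slide an N-byte window by one byte: Out leaves, In enters.
%% A' = A - Out + In,  B' = B - N*Out + A' - 1  (all mod 65521).
roll({A, B}, Out, In, N) ->
    A1 = mod(A - Out + In),
    {A1, mod(B - N * Out + A1 - 1)}.

%% Non-negative remainder (rem keeps the sign of its left operand).
mod(X) -> ((X rem ?MOD) + ?MOD) rem ?MOD.

%% bag: several different blocks may share one Adler key.
new_table() ->
    ets:new(block_cache, [bag]).

insert_block(Tab, Block) ->
    ets:insert(Tab, {adler(Block), erlang:md5(Block)}).

%% Cheap Adler lookup first; the MD5 is computed only if the key exists.
find_block(Tab, Key, Block) ->
    case ets:lookup(Tab, Key) of
        [] ->
            miss;
        Objs ->
            case lists:keymember(erlang:md5(Block), 2, Objs) of
                true -> hit;
                false -> miss
            end
    end.

%% Return every offset whose N-byte block is already in the table.
scan(Tab, Data, N) when byte_size(Data) >= N ->
    scan(Tab, Data, N, 0, adler(binary:part(Data, 0, N)), []);
scan(_Tab, _Data, _N) ->
    [].

scan(Tab, Data, N, Off, Sum, Acc0) ->
    Acc = case find_block(Tab, Sum, binary:part(Data, Off, N)) of
              hit -> [Off | Acc0];
              miss -> Acc0
          end,
    case Off + N < byte_size(Data) of
        true ->
            Out = binary:at(Data, Off),
            In = binary:at(Data, Off + N),
            scan(Tab, Data, N, Off + 1, roll(Sum, Out, In, N), Acc);
        false ->
            lists:reverse(Acc)
    end.
```

With a real 4K block size this performs one `ets:lookup/2` per byte offset, which is where the ~30 million lookups come from. The sketch simply reports every matching offset; a real sender would presumably emit a reference and skip ahead a whole block on a hit.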

For the case I'm trying to optimise, the file is not already in the cache, so
I end up calculating ~30 million adlers and doing ~30 million lookups, all of
which fail, on a table that grows to over 7,000 entries. Each entry contains
the adler and MD5 of a 4K block of data stored elsewhere.

Before I measured it I would have bet my house on the adler calculation
taking most of the execution time, but the ets lookups take just under
three times as long: 4.5 seconds for the adler calculation versus 12
seconds for the ets lookups.
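Spread over ~30 million operations, those figures come to roughly 0.4 microseconds per lookup (12 s / 30M) against about 0.15 microseconds per adler update (4.5 s / 30M). One way to check the lookup cost in isolation is a small benchmark like the hypothetical `lookup_bench` module below, which fills a table with a given number of `{I, I}` keys (an assumption chosen so that every `{0, N}` probe misses, mimicking the all-miss case) and times the probes with `timer:tc/3`, which returns `{Microseconds, Result}`. Passing `bag` or `set` lets the two table types be compared on the same workload.

```erlang
-module(lookup_bench).
-export([run/3, misses/2]).

%% Build a table of the given type (bag, set, ...) with Entries keys
%% of the form {I, I}, I >= 1, each holding a dummy 16-byte "MD5".
fill(Type, Entries) ->
    Tab = ets:new(bench, [Type]),
    lists:foreach(fun(I) -> ets:insert(Tab, {{I, I}, <<0:128>>}) end,
                  lists:seq(1, Entries)),
    Tab.

%% Probe Count keys of the form {0, N}; none of them are present,
%% so every lookup misses. Returns the number of (unexpected) hits.
misses(Tab, Count) -> misses(Tab, Count, 0).

misses(_Tab, 0, Hits) -> Hits;
misses(Tab, N, Hits) ->
    case ets:lookup(Tab, {0, N}) of
        [] -> misses(Tab, N - 1, Hits);
        _ -> misses(Tab, N - 1, Hits + 1)
    end.

%% Time Lookups missing probes against a table of Entries keys.
run(Type, Entries, Lookups) ->
    Tab = fill(Type, Entries),
    {Micros, Hits} = timer:tc(?MODULE, misses, [Tab, Lookups]),
    ets:delete(Tab),
    {Type, Micros, Hits}.
```

For example, `lookup_bench:run(bag, 7000, 30000000)` and `lookup_bench:run(set, 7000, 30000000)` would approximate the scenario above; the absolute numbers depend on the machine and OTP release.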

By now I don't really expect to improve on ets; I know the cost comes from
our algorithm using it so intensively. I'm just hoping beyond hope that
someone has a suggestion for a better way to store and check this data.

Our CTO isn't sold on Erlang yet and is considering re-writing the whole
thing in C. I just can't let him win!! ;-)

//Tom.

On Fri, Aug 13, 2010 at 7:13 PM, Evans, Matthew <mevans@REDACTED> wrote:

> What sort of results are you getting? In most tests that I have been doing,
> ETS is as fast as, if not faster than, a C++ map. I do know that a table of
> type bag is a bit slower than set or ordered_set.
>
>
>
> -----Original Message-----
> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
> Behalf Of tom kelly
> Sent: Friday, August 13, 2010 6:20 AM
> To: erlang-questions@REDACTED
> Subject: [erlang-questions] Faster ets:lookups
>
> Hi List,
> I've just been reading Scott Lystig Fritchie's paper "A Study of Erlang ETS
> Table Implementations and Performance" and was wondering whatever became of
> the "judy" option for ets table types. It's not in any OTP release I've
> worked with (R11 onwards) even though the research was done on R9. I assume
> it was considered but not accepted by OTP; does anyone know the reasons?
> I found this when looking for ways to improve our cache system. We use
> ets:lookup very intensively (the table contains ~7000 entries indexed by a
> tuple of two integers representing an adler, and is of type "bag"), and our
> profiling has shown that the lookup takes the largest proportion of our
> processing time.
> Does anyone have any suggestions on how to optimise the lookup?
> Thanks in advance!
> //Tom.
>