[erlang-questions] How fast is it to retrieve data from an ETS table

Martin Dimitrov
Wed Jan 11 14:54:17 CET 2012


Thanks for the informative answer. I ran some tests with memcpy in C, and
it turns out that 0.1 seconds for ~2 MB of data is not "quite fast" but, I
guess, normal.

Best regards,

Martin

On 1/11/2012 1:45 PM, Ulf Wiger wrote:
> Well, reading from ETS does impose a copy operation, but this is just as efficient (as far as it goes) as message passing, which also copies. Exactly the same copy operation is used, in fact.
>
> And just as with message passing, if you are using binaries, they may be passed by reference instead, as has already been pointed out.
>
> Given that term copying is central in message passing and GC, as well as in ETS, a lot of time has been invested in optimizing it*. The Erlang VM does it well, but of course, copying is always copying - the cost will be relative to the size of the data, and the emulator must traverse the term in order to know what's in it.
>
> The actual code is in $ERL_TOP/erts/emulator/beam/copy.c (copy_struct()), or:
>
> https://github.com/erlang/otp/blob/master/erts/emulator/beam/copy.c#L191
>
> Another potential performance issue with ETS tables is locking, if you have many cores. Again, this is an area that the ERTS team is working hard on, so it gets better and better. However, shared data structures are notoriously hard to manage as the core count grows.
>
> Bottom line: it's good to be aware of cost factors, but you won't know how fast or slow it is in reality, until you measure (which you did!). ETS tables are fast enough for most purposes. :)
>
> BR,
> Ulf W
>
> * The garbage collector uses its own copying techniques, since it doesn't have to be limited to copying one term at a time, and also _must_ preserve subterm sharing.
>
> On 11 Jan 2012, at 12:15, Martin Dimitrov wrote:
>
>> I thought this was a lame question, so I posted it on StackOverflow
>> (http://stackoverflow.com/questions/8811430/retrieval-of-data-from-ets-table)
>> so as not to bother the list, but there aren't many replies.
>>
>> Here is my observation:
>>
>> I know that lookup time is constant for ETS tables. But I have also heard
>> that the table is kept outside the process, so when data is retrieved it
>> has to be copied to the process heap, which is expensive.
>> But then, how do I explain this:
>>
>> 1> {ok, B} = file:read_file("IMG_2171.JPG").
>> {ok,<<255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,
>>      0,8,0,0,0,10,0,14,1,2,0,32,...>>}
>> 2> size(B).
>> 1986392
>> 3> L = binary_to_list(B).
>> [255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,0,
>> 0,10,0,14,1,2,0,32,0,0|...]
>> 4> length(L).
>> 1986392
>> 5> ets:insert(utilo, {a, L}).
>> true
>> 6> timer:tc(ets, match, [utilo, {a, '$1'}]).
>> {106000,
>> [[[255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,
>>    0,0,10,0,14,1,2|...]]]}
>>
>> It takes 106000 microseconds to retrieve a list of 1986392 elements, which
>> is pretty fast, isn't it?
>> I also tried it from a module and the result was the same.
>>
>> Best regards,
>>
>> Martin
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> 
>> http://erlang.org/mailman/listinfo/erlang-questions
>
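
The two points made in the thread (an ETS read copies the term onto the
caller's heap, and large binaries may be passed by reference instead) can be
checked with a small measurement. What follows is only a sketch under stated
assumptions: the module name ets_copy_demo, the table name copy_demo and the
keys as_list / as_binary are made up for illustration, and a zero-filled
binary stands in for the JPEG. It also reports how large the list really is
as an Erlang term: each list cell takes two machine words, so on a 64-bit
emulator a list of about 2 million bytes should occupy roughly 32 MB, which is
worth keeping in mind when comparing against a memcpy of 2 MB. Note that
erts_debug:flat_size/1 is an undocumented debugging helper.

```erlang
%% ets_copy_demo.erl -- hypothetical module, not part of the original thread.
-module(ets_copy_demo).
-export([run/0, run/1]).

%% Default payload size: the byte count reported in the thread.
run() ->
    run(1986392).

%% Store the same payload as a list and as a binary, then time one
%% ets:lookup/2 for each. Timings are in microseconds.
run(NumBytes) ->
    Bin = binary:copy(<<0>>, NumBytes),        % stand-in for the image file
    List = binary_to_list(Bin),
    Tab = ets:new(copy_demo, [set, private]),  % assumed table name/options
    true = ets:insert(Tab, [{as_list, List}, {as_binary, Bin}]),

    %% The lookup copies the stored tuple to the calling process heap.
    %% For the list that means every cell; for a large binary only a
    %% small reference is expected to be copied.
    {ListMicros, [{as_list, L}]} = timer:tc(ets, lookup, [Tab, as_list]),
    {BinMicros, [{as_binary, B}]} = timer:tc(ets, lookup, [Tab, as_binary]),
    true = (L =:= List) andalso (B =:= Bin),

    %% flat_size/1 returns words; convert to bytes for this emulator.
    WordSize = erlang:system_info(wordsize),
    ListBytes = erts_debug:flat_size(List) * WordSize,

    true = ets:delete(Tab),
    [{list_micros, ListMicros},
     {binary_micros, BinMicros},
     {list_heap_bytes, ListBytes},
     {binary_bytes, byte_size(Bin)}].
```

From the shell, c(ets_copy_demo). followed by ets_copy_demo:run(). returns a
property list with both timings and both sizes. The absolute numbers will
depend on the machine and the OTP release, so they are best read relative to
each other.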



