[erlang-questions] How would you implement TTL for ETS based cache?

Valentin Micic v@REDACTED
Fri Mar 15 11:46:37 CET 2013


Hi Dmitry,

We've never performed an analysis on 1000 entries. I am sure we can agree that every approach would work well on 1000 entries. When we did our testing, we did it on 1,000,000+ entries -- unfortunately I do not have the results handy. Thus, when used on a small scale, our approach (if I may call it that, and only for reference purposes) may work only marginally better, and the effort may not be worthwhile. However, if you want a solution that scales, then you ought to make a few trade-offs, and this is a good one, in my view.

As for the ratio of 2/3 you've mentioned, well, consider this:

A) If you want to prolong the TTL of an entry on every hit, you have no choice but to perform an insert in order to update the entry with a new TTL value -- again, no matter which method you are using;
B) If you are going to drop the whole table at the end of the traversal, there is very little point in deleting every entry one by one. Hence, it may cost a bit of memory, but there is no need to call ets:delete for every entry, even when an element is going to be moved to a "younger" cache -- that is, if you favour speed over memory. (Also, note that a call to ets:delete does not actually deallocate the memory immediately anyway.)
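For illustration, a minimal sketch of this rotation, assuming generations are kept as a list of ETS table ids (the module and table names here are my own invention, not code from the system described):

```erlang
%% Sketch only: rotate generations by dropping the oldest ETS table
%% as a whole, rather than calling ets:delete/2 per entry.
-module(cache_rotate).
-export([rotate/1]).

%% Generations :: [ets:tid()], youngest first.
rotate(Generations) ->
    Oldest = lists:last(Generations),
    ets:delete(Oldest),                       %% drop the whole table in one call
    New = ets:new(cache_gen, [set, public]),  %% fresh youngest generation
    [New | lists:droplast(Generations)].
```

The rotation itself is O(1) from the caller's point of view; the memory of the dropped table is reclaimed by the runtime, which, as noted above, may not happen immediately.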

Moving entries around has exactly that purpose -- prolonging the lifecycle of the most recently accessed entries, which should be the purpose of a cache.

OTOH, if you do not want to prolong the caching of an element, then just keep it in the same table (generation) without any update, and dispose of it with the rest of the generation by dropping the table as a whole.
Granted, in that case you may have a higher probability of executing a lookup more than once for the same element as the element grows older, but this still results in fewer lookups than traversing a single ETS table periodically.

By contrast, when we talk about traversing the whole table periodically, we are talking about absolute certainty (P=1).
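To make the lookup path concrete, here is a rough sketch, under my own assumptions about table layout ({Key, Value} tuples) and naming -- not code from the original system:

```erlang
%% Sketch: query the youngest generation first; on a hit in an older
%% generation, promote (re-insert) the entry into the youngest table.
%% The entry is NOT deleted from the old table -- that whole generation
%% is dropped later, trading a bit of memory for speed.
-module(gen_cache).
-export([lookup/2]).

%% Generations :: [ets:tid()], youngest first.
lookup(Key, [Youngest | Older]) ->
    case ets:lookup(Youngest, Key) of
        [{Key, Value}] -> {ok, Value};
        []             -> lookup_older(Key, Older, Youngest)
    end.

lookup_older(_Key, [], _Youngest) ->
    miss;   %% caller loads from the external source and inserts
lookup_older(Key, [Tab | Rest], Youngest) ->
    case ets:lookup(Tab, Key) of
        [{Key, Value}] ->
            ets:insert(Youngest, {Key, Value}),  %% promote; no ets:delete here
            {ok, Value};
        [] ->
            lookup_older(Key, Rest, Youngest)
    end.
```

On a miss the caller would load from the external source and insert into the youngest table, exactly as described above.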

V/


On 15 Mar 2013, at 11:56 AM, Dmitry Kolesnikov wrote:

> Hello Valentin,
> 
> Let me clarify one issue with your approach.
> In one of your earlier emails you wrote: "if entry with a corresponding key is found -- it is moved to the youngest cache."
> This implies a data copy from one ETS table to another. The copy rate depends solely on the application, but with a 30 sec TTL and 10 sec generations, 2/3 of cache-hit requests require ets:lookup, ets:insert, and ets:delete. Have you analysed the impact of those?
> 
> On the other hand, a cache housekeeping process might be executed less frequently than every 1 sec, while data eviction happens at access time.
> 
> - Dmitry
> 
> On Mar 15, 2013, at 11:12 AM, Valentin Micic <v@REDACTED> wrote:
> 
>> Forgive me for questioning your view, but just consider this:
>> 
>> Say that your TTL is set to 30 seconds and that you decide to traverse the cache every second; say the cache holds 3000 elements, with a maximum incoming rate of 100 entries per second. That results in traversing the entire cache 29 times in order to handle at most 100 elements at a time. This kind of pro-activity is not economical -- you can always put those CPU cycles to better use (am I old-fashioned, or what?)
>> 
>> You may achieve the same thing with a generational cache, as you can traverse the table only just before you drop it, and then commit the entries whose TTL has expired (e.g. execute a callback), whilst moving the ones whose TTL has not expired to a younger ETS table.
>> 
>> Put this to the test: say you create a new generation every 10 seconds. Considering an incoming rate of 100 per second, you would have 1000 entries per generation, which you would traverse exactly once. Thus, for the same number of entries, say 3000 (with a constant incoming rate of 100), well, you do the math.  ;-)
>> 
>> V/
>> 
>> On 15 Mar 2013, at 10:06 AM, Max Bourinov wrote:
>> 
>>> This will not work for me. I need pro-active TTL.
>>> 
>>> Best regards,
>>> Max
>>> 
>>> 
>>> 
>>> On Fri, Mar 15, 2013 at 8:28 AM, Bach Le <thebullno1@REDACTED> wrote:
>>> I like this approach. Thanks for sharing. For a TTL per entry, you can do it lazily: retrieve the entry and look at its TTL; if it is due to expire, return nothing and do not promote the entry. The generation will eventually be "garbage collected" anyway.
>>> 
>>> Also, instead of creating and deleting tables, is it faster to clear the table and reuse it? Assuming that ETS does some memory pooling, clearing might be more efficient than just dropping.
>>> 
>>> 
>>> On Thursday, March 14, 2013 10:50:56 PM UTC+8, Valentin Micic wrote:
>>> Hi Max, 
>>> 
>>> We have implemented something similar in the past -- we called it a generational cache (I hope no-one accuses me of plagiarism, for English is but a finite language after all. And so is the problem/solution space...). 
>>> 
>>> The gist of it is that instead of using a single ETS table for the cache, you use two or more "disposable" ETS tables. 
>>> The generation management is implemented as a process that, once certain criteria are met, destroys the oldest table and creates a new one in its stead. 
>>> 
>>> The youngest cache (the newest ETS table) is always queried first. If the value is not found, the older tables are queried, and if an entry with the corresponding key is found -- it is moved to the youngest cache. 
>>> If none of the ETS tables in the cache set contains the key, the key is loaded from some external source and inserted into the youngest cache (the newest ETS table). 
>>> 
>>> This method worked quite well for us in the past, and we've implemented several variants of it, e.g. time-based, size-based, and a combination of the two. 
>>> Bear in mind that when you drop a very large ETS table (and I mean very large, as in gigabytes of memory), it may take some time to release the allocated memory, but that may be a problem for another day. 
>>> 
>>> The downside: this does not exactly correspond to your requirement to keep a TTL per single cache entry. 
>>> The upside: it is much faster and cheaper than traversing a single cache periodically. Also, it ends up keeping only the entries that are most frequently used. 
>>> 
>>> Hope this helps. 
>>> 
>>> Kind regards 
>>> 
>>> V/ 
>>> 
>>> 
>>> On 14 Mar 2013, at 4:15 PM, Max Bourinov wrote: 
>>> 
>>> > Hi Erlangers, 
>>> > 
>>> > How would you implement TTL with callback for ETS based cache? 
>>> > 
>>> > I want the callback to be triggered when the TTL expires, so that the cached value goes to the DB. 
>>> > 
>>> > For the key I use a 32-byte-long binary(); the value is not bigger than 8 KB. 
>>> > 
>>> > Any suggestions? 
>>> > 
>>> > p.s. erlang:send_after/3 is the most brutal approach, but how would it behave when the cache is under load? 
>>> > 
>>> > Best regards, 
>>> > Max 
>>> > 
>>> > _______________________________________________ 
>>> > erlang-questions mailing list 
>>> > erlang-q...@REDACTED 
>>> > http://erlang.org/mailman/listinfo/erlang-questions 
>>> 
>> 
> 
