[erlang-questions] Memory-effective queue handling

Knut Nesheim knutin@REDACTED
Tue Jul 24 23:09:05 CEST 2012


On Tue, Jul 24, 2012 at 5:11 PM, Attila Rajmund Nohl
<attila.r.nohl@REDACTED> wrote:
> Hello!
>
> I have a data structure where I manage "historical data": a list of
> timestamps when events happened. I only need to know how many events
> happened in the last 24 hours.

Using an ordered_set ETS-table and iterating over the table to count
and delete events will be very fast and not explode your memory usage.
You can also enable compression on the table. ETS allows concurrent
writes which removes a potential disaster if you are using a single
process in charge of the queue (with many events, this process might
get overloaded).

However, if you are only interested in the count of events, maybe you
don't need to store all events in full? If you are ok with losing some
granularity, you can use one "bucket" per minute or second or whatever
for the last 24 hours. In the bucket you keep the count in that time
period, incrementing it for every event. Finding the current bucket
can be done using "Now rem NumBuckets" where Now has the granularity
you need. The count for the last 24 hours is the sum of all the
elements in the table.

If you use an ETS table to keep the count (ets:update_counter/3 is
very nice for this), you can reset the count by doing a decrement of
the current bucket every minute or second by fetching the current
count, then decrementing with that value (this would work even with
interleaved operations.) If you are able to use a single process in
charge of getting the count polling-style, you could copy the buckets
to a second table at every polling interval. The diff between the two
tables would be the number of events since last poll.

With this solution, the space required is determined by the
granularity. Finding the sum is a traversal away. Pruning can be
fiddly, but requires no CPU or memory intensive work. To me it sounds
good on paper, maybe there are some obstacles I didn't think of. Maybe
there would be very high write lock contention on the single element
to update, maybe ets:update_counter/3 still performs very well.

For problems like these, pushing the requirements might be very
helpful. If you could do a good sample of the events, you can still
get an accurate answer when you ask for the count and keep your
current approach.

Regards
Knut



More information about the erlang-questions mailing list