LRU queue or heap data structure implementation?

Wed Nov 23 18:29:54 CET 2005

On 23 Nov 2005, at 15:56, Jay Nelson wrote:

> I have wanted to write a similar thing for a while now.  I would do  
> it with processes:
>
> 1) Create a server to manage the ets table (eventually you would  
> want this distributed), which handles the following messages:
>    - New: create a new process and insert in ets with key
>    - Locate: see if a key leads to an existing process
>    - Reap: remove key and process from ets
>
> 2) Each process has a receive which does the following:
>    - GetData: returns the state and loops
>    - after LRU_Timeout: calls ets manager with Reap(SelfKey) then dies

... so this sounds like a purely in-memory database where each "row"  
of data is actually a process, with a mind of its own. I'm sure if  
you took that to its logical conclusions you could build a  
particularly wacky variant of a relational database (where foreign  
keys between "rows" are paths you can send messages over). There's  
probably a weird variant of database theory in which you model select  
and update operations by sending a request to an object in the  
system, recursively having it create a cascade of messages out to  
every "row" the query needs to consider, and waiting for asynchronous  
responses to come back. Sounds quite insane for someone who's grown  
up with SQL, but I wonder if it's a more natural way of thinking  
about data. Who knows?

Keeping my feet on the floor for a bit though, all I'm trying to do  
as extend an ets table to store data and keep track of infrequently  
accessed items. Potentially we could be talking about 10^7 or 10^8  
entries in the cache, so I'm not sure how happy the current Erlang  
runtime would be about spawning that number of processes. My guess is  
that it wouldn't be very amused at all. Either way, I'd just be  
pushing out the problem of trying to come up with sensible data  
structure to whichever part of the emulator is responsible for  
maintaining the schedule of which process is due to receive "after"  
timeouts and when.

Francesco suggested that it might be possible just to use an ets  
table which maps access times to keys, updating the key with  
erlang:now() on every read operation. I suspect there's going to be a  
hell of a lot of binary-tree rebalancing going on, but I'll prototype  
that up and see how it performs. Maybe I don't even need to worry  
about optimisation and everything will magically run fast.

... that said, I'll probably actually go away, make a cup of tea, and  
think about this intriguing idea of building a database where queries  
are evaluated by pinging lots of messages around lots of little data- 
as-persistent-process things. It might even be a refreshing way of  
thinking about OLTP systems (and probably a nightmare for doing DSS  
queries).

Richard.