[erlang-questions] random entry from mensia

Thu Nov 18 19:53:41 CET 2010

Ok thanks jesper that explains a lot!

I ran rp([{N,mnesia:dirty_slot(offline_msg,N)}||N<-lists:seq(0,255)]). on a table with only 103 items in it and now i understand what it does. Only about 3 of the items had anything in the at all & it seemed that item 199 had about 90 % of everything, so i can see it is nothing like what i wanted. I would assume that a heavily loaded table the distribution may be a bit more even but even so i could potentially end up with a few % of my table begin returned with that command. Out of interest would increasing the number of buckets say 100 fold have a strong detrimental effect on the overall performance of the table & is this a user controllable parameter?

In conclusion however, just added an auto incremented indexed column is the most  sensible solution to the problem i think

James

On 18 Nov 2010, at 18:11, Jesper Louis Andersen wrote:

> On Thu, Nov 18, 2010 at 6:00 PM, James Churchman
> <jameschurchman@REDACTED> wrote:
>> Someone i know brought up an interesting question that is usually very easy to solve in sql but i could not see an obvious efficient solution in Mnesia
> 
> The same problem is present in most other K/V stores where you can not
> rely on the fact that the K is an increasing value. My solution would
> probably be to have another table mapping {I, K} where I is a running
> index. The exercise left for the reader is to handle the case where K
> is deleted :)
> 
>> 
>> There is the option in both ETS and Mnesia called dirty_slot, which i thought gets an item by its numeric order.
> 
> No, it doesn't.
> 
> ETS is, for some ETS types, implemented as a hash-table. This means
> that the table contains buckets, some of which are full, and some of
> which are empty. the slot/2 calls retrieves the contents of the
> buckets in the hash-table so depending on load-factor of the table you
> will amost always get a result, almost never get a result or something
> in between. Leave slot for the debugging it was meant to do. For
> example, an empty table use a 256 bucket has table. If you insert a
> single element into this table, one of the bucket will be filled, but
> the remaining 255 buckets are not, yet, with any data. It does suggest
> a crude algorithm however:
> 
> 1. Pick a number, i, randomly between 0 and BUCKET_SIZE (i.e., 0.256 above)
> 2. Search i, i+1, i+2, i+.., until you hit a non-empty bucket. Pick
> randomly from this bucket.
> 3. If you hit $end_of_table continue the search from 0.
> 
> Depending on the randomness properties of the hash-table this is
> either really really bad or adequate. I'd hesitate to call the
> distribution good though as I am afraid something will go very wrong.
> Also, the run-time is potentially much greater than what you expect
> but you will hopefully hit something.
> 
> -- 
> J.