[erlang-questions] dealing with large dicts

Thu Sep 4 20:05:19 CEST 2008

On Thu, Sep 4, 2008 at 10:38 AM, Richard Carlsson <richardc@REDACTED> wrote:

> Jacob Perkins wrote:
> > T = fun(Term, Key) ->
> >          [Dict1] = mnesia:read(Table, Term),
> >          Dict2 = dict:store(Key, Val, Dict1),
> >          ok = write_val(Table, Term, Dict2)
> >      end.
>
> I think the problem is not the updates to the dict itself, but the
> fact that you are moving the entire dicts in and out of mnesia
> (whose storage is based on ets/dets tables, which do not share memory
> with your process). Each transaction thus consists of 1) a huge copy
> out, 2) relatively minor rewrite of the dict structure, 3) copy entire
> new structure back. You'll be better off just using a mnesia table (or
> several) for your key/value data.
>
>    /Richard
>

The reason for dicts rather than separate tables is that I don't know the
Terms (the string keys used to look up the dicts) beforehand, and the number
of possible Terms is fairly large. To serve as table names, the Terms would
have to be converted to atoms, and the number of atoms is limited.

The table is for a term-document index: given a term, I want a list of
document ids with their relative weights. So at indexing time, I need to
store a document id and a weight for a given term. I can put an upper limit
on the number of documents (right now I'm hoping it could be as high as
50k). In practice the number of terms would probably be much lower, but it
could be as many as 10k.
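
For illustration, here is a minimal sketch of how such an index might be kept
in a single mnesia table instead of one dict (or one table) per term. It
assumes a `bag` table keyed on the term, with one row per (term, document id)
pair. The module name `term_index`, the `posting` record, and its field names
are all hypothetical, and `ram_copies` is chosen only to keep the sketch
self-contained. Because the term is an ordinary key, it can stay a string and
never needs to become an atom.

```erlang
-module(term_index).
-export([init/0, add/3, lookup/1]).

%% Hypothetical row layout: one record per (Term, DocId) pair.
%% The first field (term) is the mnesia key.
-record(posting, {term, doc_id, weight}).

%% Start mnesia and create one RAM-only bag table shared by all terms.
%% Assumption: the table does not exist yet.
init() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(posting,
        [{type, bag},
         {ram_copies, [node()]},
         {attributes, record_info(fields, posting)}]),
    ok.

%% Store one document id and weight under Term. Only this small row
%% is copied into the table, not the whole collection for the term.
%% Assumption: each (Term, DocId) pair is added once; writing the same
%% pair again with a different weight would leave both rows in the bag.
add(Term, DocId, Weight) ->
    F = fun() ->
            mnesia:write(#posting{term = Term,
                                  doc_id = DocId,
                                  weight = Weight})
        end,
    {atomic, ok} = mnesia:transaction(F),
    ok.

%% Return the {DocId, Weight} pairs stored under Term.
lookup(Term) ->
    F = fun() -> mnesia:read(posting, Term) end,
    {atomic, Rows} = mnesia:transaction(F),
    [{D, W} || #posting{doc_id = D, weight = W} <- Rows].
```

With this layout, indexing a document becomes one `term_index:add(Term, DocId,
Weight)` call per term, and `term_index:lookup(Term)` returns the documents
for a term. The order of the returned pairs should not be relied on.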