[erlang-questions] tcerl memory usage, plans for hash-type db

Thu Aug 14 22:51:07 CEST 2008

We tend to talk about these things here:
http://groups.google.com/group/mnesiaex-discuss

Anyway, I just open a tcbdb database in the linked-in driver, no special
extra memory usage that I'm aware of.  I surmised from the docs that tcbdb
is built on top of tchdb and stores entire pages in tchdb; so the cache
params are actually in terms of pages not records (unlike tchdb).  So 64
pages might be a lot of your database in cache, especially since your
records are kinda big (40 KB each?).  Actually you may need to play with
parameters to keep your records from being treated as large objects; ask
the tokyocabinet folks to be sure.  Anyway, the cache is implemented with
mmap, so on a 64-bit machine, where there is plenty of address space, it
is typically harmless: the OS will manage physical memory.
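
To make the pages-versus-records point concrete, here is a rough
back-of-envelope sketch in Python.  The record count and size come from
your message; the number of records per leaf page is purely an assumption
on my part (the values 128 and 8 below are illustrative, not measured),
and the function names are made up for this example.

```python
def data_size_mib(n_records, record_kib):
    """Raw payload size in MiB, ignoring any index or page overhead."""
    return n_records * record_kib / 1024


def cached_fraction(n_records, leaf_pages_cached, records_per_leaf):
    """Upper bound on the fraction of records the leaf cache could hold."""
    cached_records = leaf_pages_cached * records_per_leaf
    return min(1.0, cached_records / n_records)


# Figures from the question: 7000 records of roughly 40 KiB each.
print(data_size_mib(7000, 40))

# records_per_leaf is an ASSUMPTION; try whatever your tuning implies.
print(cached_fraction(7000, 64, 128))  # many records per page
print(cached_fraction(7000, 64, 8))    # few records per page
```

If a leaf page really holds on the order of a hundred records, 64 cached
pages can cover the whole 7000-record database, which would be consistent
with the memory use you are seeing.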

In our production system we run with minimal cache (1 leaf/1 non-leaf) and
just use the OS page cache, since we are on 32 bit boxes (ec2 c1.medium).
Parenthetically, we also set { deflate, true }.  We're able to get dozens
of tables (fragments) per node in less than a gig this way. If we were on
64 bit boxes I would try cranking up the cache and let tokyocabinet mmap
away, since I suspect page faults are faster than constantly doing system
calls, but I haven't benchmarked it so I don't know.

Re: supporting tchdb, there is no method for positioning a cursor on or
after a record in tchdb, so there is no nice way to implement next or
prev.  I'm holding off implementing tchdb until one materializes.  I've
asked the tokyocabinet maintainer to add it; last I checked it wasn't
there. If it's important to you, contact him!
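
To illustrate what "positioning on or after a record" buys you, here is a
small Python sketch.  It is not tcerl or tokyocabinet code, and the class
name is hypothetical; it only shows that next and prev reduce to a search
in sorted key order, which a B+ tree cursor can do and a plain hash table
cannot.

```python
import bisect


class OrderedKeys:
    """Toy stand-in for an ordered key index (illustration only)."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def next(self, key):
        # Position just after `key`, whether or not `key` is present.
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

    def prev(self, key):
        # Position at the first key >= `key`, then step back one.
        i = bisect.bisect_left(self._keys, key)
        return self._keys[i - 1] if i > 0 else None


ks = OrderedKeys([10, 30, 20])
print(ks.next(20), ks.next(25), ks.prev(25))
```

With a hash layout there is no equivalent of the bisect step, so the
only fallback would be scanning every key.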

Re: sync times, well a couple of seconds is perhaps not out of line,
depending upon the amount of data.  As you might already have found, I
don't get the durability story of tokyocabinet
(http://groups.google.com/group/mnesiaex-discuss/browse_thread/thread/da2ae1da862b01c0)
but we use distributed mnesia and have multiple copies of any table so we
basically never locally load a table on restart.

Good luck!

-- p

On Thu, 14 Aug 2008, jason pellerin wrote:

> I'm interested in using tcerl for a few projects, but I'm running into
> issues that I don't know how to resolve. The big one is memory use.
> While I can tune it a bit by decreasing the leaf and non-leaf node
> caches, tcerl still seems to grab tons of RAM, much more than I was
> (naively?) expecting. With the two cache params set to 64 each, a db
> of 7000 40k records takes up almost 400mb of ram. With the default
> cache params of 1024 and 512, it consumes over 1G. Syncs of a db of
> this size also take several seconds, which effectively means that I'd
> never be able to sync, except when shutting down a node.
>
> Tcerl folks -- am I doing something wrong? I need to support dbs of
> about 100,000 records of this size (with a *lot* of reads and writes)
> with reasonable enough RAM usage that each node can fit onto an EC2
> instance with a bunch of other stuff -- realistically, 256M at most.
> I'd guess that the tokyocabinet hash db would be better suited to our
> needs, since we don't need ordered_set. But would the RAM use be any
> different, and do you have near-term plans to add support for the hash
> db to tcerl?
>
> Does anyone have any other suggestions for storage backends to try?
> Dets unfortunately is much too slow, and mnesia with disc tables too
> RAM-hungry.
>
> Thanks (and thanks for tcerl!)
>
> JP

In an artificial world, only extremists live naturally.

        -- Paul Graham