beating mnesia to death (was RE: Using 4Gb of ram with Erlang VM)

Ulf Wiger (AL/EAB) ulf.wiger@REDACTED
Mon Nov 7 23:41:37 CET 2005

Continuing with trying to max out that 16GB SPARC... 

I canceled the test, restarted with a thread pool of 256,
and tried building a 700,000 record disc_copy table.
It worked much better. Page faults still occurred, but didn't
slow Erlang to a crawl. The log dumps interleaved, but that
didn't significantly affect responsiveness either. Building
the whole table took about 10 minutes. Dirty_reads while logs 
were dumping took a few hundred microsecs to complete (about
3x slower than in an idle system).
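(For reference — and as an assumption on my part, since the exact command isn't shown above — the async thread pool size is given with the emulator's +A flag at startup:)

```shell
# Start the emulator with an async thread pool of 256 threads,
# so file I/O doesn't block the scheduler
erl +A 256
```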

A read-write transaction on the full table (reading one 
object by key and writing it back) took about 600-700 usec,
which seems quite normal.

Starting mnesia and loading the 6GB table from disk took
ca 10 minutes. The system felt pretty snappy while the
table was loading.

Now, squeezing 900,000 records into the table and monitoring
the process with vmstat made me think. I'd been using
ets:info(mytab,memory) and multiplying by four as always
(the function returns the number of words used).

But the numbers didn't add up! It looked as if I was running
out of memory with 8 GB to spare. vmstat reported 250 MB left
in the system when 900,000 records were loaded, and asking
the erlang:system_info(memory) BIF gave a similar answer.

(foo@REDACTED)17> erlang:system_info(memory).
(foo@REDACTED)18> {ets:info(mytab,size),(ets:info(mytab,memory)*4)/1000000}.

Then it hit me - I haven't been getting enough sleep
lately! Since I'm using 64-bit erlang, it's 8 bytes for
every heap word! I've been building a 15 GB table!

Now I feel a whole lot better.  (:
So, the "6 GB" table mentioned above was really a 12 GB table,
and it behaved quite nicely. Using up slightly more than 15G
on a 16G machine was probably pushing it.
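To avoid that mistake in the future, here's a minimal sketch of the same check with the word size read from the emulator instead of hard-coded (the function name is my own):

```erlang
%% Word-size-aware version of the memory check above (sketch).
%% ets:info(Tab, memory) returns words; erlang:system_info(wordsize)
%% is 8 on a 64-bit emulator and 4 on 32-bit, so don't hard-code it.
table_megabytes(Tab) ->
    Words = ets:info(Tab, memory),
    Words * erlang:system_info(wordsize) / 1000000.
```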

(The 900,000 record experience ended with a big Ooops!
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 64194840 bytes of memory 
  (of type "heap").)

Now the mytab.DCD is only 3.2 GB on disk... I guess 
we pay quite a price for the 64-bit word, then.

Time for the next test: write the payload as 
term_to_binary(Value,[compressed]). Now, 700,000
records took only 96 MB, and the table took 37
seconds to build. Inspired, I slammed 5 million
records into the table. This took 5 minutes,
and the resulting table used only 686 MB of RAM,
and 373 MB on disk. Sigh...
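A minimal sketch of that compression trick, assuming the three-element {mytab, Key, Payload} record layout from the experiment (dirty operations for brevity, no error handling):

```erlang
%% Store the payload as a compressed external-term binary and
%% decode it on the way out. The binary is opaque to mnesia, so
%% you can no longer match or index on fields inside the payload.
write_compressed(Key, Value) ->
    Bin = term_to_binary(Value, [compressed]),
    mnesia:dirty_write({mytab, Key, Bin}).

read_compressed(Key) ->
    [{mytab, Key, Bin}] = mnesia:dirty_read(mytab, Key),
    binary_to_term(Bin).
```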

So, instead of sitting here all night, experimenting,
I'm going to jump to some conclusions:

On a 16 GB machine, you can:

- run 6 million simultaneous processes
  (through use of erlang:hibernate, I was actually 
   able to run 20 million - spawn time: 6.3 us, 
   message passing time: 5.3 us, and I had 
   1.8 GB to spare.)

- populate mnesia with at least 12 GB of data, but
  think through how you want to represent it, since
  the 64-bit word size blows things up a bit.

- keep a 10 GB+ disc_copy table in mnesia. The 
  load times and log dump cost seem acceptable
  (10 minutes to load, dumping takes a while but
  runs in the background quite nicely.)
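The hibernation trick in the first point can be sketched like this (the module name and message shape are my own invention; erlang:hibernate/3 throws away the call stack and shrinks the heap until the next message arrives):

```erlang
-module(sleeper).
-export([start/0, loop/0]).

%% Spawn a process that hibernates between messages. While
%% hibernating, the process keeps no stack and a minimal heap,
%% which is what makes tens of millions of processes feasible.
start() ->
    spawn(fun() -> loop() end).

loop() ->
    receive
        {From, ping} ->
            From ! pong,
            erlang:hibernate(sleeper, loop, [])
    end.
```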

Of course, I didn't make much use of that 
second cpu...  (:
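(As an aside, the dump_log_write_threshold tuning mentioned in the quoted message below has to be set before mnesia starts; a sketch with the 100,000-write value from the experiment:)

```erlang
%% Raise mnesia's log dump threshold from its low default before
%% starting mnesia. Can also be given on the command line:
%%   erl -mnesia dump_log_write_threshold 100000
application:set_env(mnesia, dump_log_write_threshold, 100000),
mnesia:start().
```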


> -----Original Message-----
> From: owner-erlang-questions@REDACTED 
> [mailto:owner-erlang-questions@REDACTED] On Behalf Of Ulf 
> Wiger (AL/EAB)
> Sent: Monday, 7 November 2005 18:52
> To: erlang-questions@REDACTED
> Subject: beating mnesia to death (was RE: Using 4Gb of ram 
> with Erlang VM)
> Ulf Wiger wrote:
> > I've been able to get my hands on some SPARCs (2x 1.5 GHz)
> > with 16 GB RAM.
>
> I thought I'd push mnesia a bit, to see how far I could get 
> with 16 GB of RAM and 64-bit Erlang.
>
> (First of all, something seems to happen around 8GB. I get 
> into lots of page faults (but no swapping). When this 
> happens, responsiveness goes down the drain. Does anyone have 
> an idea what it is that's happening?)
>
> I created a ram_copy table and first stuffed it full of 
> minimal records (well, almost: {mytab, integer(), integer()}).
> I pushed in 10 million records easily with no visible effect 
> on write()/read() cost. But the memory usage of the table was 
> only 485 MB, so I decided to add some payload:
>
> {mytab,integer(),erlang:make_tuple(100,lists:duplicate(10,a))}
>
> With this, I could insert 900,000 records, with the table 
> occupying 7.6 GB of RAM. At 1 million records, though, the 
> batch function never returned. My perfmeter showed lots of 
> page faults, and user CPU utilization went down to zero.
>
> While the node remained responsive, a dirty_write() took ca 
> 10 usec with no payload, and 87 usec with payload.
> dirty_read() took 3.4 usec without and 72 usec with payload.
>
> I've now changed the table copy type to disc_copies, and am 
> trying to find out how huge disc_copies perform.
> Again, I got a bit impatient and went for the 900,000 record 
> version with payload.
>
> I first ran with the default dump_log_write_threshold -- not a 
> very good idea. I then changed it to 10,000, and after that 
> to 100,000 writes. At the time of writing, Mnesia is bending 
> over backwards trying to handle a burst of 900,000 records 
> with checkpointing to disk. It's not liking it, but it keeps 
> going... slowly. I got lots of page faults for a while, and 
> several "Mnesia is overloaded" messages (mostly 
> time_threshold). The transaction log was 1.6 GB at its peak.  (:
>
> The whole system is running like molasses. Still some 60,000 
> records to insert. The I/O subsystem seems saturated. The 
> process mnesia_controller:dump_and_reply()
> has a heap of > 24 MB. A disk_log process has a 40 MB heap.
>
> Of course, I ran all this with a thread pool size of 0.
> That was probably another mistake...
>
> With 100,000 records (incl payload), the table loaded from 
> disk in 45 seconds. With 200,000 records, loading took 87 seconds.
>
> I'm probably going to let it finish overnight. Will report 
> any further developments.
>
> /Uffe
