Mnesia and additional indexes: a cautionary tale

Ulf Wiger (AL/EAB) ulf.wiger@REDACTED
Thu Mar 30 10:23:07 CEST 2006


Enter the 'rdbms' contrib...

I see a couple of traits of the additional indexing
support in rdbms that could help in this
particular situation:

- You can have disc_copy indexes, which are not rebuilt
  every time mnesia is restarted
- You can have ordered indexes, which don't have linear
  insertion complexity. They are ordered_set tables, where
  the key is {IndexValue, Oid}.

Still waiting for some feedback as to whether this 
either sucks badly or actually helps ... (:

/Ulf W

> -----Original Message-----
> From: owner-erlang-questions@REDACTED 
> [mailto:owner-erlang-questions@REDACTED] On Behalf Of Dan 
> Gudmundsson
> Sent: den 30 mars 2006 09:45
> To: Scott Lystig Fritchie
> Cc: erlang-questions@REDACTED
> Subject: Mnesia and additional indexes: a cautionary tale
> 
> 
> Mnesia indecies are implemented with an additional [d]ets 
> _BAG_ table per index, which have the secondary index as a 
> key and the value is the key in the real table.
> 
> Insertion time in ets bag tables are linear, and have to be 
> that way mnesia relies on the insertion order (in other parts).
> 
> You are not the first person to have made that mistake I can 
> assure you :-) 
> 
> The others have most often done it on test systems, though, 
> then they come and complain about mnesia's lousy insertion 
> performance..
> Maybe I should add something more about it in the manual...
> 
> /Dan
> 
> Scott Lystig Fritchie writes:
>  > Greetings.  I have a cautionary tale to tell about Mnesia 
> and adding  > an extra attribute index.
>  >
>  > The story starts with panic (mine!) late last night.  I 
> was doing some  > route performance tests for a Mnesia-based 
> application: simple  > 1-attribute changes to single records 
> in several tables.  Updates for  > one specific table were 
> 2.5 *orders of magnitude* slower than all  > others.
>  >
>  > All of the tables were disc_copies tables.  All contained 
> 200K  > entries.  All fit quite comfortably in RAM without 
> pissing off the  > virtual memory system.
>  >
>  > It was late, and I didn't want to struggle with 
> remembering how to use  > "fprof" or "eprof", so I used 
> "cprof".  IIRC, "cprof" can profile all  > Erlang processes 
> without lots of brain power or keystrokes.  (It was  > late, 
> I was tired.)  Cprof showed that about close to 2 orders of  
> > magnitude fewer VM reductions being executed.  Huh.  That 
> was not what  > I wanted to see.
>  >
>  > Go to sleep, wake up refreshed, then tackle the problem again.
>  > Additional profiling is frustrated: no Erlang functions 
> claim the  > extra time.  Perhaps I'm just inept at "fprof" 
> subtlety, somehow  > omitting the Erlang process that was 
> eating all the CPU time?  {shrug}  >  > Later in the 
> afternoon, I shutdown Mnesia, then restart it.  My  > 
> application starts timing out at mnesia:wait_for_tables/2.  
> So I start  > mnesia manually, then go get coffee and make a 
> phone call.  When I  > return 15 minutes later, Mnesia 
> *still* hasn't finished starting up.
>  >
>  > The "beam" process size should've been about 1,400KB with 
> everything  > loaded.  But the process size was only 390MB, 
> and "beam" was still  > using 100% CPU time ... doing 
> something, I dunno what!
>  >
>  > So, I kill the VM and restart.  Before starting Mnesia, I 
> use  > mnesia:set_debug_level(verbose).  Sure enough, I see 
> messages like:
>  > 
>  >     Mnesia(pss1@REDACTED): Intend to load tables: 
> [{'Tab1',local_only},
>  >                                                 
> {'Tab2',local_only},
>  >                                                 
> {'Tab3',local_only},
>  >                                                 
> {'Tab4',local_only},
>  >     					    ...
>  >     					   ]
>  >     Mnesia(pss1@REDACTED): Mnesia started, 0 seconds
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab1' 
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab2' 
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab3' 
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab3' 
>  >
>  > ... and it hangs there, eating 100% CPU and getting no further.
>  >
>  > A quick edit to mnesia_index.erl to include the attribute 
> position  > number shows me this instead:
>  > 
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab1' Pos 3
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab2' Pos 7
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab3' Pos 3
>  >     Mnesia(pss1@REDACTED): Creating index for 'Tab3' Pos 5
>  >
>  > Ah!  Suddenly, it becomes very, very clear.
>  >
>  > The table 'Tab3' contains 200K of debugging/development 
> records.  When  > the code to create those records was first 
> written, the attribute at  > position #5 was a constant binary term.
>  >
>  > Then "feature creep" happened, and an extra Mnesia index 
> was created  > for position #5.  At the 200K records were 
> added slowly, no one  > noticed the performance impact using 
> the exact same term for position  > #5 ... until I did, last 
> last night.
>  >
>  > Moral of the story for Mnesia users (and other databases, 
> I'm sure):
>  > beware of the impact of adding secondary indexes.
>  >
>  > For the Mnesia dev team, I have two questions:
>  >
>  > 1. That change to mnesia_index.erl is awfully handy ... though
>  >    unfortunately only handy when the Mnesia debug level is changed
>  >    from the default.
>  >
>  > 2. What are the odds that a future release could have less evil
>  >    behavior (less than O(N^2), taking a wild guess) with secondary
>  >    indexes like my (unfortunate, pilot error!) story?
>  >
>  > -Scott
> 
> 
> 



More information about the erlang-questions mailing list