Mnesia and additional indexes: a cautionary tale
Ulf Wiger (AL/EAB)
ulf.wiger@REDACTED
Thu Mar 30 10:23:07 CEST 2006
Enter the 'rdbms' contrib...
I see a couple of traits of the additional indexing
support in rdbms that could help in this
particular situation:
- You can have disc_copy indexes, which are not rebuilt
every time mnesia is restarted
- You can have ordered indexes, which don't have linear
insertion complexity. They are ordered_set tables, where
the key is {IndexValue, Oid}.
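
A minimal sketch of that ordered-index layout, using a plain ets
ordered_set rather than rdbms itself. The module and function names
are hypothetical and are not the rdbms API, and storing each entry as
the bare 1-tuple {{IndexValue, Oid}} is an assumption made only for
illustration.

```erlang
-module(ordered_index_sketch).
-export([new/0, add/3, lookup/2, demo/0]).

%% Hypothetical helpers for illustration; this is not the rdbms API.

%% Create an ordered_set table whose keys are {IndexValue, Oid}.
new() ->
    ets:new(ordered_index, [ordered_set]).

%% Every {IndexValue, Oid} pair is a distinct key, so adding an entry
%% is an ordinary ordered_set insert even when many Oids share the
%% same IndexValue.
add(Ix, IndexValue, Oid) ->
    ets:insert(Ix, {{IndexValue, Oid}}).

%% Return all Oids stored under IndexValue.
%% Assumes IndexValue contains no match-spec atoms such as '_' or '$1'.
lookup(Ix, IndexValue) ->
    ets:select(Ix, [{{{IndexValue, '$1'}}, [], ['$1']}]).

demo() ->
    Ix = new(),
    %% 1000 objects that all carry the same index value.
    lists:foreach(fun(Oid) -> add(Ix, <<"same">>, Oid) end,
                  lists:seq(1, 1000)),
    add(Ix, <<"other">>, 42),
    1000 = length(lookup(Ix, <<"same">>)),
    [42] = lookup(Ix, <<"other">>),
    [] = lookup(Ix, <<"missing">>),
    ets:delete(Ix),
    ok.
```

Calling ordered_index_sketch:demo() exercises the case from the
story below: many records sharing one index value.
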
Still waiting for some feedback as to whether this
either sucks badly or actually helps ... (:
/Ulf W
> -----Original Message-----
> From: owner-erlang-questions@REDACTED
> [mailto:owner-erlang-questions@REDACTED] On Behalf Of Dan
> Gudmundsson
> Sent: den 30 mars 2006 09:45
> To: Scott Lystig Fritchie
> Cc: erlang-questions@REDACTED
> Subject: Mnesia and additional indexes: a cautionary tale
>
>
> Mnesia indices are implemented with an additional [d]ets
> _BAG_ table per index, which has the secondary index value as
> its key and the key in the real table as its value.
>
> Insertion time in ets bag tables is linear, and has to be
> that way, since mnesia relies on the insertion order (in other parts).
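
A rough sketch of the index shape Dan describes, using a plain ets
bag table instead of mnesia's internal index code. The module name,
the {spread, K} keys and the table size are all made up for
illustration. It returns two timings so the spread-out and constant
index values can be compared; the numbers depend on the machine and
the OTP release, so none are quoted here.

```erlang
-module(bag_index_sketch).
-export([demo/0]).

%% Illustration only: a bag table mapping IndexValue -> primary key.
demo() ->
    N = 5000,
    Ix = ets:new(bag_index, [bag]),
    Keys = lists:seq(1, N),
    %% Well-spread index values: one entry per bag key.
    {SpreadMicros, ok} = timer:tc(fun() ->
        lists:foreach(fun(K) -> ets:insert(Ix, {{spread, K}, K}) end,
                      Keys)
    end),
    %% One constant index value: every entry lands under the same key.
    {ConstMicros, ok} = timer:tc(fun() ->
        lists:foreach(fun(K) -> ets:insert(Ix, {constant, K}) end,
                      Keys)
    end),
    Oids = ets:lookup(Ix, constant),
    N = length(Oids),
    %% Bag tables return objects for a key in insertion order.
    {constant, 1} = hd(Oids),
    ets:delete(Ix),
    {ok, SpreadMicros, ConstMicros}.
```
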
>
> You are not the first person to have made that mistake I can
> assure you :-)
>
> The others have most often done it on test systems, though,
> then they come and complain about mnesia's lousy insertion
> performance.
> Maybe I should add something more about it in the manual...
>
> /Dan
>
> Scott Lystig Fritchie writes:
> > Greetings. I have a cautionary tale to tell about Mnesia and adding
> > an extra attribute index.
> >
> > The story starts with panic (mine!) late last night. I was doing some
> > route performance tests for a Mnesia-based application: simple
> > 1-attribute changes to single records in several tables. Updates for
> > one specific table were 2.5 *orders of magnitude* slower than all
> > others.
> >
> > All of the tables were disc_copies tables. All contained 200K
> > entries. All fit quite comfortably in RAM without pissing off the
> > virtual memory system.
> >
> > It was late, and I didn't want to struggle with remembering how to use
> > "fprof" or "eprof", so I used "cprof". IIRC, "cprof" can profile all
> > Erlang processes without lots of brain power or keystrokes. (It was
> > late, I was tired.) Cprof showed close to 2 orders of magnitude
> > fewer VM reductions being executed. Huh. That was not what
> > I wanted to see.
> >
> > Go to sleep, wake up refreshed, then tackle the problem again.
> > Additional profiling is frustrated: no Erlang functions claim the
> > extra time. Perhaps I'm just inept at "fprof" subtlety, somehow
> > omitting the Erlang process that was eating all the CPU time? {shrug}
> >
> > Later in the afternoon, I shut down Mnesia, then restart it. My
> > application starts timing out at mnesia:wait_for_tables/2. So I start
> > mnesia manually, then go get coffee and make a phone call. When I
> > return 15 minutes later, Mnesia *still* hasn't finished starting up.
> >
> > The "beam" process size should've been about 1,400MB with everything
> > loaded. But the process size was only 390MB, and "beam" was still
> > using 100% CPU time ... doing something, I dunno what!
> >
> > So, I kill the VM and restart. Before starting Mnesia, I use
> > mnesia:set_debug_level(verbose). Sure enough, I see messages like:
> >
> > Mnesia(pss1@REDACTED): Intend to load tables: [{'Tab1',local_only},
> >                                                {'Tab2',local_only},
> >                                                {'Tab3',local_only},
> >                                                {'Tab4',local_only},
> >                                                ...
> >                                               ]
> > Mnesia(pss1@REDACTED): Mnesia started, 0 seconds
> > Mnesia(pss1@REDACTED): Creating index for 'Tab1'
> > Mnesia(pss1@REDACTED): Creating index for 'Tab2'
> > Mnesia(pss1@REDACTED): Creating index for 'Tab3'
> > Mnesia(pss1@REDACTED): Creating index for 'Tab3'
> >
> > ... and it hangs there, eating 100% CPU and getting no further.
> >
> > A quick edit to mnesia_index.erl to include the attribute position
> > number shows me this instead:
> >
> > Mnesia(pss1@REDACTED): Creating index for 'Tab1' Pos 3
> > Mnesia(pss1@REDACTED): Creating index for 'Tab2' Pos 7
> > Mnesia(pss1@REDACTED): Creating index for 'Tab3' Pos 3
> > Mnesia(pss1@REDACTED): Creating index for 'Tab3' Pos 5
> >
> > Ah! Suddenly, it becomes very, very clear.
> >
> > The table 'Tab3' contains 200K of debugging/development records. When
> > the code to create those records was first written, the attribute at
> > position #5 was a constant binary term.
> >
> > Then "feature creep" happened, and an extra Mnesia index was created
> > for position #5. As the 200K records were added slowly, no one
> > noticed the performance impact of using the exact same term for
> > position #5 ... until I did, last night.
> >
> > Moral of the story for Mnesia users (and other databases, I'm sure):
> > beware of the impact of adding secondary indexes.
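
One way to act on that moral is to check how many distinct values an
attribute holds before indexing it. The sketch below is a hypothetical
helper, not part of Mnesia: it takes a list of record tuples, however
you choose to read them out of the table, plus the attribute position,
and returns {DistinctValues, TotalRecords}. The sample records merely
imitate the 'Tab3' situation, with a constant binary at position 5.

```erlang
-module(index_cardinality_sketch).
-export([distinct_ratio/2, demo/0]).

%% Records is a list of tuples; Pos is the attribute position that
%% would be indexed. Returns {DistinctValues, TotalRecords}.
distinct_ratio(Records, Pos) ->
    Values = [element(Pos, R) || R <- Records],
    {length(lists:usort(Values)), length(Values)}.

demo() ->
    %% Made-up records: position 2 is unique, position 5 is constant.
    Recs = [{tab3, K, a, b, <<"constant">>} || K <- lists:seq(1, 1000)],
    {1, 1000} = distinct_ratio(Recs, 5),
    {1000, 1000} = distinct_ratio(Recs, 2),
    ok.
```

A result like {1, 1000} means every record would land under a single
index key, which is the case that hurt here.
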
> >
> > For the Mnesia dev team, I have two questions:
> >
> > 1. That change to mnesia_index.erl is awfully handy ... though
> > unfortunately only handy when the Mnesia debug level is changed
> > from the default.
> >
> > 2. What are the odds that a future release could have less evil
> > behavior (less than O(N^2), taking a wild guess) with secondary
> > indexes like my (unfortunate, pilot error!) story?
> >
> > -Scott
>
>
>