[erlang-questions] Trie Data Structure

Mon Jan 3 02:58:16 CET 2011

It is a little unfair comparison you have done between a trie and dict/gb_trees/... as the trie is designed for a specific type of key, a list of integers/characters, while the others can have keys which can be any valid term. So the trie *should* be more efficient for this case. That being said this definitely shows the need of having more good implementations of different data structures like the trie. Improving our toolbox is never wrong.

Robert

----- "Michael Truog" <mjtruog@REDACTED> wrote:

> Actually, use the benchmark code here:
> https://github.com/okeuday/erlbench
> 
> Blog Post:
> http://okeuday.livejournal.com/16707.html
> 
> The trie fetch performance is roughly 15% faster than a dict for
> 10000
> entries, but might get better as the number of entries increases
> (though the
> performance depends on the distribution of the key size, i.e., string
> length).
> 
> On Thu, Dec 30, 2010 at 6:54 AM, Michael Truog <mjtruog@REDACTED>
> wrote:
> 
> > The trie:test_speed/0 function is what I used for determining how
> fast the
> > trie was for lookups, when compared to a dict.  The wordlist used
> comes from
> > the Ubuntu 10.04 installation and is located at
> /usr/share/dict/words (98569
> > words with an average of 9.5 characters per word).  You can also
> specify
> > your own wordlist with test_speed/1.  The comparison stores all the
> words
> > and does lookups on all the words (using is_key, fetch, and find)
> while
> > tracking the time taken to do all the lookups.
> >
> >
> > On Wed, Dec 29, 2010 at 9:04 PM, Ryan Zezeski <rzezeski@REDACTED>
> wrote:
> >
> >> On Wed, Dec 29, 2010 at 9:21 PM, Michael Truog <mjtruog@REDACTED>
> wrote:
> >>
> >>> Hi,
> >>>
> >>> I implemented a trie data structure that has performed 20% faster
> lookups
> >>> than the dict implementation in R14B01.  I have tested the code
> but it
> >>> can
> >>> always benefit from more testing.  The trie module provides the
> same
> >>> interface as the dict module, but requires a list of integers for
> the
> >>> key.
> >>> Please tell me if you find issues in the code.
> >>>
> >>> Code:
> >>>
> >>>
> https://github.com/okeuday/CloudI/blob/master/src/lib/cloud_stdlib/src/trie.erl
> >>>
> >>> Blog posts:
> >>> http://okeuday.livejournal.com/16454.html
> >>> http://okeuday.livejournal.com/16182.html (dict as the fastest
> general
> >>> tree
> >>> data structure)
> >>>
> >>> I know you can use ETS for similar problems, but avoiding shared
> data
> >>> helps
> >>> utilize Erlang's scalability advantages.
> >>>
> >>> Thanks,
> >>> Michael Truog
> >>>
> >>
> >> Do you have the benchmark and dataset available as well?  I wrote a
> trie
> >> in Erlang a few weeks ago and I'm just curious how it stacks up.
> >>
> >> --
> >> *.........01010...00..00...0100...00..00.
> >> .........10..11...0100...11..00..101.01.
> >> .111010..01100.....10....101110..01.101.
> >> .........10..10....11....01..10..10..01.
> >> .........00..00....00....00..00..00..00.
> >> ........................................*
> >>
> >
> >