[erlang-questions] where it's the best way to store a very big term object shared between processes

Caragea Silviu silviu.cpp@REDACTED
Thu Oct 22 22:54:18 CEST 2015


Hello,

@Michael I'm using btrie only because of btrie:find_prefix_longest.

That is the main functionality I need. As I already posted, if you have a
btrie with the keys ["aa", "a", "b", "bb", "aaa"] and you call
btrie:find_prefix_longest("aaawhatever"), it returns the value associated
with the key "aaa", the longest stored key that is a prefix of the input.
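
To make the longest-prefix semantics concrete, here is a minimal sketch
that does not use the trie library at all. It keeps the prefixes in a plain
map and shortens the key one byte at a time. The module name lpm_sketch and
the {ok, Prefix, Value} | error return shape are assumptions made for this
illustration, not the btrie API.

```erlang
-module(lpm_sketch).
-export([new/1, find_prefix_longest/2]).

%% Build a lookup map from a list of {Prefix, Value} pairs (binary keys).
new(Pairs) ->
    maps:from_list(Pairs).

%% Try the whole key first, then drop one trailing byte at a time
%% until a stored prefix matches or nothing is left.
find_prefix_longest(Key, Map) when is_binary(Key) ->
    find_prefix_longest(Key, byte_size(Key), Map).

find_prefix_longest(_Key, 0, _Map) ->
    error;
find_prefix_longest(Key, Len, Map) ->
    Prefix = binary:part(Key, 0, Len),
    case maps:find(Prefix, Map) of
        {ok, Value} -> {ok, Prefix, Value};
        error -> find_prefix_longest(Key, Len - 1, Map)
    end.
```

With the keys above stored as binaries, looking up <<"aaawhatever">> tries
the full key first and shrinks it until it reaches <<"aaa">>. A real trie
walks the key once from the front instead, which is why it is the better
fit for a large table.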

I need this for a long table of calling breakouts (prefix and rate per
prefix), around 50 k breakouts. I call
btrie:find_prefix_longest(<<"phonenumber">>) and it returns the prefix
and the rate I need to bill for that destination. The lookup takes
1-2.5 ms, and 95% of that time is spent in ets:lookup. As somebody already
pointed out, this is because ets copies the stored term into the calling
process on every lookup. I will switch to keeping the btrie in gen_server
state and benchmark again.
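
A rough sketch of the gen_server variant, under some assumptions: the
module name trie_server_sketch is made up, and the server takes the data
structure plus a lookup fun so the example stays self-contained. In my case
the data would be the result of btrie:new/1 and the fun would wrap the
btrie longest-prefix lookup.

```erlang
-module(trie_server_sketch).
-behaviour(gen_server).

-export([start_link/2, lookup/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Data is built once by the caller; LookupFun(Key, Data) does the search.
start_link(Data, LookupFun) when is_function(LookupFun, 2) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, {Data, LookupFun}, []).

%% Only the key and the reply cross the process boundary,
%% the big term itself stays on the server's heap.
lookup(Key) ->
    gen_server:call(?MODULE, {lookup, Key}).

init(State) ->
    {ok, State}.

handle_call({lookup, Key}, _From, {Data, LookupFun} = State) ->
    {reply, LookupFun(Key, Data), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

The trade-off is that every lookup is serialized through one process, so
this only wins if the lookup is cheaper than copying the whole term out of
ets.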

Thanks everyone for the suggestions!

On Thu, Oct 22, 2015 at 11:38 PM, Michael Truog <mjtruog@REDACTED> wrote:

> On 10/22/2015 01:29 AM, Caragea Silviu wrote:
>
> Hello.
>
> In one of my projects I need to use a radix tree. I found a very nice
> library:
> https://github.com/okeuday/trie
>
> Lookup performance is great, but I have one problem.
>
> Basically my tree has around 100 000 elements, so building it is an
> extremely expensive operation. For this reason I build it once, and all
> processes that need to do lookups have to share the btrie object (created
> using btrie:new/1).
>
> Here I see several options:
>
> 1. Use a gen_server and store the btrie object in its state or process
> dictionary. I haven't tried this.
> 2. Use an ets table and store the trie object in a public table where all
> processes can read and write.
>
> It is easier to scale and is more natural in Erlang if you pursue #1
> (using the state, not the process dictionary).  The #2 path (including
> mochiglobal) is typical in imperative programming (mutating global state).
> With #1 you can manage the reliability of individual processes for
> fault-tolerance concerns and you would probably start with a single locally
> registered process name.  Then if there is too much contention for the
> single process that has the btrie, you would switch to using a process
> group, to share the load with replicated data.
>
> The btrie usage is probably slower than using the newer maps data
> structure.  The trie repo was mainly created for string keys, not binary
> keys, due to the memory access details in Erlang (i.e., it is easier to
> have more efficient lookups with string keys, when using process heap data,
> which includes being more efficient than maps in some cases).
>
> You could also store the key/value lookup as a single large binary that
> you reference (in multiple processes, since large binaries are reference
> counted) with something like https://github.com/knutin/bisect which may
> work too.
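
The single-large-binary idea can be sketched without any library. The
example below is not the bisect API. It is a hand-rolled binary search over
fixed-width records, and the module name, the 8-byte zero-padded keys and
the 4-byte integer values are all assumptions chosen for illustration. It
does exact-match lookups only.

```erlang
-module(binsearch_sketch).
-export([from_list/1, find/2]).

-define(KEY_SIZE, 8).   %% bytes per key, zero-padded (assumed width)
-define(VAL_SIZE, 4).   %% bytes per integer value (assumed width)
-define(REC_SIZE, (?KEY_SIZE + ?VAL_SIZE)).

%% Pack {Key, Value} pairs into one binary of sorted fixed-width records.
from_list(Pairs) ->
    Sorted = lists:keysort(1, [{pad(K), V} || {K, V} <- Pairs]),
    << <<K/binary, V:?VAL_SIZE/unit:8>> || {K, V} <- Sorted >>.

%% Binary search over the record array; returns {ok, Value} | error.
find(Key, Bin) ->
    search(pad(Key), Bin, 0, byte_size(Bin) div ?REC_SIZE - 1).

search(_Key, _Bin, Lo, Hi) when Lo > Hi ->
    error;
search(Key, Bin, Lo, Hi) ->
    Mid = (Lo + Hi) div 2,
    <<K:?KEY_SIZE/binary, V:?VAL_SIZE/unit:8>> =
        binary:part(Bin, Mid * ?REC_SIZE, ?REC_SIZE),
    if
        K =:= Key -> {ok, V};
        K < Key   -> search(Key, Bin, Mid + 1, Hi);
        true      -> search(Key, Bin, Lo, Mid - 1)
    end.

%% Right-pad a key with zero bytes up to KEY_SIZE.
pad(K) when is_binary(K), byte_size(K) =< ?KEY_SIZE ->
    Pad = ?KEY_SIZE - byte_size(K),
    <<K/binary, 0:(Pad * 8)>>.
```

Binaries larger than 64 bytes live outside the process heaps and are
reference counted, so sending the packed table to another process passes a
reference rather than a copy. A longest-prefix search would need extra
logic on top of this.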
>
> Best Regards,
> Michael
>
>
> In some benchmarks I see that looking up the longest prefix
> (btrie:find_prefix_longest) among around 100 K elements takes around
> 2-5 ms, and 95% of that time is spent in ets:lookup.
>
> I think the time spent there is so large because the term stored there
> is very big.
>
> Any other suggestions?
>
> Silviu