[erlang-questions] IP search optimization
Alexandre Snarskii
snar@REDACTED
Sun Nov 22 17:53:45 CET 2015
On Fri, Nov 20, 2015 at 02:18:04PM -0600, Adam Cammack wrote:
> On Fri, Nov 20, 2015 at 06:01:50PM +0100, Sid Muller wrote:
> > > Looks more like a task for radix trees...
> >
> > Hmm, good point although with ets if my owner process gets killed I
> > inherit the ets and no data is lost. Not sure how that cold be done
> > with a radix tree.
>
> The ETS ordered_set table type is nearly as good for this purpose.
>
> > > The best you can achieve with ets is:
> > > on insert: make sure that inserted networks are normalized (do not have
> > > any bits set after prefix length).
> > > on lookup:
> > > 1. start with masklen of 32
> > > 2. normalize address to current masklen
> > > 3. lookup for address/masklen record, return if found
> > > 4. if current masklen is zero - return 'not_found'
> > > 5. decrease masklen and repeat from step two.
> >
> > There are a few faster methods:
> > 1. Have a pre-scanned list of /CIDR [21, 20...], there are usually
> > about 10. But this still traverses the table 10 times in worst case.
> >
> > 2. Traverse the table with ets:lookup(ets:next(), this has the
> > advantage over #1 as it's much faster in most cases.
> >
> > 3. ets:select(MatchSpec), this is probably the fastest solution
> > because it should traverse the table only once. But right now I'm
> > having a bit of a problem with my select statement... see other email
> > [bitstring select]
>
> These are all slower. ets:match/{2,3}, ets:select/{2,3}, and repeatedly
> calling ets:next/2 should be avoided where possible on large tables
> because of their immense running time.
>
> CIDR blocks can be expressed as a tuple of two integers: the 32-bit
> padded mask and the number of significant bits. We can take advantage of
> Erlang's term ordering[1] and use the ordered_set table type with the
> block as the key {Mask, Length}. In an ordered_set table, ets:prev(Tab,
> {IP, 32}) will find the block that might contain that address[2]. Simple
> bit arithmetic will tell you if the address is in the block or not. This
> is about an order of magnitude faster on my box than doing ets:lookup/2
> for increasingly less precise masks.
This algorithm is good for your test set of non-overlapping networks,
but not in general. Let's assume we add entry {0, 0} (default route)
to your set and try to lookup some non-existant route. In case we look
for 244.42.192.0/20 - ets:prev returns 244.42.64.0/20 (last entry in
your set, which does not match), ets:prev for 244.42.64.0/20 returns
244.42.48.0/20 (does not match again), then .32.0/20 and only after
ONE MILLION lookups you get matching 0.0.0.0/0 which is the right answer.
With decrementing masklen you will get the same result in 32 lookups.
> Your issue with matching the bit strings in a match spec can be resolved
> by storing the bit strings as above and then using the bit arithmetic
> guard expressions (select/1 in the below gist), although I do not
> recommend this.
>
> When in doubt (and especially when certain), measure!
>
> https://gist.github.com/acammack/f631cf5ca978775f10f7
>
> For 100 lookups on a 1,000,000 element table, Erlang/OTP 18.1:
>
> ets:prev/2 (ordered): 9.40000e-5
> ets:select/3 (ordered): 1.45619e+1
> ets:lookup/2 (ordered): 1.72300e-3
> ets:lookup/2 (unordered): 7.94000e-4
>
> [1] Tuples of the same length are ordered by their elements, starting
> with the first.
> [2] If individual addresses are stored in the same table, you will need
> to do an extra ets:lookup/2 for the address. If blocks can overlap,
> you will need to call ets:prev/2 until you get a block that does not
> match the address.
>
> Happy coding!
>
> --
> Adam Cammack
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
More information about the erlang-questions
mailing list