[erlang-questions] Efficient way to select a range of data between 2 values?
Rapsey
rapsey@REDACTED
Mon Aug 31 18:35:00 CEST 2009
This is a copy/paste from some programming forum (it was my post):
GeoIP with Erlang
You want to know what country your users are coming from?
Download the DB in CSV format from:
http://www.maxmind.com/app/geolitecountry
The file is about 7.8MB in size and one line in the file looks like this:
"4.20.73.32","4.23.128.183","68438304","68649143","US","United States"
Whats what should be obvious. Lets turn this into something a bit more
useful, a list of tuples: {IntMin, IntMax, CountryCode}. Speed is of no
importance because you only do this once.
code:
{ok, File} = file:read_file(Filename),
Lines = string:tokens(binary_to_list(File), "\n"),
Ranges = lists:map(fun(Line) ->
[_, _, MinStr, MaxStr, Country|_] = string:tokens(Line,
",\""),
{list_to_integer(MinStr), list_to_integer(MaxStr),
list_to_binary(Country)}
end, Lines),
Sorted = lists:keysort(1, Ranges)
Sorted now holds a sorted list of tuples that represent IP ranges, from
lowest IP range to highest.
If we were stupid we could end it here and loop this list for every user, to
find out where he is coming from. A tree structure makes a bit more sense
though.
So lets turn this list into a tree. Put the result of the last function into
this one.
cons_tree([]) ->
undefined;
cons_tree(L) ->
{Left, Right} = lists:split(round(length(L)/2), L),
[{Min, Max, Country}|LeftSide] = lists:reverse(Left),
{Min, Max, Country, cons_tree(lists:reverse(LeftSide)),
cons_tree(Right)}.
And now we have a tree of tuples: {IntMin, IntMax, CountryCode,
SmallerRanges, LargerRanges}.
All we need now is a way to find the country code by IP:
find_key({Min, Max, Country, _Left, _Right}, IP) when IP > Min, IP < Max ->
Country;
find_key({_Min, Max, _Country, _Left, Right}, IP) when IP >= Max ->
find_key(Right, IP);
find_key({Min, _Max, _Country, Left, _Right}, IP) when IP =< Min ->
find_key(Left, IP);
find_key(undefined, _) ->
false.
All we need now is to test how fast it is. Loop takes in number of
iterations and the previously constructed tuple tree.
loop(N, Ranges) ->
io:format("~p~n", [now()]),
quer(N, Ranges),
io:format("~p~n", [now()]).
quer(0, L) ->
L;
quer(N, L) ->
find_key(L, random:uniform(4294967296)),
quer(N-1, L).
Result for 100k:
geoip:loop(100000, T).
{1235,667762,860389}
{1235,667763,299972}
About half a second. Pretty good no?
Sergej
On Mon, Aug 31, 2009 at 6:25 PM, Alexander Uvarov <
alexander.uvarov@REDACTED> wrote:
> I have ~100K ip2country records like {ip_address_map, {range_from,
> range_to, country_code}} and I am searching for fastest way to lookup
> country_code. There should be matching guard like range_from =< IP >=
> range_to. Any ideas, guys? I am not satisfied with ets (lookup time from
> 0.06s up to 1.6s).
>
> This is my current code (original
> http://code.google.com/p/ip2country-erlang/source/browse/trunk/ip2country.erl
> ):
>
> -record(ip_address_map, {range_from, range_to, country_code}).
>
> start() ->
> ets:new(ip, [named_table, public, {keypos, 2}]).
>
> lookup(IP) ->
> MatchHead = #ip_address_map{range_from='$1', range_to='$2',
> country_code='$3'},
> Guard = [{'=<', '$1', IP}, {'>=', '$2', IP}],
> Result = '$3',
> ets:select(ip, [{MatchHead, Guard, [Result]}], 1).
>
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>
More information about the erlang-questions
mailing list