[erlang-questions] Keyword searching+

Christian chsu79@REDACTED
Mon Jan 19 15:29:48 CET 2009


> The site I'm working on needs to display images that are searchable by
> using keywords (for example something like Cats AND Dogs...). Do you have
> any suggestions or know how one can do so in erlang? For example, let's say
> before uploading the image I attach a bunch of keywords to it, let's say in
> the following manner
> {[Keyword1...Keywordn],Image}
> Any suggestions or tips would be appreciated,

This is an inverted index. It is the same thing search engines use,
except that you don't extract the search terms from within the
documents, but supply them from the "outside".

invert({Ks, Image}) ->
   [{K, Image} || K <- Ks].

%% invert({[a,b,c,d], imageid}) gives you
%% [{a,imageid},{b,imageid},{c,imageid},{d,imageid}]

If you invert every image to a list like this, then you can build your
inverted index from the lists. You probably need to store the index on
disc, because it can grow pretty large. I'm going to use a dict here, though.

update_index(KIs, Index0) ->
   lists:foldl(fun({K, I}, Acc) ->  dict:append(K, I, Acc) end, Index0, KIs).

And let's say that we want to build up this inverted index from a set
of documents that look like this:

Docs =
   [{[feathers,beak,taste_like_chicken],duck},
    {[feathers,teeth,extinct],dinosaur},
    {[beak,taste_like_chicken],platypus}].

build_index(Docs) ->
   Index = dict:new(),
   lists:foldl(fun(Doc, Acc) -> update_index(invert(Doc), Acc) end,
               Index, Docs).

Then we can get this:

4> dict:to_list(index:build_index(Docs)).
[{feathers,[duck,dinosaur]},
 {extinct,[dinosaur]},
 {teeth,[dinosaur]},
 {beak,[duck,platypus]},
 {taste_like_chicken,[duck,platypus]}]

Thus you can get a list of the images that have a given keyword
associated with them. Set operations then allow you to find
intersections, such as images that have Keyword1 & Keyword2, or
subtractions, as in Keyword1 & not Keyword2.
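
As a rough sketch of those set operations, assuming the index is the
dict built above and that the image lists are small enough to turn into
ordsets on each query. The module name index_query and the functions
lookup/2, both/3 and without/3 are made up for illustration; they are
not part of the gist.

```erlang
-module(index_query).
-export([lookup/2, both/3, without/3]).

%% Images tagged with keyword K, as an ordset.
%% An unknown keyword yields the empty set rather than an error.
lookup(K, Index) ->
    case dict:find(K, Index) of
        {ok, Images} -> ordsets:from_list(Images);
        error        -> ordsets:new()
    end.

%% Keyword1 & Keyword2: images that carry both keywords.
both(K1, K2, Index) ->
    ordsets:intersection(lookup(K1, Index), lookup(K2, Index)).

%% Keyword1 & not Keyword2: images that carry K1 but not K2.
without(K1, K2, Index) ->
    ordsets:subtract(lookup(K1, Index), lookup(K2, Index)).
```

With the example index from above, index_query:both(feathers, beak, Index)
gives [duck], and index_query:without(feathers, extinct, Index) also
gives [duck].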


There.  Now you too can be google.  http://gist.github.com/49008


