[erlang-questions] Checking if the keys of a list of tuples match a list

Silas silas@REDACTED
Fri Apr 13 17:18:58 CEST 2018


Sorry for the confusing subject.  I couldn't find another way to explain 
this simple problem.  Before just making the question, I'll ask you 
about the architecture of the system because it is probably the root of 
the doubt.

I'm making a small bookmark tag system.  The user adds some information 
about a URL (url, title, etc.) and its corresponding tags.  I have two 
dets tables that use the records:

    -record(url, {url, title, date, description, refs, tags}).
    -record(tag, {tag, url}).

The url record is stored in a "urls" table.

You see that the url record has some information and also a list of tags 
these bookmark has.  To speed up tags lookup I created the "tags" table 
that stores tag records.  So I have something like this:

    #url{url="wikipedia.org", title="Wikipedia", ... , tags=["internet","documentation"]}
    #url{url="erlang.org", title="Erlang", ... , tags=["documentation","programming"]}

    #tag{tag="internet", url="wikipedia.org"}
    #tag{tag="documentation", url="wikipedia.org"}
    #tag{tag="documentation", url="erlang.org"}
    #tag{tag="programming", url="erlang.org"}

(Note I also have tags information in #url record).

It works pretty fine for one tag: I just have to lookup at the tags 
table, retrieve all urls related to that tag and make a lookup for each 
url at the "urls" table.

The problem arises when I tried to implement more than one tag lookup 
with AND, so when searching for "documentation" and "programming" it 
would return "erlang.org" only.

The solution I found works, but it doesn't look quite elegant for me:

a. I first lookup one of the tags the user provided (in the example 
   above, "documentation".

b. I then retrieve all urls matching that that.

c. I then filter out all urls that don't have ALL the remaining tags 
   (in the example above, just "programming").  I do this by looking 
   at the "tags" field of the "url" record.

d. I print all the remaining urls records.

Although it works, I see two problems/questions:

1. Retrieving all urls information and discarding then looks just waste 
   of processing.  I may fix this over a rather complex (and slow) 
   algorithm over "tags" table or using a third table to speed this 
   (maybe {url, tag}?).  Any sugestions?

2. Now a question about Erlang data structures: To check if a list has 
   all elements of another list (regardless of the order), I'm using 
   this:

    % This function returns true if all elements in List2 are in List1, regardless
    % of the order.
    sublistmember(_, []) -> true;
    sublistmember(List1, List2) ->
        [Head|Tail] = List2,
        case lists:member(Head, List1) of
            true -> sublistmember(List1, Tail);
            false -> false
        end.

Its complexity is O(len(List1)) * O(len(List2)) which is not actually a 
problem for small lists (I can limit the number of tags the user may 
provide) but I was wondering if there is a more effective way.  I 
thought about using sets to simplify but list -> set -> list conversion 
may be even slower, right?  Any suggestions on this?

Thanks!

-- 
Silas



More information about the erlang-questions mailing list