[erlang-questions] Checking if the keys of a list of tuples match a list
Silas
silas@REDACTED
Fri Apr 13 17:18:58 CEST 2018
Sorry for the confusing subject. I couldn't find another way to explain
this simple problem. Before just making the question, I'll ask you
about the architecture of the system because it is probably the root of
the doubt.
I'm making a small bookmark tag system. The user adds some information
about a URL (url, title, etc.) and its corresponding tags. I have two
dets tables that use the records:
-record(url, {url, title, date, description, refs, tags}).
-record(tag, {tag, url}).
The url record is stored in a "urls" table.
You see that the url record has some information and also a list of tags
these bookmark has. To speed up tags lookup I created the "tags" table
that stores tag records. So I have something like this:
#url{url="wikipedia.org", title="Wikipedia", ... , tags=["internet","documentation"]}
#url{url="erlang.org", title="Erlang", ... , tags=["documentation","programming"]}
#tag{tag="internet", url="wikipedia.org"}
#tag{tag="documentation", url="wikipedia.org"}
#tag{tag="documentation", url="erlang.org"}
#tag{tag="programming", url="erlang.org"}
(Note I also have tags information in #url record).
It works pretty fine for one tag: I just have to lookup at the tags
table, retrieve all urls related to that tag and make a lookup for each
url at the "urls" table.
The problem arises when I tried to implement more than one tag lookup
with AND, so when searching for "documentation" and "programming" it
would return "erlang.org" only.
The solution I found works, but it doesn't look quite elegant for me:
a. I first lookup one of the tags the user provided (in the example
above, "documentation".
b. I then retrieve all urls matching that that.
c. I then filter out all urls that don't have ALL the remaining tags
(in the example above, just "programming"). I do this by looking
at the "tags" field of the "url" record.
d. I print all the remaining urls records.
Although it works, I see two problems/questions:
1. Retrieving all urls information and discarding then looks just waste
of processing. I may fix this over a rather complex (and slow)
algorithm over "tags" table or using a third table to speed this
(maybe {url, tag}?). Any sugestions?
2. Now a question about Erlang data structures: To check if a list has
all elements of another list (regardless of the order), I'm using
this:
% This function returns true if all elements in List2 are in List1, regardless
% of the order.
sublistmember(_, []) -> true;
sublistmember(List1, List2) ->
[Head|Tail] = List2,
case lists:member(Head, List1) of
true -> sublistmember(List1, Tail);
false -> false
end.
Its complexity is O(len(List1)) * O(len(List2)) which is not actually a
problem for small lists (I can limit the number of tags the user may
provide) but I was wondering if there is a more effective way. I
thought about using sets to simplify but list -> set -> list conversion
may be even slower, right? Any suggestions on this?
Thanks!
--
Silas
More information about the erlang-questions
mailing list