[erlang-questions] Need advice on how to design a tagging web service

Ulf Wiger ulf@REDACTED
Thu Sep 14 13:34:07 CEST 2006


Den 2006-09-14 10:29:55 skrev Christian S <chsu79@REDACTED>:

> Since you never mention what db to use I assume it is mnesia.
>
> You have to understand that mnesia doesn't contain a query planner for
> complex joins. Also, only sufficiently smart relational database
> managers know how to perform queries of the sort you described in an
> efficient way for the tables you listed.


Actually, Mnemosyne does/did optimize queries.

If you read the manual on Mnemosyne, there is a warning not
to use Mnemosyne for performance-critical applications.
The wording in the Efficiency Guide is more to the point:

   "Mnesia supports complex queries through the query language
   Mnemosyne. This makes it possible to perform queries of any
   complexity on Mnesia tables. However for simple queries
   Mnemosyne is usually much more expensive than sensible
   handwritten functions doing the same thing."

More information on Mnemosyne and its current status can be
found in this thread:

http://www.erlang.org/ml-archive/erlang-questions/200502/msg00272.html


> The result of this is that you have to maintain extra tables in mnesia
> for clever indexing of the data, and update them "manually" in your
> mnesia transactions.

Well, mnesia is a bit limited in how you can index data, but
it does support indexes (secondary key indexes), and they
do not have to be maintained manually, of course.
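
To illustrate the point about secondary indexes, here is a minimal
sketch. The module name, the 'tagging' record and its fields are
hypothetical and not taken from the original question; only the
mnesia calls (create_table/2 with the 'index' option, write/1 and
index_read/3) are standard.

```erlang
-module(tag_index_sketch).
-export([setup/0, add/3, things_for_tag/1]).

%% Hypothetical record: one row per (user, thing, tag) triple.
%% The compound key keeps rows unique in a plain 'set' table.
-record(tagging, {key, user, thing, tag}).

%% Create a RAM table with secondary indexes on 'thing' and 'tag'.
%% Mnesia keeps both indexes up to date on every write.
setup() ->
    ok = mnesia:start(),
    mnesia:create_table(tagging,
                        [{attributes, record_info(fields, tagging)},
                         {index, [thing, tag]}]).

%% Store one tagging; no index bookkeeping is needed here.
add(User, Thing, Tag) ->
    Row = #tagging{key = {User, Thing, Tag},
                   user = User, thing = Thing, tag = Tag},
    mnesia:transaction(fun() -> mnesia:write(Row) end).

%% Look up all things carrying Tag through the secondary index.
things_for_tag(Tag) ->
    F = fun() -> mnesia:index_read(tagging, Tag, #tagging.tag) end,
    {atomic, Rows} = mnesia:transaction(F),
    [Thing || #tagging{thing = Thing} <- Rows].
```

The write in add/3 touches only the main table from the caller's
point of view; the index entries follow automatically.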



> If you have a thingid and a userid, and want to get the tags it is
> registered under it is a good idea to have a {userid, thingid}-tuple
> mapped to a list of tagids
>
>     ( as in   {user_thing_tags, {UserID, ThingId}, [TagId]} )
>
> to find what the thing has been tagged as by the user.
>
> If you have a thingid and want to find the most frequent tags for it,
> then it might be good to have a table that maps thingid to userids to
> find users that have the thing.
>
>   (as in {thing_users, ThingId, [UserId]} )
>
> This would involve a query to thing_users, then N queries to
> user_thing_tags (once for each user).  A not all that efficient query
> if it is a thingid that 10000 users have tagged. You might want to
> keep this information precalculated and updated by a background batch
> job. The information would lag behind but this is acceptable for the
> application (del.icio.us does it, afaik).

You can also have an ordered_set table with the key
{UserId,ThingId,TagId} (*). If you occasionally want to
find all occurrences of TagId, you can get it with a
simple select operation. If you keep an attribute [TagId],
select won't help - nor will indexing (unless you use
'rdbms', which is able to index on e.g. all elements
of a list attribute.)

(*) Unlike ets, mnesia requires at least one non-key attribute.
If you have no more attributes, you must add a dummy attribute.
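
A rough sketch of the ordered_set layout described above, including
the dummy attribute. The module, record and function names are made
up for illustration; the match specifications are ordinary
mnesia:select/2 patterns.

```erlang
-module(tag_select_sketch).
-export([setup/0, tag/3, tags_for/2, taggings_of/1]).

%% Hypothetical record: the whole triple is the key, and 'dummy'
%% is the one non-key attribute that mnesia requires.
-record(user_thing_tag, {key, dummy = []}).

setup() ->
    ok = mnesia:start(),
    mnesia:create_table(user_thing_tag,
                        [{type, ordered_set},
                         {attributes, record_info(fields, user_thing_tag)}]).

%% Record that UserId tagged ThingId with TagId.
tag(UserId, ThingId, TagId) ->
    Row = #user_thing_tag{key = {UserId, ThingId, TagId}},
    mnesia:transaction(fun() -> mnesia:write(Row) end).

%% Tags one user put on one thing: the leading key elements are bound.
tags_for(UserId, ThingId) ->
    Spec = [{#user_thing_tag{key = {UserId, ThingId, '$1'}, _ = '_'},
             [], ['$1']}],
    {atomic, Tags} =
        mnesia:transaction(fun() -> mnesia:select(user_thing_tag, Spec) end),
    Tags.

%% All {UserId, ThingId} pairs for a tag: only the last key element
%% is bound, so this scans the table, but it is still one select.
taggings_of(TagId) ->
    Spec = [{#user_thing_tag{key = {'$1', '$2', TagId}, _ = '_'},
             [], [{{'$1', '$2'}}]}],
    {atomic, Pairs} =
        mnesia:transaction(fun() -> mnesia:select(user_thing_tag, Spec) end),
    Pairs.
```

With a [TagId] list attribute instead, taggings_of/1 could not be
expressed as a match on the key at all.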

> As the application scales you can begin to add
> more mnesia nodes, and eventually begin to frag hash
> so write transactions do not need to synchronize with
> every mnesia node. Also try to make a single user's
> data end up in the same fragment.

... and this can be done even if UserId is only part of
the key.

One way to keep grouped data in the same fragment is to
write a mnesia_frag_hash callback module. All functions
in mnesia_frag_hash can be reused (just call the originals),
except two:

key_to_frag_number(State, {UserId, _, _}) ->
    mnesia_frag_hash:key_to_frag_number(State, UserId).

and

match_spec_to_frag_numbers(State, MatchSpec) ->
   ... % left as an exercise for the reader.

http://www.erlang.org/doc/doc-5.5.1/lib/mnesia-4.3.2/doc/html/mnesia_frag_hash.html
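
For what it is worth, one conservative way to fill in such a callback
module could look like the sketch below. This is not the only
possible answer to the exercise. The module name user_frag_hash is
hypothetical, and the sketch assumes that user ids are integers or
binaries. It also wraps the state of mnesia_frag_hash in a tuple with
its own fragment counter, so that it never has to look inside the
original state, and it ignores any initial hash_state supplied in
the table properties.

```erlang
-module(user_frag_hash).
-export([init_state/2, add_frag/1, del_frag/1,
         key_to_frag_number/2, match_spec_to_frag_numbers/2]).

%% State is {NFragments, Inner}, where Inner is the opaque state
%% of the default mnesia_frag_hash module.
init_state(Tab, _Initial) ->
    {1, mnesia_frag_hash:init_state(Tab, undefined)}.

%% Delegate to the original and keep our own fragment count in step.
add_frag({N, Inner}) ->
    {Inner1, IterFrags, LockFrags} = mnesia_frag_hash:add_frag(Inner),
    {{N + 1, Inner1}, IterFrags, LockFrags}.

del_frag({N, Inner}) ->
    {Inner1, IterFrags, LockFrags} = mnesia_frag_hash:del_frag(Inner),
    {{N - 1, Inner1}, IterFrags, LockFrags}.

%% Hash on UserId only, so all rows of one user share a fragment.
key_to_frag_number({_N, Inner}, {UserId, _ThingId, _TagId}) ->
    mnesia_frag_hash:key_to_frag_number(Inner, UserId).

%% If the key pattern has a concrete UserId (assumed to be an integer
%% or a binary), only that user's fragment needs to be searched.
%% Anything else falls back to searching every fragment.
match_spec_to_frag_numbers({N, Inner}, MatchSpec) ->
    case MatchSpec of
        [{Head, _Guards, _Body}] when is_tuple(Head), tuple_size(Head) > 2 ->
            case element(2, Head) of
                {UserId, _, _} when is_integer(UserId); is_binary(UserId) ->
                    [mnesia_frag_hash:key_to_frag_number(Inner, UserId)];
                _ ->
                    lists:seq(1, N)
            end;
        _ ->
            lists:seq(1, N)
    end.
```

The module would be named in the table's frag_properties, as
{hash_module, user_frag_hash}, and the table accessed through
mnesia:activity/4 with mnesia_frag as the access module.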

I had this idea once to write an XML document database
using mnesia, where I would keep the data in ordered sets,
and use this technique to hash on the document id, co-
locating all elements in a single fragment.

I think it would have worked out too, but I ended up
shelving the idea after a feeble attempt at adapting
an implementation of XMLQuery (made in Erlang by
Hans Nilsson). It turned out to be too much work for
something that I actually didn't need myself.

BR,
Ulf W
-- 
Ulf Wiger


