[erlang-questions] Need advice on how to design a tagging web service

Thu Sep 14 10:29:55 CEST 2006

Since you never mention which database to use, I assume it is mnesia.

You have to understand that mnesia doesn't contain a query planner for
complex joins. Also, only sufficiently smart relational database
managers know how to perform queries of the sort you described
efficiently for the tables you listed.

The result of this is that you have to maintain extra tables in mnesia
for clever indexing of the data, and update them "manually" in your
mnesia transactions.

If you have a thingid and a userid, and want to get the tags the thing is
registered under, it is a good idea to have a {userid, thingid} tuple
mapped to a list of tagids

    ( as in   {user_thing_tags, {UserId, ThingId}, [TagId]} )

to find what the user has tagged the thing as.
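
A minimal sketch of keeping such an index table up to date by hand inside
a transaction. The module name, the record field names and the use of a
RAM-only table are assumptions made for illustration; only the
{user_thing_tags, {UserId, ThingId}, [TagId]} shape comes from the text
above.

```erlang
-module(tag_index).
-export([init/0, add_tag/3, tags_for/2]).

%% Hypothetical record; it produces tuples of the shape
%% {user_thing_tags, {UserId, ThingId}, [TagId]}.
-record(user_thing_tags, {key, tag_ids = []}).

%% Start mnesia and create the table (RAM copy on the local node only).
%% The create_table result is ignored so init/0 can be called twice.
init() ->
    mnesia:start(),
    mnesia:create_table(user_thing_tags,
                        [{attributes, record_info(fields, user_thing_tags)}]),
    ok.

%% Read the current tag list under a write lock, add the tag, write it back.
add_tag(UserId, ThingId, TagId) ->
    Key = {UserId, ThingId},
    mnesia:transaction(fun() ->
        Old = case mnesia:read(user_thing_tags, Key, write) of
                  [#user_thing_tags{tag_ids = Ids}] -> Ids;
                  [] -> []
              end,
        mnesia:write(#user_thing_tags{key = Key,
                                      tag_ids = lists:usort([TagId | Old])})
    end).

%% Single key lookup: which tags did this user give this thing?
tags_for(UserId, ThingId) ->
    case mnesia:dirty_read(user_thing_tags, {UserId, ThingId}) of
        [#user_thing_tags{tag_ids = Ids}] -> Ids;
        [] -> []
    end.
```

The lookup is one read on the primary key, which is the point of the
extra table: no join is needed at query time.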

If you have a thingid and want to find the most frequent tags for it,
then it might be good to have a table that maps thingid to userids to
find the users that have the thing.

  (as in {thing_users, ThingId, [UserId]} )

This would involve one query to thing_users, then N queries to
user_thing_tags (one for each user). That is not an efficient query
for a thingid that 10000 users have tagged. You might want to
keep this information precalculated and updated by a background batch
job. The information would lag behind, but that is acceptable for this
kind of application (del.icio.us does it, afaik).
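
To make the cost concrete, here is a sketch of the "one query plus N
queries" lookup. The module name, field names and the top_tags/2 helper
are hypothetical, and the maps module is used for counting purely for
brevity.

```erlang
-module(tag_freq).
-export([init/0, add_tag/3, top_tags/2]).

%% Hypothetical records for the two index tables described above.
-record(user_thing_tags, {key, tag_ids = []}).
-record(thing_users, {thing_id, user_ids = []}).

init() ->
    mnesia:start(),
    mnesia:create_table(user_thing_tags,
                        [{attributes, record_info(fields, user_thing_tags)}]),
    mnesia:create_table(thing_users,
                        [{attributes, record_info(fields, thing_users)}]),
    ok.

%% Update both index tables in the same transaction.
add_tag(UserId, ThingId, TagId) ->
    mnesia:transaction(fun() ->
        Tags = case mnesia:read(user_thing_tags, {UserId, ThingId}, write) of
                   [#user_thing_tags{tag_ids = T}] -> T;
                   [] -> []
               end,
        Users = case mnesia:read(thing_users, ThingId, write) of
                    [#thing_users{user_ids = U}] -> U;
                    [] -> []
                end,
        mnesia:write(#user_thing_tags{key = {UserId, ThingId},
                                      tag_ids = lists:usort([TagId | Tags])}),
        mnesia:write(#thing_users{thing_id = ThingId,
                                  user_ids = lists:usort([UserId | Users])})
    end).

%% One read of thing_users, then one read of user_thing_tags per user.
%% Returns up to N {TagId, Count} pairs, most frequent first.
top_tags(ThingId, N) ->
    {atomic, Counts} = mnesia:transaction(fun() ->
        Users = case mnesia:read(thing_users, ThingId) of
                    [#thing_users{user_ids = U}] -> U;
                    [] -> []
                end,
        lists:foldl(fun(UserId, Acc) ->
            [#user_thing_tags{tag_ids = Tags}] =
                mnesia:read(user_thing_tags, {UserId, ThingId}),
            lists:foldl(fun(Tag, A) ->
                maps:update_with(Tag, fun(C) -> C + 1 end, 1, A)
            end, Acc, Tags)
        end, #{}, Users)
    end),
    Sorted = lists:sort(fun({_, A}, {_, B}) -> A >= B end,
                        maps:to_list(Counts)),
    lists:sublist(Sorted, N).
```

With 10000 users on one thing, top_tags/2 does 10001 reads in a single
transaction, which is why a precalculated result is attractive.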

Hint: In mnesia, a process that subscribes to table events (see
mnesia:subscribe/1) is notified about table updates. This can be used
to 'taint' a given thingid as in need of an updated tag frequency
analysis. This is the place for queuing work as you described.
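
A sketch of such a notified process, using mnesia:subscribe/1 with simple
table events. The module name and the tainted/1 helper are hypothetical,
and the code assumes the user_thing_tags table already exists on the
local node and is keyed on {UserId, ThingId}.

```erlang
-module(taint_watcher).
-export([start/0, tainted/1]).

%% Spawn a process that subscribes to write events on user_thing_tags.
%% Returns once the subscription is active. Linked, so a failed
%% subscription crashes the caller instead of leaving it waiting.
start() ->
    Parent = self(),
    Pid = spawn_link(fun() ->
        {ok, _Node} = mnesia:subscribe({table, user_thing_tags, simple}),
        Parent ! {self(), subscribed},
        loop(sets:new())
    end),
    receive {Pid, subscribed} -> Pid end.

%% Collect the ThingIds whose tag data has changed.
loop(Tainted) ->
    receive
        {mnesia_table_event, {write, Rec, _ActivityId}} ->
            %% Rec is {user_thing_tags, {UserId, ThingId}, TagIds}.
            {_UserId, ThingId} = element(2, Rec),
            loop(sets:add_element(ThingId, Tainted));
        {mnesia_table_event, _Other} ->
            %% Delete events are ignored in this sketch.
            loop(Tainted);
        {From, get_tainted} ->
            From ! {self(), sets:to_list(Tainted)},
            loop(Tainted)
    end.

%% Ask the watcher which ThingIds need a fresh frequency analysis.
tainted(Pid) ->
    Pid ! {self(), get_tainted},
    receive {Pid, List} -> List end.
```

A batch job could periodically ask for the tainted set and recompute
only those thingids instead of scanning everything.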

You need a similar offline job to precalculate a mapping of tagid to
thingids. To list the things in a relevant order you should probably look
into information retrieval methods, so that you rank things by how
many users have given them the tag.
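
One way the offline job could look, as a sketch only: fold over
user_thing_tags, count users per {TagId, ThingId} pair, and store each
tag's things ordered by that count. The tag_things table, its field
names and the module name are assumptions; a real job would likely work
incrementally and also clear rows for tags that have disappeared.

```erlang
-module(tag_things_job).
-export([init/0, rebuild/0, things_for/1]).

%% Hypothetical result table: TagId -> ThingIds, most-tagged first.
-record(tag_things, {tag_id, ranked_things = []}).

init() ->
    mnesia:start(),
    mnesia:create_table(user_thing_tags, [{attributes, [key, tag_ids]}]),
    mnesia:create_table(tag_things,
                        [{attributes, record_info(fields, tag_things)}]),
    ok.

%% Full rebuild: count how many users gave each thing each tag,
%% then write one ranked row per tag.
rebuild() ->
    Count = fun({user_thing_tags, {_UserId, ThingId}, TagIds}, Acc) ->
        lists:foldl(fun(TagId, A) ->
            maps:update_with({TagId, ThingId}, fun(C) -> C + 1 end, 1, A)
        end, Acc, TagIds)
    end,
    {atomic, Counts} = mnesia:transaction(fun() ->
        mnesia:foldl(Count, #{}, user_thing_tags)
    end),
    %% Group as TagId -> [{UserCount, ThingId}].
    ByTag = maps:fold(fun({TagId, ThingId}, N, Acc) ->
        maps:update_with(TagId, fun(L) -> [{N, ThingId} | L] end,
                         [{N, ThingId}], Acc)
    end, #{}, Counts),
    mnesia:transaction(fun() ->
        lists:foreach(fun({TagId, Pairs}) ->
            Ranked = [T || {_N, T} <- lists:reverse(lists:sort(Pairs))],
            mnesia:write(#tag_things{tag_id = TagId, ranked_things = Ranked})
        end, maps:to_list(ByTag))
    end).

%% Cheap lookup at request time.
things_for(TagId) ->
    case mnesia:dirty_read(tag_things, TagId) of
        [#tag_things{ranked_things = Things}] -> Things;
        [] -> []
    end.
```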

I am confident that with these measures you will be able to support
lots of transactions. As the application scales you can begin to add
more mnesia nodes, and eventually begin to fragment the tables by hash
(mnesia_frag) so write transactions do not need to synchronize with
every mnesia node. Also try to make a single user's data end up in the
same fragment. None of these measures will affect your original schema
or application code.
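
A single-node sketch of turning on fragmentation for an existing table
later on. The module and table names are hypothetical, and the node pool
is just the local node; in a real cluster it would list the mnesia nodes
that should hold fragments.

```erlang
-module(frag_sketch).
-export([init/0, fragment/1, add_tag/3, tags_for/2]).

-define(TAB, frag_user_thing_tags).

%% Start with an ordinary, unfragmented table.
init() ->
    mnesia:start(),
    mnesia:create_table(?TAB, [{attributes, [key, tag_ids]}]),
    ok.

%% Later: activate fragmentation and grow the table to N fragments.
%% Existing records are rehashed as fragments are added.
fragment(N) ->
    {atomic, ok} =
        mnesia:change_table_frag(?TAB, {activate, [{node_pool, [node()]}]}),
    [{atomic, ok} = mnesia:change_table_frag(?TAB, {add_frag, [node()]})
     || _ <- lists:seq(2, N)],
    ok.

%% All access goes through mnesia:activity/4 with the mnesia_frag
%% callback module, which maps each key to its fragment.
add_tag(UserId, ThingId, TagId) ->
    Key = {UserId, ThingId},
    mnesia:activity(transaction, fun() ->
        Old = case mnesia:read(?TAB, Key, write) of
                  [{?TAB, Key, Ids}] -> Ids;
                  [] -> []
              end,
        mnesia:write({?TAB, Key, lists:usort([TagId | Old])})
    end, [], mnesia_frag).

tags_for(UserId, ThingId) ->
    mnesia:activity(transaction, fun() ->
        case mnesia:read(?TAB, {UserId, ThingId}) of
            [{?TAB, _Key, Ids}] -> Ids;
            [] -> []
        end
    end, [], mnesia_frag).
```

Going through mnesia:activity/4 from the start keeps the calling code the
same before and after fragmentation. Note that the default hash works on
the whole key, so with a {UserId, ThingId} key one user's rows can land
in different fragments; keeping them together would need a key choice or
a custom hash module (the hash_module fragmentation property) that
hashes on the UserId only.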

On 9/14/06, Matthew Wilson <matt@REDACTED> wrote:
>
> I want to learn erlang and I'm trying to figure out if this project idea
> might be a good one.
>
> I want to write a network service that applications can use to support
> user-supplied tags, like del.icio.us and flickr.
>
> I need the service to be able to process *lots* of transactions.
>
> clients of the service would do any of the following:
>
>  * add a tag supplied by a user for a thing.
>
>  * look up all pairs of (tags, things) that a given user has supplied.
>
>  * look up all the tags supplied for any users for a given thing.
>
> I'm thinking that the erlang service would sit in front of a database,
> and the database would have this schema:
>
> tags table: (tagid, tagname)
> things table: (thingid, thingname)
> users table: (userid, username)
> gumbo table: (userid, thingid, tagid)
>
> And here's how I'm thinking about designing the service:
>
> One process periodically connects to the database and reads everything
> out to a local in-memory cache.
>
> Another process waits for client requests.  Any client request to get
> data goes to the process with the in-memory cache to get results.
>
> Meanwhile, any request to add data goes to another process that
> accumulates these into a queue.
>
> Yet another process pops elements out of that queue and then writes them
> all to the database.
>
> All comments are welcome.
>
> Is this absurd?
>
> TIA
>
> Matt