[erlang-questions] What's the best way to select articles by page in a forum?

Tue Jan 13 04:03:00 CET 2009

Bernard Duggan wrote:
> I'm neither an SQL nor mnesia expert (yet...), but I'd ask a couple of
Me too;-)

> things first:
> 
> * Is the memory use of storing twenty items at once really actually a
> big problem?  If they're just forum posts, wouldn't each record mostly
> just be a smallish blob of text (unless you're routinely dealing with
> large attachments).  I suppose if you were running hundreds or thousands
> of such queries at once it could start to mount up, but I see so many
> cases where people assume that something is going to be a performance or
> memory issue without actually looking at any solid data that it's
> usually a good question to ask.
>
Yes, I also think twenty items won't consume too much memory, the problem
is I don't know how to get the twenty items efficiently without traversing
the whole table or the whole result set each time, I need some logic like
this:

get_topics(LastId, LastTimestamp, Count) ->
    F = fun () ->
        qlc:e(qlc:q([Article || Article <- mnesia:table(article, {n_objects, Count}]),
                                Article#article.id =:= Article#article.topic_id,
				Article#article.timestamp >= LastTimestamp]))
	end.
    {atomic, Articles} = mnesia:transaction(F).
    case find_article(Articles, LastId) of
        -1 -> Articles;    % the last article has been deleted
        N -> lists:nthtail(Articles, N)
    end.

I almost get it except I haven't figure out how to get a result set
ordered by timestamp.

> * If you're using transactions in SQL, which I assume you are, then it
> seems to me that the results must end up being stored in memory (or
> somewhere) anyway, even if you only access them one at a time after
> issuing the query.  Otherwise a subsequent transaction could alter or
> delete them before you're done iterating them and where would you pull
> the results from then?
> 
You remind me, I should use one transaction for one query to a page,
not one transaction for queries to all pages. Because I touch only 20 items
in a transaction I guess that won't consume too much memory.

This is not a perfect transaction protected traverse over *all items* but
I think users won't care (even won't be aware of), they just want to view
the articles page by page from the oldest to the newest.

> And the second question kind of leads into what I suspect is the answer.
>  I don't think there's a non-dirty way to do it (possibly not even a
> dirty one - I can't immediately see one), and certainly not without
> holding open a transaction lock for the whole period you need the results.
> 
> I guess if you don't care about transactional integrity, there may be
> ways around it, but it seems like a web forum would care very much about
> such things :)
> 
> Any mnesia experts should of course feel free to point out why I'm an
> idiot :)
> 

You questions are valuable, they help me sort out the solution, thank you
very much!

> Cheers,
> 
> Bernard
> 
> Liu Yubao wrote:
>> Hi,
>>
>> I'm trying to implement a web forum with Erlang + Mnesia and meet a big difficult
>> problem. This is the record to represent an article:
>>
>>   -record(article, {id, parent_id, topic_id, author, title, content, timestamp}).
>>
>> For a new article:
>>     id = md5(author + title + timestamp).
>>     parent_id = <<0>>.
>>     topic_id = id.
>>
>> For a reply:
>>     parent_id = parent's id.
>>     topic_id = parent's topic_id.
>>     id = md5(parent_id + topic_id + author + title + timestamp).
>>
>> I use md5 digests instead of monotone increasing integers as id because I hope
>> there is no centralized counter that is bad for concurrency and distribution.
>>
>> Now the problem is how I select [N, N + 20) topics efficiently and how I select
>> [N, N + 20) articles in a topic efficiently.
>>
>> In SQL I can do like this:
>>
>> 	SELECT * FROM article WHERE id = topic_id ORDER BY timestamp
>>         SELECT * FROM article WHERE topic_id = ? ORDER BY timestamp
>>
>> and then read the result *one by one*, but it seems mnesia doesn't support this
>> traverse, it only support two kinds of traverse:
>>
>>    * traverse whole table
>>      (not efficient)
>>    * get the whole result set immediately and traverse records in it
>>     (consume many memory)
>>
>> Any suggestion is appreciated!
>>
>>
>> Best regards,
>>
>> Liu Yubao
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://www.erlang.org/mailman/listinfo/erlang-questions
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>