[erlang-questions] Efficient way to store message lists in mnesia

Khitai Pang khitai.pang@REDACTED
Wed Jun 8 09:33:20 CEST 2016


Thank you all for your input and let me provide more background of my 
question.  We already have a messaging app with about 200,000 active 
users, the app uses MQTT publish/subscribe for messaging. Now we plan to 
change to a pure erlang solution with customized messaging protocol. 
Here is the idea, and please forgive me if I am providing too much 
unnecessary details:

Every user can login, send and receive messages from a master client 
(mobile phone) and a slave client (iPad, Android pad, Windows, MAC), 
master and slave clients of the same user can be online at the same 
time.  For every user, the master client must eventually have the full 
message history, i.e. outgoing messages sent by the user's slave client 
and all incoming messages when the master client is offline must be 
synced to the master client when it connects.  When both clients of a 
user are online, incoming messages and sent to both clients, and 
outgoing messages from either client are synced to the other client. 
Messages which haven't been acked by master stays in the list.  Messages 
that have been acked by both master and slave are immediately removed 
from the list.  Messages that are acked by master but not slave also 
stays in the list, but the list is limited to the most recent 10 messages.

Server keeps a message list for every user. The list has two 
non_neg_integer() versions: start_version and end_version, both starts 
from 1.  The server also keeps track of the versions acked by 
master/slave.  Every message has a timestamp of seconds from epoch. 
Messages should be kept in order by the time it's sent/received on 
server, but the order should not depend on timestamp since multiple 
messages can happen in the same second.

Every time a user sends or receives a message, the message is inserted 
to the head of the user's message list, and end_version is bumped (+1).

When a user client connects to the server, it sends its own end_version 
of its message list to the server, the server then compares it with the 
start_version and end_version in db, and sends to the client all 
messages that the client doesn't have as well as a new end_version. 
Receiving the messages, the client then sends ack with the new end_version.

About 70% of the messages are small text messages, typically less than 
48 bytes, 20% of the messages are audio messages typically less than 10k 
bytes, the other 10% are thumbnails less than 50k bytes. On average, 
every user sends 50 messages every day.  No data on master/slave client 
usage because currently we only have mobile phone clients.

My idea is to use two set type table:

{user_id, start_version, end_version, [messages]}
{user_id, master_acked_version, slave_acked_version}

It may look pretty naive, anyway, this is my first erlang project and 
also my first messaging app.  Any advices are appreciated!


Regards,
Khitai

On 2016/6/8 12:22, Richard A. O'Keefe wrote:
> On 8/06/16 11:26 AM, Dmitry Belyaev wrote:
>> Alexander, thanks for a great example of a double-linked list
>> implementation in a database.
>>
>> As we can see one insert operation requires 3 write operations in the
>> database in this case. Anyway this is still a constant factor applied
>> to a normal write operation.
> I can't recall, did this thread start with a presentation of
>  - message sizes
>  - rates at which messages are produced and consumed
>  - number of registered and of active users
>  - quality issues such as what proportion of messages may be lost
> that kind of stuff?  I mean that's what tells us how many writes
> per message are a problem.
>
> It's also important to distinguish between *worst-case* performance and
> *amortised* performance.  A message will go into the doubly linked list
> once (at O(1) cost) and come out of the doubly linked list once (at
> O(1) cost)
> so the amortised cost per message of the doubly linked list is O(1) per
> message.  It's an unpleasantly high constant factor, but it is a constant
> factor.  Reading and deleting a batch of k messages is then O(k), so a
> burst of stuff can happen at one time, but reading is going to take O(k)
> time anyway, so we're talking about a matter of constant factor.
>
> There are possible hybrid data structures where you choose a batch
> size B and buffer recent messages in memory until there are B of them,
> and run a doubly linked list of batches.
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions




More information about the erlang-questions mailing list