[erlang-questions] Efficient way to store message lists in mnesia
Khitai Pang
khitai.pang@REDACTED
Wed Jun 8 09:33:20 CEST 2016
Thank you all for your input and let me provide more background of my
question. We already have a messaging app with about 200,000 active
users, the app uses MQTT publish/subscribe for messaging. Now we plan to
change to a pure erlang solution with customized messaging protocol.
Here is the idea, and please forgive me if I am providing too much
unnecessary details:
Every user can login, send and receive messages from a master client
(mobile phone) and a slave client (iPad, Android pad, Windows, MAC),
master and slave clients of the same user can be online at the same
time. For every user, the master client must eventually have the full
message history, i.e. outgoing messages sent by the user's slave client
and all incoming messages when the master client is offline must be
synced to the master client when it connects. When both clients of a
user are online, incoming messages and sent to both clients, and
outgoing messages from either client are synced to the other client.
Messages which haven't been acked by master stays in the list. Messages
that have been acked by both master and slave are immediately removed
from the list. Messages that are acked by master but not slave also
stays in the list, but the list is limited to the most recent 10 messages.
Server keeps a message list for every user. The list has two
non_neg_integer() versions: start_version and end_version, both starts
from 1. The server also keeps track of the versions acked by
master/slave. Every message has a timestamp of seconds from epoch.
Messages should be kept in order by the time it's sent/received on
server, but the order should not depend on timestamp since multiple
messages can happen in the same second.
Every time a user sends or receives a message, the message is inserted
to the head of the user's message list, and end_version is bumped (+1).
When a user client connects to the server, it sends its own end_version
of its message list to the server, the server then compares it with the
start_version and end_version in db, and sends to the client all
messages that the client doesn't have as well as a new end_version.
Receiving the messages, the client then sends ack with the new end_version.
About 70% of the messages are small text messages, typically less than
48 bytes, 20% of the messages are audio messages typically less than 10k
bytes, the other 10% are thumbnails less than 50k bytes. On average,
every user sends 50 messages every day. No data on master/slave client
usage because currently we only have mobile phone clients.
My idea is to use two set type table:
{user_id, start_version, end_version, [messages]}
{user_id, master_acked_version, slave_acked_version}
It may look pretty naive, anyway, this is my first erlang project and
also my first messaging app. Any advices are appreciated!
Regards,
Khitai
On 2016/6/8 12:22, Richard A. O'Keefe wrote:
> On 8/06/16 11:26 AM, Dmitry Belyaev wrote:
>> Alexander, thanks for a great example of a double-linked list
>> implementation in a database.
>>
>> As we can see one insert operation requires 3 write operations in the
>> database in this case. Anyway this is still a constant factor applied
>> to a normal write operation.
> I can't recall, did this thread start with a presentation of
> - message sizes
> - rates at which messages are produced and consumed
> - number of registered and of active users
> - quality issues such as what proportion of messages may be lost
> that kind of stuff? I mean that's what tells us how many writes
> per message are a problem.
>
> It's also important to distinguish between *worst-case* performance and
> *amortised* performance. A message will go into the doubly linked list
> once (at O(1) cost) and come out of the doubly linked list once (at
> O(1) cost)
> so the amortised cost per message of the doubly linked list is O(1) per
> message. It's an unpleasantly high constant factor, but it is a constant
> factor. Reading and deleting a batch of k messages is then O(k), so a
> burst of stuff can happen at one time, but reading is going to take O(k)
> time anyway, so we're talking about a matter of constant factor.
>
> There are possible hybrid data structures where you choose a batch
> size B and buffer recent messages in memory until there are B of them,
> and run a doubly linked list of batches.
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
More information about the erlang-questions
mailing list