[erlang-questions] process priority

Jachym Holecek freza@REDACTED
Tue Jul 5 18:07:54 CEST 2011


# Mazen Harake 2011-07-05:
> Writing to mnesia/dets requires no locking, nothing mentionable
> anyway, since the messages (internal ones) are serialized just like
> your gen_server will have its messages serialized (also, use
> dirty-operations).

I was talking about VM level locks -- but like I said, I don't
know if the impact is measurable here.

> This creates end to end flow control, actually even more so because
> you won't need the extra process (your logging process) in between
> the write.

By end-to-end I mean feedback between message producers and their
consumer (or consumers) -- I don't see how you get such behaviour
with a table-based approach. What prevents producers from generating
messages faster than the consumers can read them?
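
To make the feedback concrete, here is a minimal sketch of a gen_server-based
sink. It is not the proprietary library discussed in this thread; the module
name log_sink and its start_link/1 and write/1 functions are made up for
illustration. The point is only that a producer blocks in gen_server:call
until the consumer has handled its message, so each producer has at most one
message outstanding.

```erlang
-module(log_sink).
-behaviour(gen_server).

%% Hypothetical module and API, for illustration only.
-export([start_link/1, write/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link(Path) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Path, []).

%% Synchronous: the caller waits for the reply, so it cannot enqueue a
%% second message before the first one has been handled by the server.
write(Data) ->
    gen_server:call(?MODULE, {write, Data}, infinity).

init(Path) ->
    %% A raw file must be used by the process that opened it; init/1
    %% runs in the server process, so that holds here.
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    {ok, Fd}.

handle_call({write, Data}, _From, Fd) ->
    %% Reply with the result of the write (ok | {error, Reason}).
    {reply, file:write(Fd, Data), Fd}.

handle_cast(_Msg, Fd) -> {noreply, Fd}.
handle_info(_Msg, Fd) -> {noreply, Fd}.

terminate(_Reason, Fd) ->
    _ = file:close(Fd),
    ok.
```

With N producers the server's mailbox then holds at most N pending writes,
whereas a producer using casts or plain table inserts gets no such signal
unless one is added explicitly.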

> Perhaps your small 100B messages work so well with delayed_write
> because you can write many of them to memory before they are flushed
> to a file thus not hogging the disk, but the bigger messages you
> have the larger your in memory size needs to be to avoid this.

Sure, delayed_write parameters are configurable in the library I have
in mind. It's really more about avoiding OS overhead for many writes,
the disk itself just has to be fast enough to handle the load -- if
it's not, every buffer will overrun eventually. It's also a matter of
how much delay you are willing to tolerate between enqueuing a message
and seeing it on disk, and how many messages you are willing to lose
in a catastrophic VM crash.
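
For reference, the two knobs in question are the arguments of the
{delayed_write, Size, Delay} option of file:open/2: data is buffered until
Size bytes have accumulated or the oldest buffered data is Delay milliseconds
old. A minimal sketch follows; the module name, function name and the numbers
are illustrative assumptions, not recommendations or the settings of any
particular library.

```erlang
-module(delayed_log).
-export([open_log/1]).

%% Hypothetical helper; buffer size and delay are example values only.
open_log(Path) ->
    BufferBytes = 256 * 1024,  % flush once this many bytes are buffered
    MaxDelayMs  = 500,         % or once the oldest buffered data is this old
    file:open(Path, [append, raw, binary,
                     {delayed_write, BufferBytes, MaxDelayMs}]).
```

A larger buffer means fewer OS-level writes but more data lost if the VM
dies, and a longer delay means messages take longer to show up on disk. Note
also that with delayed_write a write may report success early, and a write
error can surface as the result of a later file operation.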

> Delayed write does of course work well but I have experience that says
> that writing and buffering it up in tables can be helpful to avoid
> disk thrashing when messages are large (or higher volume). I don't
> remember exactly how much throughput we had (and I don't want to guess
> since it will be mere speculation without having hard data) but it
> helped immensely.
> 
> So I guess OP now has 2 suggestions which of course isn't bad ;)

Certainly. :-)

> One should also keep in mind though that different situations may have
> different needs, would be interesting to see how they would measure
> up.

Sure -- you can't get persistent queues with a gen_server-based approach,
for instance; it's designed & optimized for relatively small messages
arriving at very high rates.

If you can recall some details about your workload (average message size,
whether they were iolists or binaries, if iolists how complex they were,
roughly how the flusher process worked -- this sort of thing) I could
probably measure the two approaches in various situations (different
message sizes and producer concurrency levels) over the weekend and share
the results (but not the code, sorry, proprietary stuff).
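
A comparison along those lines could be driven by something as small as the
sketch below. It is a hypothetical harness, not the code referred to above:
call_bench and run/3 are invented names, Fun stands for whichever operation
is under test (a fun wrapping a gen_server call, or a table write), and the
result is just the number of operations and the elapsed wall-clock time in
microseconds.

```erlang
-module(call_bench).
-export([run/3]).

%% Hypothetical harness. Fun is the operation under test, Producers is
%% the concurrency level, PerProducer is the number of calls each
%% producer process makes.
run(Fun, Producers, PerProducer) ->
    Parent = self(),
    {Micros, ok} = timer:tc(fun() ->
        Pids = [spawn_link(fun() ->
                    repeat(Fun, PerProducer),
                    Parent ! {done, self()}
                end) || _ <- lists:seq(1, Producers)],
        wait_for(Pids)
    end),
    {Producers * PerProducer, Micros}.

%% Invoke Fun N times, ignoring its result.
repeat(_Fun, 0) -> ok;
repeat(Fun, N) when N > 0 ->
    _ = Fun(),
    repeat(Fun, N - 1).

%% Block until every producer has reported completion.
wait_for([]) -> ok;
wait_for([Pid | Rest]) ->
    receive {done, Pid} -> wait_for(Rest) end.
```

Varying the payload built inside Fun and the Producers argument covers the
two axes mentioned (message size and producer concurrency); a serious
measurement would also need warm-up runs and repeated trials.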

The overall lesson I've learnt from this is that gen_server calls are
dirt-cheap, with a bit of care here and there.

BR,
	-- Jachym


