[erlang-questions] Erlang Memory Question

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Sun Oct 5 23:17:41 CEST 2014


On Sun, Oct 5, 2014 at 9:27 AM, Eranga Udesh <eranga.erl@REDACTED> wrote:

> I found that temporary variables, e.g. the result of binary_to_list/1 on
> XML data, say 100 KB in size (xmerl needs a string), won't get freed for
> a long time without forced garbage collection. Therefore, when there are
> about 500 user sessions, each process holding large memory blocks, the
> system memory usage becomes extremely high. We plan to support a large
> number of user sessions, say in the 10000s, and this memory consumption
> is a show stopper for us at the moment.


Hi!

This is your problem in a nutshell. Calling binary_to_list/1 on a 100 KB
binary blows it up to at least 2.4 megabytes in size, since every character
in the resulting list costs at least two machine words of heap, and the
heap grows in generous steps. When the process is done, it takes a while
for its heap to shrink back down. This becomes a serious problem when your
system processes XML documents for a large set of users at the same time.
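
You can see the expansion for yourself in the shell; a back-of-the-envelope
sketch (the figures assume a 64-bit emulator):

    Bin = binary:copy(<<"x">>, 100 * 1024),       %% a 100 KB binary
    List = binary_to_list(Bin),
    Words = erts_debug:size(List),                %% term size in words
    Bytes = Words * erlang:system_info(wordsize).
    %% Two words per list cell gives ~1.6 MB for the list alone; the
    %% actual heap footprint is larger, since the heap grows in steps.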
You have two general options, and a serious system should apply both:

* xmerl's tree parser is only really suitable for small configuration
blocks of data. If you are processing larger amounts of data, you need an
XML parser that operates directly on the binary representation. It also
helps a lot if the parser can work SAX-style, so you never have to build
an intermediate structure (see the sketch after this list). In Haskell,
particularly GHC, fusion optimizations would mostly take care of this, but
nothing like that exists in the Erlang ecosystem, so you have to handle it
yourself. Unfortunately I don't have a specific recommendation handy, since
it has been too long since I last worked with XML as a format.

* Your Erlang node() needs a way to shed load once it reaches capacity. In
other words, you design your system for a certain number of simultaneous
users and then make sure there is a limit on how much processing can happen
concurrently. This keeps the Erlang system from breaking down under stress
if it is loaded over capacity. Fred Hebert has written a book, "Erlang in
Anger"[0], which touches on the subject in chapter 3, "Planning for
overload". You may have 20,000 users on the system, but if you make sure
only 100 of them can process XML data at the same time, you have at most
240 megabytes of outstanding memory at any moment. Also, think about how
much time it will take K cores to chew through 240 megabytes of data.
Reading data is expensive.
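
To illustrate the SAX-style approach from the first point: OTP's xmerl
application does ship a SAX parser, xmerl_sax_parser, which accepts a
binary directly and emits events instead of building a tree. A minimal
sketch, where count_elements/1 is a name made up for this example:

    %% Count start-of-element events without ever materializing the
    %% document as a tree or a character list.
    count_elements(XmlBin) when is_binary(XmlBin) ->
        EventFun = fun({startElement, _Uri, _Local, _QName, _Attrs},
                       _Location, Count) ->
                           Count + 1;
                      (_Event, _Location, Count) ->
                           Count
                   end,
        {ok, Count, _Rest} =
            xmerl_sax_parser:stream(XmlBin, [{event_fun, EventFun},
                                             {event_state, 0}]),
        Count.

A real handler would accumulate whatever your sessions need from the
events, but the point is the same: the 100 KB binary never becomes a
multi-megabyte list.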

Irina Guberman (from Ubiquiti Networks, if memory serves) recently gave a
very insightful (and funny!) talk[1] on how she employed the "jobs"
framework in a situation somewhat akin to yours. It is highly recommended,
since she covers the subject in far more depth than I do here. For a
production system I would recommend employing some kind of queueing
framework early on; a rough sketch follows. Otherwise, your system will
buckle under the load once it gets deployed.
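
As a rough sketch of what that looks like with "jobs", where the queue
name and limit are illustrative and parse_xml/1 stands in for whatever
parsing you actually do:

    %% At application start: a counter regulator capping the queue
    %% at 100 concurrent jobs.
    ok = jobs:add_queue(xml_parse,
                        [{regulators, [{counter, [{limit, 100}]}]}]),

    %% In each session process: jobs:run/2 waits for a free slot, so
    %% at most 100 parses (and their memory blow-ups) are live at
    %% once. parse_xml/1 is a hypothetical helper.
    Result = jobs:run(xml_parse, fun() -> parse_xml(XmlBin) end).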

[0] http://www.erlang-in-anger.com/
[1]
https://www.youtube.com/watch?v=1Z_Z8aLIBQ8&list=UUQ7dFBzZGlBvtU2hCecsBBg

-- 
J.