[erlang-questions] Erlang Memory Question

Eranga Udesh eranga.erl@REDACTED
Mon Oct 6 03:52:10 CEST 2014


Thanks Jesper, good stuff/advice.

Let me digest your suggestions and the articles and rethink the
architecture. I'll get back with results soon.

Cheers,
- Eranga

On Mon, Oct 6, 2014 at 2:47 AM, Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:

>
> On Sun, Oct 5, 2014 at 9:27 AM, Eranga Udesh <eranga.erl@REDACTED> wrote:
>
>> I found that temporary variables, e.g. the result of binary_to_list/1 on
>> an XML document of say 100 KB in size (xmerl needs a string), won't get
>> freed for a long period of time without forced garbage collection.
>> Therefore, when there are about 500 user sessions, each process consuming
>> large memory blocks, the system memory usage becomes extremely high. We
>> plan to support a large number of user sessions, in the 10000s, and this
>> memory consumption is a show stopper for us at the moment.
>
>
> Hi!
>
> This is your problem in a nutshell. Calling binary_to_list/1 on a 100 KB
> binary blows it up to at least 2.4 megabytes in size. When the process is
> done, it takes a while for its heap to shrink back down. This becomes a
> serious problem when your system has to process XML documents for a large
> set of users at the same time (a rough way to check the numbers on your
> own node is sketched after the two points below). You have two general
> options, and in a serious system you should apply both:
>
> * xmerl is only useful for small, configuration-sized blocks of data. If
> you are processing larger amounts of data, you need an XML parser which
> operates directly on the binary representation. In addition, finding an
> XML parser which lets you parse SAX-style, so you don't have to build an
> intermediate structure, will help a lot. In Haskell, particularly GHC,
> fusion optimizations would mostly take care of these things, but nothing
> comparable exists in the Erlang ecosystem, so you will have to handle it
> yourself. Unfortunately I don't have a suggestion handy, since it has been
> too long since I last worked with XML as a format.
>
> * Your Erlang node() needs a way to shed load once it reaches capacity. In
> other words, you design your system for a certain number of simultaneous
> users and then make sure there is a limit on how much processing can
> happen concurrently. This frames the Erlang system so it does not break
> down under stress if it gets loaded over capacity. Fred Hebert has written
> a book, "Erlang in Anger"[0], which touches on the subject in chapter 3,
> "Planning for overload". You may have 20,000 users on the system, but if
> you make sure only 100 of those can process XML data at the same time, you
> have at most 240 megabytes of outstanding memory at any moment. Also, you
> may want to think about how much time it will take K cores to chew through
> 240 megabytes of data. Reading data is expensive.
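>
> As a rough, illustrative sketch (the data is made up and the exact numbers
> depend on the VM's word size and heap state), you can eyeball the
> expansion in a shell with erts_debug:flat_size/1, which reports a term's
> size in machine words:
>
>     %% A ~100 KB binary standing in for the XML document.
>     Bin = binary:copy(<<"x">>, 100 * 1024),
>     byte_size(Bin),                            %% 102400 bytes
>     WordSize = erlang:system_info(wordsize),   %% 8 on a 64-bit VM
>     %% The list representation costs one cons cell (2 words) per
>     %% character, so the list alone is ~1.6 MB on 64-bit, before the
>     %% parser builds any intermediate structure and before the process
>     %% heap grows on top of that.
>     erts_debug:flat_size(binary_to_list(Bin)) * WordSize.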
>
> Irina Guberman (from Ubiquiti Networks, if memory serves) recently gave a
> very insightful (and funny!) talk[1] on how she employed the "jobs"
> framework in a situation somewhat akin to yours. It is highly recommended,
> since she covers the subject in far more depth than I do here. For a
> production system I would recommend employing some kind of queueing
> framework early on. Otherwise, your system will just bow under the load
> once it gets deployed.
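>
> A minimal sketch of that idea with the "jobs" framework (the queue name,
> the limit, and the parse_xml(Body) call are made up for illustration;
> check the jobs documentation for the exact regulator options):
>
>     %% At application start: a counter regulator allowing at most 100
>     %% concurrent jobs in the 'xml_parse' queue.
>     jobs:add_queue(xml_parse,
>                    [{regulators, [{counter, [{limit, 100}]}]}]),
>
>     %% In each session, wrap the expensive work. jobs:run/2 only runs the
>     %% fun once the regulator grants a slot, so at most 100 parses are in
>     %% flight at once; the other callers queue up and wait.
>     Result = jobs:run(xml_parse, fun() -> parse_xml(Body) end).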
>
> [0] http://www.erlang-in-anger.com/
> [1]
> https://www.youtube.com/watch?v=1Z_Z8aLIBQ8&list=UUQ7dFBzZGlBvtU2hCecsBBg
>
> --
> J.
>