[erlang-questions] CPU overhead of supervision and lager

Thu Aug 28 22:22:50 CEST 2014

Hi all,

I've made a recent change to an app to improve its supervision strategy and
to use lager to help with things like log readability and log rotation.

Regarding supervision, the app was changed from a shallow, wide supervision
hierarchy to one with an intermediate level of supervision.  It went from 1
supervisor managing about 20 gen_servers to a top-level supervisor with
around 5 second-level supervisors, each managing 3 to 5 gen_servers.
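
To make the new layout concrete, here is a minimal sketch of a two-level
tree of that shape. It is illustrative only, not the real code: the module
name app_sup, the group and worker names, and the restart intensity (5
restarts in 10 seconds) are all placeholders, and each worker is assumed to
be a gen_server module exporting start_link/0.

```erlang
-module(app_sup).
-behaviour(supervisor).
-export([start_link/0, start_group/2, init/1]).

%% Top-level supervisor (placeholder name).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, top).

%% Second-level supervisor, one per group of related gen_servers.
start_group(Name, Workers) ->
    supervisor:start_link({local, Name}, ?MODULE, {group, Workers}).

%% Hypothetical grouping: 5 groups of 4 workers, about 20 in total.
groups() ->
    [{group_a, [a1, a2, a3, a4]},
     {group_b, [b1, b2, b3, b4]},
     {group_c, [c1, c2, c3, c4]},
     {group_d, [d1, d2, d3, d4]},
     {group_e, [e1, e2, e3, e4]}].

init(top) ->
    %% Children of the top level are supervisors, so shutdown is infinity.
    Children = [{Name, {?MODULE, start_group, [Name, Workers]},
                 permanent, infinity, supervisor, [?MODULE]}
                || {Name, Workers} <- groups()],
    {ok, {{one_for_one, 5, 10}, Children}};
init({group, Workers}) ->
    %% Each worker is assumed to be a gen_server with start_link/0.
    Children = [{W, {W, start_link, []}, permanent, 5000, worker, [W]}
                || W <- Workers],
    {ok, {{one_for_one, 5, 10}, Children}}.
```

With a tree like this, a group supervisor that exceeds its restart
intensity terminates, and that failure then counts against the top-level
supervisor's own intensity.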

I stress tested the application before and after these changes and got some
unexpected results.

In the post-change test the application exited without writing anything to
the sasl log (or the lager crash and error logs). The sasl log didn't show
the expected startup messages either, so I suspect the sasl log was
somehow corrupted.  The application log did contain all the expected
entries up until the point of failure. CPU utilization at the time of
failure was at 300% on a 4 CPU box.  I wasn't actively monitoring memory
use, but I don't have a strong reason to suspect that was a problem (other
than the application exiting).

The pre-change test went well past the load level that caused the
post-change test to fail. The only difference I observed between the 2
versions, aside from the supervision and lager changes themselves, was that
the pre-change version used about 10% less CPU for the same load.  One item
of note: using Observer, I noticed that lager (lager_event) was getting
quite a bit of the CPU resources, measured in terms of reductions.
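
For anyone wanting to check the same thing without Observer, a small probe
along these lines should report the same counters from a shell attached to
the node. This is only a sketch: lager_probe is a made-up module name, and
it assumes lager's event manager is registered locally as lager_event.

```erlang
-module(lager_probe).
-export([snapshot/0]).

%% Returns the mailbox length, reduction count and memory of the
%% lager_event process, or an error if it is not registered or has died.
snapshot() ->
    case whereis(lager_event) of
        undefined ->
            {error, not_running};
        Pid ->
            case erlang:process_info(Pid, [message_queue_len,
                                           reductions,
                                           memory]) of
                undefined -> {error, not_running};
                Info -> {ok, Info}
            end
    end.
```

Calling lager_probe:snapshot() a few times under load shows whether the
message queue keeps growing, which would point at the logger falling behind
rather than at raw CPU cost.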

Given the lack of sasl log or lager error/crash log information, I'm having
a hard time understanding how or why these changes would cause a failure at
the same processing load and why the pre-change version was able to scale
well beyond the post-change version's failure point.  The slight change in
CPU utilization doesn't appear significant given that the CPU load at the
time of the failure was more or less at 75% of capacity.

I'm wondering if anyone has seen similar behavioral differences when
introducing supervision and/or lager, and if supervision and lager might
account for the slight CPU utilization differences, at the same load,
between the 2 versions.

Thanks,
Rich