[erlang-questions] CPU overhead of supervision and lager

Sean Cribbs sean@REDACTED
Thu Aug 28 22:58:13 CEST 2014


If you didn't convert your error_logger calls into lager calls (with the
level included), you can't take advantage of the optimizations that lager
provides, aside from it keeping memory usage from ballooning. If you use
lager calls together with the built-in parse_transform, log messages are
filtered on the sender side rather than at the event handler, which scales
much better under load.
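
Roughly, the conversion looks like this (module and message invented purely
for illustration):

    %% Hypothetical module showing an error_logger call converted to a
    %% lager call with the level in the call itself.
    -module(my_worker).
    -compile([{parse_transform, lager_transform}]).

    -export([handle_timeout/1]).

    handle_timeout(ReqId) ->
        %% Before: error_logger:warning_msg("request ~p timed out~n", [ReqId])
        %% went through the error_logger handler and was only filtered at
        %% the event handler.
        %%
        %% After: the parse_transform rewrites this call so the level check
        %% happens in the calling process, and messages below the current
        %% level never reach lager_event.
        lager:warning("request ~p timed out", [ReqId]),
        ok.

In practice you'd usually enable the parse_transform project-wide via
erl_opts (e.g. {erl_opts, [{parse_transform, lager_transform}]} in
rebar.config) rather than a per-module -compile attribute.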

On Thu, Aug 28, 2014 at 3:22 PM, Youngkin, Rich
<richard.youngkin@REDACTED> wrote:
> Hi all,
>
> I've made a recent change to an app to improve its supervision strategy and
> to use lager to help with things like log readability and log rotation.
>
> Regarding supervision, the app was changed from a shallow, wide supervision
> hierarchy to one with an intermediate level of supervision. The result was
> to go from one supervisor managing about 20 gen_servers to around five
> second-level supervisors that manage 3 to 5 gen_servers each.
>
> I stress tested the application before and after these changes and got some
> unexpected results.
>
> In the post-change test the application exited without writing anything to
> the sasl log (or the lager crash and error logs). The sasl log didn't show
> the expected startup messages either, so I suspect the sasl log was somehow
> corrupted. The application log did contain all the expected entries up
> until the point of failure. CPU utilization at the time of failure was 300%
> on a 4-CPU box. I wasn't actively monitoring memory use, but I don't have a
> strong reason to suspect that was a problem (other than the application
> exiting).
>
> The pre-change test went well past the load level that caused the
> post-change test to fail. The only difference between the two versions,
> aside from the supervision and lager changes, was that the pre-change
> version used about 10% less CPU for the same load. One item of note: in
> Observer I noticed that lager (lager_event) was consuming quite a bit of
> CPU, measured in reductions.
>
> Given the lack of sasl log or lager error/crash log information, I'm having
> a hard time understanding how or why these changes would cause a failure at
> the same processing load and why the pre-change version was able to scale
> well beyond the post-change version's failure point.  The slight change in
> CPU utilization doesn't appear significant given that the CPU load at the
> time of the failure was more or less at 75% of capacity.
>
> I'm wondering if anyone has seen similar behavioral differences when
> introducing supervision and/or lager, and if supervision and lager might
> account for the slight CPU utilization differences, at the same load,
> between the two versions.
>
> Thanks,
> Rich
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
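
For what it's worth, the intermediate layer described above is just a top
supervisor whose children are themselves supervisors. A minimal sketch of
that shape (all module names invented):

    -module(top_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        %% Five second-level supervisors instead of ~20 gen_servers
        %% supervised directly by the top supervisor.
        GroupSups = [group_a_sup, group_b_sup, group_c_sup,
                     group_d_sup, group_e_sup],
        Children = [{Mod, {Mod, start_link, []},
                     permanent, infinity, supervisor, [Mod]}
                    || Mod <- GroupSups],
        {ok, {{one_for_one, 5, 10}, Children}}.

Each group_*_sup then supervises its own 3 to 5 gen_server workers with
ordinary worker child specs.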



-- 
Sean Cribbs <sean@REDACTED>
Software Engineer
Basho Technologies, Inc.
http://basho.com/


