[erlang-questions] segfaults in cowboy rest server
Michael Truog
mjtruog@REDACTED
Fri Apr 10 08:44:35 CEST 2015
On 04/09/2015 07:22 AM, Garry Hodgson wrote:
> i've got a problem i'm trying to solve wrt controlling memory
> consumption in a cloud environment. i've got a server that
> receives log data from security appliances and stores it in a
> mariadb database. logs are sent to us via RESTful api calls,
> with batches of logs embedded in the json body of a POST
> call. they can get rather large, and we get a lot of them,
> at a high rate.
>
> when production load increased beyond what was anticipated
> (doesn't it always?) we began having failures, with the server
> disappearing without a trace. in some cases oom-killer killed
> it, in others it would fail trying to allocate memory. we only
> saw the latter by running it in the erlang shell and waiting until
> it died, at which point we saw a terse error message.
>
> to prevent this, i added a check in service_available() to
> see if erlang:memory( total ) + content-length > some threshold,
> and reject the request if so. also, having read the recent threads
> about garbage collecting binaries, i added a timer to check every
> 30 seconds that forces gc on all processes if memory usage
> is too high.
>
> this seems to work pretty well, except that after a few days
> of running, we get hard crashes, with segfaults showing up
> in /var/log/messages:
>
> kernel: beam[18098]: segfault at 7f09a004040c ip 000000000049e209 sp 00007fff860d32b0 error 4 in beam[400000+2ce000]
>
> kernel: beam[14177]: segfault at 7fce288829bc ip 000000000049e209 sp 00007fffa0d2d7a0 error 4 in beam[400000+2ce000]
>
> i've been using erlang for 15 years, and have never seen a segfault.
> we've recently updated from r15b02 to 17.4, and we've also
> switched from webmachine to cowboy. i don't know if either of
> those things are relevant. i'm kind of at a loss as to how to diagnose
> or deal with this.
>
> any advice would be greatly appreciated.
>
If any port drivers or NIFs are used in the system, examine them closely for hidden errors. For example, a port driver written in C++ that lets an exception propagate into the Erlang VM (say, a timeout exception that has never occurred before) will crash the VM in new and exciting ways, which can include segfaults.
If you are using maps, they could also be the cause, since maps were only introduced in 17.0 and bugs in them are still being worked on.
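For reference, the memory guard described in the quoted message (an admission check in the cowboy_rest `service_available` callback, plus a 30-second timer that forces GC on all processes when memory is high) might look roughly like the sketch below. The module name `mem_guard`, its functions, and the policy of rejecting requests that carry no Content-Length are hypothetical choices for illustration, not the poster's actual code. Reading the Content-Length header from the request is left out because the cowboy_req API for it differs between cowboy versions.

```erlang
%% mem_guard.erl: hypothetical sketch of the memory guard described above.
-module(mem_guard).
-export([admit/2, admit/3, gc_all/0, start_gc_timer/2]).

%% Admission check against the VM's current total memory.
admit(ContentLength, Limit) ->
    admit(erlang:memory(total), ContentLength, Limit).

%% Pure form of the check: accept only if current memory plus the
%% declared body size stays at or under Limit. Requests without a
%% Content-Length (undefined) are rejected; this policy is an assumption.
-spec admit(non_neg_integer(), non_neg_integer() | undefined,
            pos_integer()) -> boolean().
admit(_Current, undefined, _Limit) ->
    false;
admit(Current, ContentLength, Limit) ->
    Current + ContentLength =< Limit.

%% Force a garbage collection on every process. garbage_collect/1
%% returns false for a process that has already exited, so a process
%% dying mid-sweep does not crash the loop.
gc_all() ->
    lists:foreach(fun erlang:garbage_collect/1, erlang:processes()).

%% Spawn a process that checks memory every IntervalMs milliseconds
%% and runs gc_all/0 when total memory exceeds Limit. Send it 'stop'
%% to end the loop.
start_gc_timer(IntervalMs, Limit) ->
    spawn(fun() -> gc_loop(IntervalMs, Limit) end).

gc_loop(IntervalMs, Limit) ->
    receive
        stop -> ok
    after IntervalMs ->
        case erlang:memory(total) > Limit of
            true -> gc_all();
            false -> ok
        end,
        gc_loop(IntervalMs, Limit)
    end.
```

Inside `service_available`, one would call `mem_guard:admit(ContentLength, Limit)` and return its boolean along with the request and state; the poster's 30-second check corresponds to `mem_guard:start_gc_timer(30000, Limit)`. Note that a guard like this only limits memory pressure; it would not by itself explain or prevent a segfault.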