[erlang-questions] user process suspended
Tue Sep 22 16:18:32 CEST 2015
On 09/22, Samuel wrote:
>We have been collecting crashdumps for all those instances and all of
>them show the same process pattern: the user process is suspended
>while all others are waiting. Again I suspect that all of them are
>waiting simply because there is not a lot of activity in the node so I
>am inclined to think that the suspended user process is probably the
>We can remote shell into those nodes and we have tested that we can
>create and write files, so IO is not completely blocked.
I've seen similar issues on EC2 nodes on AWS: nodes that were outputting
a lot of content would stall whenever the host instance had major issues
(randomly going bad on disk or whatever), which we more or less chalked
up to "the cloud being the cloud".
We would be logging a lot of content out and it could suddenly stall
entire nodes. The workaround we had for it was to move all of our
logging output to an asynchronous mechanism with buffering:
The thing is marked experimental, but has been at the core of logging
for logplex (https://github.com/heroku/logplex) for a couple of years
now without a problem.
The big downside you could see from it would be that when the user
process can't deal with content anymore, it will drop output messages on
the floor and just shed load.
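
To make the buffering-plus-shedding idea concrete, here is a minimal
sketch. It is not the code of the tool mentioned above. The module name
async_out and the MAX_PENDING limit are made up for illustration. It
assumes the standard Erlang I/O protocol (io_request / io_reply
messages) and that the target device is the registered user process.
Callers never block; at most one request is in flight to the device,
and once the buffer is full new output is counted and dropped.

```erlang
-module(async_out).
-behaviour(gen_server).

-export([start_link/0, start_link/1, log/1, stats/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical limit on buffered messages; tune for your workload.
-define(MAX_PENDING, 1000).

-record(state, {device,
                pending = [],          % buffered output, newest first
                count = 0,             % number of buffered messages
                dropped = 0,           % messages shed so far
                inflight = undefined}).% ref of the outstanding io_request

start_link() -> start_link(user).

start_link(Device) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Device, []).

%% Asynchronous: returns ok immediately, even if the device is stalled.
log(Chars) -> gen_server:cast(?MODULE, {log, Chars}).

stats() -> gen_server:call(?MODULE, stats).

init(Device) -> {ok, #state{device = Device}}.

handle_call(stats, _From, S = #state{count = C, dropped = D}) ->
    {reply, [{buffered, C}, {dropped, D}], S}.

%% Buffer full: shed load instead of growing without bound.
handle_cast({log, _}, S = #state{count = C, dropped = D})
  when C >= ?MAX_PENDING ->
    {noreply, S#state{dropped = D + 1}};
handle_cast({log, Chars}, S = #state{pending = P, count = C}) ->
    {noreply, maybe_flush(S#state{pending = [Chars | P], count = C + 1})}.

%% The device acknowledged the last batch; send the next one, if any.
handle_info({io_reply, Ref, _Result}, S = #state{inflight = Ref}) ->
    {noreply, maybe_flush(S#state{inflight = undefined})};
handle_info(_Other, S) ->
    {noreply, S}.

%% Send the whole buffer as one request, without waiting for the reply.
maybe_flush(S = #state{inflight = undefined, pending = P, device = Dev})
  when P =/= [] ->
    Ref = make_ref(),
    Dev ! {io_request, self(), Ref,
           {put_chars, unicode, lists:reverse(P)}},
    S#state{pending = [], count = 0, inflight = Ref};
maybe_flush(S) ->
    S.
```

With something like this, async_out:start_link() followed by
async_out:log("some output\n") replaces direct io:format calls, and
async_out:stats() shows how much is buffered and how much was dropped.
If the user process stops replying, the buffer fills up to the limit and
everything after that is shed, which is the trade-off described above.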
Mind you, this may only help if you can see a significant amount of
output that could explain the IO stalling. If not, the problem could
very well be elsewhere.