[erlang-questions] Process heap inspector

Mon Nov 28 17:29:56 CET 2011

On 11-28 09:23, Paul Davis wrote:
> On Mon, Nov 28, 2011 at 7:55 AM, Kostis Sagonas <kostis@REDACTED> wrote:
> > On 11/28/2011 08:39 AM, Michal Ptaszek wrote:
> >>
> >> Hi everyone,
> >>
> >> This idea was born in my mind when debugging some complex, live system
> >> and trying to figure out where did all my memory go.
> >>
> >> So, when debugging live system/investigating suspicious memory consumption
> >> patterns
> >> or simply trying to understand better what's going on with our processes,
> >> it might be useful
> >> to take a peep at the data given process operates on.
> >>
> >> ...
> >>
> >> The implementation is rather simple: if the process we probe is not the
> >> caller (i.e. we are not doing
> >> erlang:inspect_heap(self())), the data is copied from the callee heap to
> >> the caller heap (to avoid having
> >> cross-process references in variables), then we compute the flat size of
> >> each term we moved. The rootset
> >> is also included in the summary (i.e. process dict, seq tokens, etc.).
> >>
> >> Code is included in my inspect_heap OTP branch on:
> >>  github: https://github.com/paulgray/otp/tree/inspect_heap
> >>
> >> I am still a little bit hesitant about suspending process we probe: can
> >> anyone tell
> >> me if acquiring main process lock would be enough to keep its heap
> >> untouched during
> >> the call?
> >>
> >> Please, do point any bugs and tell me what do you think about the idea.
> >
> > I can see that this may be handy to have at some situations, but provided I
> > understand what is happening at the implementation level (disclaimer: I have
> > not looked at the implementation), I think it's actually a pretty bad idea
> > to include in a non debug-enabled runtime system.
> >
> > The reason is that this breaks all assumptions/invariants of the runtime
> > system in that Erlang processes are independent and can be scheduled to
> > execute concurrently on an SMP without being preempted by anything other
> > than exhausting their reduction step count or being stuck on some receive.
> > With this "built-in feature" processes need to be able to stop at more or
> > less any random point and stay suspended for an indefinite amount of time
> > based on code that _another_ process is executing.
> >
> 
> Bit confused, but wouldn't this objection also apply to
> erlang:suspend_process/2 [1] as well? I use this quite often in
> production on long lived processes that are chewing up resources. It's
> quite the handy tool in certain cases.
> 
> [1] http://erlang.org/doc/man/erlang.html#suspend_process-2
> 

I think the problem with such a feature is that it breaks the soft-realtime
behaviour and preemptibility of all Erlang processes. By creating and calling
such a BIF you essentially make it impossible to schedule other processes
if you have a single scheduler and a single CPU.
Most long-running BIFs either run in separate async threads or are written
in such a way that they can be stopped at any reasonable point
and continued later. That way a long-running BIF is broken
into some (maybe large) incremental steps, each one bringing
you closer to the result, and at each transition you can choose
to perform the next step or go back to the scheduler (due to reduction
exhaustion) and be scheduled later to continue these steps.

This is, for example, the situation with the BIFs in the re module
(regular expressions), or even with simple ones like length/1.
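
The stepwise scheme described above can be sketched in C roughly as follows.
This is only a minimal illustration under stated assumptions, not code from
ERTS: the cell type and the names len_state, len_step and len_run are
hypothetical, and the toy driver loop merely stands in for the real scheduler.
It counts a linked list the way a length/1-like operation might, doing at most
`budget` units of work per call and saving its position so it can be resumed.

```c
#include <stddef.h>

/* Hypothetical list cell; NOT the ERTS term representation. */
typedef struct cell {
    struct cell *next;
} cell;

/* State saved between slices so the work can be resumed later. */
typedef struct {
    const cell *pos;   /* next cell to visit */
    size_t      count; /* cells counted so far */
} len_state;

typedef enum { LEN_DONE, LEN_YIELD } len_status;

static void len_init(len_state *st, const cell *list)
{
    st->pos = list;
    st->count = 0;
}

/* Count at most `budget` cells, then return to the caller.
 * LEN_YIELD means "not finished, call me again later". */
static len_status len_step(len_state *st, size_t budget)
{
    if (budget == 0)
        budget = 1; /* always make some progress */

    while (st->pos != NULL && budget > 0) {
        st->pos = st->pos->next;
        st->count++;
        budget--;
    }
    return st->pos == NULL ? LEN_DONE : LEN_YIELD;
}

/* Toy driver standing in for the scheduler: runs slices until done
 * and reports how many slices were needed. A real scheduler would
 * run other processes between two calls to len_step. */
static size_t len_run(const cell *list, size_t budget, size_t *slices)
{
    len_state st;
    size_t n = 0;

    len_init(&st, list);
    for (;;) {
        n++;
        if (len_step(&st, budget) == LEN_DONE)
            break;
        /* other work could be scheduled here */
    }
    if (slices != NULL)
        *slices = n;
    return st.count;
}
```

With a 10-element list and a budget of 4, len_run returns 10 after three
slices (4 + 4 + 2 cells). The point is that no single call does more than
`budget` units of work, so the caller regains control at bounded intervals.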

So unless such a BIF is written in a preemptible way, it should not be
included in a non-debug build.

Regards,
Witek