Hidden garbage collection triggers?
Scott Lystig Fritchie
<>
Thu Feb 18 06:15:42 CET 2010
So, what causes garbage collection? As a longtime Erlang coder, I'm
aware of the usual suspects. But I've got a single process that's been
running for *hours* that appears to be spending over 99% of its time in
garbage collection. It should've finished in a minute or less.
I haven't been playing with spawn_opt() or triggering GC manually or
changing VM defaults: just using plain R13B03 on a Linux 4-core
Opteron box with 8GB RAM, with the defaults, SMP-wise. The VM's OS
process size is less than 1GB, with nothing else substantial running
on the machine.
The process is doing a mostly-sequential read of a file, using
fold-style code via fold_a_file() below. That func will read a
"hunk", then read a "hunk member" blob, convert the blob to an Erlang
term, and prepend to an accumulator list. The hunks are less than 100
bytes each, the "hunk members" in this case are also less than 100
bytes each. The total file size is roughly 300 MBytes.
Other than using file:pread() to read the file's data and an
optimization kludge using the process dictionary to make file:pread()
less painful, there's nothing weird...
... except that the process is GC'ing waaaaay too frequently. One VM
scheduler is at 100% CPU consumption. Did I mention it has been
running for hours instead of the less-than-one-minute that I expected?
According to the backtrace I've got (see below, part 1), the
accumulator list is currently about 869,145 elements long. That's far
too short to bother the garbage collector very often.
When I use process_info(), I see things like this:
()108> process_info(pid(0,24664,0)).
[{current_function,{prim_file,transform_ldata,1}},
{initial_call,{erlang,apply,2}},
{status,runnable},
{message_queue_len,0},
{messages,[]},
{links,[#Port<0.21682>]},
{dictionary,[{pread_loop,0},
{fold_off,94176412},
{fold_seq,112}]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.144.0>},
{total_heap_size,47828850},
{heap_size,47828850},
{stack_size,65},
{reductions,104258262},
{garbage_collection,[{fullsweep_after,65535},{minor_gcs,0}]},
{suspending,[]}]
The minor_gcs count is usually zero, occasionally I see a one. But
it's GC'ing madly, 493 out of a sampled 500 milliseconds.
()97> {erlang:system_monitor(self(), [{long_gc, 50}, {large_heap, 10*1024*1024}]), timer:sleep(500), erlang:system_monitor(undefined)}.
{undefined,ok,
{<0.2057.2>,[{large_heap,10485760},{long_gc,50}]}}
()98> flush().
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,243},
{old_heap_block_size,38263080},
{heap_block_size,5135590},
{mbuf_size,0},
{stack_size,66},
{old_heap_size,26134369},
{heap_size,197}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,38263080},
{heap_block_size,5135590},
{mbuf_size,0},
{stack_size,66},
{old_heap_size,26134369},
{heap_size,197}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,38263080},
{heap_block_size,5135590},
{mbuf_size,0},
{stack_size,66},
{old_heap_size,26134369},
{heap_size,367}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,248},
{old_heap_block_size,0},
{heap_block_size,38263080},
{mbuf_size,0},
{stack_size,66},
{old_heap_size,0},
{heap_size,26134894}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,0},
{heap_block_size,38263080},
{mbuf_size,0},
{stack_size,66},
{old_heap_size,0},
{heap_size,26134894}]}
ok
One tantalizing clue: I almost never see a process dictionary entry
for 'pread_hack'. Which suggests that the GC is mostly?/always?
happening during my_pread_miss(), which is the only place where the
'pread_hack' key is erased?
Badly mangled example code is below, part II. If any readers have
managed to read this far and have any thoughts, I'd be grateful.
Meanwhile, I'll go review the VM's code....
-Scott
--- snip --- snip --- snip --- snip --- snip --- snip --- snip ---
=proc:<0.24664.0>
State: Scheduled
Spawned as: erlang:apply/2
Spawned by: <0.159.0>
Started: Wed Feb 17 17:52:42 2010
Message queue length: 1
Message queue: [{#Port<0.21682>,{data,[6,<<16 bytes>>|<<4352 bytes>>]}}]
Number of heap fragments: 0
Heap fragment data: 0
Link list: []
Dictionary: [{fold_seq,112},{fold_off,54396731},{pread_loop,0}]
Reductions: 100582510
Stack+heap: 6419485
OldHeap: 47828850
Heap unused: 6418334
OldHeap unused: 47828850
Stack dump:
Program counter: 0x00002aaaac842dd8 (prim_file:drv_get_response/1 + 72)
CP: 0x0000000000000000 (invalid)
arity = 0
0x00002aab1fac0f18 Return addr 0x00002aaaac83f708 (prim_file:pread/3 + 560)
y(0) []
y(1) #Port<0.21682>
0x00002aab1fac0f30 Return addr 0x00002aaaadeedba0 (gc_by_the_zillions:my_pread_miss/3 + 784)
0x00002aab1fac0f38 Return addr 0x00002aaaadeecc78 (gc_by_the_zillions:my_pread2/4 + 2152)
y(0) []
y(1) []
y(2) 54397789
y(3) {file_descriptor,prim_file,{#Port<0.21682>,32}}
y(4) []
0x00002aab1fac0f68 Return addr 0x00002aaaadee2830 (gc_by_the_zillions:read_hunk_summ_ll_middle/3 + 480)
y(0) <<4352 bytes>>
y(1) []
y(2) []
y(3) 256
y(4) []
y(5) []
y(6) []
0x00002aab1fac0fa8 Return addr 0x00002aaaadee6968 (gc_by_the_zillions:fold_a_file/6 + 176)
y(0) 54397789
y(1) {file_descriptor,prim_file,{#Port<0.21682>,32}}
y(2) []
0x00002aab1fac0fc8 Return addr 0x00002aaaadef3de0 (gc_by_the_zillions:'-fold2/5-fun-0-'/5 + 936)
y(0) []
y(1) []
y(2) []
y(3) []
y(4) []
y(5) {wb,869145,[{eee,tab2_ch18_b2,2,21979760,<<4 bytes>>,7742,110,[<<51 bytes>>,[<<59 bytes>>],[]]},{eee,tab2_ch18_b2,2,21979650,<<4 bytes>>,7742,110,[<<51 bytes>>,[<<59 bytes>>],[]]},{eee,tab2_ch3_b2,2,21485970,<<4 bytes>>,7742,110,[<<51 bytes>>,[<<59 bytes>>],[]]},...backtrace won't print everything...
--- snip --- snip --- snip --- snip --- snip --- snip --- snip ---
-module(gc_by_the_zillions).
-define(MY_PREAD_MODEST_READAHEAD_BYTES, 4096).
-define(MAX_HUNK_HEADER_BYTES, 256).
-export([bummer/0]).
-record(wb, {
count = 0,
ts = []
}).
%% This code as been edited for brevity, some vars renamed, some
%% constants changed (if the number looks weird, it's intentionally
%% fake), and some functions (harmless, I hope) are omitted. It
%% probably won't compile, I haven't tried to.
bummer() ->
Fun = fun(#hunk_summ{seq = SeqNum, off = Offset} = _H, _FH, _WB)
when SeqNum == OldSeqNum, Offset < OldOffset ->
{{{new_offset, OldOffset}}};
(#hunk_summ{seq = SeqNum, off = Offset} = _H, _FH, WB)
when {SeqNum, Offset} >= {StopSeqNum, StopOffset} ->
WB; % ignore later hunks
(#hunk_summ{type = ?LOCAL_RECORD_TYPENUM, u_len = [_BLen]} = H, FH,
#wb{count = Count, ts = Ts} = WB) ->
UBlob = read_hunk_member_ll(FH, H, not_checksum, 1),
T = binary_to_term(UBlob),
WB#wb{count = Count + 1, ts = [T|Ts]};
(_H, _FH, WB) ->
WB % ignore this hunk
end,
{ok, FH} = file:open("blar", [read]),
fold_a_file(FH, 99888777, 7, 12, Fun, #wb{}).
fold_a_file(FH, LogSize, SeqNum, Offset, Fun, Acc) ->
case read_hunk_summ_ll(FH, Offset) of
{ok, H} ->
H2 = H#hunk_summ{seq = SeqNum, off = Offset},
put(fold_seq, SeqNum), % Visibility from the outside world, e.g.
put(fold_off, Offset), % process_info(Pid, dictionary).
{Acc2, NewOff} = case Fun(H2, FH, Acc) of
{{{new_offset, NewO}}} ->
{Acc, NewO};
Aa ->
{Aa, H2#hunk_summ.off + H2#hunk_summ.len}
end,
if NewOff >= LogSize ->
Acc2;
true ->
fold_a_file(FH, LogSize, SeqNum, NewOff, Fun, Acc2)
end;
{bof, NewOffset} ->
fold_a_file(FH, LogSize, SeqNum, NewOffset, Fun, Acc);
eof ->
Acc;
Err ->
throw(Err)
end.
read_hunk_summ_ll(FH, Offset) ->
my_pread_start(),
read_hunk_summ_ll(FH, Offset, 0).
read_hunk_summ_ll(FH, Offset, HunkLenHint) ->
my_pread_start(),
read_hunk_summ_ll_middle(FH, Offset, HunkLenHint).
read_hunk_summ_ll_middle(FH, Offset, HunkLenHint) ->
ReadAheadBytes = HunkLenHint + ?MY_PREAD_MODEST_READAHEAD_BYTES,
read_hunk_summ_ll2(my_pread(FH, Offset, ?MAX_HUNK_HEADER_BYTES,
ReadAheadBytes),
FH, Offset).
read_hunk_summ_ll2({ok, Bin}, FH, Offset) ->
Hdr = hunk_header_v1(),
case Bin of
<<Foo:4/binary, _/binary>> when Foo == Hdr -> % !@#$! compiler warning
<<_:4/binary, HunkLen:32, LenBinT:32, TypeNum:32,
Rest/binary>> = Bin,
HdrLen = omitted_fun___len(Hdr),
First_off = HdrLen + 9982372 + LenBinT,
if LenBinT > size(Rest) ->
%% Some arithmetic, then...
read_hunk_summ_ll_middle(FH, Offset, First_off);
true ->
<<BinT:LenBinT/binary, _/binary>> = Rest,
{Xs, Ys, Zs} = binary_to_term(BinT),
{ok, #hunk_summ{len = HunkLen, type = TypeNum,
ys = Ys, zs = Zs,
xs = Xs,
first_off = First_off}}
end;
_X ->
{error, bad_hunk_header, _X}
end;
read_hunk_summ_ll2(Result, _FH, _Offset) ->
Result. % eof, {error, loop_broken}, ...
hunk_header_v1() ->
<<"Mumble">>.
%% I'm not particularly proud of this code, but it's much more
%% effective than calling file:pread() to march through reading a
%% big file (e.g. 100MB) via file:pread(FH, Offset, Bytes) where
%% Bytes is something really small, like 120.
my_pread_start() ->
put(pread_loop, 0).
my_pread(FH, Offset, Len, LenExtra) when Len >= 0 ->
case get(pread_loop) of
N when N < 5 -> my_pread2(FH, Offset, Len, LenExtra);
_ -> {error, loop_broken}
end;
my_pread(FH, Offset, Len, LenExtra) when Len < 0 ->
my_pread(FH, Offset, 0, LenExtra).
my_pread2(FH, Offset, Len, LenExtra) ->
case get(pread_hack) of
{FH, _BinOff, eof} ->
my_pread_miss(FH, Offset, Len + LenExtra),
case get(pread_hack) of % This Offset is probably not the old one.
{_, _, eof} ->
eof; % Avoid infinite loop: new offset is also at/past eof.
_ ->
my_pread(FH, Offset, Len, 0)
end;
{FH, BinOff, {ok, Bin}} ->
BinLen = size(Bin),
if Offset >= BinOff andalso Offset+Len =< BinOff+BinLen ->
PrefixLen = Offset - BinOff,
<<_:PrefixLen/binary, B:Len/binary, _/binary>> = Bin,
{ok, B};
true ->
case my_pread_miss(FH, Offset, Len + LenExtra) of
{ok, Bin} when is_binary(Bin) ->
BinSize = size(Bin),
if BinSize >= Len ->
<<Bin2:Len/binary, _/binary>> = Bin,
{ok, Bin2};
BinSize < Len ->
{ok, Bin};
true ->
{ok, Bin}
end;
Err ->
Err
end
end;
undefined ->
my_pread_miss(FH, Offset, Len + LenExtra),
my_pread(FH, Offset, Len, 0);
{OtherFH, _, _} when OtherFH /= FH ->
my_pread_miss(FH, Offset, Len + LenExtra),
my_pread(FH, Offset, Len, 0);
{_, _, Res} ->
Res
end.
my_pread_miss(FH, Offset, Len) ->
erase(pread_hack),
AheadLen = 0,
HackLen = AheadLen + Len,
X = file:pread(FH, Offset, HackLen),
put(pread_hack, {FH, Offset, X}),
X.
--- snip --- snip --- snip --- snip --- snip --- snip --- snip ---
... back when the process was younger, the GCs were "only" 150-160
milliseconds each time.
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,0},
{heap_block_size,24488375},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,0},
{heap_size,20020492}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,149},
{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20020455},
{heap_size,201}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20020455},
{heap_size,201}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20020455},
{heap_size,371}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,159},
{old_heap_block_size,0},
{heap_block_size,24488375},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,0},
{heap_size,20020984}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,0},
{heap_block_size,24488375},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,0},
{heap_size,20020984}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,150},
{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20020947},
{heap_size,201}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20020947},
{heap_size,201}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20020947},
{heap_size,371}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,159},
{old_heap_block_size,0},
{heap_block_size,24488375},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,0},
{heap_size,20021476}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,0},
{heap_block_size,24488375},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,0},
{heap_size,20021476}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,149},
{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20021439},
{heap_size,201}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20021439},
{heap_size,201}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20021439},
{heap_size,371}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,158},
{old_heap_block_size,0},
{heap_block_size,24488375},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,0},
{heap_size,20021968}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,0},
{heap_block_size,24488375},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,0},
{heap_size,20021968}]}
Shell got {monitor,<5095.24664.0>,long_gc,
[{timeout,148},
{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20021931},
{heap_size,201}]}
Shell got {monitor,<5095.24664.0>,large_heap,
[{old_heap_block_size,30610465},
{heap_block_size,4108475},
{mbuf_size,0},
{stack_size,64},
{old_heap_size,20021931},
{heap_size,201}]}
More information about the erlang-questions
mailing list