R13B01 bug in file:pwrite()?

Raimo Niskanen raimo+erlang-bugs@REDACTED
Tue Aug 25 17:40:50 CEST 2009


Some comments inserted below. Just a small top-post:
Can the OTP team get access to the core dumps produced
at the segfaults, and the erl_core.dump files?

Mail me in private...

On Thu, Jul 30, 2009 at 11:18:30PM -0500, Scott Lystig Fritchie wrote:
> Hi, all.  This sounds really weird, but I have a test case that can
> irregularly trigger this bug within 5-10 minutes.  I can't post all the
> code, in part because it's driven by Quviq's QuickCheck.  :-)  I'll try
> to cobble together as much info as I can gather, for such an
> intermittent problem.
> 
> 100% of the time, I see crashes after the "Use Scribble..." message.
> The error could be happening perhaps after the func has returned, but
> 100% of the time there isn't any output following "Use Scribble...", and
> the surrounding code is fairly verbose.
> 
> scribbles(ArgsList, PathT) ->
>     io:format("Num scribbles = ~p\n", [length(ArgsList)]),
>     lists:map(
>       fun([SeqNum, Offset, NewBlob]) ->
> 	      Path = element((SeqNum rem size(PathT)) + 1, PathT),
> 	      {ok, FI} = file:read_file_info(Path),
> 	      Pos = Offset rem (FI#file_info.size + 1), % in case size = 0.
> 	      {ok, FH} = file:open(Path, [binary, read, write]),
> 	      io:format("Pread ~p at Pos ~p for ~p bytes\n", [Path, Pos, size(NewBlob)]),
> 	      OldBlob = case (catch file:pread(FH, Pos, size(NewBlob))) of
> 			    {ok, BLB} -> BLB;
> 			    _         -> cant_match_any_binary_ha_ha
> 			end,
> 	      io:format("Use Scribble ~p at ~p Pos ~p\n", [NewBlob, Path, Pos]),
> 	      ok = file:pwrite(FH, Pos, NewBlob),
> 	      file:close(FH),
> 	      OldBlob == NewBlob
>       end, ArgsList).
> 
> I'm using R13B01 on a 32-bit install of a Linux Fedora 11 distro,
> 2.6.27.25-170.2.72.fc10.i686 kernel.  I've seen this happen with SMP
> support enabled or disabled, with the +A worker pool enabled or
> disabled, with +K true or false.
> 
>   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>   fritchie 12106 77.8 45.8 1716800 1661460 pts/22 Sl+ 22:22   6:10 /tmp/R13B01-debug/lib/erlang/erts-5.7.2/bin/beam.smp -K true -- -root /tmp/R13B01-debug/lib/erlang -progname erl -- -home /home/fritchie -smp enable -sname foo_dev -boot foo -pz ... [many more -pz flags...]
> 
> (I had "ps u" running in a loop 4x per second.  The above was taken
> during the ERTS_HOLE_MARKER output below.  The reading 0.25 seconds
> earlier was VSZ of about 70MB.)
> 
> The above is a debug build, as best as I've been able to do.  (It isn't
> one-command easy, alas.)
> 
>   Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:true] [type-assertions] [debug-compiled] [lock-checking]
> 
> Sample output after the crash looks like this.  Some messages are
> lacking the "Num scribbles..." output because I'd added that one a bit
> later.
> 
>   Use Scribble <<"ff">> at "./zzz-hlog-qc/2/6/-000000000061.HLOG" Pos 10172
>   Pread "./zzz-hlog-qc/2/2/-000000000085.HLOG" at Pos 64 for 27 bytes
>   Use Scribble <<"RRRRRRRRRRRRRRRRRRRRRRRRRRR">> at "./zzz-hlog-qc/2/2/-000000000085.HLOG" Pos 64
>   Pread "./zzz-hlog-qc/2/6/-000000000061.HLOG" at Pos 1162 for 3 bytes
>   Use Scribble <<"¸¸¸">> at "./zzz-hlog-qc/2/6/-000000000061.HLOG" Pos 1162
>   Pread "./zzz-hlog-qc/000000000084.HLOG" at Pos 652 for 22 bytes
>   Use Scribble <<"NNNNNNNNNNNNNNNNNNNNNN">> at "./zzz-hlog-qc/000000000084.HLOG" Pos 652
>   Pread "./zzz-hlog-qc/000000000046.HLOG" at Pos 653 for 8 bytes
>   Use Scribble <<"ìììììììì">> at "./zzz-hlog-qc/000000000046.HLOG" Pos 653
>   make: *** [run-app1-interactive] Segmentation fault

Does this produce a core dump?

> 
>   .....Pread "./zzz-hlog-qc/000000000001.HLOG" at Pos 216 for 1 bytes
>   Use Scribble <<"Z">> at "./zzz-hlog-qc/000000000001.HLOG" Pos 216
>   Pread "./zzz-hlog-qc/000000000001.HLOG" at Pos 330 for 1 bytes
>   Use Scribble <<"Ö">> at "./zzz-hlog-qc/000000000001.HLOG" Pos 330
>   Pread "./zzz-hlog-qc/000000000001.HLOG" at Pos 83 for 1 bytes
>   Use Scribble <<"°">> at "./zzz-hlog-qc/000000000001.HLOG" Pos 83
>   Pread "./zzz-hlog-qc/000000000001.HLOG" at Pos 176 for 1 bytes
>   Use Scribble <<"?">> at "./zzz-hlog-qc/000000000001.HLOG" Pos 176
> 
>   Crash dump was written to: erl_crash.dump
>   eheap_alloc: Cannot allocate 2147483700 bytes of memory (of type "heap_frag").

That should have produced an erl_crash.dump.
It can be analyzed with the crashdump_viewer.

>   make: *** [run-app1-interactive] Aborted
> 
> ----
> 
>   Pread "./zzz-hlog-qc/000000000001.HLOG" at Pos 443 for 0 bytes
>   Use Scribble <<>> at "./zzz-hlog-qc/000000000001.HLOG" Pos 443
>   Pread "./zzz-hlog-qc/000000000065.HLOG" at Pos 2496 for 21 bytes
>   Use Scribble <<"ooooooooooooooooooooo">> at "./zzz-hlog-qc/000000000065.HLOG" Pos 2496
>   Pread "./zzz-hlog-qc/2/2/-000000000064.HLOG" at Pos 70 for 16 bytes
>   Use Scribble <<"´´´´´´´´´´´´´´´´">> at "./zzz-hlog-qc/2/2/-000000000064.HLOG" Pos 70
>   Pread "./zzz-hlog-qc/3/6/-000000000047.HLOG" at Pos 6226 for 4 bytes
>   Use Scribble <<"ºººº">> at "./zzz-hlog-qc/3/6/-000000000047.HLOG" Pos 6226
>   make: *** [run-app1-interactive] Segmentation fault
> 
> ----
> 
>   .
>   =GMT ERR REPORT==== 30-Jul-2009::21:05:45 ===
>   fold_a_file: seq -2 offset 0: {error,unknown_file_header,
>                                        {ok,<<"LogVersion1L">>}}
>   .Pread "./zzz-hlog-qc/3/3/-000000000002.HLOG" at Pos 115 for 1 bytes
>   Use Scribble <<"·">> at "./zzz-hlog-qc/3/3/-000000000002.HLOG" Pos 115
>   Pread "./zzz-hlog-qc/3/3/-000000000002.HLOG" at Pos 127 for 1 bytes
>   Use Scribble <<"0">> at "./zzz-hlog-qc/3/3/-000000000002.HLOG" Pos 127
> 
>   Crash dump was written to: erl_crash.dump
>   eheap_alloc: Cannot allocate 3087007804 bytes of memory (of type "heap_frag").
>   make: *** [run-app1-interactive] Aborted
> 
> ----
> 
>   ..nnNum scribbles = 2
>   Pread "./zzz-hlog-qc/000000000023.HLOG" at Pos 183 for 6 bytes
>   Use Scribble <<21,21,21,21,21,21>> at "./zzz-hlog-qc/000000000023.HLOG"
>   Pos 183
>   Pread "./zzz-hlog-qc/1/4/-000000000003.HLOG" at Pos 966 for 13 bytes
>   Use Scribble <<"xxxxxxxxxxxxx">> at
>   "./zzz-hlog-qc/1/4/-000000000003.HLOG" Pos 9
>   66
>   ErrList1  ErrList2 [{seq,23,err,badarg}] [{seq,-3,err,{badmatch,false}}]
>   .Num scribbles = 1
>   Pread "./zzz-hlog-qc/3/3/-000000000002.HLOG" at Pos 32 for 18 bytes
>   Use Scribble
>   <<140,140,140,140,140,140,140,140,140,140,140,140,140,140,140,140,
>                  140,140>> at "./zzz-hlog-qc/3/3/-000000000002.HLOG" Pos
>   	       32
>   beam/erl_debug.c, line 361: ERTS_HOLE_MARKER found at 0x531ca148
>   |            | Range: 0x4e391038 - 0xb29d74a4              |
>   | Address    | Contents                                    |
>   |------------|---------------------------------------------|
>   | 0x4e391038 | 0x000000c0 0x4e391049 0x4e391044 0x00000000 |
>   | 0x4e391048 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x4e391058 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x4e391068 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x4e391078 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x4e391088 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x4e391098 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x4e3910a8 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   [... 7 gazillions of nearly identical lines omitted...]
>   | 0x531ca0f8 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca108 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca118 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca128 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca138 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca148 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca158 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca168 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca178 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca188 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   | 0x531ca198 | 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc 0xaf5e78cc |
>   [... I didn't have enough disk space to record output to 0xb29d74a4...]
> 
> Attaching GDB to the process while QC is running doesn't work very
> well.  After "attach" and "continue", the VM freezes.  Pressing C-c and

That usually works.

> C-g (running GDB within Emacs in gdb-mode) can't interrupt GDB to get a
> prompt back.  I can only kill the GDB process externally.  :-(  So
> catching a SIGSEGV in-the-act hasn't been possible.  Is there some trick
> to doing that?  Or perhaps FC11's GDB is braindamaged somehow?  Or
> Linux?

If you just want to be in GDB at the segfault, run beam from GDB,
or analyze the core dump afterwards.

To run beam from GDB (commands after $ prompts and (gdb) prompts below),
for bash:

$ erl -emu_args
Executing: /usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched/erts-5.7.2/bin/beam.smp /usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched/erts-5.7.2/bin/beam.smp -- -root /usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched -progname erl -- -home /home/raimo

Erlang R13B01 (erts-5.7.2) [64-bit] [smp:4:4] [rq:4] [async-threads:0] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> q().
ok
2>
$ cat `type -p erl`
:
ROOTDIR=/usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched
BINDIR=$ROOTDIR/erts-5.7.2/bin
:

$ export BINDIR=/usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched/erts-5.7.2/bin
$ PATH=$BINDIR:$PATH gdb /usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched/erts-5.7.2/bin/beam.smp
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) set args -- -root /usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched -progname erl -- -home /home/raimo
(gdb) run
Starting program: /usr/local/otp_product/releases/otp_beam_linux_sles10_x64_r13b_patched/erts-5.7.2/bin/beam.smp -- -root /usr/local/otp/releases/otp_beam_linux_sles10_x64_r13b_patched -progname erl -- -home /home/raimo
[Thread debugging using libthread_db enabled]
[New Thread 47822751210208 (LWP 28205)]
[New Thread 1082132800 (LWP 28209)]
[New Thread 1090525504 (LWP 28210)]
[New Thread 1090550080 (LWP 28211)]
[New Thread 1098942784 (LWP 28212)]
[New Thread 1107335488 (LWP 28213)]
[New Thread 1115728192 (LWP 28214)]
[New Thread 1124120896 (LWP 28215)]
Erlang R13B01 (erts-5.7.2) [64-bit] [smp:4:4] [rq:4] [async-threads:0] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1>






> 
>   % gdb --version
>   GNU gdb Fedora (6.8-32.fc10)
>   [...]
>   This GDB was configured as "i386-redhat-linux-gnu".
> 
> However, while all the ERTS_HOLE_MARKER stuff was dumping, I attached
> GDB and got this info on all threads.
> 
>   (gdb) thread apply all info stack
>   
>   Thread 6 (Thread 0xb7d27b90 (LWP 12113)):
>   #0  0xb8028416 in __kernel_vsyscall ()
>   #1  0x0054f19b in read () from /lib/libpthread.so.0
>   #2  0x081cb5ec in signal_dispatcher_thread_func (unused=0x0)
>       at sys/unix/sys.c:2913
>   #3  0x0823c0b6 in thr_wrapper (vtwd=0xbfaae2bc) at common/ethread.c:475
>   #4  0x0054851f in start_thread () from /lib/libpthread.so.0
>   #5  0x0047e04e in clone () from /lib/libc.so.6
>   
>   Thread 5 (Thread 0xb7326b90 (LWP 12114)):
>   #0  0xb8028416 in __kernel_vsyscall ()
>   #1  0x0054c105 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>   #2  0x0823cf29 in ethr_cond_wait (cnd=0x82affe0, mtx=0x82aff40)
>       at common/ethread.c:1018
>   #3  0x080c170e in erts_cnd_wait (cnd=0x82affe0, mtx=0x82aff40)
>       at beam/erl_threads.h:634
>   #4  0x080c26f8 in erts_smp_cnd_wait (cnd=0x82affe0, mtx=0x82aff40)
>       at beam/erl_smp.h:417
>   #5  0x080bf963 in sys_msg_dispatcher_func (unused=0x0)
>       at beam/erl_trace.c:3078
>   #6  0x0823c0b6 in thr_wrapper (vtwd=0xbfaae32c) at common/ethread.c:475
>   #7  0x0054851f in start_thread () from /lib/libpthread.so.0
>   #8  0x0047e04e in clone () from /lib/libc.so.6
>   
>   Thread 4 (Thread 0xb8026b90 (LWP 12115)):
>   #0  0xb8028416 in __kernel_vsyscall ()
>   #1  0x0054c105 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>   #2  0x0823cf29 in ethr_cond_wait (cnd=0x82d8c00, mtx=0x82d8bc0)
>       at common/ethread.c:1018
>   #3  0x080c170e in erts_cnd_wait (cnd=0x82d8c00, mtx=0x82d8bc0)
>       at beam/erl_threads.h:634
>   #4  0x081cb4a9 in child_waiter (unused=0x0) at sys/unix/sys.c:2842
>   #5  0x0823c0b6 in thr_wrapper (vtwd=0xbfaae2cc) at common/ethread.c:475
>   #6  0x0054851f in start_thread () from /lib/libpthread.so.0
>   #7  0x0047e04e in clone () from /lib/libc.so.6
>   
>   Thread 3 (Thread 0xb6925b90 (LWP 12116)):
>   #0  0xb8028416 in __kernel_vsyscall ()
>   #1  0x0046db3b in write () from /lib/libc.so.6
>   #2  0x00404a7c in _IO_new_file_write () from /lib/libc.so.6
>   #3  0x00405bf7 in _IO_new_do_write () from /lib/libc.so.6
>   #4  0x004055a5 in _IO_new_file_overflow () from /lib/libc.so.6
>   #5  0x00408423 in __overflow () from /lib/libc.so.6
>   #6  0x082410f7 in write_f_add_cr (vfp=0x50a4c0, buf=0x8250ffe "|\n", len=2)
>       at common/erl_printf.c:108
>   #7  0x08240c97 in erts_printf_format (fn=0x8241070 <write_f_add_cr>, 
>       arg=0x50a4c0, fmt=0x8250ffe "|\n", ap=0xb6924174 "\b")
>       at common/erl_printf_format.c:832
>   #8  0x08241576 in erts_printf (format=0x8250ffe "|\n")
>       at common/erl_printf.c:218
>   #9  0x080ebf5e in print_untagged_memory (pos=0x641e0338, end=0xb29d74a8)
>       at beam/erl_debug.c:518
>   #10 0x080eba1c in check_memory (start=0x4e391038, end=0xb29d74a8)
>       at beam/erl_debug.c:362
>   #11 0x080eb987 in erts_check_for_holes (p=0xb4e32d94) at beam/erl_debug.c:347
>   #12 0x08199ed4 in process_main () at beam/beam_emu.c:2049
>   #13 0x080f6f67 in sched_thread_func (vesdp=0xb7e0bc00)
>       at beam/erl_process.c:3015
>   #14 0x0823c0b6 in thr_wrapper (vtwd=0xbfaae38c) at common/ethread.c:475
>   #15 0x0054851f in start_thread () from /lib/libpthread.so.0
>   #16 0x0047e04e in clone () from /lib/libc.so.6
>   
>   Thread 2 (Thread 0xb5f24b90 (LWP 12117)):
>   #0  0xb8028416 in __kernel_vsyscall ()
>   #1  0x0047e836 in epoll_wait () from /lib/libc.so.6
>   #2  0x081d07e5 in check_fd_events (ps=0xb7daa400, tv=0xb5f2291c, max_res=256, 
>       ps_locked=0xb5f228d4) at sys/common/erl_poll.c:1906
>   #3  0x081d093b in erts_poll_wait_kp (ps=0xb7daa400, pr=0xb5f22928, 
>       len=0xb5f22924, utvp=0xb5f2291c) at sys/common/erl_poll.c:2042
>   #4  0x081d2f18 in erts_check_io_kp (do_wait=1)
>       at sys/common/erl_check_io.c:1156
>   #5  0x081cb4e1 in erl_sys_schedule (runnable=0) at sys/unix/sys.c:2860
>   #6  0x080f0f3f in sched_sys_wait (no=2, rq=0xb7e0b758)
>       at beam/erl_process.c:774
>   #7  0x080fdcb9 in schedule (p=0xb4f4bd1c, calls=15) at beam/erl_process.c:6023
>   #8  0x081887eb in process_main () at beam/beam_emu.c:1126
>   #9  0x080f6f67 in sched_thread_func (vesdp=0xb7e1dcd0)
>       at beam/erl_process.c:3015
>   #10 0x0823c0b6 in thr_wrapper (vtwd=0xbfaae38c) at common/ethread.c:475
>   #11 0x0054851f in start_thread () from /lib/libpthread.so.0
>   #12 0x0047e04e in clone () from /lib/libc.so.6
>   
>   Thread 1 (Thread 0xb800e6c0 (LWP 12106)):
>   #0  0xb8028416 in __kernel_vsyscall ()
>   #1  0x004763d1 in select () from /lib/libc.so.6
>   #2  0x081cb7d1 in erts_sys_main_thread () at sys/unix/sys.c:3021
>   #3  0x080918b1 in erl_start (argc=62, argv=0xbfaae684) at beam/erl_init.c:1231
>   #4  0x08072e66 in main (argc=Cannot access memory at address 0x0
>   ) at sys/unix/erl_main.c:29
>   (gdb) 
> 
> The ERTS_HOLE_MARKER thing happens much less frequently than the simple
> SIGSEGV or out-of-memory thing, perhaps once every 20 minutes or so of
> constant trying.  I've got a copy of the core image via "gcore", if it'd
> help.
> 
> I'm at a bit of a loss of what other info I can provide, so please let
> me know if I can send something else. 
> 
> -Scott
> 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


More information about the erlang-bugs mailing list