[erlang-questions] io:format/3 port interaction?

Paul Mineiro <>
Sun Jan 13 09:20:20 CET 2008


hi.

i have a situation where a port i'm writing is hanging, but if i add
a call to io:format on the erlang side which executes before i want
to get my first Mod:handle_info call, everything works.  it has me
pretty flummoxed, so any help would be greatly appreciated.

the straces look pretty similar in the working and non-working case (see
below) which hopefully lends credence to my claims.  also
erlang is talking to the port program over a pipe, so the VM is isolated.

this is complicated and i haven't given alot of information here,
plus i've made some bold claims (the only thing different is one
io:format call; if true it would suggest a bug in the erlang VM).
however fortunately this is an open source project so i can fully expose
the details to interested parties.

thanks in advance for any insight,

-- p

some more details.  straces are with -etrace=file,desc,process

==============
hanging strace excerpt (port driver)
==============
...
1200208485.432326 read(3, "\0\6\0\0\0\0\0\0\0debug\0\35\0\0\0\0\0\0\0mount.tmp"..., 58) = 58
1200208485.435356 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ae03fe92100) = 21801
1200208485.436781 close(5)              = 0
1200208485.447665 close(6) = 0
1200208485.448181 wait4(21801, NULL, 0, NULL) = 21801
1200208485.448683 write(4, "\0\0\0\v", 4) = 4
1200208485.448906 write(4, "\1\2\0\0\0\0\0\0\0ok", 11) = 11
1200208485.449139 write(2, "fuserl_ll_output complete\n", 26) = 26
1200208485.451509 poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN, revents=POLLIN}], 2, -1) = 1
1200208485.451720 write(2, "fuserl_ll_ready_input\n", 22) = 22
1200208485.452589 read(5, "8\0\0\0\32\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 135168) = 56
1200208485.452830 fstat(1, {st_mode=S_IFREG|0644, st_size=24, ...}) = 0
1200208485.453123 write(1, "unique: 1, opcode: INIT (26), no"..., 52) = 52
1200208485.453281 write(1, "INIT: 7.8\nflags=0x00000003\nmax_r"..., 52) = 52
1200208485.453411 write(1, "   INIT: 7.8\n   flags=0x00000001"..., 85) = 85
1200208485.453744 write(1, "   unique: 1, error: 0 (Success)"..., 46) = 46
1200208485.453845 writev(5, [{"(\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 16}, {"\7\0\0\0\10\0\0\0\0\0\2\0\1\0\0\0\0\0\0\0\0\0\2\0", 24}], 2) = 40
1200208485.453971 write(2, "fuserl_ll_ready_input complete\n", 31) = 31
1200208485.454627 poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN, revents=POLLIN}], 2, -1) = 1
1200208485.561811 write(2, "fuserl_ll_ready_input\n", 22) = 22
1200208485.565286 read(5, "(\0\0\0\3\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 135168) = 40
1200208485.565433 write(1, "unique: 2, opcode: GETATTR (3), "..., 54) = 54
1200208485.565680 write(2, "fuserl_ll_op_getattr start 0x607"..., 36) = 36
1200208485.568906 write(4, "\0\0\0\22", 4) = 4
1200208485.569171 write(4, "\0\2pr`\0\0\0\0\0\1\0\0\0\0\0\0\0", 18) = 18
1200208485.569412 write(2, "fuserl_ll_op_getattr complete 0x"..., 39) = 39
1200208485.573359 write(2, "fuserl_ll_ready_input complete\n", 31) = 31
1200208485.576698 poll(
...
==============

the port program is talking 4-byte length prefixed to erlang.  after the
initial handshake (writev on 10 of 62 bytes and read on 7 of 15 bytes)
the port program sends a 22 byte message to erlang (unsolicited;
in this situation the port program mostly acts as a client and the
erlang VM a server, rather than vice versa; the initial handshake
is one exception to this, the other is the shutdown command).

in the hanging case, this is never responded to.

==============
hanging strace excerpt (erlang VM).
==============
...
1200208480.854839 writev(10, [{"\0\0\0:", 4}, {"\0\6\0\0\0\0\0\0\0debug\0\35\0\0\0\0\0\0\0mount.tmp"..., 58}], 2) = 62
1200208480.855277 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM, revents=POLLIN|POLLRDNORM}], 5, 4984) = 1
1200208485.449091 read(7, "\0\0\0\v\1\2\0\0\0\0\0\0\0ok", 65536) = 15
1200208485.452534 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 99) = 0
1200208485.561746 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM, revents=POLLIN|POLLRDNORM}], 5, 278) = 1
1200208485.569350 read(7, "\0\0\0\22\0\2pr`\0\0\0\0\0\1\0\0\0\0\0\0\0", 65536) = 22
1200208485.576366 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 266) = 0
1200208485.845472 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 6044) = 0
1200208491.889132 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 0) = 0
1200208491.889408 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 0) = 0
1200208491.889703 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 0) = 0
1200208491.892669 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 0) = 0
1200208491.893488 poll(
==============

so to investigate i added an io:format call in the Mod:init function of
the gen_server owning the port.  this resulted in everything magically
working.  i then moved the io:format call out of my code and into the
test generator so that it would be called before my code was even loaded
(otherwise, the straces below would be interlaced with concurrent reads
on file descriptor 3 corresponding to the loading of io_lib_pretty.beam
for the first time from the async thread pool and it notifying the main
loop; at first i thought this was unsticking things but when i moved the
io:format way early the straces became more identical and the unsticking
still happened).

==============
working strace excerpt (erlang VM)
==============
...
1200209491.824843 writev(10, [{"\0\0\0:", 4}, {"\0\6\0\0\0\0\0\0\0debug\0\35\0\0\0\0\0\0\0mount.tmp"..., 58}], 2) = 62
1200209491.825303 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM, revents=POLLIN|POLLRDNORM}], 5, 4981) = 1
1200209496.332997 read(7, "\0\0\0\v\1\2\0\0\0\0\0\0\0ok", 65536) = 15
1200209496.334975 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM}], 5, 100) = 0
1200209496.438748 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM, revents=POLLIN|POLLRDNORM}], 5, 368) = 1
1200209496.441124 read(7, "\0\0\0\22\0\2pr`\0\0\0\0\0\1\0\0\0\0\0\0\0", 65536) = 22
1200209496.442634 writev(10, [{"\0\0\0{", 4}, {"\1pr`\0\0\0\0\0\4\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\355"..., 123}], 2) = 127
1200209496.443743 poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=6, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}, {fd=0, events=POLLIN|POLLRDNORM}, {fd=7, events=POLLIN|POLLRDNORM, revents=POLLIN|POLLRDNORM}], 5, 363) = 1
...
==============

==============
working strace excerpt (port driver)
==============
...
1200209496.332515 write(4, "\0\0\0\v", 4) = 4
1200209496.332760 write(4, "\1\2\0\0\0\0\0\0\0ok", 11) = 11
1200209496.333045 write(2, "fuserl_ll_output complete\n", 26) = 26
1200209496.334067 poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN, revents=POLLIN}], 2, -1) = 1
1200209496.334273 write(2, "fuserl_ll_ready_input\n", 22) = 22
1200209496.335027 read(5, "8\0\0\0\32\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 135168) = 56
1200209496.335177 fstat(1, {st_mode=S_IFREG|0644, st_size=42, ...}) = 0
1200209496.335463 write(1, "unique: 1, opcode: INIT (26), no"..., 52) = 52
1200209496.335618 write(1, "INIT: 7.8\nflags=0x00000003\nmax_r"..., 52) = 52
1200209496.335718 write(1, "   INIT: 7.8\n   flags=0x00000001"..., 85) = 85
1200209496.336155 write(1, "   unique: 1, error: 0 (Success)"..., 46) = 46
1200209496.336257 writev(5, [{"(\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 16}, {"\7\0\0\0\10\0\0\0\0\0\2\0\1\0\0\0\0\0\0\0\0\0\2\0", 24}], 2) = 40
1200209496.336380 write(2, "fuserl_ll_ready_input complete\n", 31) = 31
1200209496.336943 poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN, revents=POLLIN}], 2, -1) = 1
1200209496.438812 write(2, "fuserl_ll_ready_input\n", 22) = 22
1200209496.439720 read(5, "(\0\0\0\3\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 135168) = 40
1200209496.439864 write(1, "unique: 2, opcode: GETATTR (3), "..., 54) = 54
1200209496.440096 write(2, "fuserl_ll_op_getattr start 0x607"..., 36) = 36
1200209496.440679 write(4, "\0\0\0\22", 4) = 4
1200209496.440945 write(4, "\0\2pr`\0\0\0\0\0\1\0\0\0\0\0\0\0", 18) = 18
1200209496.441185 write(2, "fuserl_ll_op_getattr complete 0x"..., 39) = 39
1200209496.442722 write(2, "fuserl_ll_ready_input complete\n", 31) = 31
1200209496.443570 poll([{fd=3, events=POLLIN, revents=POLLIN}, {fd=5, events=POLLIN}], 2, -1) = 1
1200209496.443806 read(3, "\0\0\0{", 4) = 4
1200209496.443927 read(3, "\1pr`\0\0\0\0\0\4\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\355"..., 123) = 123
1200209496.444091 write(2, "fuserl_ll_output\n", 17) = 17
1200209496.444706 write(1, "   unique: 2, error: 0 (Success)"..., 47) = 47
...
==============

well alright these straces aren't really shedding light on why things are
working, although the messages being exchanged do look identical in both
cases.

complicating things:
  * invoking the test from the shell causes it to pass with or without the added io:format.
  * turning on [ { debug, [ trace ] } ] to the gen_server to see if Mod:handle_info/2 is being called doesn't seem to generate any trace messages when
invoked with --noshell --eval '...'

Optimism is an essential ingredient of innovation. How else can the
individual favor change over security?

  -- Robert Noyce



More information about the erlang-questions mailing list