[erlang-questions] Mysterious gen_server timeouts in MIX

John R. Ashmun john.ashmun@REDACTED
Sat Nov 2 05:21:29 CET 2013


I have implemented an emulator for MIX, Donald Knuth's notional 1960s
computer, as an Erlang application, as a way of learning to use Erlang.

Currently, as part of the emulator's startup, MIX loads Knuth's Program P
into its memory, then jumps to the starting location of Program P.  Program
P calculates the first 500 prime numbers, stores them in a table, and then
prints them 10 at a time on what it considers to be its line printer.

The emulator usually runs fine, and when it does, it carries out Program P
perfectly.

Sometimes, however, things get stuck, and I have been unable to learn what
goes wrong.  When there is a problem, I usually see these two error reports:

=ERROR REPORT==== 1-Nov-2013::11:33:54===
** Generic server <0.38.0> terminating
** Last message in was timeout
** When server state == []
** Reason for termination ==
** {timeout,{gen_server,call,[io_controller,{wait_until_not_busy,18}]}}

=ERROR REPORT==== 1-Nov-2013::11:33:54===
** Generic server <0.4095.0> terminating
** Last message in was {'$gen_cast',{write,1995}}
** When Server state == {state,18,<0.4058.0>,24}
** Reason for termination ==
** {timeout,{gen_server,call,[io_controller,{set_ready,18}]}}

The first one seems to tell me clearly that the wait_until_not_busy( ) API
function of MIX's io_controller gen_server sent the message that it should
have, but handle_call( ) never found the readiness status of MIX device 18,
its line printer, to be anything other than busy, and the gen_server making
the call then timed out.  This should never happen, but then there is the
second report to consider.
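
For readers less familiar with how such a report arises, here is a minimal
sketch of a gen_server:call that never gets a reply.  The module name
call_timeout_demo, the registered name demo_controller and the 100 ms
timeout are all hypothetical and are not taken from the emulator; only the
request tuple mirrors the report above.

```erlang
-module(call_timeout_demo).
-behaviour(gen_server).
-export([start/0, demo/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical stand-in for io_controller; NOT the emulator's code.
start() ->
    gen_server:start({local, demo_controller}, ?MODULE, [], []).

init([]) ->
    {ok, []}.

%% Never replies, so every caller runs into its call timeout.
handle_call({wait_until_not_busy, _Device}, _From, State) ->
    {noreply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Makes one call with a short timeout and returns the exit reason
%% that the caller sees.
demo() ->
    {ok, Pid} = start(),
    Result = try
                 gen_server:call(demo_controller,
                                 {wait_until_not_busy, 18}, 100)
             catch
                 exit:Reason -> Reason
             end,
    exit(Pid, kill),
    Result.
```

The reason returned by demo() should have the same
{timeout,{gen_server,call,[...]}} shape as in the reports, with the timeout
value as a third list element because gen_server:call/3 was used.  The
reports above list only the server name and the request, which is the shape
produced by gen_server:call/2 and its default timeout of 5000 ms.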

The second report confuses me.  Program P attempts to write a title (FIRST
FIVE HUNDRED PRIMES) to MIX's line printer.  This is what the $gen_cast is
attempting to do (1995 is the MIX address of the first character of the
title).  A MIX I/O operation (in this case, the write) begins by waiting
until the addressed I/O unit is not busy.  Apparently the unit stays busy
for the whole timeout period on this run.

Here is the mystery.  The emulator's I/O device gen_servers are all
initialized with their readiness state set to ready.  Their readiness is
changed to busy only by a request to perform an I/O operation, and the
operation is actually carried out by an io_operation gen_server that is
started by the I/O device gen_server.  That io_operation gen_server is the
entity that asks the io_controller to set the device's readiness back to
ready once it has performed the I/O (here, output to the file that plays
the role of the line printer).  It shouldn't be possible for
io_controller:set_ready( Device ) to be called before
io_controller:write( Device, Address ).  Is this what the error report is
telling me happened?
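
The emulator's source is not included in this message, so the following is
only a sketch of one way the readiness protocol described above could be
arranged so that a waiting caller does not keep the controller from
handling set_ready.  The module name io_controller_sketch, the set_busy/1
function and the map-based state are assumptions made for illustration; the
request names wait_until_not_busy and set_ready come from the reports.

```erlang
-module(io_controller_sketch).
-behaviour(gen_server).
-export([start_link/0, set_busy/1, set_ready/1, wait_until_not_busy/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical sketch, NOT the emulator's io_controller.
%% State: #{Device => {ready | busy, [WaitingCaller]}}.

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

set_busy(Device) ->
    gen_server:call(?MODULE, {set_busy, Device}).

set_ready(Device) ->
    gen_server:call(?MODULE, {set_ready, Device}).

wait_until_not_busy(Device) ->
    gen_server:call(?MODULE, {wait_until_not_busy, Device}).

init([]) ->
    {ok, #{}}.

handle_call({set_busy, Device}, _From, State) ->
    {_, Waiters} = maps:get(Device, State, {ready, []}),
    {reply, ok, State#{Device => {busy, Waiters}}};
handle_call({set_ready, Device}, _From, State) ->
    {_, Waiters} = maps:get(Device, State, {ready, []}),
    %% Answer every caller that was parked while the device was busy.
    lists:foreach(fun(W) -> gen_server:reply(W, ok) end, Waiters),
    {reply, ok, State#{Device => {ready, []}}};
handle_call({wait_until_not_busy, Device}, From, State) ->
    case maps:get(Device, State, {ready, []}) of
        {ready, _} ->
            {reply, ok, State};
        {busy, Waiters} ->
            %% Park the caller and return to the server loop, so that
            %% a later set_ready request can still be handled.
            {noreply, State#{Device => {busy, [From | Waiters]}}}
    end.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

A gen_server handles one request at a time, so in this arrangement a
wait_until_not_busy request for a busy device is answered later through
gen_server:reply/2 rather than by waiting inside handle_call.  Whether the
real io_controller waits inside handle_call is not something the reports
alone can show.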

All right, you say, you've programmed your emulator incompetently, and I
suppose I may have.  Why, then, does all this work perfectly, over and over
again, on another machine, or even on the same machine, before going into
failure mode?  (I have been unable to identify anything that causes things
to start failing or to start working, and I'm not making changes other than
sometimes running using application:start( 'MIX' ) and sometimes booting
MIX as an Erlang release -- sometimes compiling with +debug_info (or
without it) has seemed to change success into failure (or, equally likely,
the opposite).)

I need advice, please.

Regards,
John Ashmun