file: module and character special files

Sat Jan 31 21:25:24 CET 2004

>>>>> "pn" == Patrik Nyblom <pan@REDACTED> writes:

pn> Richard A. O'Keefe wrote:

>>  Ad (c): it has NOT been true for a long time that "operations
>> won't block" on regular files.  Not since RFS and NFS came into
>> use.  The file system I see on my main machine is spread across
>> three different nodes; the machine I'm posting from brings in
>> another.  I have known read() operations on what *looked* like
>> "regular" files take 30 seconds.  (On one memorable occasion,
>> several hours.)

I feel compelled to add to this discussion.  :-)

For UNIX & Linux systems, both read(2) and write(2) system calls
*may* block when operating on file descriptors for *regular* files on
local file systems.  Furthermore, the OS will *never* tell you that an
operation will block -- the OS assumes that local disks are "fast
enough", so requests for non-blocking I/O for those file descriptors
will be ignored.
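
A minimal sketch in C of what that looks like from user space.  The
function name polls_readable_now() is made up for illustration.  It
opens a regular file with O_NONBLOCK and asks poll() with a zero
timeout whether the descriptor is readable.  On the systems I know of,
the answer for a regular file is always "yes", whether or not the data
is in the buffer cache, so the answer tells you nothing about whether
read() will actually wait for the disk.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if poll() with a zero timeout reports
 * the file at `path` as readable, 0 if it does not, -1 on error. */
int polls_readable_now(const char *path)
{
    assert(path != NULL);

    /* O_NONBLOCK is accepted here, but for a regular file it does not
     * make a later read() return EAGAIN instead of waiting for the disk. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);   /* timeout 0: do not wait at all */
    close(fd);

    if (n < 0)
        return -1;
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Calling polls_readable_now() on any existing regular file should
return 1, even for a file on a slow or remote file system.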

The POSIX AIO (asynchronous I/O) API can get you truly asynchronous
read & write behavior, but there is no corresponding non-blocking API
for file open, rename, and unlink ... so AIO only works well if you
open all of your files and keep them open for a very long time.
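
A rough sketch of the POSIX AIO calls involved, not a recommendation
for how to structure a server.  The function name aio_read_prefix() is
an assumption of mine.  Note that the open() at the top is still an
ordinary, possibly blocking call, which is exactly the limitation
described above.  Some systems (older glibc, for one) need -lrt at
link time.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `len` bytes from the start of `path`
 * using POSIX AIO.  Returns the byte count, or -1 with errno set. */
ssize_t aio_read_prefix(const char *path, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    /* There is no AIO counterpart for open(): this call can still block. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* we check status ourselves */

    if (aio_read(&cb) != 0) {       /* only queues the request */
        close(fd);
        return -1;
    }

    /* aio_read() has returned while the read may still be in flight.
     * A server would go back to its event loop here and check
     * aio_error() later; this sketch simply waits for completion. */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);

    int err = aio_error(&cb);
    ssize_t n = aio_return(&cb);    /* must be called once per request */
    close(fd);

    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```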

The two most vivid examples that come immediately to mind for working
around this problem (portably!) are:

1. The Squid HTTP proxy has long used external helper processes to
   perform unlink(2) system calls to avoid blocking the central
   select()/poll()-based process.

2. Flash is an HTTP server that uses mincore(2) to figure out if
   all of the pages of a file are in core.  If not, it sends a request
   to an external helper process to fault the pages into RAM.  When
   the helper replies to the main process, the HTTP server process can
   touch those pages without (much) worry that it will be blocked by
   page fault disk I/O.  (Look for "Flash" at
   http://www.usenix.org/publications/library/proceedings/usenix99/technical.html)

>> Ad (a): while there may be UNIX systems whose implementors have
>> disgraced themselves by failing to make poll() or select() work on
>> files, [...]

Unfortunately, it isn't a disgrace.  It is caused by Proper
Implementation of Specification.  I no longer recall the exact
standard that says so, alas, but a Web search should be able to find
it.

In general, blocking means that you have no idea when data will become
available.  For the case of local disk, you usually know: unless
there's a disk failure, you'll get your data pretty soon.

According to the definition of O_NONBLOCK in
http://www.opengroup.org/onlinepubs/007904975/functions/open.html:

    O_NONBLOCK
        When opening a FIFO with O_RDONLY or O_WRONLY set: 

        [...] 

        When opening a block special or character special file that
        supports non-blocking opens:

        [...] 

        Otherwise, the behavior of O_NONBLOCK is unspecified.

Sorry I can't be more helpful in citing the exact standards doc &
section.  (Not enough coffee this afternoon?)

pn> Tony Rogvall (together with others) has made a remarkable *real,
pn> working, and portable* implementation of a threaded file driver,
pn> submitted it to us, and that is the one nowdays present in the
pn> emulator.

On platforms with kernel-schedulable threads, it does indeed work
well.  If I recall correctly, you need to use "+Ax" on the "erl"
command line (where "x" is the number of threads in the VM's thread
pool) to enable it.

FreeBSD 4.x and earlier do not have kernel-schedulable threads, so
Tony's driver doesn't help there.  With threads scheduled purely in
user space, one thread blocked in a system call stalls the whole
process, which is why portable software like Squid & Flash uses
external OS processes as helpers rather than threads.

-Scott