[erlang-questions] Request for enhancement: Sparse files

Richard O'Keefe ok@REDACTED
Wed Jun 10 05:21:12 CEST 2009


In "UNIX" systems (BSD, System V, Solaris, Linux, MacOS), blocks in
a file that were never allocated ("holes") read back as all zero
bytes.  So if you want to copy a file without storing unnecessary
zero bytes, you have to check for all-zero blocks (and it does not
matter whether an all-zero block was real or faked).  That check can
be done in application logic IF you know what the block size of the
file that you are *writing* actually is.
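
As a rough sketch of that check (an illustration, not something from
the original message), the following Python copies a file and seeks
over all-zero blocks instead of writing them.  The name sparse_copy,
the 512-byte fallback, and the choice of the destination's st_blksize
as the comparison unit are all assumptions; whether a seek really
leaves a hole depends on the file system.

```python
import os

def sparse_copy(src_path, dst_path, block_size=None):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if block_size is None:
            # Assumption: use the preferred I/O size of the file being
            # *written*; st_blksize is not available on every platform,
            # so fall back to 512 bytes.
            st = os.fstat(dst.fileno())
            block_size = getattr(st, "st_blksize", 512) or 512
        zeros = bytes(block_size)
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zeros[:len(block)]:
                # All zero: move the write position forward, writing nothing.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
        # A trailing seek does not extend the file by itself, so set the
        # final length explicitly.
        dst.truncate(dst.tell())
```

Passing block_size=512 explicitly forces a finer check regardless of
what st_blksize reports.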

The Single Unix Specification is quite explicit:

blksize_t st_blksize
     A file system-specific preferred I/O block size for this object.
     In some file system types, this may vary from file to file.
blkcnt_t st_blocks
     Number of blocks allocated for this object.

Reminder: st_blocks says how many blocks were allocated for the
_original_ file.  The copy might be on another file system or for
some other reason have a different block size from the original.
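
For illustration only, here is one way to look at these two fields
from Python.  The function name allocation_info and the dictionary
keys are made up for this sketch.  It assumes st_blocks counts
512-byte units, which is what Linux and the BSDs do, although the
standard itself leaves the unit undefined; the "looks_sparse" flag is
only a heuristic, since compression or inline storage of small files
can also make the allocated size smaller than the logical size.

```python
import os

def allocation_info(path):
    """Report logical size versus allocated size for one file."""
    st = os.stat(path)
    # Both fields are Unix-specific, so read them defensively.
    blksize = getattr(st, "st_blksize", None)
    blocks = getattr(st, "st_blocks", None)
    # Assumption: st_blocks is in 512-byte units.
    allocated = blocks * 512 if blocks is not None else None
    return {
        "size": st.st_size,
        "blksize": blksize,
        "blocks": blocks,
        "allocated_bytes": allocated,
        "looks_sparse": allocated is not None and allocated < st.st_size,
    }
```

Running this on the original and on the copy shows whether the two
ended up with different preferred block sizes or allocations.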

You could see this as a crude form of data deduplication.

Actually, st_blksize is the recommended size of an I/O *transfer*,
not necessarily the allocation unit on disc.  Considering that the
hardware block size on IDE discs is defined by the interface to be
512 bytes, it would probably be sufficient for a program to check
512 bytes at a time.
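
A minimal sketch of checking 512 bytes at a time, again in Python and
again only an illustration: zero_runs and its unit parameter are
hypothetical names, and the function works on an in-memory buffer
rather than on a file.

```python
def zero_runs(data, unit=512):
    """Return (offset, length) pairs for maximal runs of all-zero units."""
    runs = []
    start = None
    for off in range(0, len(data), unit):
        chunk = data[off:off + unit]
        if not any(chunk):
            # All bytes in this unit are zero: open a run if none is open.
            if start is None:
                start = off
        elif start is not None:
            # A non-zero unit closes the current run.
            runs.append((start, off - start))
            start = None
    if start is not None:
        runs.append((start, len(data) - start))
    return runs

# One non-zero sector, two zero sectors, one non-zero sector.
print(zero_runs(b"a" * 512 + bytes(1024) + b"b" * 512))  # [(512, 1024)]
```

Each reported run is a candidate for being skipped with a seek rather
than written out.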


