Request for enhancement: Sparse files
Mon Jun 8 17:00:18 CEST 2009
I've written a directory scanner in Erlang that calculates disk usage per
Unix user and group. It works well, except that the results are wrong
when it encounters sparse files (files with holes in them):
http://en.wikipedia.org/wiki/Sparse_file. Some of our users have files
that appear to be 1-2 terabytes in size, but in reality occupy less than
a gigabyte on disk.
The C stat struct has two fields I'd need (shown by "stat" command in Unix):
blksize_t st_blksize; /* optimal I/O size */
blkcnt_t st_blocks; /* allocated 512-byte blocks */
If st_size > st_blocks * 512, the file is sparse. Unfortunately,
read_file_info/read_link_info don't provide the block information.
Any chance this could be included in some future Erlang release?
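
To make the check above concrete, here is a minimal sketch in C of the
same comparison. The function names (allocated_bytes, looks_sparse,
path_looks_sparse) are made up for illustration, and the sketch assumes
st_blocks is counted in 512-byte units, as it is on Linux and most Unix
systems. Some filesystems store very small files inline and report zero
blocks for them, so a real scanner may want to treat small files
separately.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Bytes actually allocated on disk, assuming st_blocks counts
   512-byte units (an assumption; true on Linux and most Unix). */
int64_t allocated_bytes(const struct stat *st) {
    return (int64_t)st->st_blocks * 512;
}

/* 1 if the apparent size exceeds the allocated size, else 0. */
int looks_sparse(const struct stat *st) {
    return (int64_t)st->st_size > allocated_bytes(st);
}

/* lstat() a path without following symlinks.
   Returns 1 if it looks sparse, 0 if not, -1 if lstat() fails. */
int path_looks_sparse(const char *path) {
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    return looks_sparse(&st);
}
```

For per-user accounting, summing allocated_bytes() instead of st_size
would give the space really consumed on disk.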
Also, any chance file:copy could support copying sparse files? :-) I
tested it, and the target file becomes non-sparse.
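
For reference, one common way for a copy routine to keep holes is to
seek over all-zero chunks in the destination instead of writing them.
The following is only a sketch in C of that general technique, not a
description of how file:copy works or should work. The names
copy_sparse and CHUNK are hypothetical, and the sketch assumes the
destination descriptor is a freshly created, empty file positioned at
offset 0.

```c
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096  /* hypothetical chunk size for this sketch */

/* Return 1 if the buffer holds only zero bytes. */
static int all_zero(const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Copy in_fd to out_fd, seeking over zero-filled chunks rather than
   writing them. Assumes out_fd is empty and at offset 0.
   Returns 0 on success, -1 on error. */
int copy_sparse(int in_fd, int out_fd) {
    char buf[CHUNK];
    off_t total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        if (all_zero(buf, (size_t)n)) {
            /* Skip ahead; the filesystem may leave a hole here. */
            if (lseek(out_fd, n, SEEK_CUR) == (off_t)-1)
                return -1;
        } else {
            ssize_t done = 0;
            while (done < n) {
                ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
                if (w < 0)
                    return -1;
                done += w;
            }
        }
        total += n;
    }
    if (n < 0)
        return -1;
    /* A trailing hole only takes effect once the length is set. */
    return ftruncate(out_fd, total);
}
```

Whether the skipped ranges really end up unallocated depends on the
target filesystem; the file contents are the same either way.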
Erlang has been a great help in our environment, where simple rsync has
become too slow...
P.S. I tried to add blocks information to my R13B Erlang environment,
but I broke the build system somehow (something to do with the fact
that prim_file is preloaded? "make preloaded" got me a bit further,
but I gave up when compile:compile/3 became undef). Below are the
changes I made to otp_src_R13B sources:
# diff efile_drv.c efile_drv.c.original
< put_int32(d->info.block_size, &resbuf[1 + (29 * 4)]);
< put_int32(d->info.blocks_high, &resbuf[1 + (30 * 4)]);
< put_int32(d->info.blocks_low, &resbuf[1 + (31 * 4)]);
< #define RESULT_SIZE (1 + (32 * 4))
> #define RESULT_SIZE (1 + (29 * 4))
# diff erl_efile.h erl_efile.h.original
< Uint32 block_size; /* Optimal I/O size. */
< Uint32 blocks_low; /* Allocated 512-byte blocks, lower 32 bits. */
< Uint32 blocks_high; /* Allocated 512-byte blocks, higher 32 bits. */
# diff unix_efile.c unix_efile.c.original
< pInfo->blocks_high = 0;
< pInfo->blocks_high = (Uint32)(statbuf.st_blocks >> 32);
< pInfo->blocks_low = (Uint32)statbuf.st_blocks;
< pInfo->block_size = (Uint32)statbuf.st_blksize;
# diff prim_file.erl prim_file.erl.original
< [Mode, Links, Major, Minor, Inode, Uid, Gid, Access|Tail4] = Tail3,
< [BlockSize, HighBlocks, LowBlocks] = Tail4,
< Blocks = HighBlocks * 16#100000000 + LowBlocks,
> [Mode, Links, Major, Minor, Inode, Uid, Gid, Access] = Tail3,
< gid = Gid,
< block_size = BlockSize,
< blocks = Blocks}.
> gid = Gid}.
# diff file.hrl file.hrl.original
< gid :: integer(), % Group id for owner.
< block_size :: non_neg_integer(), % On Unix, optimal I/O size.
< blocks :: non_neg_integer()}). % On Unix, allocated
> gid :: integer()}). % Group id for owner.
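
The changes above pass the block count as two 32-bit words and rebuild
it on the Erlang side with HighBlocks * 16#100000000 + LowBlocks. As a
rough sketch of that split and recombination in plain C (split_blocks
and join_blocks are illustrative names, not part of the OTP sources):

```c
#include <stdint.h>

/* Split a 64-bit block count into high and low 32-bit words.
   Widening to uint64_t before shifting matters: shifting a 32-bit
   value by 32 is undefined behaviour in C, which would apply if
   blkcnt_t is only 32 bits wide on some platform. */
void split_blocks(uint64_t blocks, uint32_t *high, uint32_t *low) {
    *high = (uint32_t)(blocks >> 32);
    *low  = (uint32_t)(blocks & 0xFFFFFFFFu);
}

/* Rebuild the 64-bit count; equivalent to high * 2^32 + low. */
uint64_t join_blocks(uint32_t high, uint32_t low) {
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

In the driver this would correspond to calling something like
split_blocks((uint64_t)statbuf.st_blocks, ...) before the two
put_int32 calls.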