Request for enhancement: Sparse files

Ville Silventoinen ville.silventoinen@REDACTED
Mon Jun 8 17:00:18 CEST 2009


Hi,

I've written a directory scanner in Erlang that calculates disk usage per
Unix user and group. It works well, except that the results are wrong
when it encounters sparse files (files with holes in them; see
http://en.wikipedia.org/wiki/Sparse_file). Some of our users have
files that appear to be 1-2 terabytes in size but in reality occupy less
than a gigabyte on disk.

The C stat struct has two fields I'd need (shown by "stat" command in Unix):
    blksize_t st_blksize;   /* optimal I/O size */
    blkcnt_t  st_blocks;    /* allocated 512-byte blocks */

If st_size > st_blocks * 512, the file is sparse. Unfortunately,
file:read_file_info/1 and file:read_link_info/1 don't provide the block
information. Any chance this could be included in some future Erlang release?
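
To make the check concrete, here is a minimal C sketch of the comparison
described above. The function names are hypothetical, and it assumes
st_blocks counts 512-byte units as stated in this message (true on Linux
and most Unix systems, but worth verifying on your platform). It uses
stat(); a scanner that should not follow symlinks would call lstat()
instead.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Assumption: st_blocks is counted in 512-byte units. */
#define STAT_BLOCK_BYTES 512LL

/* Bytes actually allocated on disk for a given st_blocks value. */
long long allocated_bytes(long long blocks)
{
    return blocks * STAT_BLOCK_BYTES;
}

/* The check from the message: apparent size exceeds allocated space. */
bool looks_sparse(long long size, long long blocks)
{
    return size > allocated_bytes(blocks);
}

/* Hypothetical helper: 1 if the file looks sparse, 0 if not,
 * -1 if stat() fails (for example, the path does not exist). */
int path_looks_sparse(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return looks_sparse((long long)sb.st_size, (long long)sb.st_blocks) ? 1 : 0;
}
```

For per-user accounting, allocated_bytes(sb.st_blocks) rather than
sb.st_size would be the figure to sum.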

Also, any chance file:copy could support copying sparse files? :-) I
tested it, and the target file becomes non-sparse.

Erlang has been a great help in our environment, where simple rsync has
become too slow...

Thanks,
Ville

P.S. I tried to add blocks information to my R13B Erlang environment,
but I broke the build system somehow (something to do with the fact
that prim_file is preloaded? "make preloaded" got me a bit further,
but I gave up when compile:compile/3 became undef). Below are the
changes I made to otp_src_R13B sources:

# diff efile_drv.c efile_drv.c.original
1823,1827c1823
<               put_int32(d->info.block_size,        &resbuf[1 + (29 * 4)]);
<               put_int32(d->info.blocks_high,       &resbuf[1 + (30 * 4)]);
<               put_int32(d->info.blocks_low,        &resbuf[1 + (31 * 4)]);
<
< #define RESULT_SIZE (1 + (32 * 4))
---
> #define RESULT_SIZE (1 + (29 * 4))

# diff erl_efile.h erl_efile.h.original
105,107d104
<     Uint32 block_size;                /* Optimal I/O size. */
<     Uint32 blocks_low;                /* Allocated 512-byte blocks, lower 32 bits. */
<     Uint32 blocks_high;               /* Allocated 512-byte blocks, higher 32 bits. */

# diff unix_efile.c unix_efile.c.original
872d871
<     pInfo->blocks_high = 0;
875d873
<     pInfo->blocks_high = (Uint32)(statbuf.st_blocks >> 32);
878,879d875
<     pInfo->blocks_low = (Uint32)statbuf.st_blocks;
<     pInfo->block_size = (Uint32)statbuf.st_blksize;

# diff prim_file.erl prim_file.erl.original
1022,1024c1022
<     [Mode, Links, Major, Minor, Inode, Uid, Gid, Access|Tail4] = Tail3,
<     [BlockSize, HighBlocks, LowBlocks] = Tail4,
<     Blocks = HighBlocks * 16#100000000 + LowBlocks,
---
>     [Mode, Links, Major, Minor, Inode, Uid, Gid, Access] = Tail3,
1038,1040c1036
<               gid = Gid,
<               block_size = BlockSize,
<               blocks = Blocks}.
---
>               gid = Gid}.

# diff file.hrl file.hrl.original
60,62c60
<        gid    :: integer(),           % Group id for owner.
<        block_size :: non_neg_integer(),       % On Unix, optimal I/O size.
<        blocks :: non_neg_integer()}).         % On Unix, allocated 512-byte blocks.
---
>        gid    :: integer()}).         % Group id for owner.
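
The high/low split used in the diffs above can be sketched in isolation.
This is only an illustration with hypothetical function names, not code
from the OTP sources: the 64-bit block count is cut into two 32-bit words
on the C side and recombined as High * 2^32 + Low, which is what the
16#100000000 multiplication in prim_file.erl does. The value is widened to
64 bits before shifting, because shifting a 32-bit integer by 32 is
undefined in C.

```c
#include <stdint.h>

/* Upper 32 bits of the block count (the patch's blocks_high field). */
uint32_t split_high(uint64_t blocks)
{
    return (uint32_t)(blocks >> 32);
}

/* Lower 32 bits of the block count (the patch's blocks_low field). */
uint32_t split_low(uint64_t blocks)
{
    return (uint32_t)(blocks & 0xFFFFFFFFu);
}

/* Recombine the halves: high * 2^32 + low. */
uint64_t join_blocks(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

On a platform where blkcnt_t is only 32 bits wide, the high word is
always zero, so it should be set to 0 directly instead of shifting
st_blocks.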

