[erlang-questions] Request for enhancement: Sparse files

Alex Arnon
Tue Jun 9 11:26:40 CEST 2009


Can you tell where the actual data blocks reside using application logic?
(I know this might not solve your problem, but such non-portable features
probably have little chance of being incorporated into the standard
distribution.)


On Mon, Jun 8, 2009 at 6:00 PM, Ville Silventoinen wrote:

> Hi,
>
> I've written a directory scanner in Erlang that calculates disk usage per
> Unix user and group. It works well, except that the results are wrong
> when it encounters sparse files (files with holes; see
> http://en.wikipedia.org/wiki/Sparse_file). We have some users with
> files that appear to be 1-2 terabytes but in reality occupy less than a
> gigabyte on disk.
>
> The C stat struct has the two fields I'd need (shown by the "stat" command
> on Unix):
> blksize_t st_blksize;   /* optimal I/O size */
> blkcnt_t  st_blocks;    /* allocated 512-byte blocks */
>
> If st_size > st_blocks * 512, the file is sparse. Unfortunately,
> read_file_info and read_link_info don't provide the block information.
> Any chance this could be included in some future Erlang release?
>
> Also, any chance file:copy could support copying sparse files? :-) I
> tested it: the target file becomes non-sparse.
>
> Erlang has been a great help in our environment, where simple rsync has
> become too slow...
>
> Thanks,
> Ville
>
> P.S. I tried to add blocks information to my R13B Erlang environment,
> but I broke the build system somehow (something to do with the fact
> that prim_file is preloaded? "make preloaded" got me a bit further,
> but I gave up when compile:compile/3 became undef). Below are the
> changes I made to otp_src_R13B sources:
>
> # diff efile_drv.c efile_drv.c.original
> 1823,1827c1823
> <               put_int32(d->info.block_size,        &resbuf[1 + (29 * 4)]);
> <               put_int32(d->info.blocks_high,       &resbuf[1 + (30 * 4)]);
> <               put_int32(d->info.blocks_low,        &resbuf[1 + (31 * 4)]);
> <
> < #define RESULT_SIZE (1 + (32 * 4))
> ---
> > #define RESULT_SIZE (1 + (29 * 4))
>
> # diff erl_efile.h erl_efile.h.original
> 105,107d104
> <     Uint32 block_size;                /* Optimal I/O size. */
> <     Uint32 blocks_low;                /* Allocated 512-byte blocks, lower 32 bits. */
> <     Uint32 blocks_high;               /* Allocated 512-byte blocks, higher 32 bits. */
>
> # diff unix_efile.c unix_efile.c.original
> 872d871
> <     pInfo->blocks_high = 0;
> 875d873
> <     pInfo->blocks_high = (Uint32)(statbuf.st_blocks >> 32);
> 878,879d875
> <     pInfo->blocks_low = (Uint32)statbuf.st_blocks;
> <     pInfo->block_size = (Uint32)statbuf.st_blksize;
>
> # diff prim_file.erl prim_file.erl.original
> 1022,1024c1022
> <     [Mode, Links, Major, Minor, Inode, Uid, Gid, Access|Tail4] = Tail3,
> <     [BlockSize, HighBlocks, LowBlocks] = Tail4,
> <     Blocks = HighBlocks * 16#100000000 + LowBlocks,
> ---
> >     [Mode, Links, Major, Minor, Inode, Uid, Gid, Access] = Tail3,
> 1038,1040c1036
> <               gid = Gid,
> <               block_size = BlockSize,
> <               blocks = Blocks}.
> ---
> >               gid = Gid}.
>
> # diff file.hrl file.hrl.original
> 60,62c60
> <        gid    :: integer(),           % Group id for owner.
> <        block_size :: non_neg_integer(),       % On Unix, optimal I/O size.
> <        blocks :: non_neg_integer()}).         % On Unix, allocated 512-byte blocks.
> ---
> >        gid    :: integer()}).         % Group id for owner.
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>
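
For reference, the sparse-file test from the quoted message
(st_size > st_blocks * 512) and the high/low 32-bit split used in the patch
can be sketched in C. This is only a minimal sketch, not the patch itself:
the helper names allocated_bytes, looks_sparse, split_blocks and join_blocks
are hypothetical, and it assumes st_blocks counts 512-byte units as stated
above. The comparison is a heuristic; some filesystems may report block
counts that do not match this simple model.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Assumption: st_blocks is expressed in 512-byte units. */
#define STAT_BLOCK_BYTES 512

/* Bytes actually allocated on disk for the file described by st. */
static int64_t allocated_bytes(const struct stat *st)
{
    return (int64_t)st->st_blocks * STAT_BLOCK_BYTES;
}

/* True when the apparent size exceeds the allocated size,
   i.e. the file probably contains holes. */
static bool looks_sparse(const struct stat *st)
{
    return (int64_t)st->st_size > allocated_bytes(st);
}

/* Split a 64-bit block count into two 32-bit halves, mirroring the
   blocks_high / blocks_low fields added in the patch. */
static void split_blocks(uint64_t blocks, uint32_t *high, uint32_t *low)
{
    *high = (uint32_t)(blocks >> 32);
    *low  = (uint32_t)(blocks & 0xFFFFFFFFu);
}

/* Recombine the halves; the C equivalent of the Erlang expression
   HighBlocks * 16#100000000 + LowBlocks. */
static uint64_t join_blocks(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

A scanner would fill a struct stat with lstat(2) for each directory entry,
add allocated_bytes() rather than st_size to the per-user totals, and use
looks_sparse() only to flag the files in question.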

