[erlang-questions] Request for enhancement: Sparse files

Ville Silventoinen ville.silventoinen@REDACTED
Tue Jun 9 12:11:06 CEST 2009


Hi Alex,

What do you mean by application logic? I can write an Erlang driver
that gets the actual block size of a file, is that then part of an
application?

I would think the blocks information would be useful to Unix, Linux
and Mac users who have to deal with files.

How is providing two more fields in the file_info record non-portable?
What needs porting? You can just ignore them on Windows (until they
decide to become Unix-based as well ;-), or not provide the fields
at all (like Python's stat).

Cheers,
Ville

On Tue, Jun 9, 2009 at 10:26 AM, Alex Arnon <alex.arnon@REDACTED> wrote:
> Can you tell where actual data blocks reside, using application logic?
> (I know this might not solve your problem, but such nonportable
> features probably have a small chance of being incorporated into the
> standard distribution.)
>
>
> On Mon, Jun 8, 2009 at 6:00 PM, Ville Silventoinen
> <ville.silventoinen@REDACTED> wrote:
>>
>> Hi,
>>
>> I've written a directory scanner in Erlang that calculates usages per
>> Unix user and group. It works well, except that the results are wrong
>> when it encounters sparse files (holes in the files):
>> http://en.wikipedia.org/wiki/Sparse_file. We have some users that have
>> files that seem to be 1-2 terabytes, but in reality occupy less than a
>> gigabyte on disk.
>>
>> The C stat struct has two fields I'd need (shown by "stat" command in
>> Unix):
>> blksize_t st_blksize;   /* optimal I/O size */
>> blkcnt_t st_blocks;     /* allocated 512-byte blocks */
>>
>> If st_size > st_blocks * 512, the file is sparse. Unfortunately,
>> read_file_info/read_link_info don't provide the blocks information.
>> Any chance this could be included in some future Erlang release?
>>
>> Also, any chance file:copy would support copying sparse files? :-) I
>> tested it, and the target file becomes non-sparse.
>>
>> Erlang has been a great help in our environment, where simple rsync
>> has become too slow...
>>
>> Thanks,
>> Ville
>>
>> P.S. I tried to add blocks information to my R13B Erlang environment,
>> but I broke the build system somehow (something to do with the fact
>> that prim_file is preloaded? "make preloaded" got me a bit further,
>> but I gave up when compile:compile/3 became undef). Below are the
>> changes I made to otp_src_R13B sources:
>>
>> # diff efile_drv.c efile_drv.c.original
>> 1823,1827c1823
>> <               put_int32(d->info.block_size,        &resbuf[1 + (29 * 4)]);
>> <               put_int32(d->info.blocks_high,       &resbuf[1 + (30 * 4)]);
>> <               put_int32(d->info.blocks_low,        &resbuf[1 + (31 * 4)]);
>> <
>> < #define RESULT_SIZE (1 + (32 * 4))
>> ---
>> > #define RESULT_SIZE (1 + (29 * 4))
>>
>> # diff erl_efile.h erl_efile.h.original
>> 105,107d104
>> <     Uint32 block_size;                /* Optimal I/O size. */
>> <     Uint32 blocks_low;                /* Allocated 512-byte blocks, lower 32 bits. */
>> <     Uint32 blocks_high;               /* Allocated 512-byte blocks, higher 32 bits. */
>>
>> # diff unix_efile.c unix_efile.c.original
>> 872d871
>> <     pInfo->blocks_high = 0;
>> 875d873
>> <     pInfo->blocks_high = (Uint32)(statbuf.st_blocks >> 32);
>> 878,879d875
>> <     pInfo->blocks_low = (Uint32)statbuf.st_blocks;
>> <     pInfo->block_size = (Uint32)statbuf.st_blksize;
>>
>> # diff prim_file.erl prim_file.erl.original
>> 1022,1024c1022
>> <     [Mode, Links, Major, Minor, Inode, Uid, Gid, Access|Tail4] = Tail3,
>> <     [BlockSize, HighBlocks, LowBlocks] = Tail4,
>> <     Blocks = HighBlocks * 16#100000000 + LowBlocks,
>> ---
>> >     [Mode, Links, Major, Minor, Inode, Uid, Gid, Access] = Tail3,
>> 1038,1040c1036
>> <               gid = Gid,
>> <               block_size = BlockSize,
>> <               blocks = Blocks}.
>> ---
>> >               gid = Gid}.
>>
>> # diff file.hrl file.hrl.original
>> 60,62c60
>> <        gid    :: integer(),           % Group id for owner.
>> <        block_size :: non_neg_integer(),       % On Unix, optimal I/O size.
>> <        blocks :: non_neg_integer()}).         % On Unix, allocated 512-byte blocks.
>> ---
>> >        gid    :: integer()}).         % Group id for owner.
>>
>> ________________________________________________________________
>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> erlang-questions (at) erlang.org
>>
>
>
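
For reference, the sparse-file test from the original message
(st_size > st_blocks * 512) can be sketched in plain C as below. This is
only an illustration, not code from OTP; the helper names allocated_bytes
and looks_sparse are made up for this example, and the 512-byte unit is
the one assumed in the message. The test is a heuristic, so treat the
result as a hint rather than proof (a filesystem that compresses data,
for example, could also report fewer allocated blocks than the size
suggests).

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Bytes actually allocated on disk, assuming st_blocks counts
 * 512-byte units as described in the message above. */
static long long allocated_bytes(const struct stat *st)
{
    return (long long)st->st_blocks * 512LL;
}

/* Hypothetical helper: true when the apparent size (st_size) is
 * larger than the space allocated for the file. */
static bool looks_sparse(const struct stat *st)
{
    return (long long)st->st_size > allocated_bytes(st);
}
```

A directory scanner would call stat() or lstat() on each path and pass
the resulting struct to looks_sparse(), summing allocated_bytes()
instead of st_size to get real disk usage.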

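The patch above ships the 64-bit block count to Erlang as two 32-bit
words, and prim_file.erl rebuilds it with
HighBlocks * 16#100000000 + LowBlocks. The same round trip can be
sketched in C as follows; split_u64 and join_u64 are hypothetical names
used only here, and the sketch assumes the count fits in an unsigned
64-bit integer.

```c
#include <stdint.h>

/* Split a 64-bit value into its upper and lower 32-bit halves,
 * mirroring the blocks_high / blocks_low fields in the patch. */
static void split_u64(uint64_t value, uint32_t *high, uint32_t *low)
{
    *high = (uint32_t)(value >> 32);
    *low  = (uint32_t)(value & 0xFFFFFFFFu);
}

/* Rebuild the 64-bit value; equivalent to high * 2^32 + low. */
static uint64_t join_u64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

The shift by 32 is only valid on a 64-bit operand, which is presumably
why the patch sets blocks_high to 0 in one branch of unix_efile.c and
shifts st_blocks in the other.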
