[erlang-questions] How would you implement a blob store

Benoit Chesneau bchesneau@REDACTED
Thu Jun 12 10:57:41 CEST 2014


On Thu, Jun 12, 2014 at 10:46 AM, Joe Armstrong <erlang@REDACTED> wrote:

> Hello
>
> I want some opinions on how to implement a blob store.
>
> I want a simple key-value store.
>
>
> To fix our ideas
>
>   - The values are variable size binaries (max 56 KB)
>   - The keys are SHA1 hashes of the values
>   - I want to store max 1M blobs
>   - Efficiency is not a concern (though it would be a deciding factor
>     given two equally beautiful solutions)
>
> The *simplest* way I can think of is to use the file store
> a blob with (hex) hash "a2e34a32..." gets stored in 2-level directory
> structure
> in a file called a2/e3/a2e34a32
>
> Even this might have problems - for example, is file:write_file/2 atomic?
> What happens if two processes try to write the same file at the same time
> with the same content? (And I know "at the same time" is meaningless, but
> it's shorter to say than "if one process has made a write_file request and
> a second process makes a write_file request before the first request
> issued by the first process has completed ...")
>


Not sure I see the problem here: since the key is a hash of the content,
you can check whether the blob is already written (or being written) by
testing whether it exists on the file system.

The real problem is making sure that no read can see the file while it is
still being written, which implies writing the blob to a temporary file and
renaming it at the end, or something similar.
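
A minimal sketch of the temporary-file-plus-rename idea, using the 2-level
directory layout from your mail. The module name blob_fs, the functions
store/2, fetch/2 and path/2, and the ".tmp." suffix are all made up for
illustration. It assumes a POSIX-like filesystem where renaming within one
directory is atomic, and it does not sync to disk, so it says nothing about
durability after a crash.

```erlang
-module(blob_fs).
-export([store/2, fetch/2, path/2]).

%% Hex-encode the SHA1 of the blob (40 lowercase hex characters).
hash(Blob) ->
    lists:flatten([io_lib:format("~2.16.0b", [B])
                   || <<B>> <= crypto:hash(sha, Blob)]).

%% Root/a2/e3/<full hex hash>: two directory levels taken from the prefix.
path(Root, [A1, A2, B1, B2 | _] = Hex) ->
    filename:join([Root, [A1, A2], [B1, B2], Hex]).

%% Write the blob unless a file with that hash already exists.
%% Readers only ever see the final name once the content is complete.
store(Root, Blob) when is_binary(Blob) ->
    Hex = hash(Blob),
    Final = path(Root, Hex),
    case filelib:is_regular(Final) of
        true ->
            {ok, Hex};
        false ->
            %% Unique temp name per writer, in the same directory as the
            %% final file so that the rename stays on one filesystem.
            Tmp = Final ++ ".tmp." ++
                  integer_to_list(erlang:unique_integer([positive])),
            ok = filelib:ensure_dir(Final),
            ok = file:write_file(Tmp, Blob),
            ok = file:rename(Tmp, Final),
            {ok, Hex}
    end.

%% Returns {ok, Binary} or {error, Reason}, like file:read_file/1.
fetch(Root, Hex) ->
    file:read_file(path(Root, Hex)).
```

If two processes store the same blob concurrently, each writes its own
temporary file and both renames put identical content under the same name,
so under the assumptions above the race should be harmless.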


>
> The next simplest way I can think of is to make a single huge blob store
> file (max 56GB) and use an ets table to map hashes to addresses in the
> file - whether this is a good idea or not would depend upon how well the
> host OS handles sparse files and so on.
>
>

> The third alternative would be to use a raw disk partition (very
> non-portable etc.)
>
> The fourth alternative would be to use a library like bitcask
>

or any other key/value API, which is what leofs is doing if I remember well.
You can store {key, start offset, size} entries (or chunks) in the k/v store
to point into a large file. You will probably need to split the file.
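
A rough sketch of the {key, offset, size} idea, with an ets table standing
in for the k/v store. The module name blob_log and its functions are
hypothetical. It assumes one process owns the file and serializes the
appends, it keeps the index in memory only (nothing here rebuilds it after
a restart), and it does not split the file.

```erlang
-module(blob_log).
-export([open/1, append/2, lookup/2, close/1]).

%% Open (or create) the data file and start with an empty in-memory index.
%% read + write together do not truncate an existing file.
open(Path) ->
    {ok, Fd} = file:open(Path, [read, write, raw, binary]),
    Index = ets:new(blob_index, [set, public]),
    {Fd, Index}.

%% Append the blob at the end of the file and remember where it went.
%% The key is the raw 20-byte SHA1 of the value.
append({Fd, Index}, Blob) when is_binary(Blob) ->
    Key = crypto:hash(sha, Blob),
    case ets:member(Index, Key) of
        true ->
            {ok, Key};
        false ->
            {ok, Offset} = file:position(Fd, eof),
            ok = file:pwrite(Fd, Offset, Blob),
            true = ets:insert(Index, {Key, Offset, byte_size(Blob)}),
            {ok, Key}
    end.

%% Look the key up in the index, then read Size bytes at Offset.
lookup({Fd, Index}, Key) ->
    case ets:lookup(Index, Key) of
        [{Key, _Offset, 0}] -> {ok, <<>>};
        [{Key, Offset, Size}] -> file:pread(Fd, Offset, Size);
        [] -> {error, not_found}
    end.

close({Fd, Index}) ->
    ets:delete(Index),
    file:close(Fd).
```

Since blobs are only ever appended, the file is never sparse in this
variant; splitting would just mean adding a file id to each index entry.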



>
> The fifth alternative would be to use some other library.
>

We are releasing coffer next week, which provides such features:

 http://refuge.io/learnmore/platform.html#blob-server

The blob server can be fully used as a library. It abstracts blob uploads
to different storage services or to the filesystem.

The code currently online is not what will be released, but it can be useful.

Hope it helps,

- benoit