[erlang-questions] How would you implement a blob store

Joe Armstrong erlang@REDACTED
Thu Jun 12 10:46:36 CEST 2014


I want some opinions on how to implement a blob store.

I want a simple key-value store.

To fix ideas:

  - The values are variable size binaries (max 56 KB)
  - The keys are SHA1 hashes of the values
  - I want to store max 1M blobs
  - Efficiency is not a concern (though it would be a deciding factor given
    equally beautiful solutions)

The *simplest* way I can think of is to use the file system: a blob with
(hex) hash "a2e34a32..." gets stored in a 2-level directory, in a file
called a2/e3/a2e34a32
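
A minimal sketch of that layout, under a few assumptions: the module name
blob_path, the Root directory argument and the lowercase hex encoding are
illustrative choices, not part of the proposal above. It only computes the
key and the path; it does not touch the disk.

```erlang
-module(blob_path).
-export([key/1, path/2]).

%% Hex-encoded SHA1 of the blob, as a lowercase string (40 characters).
key(Blob) when is_binary(Blob) ->
    Digest = crypto:hash(sha, Blob),
    lists:flatten([io_lib:format("~2.16.0b", [Byte]) || <<Byte>> <= Digest]).

%% Map a hex key to Root/<chars 1-2>/<chars 3-4>/<key>.
%% Root and Key are assumed to be plain strings.
path(Root, [A, B, C, D | _] = Key) ->
    filename:join([Root, [A, B], [C, D], Key]).
```

With two hex characters per level there are at most 256 * 256 = 65536 leaf
directories, so 1M blobs would average about 15 files per directory.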

Even this might have problems - for example, is file:write_file/2 atomic?
What happens if two processes try to write the same file at the same time
with the same content? (I know "at the same time" is meaningless, but it's
shorter to say than "if one process has made a write_file request and a
second process makes a write_file request before the request issued by the
first process has completed".)
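
One common way to sidestep the question is to write to a temporary file
and then rename it into place. The sketch below assumes a POSIX file
system, where rename within one file system replaces the target atomically;
the module name blob_write and the temporary-name scheme are hypothetical.
Because the key is the hash of the content, two writers racing on the same
path produce identical files, so it does not matter whose rename lands last.
The sketch does not call file:sync/1, so it makes no durability promise
across a crash.

```erlang
-module(blob_write).
-export([store/2]).

%% Write Blob to Path (a string) so that readers see either no file
%% or the complete file, never a partial one.
store(Path, Blob) when is_list(Path), is_binary(Blob) ->
    ok = filelib:ensure_dir(Path),
    %% Temporary name unique per OS process and per call (an assumption
    %% of this sketch, not a guarantee across machines).
    Tmp = Path ++ "." ++ os:getpid() ++ "."
          ++ integer_to_list(erlang:unique_integer([positive])) ++ ".tmp",
    case file:write_file(Tmp, Blob) of
        ok ->
            case file:rename(Tmp, Path) of
                ok ->
                    ok;
                {error, _} = RenameError ->
                    _ = file:delete(Tmp),
                    RenameError
            end;
        {error, _} = WriteError ->
            _ = file:delete(Tmp),
            WriteError
    end.
```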

The next simplest way I can think of is to make a single huge blob store
(max 56 GB) and use an ets table to map hashes to addresses in the file.
Whether this is a good idea would depend upon how well the host OS
handles sparse files, and so on.
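
A rough sketch of one variant of this idea, with loud assumptions: blobs
are appended to the end of the file rather than placed in preallocated
slots (so no sparse file is needed), the module name blob_file is made up,
and the ets index lives only in memory, so it would have to be rebuilt or
persisted separately after a restart. The table is private, so a single
owning process serialises all reads and writes.

```erlang
-module(blob_file).
-export([open/1, store/2, fetch/2, close/1]).

%% Open (or create) the data file and an empty in-memory index.
open(FileName) ->
    {ok, Fd} = file:open(FileName, [read, write, raw, binary]),
    Index = ets:new(blob_index, [set, private]),
    {Fd, Index}.

%% Append Blob unless its hash is already indexed; return the raw
%% 20-byte SHA1 key.
store({Fd, Index}, Blob) when is_binary(Blob) ->
    Key = crypto:hash(sha, Blob),
    case ets:member(Index, Key) of
        true ->
            Key;
        false ->
            {ok, Offset} = file:position(Fd, eof),
            ok = file:pwrite(Fd, Offset, Blob),
            true = ets:insert(Index, {Key, Offset, byte_size(Blob)}),
            Key
    end.

%% Look up the blob's offset and size, then read it back.
fetch({Fd, Index}, Key) ->
    case ets:lookup(Index, Key) of
        [{Key, Offset, Size}] -> file:pread(Fd, Offset, Size);
        [] -> not_found
    end.

close({Fd, Index}) ->
    true = ets:delete(Index),
    file:close(Fd).
```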

The third alternative would be to use a raw disk partition (very
non-portable, etc.).

The fourth alternative would be to use a library like bitcask.

The fifth alternative would be to use some other library.

My instinct points me to the *simplest* way (above) or bitcask.

Now I know that Richard will reply "do them all and measure" - but possibly
somebody has done this before, so I can benefit from their wisdom.

All ideas are welcome.

