[erlang-questions] How would you implement a blob store
Joe Armstrong
erlang@REDACTED
Thu Jun 12 10:46:36 CEST 2014
Hello
I want some opinions on how to implement a blob store.
I want a simple key-value store.
To fix our ideas:
- The values are variable size binaries (max 56 KB)
- The keys are SHA1 hashes of the values
- I want to store max 1M blobs
- Efficiency is not a concern (though it would be a deciding factor given two equally beautiful solutions)
The *simplest* way I can think of is to use the file system: a blob with
(hex) hash "a2e34a32..." gets stored in a 2-level directory structure,
in a file called a2/e3/a2e34a32
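A minimal sketch of that path scheme, assuming lowercase hex SHA1 keys computed with crypto:hash/2. The module and function names (blob_path, hex_hash/1, blob_path/2) are hypothetical. With two levels of two hex digits there are 256 x 256 = 65,536 leaf directories, so 1M blobs works out to roughly 15 files per directory.

```erlang
-module(blob_path).
-export([hex_hash/1, blob_path/2]).

%% Lowercase hex encoding of the SHA1 digest of Bin (40 characters).
hex_hash(Bin) when is_binary(Bin) ->
    Digest = crypto:hash(sha, Bin),
    lists:flatten([io_lib:format("~2.16.0b", [Byte]) || <<Byte>> <= Digest]).

%% Root/a2/e3/a2e34a32... : the first two pairs of hex digits name the
%% two directory levels, the full hash names the file.
%% Root and Hex are assumed to be strings (lists), not binaries.
blob_path(Root, [A1, A2, B1, B2 | _] = Hex) ->
    filename:join([Root, [A1, A2], [B1, B2], Hex]).
```

For example, blob_path:blob_path("/var/blobs", blob_path:hex_hash(Bin)) gives the file name for Bin under a hypothetical root directory /var/blobs.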
Even this might have problems - for example, is file:write_file/2 atomic?
What happens if two processes try to write the same file at the same time
with the same content? (And I know "at the same time" is meaningless, but
it's shorter to say than "if one process has made a write_file request and
a second process makes a write_file request before the request issued by
the first process has completed"...)
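One common way to sidestep the atomicity question, sketched here under the assumption of a POSIX file system: write the blob to a uniquely named temporary file in the same directory, then move it into place with file:rename/2. A rename within one file system replaces the target atomically, so a reader sees either no file or a complete one. Since the file name is the hash of the content, two concurrent writers produce identical files and whichever rename lands last does no harm. The module blob_write, put_blob/2 and the temp-file naming are hypothetical, and the sketch does not fsync, so it guards against concurrent writers but not against a power failure.

```erlang
-module(blob_write).
-export([put_blob/2]).

%% Write Bin to Path via a temp file + rename (assumes a POSIX file
%% system, where rename within one file system is atomic).
%% Path is assumed to be a string (list), not a binary.
put_blob(Path, Bin) ->
    case filelib:is_regular(Path) of
        true ->
            %% Files only ever appear via rename, so an existing file is
            %% complete, and being content-addressed it holds these bytes.
            ok;
        false ->
            ok = filelib:ensure_dir(Path),
            %% Temp name unique per OS process and per call, kept in the
            %% same directory as Path so the rename stays on one file system.
            Tmp = Path ++ ".tmp." ++ os:getpid() ++ "."
                  ++ integer_to_list(erlang:unique_integer([positive])),
            ok = file:write_file(Tmp, Bin),
            case file:rename(Tmp, Path) of
                ok ->
                    ok;
                {error, _} = Error ->
                    _ = file:delete(Tmp),
                    Error
            end
    end.
```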
The next simplest way I can think of is to make a single huge blob store
file (max 56 GB, i.e. 1M blobs of up to 56 KB each) and use an ets table
to map hashes to addresses in the file - whether this is a good idea or not
would depend upon how well the host OS handles sparse files and so on.
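A minimal sketch of the single-file layout, assuming one append-only data file and an ets table mapping each key to {Offset, Size}. Because it appends rather than preallocating 56 GB, this particular sketch never relies on sparse files. The module blob_file and its functions are hypothetical. The index lives only in memory, so it would have to be persisted or rebuilt after a restart. A raw file can only be used by the process that opened it, so all calls must come from that process; in practice that suggests wrapping the store in a gen_server, which would also serialize appends.

```erlang
-module(blob_file).
-export([open/1, put_blob/3, get_blob/2, close/1]).

-record(store, {fd, index}).

%% Open (or create) the data file without truncating it, plus an empty
%% in-memory index Key => {Offset, Size}.
open(FileName) ->
    {ok, Fd} = file:open(FileName, [read, write, raw, binary]),
    Index = ets:new(blob_index, [set, public]),
    #store{fd = Fd, index = Index}.

%% Append Bin at the end of the file and record where it went.
put_blob(#store{fd = Fd, index = Index}, Key, Bin) ->
    case ets:member(Index, Key) of
        true ->
            ok;  % same key (hash) means same bytes: nothing to do
        false ->
            {ok, Offset} = file:position(Fd, eof),
            ok = file:write(Fd, Bin),
            true = ets:insert(Index, {Key, {Offset, byte_size(Bin)}}),
            ok
    end.

%% Look up the blob's location and read exactly Size bytes from Offset.
get_blob(#store{fd = Fd, index = Index}, Key) ->
    case ets:lookup(Index, Key) of
        [{Key, {_Offset, 0}}] -> {ok, <<>>};
        [{Key, {Offset, Size}}] -> file:pread(Fd, Offset, Size);
        [] -> not_found
    end.

close(#store{fd = Fd, index = Index}) ->
    true = ets:delete(Index),
    file:close(Fd).
```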
The third alternative would be to use a raw disk partition (very non-portable, etc.).
The fourth alternative would be to use a library like bitcask.
The fifth alternative would be to use some other library.
My instinct points me to the *simplest* way (above) or bitcask.
Now I know that Richard will reply "do them all and measure" - but possibly
somebody has done this before, so I can benefit from their wisdom.
All ideas are welcome
Cheers
/Joe