[erlang-questions] How would you implement a blob store

Thu Jun 12 10:58:57 CEST 2014

Hello,

I see two approaches here (I have used both for blob stores).

a) Use sharded dets, e.g. the first byte of your key defines the dets partition.
I’ve built a simple library, https://github.com/fogfish/d3, that puts dets
operations behind a pid. You can therefore use gproc or similar to bind your
partition identity to a dets shard, spawn all the partitions under a supervisor, etc.
One advantage of this option is that it is built from native Erlang libraries.
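
A minimal sketch of option a) using plain dets from OTP, without d3, gproc, or
a supervisor. The module name blob_dets, its function names, and the shard
file naming are assumptions made for illustration. Keys are SHA1 hashes of the
values, as in the original question, and the first key byte selects one of 256
shards. Sharding also matters because a single dets file cannot exceed 2 GB.

```erlang
-module(blob_dets).
-export([open/1, store/2, fetch/2, close/1]).

%% Open 256 dets tables under Dir, one per possible first byte of a key.
%% Returns a tuple of table names so a shard can be picked with element/2.
open(Dir) ->
    %% ensure_dir/1 creates the parent directories of the given file name.
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    list_to_tuple([open_shard(Dir, N) || N <- lists:seq(0, 255)]).

open_shard(Dir, N) ->
    %% Hypothetical naming scheme: blob_shard_0 .. blob_shard_255.
    Name = list_to_atom("blob_shard_" ++ integer_to_list(N)),
    File = filename:join(Dir, atom_to_list(Name) ++ ".dets"),
    {ok, Name} = dets:open_file(Name, [{file, File}, {type, set}]),
    Name.

%% Store a blob under its SHA1 hash and return the 20-byte key.
store(Shards, Blob) when is_binary(Blob) ->
    Key = crypto:hash(sha, Blob),
    ok = dets:insert(shard(Shards, Key), {Key, Blob}),
    Key.

%% Look a blob up by its 20-byte SHA1 key.
fetch(Shards, <<_:20/binary>> = Key) ->
    case dets:lookup(shard(Shards, Key), Key) of
        [{Key, Blob}] -> {ok, Blob};
        [] -> not_found
    end.

%% The first byte of the key (0..255) selects the shard.
shard(Shards, <<First, _/binary>>) ->
    element(First + 1, Shards).

close(Shards) ->
    lists:foreach(fun dets:close/1, tuple_to_list(Shards)).
```

From the shell: S = blob_dets:open("/tmp/blobs"), K = blob_dets:store(S, <<"hello">>),
then blob_dets:fetch(S, K) returns {ok, <<"hello">>}.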

b) Use eleveldb, https://github.com/basho/eleveldb
This is very easy to use. I’ve migrated some of my dets-based implementations to leveldb.
It might not be suitable if you need to port the solution to another platform:
I’ve only tried building this library on Intel CPUs (I don't know if it works on ARM, PowerPC, etc.).
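
A minimal sketch of option b), assuming eleveldb is built and on the code
path. The module name blob_leveldb and its function names are hypothetical;
the calls to eleveldb:open/2, put/4, get/3 and close/1 are used as I understand
them from the eleveldb sources, so check them against the version you build.

```erlang
-module(blob_leveldb).
-export([open/1, store/2, fetch/2, close/1]).

%% Open (or create) a LevelDB database at Path and return its handle.
open(Path) ->
    {ok, Db} = eleveldb:open(Path, [{create_if_missing, true}]),
    Db.

%% Store a blob under its SHA1 hash and return the 20-byte key.
store(Db, Blob) when is_binary(Blob) ->
    Key = crypto:hash(sha, Blob),
    ok = eleveldb:put(Db, Key, Blob, []),
    Key.

%% Look a blob up by its SHA1 key.
fetch(Db, Key) when is_binary(Key) ->
    case eleveldb:get(Db, Key, []) of
        {ok, Blob} -> {ok, Blob};
        not_found -> not_found
    end.

close(Db) ->
    eleveldb:close(Db).
```

Compared with the dets sketch there is no sharding to manage, at the price of
a native (C++) dependency.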

Best Regards, 
Dmitry

On 12 Jun 2014, at 11:46, Joe Armstrong <erlang@REDACTED> wrote:

> Hello
> 
> I want some opinions on how to implement a blob store.
> 
> I want a simple key-value store. 
> 
> 
> To fix our ideas 
> 
>   - The values are variable size binaries (max 56 KB) 
>   - The keys are SHA1 hashes of the values
>   - I want to store max 1M blobs
>   - Efficiency is not a concern (though it would be a deciding factor given two
>     equally beautiful solutions)
> 
> The *simplest* way I can think of is to use the file store:
> a blob with (hex) hash "a2e34a32..." gets stored in a 2-level directory structure
> in a file called a2/e3/a2e34a32
> 
> Even this might have problems - for example is file:write_file/2 atomic?
> What happens if two processes try to write the same file at the same time with the same
> content? (and I know "at the same time" is meaningless, but it's shorter to say than
> "if one process has made a write_file request and a second process makes a write_file
> request before the first request issued by the first process has completed ...")
> 
> The next simplest way I can think of is to make a single huge blob store file
> (max 56GB) and use an ets table to map hashes to addresses in the file -
> if this is a good idea or not would depend upon how well the host OS handles sparse files
> and so on.
> 
> The third alternative would be to use a raw disk partition (very non-portable etc.)
> 
> The fourth alternative would be to use a library like bitcask
> 
> The fifth alternative would be to use some other library.
> 
> 
> My instinct points me to the *simplest* way (above) or bitcask.
> 
> Now I know that Richard will reply "do them all and measure" - but possibly somebody has done this before - so I can benefit from their wisdom.
> 
> All ideas are welcome
> 
> Cheers
> 
> /Joe