[erlang-questions] Using Key/Value store with shared storage

Sean D seand-erlang@REDACTED
Mon Dec 10 21:54:19 CET 2012


Thanks for the comments.  I have included a few points inline.

Cheers,
Sean

On Mon, Dec 10, 2012 at 10:21:32AM -0800, Mahesh Paolini-Subramanya wrote:
>      This seems flawed to me. Do you want resiliency, or shared storage?
> 
>    Ooooh.  Troll! :-)
> 
>    Seriously though - as Max Lapshin points out, w/ the exception of doing
>    some very interesting (read complex, and potentially destabilizing)
>    architecting w/ infiniband, multiple links and cascade setups, you are
>    going to be seriously bottle-necking on your shared storage.

I'm wondering if the term "Shared Storage" has caused some confusion.  When
I was talking about shared storage, I was talking about using a high-end
SAN.

The reason is that I believe SANs are designed to deal with this type of
scenario, although this obviously depends on the quality of the SAN.
Budgetary constraints are likely to determine whether or not this will be
worth my while.
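
For what it's worth, here is a rough sanity check of the throughput figures
quoted further down.  It is only a sketch: it takes Max's 3 gbps link figure
at face value, reads his "30kb" as 30 kilobytes (my assumption), uses decimal
units, and ignores protocol and filesystem overhead.  The function names are
made up for illustration.

```python
def required_gbps(writes_per_sec, value_bytes):
    """Raw payload bandwidth needed, in gigabits per second (decimal units)."""
    return writes_per_sec * value_bytes * 8 / 1e9


def max_writes_per_sec(link_gbps, value_bytes):
    """Upper bound on writes/second that a link of link_gbps can carry."""
    return link_gbps * 1e9 / (value_bytes * 8)


# Assumed inputs: 10,000 writes/s (our target), 30 KB values and a
# 3 Gbps link (both taken from the quoted message, not measured).
need = required_gbps(10_000, 30_000)
ceiling = max_writes_per_sec(3, 30_000)
print(f"needed: {need:.1f} Gbps, ceiling: {ceiling:.0f} writes/s")
```

On those assumptions, 10,000 writes/second needs about 2.4 Gbps and a 3 Gbps
link tops out at roughly 12,500 writes/second, so the link would be running
at about 80% of its ceiling.  Our actual value sizes may differ, which
changes the result proportionally.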

Does anyone have any views on how much overhead there is in keeping nodes in
sync in a Riak-type solution?  Does this affect I/O performance, or does it
simply require more processing power?
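
To make the question concrete, a back-of-envelope model of the extra write
traffic from replication might look like the sketch below.  The assumptions
are mine: every write is stored on n_replicas nodes (3 is Riak's default
n_val), replicas are spread evenly over the cluster, values are 30 KB (the
figure from Max's message, not a measurement), and coordination traffic,
read repair and compaction are ignored.  The 5-node cluster and the 300 GB
data set are hypothetical.

```python
def replication_load(writes_per_sec, value_bytes, n_replicas, n_nodes):
    """Estimate disk write traffic in MB/s when every write is replicated."""
    total_bytes = writes_per_sec * value_bytes * n_replicas
    return {
        "cluster_mb_per_sec": total_bytes / 1e6,
        # Assumes replicas are spread evenly across all nodes.
        "per_node_mb_per_sec": total_bytes / n_nodes / 1e6,
    }


def replicated_storage_gb(dataset_gb, n_replicas):
    """Total disk space consumed once every key is stored n_replicas times."""
    return dataset_gb * n_replicas


# Hypothetical cluster: 5 nodes, 3 replicas, 300 GB of unique data.
load = replication_load(10_000, 30_000, n_replicas=3, n_nodes=5)
disk = replicated_storage_gb(300, n_replicas=3)
print(load, disk)
```

With these inputs the cluster as a whole writes about 900 MB/s (three times
the unreplicated 300 MB/s), each node about 180 MB/s, and the data set grows
from 300 GB to 900 GB on disk.  This only covers the I/O side; it says
nothing about the CPU cost of keeping replicas in sync.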

>    And, to Garrett's point, resiliency and shared storage are (kinda)
>    orthogonal.  An appropriately spilled can of coke can wreak havoc to
>    your shared storage, and BigCouch/Riak/Voldemort/... can be remarkably
>    resilient.

Again, SANs are designed to be highly resilient.  I would hope that the
appropriate spilling of a can of coke would need to be highly malicious in
order to bring down a SAN.

I am not doubting that any of these technologies are resilient.  I would
just rather avoid duplicating data.
  
>    A few years back, we migrated out of the "eggs in one basket" SAN
>    approach to BigCouch. Our SAN setup was increasingly starting to look
>    like something that would give even Rube Goldberg nightmares, and the sheer
>    amount of hackery associated with this was starting to keep me up at
>    nights.
> 
>    Anyhow, just my two bits...

Now this is interesting to me.  High maintenance overheads would be
off-putting.  Was your SAN configuration particularly complex?  What
sort of issues were you having?

>    [1]Mahesh Paolini-Subramanya
>    That Tall Bald Indian Guy...
>    [2]Google+  | [3]Blog   | [4]Twitter  | [5]LinkedIn
>    On Dec 10, 2012, at 9:18 AM, Garrett Smith <[6]g@REDACTED> wrote:
> 
>      Aye, as well, this is curious:
> 
>      we would like to make the solution more resilient by using shared
>      storage
> 
>      On Mon, Dec 10, 2012 at 10:59 AM, Max Lapshin
>      <[7]max.lapshin@REDACTED> wrote:
> 
>      You speak about many servers but one SAN.
>      What for? Real throughput is limited to about 3 gbps. It means that
>      you will be limited to writing no more than 10000 values of 30kb
>      per second.
>      There will be no cheap way to scale after this limit if you use
>      storage, but if you shard and throw away your raid, you can scale.
>      Maybe yo
>      On Monday, December 10, 2012, Sean D wrote:
> 
>      Hi all,
> 
>      We are currently running an application for a customer that stores a
>      large number of key/value pairs.  Performance is important for us as
>      we need to maintain a write rate of at least 10,000 keys/second on
>      one server.  After evaluating various key/value stores, we found
>      Bitcask worked extremely well for us and we went with this.
> 
>      The solution currently has multiple servers working independently of
>      each other and we would like to make the solution more resilient by
>      using shared storage.  I.e. if one of the servers goes down, the
>      others can pick up the work load and add to/read from the same store.
> 
>      I am aware that Riak seems to be the standard solution for a
>      resilient key-value store in the Erlang world, however from my
>      initial investigations, this seems to work by duplicating the data
>      between Riak nodes, and this is something I want to avoid as the
>      number of keys we are storing will be in the range of 100s of GB and
>      I would prefer that the shared storage is used rather than data
>      needing to be duplicated.  I am also concerned that the overhead of
>      Riak may prove a bottle-neck, however this isn't something that I
>      have tested.
> 
>      If anyone here has used a key/value store with a SAN or similar in
>      this way, I'd be very keen to hear your experiences.
> 
>      Many thanks in advance,
>      Sean
>      _______________________________________________
>      erlang-questions mailing list
>      [8]erlang-questions@REDACTED
>      http://erlang.org/mailman/listinfo/erlang-questions
> 
> References
> 
>    1. http://www.gravatar.com/avatar/204a87f81a0d9764c1f3364f53e8facf.png
>    2. https://plus.google.com/u/0/108074935470209044442/posts
>    3. http://dieswaytoofast.blogspot.com/
>    4. https://twitter.com/dieswaytoofast
>    5. http://www.linkedin.com/in/dieswaytoofast
>    6. mailto:g@REDACTED
>    7. mailto:max.lapshin@REDACTED
>    8. mailto:erlang-questions@REDACTED
>    9. mailto:erlang-questions@REDACTED
>   10. mailto:erlang-questions@REDACTED
