[erlang-questions] Re: Conceptual questions on key-value databases for RDBMs users

Tue Nov 9 10:43:14 CET 2010

> One of the core concepts in relational systems is that of
> enforced integrity constraints, via primary keys and foreign keys.
> If you don't have any integrity constraints that you care to tell the
> data base about, you probably don't need a relational system.

This is exactly what I meant by "invariance" earlier. If you have (strong) requirements for invariance in your model, an RDBMS is almost the only solution, as integrity constraints etc. are all first-class concepts. If you don't have a requirement for invariance (as in Edmund's example):

> It is important to understand that SQL is not a good example of a relational system.  

Yes - SQL is a functional language where you state the solution in terms of relational concepts and ask the RDBMS to solve it and present you with an answer. Some of the non-SQL systems use search terms such as "Name = 'John'" or jscript for the same purpose. QLC is also an example (in the case of Mnesia).

> I think a big reason kv-stores are winning over a lot of us long-time RDBMSs users is they allow us to model things-that-have-things-inside-them in the database much closer to how they are modeled in our applications. Orders/receipts/invoices with their items, users with their permissions, all these map nicely in kv-stores from db to application. This allows us to model relationships only when WE REALLY WANT RELATIONSHIPS (this receipt belongs to that invoice). That fact alone won me over and I've never looked back.

However, storing things this way (the example of Orders/Receipts/Invoices with items "inside" them) is only simple if your only interest in the data is that the "outer" or container object encapsulates the items. Typically you want to read or write the whole thing in one operation. In real life (and in my experience), you will pretty soon find that someone wants to know how many orders during the last 60 days included item X with quantities larger than, say, 6.
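For comparison, here is a rough sketch of that question asked of a relational schema, using Python's built-in sqlite3 module. The table names (`orders`, `order_lines`), the sample rows and the helper `count_orders` are all made up for illustration; they are not from any real system.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: orders and their lines live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT NOT NULL);
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        item     TEXT    NOT NULL,
        quantity INTEGER NOT NULL
    );
""")

today = date(2010, 11, 9)  # fixed "today" so the example is repeatable


def days_ago(n):
    """ISO date string n days before `today` (ISO strings sort by date)."""
    return (today - timedelta(days=n)).isoformat()


conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, days_ago(10)), (2, days_ago(30)), (3, days_ago(90)), (4, days_ago(5)),
])
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)", [
    (1, "X", 8), (1, "Y", 1),   # recent, X with quantity > 6
    (2, "X", 2),                # recent, but quantity too small
    (3, "X", 10),               # large quantity, but older than 60 days
    (4, "X", 7), (4, "X", 9),   # two matching lines, still one order
])


def count_orders(conn, item, min_qty, since):
    """Count distinct orders since `since` with a line for `item` above `min_qty`."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT o.id)
        FROM orders o
        JOIN order_lines l ON l.order_id = o.id
        WHERE l.item = ? AND l.quantity > ? AND o.placed_on >= ?
        """,
        (item, min_qty, since),
    ).fetchone()
    return row[0]


result = count_orders(conn, "X", 6, days_ago(60))
print(result)
```

The query states what is wanted and leaves it to the database to work out how to find it.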

If your design decision was to store this whole thing (the order and its items) as one big document (my term for it), the only way to retrieve this data is to literally open up every order, filter out the items you want and aggregate them. The only way to make this fast is to avoid reading every single document and processing it over and over. To do this optimisation, you need an index, or more likely several indices: on dates, on item types, etc. Indices require that you understand what is inside your "document" (in this case, line items). By definition, this implies a relationship: orders have, among other things, lines. This holds completely independently of the fact that you are storing the items inside the document/order.
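A minimal sketch of the difference, with a plain Python dict standing in for the KV store. The key format, the document layout and the function names are assumptions for illustration only, not the API of any particular product.

```python
from collections import defaultdict
from datetime import date, timedelta

today = date(2010, 11, 9)  # fixed "today" so the example is repeatable

# Hypothetical KV store: each value is a whole order "document".
store = {
    "order:1": {"date": today - timedelta(days=10),
                "lines": [{"item": "X", "qty": 8}, {"item": "Y", "qty": 1}]},
    "order:2": {"date": today - timedelta(days=30),
                "lines": [{"item": "X", "qty": 2}]},
    "order:3": {"date": today - timedelta(days=90),
                "lines": [{"item": "X", "qty": 10}]},
    "order:4": {"date": today - timedelta(days=5),
                "lines": [{"item": "Y", "qty": 20}]},
}


def matches(doc, item, min_qty, since):
    """True if the order is recent enough and has a qualifying line."""
    return doc["date"] >= since and any(
        line["item"] == item and line["qty"] > min_qty for line in doc["lines"]
    )


def count_by_scan(store, item, min_qty, since):
    """Open up every document, filter and aggregate."""
    return sum(1 for doc in store.values() if matches(doc, item, min_qty, since))


def build_item_index(store):
    """Secondary index: item -> keys of the orders that contain it.

    Building it requires knowing that a document has "lines" with an "item".
    """
    index = defaultdict(set)
    for key, doc in store.items():
        for line in doc["lines"]:
            index[line["item"]].add(key)
    return index


def count_by_index(store, index, item, min_qty, since):
    """Only open the documents the index says contain `item`."""
    return sum(
        1 for key in index.get(item, ()) if matches(store[key], item, min_qty, since)
    )


since = today - timedelta(days=60)
item_index = build_item_index(store)
print(count_by_scan(store, "X", 6, since))
print(count_by_index(store, item_index, "X", 6, since))
```

Note that the index has to be kept up to date on every write, and that a second index (on dates, say) would be needed to avoid opening old orders at all.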

So as a summary from my side - all data has some sort of structure, be it words within documents, or line items within orders. You can represent this any way you want. 

In the distant past we wrote all the items on a single piece of paper called an order. It was all on one physical page. The page contained all the information. Just as it is difficult to query pieces of paper (you need to either index them or summarise them in another way), it is difficult to query data with implied relations stored in a single "thing" (blob/object/values).

- It is very difficult to enforce invariance in KV stores
- It is very difficult to index KV stores
- It is hard work to query KV stores
- It is trivial to read from or write into KV stores
- It is hard to read from or write to a relational database (drivers, SQL, ...)
- RDBMS systems are hard to scale
- KV stores scale easily
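
To make the first point in the list concrete, here is a small sketch using Python's built-in sqlite3 module next to a plain dict playing the role of a KV store. The `invoices`/`receipts` tables and the key names are invented for the example; the point is only that the database refuses a receipt for an invoice that does not exist, while the dict happily stores it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY,"
    " invoice_id INTEGER NOT NULL REFERENCES invoices(id))"
)
conn.execute("INSERT INTO invoices VALUES (1)")
conn.execute("INSERT INTO receipts VALUES (100, 1)")  # valid: invoice 1 exists

# A receipt pointing at a non-existent invoice violates the constraint.
try:
    conn.execute("INSERT INTO receipts VALUES (101, 999)")
    rejected_by_db = False
except sqlite3.IntegrityError:
    rejected_by_db = True

# The same mistake in a KV store (a dict here): nothing stops the write.
kv = {"invoice:1": {}}
kv["receipt:101"] = {"invoice": "invoice:999"}
dangling = kv["receipt:101"]["invoice"] not in kv

print(rejected_by_db, dangling)
```

With a KV store, any such check has to live in application code, and every writer has to remember to run it.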

Rudolph van Graan

On Nov 9, 2010, at 1:59 AM, Richard O'Keefe wrote:

> 
> On 8/11/2010, at 4:06 AM, Steve Davis wrote:
>> It appears to me that this discussion is another expression of the
>> 'strong vs weak/dynamic vs static type' discussion.
>> 
>> ...it makes me suspect that an imperative and strongly-typed language
>> paradigm has been a very strong motivator in the evolution of SQL
>> databases; and perhaps the popularity of NoSQL/NotSQL is an expression/
>> outcome of the rise of recent trends in programming language uptake.
> 
> You *cannot* call the types in classic SQL "strong".
> Numbers, strings, and byte strings for everything is what Joe is complaining
> of and he is right to do so.  Encoding something as a string or blob basically
> results in the data base itself having no clue about the structure or meaning
> of the data.
> 
> It is important to understand that SQL is not a good example of a relational
> system.  A move away from SQL *could* be a move towards relational!
>
> One of the core concepts in relational systems is that of
> enforced integrity constraints, via primary keys and foreign keys.
> If you don't have any integrity constraints that you care to tell the
> data base about, you probably don't need a relational system.
> 
> 
