[erlang-questions] Re: Conceptual questions on key-value databases for RDBMs users

Tue Nov 9 17:01:08 CET 2010

Another observation...

I think a lot of it has to do with conceptual difference between SQL-RDBMS  
and NoSQL/kv-db/free-structure in terms of how both camps view the *role*  
of the database.

I've found that SQL-RDMBSs tend to think of the database as central to the  
design+development of an application. So naturally things must follow the  
db. My issue with this is that as an app developer, I found myself always  
begging and pleading with the database. The conversation would go  
something like...

Me: "Please, please store this data for me so I can query it later."

DB: "Not so fast. First, split that into three tables. Add a constraint  
here, foreign key there. Replace that with a type I can understand then  
I'll see what I can do. BTW, you'll have to have another conversation with  
my bro, the SQL query engine later. Go on... Off you go!"

The other camp seems to view the database role as more of a supplementary  
one in the design+development of an application. A helper. A tool. The  
conversation for me has gone more like...

Me: "Oi, Couch! Come over here. Hang onto this for me. I'll need that  
back."

DB: "Sure thing boss!"

- Edmond -

On Wed, 10 Nov 2010 00:38:37 +1100, Edmond Begumisa  
<ebegumisa@REDACTED> wrote:

> Hello Rudolph,
>
> Some comments...
>
> On Tue, 09 Nov 2010 20:43:14 +1100, Rudolph van Graan  
> <rvg@REDACTED> wrote:
>
>>
>>> One of the core concepts in relational systems is that of
>>> enforced integrity constraints, via primary keys and foreign keys.
>>> If you don't have any integrity constraints that you care to tell the
>>> data base about, you probably don't need a relational system.
>>
>> This was exactly what I meant with "invariance" earlier. If you have  
>> (strong) requirements for invariance in your model - an RDBMS is almost  
>> the only solution as integrity constraints etc are all first class  
>> concepts. If you don't have a requirement for invariance (as in  
>> Edmund's example):
>>
>
> In many cases (at least for a migrating RDBMS user), invariance is  
> *usually* indeed very important. And as you pointed out earlier there  
> are ways of faking this in the emerging class of NoSQL/kv-db, the  
> absence of which means tougher checks on code that's writing data (the  
> db is unlikely to do this for you.)
>
> I've found that when one is making the transition (from SQL-RDBMS to  
> NoSQL/Kv-db), the benefits of easier mapping of db-object to  
> application-object normally out-weight the loss of things like  
> constraints. I guess you have to trust more that programmers know what  
> they are doing and won't write bad data! Some people cringe when hearing  
> this, but I've found it's as bad as it sounds!
>
> However, I think as this newer class of DB start to attract more and  
> more RDBMS migrants, they will have to figure out a ways of solving  
> these issues. Some have already stared (CouchDB has some concept of data  
> validation.)
>
>>> It is important to understand that SQL is not a good example of a  
>>> relational system.
>>
>> Yes - SQL is a functional language where you state the solution in  
>> terms of relational concepts and ask the RDMBS to solve it and present  
>> you with an answer. Some of the Non-SQL systems uses search terms "Name  
>> = 'John'" or jscript for the same purpose. QLC is also an example (in  
>> the case of mnesia).
>>
>
> I'd go further and say SQL has evolved into a primary API for accessing  
> todays RDBMSs. You can do much more that just query for data. You can  
> design/redesign the db, add/remove indicies. Many vendors add elaborate  
> extensions like Microsoft's Data Shaping for hierarchical data. In many  
> ways SQL is like the non-standard standard shell language for many  
> databases.
>
>>> I think a big reason kv-stores are winning over a lot of us long-time  
>>> RDBMSs users is they allow us to model  
>>> things-that-have-things-inside-them in the database much closer to how  
>>> they are modeled in our applications. Orders/receipts/invoices with  
>>> their items, users with their permissions, all these map nicely in  
>>> kv-stores from db to application. This allows us to model  
>>> relationships only when WE REALLY WANT RELATIONSHIPS (this receipt  
>>> belongs to that invoice). That fact alone won me over and I've never  
>>> looked back.
>>
>> However, it is only simple to store things this way (the example of  
>> Orders/Receipts/Invoices with items "inside" them),
>
> I disagree, I store things this way not because it's simple for the  
> database but because that's how my applications understand those things  
> (BTW, those were *REAL* examples I was using orders/receipts/invoices  
> from real code in development. Code previously written against  
> SQL-RDBMSs and being re-writing against CouchDB. The latter is proving  
> to be a much more pleasant experience!)
>
> I've seen many SQL-RDBMS developers try to achieve a similar end-result  
> using object-relational mapping. You know, so when they create an  
> instance of a user class in say Java, a new user is automatically added  
> the users table in the db and corresponding entries are made in the  
> permissions table. I've never agreed with that way of doing things but  
> it *does* illustrate that there is a disconnect between how developers  
> want to organise data and how SQL-RDBMSs want them organise data.
>
>> if your only interest in the data is that the "outer" or container  
>> object encapsulates the items.
>
> This isn't true. At least not with the class of kv-database I'm  
> referring to. CouchDB (and I assume Mnesia) allows you to access the  
> 'inner' objects (to whatever depth) inside your queries. You can very  
> easily access the items inside an invoice.
>
>> Typically you want to read or write the whole thing in one operation.  
>> In real life, (and in my experience), you will pretty soon find that  
>> someone wants to know how many orders during the last 60 days included  
>> item X with quantities larger than say 6.
>>
>
> Easy. I do that sort of thing everywhere (no accounting tool would be  
> very helpful if it couldn't do those kind of complex queries).
>
> I don't know about query engines for other kv-dbs like Mnesia, but  
> CouchDB handles that sort of thing very well. Just create a view that  
> gathers the inner data according to the criteria you are looking for  
> (map step) then do the necessary tallying (reduce step).
>
>> If your design decision was to store this whole thing (the order and  
>> its items) as one big document (my term for it), the only way to  
>> retrieve this data is to literally open up every order, filter out the  
>> items you want and aggregate them. The only way to make this fast is to  
>> avoid reading every single document and processing it over and over.
>
> Couch is very smart at ensuring that any lengthy read/map-reduce  
> operations are cached in an internal data structure (B-tree I think) and  
> tries it's best to keep the cache update-to-date. So you normally don't  
> have to worry too much about the whole vs partial read/write operations  
> -- you just get on with business of writing your application using  
> whatever structures make sense to you. But if the query is one that is  
> rarely executed and lengthly, you can run it at startup so the next time  
> it runs it's faster.
>
> This is one of the things I mean when I say we have to make clear the  
> class of kv-store that the original poster should look into. An RDBMS  
> user looking for a suitable alternative isn't looking for just a plain  
> kv-store. You're normally looking for something that has a nice well  
> thought-out query engine on top and has figured a lot of these issues  
> out. The Couch team have done an *excellent* job here.
>
>> And to do this optimisation, you need an index, or more likely several  
>> indices - on dates, on items types, etc.
>
> Correct. For couch's case, when you're creating a query (a view in couch  
> lingo) what you're essentially doing at the map step is asking couch to  
> create an index for you. Couch tries it's best to be very smart at how  
> it handles/optimises that index.
>
>> Indices require that you understand what is inside your "document" (in  
>> this case line items).
>
> Correct. You can look at couch's map step as providing the query engine  
> an understanding of the documents being stored, including any contained  
> or deeply hierarchical* data.
>
> *YAY! I've _ALWAYS_ wanted this and tried desperately to simulate it in  
> SQL-RDBMSs with varying degrees of success.
>
>> By definition, this implies a relationship - orders have among other  
>> things - lines. Completely independent of the fact that you are storing  
>> the items inside the document/order.
>>
>
> Okay, I see what you're saying here. Indeed...
>
> http://wiki.apache.org/couchdb/EntityRelationship
>
> ... you can view embedded data as related data. I hadn't thought of it  
> that way.
>
> Let me make my argument clear by referring to what SQL-RDBMS folk tend  
> to think of as a relationship -- that is -- a constraint based on a  
> foreign-key. The kv-db equivalent of this would be a member of the  
> document that is merely a key pointing to another document rather than  
> meaningful data in it's own right. Now, using *that* definition...
>
> For Couch's case, the map-reduce paradigm can be effectively used so  
> that you *DON'T HAVE TO CREATE RELATIONSHIPS (of the external key  
> variety) IF YOU DON'T WANT THEM*
>
> This is my major argument against SQL-RDMBSs and in favor of  
> NoSQL/kv-dbs like Couch -- give me containers when I want containers and  
> give me relationships when I specifically want relationships.
>
> I don't want my orders to *relate* to lines elsewhere, this is not how  
> my application looks at orders. I want orders to *contain* items. On the  
> other-hand, I want invoices to *relate* to orders. Only in the latter  
> case does it make sense for me to have and foreign-key/id reference  
> stored in the orders. I want data in my db to look like data in my  
> application.
>
> The db should fit my application, my application shouldn't have to fit  
> the db. I always found I would invest considerable time with SQL-RDBMSs,  
> first trying to force the db to understand how I want my data organised,  
> then giving up and just adapting the code of my app to think like the db  
> -- in terms of tables and foreign keys.
>
> I've found NoSQL dbs like Couch treat me more like a grown up. Give me  
> the freedom to define and enforce my own rules in a way that makes sense  
> to me not in some generic way the *supposedly* adapts to every  
> situation. Sure, the RDBMS style may adapt to a wide array of  
> situations, but from my experience, this is never without pain.
>
>> So as a summary from my side - all data has some sort of structure, be  
>> it words within documents, or line items within orders. You can  
>> represent this any way you want.
>>
>
> True, but with most SQL-RDBMSs, things can get particularly hard. Aside  
>  from my relationship rant, a lot of data just doesn't want to be stored  
> in tables, and when you force it, exponential complexity arises.
>
>> In the distant past we wrote all the items on a single piece of paper  
>> called an order. It was all on one physical page. The page contained  
>> all the information. For the same reason it is difficult to query  
>> pieces of paper (you need to either index them or summarise them in  
>> another way), in the same way it is difficult to query data with  
>> implied relations stored in a single "thing" (blob/object/values).
>>
>> - It is very difficult to enforce invariance in KV stores
>
> I agree. And constraints in general are painful.
>
>> - It is very difficult to index KV stores
>
> This depends. For couch, map-reduce is a form of indexing -- a very  
> effective and surprisingly flexible one at that. Doors that are normally  
> closed in SQL-RDBMS world are suddenly opened wide in Couch.
>
>> - It is hard work to query KV stores.
>
> I disagree. I haven't found a query for *my* data yet that I cannot  
> express in a couch view. I'd go further and say I've found it easier  
> than SQL because I can organise data in a way that makes more sense to  
> my application. This somehow always translates to easier query  
> construction for the application domain.
>
> Perhaps it's a question of learning curve. SQL-RDBMS users are so used  
> to thinking in terms of SELECTS and JOINS, but that doesn't make SQL  
> easy! It just makes it easy once you know how. I think the same applies  
> to many kv-dbs.
>
>> - It is trivial to read from or write into KV stores
>> - It is hard to read from or write to database (drivers, SQL, ...)
>> - RDMBS systems are hard to scale
>
> I agree with all.
>
>> - KV stores scale easily
>
> Largely true. But sometimes it's not as easy as they promise. Some  
> things work well locally and break when you replicate. And you only  
> realise this when you try and scale out. There is some element of false  
> advertising when NoSQL folk promise simple scaling, there should be a  
> big red disclaimer attached.
>
> - Edmond -
>
>
>> Rudolph van Graan
>>
>>
>> On Nov 9, 2010, at 1:59 AM, Richard O'Keefe wrote:
>>
>>>
>>> On 8/11/2010, at 4:06 AM, Steve Davis wrote:
>>>> It appears to me that this discussion is another expression of the
>>>> 'strong vs weak/dynamic vs static type' discussion.
>>>>
>>>> ...it makes me suspect that an imperative and strongly-typed language
>>>> paradigm has been a very strong motivator in the evolution of SQL
>>>> databases; and perhaps the popularity of NoSQL/NotSQL is an  
>>>> expression/
>>>> outcome of the rise of recent trends in programming language uptake.
>>>
>>> You *cannot* call the types in classic SQL "strong".
>>> Numbers, strings, and byte strings for everything is what Joe is  
>>> complaining
>>> of and he is right to do so.  Encoding something as a string or blob  
>>> basically
>>> results in the data base itself having no clue about the structure or  
>>> meaning
>>> of the data.
>>>
>>> It is important to understand that SQL is not a good example of a  
>>> relational
>>> system.  A move away from SQL *could* be a move towards relational!
>>> such as
>>> One of the core concepts in relational systems is that of
>>> enforced integrity constraints, via primary keys and foreign keys.
>>> If you don't have any integrity constraints that you care to tell the
>>> data base about, you probably don't need a relational system.
>>>
>>>
>>> ________________________________________________________________
>>> erlang-questions (at) erlang.org mailing list.
>>> See http://www.erlang.org/faq.html
>>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>>>
>>
>
>

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/