[erlang-questions] Conceptual questions on key-value databases for RDBMs users

Mon Nov 8 14:49:35 CET 2010

Alceste,

Very true. Hierarchical data can be expressed even in simple SQL databases  
that don't support the self-reference/common-table constructs you  
describe. You can do interesting things with indices and almost-circular  
SQL-statements. Some DBs even have a whole API devoted to this sort of  
thing -- I used Microsoft's "Data Shaping" extension to SQL for quite some  
time. Nevertheless, it never feels as natural and is never as easy as it  
is in NoSQL/kv-databases.

Your historical note is interesting. It's almost as though we are going  
backwards with NoSQL, like in a lot of other areas of development that are  
proclaimed as "new" (e.g. I always looked at Ajax/Web 2.0 and thought -  
"wait a minute, that's just rich/smart client development that's been  
around since the 90s!")

Your observation of data creeping into a database that's not consistent  
with the application model is one of the arguments used for RDBMs against  
NoSQL. The idea that relationships and constraints can be used to  
safeguard bad data. I guess this is one of the things that you  
weight/prioritise when choosing either to go NoSQL or SQL.

My problem with relationships is that they tend to be used to implement  
too many things and introduce excessive complexity in an attempt to guard  
against bad data. From my experience, SQL is uses relationships/joins for  
two different things...

1) Containment: Orders contain items --- create a relationship.
2) Actual Relationships: Orders have corresponding invoices -- create a  
relationship.

The problem is containment is such a common thing that a good chunk of  
your relationships aren't really relationships. This is where relationship  
diagrams and SQL queries start to bloat. To the extent that I sometimes  
question whether the benefits gained from guarding against bad data are  
negated by the complexity introduced.

NoSQL/kv-dbs on the on-the-other-hand tend to give you real containment  
when you want it (even deep hierarchies if you need them), and  
relationships when you actually want relationships. The cost is less  
sophistication in auto-guarding against bad data.

The "SQL-way" is: don't trust the programmer, make sure everything fits  
the per-defined data model (even if it is inconvenient/doesn't map nicely  
against the application's model.)

The "NoSQL-way" is: trust the programmer, make it easier to model data  
that is convenient for/maps nicely against the application's model.

For me, the NoSQL-way makes more sense. The database should work for the  
programmer, not the other way around.

- Edmond -

On Mon, 08 Nov 2010 21:06:19 +1100, Alceste Scalas <alceste@REDACTED>  
wrote:

>
>  On Mon, 08 Nov 2010 04:16:43 +1100, "Edmond Begumisa"  
>  <ebegumisa@REDACTED> wrote:
>> The main reason I've switched to NoSQL (CouchDB) after many years of
>> using  SQL-RDBMSs is for modeling hierarchical data. RDBMSs force you
>> to do this  with relationships across different tables even when the
>> data belongs to  the same domain/object -- this feels unnatural and
>> results in the DB  equivalent of spaghetti code with hours staring at
>> relationship diagrams.
>
>  The relational model can express hierarchical relations.  SQL cannot
>  express the whole relational algebra, but it does allow to handle
>  hierarchical relations.  You can use self-referencing tables:
>
>  CREATE TABLE hierarchy (
>      id        INTEGER PRIMARY KEY,
>      parent_id INTEGER
>                REFERENCES hierarchy(id)
>                ON UPDATE CASCADE
>                ON DELETE RESTRICT,
>      ...
>  );
>
>  And then you could use common table expressions in queries:
>
>      http://en.wikipedia.org/wiki/Common_table_expressions
>
>  Funny note: the relational model was developed 40 years ago in
>  order to provide a declarative and formal method for structuring
>  data, and overcome the limitations of the DBMSs of the time ---
>  which were, in fact, mostly hierarchical and document-oriented,
>  and required a procedural interface in order to perform queries.
>
>  As a result, the data in hierarchical DBMSs was structured without
>  formal analysis, and only considering the immediate requirements of
>  the application that was going to use it.  When other applications
>  needed to access the DB, or the application itself required updates
>  which broke the mapping between DB objects and application objects,
>  then *huge* pains would arise...
>
>  Regards,

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/