Simple DB Access

Sat Aug 5 01:13:37 CEST 2006

OCI is very mature and would probably work ok as a linked in driver.
Our in-house API just makes OCI calls so you dont have to - OCI is
very... baroque to say the least.  You don't want to deal with that
stuff just to do the basics.  No wire-format necessary - you don't
want to go down that path, you're asking for a world of hurt.  For
example, most libraries which implement the oracle 'wire protocol'
can't handle things like RAC.

As for the atom thing - you can't have a SQL statement as an atom.  I
dont think a 5000 character atom is really supported well.  SQL
statements do get that big.  Typically a few hundred chars - still not
really atom-land.  Like I said, just hash the string and use that as a
unique key into a table of prepared statements.  Or whatever.

As for the SQL 92 parser - that is a little unnecessary.  Your client
library doesn't have to understand SQL.  If the statement doesn't
begin with 'select' (adjusting for white space) then it doesn't return
a resultset generally.  Underlying APIs don't always make you
understand the difference, and tell you specifically after it executes
what kind of statement it was.  Don't try to parse the SQL - that is
not your job.  Your job is to smooth over the C api and make it tasty
to erlang developers.

Again, re: your talk about resultSet metadata... go for the 80% use
case and make it easy.  Almost all the time you just want to get your
data, you know what order it is in, and just give me the data!  now!
All the extra meta data stuff is great for writing flexible GUIs which
let you do this and that, but most of the queries dont need that.  It
should be available somehow, but not as part of the main result set -
meaning don't return big huge data structures that have all the column
names in it, just return the raw data and nothing but the data.

As for the returning tags to indicate the kind of result, that sounds
good - probably better than my thought :-)

Again, re: transactions, never ever ever assume people will only need
auto-commit mode.  I can't stand that - transactions are THE reason to
use a RDBMs.  Otherwise just use BDB (which supports transactions
btw).  Ok, data richness too, but ssssh.

On to the topic of large resultsets.  The primary interface should
return huge amounts of data - if you forgot 'where' and now you
returned 10mm rows, an out of memory crash is as good as any.  Now,
the foldl() interface is in fact the only way large data sets should
be supported. The other option is to return slices of the dataset at a
time.  That seems kind of icky.

While its good to make Erlang attractive to non-functional
programmers, we must not forget the functional heritage of Erlang _is_
one of its core strengths.  If erlang ever introduces a 'for loop' I
will get very bent out of shape.

-ryan

On 8/4/06, Christian S <chsu79@REDACTED> wrote:
> On 8/4/06, Ryan Rawson <ryanobjc@REDACTED> wrote:
> > I dont think you have to make it harder to use adhoc concat SQL, just
> > make it stupidly easy to use binds.  Screw the whole 'prepare
> > statement' b.s. - that is why people dont use bind variables, you have
> > to create tons of extra variables and tracking things just to use a
> > bind.  Instead, push all that down into the API, and just provide an
> > api like:
> >
> > execSql( string SQLText, bind1value, bind2value, bind3value )
> >
> > or something.  Hash the SQLText in the API so you can see if this was
> > already prepared or not.
>
> Since it is easy to use pattern matching to tell if SQLText is an atom one could
> still allow registering statements, and only then take advantage of
> statement preparation.
> If it is a string, one just parse it over again. Erlang doesnt have
> varargs, but we have
> list litterals. Is it more fun to do:
>
> edbc:execute(Db, "SELECT * FROM foobar WHERE a = ? AND b = ?", ["foo", "bar"])
>
> than
>
> edbc:execute(Db, "SELECT * FROM foobar WHERE a = ?a AND b = ?b", [{a,
> "foo"}, {b, "bar"}])
>
> The later has potential to report better error messages, telling you
> that "parameter b still unbound" if no {b, Value} was given. Of
> course, erlang has a tradition of giving vague error messages. :)
>
> > Now, what is easier to write:
> >
> > execSql( "insert into foo (a,b,c)  values(" + escapeForSql( variable1
> > ) + "," + .  <etc etc> )
>
> Yeah, I get it. :) I think SQLite mainly had an
> escape-this-string-for-inclusion-as-a-litteral-in-a-statement
> approach. They do have it for blobs anyway. (Just me braindumping, I
> investigated that api to get a sense of sql api/protocol diversity.)
>
> << Anyone have a SQL92 parser written in erlang/yecc alraedy? :)  Not?
> Know where the bnf for SQL92 can be found? I recall seeing it in a
> book appendix once.>>
>
> Oh yeah. Postgres has a number of non-standard operators for quering
> map databases. With a SQL92 parser front end one would reject sql
> statements using them and force those users to "hit the metal" with
> the postgres driver directly. Good/Bad?
>
> > if it is a function that returns values, return a list of tuples
> > (result set).  Remember that for like 90% of the database APIs unless
> > you do "magic" the client side will retrieve the entire resultset from
> > the server anyways.  Many times I've thought that I could retrieve one
> > row at a time, then my client crashed because it tried to pull down
> > 500,000 rows.  You need a special call series for those cases (which
> > are generally special) - optimize the most common case, that would be
> > returning a small to medium # of rows (eg: 1-500 rows).
>
> Good that you bring this up, one ends up wondering how to handle huge
> query results while making sql apis. How about the simple query
> returns all as a result, leading to crashes if the result is big
> enough to exhaust memory, and then a secondary version that can handle
> some result-set like approach.
>
> Is it better to use a lists:fold like approach than returning a result
> set object for which one have various accessors and operators on? I
> suspect the first leads to compact code which is easily readable for
> erlang programmers with good grasp of first class functions, but quite
> the opposite for newbies with an imperative-programming mindset.
>
> > Also, for SQL that returns no rows (update, insert, etc), the # of
> > rows affected is your resultset.  Don't provide 2 API calls that the
> > programmer needs to figure out which one to call depending on select
> > or update.
>
> Returning different tuples for different statement results is
> effectively two kinds of calls. But something like the following isnt
> bad, is it?
>
> case edbc:execute(...) of
>   {result, Columns} ->
>     ...;
>   {update, Count) ->
>     ...;
>   . . .
> end
>
> Databases return different amount of information in their results.
> Postgres is quite rich, afaik it returns column names and column types
> (coded to numbers), beyond just the resulting rows. For every kind of
> query it will also tell if you are outside a transaction, inside a
> transaction, or inside an aborted transaction.
>
> One needs to know what every db will present in results. I suspect
> JDBC have done that job for us?
>
> > That is my API design guide.  Having using our own internal API to OCI
> > which behaves like I described more or less (except more like
> > sprintf/scanf in terms of providing binds, ie: pointers) vs JDBC,
> > Hibernate, PostgreSQL C api, i highly prefer our own API, despite the
> > C warts it has.
>
> I spent an hour clicking through the OCI docs a few days ago. Seems like plenty
> information about the C OCI api there in.
> They're not documenting the wire protocol, are they? How is your
> in-house lib built, reverse engineering or some NDA deal with Oracle?
> Have any experience with the OCI lib? Good enough to use as a
> linked-in-driver? Reentrant to run in a separate thread?
>
> > So for the love of all good API design, PLEASE don't make the
> > developer do all the work!
>