[erlang-questions] comma-less lists and tuples

Thu Sep 21 15:47:45 CEST 2006

Hi,

>
> What I mean is that IMHO if one wants a structured representation of
> something in textual form, the best way is to use a parser. As the
> lazy programmer I am, I don't want to do manually do the parser's
> work, while at the same time making the representation harder to read
> and a nightmare to maintain...
>
> > Do you mean teaching the compiler to parse SQL, or to leave
> > it as is and letting the compiler remain ignorant? ;-)
>
> How could the Erlang compiler help me write better SQL statements?
> Yes, if I write quasi-SQL, I get some syntax checking, but I still can
> write stupid things like
>     {select, {where, {x, '=', 3}}, foo, {from, [table]}}
> which is a correct Erlang term, but not correct quasi-SQL.
>
> In the particular case of SQL, we can also skip any parsing and do
> fine with strings - especially since sql is not only hard to parse,
> but there are many different dialects too. I'd rather let the
> database's parser take care of that.
>
> I don't really see what's wrong with concatenating strings :-)
> Escaping parts that come from outside needs to be done even with
> quasi-sql. Using a parse transform as per my previous example would
> make it even almost as readable as pure sql.
>
> > This should then be a metric for evaluating any
> > special syntax: do we actually gain something by forcing
> > the programmer to write quasi-SQL rather than SQL directly?
>
> Precisely. If it isn't obvious already, my opinion is that there is no
> gain in completely erlang-izing the sql language. But that's just me.
>

The first reason to use the "quasi-SQL" for is that it *guarantees*
that you will not expose yourself to SQL injection attacks, whereas
string contcatentation does not. It's like the difference between
coding in a high level language and coding in C. Can you avoid buffer
overflow attacks when writing in C? Sure. But for some reason many
smart people have a hard time avoiding them, which is one reason that
many projects avoid C/C++. Even if you're really smart and you were
able to prevent 99.9% of all buffer overflows in a large codebase,
that single one you didn't prevent can cost you a fortune. Plus, some
programmers on your team may not be as skilled as you in avoiding such
pitfalls. Using a tool that always prevents them can save you a lot of
pain down the road.

Users of ErlyDB and the MySQL driver have asked me, "Your examples use
(some) SQL directly, which makes you volnerable to SQL injection
attacks. How can I prevent them?" That's one of the primary reasons I
made ErlSQL: I wanted to tell them -- "here, use this." And it had to
be easy to use, to look like SQL, and to express a very large subset
of SQL because otherwise they would (legitimately) complain that it's
not intuitive to learn and/or that doesn't do everything they want it
to do.

For the second point: I have said a few times that ErlSQL code looks
much better than string concatenation code when it does a little more
than just literal translation. I'll try to drive the point home with
an example taken from ErlyDB, which makes the query for getting
related records in a many-to-many relation.

Here's the string concatention way:

make_get_related_many_to_many_query(OtherModule, JoinTable, Obj, WhereClause,
				    ExtraClause) ->
    OtherTableStr = atom_to_list(OtherModule),
    ThisTableStr = get_module_str(Obj),
    JoinTableStr = atom_to_list(JoinTable),
    WhereClause1 = make_nested_where_clause(WhereClause),
    FieldStrs = make_str_for_fields(OtherModule),
    Query =
	lists:append(
	  ["SELECT ", FieldStrs, " FROM ",
	   OtherTableStr, ",", JoinTableStr,
	   " WHERE (", OtherTableStr,
	   ".id = ", JoinTableStr,
	   ".", OtherTableStr, "_id AND ",
	   JoinTableStr, ".", ThisTableStr, "_id=",
	   get_id_str(Obj), ")", WhereClause1, ExtraClause]),
    Query.

make_str_for_fields(Module) ->
    TableStr = atom_to_list(Module),
    {Result, _} =
	lists:foldl(
	  fun(Field, {Str, IsFirst}) ->
		  FieldName = TableStr ++ "." ++ atom_to_list(Field),
		  Str1 =
		      if
			  IsFirst ->
			      FieldName;
			  true ->
			      FieldName ++ ","
		      end,
		  {Str1 ++ Str, false}
	  end, {"", true}, lists:reverse(Module:fields())),
    Result.

make_nested_where_clause(WhereClause) ->
    case WhereClause of
	undefined ->
	    " ";
	_ ->
	    " AND (" ++ WhereClause ++ ") "
    end.

Here's the ErlSQL way:

make_get_related_many_to_many_query(OtherModule, JoinTable, Obj,
WhereClause, ExtraClause) ->
    {select, OtherModule:shown_fields(),
     {from, [OtherModule:table(), JoinTable]},
     {where,
       {{{OtherModule, id},'=',
         {JoinTable, get_id_field(OtherModule)}},
          'and',
        {{JoinTable, get_id_field(Obj)}, '=', get_id(Obj)}},
           'and', WhereClause)},
     ExtraClause}.

Which do you think is more readable? Which is more writable?

Even putting readability aside, when the Mnesia driver is passed this
structure, it doesn't have to parse anything. It can pattern-match on
the queries that it knows it can handle and breaks on the rest. That's
a big win in simplifying the code.

Best regards,
Yariv