[erlang-questions] A question of style

Tue Mar 17 14:55:00 CET 2015

A Question of Style

The other day I was writing a program that called some library code
that I had written.

The library had a routine that could parse a string:

    -module(my_lib).
    -export([parse_string/1]).
     ...

In the code I was writing I had a binary B containing a string that I
wanted parsing.

I was faced with two alternatives:

   1)

       call my_lib:parse_string(binary_to_list(B))

 Or

   2)

     Add an additional interface routine to my_lib and call this from the module
     I was writing, so my_lib now becomes:

       -module(my_lib).
       -export([parse_string/1, parse_binary/1]).

       parse_binary(B) -> parse_string(binary_to_list(B)).

     and the calling code is

        my_lib:parse_binary(B)

For many years now I have almost automatically chosen option 2),
reasoning "make the library easy to use with lots of interface
functions".

But suddenly an alternative thought struck me:

"adding lots of convenience interface functions to the library code
makes the library code difficult to understand" - it is difficult to
see at a glance what the "essential functionality" of the library is,
and to distinguish the essential functionality from the convenience
and non-essential functions.

So is this true:

    easy to use the library == not easy to understand the library code

Looking through the Erlang library code, we see the nightmarish
results of the philosophy of "making the library code easy to call" -
there are loads of convenience functions cluttering up the code to the
extent that it becomes difficult to see what is going on.

(This seems paradoxical: libraries and frameworks that are "easy" to
use tend to be large, and horribly difficult to understand when they
don't do exactly what you want, yet understanding how they work seems
essential to using them correctly.)

Example: filenames. Are they strings, binaries, atoms or deep lists?
I'd guess you'll find all of these used inconsistently.

This multiple representation of filenames seems to be an example of
chronic "can't-make-your-mind-up-ism". Is it a bird or a plane? I
dunno, it's both.

With directory names things get even worse: there is all the
complexity of a filename, plus the added problem of whether or not
the directory name ends with a "/". Half the code in the system
assumes it does, the other half assumes it doesn't :-)

I got to thinking about libraries in general. It had been my
misfortune to program some C using the Mac audio API, where there are
so many functions that it's virtually impossible to distinguish the
essential functions (from which all other functions can be
constructed) from the convenience functions that merely call the
essential functions, omitting the odd argument or performing the odd
data-type conversion.

After a bit of thought I decided to rewrite parts of my library
code. I decided that internally I'd use only lists, and I'd remove all
the convenience functions.

The result was library code that was far shorter and easier to
understand. I made a design decision to minimize the use of binaries
for string processing and use only lists of integers: on input I
accept binaries and convert them to lists, and on output I convert the
lists back to binaries, but there is no converting back and forth in
the middle of my code. Previously I had a lot of code with
binary_to_list and list_to_binary all over the place; now almost all
my problems with UTF-8/Latin-1 etc. have vanished. The data comes in
as a UTF-8 binary (or something similar), is immediately converted to
a list of integer Unicode code points, and stays that way as long as
possible.
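A minimal sketch of that "convert at the edges, lists inside" discipline, assuming all input binaries are UTF-8. The module text_io and its functions are hypothetical, not part of the library discussed above; the conversions use the standard unicode:characters_to_list/1 and unicode:characters_to_binary/1.

```erlang
%% Hypothetical module illustrating "convert at the edges, lists inside".
%% Assumes all input binaries are UTF-8 encoded.
-module(text_io).
-export([decode/1, encode/1, char_count/1, reverse_text/1]).

%% Input boundary: UTF-8 binary -> flat list of Unicode code points.
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        Chars when is_list(Chars) -> Chars;
        _ErrorOrIncomplete -> erlang:error(badarg)
    end.

%% Output boundary: list of code points -> UTF-8 binary.
encode(Chars) when is_list(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) -> Bin;
        _ErrorOrIncomplete -> erlang:error(badarg)
    end.

%% Inside the module everything works on plain lists, so ordinary
%% list functions count and reorder characters, not bytes.
char_count(Bin) ->
    length(decode(Bin)).

reverse_text(Bin) ->
    encode(lists:reverse(decode(Bin))).
```

Only decode/1 and encode/1 know about encodings. Note that binary_to_list/1 on its own would return the UTF-8 bytes rather than the code points, which is why the sketch goes through the unicode module at the boundary.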

The more I think about it, the more I come to the conclusion that we
should not be writing polymorphic interfaces to libraries just to make
them easy to use. Instead we should be writing minimal libraries
containing only essential features.

We should make up our minds about the representations of things like
filenames and directory names, and we should enforce them uniformly
across all libraries. (My choice would be that filenames are always
represented by flat lists of Unicode integers, that directory names
always have a trailing "/", etc.)
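One way such a convention might be enforced is a single normalizing step where names enter a module. The sketch below is hypothetical (the module names and its functions are not an OTP API) and assumes binaries are UTF-8; it accepts the representations mentioned earlier and always returns a flat list of code points, with a trailing "/" for directory names.

```erlang
%% Hypothetical helper module: one canonical representation for names.
%% Filenames become flat lists of Unicode code points; directory names
%% additionally always end in "/". Binaries are assumed to be UTF-8.
-module(names).
-export([file_name/1, dir_name/1]).

file_name(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        Chars when is_list(Chars) -> Chars;
        _ErrorOrIncomplete -> erlang:error(badarg)
    end;
file_name(Atom) when is_atom(Atom) ->
    atom_to_list(Atom);
file_name(List) when is_list(List) ->
    %% Deep lists are flattened; atoms and binaries nested inside
    %% them are normalized too.
    lists:append([part(X) || X <- List]).

part(Char) when is_integer(Char) -> [Char];
part(Other) -> file_name(Other).

dir_name(Dir) ->
    case file_name(Dir) of
        [] -> erlang:error(badarg);   % refuse to turn "" into "/"
        Name ->
            case lists:suffix("/", Name) of
                true -> Name;
                false -> Name ++ "/"
            end
    end.
```

Every other function in a module can then assume the single canonical form instead of handling four representations itself.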

Paradoxically, the goal of making a particular library easy to use by
offering multiple polymorphic entry points not only makes it far more
difficult to understand, but also makes it harder to compose with code
in other modules. Polymorphic data types muck up the type signatures,
since the types of the polymorphic arguments tend to escape the module
and propagate silly types all over the place.

Note: a similar argument can be made for code that provides default
arguments to a generic function. Suppose we have an essential function
with seven arguments: should we provide half a dozen helper functions
that supply default arguments to the big function in different ways?
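To make the question concrete, here is a hypothetical essential function with four arguments and two of the default-supplying helpers that typically grow up around it; none of these names come from an existing library.

```erlang
%% Hypothetical example: one essential function plus the kind of
%% default-supplying helpers the question is about.
-module(pad).
-export([pad/4, pad_left/2, pad_right/2]).

%% Essential function: every choice is an explicit argument.
%% Side is left | right; strings already at least Width long are
%% returned unchanged.
pad(Str, Width, _Char, _Side) when length(Str) >= Width ->
    Str;
pad(Str, Width, Char, left) ->
    lists:duplicate(Width - length(Str), Char) ++ Str;
pad(Str, Width, Char, right) ->
    Str ++ lists:duplicate(Width - length(Str), Char).

%% Convenience helpers: each only fixes some arguments to defaults.
pad_left(Str, Width) -> pad(Str, Width, $\s, left).
pad_right(Str, Width) -> pad(Str, Width, $\s, right).
```

With seven arguments instead of four, the number of plausible helper combinations grows quickly, and each one is another export a reader has to check before concluding it does nothing that pad/4 cannot already do.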

So am I right? Should we junk all the convenience functions in a
module and stick to essential functionality, offering only one way to
do something?

I'll do this for a while and see what happens.

Cheers

/Joe