[erlang-questions] Why do we need modules at all?

Tue May 24 10:06:19 CEST 2011

Why do we need modules at all?

This is a brain-dump-stream-of-consciousness-thing. I've been
thinking about this for a while.

I'm proposing a slightly different way of programming here
The basic idea is

    - do away with modules
    - all functions have unique distinct names
    - all functions have (lots of) meta data
    - all functions go into a global (searchable) Key-value database
    - we need letrec
    - contribution to open source can be as simple as
      contributing a single function
    - there are no "open source projects" - only "the open source
      Key-Value database of all functions"
    - Content is peer reviewed

These are discussed in no particular order below:

Why does Erlang have modules?

There's a good an bad side to modules:

Good: Provides a unit of compilation, a unit of code
distribution. unit of code replacement

Bad: It's very difficult to decide which module to put an individual
function in. Break encapsulation (see later)

Aside: lib_misc.erl

When I'm programming I often get to the point were I say there should
a function foo/2 in lists.erl but their isn't. There should be but
there isn't - foo/2 is a small self contained thing. Why should it be
in lists.erl because it "feels right".

Strings are lists, so why do we have two modules lists.erl and
string.erl how should I decide in which module my new string/list
processing function should go.

To avoid all mental anguish when I need a small function that
should be somewhere else and isn't I stick it in
a module elib1_misc.erl.

My elib1_misc exports the following:

added_files/2                 make_challenge/0
as_bits/1                     make_response/2
as_bits_test/0                make_response_test/0
bdump/2                       make_test_strings/1
bin2hex/1                     make_test_strings_test/0
bin2hex_test/0                make_tmp_filename/2
check_io_list/1               merge_kv/1
collect_atom/1                merge_kv_test/0
collect_atom_test/0           mini_shell/0
collect_int/1                 module_info/0
collect_int_test/0            module_info/1
collect_string/1              ndots/1
collect_string_test/0         nibble_to_hex_char/1
collect_word/1                nibble_to_hex_char_test/0
complete/2                    odd/1
complete_test/0               on_exit/2
dos2unix/1                    out_of_date/2
downcase_char/1               outfile/2
dump/2                        padd/2
dump_tmp/2                    perms/1
duplicates/1                  perms_test/0
ensure_started/2              pmap/2
eval_file/1                   pmap1/2
eval_file_test/0              pmap1_test/0
eval_string/1                 pmap_test/0
eval_string_test/0            priority_receive/0
every/3                       random_seed/0
expand_env_vars/1             random_string/1
expand_file_template/3        random_string/2
expand_string_template/2      read_at_most_n_lines/2
expand_tabs/1                 read_at_most_n_lines_test/0
expand_tabs_test/0            remove_duplicates/1
expand_template/2             remove_duplicates_test/0
extract_attribute/2           remove_leading_and_trailing_whitespace/1
extract_attribute_test/0      remove_leading_and_trailing_whitespace_test/0
extract_prefix/2              remove_leading_whitespace/1
fetch/2                       remove_prefix/2
fetch_test/0                  remove_prefix_test/0
file2lines/1                  remove_trailing_whitespace/1
file2lines_test/0             replace/3
file2md5/1                    root_dir/0
file2numberedlines/1          rpc/2
file2numberedlines_test/0     safe/1
file2paras/1                  show_loaded/1
file2stream/1                 signed_byte_to_hex_string/1
file2string/1                 signed_byte_to_hex_string_test/0
file2template/1               skip_blanks/1
file2term/1                   skip_blanks_test/0
file_size_and_type/1          skip_to_nl/1
find_src/1                    skip_to_nl_test/0
first/1                       sleep/1
flatten_io_list/1             spawn_monitor/3
flush_buffer/0                split_at_char/2
for/3                         split_at_char_test/0
force/1                       split_list/2
foreach_chunk_in_file/3       split_list_test/0
foreach_word_in_file/2        string2exprs/1
foreach_word_in_string/2      string2exprs_test/0
forever/0                     string2html/1
get_erl_section/2             string2latex/1
get_line/1                    string2lines/1
get_line/2                    string2lines_test/0
have_common_prefix/1          string2stream/1
have_common_prefix_test/0     string2stream_test/0
hex2bin/1                     string2template/1
hex2bin_test/0                string2template_test/0
hex2list/1                    string2term/1
hex2list_test/0               string2term_test/0
hex_nibble2int/1              string2toks/1
hex_nibble2int_test/0         string2toks_test/0
id/1                          sub_binary/3
include_dir/0                 template2file/3
include_file/1                term2file/2
interleave/2                  term2string/1
is_alphanum/1                 test/0
is_blank_line/1               test1_test/0
is_prefix/2                   test_function_over_substrings/2
is_prefix_test/0              tex2pdf/1
is_response_correct/3         time_fun/2
keep_alive/2                  time_stamp/0
lines2para/1                  to_lower/1
list2frequency_distribution/1 to_lower_test/0
list2frequency_distribution_tetrim/1
longest_common_prefix/1       trim_test/0
longest_common_prefix_test/0  unconsult/2
lookup/2                      unsigned_byte_to_hex_string/1
lorem/1                       unsigned_byte_to_hex_string_test/0
ls/1                          which/1
                              which_added/1

Now I find this very convenient when I write a new small utility function
I stick in in elib1_misc.erl - no mental anguish in choosing a module
name is involved.

The observation that I find this very-convenient is telling me something
about modules - I like my elib1_misc it feels right.

(aside - It seems many development projects have their own private
lib_miscs ...)

Which brings me to the point of my question.

Do we need module's at all? Erlang programs are composed of lots of small
functions, the only place where modules seem useful is to hide a letrec.

The classic example is fibonacci. We want to expose fib/1 but hide the
helper function fib/3. Using modules we say

-module(math).
-export([fib/1]).

fib(N) ->
    fib(N, 1, 0).

fib(N, A, B) when N < 2 -> A;
fib(N, A, B) -> fib(N-1, A+B, A).

The downside is we have had to *invent* one module name math - whose *only*
purpose is to hide the definition of fib/3 which we don't want to be made
callable.

If we put a second function into the module math, then this second function
could call fib/3 which breaks the encapsulation of fib/3.

We could say:

let fib = fun(N) -> fib(N, 1, 0) end
in
   fib(N, A, B) when N < 2 -> A;
   fib(N, A, B) -> fib(N-1, A+B, A).
end.

I hardly dare suggest a syntax for this since I've been following
another thread in this forum where syntax discussion seem to encourage
much comment.

** Please do suggest alternative syntax's here - but do not comment on
other peoples suggestions ...

I would like to just talk about why we have modules.

Another question:

Does the idea of a module come from the idea that functions have to be
stored somewhere, so we store them in a file, and we slurp the
file (as a unit) into the system, so the file becomes a module?

If all the files were store by themselves in a database would this
change things.

I am thinking more and more that if would be nice to have *all* functions in
a key_value database with unique names.

lookup(foo,2) would get the definition foo foo/2 from a database.

The unique names bit is interesting - is this a good idea. Qualified
names (ie names like xxx:foo/2) or (a.b.c.foo/2) sounds like a good
idea but but when I'm programming I have to invent the xxx or the
a.b.c which is very difficult. It also involves the "decision problem"
if the namespaces xxx and a.b.c already exist I have to *choose* which
to put my new function in.

I think there might be a case for alises here joe:foo/2 could be used
while developing "joe" would expand to a horrible random local string the
real name being ab123aZwerasch123123_foo/2  but I would not be able to
publish my code or make it available to a third_part before I had
chosen a sensible name.

(( managing namespaces seems really tricky, a lot of peoople seem
to thing that the problem goes away by adding "." 's to the name
but managing a namespace with namees like foo.bar.baz.z is just as complex
as managing a namespace with names like foo_bar_baz_z or names like
0x3af312a78a3f1ae123 - the problem is that we have to go from a symbolic
name like www.a.b to a reference like 123.45.23.12 - but how do we discover
the initial name www.a.b? - there are two answers - a) we are given the name
(ie we click on a link) - we do not know the name but we search fo it ))

When programs are small we can live with "just the code" in "a few
modules" the ratio of code to meta data is high.

When programs are large we need a lot of meta-data to understand them.

I would like to see all functions with all meta-data in a data base.

I'd like to say:

   lookup(foo,2,Attribute) when Attribute =

      code|source|documentation|type signatures|revision history|authors|...

The more I think about it the more I think program development should
viewed as changing the state of a Key-Value database.

So I imagine:

    1) all functions have unique names
    2) there are no modules
    3) we discover the name of a function by searching metadata
       describing the function in a database
    4) all public functions (think open source) are in the same
       database

We could make a system to do this.

I think this would make open-source projects easier, since the
granularity of contribution goes down. You could contribute
a single function - not an entire application.

(( A problem with GUT style open source projects is there is
   not one database of functions, I often what one function from
   this project, another function from another project -- the
   granularity of reusable parts should be the individual function.

   functions are really easy to reuse
   modules are more difficult to reuse
   entire applications are very difficult to reuse
     (Unless there are isolated through a communication channel))

Possible extensions.

    1) Voting for promotion
    2) A review process

Given a raw database will *all* functions in it - we could derive an
"approved" functions database.

Popular functions could be moved to the approved database - the
review process would need to be discussed - so kind of peer-review/wiki
stuff.

Comments?

Volunteers?

/Joe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20110524/2c016dc8/attachment.htm>