New trading systems platform

Mon Jul 11 00:11:54 CEST 2005

I asked a trick question:

	> What should {1,2} + {10,20,30} do, and why?

and James Hague <james.hague@REDACTED> fell RIGHT into the trap.

	It should exit with a badarith code.  Why?  There's no clear meaning
	to adding vectors of different lengths.  (I tried it in J and got back
	"length error.").  Likewise, {1,2} + {1,two} should also result in
	badarith.

But there is another, considerably more popular, array language in which
applying a binary operation to vectors of different lengths IS defined,
and for excellent reasons:

	> c(1,2) + c(10,20,30)
	[1] 11 22 31

I am not going to say "this one is right, that one is wrong".  My point
is that there is no *OBVIOUS* answer, and it takes a great deal of hard
thinking to produce a real design.  It's NOT just a matter of hacking
on the VM.
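
To make the two choices concrete, here is a minimal sketch of both
policies written as ordinary Erlang functions over tuples.  The module
and function names are made up for illustration, and nothing like this
exists in the VM.  (R itself also prints a warning when the longer length
is not a multiple of the shorter one; the sketch leaves that out.)

```erlang
-module(vec_add_sketch).
-export([add_strict/2, add_recycle/2]).

%% Policy 1 (the J-style answer): operands must have the same length,
%% otherwise the call fails with badarith.
add_strict(A, B) when is_tuple(A), is_tuple(B),
                      tuple_size(A) =:= tuple_size(B) ->
    list_to_tuple(lists:zipwith(fun(X, Y) -> X + Y end,
                                tuple_to_list(A), tuple_to_list(B)));
add_strict(A, B) when is_tuple(A), is_tuple(B) ->
    erlang:error(badarith, [A, B]).

%% Policy 2 (the R-style answer): the shorter operand is recycled.
%% Empty tuples are deliberately not handled in this sketch.
add_recycle(A, B) when is_tuple(A), is_tuple(B),
                       tuple_size(A) > 0, tuple_size(B) > 0 ->
    N = erlang:max(tuple_size(A), tuple_size(B)),
    list_to_tuple([element(pick(I, A), A) + element(pick(I, B), B)
                   || I <- lists:seq(1, N)]).

%% 1-based index into T that wraps around when I exceeds its size.
pick(I, T) -> (I - 1) rem tuple_size(T) + 1.
```

With these definitions, add_recycle({1,2}, {10,20,30}) gives {11,22,31},
while add_strict on the same arguments raises badarith.  Both are easy
to write; choosing between them is the hard part.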

Oh yes, and {1,2} + {1,two} tells us that there is a question about what,
precisely, the type error should complain about.  Should it complain about
(2, two) or should it complain about ({1,2}, {1,two})?  You can argue this
one either way.  We need a coherent principle (or small set of such
principles) which will let us decide such questions consistently.
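
As a sketch of the two reporting policies (again with hypothetical names,
and with an error term shape of {badarith, Culprit} that I am assuming
purely for illustration), the difference is only in what gets blamed:

```erlang
-module(badarith_report_sketch).
-export([add_report_elements/2, add_report_operands/2]).

%% Policy A: blame the first pair of elements that cannot be added.
add_report_elements(A, B) when tuple_size(A) =:= tuple_size(B) ->
    list_to_tuple(lists:zipwith(fun add_pair/2,
                                tuple_to_list(A), tuple_to_list(B))).

add_pair(X, Y) when is_number(X), is_number(Y) -> X + Y;
add_pair(X, Y) -> erlang:error({badarith, {X, Y}}).

%% Policy B: blame the operands as the programmer wrote them.
add_report_operands(A, B) ->
    try add_report_elements(A, B)
    catch
        error:{badarith, _} -> erlang:error({badarith, {A, B}})
    end.
```

For {1,2} and {1,two}, the first policy reports {2,two} and the second
reports {{1,2},{1,two}}.  Neither is wrong; they just answer different
questions.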

	True, I can understand that point.  But at the same time, with
	test-driven development, I don't see it as any different than other
	issues caused by dynamic typing.

My point here is that it smashes a powerful new debugging tool for Erlang,
the type inference program we've been hearing about recently.  Adding this
feature is NOT just a matter of hacking on the VM, it would require
serious work on the type inference program and other high-powered tools.

I didn't say that it was any different from other issues (although I would
be prepared to argue that).  What I said meant that it ADDS to other issues.
There's a famous paradox in philosophy:
    one grain of sand is not a heap.
    adding a grain of sand to a bunch of sand is obviously too small
    a change to convert a non-heap into a heap.
    yet if you keep on adding grains of sand, eventually you DO have a heap.
An addition could perfectly well be similar in kind to other things in a
non-heap, and you might think that making one more change won't convert a
non-heap language into a useless-heap language, but if you KEEP making such
changes, a useless-heap is what you will get.

I still have unpleasant memories of PL/I, where you could add just about
anything to just about anything, whether it made sense or not.

Take for example (1+'2').  There obviously *is* a number inside that quoted
atom, so why *not* let it be extracted?  (Would you get 3 or '3'?)  If a
binary happens to be the term_to_binary() representation of a number, or
of a tuple with numbers inside it, &c, why *not* hack on the VM so that
it automatically tried binary_to_term() any time that a binary as such made
no sense?
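
To see how quickly the ambiguity shows up, here is a sketch of the two
readings of (1+'2') as explicit functions.  Both names are hypothetical;
this is what such a coercion might look like, not something I am
proposing.

```erlang
-module(coerce_sketch).
-export([add_to_number/2, add_to_atom/2]).

%% Reading 1: the atom is coerced to an integer and the result is a number.
%% An atom such as 'two' makes list_to_integer/1 fail with badarg.
add_to_number(N, A) when is_integer(N), is_atom(A) ->
    N + list_to_integer(atom_to_list(A)).

%% Reading 2: the result is converted back to an atom.
add_to_atom(N, A) when is_integer(N), is_atom(A) ->
    list_to_atom(integer_to_list(add_to_number(N, A))).
```

add_to_number(1, '2') gives 3 and add_to_atom(1, '2') gives '3'.  Written
out as functions, the choice is visible at the call site; built into "+",
it would be invisible.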

We have to draw the line *somewhere*, and what we have now is tolerably
coherent.

	> It might, for example, be better to introduce a whole new "array" data type;
	> that would be much more work, but it could yield better performance (using
	> long-known techniques from APL) without sacrificing any of the run-time
	> type checking we now have.

	Strictly from a selfish point of view based on the kind of
	applications I work on, I'd like to see "array of float" as a
	fundamental type.  Floats are individually heap allocated, so there's
	a big win to putting them in a homogeneous array (OCaml has taken a
	similar route).

It's also what Squeak Smalltalk has done, and if I've understood correctly
the GHC Haskell compiler supports this in a rather clever way.  (And of
course Clean has supported this for *ages* without having to be clever
about it.)

	But from a conceptual point of view,
	arrays and tuples are the same thing, so why split them up?

Because from a conceptual point of view arrays and tuples *AREN'T* the
same thing.  Valid analogues are array:list and record:tuple.  (As you
may have noticed, Erlang -records *are* tuples.)  Sure, some tuples may
*happen* to have fields that are all the same type, but that's not usual,
just as C structs may happen to have fields all of the same type, but
usually don't.
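
The record:tuple point is easy to check.  A minimal sketch, using a
made-up record called point:

```erlang
-module(record_tuple_sketch).
-export([demo/0]).

%% Hypothetical record, used only for this illustration.
-record(point, {x, y}).

demo() ->
    P = #point{x = 1, y = 2},
    true = is_tuple(P),        % a record value is a tuple
    {point, 1, 2} = P,         % tagged with the record name
    3 = tuple_size(P),         % the tag plus the two fields
    ok.
```

The fields are found by position and each may hold any type, which is
the struct-like behaviour described above, not the array-like one.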

There are oodles of array operations that make no sense at all on tuples;
it would be nice to have a data type which supported those operations.