[erlang-questions] The compiler "eats" structures which are not separated by commas

Mon Apr 30 03:05:42 CEST 2012

On 27/04/2012, at 11:08 PM, Jan Burse wrote:
> If it were possible to create a frame data structure
> with very good footprint and performance properties, I
> would have used it already back then.

As described in the frames proposal,
a frame with n slots takes between n+1 and 2(n+1) words
depending on whether the structure was expressed statically
or dynamically.

At the lower end (which is what is comparable to OO systems)
it _is_ comparable to OO systems and if you know a way to do
better I would be delighted to hear it.  At the upper end,
it is at least a lot better than a hash table, and if you
know a better way to store dynamically constructed associations
I would be delighted to hear that too.

As for performance, the only performance that matters to me (as
this is Erlang) is performance relative to records.  I now have
some measurements which are quite encouraging.  Even better, I
can see how a module-level data flow analysis (thus one that is
compatible with hot loading) could make frames precisely as
efficient as records in some important uses in Erlang, though I
have not had time to write that up yet.

> But when I looked
> at it back then, I found some inherent limitations that
> I guess cannot be overcome,

I guess you could tell us what they are.
> 
> I guess there are theoretical papers around that
> show these limitations inherent in polymorphism.

If they are inherent in polymorphism, then they apply every bit
as much to Erlang records and Java objects as they do to frames.

It is often possible to evade 'inherent limitations'; what we
want to do on any particular occasion often has some special
property missing from the general case that is limited.  (As
a very well known example, the Simplex algorithm is in principle
exponential, but people happily used it quite successfully on
very large problems for years, and it almost always worked fine.)

The limitations are presumably just constant factor ones, and in
a world where people happily use things like Java, constant factors
that are not _too_ big need not stop something being useful.

> But statements as follows:
> 
> ROK wrote:
> > Frames make do without *any* static type information;
> > that's what the design is about.  But they can take
> > advantage of type information if available.
> 
> Are especially dangerous. They insuinate that late
> binding (*) can be solved performance wise.

Rubbish.  My statement means precisely what it says, no more
and no less.  Frames *DO* in fact make do without static type
information.  And they *CAN* take advantage of type information
*IF* it is available.

This is not a claim that 'late binding can be solved performance
wise' (whatever that means), but on the contrary and quite
obviously a claim that if what you are doing is *NOT* late
binding that can be noticed and taken advantage of.

It is also not a claim that when type information *is* exploited
the result is as efficient as it could be in a system that did
not allow late binding.

This is *precisely* like the way that Java objects are theoretically
heap allocated, not stack allocated, but papers have been published
showing that an 'escape analysis' can allow a Java compiler to
allocate some objects on the stack after all: the language has no
restriction on object lifetimes but IF you are using an object in
a way which IS restricted that can be noticed and exploited.

Far from making any outrageous claims about solving impossible
problems, this is a commonplace of programming language design and
implementation.  There are many examples.  For example, in Smalltalk,
closures are in principle allocated whenever execution hits a block,
but if certain built-in methods are not overridden, many blocks can
be inlined, many other blocks can be statically allocated (about a
third of the ones that aren't inlined, in my system), and a simple
static analysis lets you discover a substantial number of blocks
that can be stack-allocated (like Pascal or GNU C nested functions)
provided you box exceeding clever in #doesNotUnderstand: .
It's almost as simple as noticing that a program that doesn't create
any threads other than the initial one doesn't need to bother with
locks in stdio.

The important fact for frames is that frames are meant to be used as
a replacement for Erlang records, and Erlang records *are* used in a
statically-typed way.  

> I am primarily
> concerned with the idea that some borders can be pushed
> that have not already been push for example by my contrast
> OO language Java. So I am awaiting clarification:
> 
> JB wrote:
> > In which proposal is this substantiated?

In which proposal is WHAT substantiated?

The exploitability of type information?

Come ON:  given the representation described in frames.pdf,
if you see a slot reference ~foo(Bar) and you have type
information (Bar is a frame with these fields having those
types) similar to the information currently available about
records, then the compiler would know just as much about that
reference as a Pascal, C, or indeed Java compiler would know
about Bar^.foo (Pascal), Bar->foo (C), or Bar.foo (Java).
That would not stop the *same* value being passed to a
function that *didn't* have the type information and being
used there.