geometric memory growth

Mon Nov 28 00:54:50 CET 2005

--- Ulf Wiger <ulf@REDACTED> wrote:

> Den 2005-11-25 13:17:31 skrev Thomas Lindgren
> <thomasl_erlang@REDACTED>:
> >
> > --- Ulf Wiger <ulf@REDACTED> wrote:
> >
> >> The obvious fix:
> >>
> >>     Channel = State#state.channel,
> >>     SendF = fun(To, Msg) ->
> >> 	          msg(Channel,
> >> 	          ChNo, To, Msg)
> >> 	      end,
> >>
> >> Now, wouldn't it be great if the compiler could
> >> figure out how to do this
> >> at compile-time?
> >
> > Hoisting the expression out of the fun is only
> safe if
> > State is known to be a #state record at this
> point.
> > Otherwise, you can get an exception in the wrong
> > place.
> 
> Forgive me, but I can't help thinking that an
> exception
> when the fun is applied would be the wrong place,
> rather
> than when the fun is instantiated. The fun's
> environment
> is bound when it the fun is bound, so I think it
> would
> be intuitive to get the exception at the binding
> occurrence rather than later. After all, the record
> field
> access can have no side effects (apart from the
> run-time
> error when accessing the field of a non-record), 

In this case, it's the possible runtime error that is
the problem. In principle, the compiler could
speculatively generate something like:

if
   record(State, state) -> Channel =
State#state.channel, fun ... end;
   true -> fun ... State#state.channel ... end
end

(But in that case, is the cost of the extra code worth
the saved effort?)

> and
> semantically, the access _has_ to be done at the
> binding
> occurrence - if it is delayed, one has to lug the
> entire
> record along to make sure that the record field
> access
> has the same result as it would have had at the
> binding
> occurrence.

I would say that semantically this evaluation MUST NOT
be done at the binding occurrence, IF we can observe a
functional difference compared to the original program
evaluated naively. You have told the compiler that the
program shall behave AS IF the evaluation occurs
inside the closure; changing program behaviour is
usually considered bad form.

> You said there are corner cases. I'd like to hear
> about
> them. I would say that it isn't obvious either from
> the
> erlang 4.7 specification or the erlang reference
> manual
> what exactly is supposed to happen.

If one considers moving expressions out of closures in
general, then there are plenty of cases, some of them
mentioned by other posters. (See below for a
discussion on record fields in particular.) First, the
new program must preserve side effects (including
exceptions), which ones occur and in what order.
Second, the new program should (almost always) be
faster than the original program. Given that source
programs vary widely, there are lots of possibilities.
Here is one:

A = ..., F = fun() -> B = g(A), ... end.

Let's assume g(A) can be evaluated safely. Should the
compiler move B out of the closure? If yes, what if B
is a large data structure? What if F is copied here
and there? Recomputing g(A) MAY be cheaper than
dragging around B. And what if g(A) is expensive and F
is never used, or seldom used? Computing B may then
often be a waste of effort. The compiler thus needs to
convince itself that the optimization is not only safe
but also profitable.

(I remember encountering a similar, though not
identical, code-motion decision problem in an
unfinished pass in an early HIPE compiler, by the way:
the compiler saw this code:

t1 = load(x+k);
t2 = load(x+k2);
...
if (test) then L1;
/* t1 ... tn forgotten afterwards */

In this case, we are doing a sequence of loads that
are partially dead. So ... should the compiler move
the loads to L1, where they are _definitely_ needed,
and as a consequence make the fall-through path
faster? [see below for answer])

For plain record accesses: assuming safeness, the
compiler basically has to consider the following trade
off for profitability: if we extract N fields from the
record State, when is the cost of creating a closure
with (N-1) more free variables less expensive than
performing the accesses every time the closure is
used? How many times is the closure used? (The
compiler can sometimes be clever about how closures
and free variables are implemented, so this decision
must also take that into account.) In particular, are
there cases when the closure is not used?

The compiler usually resorts to heuristics at this
point. For a production compiler, the idea is usually
to avoid making the rare but spectacularly bad
decisions :-)

Best,
Thomas

* The answer: in this case, the jump to L1 was usually
taken and t1,...,tn were used shortly afterwards. So
keeping them put meant fewer load stalls compared to
moving them down, though at the cost of unnecessary
work in the infrequent case. 

__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com