[erlang-questions] potential compiler improvement?

Tue Oct 27 17:37:08 CET 2009

I suspect unpacking 7 or 5 arguments takes more or less the same time.

This involves accessing *p, *(p+1), etc. (in C). My understanding of
things was that once you had accessed *p, accessing *(p+1) was
essentially free. Accessing *p pulls the data you need into the cache
(which is potentially expensive); once it is in the cache, accessing
*(p+1) is free, provided it lies on the same cache line. What matters
is cache misses. Aligning the start of the tuple on a cache-line
boundary should give a great win, as should making sure that the
entire tuple will fit in the cache. This is very processor-specific,
so you'd need a JIT and deep magic to do it.

Sticking stuff in different parts of memory and making sure that code
does not span page boundaries will also help: don't have a tight loop
spanning a page boundary.

Erik Hagersten used to optimize C by physically moving C subroutines
around in a file: shuffle, recompile and measure. And the programs went
faster (a bit). We thought he was nuts; now he has a company that
measures this :-) It's the cache.
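In the same "recompile and measure" spirit, the suspicion that unpacking 7 or 5 tuple elements costs about the same can be checked with a micro-benchmark. The module below is a minimal sketch under stated assumptions: the module name `unpack_bench` and the functions `sum7/1`, `sum5/1`, `run/2` and `loop/3` are hypothetical, invented for illustration; only `timer:tc/1` from the standard `timer` module is a real library call. `sum7/1` binds all seven elements, while `sum5/1` matches the same 7-tuple but binds only the first five.

```erlang
%% unpack_bench: a hypothetical micro-benchmark module (not from the thread).
-module(unpack_bench).
-export([sum7/1, sum5/1, run/2]).

%% Matches a 7-tuple and binds all seven elements.
sum7({A, B, C, D, E, F, G}) ->
    A + B + C + D + E + F + G.

%% Matches a 7-tuple but binds only the first five elements;
%% the last two are matched with _ and never used.
sum5({A, B, C, D, E, _, _}) ->
    A + B + C + D + E.

%% Calls Fun on the same 7-tuple N times and returns the elapsed
%% time in microseconds, as reported by timer:tc/1.
run(Fun, N) when is_function(Fun, 1), is_integer(N), N >= 0 ->
    T = {1, 2, 3, 4, 5, 6, 7},
    {Micros, ok} = timer:tc(fun() -> loop(Fun, T, N) end),
    Micros.

%% Tail-recursive driver loop; the result of each call is discarded.
loop(_Fun, _T, 0) ->
    ok;
loop(Fun, T, N) ->
    _ = Fun(T),
    loop(Fun, T, N - 1).
```

From the shell, after compiling the module, compare `unpack_bench:run(fun unpack_bench:sum7/1, 10000000)` with the same call for `sum5/1`. The indirect fun call in the loop probably costs more than the element extraction itself, so small differences may disappear in noise, and any numbers are specific to the machine and cache they were measured on.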

Tricky stuff

/Joe

On Mon, Oct 26, 2009 at 10:02 PM, Robert Virding <rvirding@REDACTED> wrote:
> 2009/10/26 James Hague <james.hague@REDACTED>
>
>>
>> In a pattern match, all referenced values in tuples are loaded into BEAM
>> registers. This happens right up front as part of the matching process. And
>> it happens even if the values aren't used until some time later, and even
>> if they're not used at all in a particular branch of code. Here's an example:
>>
>> test({A, B, C, D, E, F, G}) ->
>>   case A of
>>      this -> B + C + D;
>>      that -> E + F + G
>>   end.
>>
>> In this case, all seven tuple elements, A-G, are loaded into registers
>> before the "case" is executed, even though three of the values are
>> unneeded in each of the two possible branches.
>>
>
> Now the compiler just looks at which elements of the tuple may be accessed
> and only extracts those. So if you only used A, B and C it would only
> extract the first 3 elements. It could save the whole argument and only
> extract the elements when they are used, but this would entail keeping the
> whole tuple around for as long as it may be used. In the general case it
> could create more instructions, or maybe fewer, depending on what
> follows. The analysis would be more difficult, though probably not
> excessively so. As it is now, the analysis and code generation are easier.
> It could also delay freeing the tuple, which would mean more work for the
> garbage collector.
>
> I have absolutely no idea of the results of these trade-offs.
>
> Robert
>
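The alternative Robert describes, keeping the whole tuple and extracting elements only in the branch that uses them, can be approximated by hand. The module below is a sketch, not compiler output: the module name `lazy_extract` and its functions are hypothetical. `eager/1` is James's example unchanged; `lazy/1` checks the tuple's size once and calls `element/2` inside each branch. Nothing here says which version is faster on a given compiler or VM; that would have to be measured.

```erlang
%% lazy_extract: hypothetical module illustrating eager vs. delayed
%% extraction of tuple elements (not from the thread).
-module(lazy_extract).
-export([eager/1, lazy/1]).

%% James Hague's example: A..G are all bound in the clause head.
eager({A, B, C, D, E, F, G}) ->
    case A of
        this -> B + C + D;
        that -> E + F + G
    end.

%% Hand-written delayed version: keep the whole tuple T, check its
%% shape once in the guard, and extract only the elements each
%% branch actually needs.
lazy(T) when tuple_size(T) =:= 7 ->
    case element(1, T) of
        this -> element(2, T) + element(3, T) + element(4, T);
        that -> element(5, T) + element(6, T) + element(7, T)
    end.
```

In `lazy/1` the tuple `T` must stay live until the chosen branch has finished reading from it, which is the kind of delayed freeing Robert points to as extra work for the garbage collector.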