[erlang-questions] speeding up 3N+1

Thu Apr 12 08:08:04 CEST 2007

On 8 Apr 2007, at 1:18 am, June Kim wrote:
> Please have a look at this famous problem. http://acm.uva.es/p/ 
> v1/100.html

cycle_length(N) when is_integer(N), N >= 1 ->
     cycle_length(N, 1).

cycle_length(1, L) -> L;
cycle_length(N, L) ->
     cycle_length(case N band 1 of 1 -> 3*N+1; 0 -> N div 2 end, L+1).

The conditions of the problem ensure that this will not in fact involve
any cycle at all; "cycle length" is an astoundingly bad name for the
quantity to be computed.

max_cycle_length(I, J) when is_integer(I), is_integer(J), I =< J ->
     max_cycle_length(I+1, J, cycle_length(I)).

max_cycle_length(I, J, M) when I > J -> M;
max_cycle_length(I, J, M) ->
     max_cycle_length(I+1, J, max(L, cycle_length(J)).

max(X, Y) when X > Y -> X;
max(_, Y) -> Y.

Simple.  Quite fast enough.  I just asked for max_cycle_length 
(1,1000000)
and got an answer in 43.8 CPU seconds on a 500 MHz machine.  I know you
speak of your "sluggish machine", but was it as slow as mine?  How fast
does the calculation NEED to be?

> cycle(N) ->
>     case get(N) of

Yes, memoisation can give you very worthwhile speedups; IF you have a  
big
problem with repeated calculations to start with.  However, maintaining
the memory of function arguments and results itself takes time, and  
above
all, space.  And on today's machines, more space definitely means less
space.  (I've been measuring load times as
	from L1 cache:	2-3 cycles
	from L2 cache:  10 cycles
	from main memory
	(sequential): ~100 cycles
	(random):    ~1000 cycles
on three different kinds of machines lately.  The C version of the code
above runs entirely from registers; making it poke around in hash tables
would probably slow it down by a factor of a thousand just because of  
using
more memory.

If you do want to memoise, it's probably worth keeping the memo small.
Try keeping the cache to just 2*(0..1023)+1.

The comparatively small ratio between 3 seconds on your "sluggish  
machine"
and 43.8 seconds on my 500 MHz machine (14.6) suggests to me that  
there IS
a certain amount of repeated calculation but not a huge amount, so a  
small
cache might work very well.