[erlang-questions] Erlang: Who uses it for games?

Thu Jul 8 16:47:29 CEST 2010

While yes, the SPUs are mainly for vector processing, imo they could be
suited quite well in practice to Erlang processes.
256 KB is the total space each one has for code and data, so they would be
great for small processes. Erlang processes already have all their data
immutable, which should mean little need to DMA info in and out of the SPUs.
While not designed for it, the SPUs can still be used for general processing,
and in fact it is better to use them for something they aren't so good at
than not to use them at all. You'll find that a large number of PS3 games,
even today, make use of less than 75% of the SPUs.

Since most Erlang applications are already based on message passing, the
processes need not share their memory with each other. Obviously, with size
being the key factor here, one would still have to fit the code in < 150 KB
to be able to do much with the processes. For processing long lists or large
binaries, one might be able to do double (or triple) buffering and still
keep decent speeds. This could be done in Erlang with minimal NIF support,
see below.

I'd see it like this: you'd spawn a process and mark it as being "SPU
friendly" so that the SPUs can find it when they look for the next process
to execute. An SPU would then DMA in the code and process data, run it, and
then move on to the next process. For long-running processes, one could
artificially "yield" to allow the SPU to run another task before returning
to it, with something like a yield() NIF. Obviously, only a subset of Erlang
would run out of the box, as things like I/O and kernel stuff would require
message passing with a process on the PPU to execute them. I could see
number-crunching processes having a partner process on the PPU whose job
would be to partition/segment the data and feed it to the SPU process on
demand. The SPU process could then do something like:

%% Process data in 50 KB chunks, or perhaps less.
%% part1/1, part2/1 and begin_receive_msgs/0 are placeholders, not real
%% functions.
loop(PPUPartner, SomeData, CurrPos) ->
    %% DMAed and sent to the partner process executing on the PPU. Might not
    %% be needed if the partner process just splits everything up front and
    %% queues all the processing messages, or keeps track via the 'processed'
    %% message and keeps the queue full or at a decent size.
    PPUPartner ! {send_next, CurrPos},
    %% Process the current chunk, part 1. This doesn't have to be SIMD stuff,
    %% but very well could be (using NIFs).
    Partial = part1(SomeData),
    %% NIF to start the DMA of messages; this would start streaming in the
    %% next data chunk. Optional, only there to reduce latency from the
    %% blocking receive statement, which has to DMA any new messages and wait
    %% for their data.
    begin_receive_msgs(),
    %% Process the current chunk, part 2.
    Result = part2(Partial),
    %% Send the calculated result back to the PPU partner process, which
    %% gathers the results. Come to think of it, this pattern is similar to
    %% map-reduce.
    PPUPartner ! {processed, CurrPos, Result},
    receive
        {data, Data, Pos} -> loop(PPUPartner, Data, Pos);
        done -> ok
    end.

When the process ends execution, the SPU would run GC if needed, wait for
all pending DMAs to finish and grab the next process. On the PPU side, the
partner process mostly idles, waiting for the SPU to be done with the data
and either request more or send back results. It would do simple operations
like prepending each result to the head of a list for max performance and
then, at the end, simply run lists:reverse (as done traditionally anyway)
or, for long lists, run that operation on the SPUs as well
(lists:spu_reverse?). One PPU partner process could easily split work among
several SPU processes, feeding them and gathering/sorting their results.
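
As a rough illustration of the partner side, here is a minimal sketch that
runs on ordinary Erlang processes, with nothing SPU-specific in it. The
module name ppu_partner, the run/3 entry point and the {send_next, WorkerPid}
message shape (a pid instead of the position used above, so the partner knows
where to reply) are assumptions made for the example. The calling process
plays the PPU partner and plain spawned processes stand in for the SPU
processes.

```erlang
-module(ppu_partner).
-export([run/3]).

%% Hypothetical entry point: apply Fun to every chunk using NumWorkers
%% worker processes and return the results in chunk order.
%% The calling process acts as the partner.
run(Fun, Chunks, NumWorkers) when NumWorkers > 0 ->
    Partner = self(),
    Numbered = lists:zip(lists:seq(1, length(Chunks)), Chunks),
    _ = [spawn_link(fun() -> worker(Partner, Fun) end)
         || _ <- lists:seq(1, NumWorkers)],
    Results = partner(Numbered, length(Chunks), NumWorkers, []),
    %% Results arrive in whatever order the workers finish, so sort by Pos.
    [R || {_Pos, R} <- lists:keysort(1, Results)].

%% partner(Queue, Pending, Workers, Acc)
%%   Queue   - chunks not yet handed out, as {Pos, Chunk}
%%   Pending - number of results not yet received
%%   Workers - number of workers not yet told to stop
partner(_Queue, 0, 0, Acc) ->
    Acc;
partner(Queue, Pending, Workers, Acc) ->
    receive
        {send_next, Worker} ->
            case Queue of
                [{Pos, Chunk} | Rest] ->
                    Worker ! {data, Chunk, Pos},
                    partner(Rest, Pending, Workers, Acc);
                [] ->
                    Worker ! done,
                    partner([], Pending, Workers - 1, Acc)
            end;
        {processed, Pos, Result} ->
            %% Prepend to the accumulator; ordering is restored in run/3.
            partner(Queue, Pending - 1, Workers, [{Pos, Result} | Acc])
    end.

%% Stand-in for the SPU-side loop: ask for work, process it, report back.
worker(Partner, Fun) ->
    Partner ! {send_next, self()},
    receive
        {data, Chunk, Pos} ->
            Partner ! {processed, Pos, Fun(Chunk)},
            worker(Partner, Fun);
        done ->
            ok
    end.
```

For example, ppu_partner:run(fun lists:sum/1, [[1,2,3],[4,5],[6]], 2) returns
[6,9,6]. Keying each result by its position is what lets a single partner
feed several workers and still hand back the results in order.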

Anyhow, sorry for the lengthy train of thought. It is a shame that the
PS3 no longer supports the "Other OS" option, otherwise I would love to
experiment with something like this. Then again, it appears IBM will no
longer develop new Cell processors, so this effort might be moot. (I think
they are stopping at something like 2 hyper-threaded PPU cores (4 virtual
cores) and 32 SPU cores per chip? Alas, I forget the details.) I'm also not
sure whether the Other OS option allowed the OS to access the SPUs in the
first place; if I remember correctly, a number of things are "turned off"
and inaccessible, so mostly all of this talk is speculation :).

2cents :)
Nicholas

On Wed, Jul 7, 2010 at 8:05 PM, Richard O'Keefe <ok@REDACTED> wrote:

>
> On Jul 8, 2010, at 4:27 AM, Nicholas Frechette wrote:
>
>> hand optimize code which will require C++/assembly) and not so much on the
>> PS3 where you may get a hard time using the SPUs from erlang (as I'm
>> guessing the VM won't run on them out of the box and you might have to
>> spend considerable time to get erlang processes to run on SPUs (feasible
>> I'm sure)).
>>
>
> It's not clear that it _is_ feasible in any interesting sense.
> The SPUs are essentially numeric vector engines.  They have 128
> registers, each 128 bits wide; loads are only 128 bits (16 bytes)
> and stores are only 128 bits (16 bytes).  Memory is byte addressed
> and addresses are 32 bits, but the 2009 manual talks about 256 kB.
>
> One could imagine SPU _nodes_ doing vector crunching that Erlang
> processes running on the PPU communicated with as if Erlang.  But
> Erlang would not be the language of choice for programming SPUs.
> (An APL dialect would be ideal, or Fortran 95.)
>
>
>