[erlang-questions] Keeping massive concurrency when interfacing with C

Tue Oct 4 05:05:26 CEST 2011

Sorry, I should've explained in more detail what we're trying to do.
That would help, eh? :)

In a nutshell, our goal is take a portfolio of securities (namely
bonds and derivatives), and calculate a risk/return analysis for each
security. For risk, interest rate shock, and for return, future cash
flows. There are different kinds of analyses you could perform.

Here's a more concrete example. Pretend you're an insurance company.
You have to pay out benefits to your customers, so you take their
money and make investments with it, hoping for a (positive) return, of
course. Quite often insurance companies will buy bonds, especially if
there are restrictions on what they can invest in (e.g., AAA only).

You need to have an idea of what your risk and return are. What's
going to happen to the value of your portfolio if yields rise or fall?
Ideally you want to know what your cash flows will look like in the
future, so you can have a reasonable idea of what shape you'll be in
depending on the outcome.

One such calculation would involve shocking the yield curve (yields
plotted against maturity). If yields rise 100 basis points, what
happens to your portfolio? If they fall far enough how much would
yields need to fall before any of your callable bonds started being
redeemed?

Part of the reason why I think Erlang would work out well is the
calculations for each security are independent of each other -- it's
an embarrassingly parallel problem. My goal was to spawn a process for
each scenario of a security. Depending on how many securities and
scenarios you want to calculate, there could be tens or hundreds of
thousands, hence why I would be spawning so many processes (I would
distribute these across multiple machines of course, but we would have
only a few servers at most to start off with).

Because Erlang is so efficient at creating and executing thousands of
processes, I thought it would be feasible to create that many to do
real work, but the impression I get is maybe it's not such a great
idea when you have only a few dozen cores available to you.

CGS, could you explain how the dynamic library would work in more
detail? I was thinking it could work like that, but I wasn't actually
sure how it would be implemented. For example, if two Erlang processes
invoke the same shared library, does the OS simply copy each function
call to its own stack frame so the data is kept separate, and only one
copy of the code is used? I could see in that case then how 20,000
Erlang processes could all share the same library, since it minimizes
the amount of memory used.

David, the solution you described is new to me. Are there any
resources I can read to learn more?

Joe (your book is sitting on my desk as well =]), that's rather
interesting Erlang was purposely slowed down to allow for on-the-fly
code changes. Could you explain why? I'm curious.

We are still in the R&D phase (you could say), so I'm not quite sure
yet which specific category the number crunching will fall into (I
wouldn't be surprised if there are matrices, however). I think what
I'll do is write the most intensive parts in both Erlang and C, and
compare the two. I'd prefer to stick purely with Erlang though!

We have neither purchased any equipment yet nor written the final
code, so I'm pretty flexible to whatever the best solution would be
using Erlang. Maybe next year I can pick up one of those 20K core
machines =)