[erlang-questions] Keeping massive concurrency when interfacing with C

Tue Oct 4 13:21:34 CEST 2011

On Tue, Oct 4, 2011 at 5:05 AM, John Smith <emailregaccount@REDACTED> wrote:
> Sorry, I should've explained in more detail what we're trying to do.
> That would help, eh? :)
>
> In a nutshell, our goal is to take a portfolio of securities (namely
> bonds and derivatives), and calculate a risk/return analysis for each
> security. For risk, we look at interest rate shocks; for return, future
> cash flows. There are different kinds of analyses you could perform.
>
> Here's a more concrete example. Pretend you're an insurance company.
> You have to pay out benefits to your customers, so you take their
> money and make investments with it, hoping for a (positive) return, of
> course. Quite often insurance companies will buy bonds, especially if
> there are restrictions on what they can invest in (e.g., AAA only).
>
> You need to have an idea of what your risk and return are. What's
> going to happen to the value of your portfolio if yields rise or fall?
> Ideally you want to know what your cash flows will look like in the
> future, so you can have a reasonable idea of what shape you'll be in
> depending on the outcome.
>
> One such calculation would involve shocking the yield curve (yields
> plotted against maturity). If yields rise 100 basis points, what
> happens to your portfolio? If they fall, how far would they need to
> fall before any of your callable bonds started being redeemed?
>
> Part of the reason why I think Erlang would work out well is the
> calculations for each security are independent of each other -- it's
> an embarrassingly parallel problem. My goal was to spawn a process for
> each scenario of a security. Depending on how many securities and
> scenarios you want to calculate, there could be tens or hundreds of
> thousands, hence why I would be spawning so many processes (I would
> distribute these across multiple machines of course, but we would have
> only a few servers at most to start off with).
>
> Because Erlang is so efficient at creating and executing thousands of
> processes, I thought it would be feasible to create that many to do
> real work, but the impression I get is maybe it's not such a great
> idea when you have only a few dozen cores available to you.
>
> CGS, could you explain how the dynamic library would work in more
> detail? I was thinking it could work like that, but I wasn't actually
> sure how it would be implemented. For example, if two Erlang processes
> invoke the same shared library, does the OS simply copy each function
> call to its own stack frame so the data is kept separate, and only one
> copy of the code is used? I could see in that case then how 20,000
> Erlang processes could all share the same library, since it minimizes
> the amount of memory used.
>
> David, the solution you described is new to me. Are there any
> resources I can read to learn more?
>
> Joe (your book is sitting on my desk as well =]), it's rather
> interesting that Erlang was purposely slowed down to allow for
> on-the-fly code changes. Could you explain why? I'm curious.

I said "slow by design" - perhaps an unfortunate choice of words.
What I meant was that there was a design decision to allow code changes
on the fly, and that a consequence of this design decision is that all
intermodule calls have one extra level of indirection, which makes them
slightly slower than calls to code which cannot be changed on the fly.

Suppose you have some module x executing some long-lived code
(typically a telephony transaction) - you discover a bug in x. So you
fix the bug. Now you have two versions of x. The x that is still
currently executing, and the modified x that you will use when you
start new transactions.

We want to allow all the old processes running the old version of x to
"run to completion" - new processes will get the new version of x.

This is achieved as follows: if you call x:foo/2 (with an explicit
module name) you always call the latest version of the code, but local
calls (without the module name) call the version of the code that the
process is currently running.

Let me give an example. Imagine the following:

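     -module(foo).
     -export([fix_loop/1, dynamic_loop/1]).

     %% Local call: stays in the version of foo it started in.
     fix_loop(N) ->
         ...
         fix_loop(N+1).

     %% Fully qualified call: picks up the latest loaded version of foo.
     dynamic_loop(N) ->
         ...
         foo:dynamic_loop(N+1).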

In the above, fix_loop and dynamic_loop have *entirely different behaviors*.

If we compile and reload a new version of foo, then any existing processes
running fix_loop/1 inside foo will continue running the old code.

Any old processes running dynamic_loop/1 will jump into the new
version of the code when they make the (tail) call to
foo:dynamic_loop/1.
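
The difference between the two kinds of call can be sketched as a small
server loop. This is only an illustration, not code from any real system:
the module name counter, the functions start/0 and loop/1, and the
messages bump, get and upgrade are all hypothetical names chosen for the
example.

```erlang
-module(counter).
-export([start/0, loop/1]).

%% Start a counter process in the currently loaded version of this module.
start() ->
    spawn(?MODULE, loop, [0]).

loop(N) ->
    receive
        bump ->
            %% Local call: the process stays in the version it is running.
            loop(N + 1);
        {get, From} ->
            From ! {count, N},
            loop(N);
        upgrade ->
            %% Fully qualified call: the process continues in the latest
            %% loaded version of the module, keeping its state N.
            ?MODULE:loop(N)
    end.
```

With this shape, a process keeps running the old code after a new version
is loaded (for example with c(counter) in the shell) until it is sent
upgrade, at which point it carries its state over into the new code. As
far as I know the system keeps only two versions of a module, so a
process still in the old code when that version is purged gets killed;
it is worth checking the code server documentation before relying on
this.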

Implementing this requires one level of indirection when making the
subroutine call. We can't just jump to the address of the code for the
function; we have to call the function via a pointer. The ability to
change code on the fly introduces a slight overhead in all function calls
where you call the function with an explicit module name. If you omit the
module name, the call will be slightly faster, since the address cannot be
changed later. So calling fix_loop/1 in the above is slightly faster than
calling dynamic_loop/1.
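
If you want to see the size of this overhead on your own machine, a rough
measurement could look like the sketch below. The module name callcost
and its functions are hypothetical, and a micro-benchmark like this says
little about a real application; the numbers depend on the emulator
version and on whatever else the machine is doing.

```erlang
-module(callcost).
-export([compare/1, remote_count/1]).

%% Counts down using a local call (no module name).
local_count(0) -> ok;
local_count(N) -> local_count(N - 1).

%% Counts down using a fully qualified call, which goes through the
%% extra level of indirection on every iteration.
remote_count(0) -> ok;
remote_count(N) -> ?MODULE:remote_count(N - 1).

%% Returns {LocalMicros, RemoteMicros}: the time in microseconds taken
%% by N iterations of each loop, measured with timer:tc/1.
compare(N) ->
    {LocalMicros, ok} = timer:tc(fun() -> local_count(N) end),
    {RemoteMicros, ok} = timer:tc(fun() -> remote_count(N) end),
    {LocalMicros, RemoteMicros}.
```

Calling callcost:compare(10000000) a few times and comparing the two
numbers gives a feel for the difference, which I would expect to be
small.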

Why do we want to do all this anyway?

We designed Erlang for telecomms applications - we deploy applications that
run for years and want to upgrade the software without disrupting services.

If a user runs some code in a transaction that takes a few minutes and
we change the code, we don't want to kill ongoing transactions using
the old code - nor can we wait until all transactions are over before
introducing new code (this will never happen).

Banks turn off their transaction systems while upgrading the software
(apart from Klarna :-) - aircraft upgrade the software while the
planes are on the ground (I hope) - but we do it as we run the system
(we don't want to lose calls just because we are upgrading the
software).

Now suppose you discover a fault in your software that causes you to
buy or sell shares at a catastrophically bad rate - what do you do?
Wait for everything to stop before changing the code, or pump in new
code to fix the bug in mid-session? Just killing everything might
leave (say) a database in an inconsistent state and make restarting
time-consuming.

Dynamic code change is useful to have under your feet just in case you need
it one day - in the case of online banking, companies like Klarna use
this for commercial advantage :-)

/Joe

>
> We are still in the R&D phase (you could say), so I'm not quite sure
> yet which specific category the number crunching will fall into (I
> wouldn't be surprised if there are matrices, however). I think what
> I'll do is write the most intensive parts in both Erlang and C, and
> compare the two. I'd prefer to stick purely with Erlang though!
>
> We have neither purchased any equipment yet nor written the final
> code, so I'm pretty flexible to whatever the best solution would be
> using Erlang. Maybe next year I can pick up one of those 20K core
> machines =)