[erlang-questions] Keeping massive concurrency when interfacing with C
Wed Oct 5 10:54:49 CEST 2011
Sorry for the delay in answering (internet problems here).
To answer your question: yes, the main difference between static
libraries (.a) and dynamic libraries (.so) is that the code of a dynamic
library is loaded into memory only once for all the processes that use it,
without their data overlapping. In other words, what you said is true: each
process calling the same shared library keeps its own heap, but memory usage
is lower because the library code resides in memory only once. Following
this idea, some Linux developers have suggested making all libraries shared
ones. But that is not free: a shared library has to be located and linked at
run time (at startup, or on demand when the application opens it explicitly,
e.g. with dlopen()), and it is discarded when no application is using it any
more. To give an example, consider a graphics application which knows how to
handle hundreds of image formats (bitmap, JPEG, GIF, pixel map and so on).
You don't want to link all of them statically, because you may not want the
code for every format in RAM at once. So you pay the price of your
application being a bit slower, but you use shared libraries which are
loaded only when your application requests them and discarded when your
application doesn't need them any more.
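
To make the "loaded only when requested" part concrete, here is a minimal
sketch in C using the POSIX dlopen()/dlsym()/dlclose() calls. The function
name call_from_library and the assumption that the looked-up symbol has the
signature double f(double) are mine, purely for illustration; a real image
plugin would export its own interface.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Assumption for this sketch: the symbol has the signature double f(double). */
typedef double (*unary_fn)(double);

/*
 * Load `lib_name` on demand, call `sym_name(x)`, store the result in *out,
 * then release the library again. Returns 0 on success, -1 on failure.
 */
int call_from_library(const char *lib_name, const char *sym_name,
                      double x, double *out)
{
    assert(lib_name != NULL && sym_name != NULL && out != NULL);

    void *handle = dlopen(lib_name, RTLD_LAZY);  /* loaded only now */
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                                   /* clear any stale error */
    void *sym = dlsym(handle, sym_name);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* Convert the void* from dlsym into a function pointer (POSIX idiom). */
    unary_fn fn;
    *(void **)(&fn) = sym;
    *out = fn(x);

    dlclose(handle);                             /* drop our reference */
    return 0;
}
```

For instance, on a glibc-based Linux system (an assumption about the
platform), call_from_library("libm.so.6", "cos", 0.0, &r) leaves 1.0 in r.
Older glibc versions need -ldl on the link line.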
Here is another example to help you understand the concept behind shared
(aka dynamic) libraries. Let's say you want to compress different folders
with GNU Zip (aka gzip) to make backups on different storage elements (one
folder is archived from your computing element to storage element A and
another folder to storage element B). So you need to run gzip twice, but you
don't want to wait for the first archive to finish before you start the
second. You then open two sessions (either two terminals, or start the first
process and put it to work in the background) and start archiving your
folders (so you can drink a coffee until they are finished :) ). Both
processes use the same shared library (say, zlib's libz.so; which library
exactly is not important for this example), which is loaded once, but the
archives do not mix their data (if you don't believe me, try it :) ). That
means, as you said, the heaps of the two processes do not overlap, even
though the library code resides in memory only once.
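
If you want to check the "separate heaps" part yourself, a small sketch
along these lines should do. It uses fork() so that two processes run the
very same code; the function name heap_is_private and the values 1 and 42
are made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that overwrites a heap value and report what each side saw.
 * On success, *parent_value is the parent's copy after the child has exited
 * and *child_value is what the child stored. Returns 0 on success, -1 on error.
 */
int heap_is_private(int *parent_value, int *child_value)
{
    assert(parent_value != NULL && child_value != NULL);

    int *counter = malloc(sizeof *counter);
    if (counter == NULL)
        return -1;
    *counter = 1;

    pid_t pid = fork();
    if (pid < 0) {
        free(counter);
        return -1;
    }
    if (pid == 0) {
        *counter = 42;        /* changes only the child's copy of the heap */
        _exit(*counter);      /* report the child's value via exit status */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        free(counter);
        return -1;
    }
    *child_value = WEXITSTATUS(status);
    *parent_value = *counter; /* the parent's copy was never touched */
    free(counter);
    return 0;
}
```

After a successful call the parent still sees 1 while the child reported
42: the code is the same in both processes, the data is not.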
I think the second example is what you need. So all you need is a shared
library which stays resident in memory once it has been loaded (e.g., hosted
by a small helper program which does nothing but keep the library loaded).
In this way, your Erlang processes can call it at any time. Keep in mind
that Erlang code cannot call an arbitrary external library directly: the
library has to be wrapped as a linked-in driver or a NIF which the Erlang VM
loads, or be called through a separate port program.
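
As a sketch of how Erlang code might reach C code in practice: one common
route is a port program, a separate OS process which the Erlang VM talks to
over its stdin/stdout. The C side below assumes the port is opened with the
{packet, 2} option, so every message is preceded by a 2-byte big-endian
length. The function names are hypothetical, and interrupted reads (EINTR)
are simply treated as errors to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly len bytes from fd; returns len, or -1 on EOF or error. */
static long read_exact(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return (long)len;
}

/* Write exactly len bytes to fd; returns len, or -1 on error. */
static long write_exact(int fd, const unsigned char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }
    return (long)len;
}

/* Read one framed message into buf; returns payload length or -1. */
long read_packet(int fd, unsigned char *buf, size_t cap)
{
    unsigned char hdr[2];
    assert(buf != NULL);
    if (read_exact(fd, hdr, 2) != 2)
        return -1;
    size_t len = ((size_t)hdr[0] << 8) | hdr[1];  /* big-endian length */
    if (len > cap)
        return -1;
    return read_exact(fd, buf, len);
}

/* Write one framed message; returns payload length or -1. */
long write_packet(int fd, const unsigned char *buf, size_t len)
{
    unsigned char hdr[2];
    assert(buf != NULL);
    if (len > 0xFFFF)                             /* does not fit in 2 bytes */
        return -1;
    hdr[0] = (unsigned char)((len >> 8) & 0xFF);
    hdr[1] = (unsigned char)(len & 0xFF);
    if (write_exact(fd, hdr, 2) != 2)
        return -1;
    return write_exact(fd, buf, len);
}
```

A port program's main loop would call read_packet on file descriptor 0, do
the calculation (possibly by calling into the shared library), and answer
with write_packet on file descriptor 1. The Erlang side would open it with
open_port({spawn, ...}, [{packet, 2}, binary]).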
I hope I have shed some light on the concept of shared libraries (under MS
Windows they are called dynamic-link libraries and can be identified by the
.dll suffix), as I tried to explain it in simple words (experts, please do
not kill me for not using the proper terminology or for skipping some
technical details). If you still have questions, let me know.
Now about your project. I think what you need does not take a lot of time
to compute (I have computed trendlines and filtered them on live securities
data downloaded from the internet with cURL, and I know the calculation goes
pretty fast), so you don't need to run all of them in parallel. Instead, you
can divide them among a number of parallel threads suited to your hardware.
In this way you make the best of your hardware.
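
One possible way to do that division, sketched in C: ask the OS how many
CPUs are online and give each worker a contiguous slice of the securities.
The names worker_count and chunk_range are hypothetical, and
_SC_NPROCESSORS_ONLN is a widely available extension (Linux, the BSDs,
macOS) rather than something POSIX guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Number of workers to start: CPUs reported online by the OS, at least 1. */
size_t worker_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (size_t)n : 1;
}

/*
 * Give worker `w` (0-based, out of `workers`) the half-open range
 * [*begin, *end) of `items` tasks. Slice sizes differ by at most one.
 */
void chunk_range(size_t items, size_t workers, size_t w,
                 size_t *begin, size_t *end)
{
    assert(workers > 0 && w < workers && begin != NULL && end != NULL);

    size_t base = items / workers;
    size_t extra = items % workers;  /* the first `extra` workers get one more */

    *begin = w * base + (w < extra ? w : extra);
    *end = *begin + base + (w < extra ? 1 : 0);
}
```

With 10 securities and 4 workers this gives the slices [0,3), [3,6), [6,8)
and [8,10); each worker then loops over its own slice only.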
Two more things here. Firstly, yes, Erlang lets you update the software
while it is running. That will help you a lot, because in C/C++ you would
need to devise an intelligent system to update your code without stopping it
(I once did something like that, but believe me, it's not worth the work
when this is already implemented for you). Secondly, if you already have the
code written in C/C++, then it's worth making the connection; otherwise you
can work in Erlang, because you won't notice a big difference for what you
need.
But, of course, these are just suggestions. At the end of the day it's your
responsibility, so you will decide which approach to follow. I can only
wish you good luck! ;)
On Tue, Oct 4, 2011 at 6:05 AM, John Smith <emailregaccount@REDACTED> wrote:
> Sorry, I should've explained in more detail what we're trying to do.
> That would help, eh? :)
> In a nutshell, our goal is to take a portfolio of securities (namely
> bonds and derivatives), and calculate a risk/return analysis for each
> security. For risk, interest rate shock, and for return, future cash
> flows. There are different kinds of analyses you could perform.
> Here's a more concrete example. Pretend you're an insurance company.
> You have to pay out benefits to your customers, so you take their
> money and make investments with it, hoping for a (positive) return, of
> course. Quite often insurance companies will buy bonds, especially if
> there are restrictions on what they can invest in (e.g., AAA only).
> You need to have an idea of what your risk and return are. What's
> going to happen to the value of your portfolio if yields rise or fall?
> Ideally you want to know what your cash flows will look like in the
> future, so you can have a reasonable idea of what shape you'll be in
> depending on the outcome.
> One such calculation would involve shocking the yield curve (yields
> plotted against maturity). If yields rise 100 basis points, what
> happens to your portfolio? If they fall far enough how much would
> yields need to fall before any of your callable bonds started being
> Part of the reason why I think Erlang would work out well is the
> calculations for each security are independent of each other -- it's
> an embarrassingly parallel problem. My goal was to spawn a process for
> each scenario of a security. Depending on how many securities and
> scenarios you want to calculate, there could be tens or hundreds of
> thousands, hence why I would be spawning so many processes (I would
> distribute these across multiple machines of course, but we would have
> only a few servers at most to start off with).
> Because Erlang is so efficient at creating and executing thousands of
> processes, I thought it would be feasible to create that many to do
> real work, but the impression I get is maybe it's not such a great
> idea when you have only a few dozen cores available to you.
> CGS, could you explain how the dynamic library would work in more
> detail? I was thinking it could work like that, but I wasn't actually
> sure how it would be implemented. For example, if two Erlang processes
> invoke the same shared library, does the OS simply copy each function
> call to its own stack frame so the data is kept separate, and only one
> copy of the code is used? I could see in that case then how 20,000
> Erlang processes could all share the same library, since it minimizes
> the amount of memory used.
> David, the solution you described is new to me. Are there any
> resources I can read to learn more?
> Joe (your book is sitting on my desk as well =]), that's rather
> interesting Erlang was purposely slowed down to allow for on-the-fly
> code changes. Could you explain why? I'm curious.
> We are still in the R&D phase (you could say), so I'm not quite sure
> yet which specific category the number crunching will fall into (I
> wouldn't be surprised if there are matrices, however). I think what
> I'll do is write the most intensive parts in both Erlang and C, and
> compare the two. I'd prefer to stick purely with Erlang though!
> We have neither purchased any equipment yet nor written the final
> code, so I'm pretty flexible to whatever the best solution would be
> using Erlang. Maybe next year I can pick up one of those 20K core
> machines =)