[erlang-questions] What are the "Most valuable libraries?"...and a few other questions

Fri May 20 20:56:15 CEST 2011

On Wed, May 18, 2011 at 02:25, Todd <t.greenwoodgeer@REDACTED> wrote:

Let me be quite blunt here...

> 1. In general, what are the most valuable libraries to learn, both within
> the Erlang dist and external?
>
> 2. Is there a consolidated/curated repository of libraries that is industry
> standard? I know the erlware folks have a repo...is that both a complete and
> accepted authoritative repo? From reading the list, it sounds like there's
> also a fair bit of stuff scattered about in github, too.

I think this approach to Erlang is wrong. Rather than asking for a set
of "standard" modules to look into, you should attack it on an
on-demand basis, when you find a need for a specific library.
Personally, I really like the Agner system, which aims to list
available software so you can use it. It integrates with the Rebar
build system in a neat way.

Erlang seeks to provide tools, not solutions. As such, you will find a
lot of tools in the OTP distribution and elsewhere which will give you
the building blocks to write your own solutions. But you won't find
any prebaked solutions which magically solve the problem you are
looking at.

> 3. How does one easily multithread an app? For instance, there's pmap in
> clojure and something similar in akka that lets you map a function across a
> list, and it allocates threads accordingly...

There is no easy way to multi-thread an app so it gives good speedup
when adding more cores. There are some general guidelines you can
follow when writing the program, but they do not always yield a
speedup. Here is a simple module:

-module(foo).
-compile(export_all).

m(X) ->
    X*2.

test_input() ->
    lists:seq(1, 10000).

t1(L) ->
    timer:tc(fun() ->
                     [m(X) || X <- L]
             end).

t2(L) ->
    timer:tc(rpc, pmap, [{foo, m}, [], L]).

where t1/1 and t2/1 are our tests. t1 uses a list comprehension and t2
uses the pmap function of the rpc module to execute in parallel on my
two cores. A simple experiment in the shell:

Eshell V5.8.4  (abort with ^G)
1> c(foo).
{ok,foo}
2> L = test_input().
** exception error: undefined shell command test_input/0
3> L = foo:test_input().
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
4> X = foo:t1(L).
{2491,
 [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
  44,46,48,50,52,54|...]}
5> Y = foo:t2(L).
{65111,
 [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
  44,46,48,50,52,54|...]}
6>

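shows that t1 is much, much faster than t2, most likely because m/1
does so little work that the cost of spawning processes and passing
messages dominates. You need to know a lot about the problem at hand
to make it faster. If your m/1 function is altered to this: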

m(X) ->
    timer:sleep(3),
    X*2.

so that the parallel version can do other work while each call sleeps,
then the numbers are different:

3> X1 = foo:t1(L).
{40112834,
 [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
  44,46,48,50,52,54|...]}
4> X2 = foo:t2(L).
{105134,
 [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
  44,46,48,50,52,54|...]}

much in favor of t2. Hopefully this shows that you need to know about
your problem to make it go faster. There is no magical solution.
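
To make the overhead in t2 concrete, here is a minimal sketch of a
hand-rolled parallel map. It is not the actual implementation of
rpc:pmap; the module name pmap_sketch is made up for illustration. It
spawns one process per element and collects the replies in order, so
every element pays for a spawn and a message round trip. That cost only
pays off when the function does real work or waits on something.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% Apply F to every element of L, one process per element.
%% Results come back in the same order as the input list.
pmap(F, L) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                %% Each worker sends its result tagged with a unique ref.
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- L],
    %% Selective receive on each ref keeps the output ordered.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Calling pmap_sketch:pmap(fun foo:m/1, L) in place of the rpc:pmap call
in t2 should show the same kind of trade-off, though the exact numbers
will differ by machine.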

It is also important to note that in Erlang, concurrency was added to
build fault tolerant programs. Not to make programs run faster. It is
neat that it is often the case that it helps on multi-core machines,
but it was not the initial goal.

> 4. Along that note, does anyone have any ideas as to how to tackle the
> Typesafe 'getting started tutorial?'
>
> http://typesafe.com/resources/getting-started/tutorials/getting-started-first-scala.html

Yikes! All that code in Erlang is:

calc_pi_worker({Start, N}) ->
    lists:sum(
      [4.0 * (1 - (I rem 2) * 2) / (2 * I + 1) ||
          I <- lists:seq(Start, Start+N-1)]).

calc_pi() ->
    K = 10000,
    lists:sum([calc_pi_worker({I*K, K}) || I <- lists:seq(0,K-1)]).

If we run it sequentially it gives the correct result, but it is fairly
slow since we generate a lot of data that subsequently becomes
garbage. calc_pi_worker/1 can be optimized by moving stuff out of the
main loop and not building the list with lists:seq/2 each time around.
calc_pi/0 can be optimized by using the plists library
(https://github.com/eveel/plists). So while we are at it, let us
parallelize. And let us not create all that boilerplate while doing
it! Here is the code:

-module(foo).
-compile(export_all).

calc_pi_worker({Start, N}) ->
    calc_pi_worker(Start, N, 0).

calc_pi_worker(_I, 0, Acc) -> Acc;
calc_pi_worker(I, K, Acc) ->
    S = (1 - (I rem 2) * 2) / (2 * I + 1),
    calc_pi_worker(I+1, K-1, Acc + S).

calc_pi() ->
    K = 10000,
    4.0 * lists:sum(plists:map(fun foo:calc_pi_worker/1,
                               [{I*K, K} || I <- lists:seq(0, K-1)])).

Yes, we don't have to change anything else. This is parallel using as
many cores as you have. It can be tuned some more, but for a start it
is magnificent, even though it is slower than the Akka-version.
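
One possible tuning knob is the chunking: instead of 10000 small
chunks, hand out roughly one chunk per scheduler. The sketch below only
builds the {Start, N} list; the module name pi_chunks and the function
chunks/2 are hypothetical, and whether this is actually faster depends
on the machine, so measure it with timer:tc.

```erlang
-module(pi_chunks).
-export([chunks/2]).

%% Split the term indices 0..Total-1 into at most Workers
%% consecutive {Start, N} ranges of nearly equal size.
chunks(Total, Workers) when Total > 0, Workers > 0 ->
    Base = Total div Workers,
    Extra = Total rem Workers,
    chunks(0, Workers, Base, Extra, []).

chunks(_Start, 0, _Base, _Extra, Acc) ->
    lists:reverse(Acc);
%% The first Extra ranges take one additional term each.
chunks(Start, Left, Base, Extra, Acc) when Extra > 0 ->
    chunks(Start + Base + 1, Left - 1, Base, Extra - 1,
           [{Start, Base + 1} | Acc]);
chunks(Start, Left, Base, 0, Acc) when Base > 0 ->
    chunks(Start + Base, Left - 1, Base, 0, [{Start, Base} | Acc]);
%% Fewer terms than workers: stop rather than emit empty ranges.
chunks(_Start, _Left, 0, 0, Acc) ->
    lists:reverse(Acc).
```

The result can be passed as the list argument of plists:map in
calc_pi/0, for example
pi_chunks:chunks(100000000, erlang:system_info(schedulers)). The
floating point sum may differ in the last digits because the terms are
added in a different grouping.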

> 4b. Side note: is anyone concerned about Akka/Typesafe stealing mindshare?

Well, if you want to write all the boilerplate code they have to
write, then go ahead. I'd rather not :)

Seriously though, the mindshare we will steal is from non-concurrent
languages, be it Python, Ruby, Java, C# or ... -- We are in the same
boat as Scala/Akka, Go and so on. The influx of interested programmers
will be large, so we don't have to worry too much about who steals
from whom. Erlang has the distinct advantage of being old, tried,
battle-hardened and extremely robust. It has a main focus on fault
tolerance, which gives it some unique capabilities. Also, its focus on
using functional programming is a robustness advantage. You can't just
copy part of Erlang to obtain what it provides. You have to have all
of it and then some.

> 5. How does one push an app such that it self instantiates it's processes
> across the cluster? I can see how OTP is great at managing an app on a
> single node, but how do you say something like: "create one of these
> processes on each node in the cluster, and restart 1-for-1 if they die"...
> or something similar. I see mention of gproc, but honestly, I don't see how
> to use it. Likewise, if nodes are added to the cluster, how would you ensure
> that the necessary processes are pushed to the new node after it joins the
> cluster?

Essentially this is handled by the application itself, provided it
adheres to some rules of Erlang and is written so that it does not
assume all of it is present locally on a single VM, but is distributed
across several. There is no automatic solution here either. For each
application you will need to define what to do. For some applications
a 'takeover' is enough: there is one instance running, and if the node
it runs on crashes, another node takes over the job and arranges for
requests to be redirected to the new node. For other systems, like
Riak, all nodes are running simultaneously and the application on each
node talks to the same application on other nodes to internally manage
state.
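
For the takeover case, OTP's distributed application support is driven
by kernel configuration. The fragment below is a minimal sketch; the
application name myapp and the node names are made up for illustration,
and the timeouts are arbitrary.

```erlang
%% sys.config (hypothetical application and node names)
[{kernel,
  [%% myapp prefers a@host1. If that node goes down, wait 5000 ms,
   %% then restart myapp on b@host2 or c@host3 (equal priority).
   {distributed, [{myapp, 5000, ['a@host1', {'b@host2', 'c@host3'}]}]},
   %% Nodes this node waits for at boot. This list differs per node:
   %% each node names the other two.
   {sync_nodes_mandatory, ['b@host2', 'c@host3']},
   %% How long to wait for them, in milliseconds.
   {sync_nodes_timeout, 30000}]}].
```

When a@host1 comes back, the application is moved back to it and its
start/2 callback is called with {takeover, Node} as the start type, so
that is the place to pull state over from the node that was covering.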

Yes, it is a hard problem. But Erlang provides you with the tools to solve it.

> 6. How do you deploy and live code upgrade in real life? I've been looking
> at some of the work by the 'Dukes of Erl' ... is erlrc what folks commononly
> use?

I don't. The projects I am working on have the virtue that we can do
rolling upgrades by shutting down machines and restarting them. You
will have to ask someone else :)

> 7. Does anyone use dynamic load balancing of demand across a cluster (e.g.
> spinning up erlang processes to meet the demand curve?)

I am sure there are people who do this. But I'll let them answer the
question. It is not that hard to pull off.

> 8. What's the best way to integrate w/ other code bases. In akka, you'd use
> camel as an integration bus. What are the common ways to integrate with
> erlang? Is that what ports and nifs are for? Forgive my ignorance, but I
> always considered those as simply ways to write code in a different, perhaps
> more comfortable language...not as integration mechanisms.

Integration is perhaps Erlang's strength. NIFs are for writing small
hotspots in your code in C for speed. The calc_pi_worker/1 from above
comes to mind, for instance. Ports are used for several things. They
are a representation in the Erlang VM of something external. It can be
another process, where we have a pipe for communication. It can be a
file on disk. It can be a network socket. Or it can be a linked-in
driver. Another option is to make a node in another language that
talks the Erlang distribution protocol. Yet another option is to use a
message queue like ZeroMQ or AMQP for communication. Finally, you can
do as I did in my own project and simply implement the foreign
protocol in Erlang - BitTorrent in my case. Erlang rocks for
implementing foreign protocols.
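
As an illustration of the "another process with a pipe" case, here is
a minimal sketch that runs an external program through a port and
collects its output. The module name port_sketch and the function run/2
are hypothetical; open_port/2 and os:find_executable/1 are the standard
calls.

```erlang
-module(port_sketch).
-export([run/2]).

%% Run Program with the argument list Args.
%% Returns {ExitStatus, OutputBinary} or {error, not_found}.
run(Program, Args) ->
    case os:find_executable(Program) of
        false ->
            {error, not_found};
        Path ->
            Port = open_port({spawn_executable, Path},
                             [{args, Args}, binary, exit_status,
                              stderr_to_stdout]),
            collect(Port, [])
    end.

%% The port delivers output as messages to the process that opened it.
collect(Port, Acc) ->
    receive
        {Port, {data, Bin}} ->
            collect(Port, [Bin | Acc]);
        {Port, {exit_status, Status}} ->
            {Status, iolist_to_binary(lists:reverse(Acc))}
    end.
```

On a Unix-like system, port_sketch:run("echo", ["hello"]) should hand
back the exit status together with whatever the program printed. A
long-running external program would instead keep the port open and
exchange messages with it over time.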

> Also, I've continued to peck away at various newbie tutorials. Any
> comments/suggestions/corrections are welcome.

You never go wrong with Fred Hebert:

http://learnyousomeerlang.com/ - witty, informative, awesome -- even
if the octopus has too few tentacles.

-- 
J.