[erlang-questions] What are the "Most valuable libraries?"...and a few other questions

Sat May 21 07:44:33 CEST 2011

Jesper, thanks for the response. I've been a bit remiss in thanking 
folks for their responses to my lengthy questions...I'm a bit pressed 
for time as I'm packing in preparation for moving.

More inline:

On 5/20/11 11:56 AM, Jesper Louis Andersen wrote:
> On Wed, May 18, 2011 at 02:25, Todd <t.greenwoodgeer@REDACTED> wrote:
>
> Let me be quite blunt here...
>
>> 1. In general, what are the most valuable libraries to learn, both within
>> the Erlang dist and external?
>>
>> 2. Is there a consolidated/curated repository of libraries that is industry
>> standard? I know the erlware folks have a repo...is that both a complete and
>> accepted authoritative repo? From reading the list, it sounds like there's
>> also a fair bit of stuff scattered about on GitHub, too.
>
> I think this approach to Erlang is wrong. Rather than asking for a set of
> "standard" modules to look into, you should attack it on an on-demand
> basis, when you find a need for a specific library. Personally, I
> really like the Agner system which aims to be a system listing
> available software so you can use it. It intermingles with the Rebar
> build system in a neat way.

Thanks for mentioning Agner, that looks cool. I'll noodle around and 
figure out how it links with rebar.

For those of you who have been programming in Erlang for some time, it 
would be useful for those of us newer to Erlang to understand which 
libraries are your bread-and-butter, as opposed to domain-specific 
libraries. As an example, if someone were to ask me this about Java, I'd 
suggest: understand threading semantics, java.util collections, 
immutable collections (Guava), IoC... and JUnit. The rest would be 
specific to the problem domain (e.g. Spring Framework, Hibernate|JDBC, 
EJB3, and other more specific libraries).

>
> Erlang seeks to provide tools, not solutions. As such, you will find a
> lot of tools in the OTP distribution and elsewhere which will give you
> stuff to write your own solutions. But you won't find any prebaked
> solutions that magically solve the problem you are looking at.
>
>> 3. How does one easily multithread an app? For instance, there's pmap in
>> clojure and something similar in akka that lets you map a function across a
>> list, and it allocates threads accordingly...
>
> There is no easy way to multi-thread an app so it gives good speedup
> when adding more cores. There are some general guidelines you can
> follow when writing the program, but they do not always yield a
> speedup. Here is a simple module:
>
> -module(foo).
> -compile(export_all).
>
> m(X) ->
>      X*2.
>
> test_input() ->
>      lists:seq(1, 10000).
>
> t1(L) ->
>      timer:tc(fun() ->
>                       [m(X) || X<- L]
>               end).
>
> t2(L) ->
>      timer:tc(rpc, pmap, [{foo, m}, [], L]).
>
> where t1/1 and t2/1 are our tests. t1 uses a list comprehension and t2
> uses the pmap function of the rpc module to execute in parallel on my
> two cores. A simple experiment in the shell:
>
> Eshell V5.8.4  (abort with ^G)
> 1>  c(foo).
> {ok,foo}
> 2>  L = test_input().
> ** exception error: undefined shell command test_input/0
> 3>  L = foo:test_input().
> [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
>   23,24,25,26,27,28,29|...]
> 4>  X = foo:t1(L).
> {2491,
>   [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
>    44,46,48,50,52,54|...]}
> 5>  Y = foo:t2(L).
> {65111,
>   [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
>    44,46,48,50,52,54|...]}
> 6>
>
> shows that t1 is much faster than t2. You need to know a lot about
> the problem at hand to make it faster. If your m/1 function is altered
> to this:
>
> m(X) ->
>      timer:sleep(3),
>      X*2.
>
> so that the parallel version can do other work while each call sleeps,
> the numbers are different:
>
> 3>  X1 = foo:t1(L).
> {40112834,
>   [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
>    44,46,48,50,52,54|...]}
> 4>  X2 = foo:t2(L).
> {105134,
>   [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
>    44,46,48,50,52,54|...]}
>
> which is much in favor of t2. Hopefully this shows that you need to know about
> your problem to make it go faster. There is no magical solution.
>

I'm pretty familiar with this sort of thing in other languages, but it's 
nice to see such a clear example. For me, the missing piece was the rpc 
module that contains the pmap function. I'll have to look closely at 
that module.
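
For comparison, here is a minimal hand-rolled parallel map that uses 
nothing beyond spawn and receive. The module name simple_pmap is made up 
for illustration, and this is a sketch, not how rpc:pmap/3 is actually 
implemented. It spawns one process per element and collects the replies 
in list order by tagging each one with a unique reference:

```erlang
-module(simple_pmap).
-export([pmap/2]).

%% Spawn one linked process per element; each sends back {Ref, F(X)}.
%% Results are then received in the original list order by matching
%% on the per-element reference, regardless of completion order.
pmap(F, L) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- L],
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Spawning a process per element is cheap but not free, so for a trivial 
function like m/1 the plain list comprehension can still win, which is 
consistent with the t1/t2 timings above.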

> It is also important to note that in Erlang, concurrency was added to
> build fault-tolerant programs, not to make programs run faster. It is
> neat that it often helps on multi-core machines, but that was not the
> initial goal.
>
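
A minimal sketch of the fault-tolerance mechanism referred to here: a 
process that traps exits receives the death of a linked process as an 
ordinary message instead of dying with it. OTP supervisors are built on 
this mechanism. The module name link_demo is hypothetical:

```erlang
-module(link_demo).
-export([run/0]).

%% Trap exits, start a linked worker that crashes immediately, and turn
%% its death into a normal return value instead of a crash of our own.
run() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(boom) end),
    Result = receive
                 {'EXIT', Pid, Reason} -> {worker_died, Reason}
             after 1000 ->
                 timeout
             end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```
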
>> 4. Along that note, does anyone have any ideas as to how to tackle the
>> Typesafe 'getting started tutorial?'
>>
>> http://typesafe.com/resources/getting-started/tutorials/getting-started-first-scala.html
>
> Yikes! All that code in Erlang is:
>
> calc_pi_worker({Start, N}) ->
>      lists:sum(
>        [4.0 * (1 - (I rem 2) * 2) / (2 * I + 1) ||
>            I<- lists:seq(Start, Start+N-1)]).
>
> calc_pi() ->
>      K = 10000,
>      lists:sum([calc_pi_worker({I*K, K}) || I<- lists:seq(0,K-1)]).
>
> If we run it sequentially, it gives the correct result, but it is fairly
> slow since we generate a lot of data that subsequently becomes
> garbage. calc_pi_worker/1 can be optimized by moving stuff out of the
> main loop and not building the list with lists:seq/2 each time around.
> calc_pi/0 can be optimized by using the plists library
> (https://github.com/eveel/plists). So while we are at it, let us
> parallelize. And let us not create all that boilerplate while doing
> it! Here is the code:
>
> -module(foo).
> -compile(export_all).
>
> calc_pi_worker({Start, N}) ->
>      calc_pi_worker(Start, N, 0).
>
> calc_pi_worker(_I, 0, Acc) ->  Acc;
> calc_pi_worker(I, K, Acc) ->
>      S = (1 - (I rem 2) * 2) / (2 * I + 1),
>      calc_pi_worker(I+1, K-1, Acc + S).
>
> calc_pi() ->
>      K = 10000,
>      4.0 * lists:sum(plists:map(fun foo:calc_pi_worker/1,
>                                 [{I*K, K} || I<- lists:seq(0, K-1)])).
>
> Yes, we don't have to change anything else. This runs in parallel on as
> many cores as you have. It can be tuned some more, but for a start it
> is magnificent, even though it is slower than the Akka version.
>

Wow. That's impressive. I'll take a closer look at the plists module, 
too. In a way, that's what I was talking about earlier in this email w/ 
respect to bread-and-butter libs. You pulled plists out of your back 
pocket...what else have you got there?
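
For readers without plists installed, the same chunked computation can be 
spread over processes with plain spawn and receive. This is a sketch 
under that assumption (the module pi_spawn is hypothetical). Since the 
chunk sums are simply added together, they can be collected in whatever 
order they arrive; floating-point addition is not associative, so the 
last digits may differ slightly between runs:

```erlang
-module(pi_spawn).
-export([calc_pi/0, calc_pi/2]).

%% Same split as calc_pi/0 in the email: 10000 chunks of 10000 terms.
calc_pi() ->
    calc_pi(10000, 10000).

%% Spawn one process per chunk; each sends back the partial sum of the
%% Leibniz series for indices I*K .. I*K+K-1.
calc_pi(Chunks, K) ->
    Parent = self(),
    Ref = make_ref(),
    _ = [spawn_link(fun() -> Parent ! {Ref, worker(I * K, K, 0)} end)
         || I <- lists:seq(0, Chunks - 1)],
    4.0 * collect(Ref, Chunks, 0).

%% Tail-recursive worker, identical in spirit to calc_pi_worker/3.
worker(_I, 0, Acc) -> Acc;
worker(I, K, Acc) ->
    worker(I + 1, K - 1, Acc + (1 - (I rem 2) * 2) / (2 * I + 1)).

%% Add up the partial sums in arrival order.
collect(_Ref, 0, Acc) -> Acc;
collect(Ref, N, Acc) ->
    receive {Ref, S} -> collect(Ref, N - 1, Acc + S) end.
```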

>> 4b. Side note: is anyone concerned about Akka/Typesafe stealing mindshare?
>
> Well, if you want to write all the boilerplate code they have to
> write, then go ahead. I'd rather not :)
>
> Seriously though, the mindshare we will steal is from non-concurrent
> languages, be it Python, Ruby, Java, C# or ... -- We are in the same
> boat as Scala/Akka, Go and so on. The influx of interested programmers
> will be large, so we don't have to worry too much about who steals
> from whom. Erlang has the distinct advantage of being old, tried,
> battle-hardened and extremely robust. It has a main focus on fault
> tolerance, which gives it some unique capabilities. Also, its focus on
> using functional programming is a robustness advantage. You can't just
> copy part of Erlang to obtain what it provides. You have to have all
> of it and then some.
>
>> 5. How does one push an app such that it self-instantiates its processes
>> across the cluster? I can see how OTP is great at managing an app on a
>> single node, but how do you say something like: "create one of these
>> processes on each node in the cluster, and restart 1-for-1 if they die"...
>> or something similar. I see mention of gproc, but honestly, I don't see how
>> to use it. Likewise, if nodes are added to the cluster, how would you ensure
>> that the necessary processes are pushed to the new node after it joins the
>> cluster?
>
> Essentially this is handled by the application if it is written to
> adhere to some rules of Erlang and if you write the application such that
> it does not assume all of it is present locally on a single VM, but is
> distributed across multiple. There is no automatic solution here
> either. For each application you will need to define what to do. For
> some applications a 'takeover' is enough. There is one running and if
> the node it runs on crashes, then another node will take over the job
> and arrange that requests are now redirected to the new node. For
> other systems, like Riak, all nodes are simultaneously running and the
> application on each node talks to the same application on other nodes
> to internally manage state.
>
> Yes, it is a hard problem. But Erlang provides you with the tools to solve it.
>

I have to admit, this is something that really intrigues me... creating 
an app that can move around in a cluster as nodes go up or down...as 
well as responding to load and spinning up new processes on nodes to 
handle the load. One thought I've had is to use erlang to manage os 
processes to spin up new erlang nodes on remote machines. Or to have 
erlang spin up an external resource, like a database or message queue 
and link to it.
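
As a rough sketch of the "one process per node, restarted one-for-one" 
idea from question 5, assuming plain Erlang distribution and that the 
worker's module is available on every node: a keeper process spawns a 
worker on each connected node, monitors it, restarts it on the same node 
when it dies, and subscribes to node events with 
net_kernel:monitor_nodes/1 so that nodes joining later also get a worker. 
The module per_node and its functions are hypothetical; a production 
system would more likely build this from OTP applications and 
supervisors:

```erlang
-module(per_node).
-export([start/3, echo_worker/1]).

%% Hypothetical keeper: one copy of M:F(A...) per connected node.
%% Assumes module M is loaded (or loadable) on every node.
start(M, F, A) ->
    spawn(fun() ->
                  %% Ask for {nodeup, Node} / {nodedown, Node} messages.
                  net_kernel:monitor_nodes(true),
                  Workers = maps:from_list(
                              [start_on(N, M, F, A) || N <- [node() | nodes()]]),
                  loop(Workers, M, F, A)
          end).

%% Spawn the worker on Node and monitor it; returns {MonitorRef, Node}.
start_on(Node, M, F, A) ->
    Pid = spawn(Node, M, F, A),
    {erlang:monitor(process, Pid), Node}.

%% Unlike an OTP supervisor, this sketch has no restart-intensity limit.
loop(Workers, M, F, A) ->
    receive
        {nodeup, Node} ->
            {Ref, Node} = start_on(Node, M, F, A),
            loop(Workers#{Ref => Node}, M, F, A);
        {nodedown, _Node} ->
            %% The worker's monitor will fire with reason noconnection.
            loop(Workers, M, F, A);
        {'DOWN', Ref, process, _Pid, noconnection} ->
            %% The node is gone; a later nodeup starts a new worker.
            loop(maps:remove(Ref, Workers), M, F, A);
        {'DOWN', Ref, process, _Pid, _Reason} ->
            %% One-for-one style: restart on the same node.
            {Node, Rest} = maps:take(Ref, Workers),
            {NewRef, Node} = start_on(Node, M, F, A),
            loop(Rest#{NewRef => Node}, M, F, A)
    end.

%% Example worker used for trying it out: report in, then wait for stop.
echo_worker(Parent) ->
    Parent ! {started, self()},
    receive stop -> ok end.
```

This ignores network partitions: if a node reconnects after a split, its 
old worker may still be running when a new one is started.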

>> 6. How do you deploy and live code upgrade in real life? I've been looking
>> at some of the work by the 'Dukes of Erl' ... is erlrc what folks commonly
>> use?
>
> I don't. The projects I am working on have the virtue that we can do
> rolling upgrades by shutting machines down and restarting them. You will
> have to ask someone else :)
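
For single-node live upgrades, OTP's built-in hook is the code_change/3 
callback of gen_server (and the other behaviours): during an upgrade the 
running process is suspended and the callback converts its old state into 
the shape the new code expects. A minimal sketch, assuming a hypothetical 
"v1" that kept a bare integer as its state and a "v2" that uses a map 
(the module versioned_counter is made up); it does not address the 
cross-node, rolling-upgrade problem discussed here:

```erlang
-module(versioned_counter).
-behaviour(gen_server).
-export([start_link/0, value/1]).
-export([init/1, handle_call/3, handle_cast/2, code_change/3]).

start_link() -> gen_server:start_link(?MODULE, [], []).

value(Pid) -> gen_server:call(Pid, value).

%% "v2" keeps its state in a map.
init([]) -> {ok, #{count => 0}}.

handle_call(value, _From, #{count := C} = State) -> {reply, C, State}.

handle_cast(_Msg, State) -> {noreply, State}.

%% Called while the process is suspended during an upgrade. A "v1" state
%% (assumed here to be a bare integer) is converted to the "v2" map.
code_change(_OldVsn, Count, _Extra) when is_integer(Count) ->
    {ok, #{count => Count}};
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

sys:suspend/1, sys:change_code/4 and sys:resume/1 can trigger the 
callback by hand, which is a handy way to test it; in a real release 
upgrade the release handler drives these steps.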

How do you ensure that, as you roll a cluster from v1 to v2, the new v2 
nodes don't corrupt the data that v1 is using?

On all the large projects I've been involved with 
(java,jdbc,mysql,etc.), there are typically service level changes that 
are tightly coupled to sql ddl changes... thereby requiring downing the 
entire cluster, applying the ddl deltas to the dbs, and subsequently 
restarting the service instances.

So, what is it about your strategy that allows you to do rolling upgrades?

>
>> 7. Does anyone use dynamic load balancing of demand across a cluster (e.g.
>> spinning up erlang processes to meet the demand curve?)
>
> I am sure there are people who do this, but I'll let them answer the
> question. It is not that hard to pull off.
>
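
One simple way to do this, sketched under the assumption that run-queue 
length is an acceptable proxy for load: ask each connected node for 
erlang:statistics(run_queue) over rpc and spawn the new process on the 
least loaded one. The module least_loaded is hypothetical:

```erlang
-module(least_loaded).
-export([pick_node/0, spawn_on_least_loaded/3]).

%% Ask every connected node (including this one) for its run-queue length
%% and return the node with the shortest queue. Crude, but cheap.
pick_node() ->
    Loads = [{load(N), N} || N <- [node() | nodes()]],
    {_, Best} = lists:min(Loads),
    Best.

%% Atoms sort after numbers in Erlang term order, so an unreachable node
%% reported as 'infinity' is never picked over a reachable one.
load(Node) ->
    case rpc:call(Node, erlang, statistics, [run_queue]) of
        N when is_integer(N) -> N;
        {badrpc, _} -> infinity
    end.

spawn_on_least_loaded(M, F, A) ->
    spawn(pick_node(), M, F, A).
```
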
>> 8. What's the best way to integrate w/ other code bases. In akka, you'd use
>> camel as an integration bus. What are the common ways to integrate with
>> erlang? Is that what ports and nifs are for? Forgive my ignorance, but I
>> always considered those as simply ways to write code in a different, perhaps
>> more comfortable language...not as integration mechanisms.
>
> Integration is perhaps Erlang's strength. NIFs are for writing small
> hotspots in your code in C for speed. The calc_pi_worker/1 from above
> comes to mind, for instance. Ports are used for several things. They
> are a representation in the Erlang VM of something external. It can be
> another process, where we have a pipe for communication.

> It can be a file on disk.

Is inotify the standard way to monitor file changes on disk from erlang?

http://www.trapexit.org/forum/viewtopic.php?p=44414
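
A port to another OS process, the pipe case mentioned above, is the 
easiest one to try. In this sketch `cat` stands in for the external 
program (a file-watching tool would be wired up the same way); it assumes 
a Unix-like system, and the module name cat_port is hypothetical:

```erlang
-module(cat_port).
-export([echo/1]).

%% Start `cat` as an external OS process connected by a pipe, send it one
%% line, and read the same line back. With {line, N} the port delivers
%% {Port, {data, {eol, Line}}} messages; with binary, Line is a binary.
echo(Line) ->
    Cat = os:find_executable("cat"),
    Port = open_port({spawn_executable, Cat}, [binary, {line, 1024}]),
    true = port_command(Port, [Line, $\n]),
    Reply = receive
                {Port, {data, {eol, Data}}} -> Data
            after 2000 ->
                timeout
            end,
    %% Closing the port closes cat's stdin, so cat exits.
    port_close(Port),
    Reply.
```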

> It can be a network socket. Or it can be a linked-in
> driver. Another option is to make a node in another language that
> talks the Erlang distribution protocol. Yet another option is to use a
> message queue like ZeroMQ or AMQP for communication.

ZeroMQ is definitely interesting, provided you can accept message losses.

> Finally, you can
> do like my own project and simply implement the foreign protocol in
> Erlang - BitTorrent in my case. Erlang rocks for implementing foreign
> protocols.
>
Ah, that's a great idea. I really hadn't thought of integrating at the 
protocol layer.
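
To illustrate why the bit syntax makes this pleasant, here is a sketch 
that encodes and decodes the 68-byte BitTorrent handshake (length byte 19, 
the string "BitTorrent protocol", 8 reserved bytes, a 20-byte info hash 
and a 20-byte peer id). The module bt_handshake is hypothetical and not 
taken from Jesper's project:

```erlang
-module(bt_handshake).
-export([encode/2, decode/1]).

%% Build a handshake with all 8 reserved bytes set to zero.
encode(InfoHash, PeerId)
  when byte_size(InfoHash) =:= 20, byte_size(PeerId) =:= 20 ->
    <<19, "BitTorrent protocol", 0:64, InfoHash/binary, PeerId/binary>>.

%% Parse a handshake straight out of a binary with one pattern match.
%% Any bytes after the 68-byte handshake are returned as Rest.
decode(<<19, "BitTorrent protocol", Reserved:8/binary,
         InfoHash:20/binary, PeerId:20/binary, Rest/binary>>) ->
    {ok, #{reserved => Reserved, info_hash => InfoHash,
           peer_id => PeerId}, Rest};
decode(Bin) when byte_size(Bin) < 68 ->
    %% Not enough data yet: report how many more bytes are needed.
    {more, 68 - byte_size(Bin)};
decode(_) ->
    {error, bad_handshake}.
```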

>> Also, I've continued to peck away at various newbie tutorials. Any
>> comments/suggestions/corrections are welcome.
>
> You never go wrong with Fred Hébert:
>
> http://learnyousomeerlang.com/ - witty, informative, awesome -- even
> if the octopus has too few tentacles.
>
>

Yeah, that's a great site. I've also read all three books, several times 
actually. For me, erlang has been quite a different learning curve from 
other languages.

Thanks for your detailed responses. This has been really informative, 
especially the calc_pi() example.

-Todd