[erlang-questions] node.js vs erlang

Wed Jun 18 09:39:00 CEST 2014

(apologies for the length of this email... i've been poking at it for a couple 
days since Joe's first email and finally combined the ramblings into something 
resembling a reply. it's a set of ponderings / reflections from someone new to 
erlang who has written non-trivial apps with node.js, which seems to be pretty 
much exactly the territory Joe is trying to explore :)

On Tuesday, June 17, 2014 14:41:28 Joe Armstrong wrote:
> The node books say something like "make sure that callbacks
> run-to-completion quickly". I would say that complying with this advice is
> extremely difficult. It's very difficult to guess which computations will
> take a long time, and even if you know, it's difficult to break them into
> small re-entrant chunks.

... which is generally why CPU-bound tasks are to be avoided in the 
node.js world. conventional wisdom (be that as it may) is that node is 
appropriate for I/O-bound processes which operate on streams. for anything else 
one ends up spawning processes and waiting for a response (which makes it async 
in the node.js runtime again, but that is not particularly lightweight ...)
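to make the "small re-entrant chunks" problem concrete, here is a minimal TypeScript sketch (the workload and the names `sumOfSquaresChunked` / `chunkSize` are made up for illustration) of a CPU-bound loop hand-split into slices that yield back to the event loop. even a trivial loop has to be restructured; a recursive computation like fib is considerably more awkward. in node, `setImmediate` is the more usual yield point; `setTimeout(..., 0)` is used here only to keep the sketch runtime-neutral.

```typescript
// Yield control back to the event loop so pending callbacks can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Hypothetical workload: sum of i*i for i in [0, n), computed in slices of
// `chunkSize` iterations so no single callback monopolises the event loop.
async function sumOfSquaresChunked(n: number, chunkSize = 10_000): Promise<number> {
  let total = 0;
  for (let start = 0; start < n; start += chunkSize) {
    const end = Math.min(start + chunkSize, n);
    for (let i = start; i < end; i++) {
      total += i * i;
    }
    await yieldToEventLoop(); // let other callbacks (timers, I/O) take a turn
  }
  return total;
}
```

a timer scheduled before the call gets to fire while the sum is still in progress, which is exactly what a single unchunked loop would prevent.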

as such, your choice of benchmark (fib) is not really going to produce results 
representative of node.js as used in the wild. (by sane people anyways :)

an i/o-bound process is going to show a more typical node.js performance 
profile, e.g.:

* call into a database (sql or key/value, whatever) and retrieve N rows from a 
large dataset based on the client query (i.e. an offset and limit), do some 
light post-processing of the retrieved data (if any) and return the results as 
a json response (typical of the ajax-y style apps node.js is often pressed 
into...)

* stream a file back to a client on request, with reading from the local file 
and writing to the socket both happening as chunked async requests (made easy 
with the built-in fs module)

in those workloads, or so the theory goes, much of the wall clock time is 
spent waiting for a response from the data source (i.e. the database in the 
above example) and on related i/o to the client. using that time to run other 
application code is a good way to handle concurrent requests at greater volume 
/ lower latency. and it does work reasonably well, even if it is not an 
optimal solution. (nothing in that paragraph is rocket science, of course :)
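a rough sketch of the first workload above, assuming a hypothetical `queryRows` that merely simulates a database round trip with a timer (no real driver is involved): while each handler is parked on `await`, the event loop is free to serve the others, so the waits overlap instead of queueing.

```typescript
interface Row {
  id: number;
  name: string;
}

// Hypothetical stand-in for a database driver: returns rows
// [offset, offset + limit) after a simulated network/disk delay.
function queryRows(offset: number, limit: number, latencyMs = 100): Promise<Row[]> {
  return new Promise((resolve) => {
    setTimeout(() => {
      const rows: Row[] = [];
      for (let id = offset; id < offset + limit; id++) {
        rows.push({ id, name: `item-${id}` });
      }
      resolve(rows);
    }, latencyMs);
  });
}

// Request handler: fetch one page, lightly post-process it, serialise to JSON.
async function handlePage(offset: number, limit: number): Promise<string> {
  const rows = await queryRows(offset, limit); // event loop is free while we wait
  const names = rows.map((r) => r.name.toUpperCase());
  return JSON.stringify({ offset, limit, names });
}

// Serve several "requests" at once; their database waits overlap.
async function serveConcurrently(pages: Array<[number, number]>): Promise<string[]> {
  return Promise.all(pages.map(([offset, limit]) => handlePage(offset, limit)));
}
```

five requests with ~100ms of simulated latency each should complete in roughly 100ms total rather than ~500ms.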

(p.s. benchmarking is hard, so props for taking a run at this ..)

> Node is very popular so I'm trying to understand what it is about node that
> is attractive.
> npm for example, looks well engineered and something we could learn from.

for many node.js projects, including ones i've worked on, this has been a HUGE 
part of the reason to go with it. the monstrous number of easily integrated 
3rd party modules is a significant asset, and helps make up for many of the 
annoyances.

the path to node.js seems to go something like:

* developer hears the hype about node.js (scalable, easy, ..)
* figuring that since they already "know" javascript, the developer dips a toe 
in the node.js water and finds that it is easy to do simple things (e.g. a toy 
project to learn the framework with)
* developer discovers npm and is hooked ("I don't have to write most of my app 
myself!")

that progression from "hear the hype" through to finding the killer feature of 
node (npm and the wide world of node.js modules) is quick and rewarding (via 
instant success).

perhaps that is something that erlang can learn from :)

> The node
> web servers are easy to setup and run, so I'd like to see if we could make

express is indeed pretty damn fine, but some of the other modules, such as 
async[1][2], are hugely important because they encapsulate many of the typical 
patterns that would otherwise be repetitive drudgework (mostly due to the async 
nature of node.js). without these sorts of modules, using node.js would be a lot 
less attractive (which is why they got written, obviously -> itch scratched!).

so when looking at what makes it easy to get started, the web server modules 
(e.g. express) are only part of the story.

(and of course, being able to easily share and include modules is a key part 
of why those modules exist in the first place ...)
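for anyone who hasn't used async: the kind of drudgework it absorbs looks roughly like this hypothetical, stripped-down helper in the spirit of `async.parallel`. this is NOT the library's code, and it skips niceties like guarding against a task calling back twice. it runs several callback-style tasks at once, collects results in task order and reports the first error.

```typescript
// Node-style "error-first" callback.
type Callback<T> = (err: Error | null, result?: T) => void;
type Task<T> = (done: Callback<T>) => void;

// Hypothetical helper in the spirit of async.parallel: start every task,
// gather results in task order, and call `finished` at most once.
function parallel<T>(tasks: Task<T>[], finished: Callback<T[]>): void {
  const results: T[] = new Array(tasks.length);
  let pending = tasks.length;
  let settled = false;

  if (pending === 0) {
    finished(null, results);
    return;
  }

  tasks.forEach((task, index) => {
    task((err, result) => {
      if (settled) return; // ignore late callbacks once we've reported
      if (err) {
        settled = true;
        finished(err);
        return;
      }
      results[index] = result as T;
      pending -= 1;
      if (pending === 0) {
        settled = true;
        finished(null, results);
      }
    });
  });
}
```

now imagine writing that bookkeeping by hand, slightly differently each time, in every handler: that is the itch async scratches.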

comparing with cowboy, the differences are glaring. for instance, in the 
"getting started" guide for cowboy:

* we live in the microwave popcorn and 10-minutes-is-a-long-video-on-youtube 
age. yet the first FOUR sections are not about cowboy at all, but instead talk 
about the modern web and how to learn erlang. as someone moderately familiar 
with the web, i don't care about this. *just let me get started already!* if 
i'm reading the getting started guide for cowboy, i probably don't need to be 
sold on either the modern web OR erlang.

* being a good, modern developer with the attention span of the average 
backyard squirrel i simply skipped straight to the "Getting Started" section. 
the FIRST sentence is this:

"Setting up a working Erlang application is a little more complex than for 
most other languages. The reason is that Erlang is designed to build systems 
and not just simple applications."

... aaaaaaaand cowboy just lost me as a user. i don't WANT complex[1], and my 
application IS simple. so cowboy is not for me! right?

* the rest of the "getting started" walks me through doing a ton of 
boilerplate stuff. it's nice to know how things work, but i'm a busy web dev 
(have i mentioned my lack of attention span yet? oh look, a peanut! and it's 
an event driven async peanut! yum! *runs off*). everything in that section 
ought to be boiled down to "run this one simple command and everything is done 
for you. click here to read about the gory details." and doing so should give 
me a fully functional application template that i can immediately start 
hacking on. that one command should probably take a simple config file with 
things like the app name and other variable details (such as which erlang apps 
to include in my awesome new project: cowboy, plus whatever else i need). 
basically, an npm-for-erlang.

oh, and bonus points if there is a file created just for route definitions which 
would then be automatically included by foo_app.erl and passed to 
cowboy_router:compile. having a "well known" place to define routes would 
standardize cowboy-using applications and allow the starting dev to focus on 
what they care about (routes and handlers) while ignoring details like the 
app module. yes, yes, eventually they'll likely want to dig into that as well, 
but not at the beginning. (this is an area where cowboy+erlang could be even 
better than express+node.js)

* i couldn't find the bragging section of the docs. ;) more seriously, the 
getting started guide tries to sell me on the modern web and erlang's place in 
it, but how about a fun little one-pager that backs up the claims made in the 
main README: "Cowboy is a small, fast and modular HTTP server written in 
Erlang." and "It is optimized for low latency and low memory usage". show me 
the money^Hmeasurements! a simple set of charts showing how many simultaneous 
connections can be handled and what kind of latencies the developers achieve 
on regular ol' hardware, along with a LOC-you-need-to-write-for-a-barebones-app 
count, would help convince people and would be the thing that would get passed 
around on stackoverflow, g+, twitter, etc. when justifying / recommending 
cowboy.

[1] really, i personally don't care; but i'm channeling the spirit of the 
average web dev here :)

> Performance is not a primary concern - ease of programming and correctness
> are.

this is a fundamental weakness with node.js imho. it is definitely easy to get 
*started* with a node.js based project, but keeping it sane takes discipline, 
which means most applications likely become horrible little monsters given 
enough time. i am convinced that it is unpossible (similar to, but not the 
same as, impossible ;) to write and maintain a large, reliable node.js 
application without *extensive* unit tests which are run continuously during 
development (not just integration tests).

as the order of blocks in the source code does not correlate with the actual 
execution order, and that execution order will vary at runtime due to 
fluctuations in the surrounding aether (read: database and network latency), it 
is very difficult to write code that can be easily debugged simply by reading it 
or naively instrumenting it with debug statements. (yes, this is not how to 
maintain a code base, but it's how many devs actually work.) 
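a toy illustration of the ordering problem, with a hypothetical `fakeIo` standing in for a database or network call whose latency we don't control: the three calls are issued in source order A, B, C, but they complete in whatever order the latencies dictate, and the line written last in `runQueries` actually runs first.

```typescript
// Hypothetical stand-in for an I/O call: records `label` after `latencyMs`.
function fakeIo(label: string, latencyMs: number, log: string[], done: () => void): void {
  setTimeout(() => {
    log.push(label);
    done();
  }, latencyMs);
}

// Issue three "queries" in source order A, B, C and report completion order.
function runQueries(latencies: [number, number, number], onAllDone: (log: string[]) => void): void {
  const log: string[] = [];
  let remaining = 3;
  const done = () => {
    remaining -= 1;
    if (remaining === 0) onAllDone(log);
  };
  fakeIo("A", latencies[0], log, done);
  fakeIo("B", latencies[1], log, done);
  fakeIo("C", latencies[2], log, done);
  log.push("after-calls"); // last in the source, yet runs before every callback
}
```

with latencies of 30/10/20ms the log reads `after-calls,B,C,A`; with 10/20/30ms it reads `after-calls,A,B,C`. same code, different execution order, decided entirely by the aether.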

the answer is LOTS of unit tests, including for cases one normally wouldn't 
test for in other languages / frameworks (unless being paid by the test unit 
;). it was also uncomfortably common (i.e. more times than zero) to see 
tests fail due to the peculiarities of node.js messing with the test 
framework, resulting in false negatives that often take a fair amount of time 
to identify and then instrument around. yep: a testing run may not behave 
similarly enough to the production run, which kind of erodes the value of 
having tests ....

thankfully node has lovely tools like mocha and istanbul to drive unit 
tests and generate coverage reports relatively quickly, making the 
may-as-well-be-enforced TDD manageable.

combine this with a language that requires defensive programming (resulting in 
more paths to test, obviously) and the ugliness becomes evident. even though 
it is a scripted environment, getting a node.js app to spew errors everywhere 
or even crash outright is far too easy without extensive defensive 
programming.
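one concrete example of the defensive programming involved, using a hypothetical `parseLater` that simulates data arriving some time later: a try/catch around the *call* does not cover code that runs later in a callback, so each callback boundary needs its own guard that turns exceptions into error-first callback arguments. every such guard is one more path to test.

```typescript
type Callback<T> = (err: Error | null, result?: T) => void;

// Hypothetical: pretend `raw` arrives from the network a little later.
// Calling JSON.parse unguarded inside the timer would throw outside any
// caller's try/catch and, in node, crash the process as an uncaught exception.
function parseLater(raw: string, cb: Callback<unknown>): void {
  setTimeout(() => {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch (e) {
      cb(e instanceof Error ? e : new Error(String(e)));
      return;
    }
    cb(null, parsed); // outside the try, so an exception thrown by cb isn't reported as a parse error
  }, 0);
}
```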

so it is pretty much expected that your node.js app will at some point fall 
over and die, which is why there are tools like forever and node modules 
offering integration with systemd to provide supervision. if only node.js had 
such concepts built in, how wonderful would that be? wish i knew of such a 
language ;)
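for contrast, the heart of what an erlang supervisor gives you out of the box is small enough to sketch. this is a hypothetical in-process TypeScript sketch, loosely modelled on supervisor restart intensity: restart a failing worker up to `maxRestarts` times, then give up and propagate the error. unlike a real erlang supervisor it has no time window (`period`), no restart strategies and no supervision tree, and it is not how forever or systemd work (those restart whole OS processes).

```typescript
type Worker<T> = () => Promise<T>;

interface Outcome<T> {
  result: T;
  restarts: number;
}

// Hypothetical sketch: run `worker`; on failure restart it, up to
// `maxRestarts` times. Past that limit, give up and rethrow the last error,
// roughly as an erlang supervisor gives up once its restart intensity is exceeded.
async function supervise<T>(worker: Worker<T>, maxRestarts: number): Promise<Outcome<T>> {
  let restarts = 0;
  for (;;) {
    try {
      const result = await worker();
      return { result, restarts };
    } catch (err) {
      if (restarts >= maxRestarts) throw err;
      restarts += 1;
    }
  }
}
```

the point is less the code than where it lives: in erlang/OTP this policy is declared in a supervisor's flags rather than reinvented (or bolted on from outside) per app.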

if erlang can learn something from node it might be the value of loudly, 
clearly and enthusiastically telling the world what it is REALLY DAMN GOOD at. 
people will forgive warts and annoyances if they understand the thing they 
*want* about a given framework.

people think 25k connections on a multi-core system is mindblowing; that 
multi-process / threading *must* be hard (enough to avoid if you can, 
anyways); that simple frameworks are going to be good at i/o-bound or 
cpu-bound tasks, but rarely both (to get both you get to use a complex 
language+framework) ... erlang addresses exactly those issues, and as such has
an amazing story for the needs of modern network services (among other use 
cases). there are some truly amazing things written in erlang (something that 
i don't think enough people are aware of, btw) that would help in emphasizing 
that story ....

i had erlang on my "list of things to investigate" for a while, and when i 
finally got around to it i found it to be a minor revelation in terms of design 
and capabilities. not perfect, but closer to perfect than most of the 
alternatives i've used. i really should not have had to investigate to find 
that out, though: i ought to have been driven to erlang because that set of 
revelations was widely distributed in easy-to-digest form (and without a bunch 
of warnings about how string manip sucks, or how erlang runtimes are complex 
to set up, or ...)

[1] https://github.com/caolan/async 
[2] http://www.sebastianseilund.com/nodejs-async-in-practice

-- 
Aaron J. Seigo