[erlang-questions] node.js vs erlang
Aaron J. Seigo
aseigo@REDACTED
Wed Jun 18 09:39:00 CEST 2014
(apologies for the length of this email .. i've been poking at it for a couple
of days since Joe's first email and finally combined the ramblings into
something resembling a reply. it's a set of ponderings / reflections of someone
new to erlang who has written non-trivial apps with node.js ... which seems to
be pretty much exactly what Joe is trying to explore :)
On Tuesday, June 17, 2014 14:41:28 Joe Armstrong wrote:
> The node books say something like "make sure that callbacks
> run-to-completion quickly"
> I would say that complying with this advice is extremely difficult. It's
> very difficult to guess which computations will take a long time, and even
> if you know it's difficult to break them into small re-entrant chunks.
... which is generally why CPU-bound tasks are to be avoided in the
node.js world. conventional wisdom (be that as it may) is that node is
appropriate for I/O bound processes which operate on streams. for anything
else one ends up spawning processes and waiting for a response (which makes it
async in the node.js runtime again, but that is not particularly lightweight ...)
as such, your choice of benchmark (fib) is not really going to produce results
representative of node.js as used in the wild. (by sane people, anyway :)
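to make Joe's point concrete, here is roughly what "breaking a computation into small re-entrant chunks" looks like for about the easiest possible case, a plain sum. this is a minimal typescript sketch, not something from the node books; `chunkedSum` and `yieldToEventLoop` are made-up names:

```typescript
// hypothetical helper: resolve on a later turn of the event loop, so any
// callbacks that became ready in the meantime (e.g. i/o handlers) can run.
// (node also offers setImmediate for this; setTimeout keeps the sketch
// runtime-neutral.)
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// sum the integers 0..n-1, but do at most `chunkSize` additions per turn
// of the event loop instead of one long blocking loop.
async function chunkedSum(n: number, chunkSize: number): Promise<number> {
  let total = 0;
  for (let start = 0; start < n; start += chunkSize) {
    const end = Math.min(start + chunkSize, n);
    for (let i = start; i < end; i++) {
      total += i;
    }
    await yieldToEventLoop(); // let other callbacks run between chunks
  }
  return total;
}

chunkedSum(1000000, 10000).then((total) => console.log(total)); // 499999500000
```

the loop slices cleanly only because all of its state is one running total. something like a naive recursive fib keeps its state on the call stack, which is exactly why chunking it by hand gets awkward.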
an i/o bound process is going to show a more typical node.js performance
profile e.g.:
* call into a database (sql or key/value, whatever) and retrieve N rows from a
large dataset based on the client query (i.e. an offset and limit), do some
light post-processing of the retrieved data (if any) and return the results as
a json response (typical of the ajax-y style apps node.js is often pressed
into...)
* stream a file back to a client on request, reading from the local file and
writing to the socket both happening in chunked async requests (made easy with
the built-in file module)
in those workloads, or so the theory goes, much of the wall clock time is
spent waiting for a response from the data source (i.e. the database in the
above example) and on related i/o to the client. using that time to run other
application code is a good way to handle concurrent requests at greater volume
and lower latency. and it does work reasonably well, even if it is not an
optimal solution. (nothing in that paragraph is rocket science, of course :)
(p.s. benchmarking is hard, so props for taking a run at this ..)
> Node is very popular so I'm trying to understand what it is about node that
> is attractive.
> npm for example, looks well engineered and something we could learn from.
for many node.js projects, including ones i've worked on, this has been a HUGE
part of the reason to go with it. the monstrous number of easily integrated
3rd party modules is a significant asset, and helps make up for many of the
annoyances.
the path to node.js seems to go something like:
* developer hears the hype about node.js (scalable, easy, ..)
* figuring that they already "know" javascript, the developer dips a toe in the
node.js water and finds that it is easy to do simple things (e.g. a toy project
to learn the framework with)
* developer discovers npm and is hooked ("I don't have to write most of my app
myself!")
that progression from "hear the hype" through to finding the killer feature of
node (npm and the wide world of node.js modules) is quick and rewarding (via
instant success).
perhaps that is something that erlang can learn from :)
> The node
> web servers are easy to setup and run, so I'd like to see if we could make
express is indeed pretty damn fine, but some of the other modules such as
async[1][2] are hugely important because they encapsulate a lot of the typical
patterns that would otherwise be repetitive drudgework (mostly due to the async
nature of node.js). without these sorts of modules, using node.js would be a
lot less attractive (which is why they got written, obviously -> itch
scratched!).
so when looking at what makes it easy to get started, the web server modules
(e.g. express) are only part of the story.
(and of course, being able to easily share and include modules is a key part
of why those modules exist in the first place ...)
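for anyone who hasn't used it: what async saves you from is bookkeeping like the following. this is a hand-rolled typescript sketch, only loosely modeled on the shape of async's `parallel` (tasks that take a node-style `(err, result)` callback, plus one final callback); it is not the library's code, and `parallel`, `Task` and `Callback` here are my own names:

```typescript
// node-style callback: error first, then the result.
type Callback<T> = (err: Error | null, result?: T) => void;
type Task<T> = (done: Callback<T>) => void;

// start every task at once; call `done` exactly once, either with all
// results (in task order, not completion order) or with the first error.
function parallel<T>(tasks: Task<T>[], done: Callback<T[]>): void {
  const results: T[] = new Array(tasks.length);
  let remaining = tasks.length;
  let finished = false;
  if (remaining === 0) {
    done(null, results);
    return;
  }
  tasks.forEach((task, index) => {
    task((err, result) => {
      if (finished) return; // ignore late callbacks after an error
      if (err) {
        finished = true;
        done(err);
        return;
      }
      results[index] = result as T;
      remaining -= 1;
      if (remaining === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// the second task finishes first, but results stay in task order.
parallel<number>(
  [
    (cb) => setTimeout(() => cb(null, 1), 30),
    (cb) => setTimeout(() => cb(null, 2), 10),
  ],
  (err, results) => console.log(err, results),
);
```

multiply that by every handler that fans out to two or three data sources and it's clear why nobody wants to write it by hand.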
comparing with cowboy, the differences are glaring. for instance, in the
"getting started" guide for cowboy:
* we live in the microwave popcorn and 10-minutes-is-a-long-video-on-youtube
age. yet the first FOUR sections are not about cowboy at all, but talking about
the modern web and how to learn erlang. as someone moderately familiar with
the web, i don't care about this. *just let me get started already!* if i'm
reading the getting started guide for cowboy, i probably don't need to be sold
on either the modern web OR erlang.
* being a good, modern developer with the attention span of the average
backyard squirrel i simply skipped straight to the "Getting Started" section.
the FIRST sentence is this:
"Setting up a working Erlang application is a little more complex than for
most other languages. The reason is that Erlang is designed to build systems
and not just simple applications."
... aaaaaaaand cowboy just lost me as a user. i don't WANT complex[1], and my
application IS simple. so cowboy is not for me! right?
* the rest of the "getting started" walks me through doing a ton of
boilerplate stuff. it's nice to know how things work, but i'm a busy web dev
(have i mentioned my lack of attention span yet? oh look, a peanut! and it's
an event driven async peanut! yum! *runs off*). everything in that section
ought to be boiled down to "run this one simple command and everything is done
for you. click here to read about the gory details." and doing so should give
me a fully functional application template that i can immediately start
working on. that one command should probably take a simple config file with
things like the app name and other variable details (such as which erlang apps
to include in my awesome new project: cowboy, plus anything else i need).
basically, an npm-for-erlang.
oh, and bonus points if there is a file created just for route definitions which
would then be automatically included by the foo_app.erl to be passed to
cowboy_router:compile. having a "well known" place to define routes will
standardize cowboy-using applications and allow the starting dev to focus on
what they care about (routes and handlers) while ignoring details like the
app module. yes, yes, eventually they'll likely want to dig into that as well,
but not at the beginning. (this is an area where cowboy+erlang could be even
better than express+node.js)
* i couldn't find the bragging section of the docs. ;) more seriously, the
getting started guide tries to sell me on the modern web and erlang's place in
it, but how about a fun little one-pager that backs up the claims made in the
main README: "Cowboy is a small, fast and modular HTTP server written in
Erlang." and "It is optimized for low latency and low memory usage". show me
the money^Hmeasurements! a simple set of charts showing how many simultaneous
connections can be handled and what kind of latencies app developers achieve
on regular ol' hardware, along with a LOC-you-need-to-write-for-a-barebones-app
count, would help convince people and would be the thing that gets passed
around on stackoverflow, g+, twitter, etc. when justifying / recommending
cowboy.
[1] really, i personally don't care; but i'm channeling the spirit of the
average web dev here :)
> Performance is not a primary concern - ease of programming and correctness
> are.
this is a fundamental weakness with node.js imho. it is definitely easy to get
*started* with a node.js based project, but keeping it sane takes discipline.
which means most applications likely become horrible little monsters given
enough time. i am convinced that it is unpossible (similar to, but not the
same as, impossible ;) to write and maintain a large, reliable node.js
application without *extensive* unit tests which are run continuously during
development (not just integration tests).
as the order of blocks in the source code does not correlate with the actual
execution order, and that execution order will vary at runtime due to
fluctuations in the surrounding aether (read: database and network latency), it
is very difficult to write code that can be easily debugged simply by reading it
or naively instrumenting it with debug statements. (yes, this is not how to
maintain a code base, but it's how many devs actually work.)
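a tiny illustration, assuming a hypothetical `delay` helper as a stand-in for database or network latency: the code below starts A, B and C in that order every time, but the order in which the lines after each `await` run (and so the order of any debug output) follows the latencies instead:

```typescript
// hypothetical stand-in for i/o with a given latency.
function delay(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// start the "queries" in source order and record the order in which the
// code after each `await` actually runs.
async function completionOrder(queries: Array<[string, number]>): Promise<string[]> {
  const order: string[] = [];
  await Promise.all(
    queries.map(async ([name, latencyMs]) => {
      await delay(latencyMs);
      order.push(name); // the "debug statement"
    }),
  );
  return order;
}

// identical code, different "aether": only the latencies change.
completionOrder([["A", 150], ["B", 50], ["C", 100]]).then((o) => console.log("run 1:", o));
completionOrder([["A", 50], ["B", 150], ["C", 100]]).then((o) => console.log("run 2:", o));
```

with these numbers run 1 records B, C, A and run 2 records A, C, B, from identical source.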
the answer is LOTS of unit tests, including for cases one normally wouldn't
test for in other languages / frameworks (unless being paid by the test unit
;). it was also uncomfortably common (i.e. more times than zero) to see
tests fail due to the peculiarities of node.js messing with the test
framework, resulting in spurious failures that often take a fair amount of
time to identify and then instrument around. yep: a testing run may not behave
sufficiently like the production run, which kind of erodes the value of
having tests ....
thankfully node has lovely tools like mocha and istanbul to drive unit
tests and generate coverage reports relatively quickly, making the may-as-
well-be-enforced TDD manageable.
combine this with a language that requires defensive programming (resulting in
more paths to test, obviously) and the ugliness becomes evident. even though
it is a scripted environment, getting a node.js app to spew errors everywhere
or even crash outright is far too easy without extensive defensive
programming.
so it is pretty much expected that your node.js app will at some point fall
over and die. which is why there are tools like forever and node modules
offering integration with systemd to provide supervision. if only node.js had
such concepts built in, how wonderful would that be? wish i knew of such a
language ;)
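for contrast, here is a toy of the idea in typescript: restart a failing async task, but give up after a fixed number of restarts. it is loosely inspired by the restart-intensity limit in erlang/OTP supervisors, but `supervise` and `Worker` are made-up names, and this does nothing like what forever, systemd or OTP actually do (no process isolation, no time window, no restart strategies):

```typescript
// a unit of work that may fail (reject) at any time.
type Worker = () => Promise<void>;

// toy supervisor: run `worker`; if it fails, restart it, giving up after
// `maxRestarts` restarts. resolves with the number of restarts needed.
async function supervise(worker: Worker, maxRestarts: number): Promise<number> {
  let restarts = 0;
  for (;;) {
    try {
      await worker();
      return restarts; // clean exit: stop supervising
    } catch (err) {
      if (restarts >= maxRestarts) {
        throw new Error(`giving up after ${restarts} restarts: ${String(err)}`);
      }
      restarts += 1;
      console.error(`worker crashed (${String(err)}), restart #${restarts}`);
    }
  }
}

// a hypothetical flaky worker that crashes twice, then succeeds.
let attempts = 0;
const flaky: Worker = async () => {
  attempts += 1;
  if (attempts < 3) throw new Error(`crash on attempt ${attempts}`);
};
supervise(flaky, 5).then((n) => console.log(`recovered after ${n} restarts`));
```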
if erlang can learn something from node it might be the value of loudly,
clearly and enthusiastically telling the world what it is REALLY DAMN GOOD at.
people will forgive warts and annoyances if they understand the thing they
*want* about a given framework.
people think 25k connections on a multi-core system is mindblowing; that
multi-process / threading *must* be hard (enough to avoid if you can,
anyway); that simple frameworks are going to be good at either i/o- or
cpu-bound tasks, but rarely both (to get both you have to use a complex
language+framework) ... erlang addresses exactly those issues, and as such has
an amazing story for the needs of modern network services (among other use
cases). there are some truly amazing things written in erlang (something that
i don't think enough people are aware of, btw) that would help in emphasizing
that story ....
i had erlang on my "list of things to investigate" for a while, and when i
finally got around to it i found it to be a minor revelation in terms of design
and capabilities. not perfect, but closer to perfect than most of the
alternatives i've used. i really should not have had to investigate to find
that out, though: i ought to have been driven to erlang because that set of
revelations was widely distributed in easy-to-digest form (and without a bunch
of warnings about how string manip sucks, or how erlang runtimes are complex
to set up, or ...)
[1] https://github.com/caolan/async
[2] http://www.sebastianseilund.com/nodejs-async-in-practice
--
Aaron J. Seigo