[erlang-questions] What are the cons and pros of using Erlang rather than java to develop server backend?

Fri May 22 22:19:08 CEST 2009

Thinking out loud ...

Let's solve the most difficult problem first - what is the most difficult
problem? Let's dimension for 10^6 users/unit, then figure out how to scale
the basic unit.

I'll start by trying to do everything (for 1M users) on one machine.

1M users with 1 MB of data each = 1 TB of data - this is far larger than any
cache, so it must be stored on disk.

How many visits/day? Assume 1. Then we must serve 1,000,000/86400
requests/sec (about 11/sec), so you have roughly 90 ms of CPU time per
transaction to do everything. At 10 visits per day we must serve about
110 requests/sec and so have about 9 ms to do everything.
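
A rough sketch of that arithmetic in Python. The function name and the
assumption that requests are spread evenly over the day and handled one at a
time are mine, for illustration only. The unrounded figures come out a little
tighter than the round numbers in the text.

```python
SECONDS_PER_DAY = 86_400

def request_budget(users, visits_per_day):
    """Return (requests/sec, ms available per request).

    Assumes load is spread evenly over 24 hours and requests are
    processed strictly one after another (no parallelism).
    """
    rate = users * visits_per_day / SECONDS_PER_DAY
    return rate, 1000.0 / rate

for visits in (1, 10):
    rate, budget_ms = request_budget(1_000_000, visits)
    print(f"{visits:>2} visits/day: {rate:6.1f} req/sec, {budget_ms:5.1f} ms/request")
```

Real traffic has peaks, so the budget at the busiest hour will be smaller
than this average.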

Assume SATA disks: these have a seek time of about 9 ms, so you can do
about 110 reads/sec.

So somewhere between 1 visit/day and 10 visits/day you will run into
either CPU problems or disk I/O problems (it's important to know which).
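
To see roughly where the disk gives out, here is a small Python sketch. It
assumes every request costs exactly one random read, nothing is cached, and
there is a single disk. The parameters reads_per_request and disks are
hypothetical knobs I added, not something from the original sizing.

```python
SECONDS_PER_DAY = 86_400

def max_visits_per_day(users, seek_ms, reads_per_request=1, disks=1):
    """Visits per user per day at which seek-bound disks saturate.

    Assumes each read costs one full seek and load is spread evenly.
    """
    reads_per_sec = disks * 1000.0 / seek_ms
    requests_per_sec = reads_per_sec / reads_per_request
    return requests_per_sec * SECONDS_PER_DAY / users

# One disk with a 9 ms seek, 1M users: about 9.6 visits/user/day.
print(round(max_visits_per_day(1_000_000, seek_ms=9), 1))
# Two reads per request halves that.
print(round(max_visits_per_day(1_000_000, seek_ms=9, reads_per_request=2), 1))
```

Under these assumptions the disk saturates just below 10 visits/day, which is
why it matters to measure whether CPU or I/O runs out first.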

However you look at things I suspect the key issue will be scalability.
At a guess things will go well at low load - but not scale at high load.

Let's suppose you can do 5 requests/sec at 5% CPU - great, you say - then
I can do 100 requests/sec at 100% CPU (say 80 requests/sec at 80% CPU to be
safe). But that only holds if throughput scales linearly with CPU.
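
The naive extrapolation looks like this in Python. The function name and the
target_cpu parameter are made up for illustration, and the linear model is
exactly the optimistic assumption being questioned here, so treat the result
as an upper bound rather than a prediction.

```python
def naive_capacity(measured_rps, measured_cpu, target_cpu=0.80):
    """Linearly extrapolate throughput to a target CPU utilisation.

    measured_cpu and target_cpu are fractions (0.05 means 5%).
    Assumes cost per request stays constant as load grows, which
    real systems rarely manage.
    """
    return measured_rps * target_cpu / measured_cpu

print(f"{naive_capacity(5, 0.05):.0f} req/sec at 80% CPU")
print(f"{naive_capacity(5, 0.05, target_cpu=1.0):.0f} req/sec at 100% CPU")
```

A load test at several points between 5% and 80% CPU would show how far the
real curve falls below this line.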

Scaling over multicores will be critical for performance. If you decide
to use SSDs instead of spinning disks you will find you *must* parallelise
the I/O to get decent performance.

It gets worse ... if you can't make your basic unit of 1M users run on one
chip you'll have to start writing a distributed system. Now you have a few
more tricky choices - are users "sticky", meaning they always arrive at the
same front-end machine, or can they arrive at any machine? The latter implies
that RPCs between machines in the backend must be very fast.
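
One simple way to make users "sticky" is to hash the user id onto the list of
front-end machines. This Python sketch is only an illustration: the function
name and the host names are invented, and it is not the only way to do it.

```python
import hashlib

def sticky_front_end(user_id, front_ends):
    """Map a user id to the same front-end every time.

    Uses a stable hash (SHA-256) so the mapping does not change
    between processes or restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(front_ends)
    return front_ends[index]

# Hypothetical host names, for illustration only.
machines = ["fe1.example", "fe2.example", "fe3.example"]
print(sticky_front_end("gene", machines))
print(sticky_front_end("gene", machines))  # same machine as the line above
```

The catch with plain modulo hashing is that adding or removing a machine
moves most users to a different front-end, so the per-user state has to be
movable or rebuilt when the machine list changes.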

Can you avoid an expensive load balancer in the front-end, or will you
use a DHT in the backend? What happens if you do use a single
point of entry to the system via a load-balancer and it cannot handle the load?

Do the clients handle redirects? If you use HTTP you can issue redirects
and avoid a load balancer - but (and it's a big but) you need co-ordination
in the back-end.

Making systems is "easy" (And I say this with the deepest respect for
the difficulties involved, so don't misunderstand me) but making them
seamlessly scalable is not.

So what has this got to do with choice of programming languages? - a lot
actually.

Given that you can design algorithms that are scalable, fault-tolerant,
distributed etc. (and you must do this), how easy is it to implement
these algorithms?

For performance reasons I would try to do "as much as possible on one
machine" and have a two-level architecture that treats what happens between
machines as fundamentally different to what happens on one machine. The
reason for this is that the latency and failure characteristics of what
happens internally on one machine are fundamentally different to those of
what happens *between* machines.

For maximum performance on one machine we're talking about a top-end Intel
box with a Nehalem processor and 8-16 cores. How well Erlang (or Java) scales
on 8(16) core machines is critical.

Erlang scores pretty well here - you can choose your architectural mapping:
use SMP Erlang and view all 8(16) cores as one unit - or make an 8(16) node
distributed Erlang cluster - you choose. Tweaking the architecture will have
a large impact on performance.

The questions you should ask of Erlang (or Java) are

    - how well should my application scale on an 8(16) core processor
    - how easy is it to tweak the architecture to take advantage of multi-core
      developments (given that things are changing rapidly)

In Erlang running on multi-cores is no big deal - tweaking architectures is
easy - writing concurrent programs is (relatively) easy (that's what Erlang
was designed for so it *should* be easy).

In Java you have a lack of thread safety - I have no idea how well Java runs
on SMP machines nor if running multiple JVMs is easy. Try googling "java+smp"
or "java + thread safety" for more information.

At a guess, careful design of the disk data structures used for user data
and fast parsing of protocols are the areas that are performance critical
(they always are).

Cheers

/Joe

On Fri, May 22, 2009 at 6:23 PM, G.S. <corticalcomputer@REDACTED> wrote:
> Hello Joe,
>
> Here are the answers:
> - how many users are simultaneously connected?
> unknown, expected 20+million total users within 5 years. Thus most likely
> we're talking about a million or so simultaneous users (of course what is
> expected and what occurs are not usually the same, but that is beside the
> point).
> - how long is a session?
> very short, a few seconds
> - what are the latency requirements?
>  soft, real-time,
> - how much data do you store per user?
> very little, under 1 MB, perhaps 1/2 or 1/4 of that.
> - what are your  requirements on scalability - fault-tolerance?
>  Required scalability, and fault-tolerance.
> - is it permissible to stop the system for upgrades?
> not desirable, if possible to avoid, then no stops should be done.
> - what protocols are used between the clients and servers?
> currently unknown.
> - do you want hot stand-by fail-over etc
>  Sure, if it's possible.
> - do you want load balancing, quality of service?
> Sure
> - what is your cost/user budget (both fixed and dynamic costs)
> Unknown
> - how long do you want to store user data and how reliably
> Very reliably, and preferably encrypted, hence I was thinking Mnesia, with
> the little snippets of data stored after running through AES.
> - do you want to run on a VM in the cloud or on a physical machine or
> machines
> It was suggested, but I voted against it, this buzz word "the cloud" might
> sound nice, but if you want true reliability, then one should run on his own
> servers, not to mention if you want to have total control of your data, you
> have to own the server. But I'm open to suggestions, do you think cloud is
> more reliable?
> - do you want to integrate with other systems (mysql, soapy things, ... etc)
> Unknown at this point, but my guess is: no
>  if so what ...
>
> Regards,
> -Gene
>
> On Fri, May 22, 2009 at 7:17 AM, Joe Armstrong <erlang@REDACTED> wrote:
>>
>> You have to tell us more about your problem in order to get sensible
>> feedback. So far you have said
>>
>> "develop a server back-end software with a large user base" - that's
>> not much to go on.
>>
>> Languages are good/bad fits to specific problems - I'd like to know
>>
>> - how many users are simultaneously connected?
>> - how long is a session?
>> - what are the latency requirements?
>> - how much data do you store per user?
>> - what are your  requirements on scalability - fault-tolerance?
>> - is it permissible to stop the system for upgrades?
>> - what protocols are used between the clients and servers?
>> - do you want hot stand-by fail-over etc
>> - do you want load balancing, quality of service?
>> - what is your cost/user budget (both fixed and dynamic costs)
>> - how long do you want to store user data and how reliably
>> - do you want to run on a VM in the cloud or on a physical machine or
>> machines
>> - do you want to integrate with other systems (mysql, soapy things, ...
>> etc)
>>  if so what ...
>>
>> The more detailed you are the better recommendations you will get -
>> note that these are all "non functional" requirements - IMNSHO most
>> projects start with a good grasp of the functional requirements (i.e.
>> what the system should do) but fail miserably when it comes to non
>> functional requirements.
>>
>> Cheers
>>
>> /Joe
>>
>>
>>
>>
>>
>> On Fri, May 22, 2009 at 1:49 AM, G.S. <corticalcomputer@REDACTED> wrote:
>> > Hello fellow Erlangers,
>> >
>> > I have to make a decision on whether to develop a server back-end
>> > software with a large user base (millions) in Erlang rather than Java.
>> > My personal choice is to develop the software in Erlang, but I'm
>> > wondering whether any of you could come up with good reasons why Java
>> > should be used instead. Thus far my reasoning is as follows:
>> >
>> > Erlang Pros as opposed to Java:
>> > Erlang is highly scalable,
>> > The code is much shorter and therefore easier to maintain,
>> > The software would have a lot fewer bugs, and be much more robust,
>> > Erlang provides a high throughput.
>> > Prototyping is faster, and in general serverside, Erlang has been much
>> > more capable in my previous projects.
>> > With Erlang I can use Mnesia, which in itself is much more robust and
>> > scalable than for example SQL...
>> >
>> > Cons:
>> > Fewer developers than Java (but I think the Erlangers are usually
>> > much more skilled, and it would be easy to find coders by for example
>> > posting an ad on this mailing list).
>> > Security (but Erlang is also very secure I think, there are high profile
>> > websites that deal with banking/money written in Erlang, e.g. Kreditor)
>> >
>> > -End,
>> > ps, anyone ever had more problems interacting with APIs using Erlang as
>> > opposed to Java?
>> >
>> > I appreciate any responses and contributions to the Cons/Pros list,
>> > -Gene
>> >