[erlang-questions] Re: Shared/Hybrid Heap

Mon Oct 18 13:28:45 CEST 2010

  Ulf,

I actually agree with most of what you say, and most of it is not at all 
in disagreement with my previous posts.

I was talking about issues like concurrency, message passing, locks from 
a philosophical point of view, and
I don't agree with the way Erlang is often presented as a way to have 
high concurrency without locks.
I don't agree with the statement that shared memory is bad.

You are talking about the Erlang platform, and its productivity. I like 
Erlang. I find it very easy to write programs, and
have the processes talk to each other, monitor them etc. My 
philosophical rants were not against the language or platform.

>
> With all due respect, your argument is one that I've heard many times
> in different settings, and it misses a key point (which Jachym brought 
> up).
> My experience from working in development of systems with complex
> concurrency patterns is that this argument breaks down completely
> and horribly expensively over time in larger projects.
>

The argument breaks down? You mean that it becomes hard to write and 
maintain code? But I
haven't made claims about that. I find Erlang programs much easier than 
C programs.

> Saying that there are no fundamental differences between Erlang and C
> essentially means that you claim that "correct by design" and "correct by
> convention" are the same thing. By this token, Erlang and Haskell are
> fundamentally the same too, since it really doesn't matter that the 
> Haskell
> compiler can verify many more aspects of a program at compile-time than
> Erlang can - you can simply avoid making type mistakes in Erlang and
> be done with it!
There are no fundamental difference between Erlang and C, when it comes to
the concepts of concurrency, locks, deadlocks, shared memory.
The only qualitative difference is between a sequential program and 
everything else.

There are huge differences in actually implementing things, and the 
syntax, of course.

> Your line of reasoning also suggests that we can put great faith in
> micro-benchmarks, since, if you can make it work and make it really
> fast in a small example in C, you can do it equally well in a larger and
> more complex example.
>
I believe in modularity and simplicity. You need to choose the right 
level to work in.
Erlang is just a layer on top of C. For many purposes, it is better to 
use Erlang than C.
For other purposes, it is better to use electric currents in nand gates. 
Of course, you need higher level apis.
Ulf, I agree totally with you.

> The place where Erlang and C become very different is when you start
> writing code that has "interesting" complexity. Automatic memory
> management may seem like a complete waste, until your program
> becomes sufficiently complex that you can no longer easily keep track
> of where you should lock, allocate and free memory. At this point, it is
> actually very difficult to outperform a good garbage collector with hand-
> written code.
>
I agree.

And you see what the next higher level will be. Automatic process 
collection, when a process can never be reached anymore.
In 20 years, people will have "gigaprocesses", the older folks will 
remember that there was something called "gigabytes",  and the next 
languages will have automatic
process spawning and process collection. Erlang will be a low level 
language. What will such a unit of processes be called? A brain :)

The brain languages will still have deadlocks, shared memory, locks etc, 
because those are philosophical/mathematical entities that are never 
"solved".

> I've seen lots of real-world examples where
> small teams of expert C++ programmers have managed to convince
> management to go with a well-trimmed, elegant execution model in
> C++, only to run in all sorts of problems when lesser programmers were
> to start adding features - huge dips in performance and stability, raving
> memory leaks, and frantic running around and fire fighting for the
> experts, who are the only ones who seem to manage to stick to the
> conventions and not break stuff. The beginning mantra, "how hard can
> it be?", is later answered with "apparently, endlessly hard".
>

Interesting. By the way, are there examples of Erlang teams that have 
switched away from Erlang?

> And we haven't even discussed the problem of one of these processes
> crashing yet. With two concurrent contexts explicitly writing to the same
> block of memory, ensuring the integrity of that memory block if one of
> the contexts throws an exception will be very, very hard. This is where
> Erlang's "share nothing" philosophy really pays off. The VM may decide
> to share data under the hood, but as a programmer, you can be absolutely
> certain that the death of another process - even in the same memory space
> - cannot ruin the integrity of your own data. It really is as if you 
> had your own
> distinct copies of everything.
>
This is where we disagree strongly. I am saying that you get the same 
problems again just at a higher level.
The equivalent situation is this.
P1 is a process, that keeps data, i.e. the so called shared memory.
P2 and P3 sends messages to P1, many each. Remember a put message is 
like a mutation.
If you want to serialize the updates from P2 and P3, a lock is needed. 
P1 for instance could wait for all messages from P2, before receiving 
from P3,
incorporate them, release its lock and accept the messages from P3. What 
should P1 do ,if the messages from P2 stop early. Roll back, ignore it.
The issues are exactly the same as at the lower byte level. Don't you 
see that these are deep principles, not something a language/VM can solve?

Memory space in Erlang is not the same as memory space for C. Everything 
is lifted one level. The equivalent of memory space in Erlang is the set 
of running processes.
The addresses are now Pids not RAM addresses. And if you want something 
interesting to happen, you must have some processes that accept messages 
and act on them.
C address space does not exist for the Erlang programmer as little as 
physical positions of transistors exist for the C program.  A C program 
does not crash if a transistor goes bad if
there is a "virtualization layer" in between that redirects the current 
to a another transistor.

> The problem of crashing processes is a particularly nasty _containment_
> problem, but when you scrutinise many of the concurrency models out
> there you soon find that there are usually many issues that are not well
> contained. Encapsulation and containment are best handled in the
> programming model, rather than relying on convention, and this is
> exactly the area where there is a fundamental difference.
>
Ulf, I agree that one should have higher level languages that 
encapsulate things, and maybe JAVA or C++ or whatever you are thinking 
about are too low level
for big projects. Fine with me.

On practical terms, I feel that Erlang some times gives up performance 
for unnecessary reasons. I am especially thinking about mutability.
There is no reason to give up mutation within a process which is already 
a serializer.

code like this

loop(Dict)
     receive
     {put, K, V} ->
         Dict2 = dict:store(K, V, Dict),
         loop(Dict2)
     end.

is just a performance wise inefficient way of doing

loop(Mut) ->
     receive
     {put, K, V} ->
         put(K,V, Mut),
         loop(Mut)
     end.

where Mut is a mutable data structure.
In the dict exampe, there is a unncecessary work and garbage collection 
for the functional data structure Dict.
And you gain nothing, since it is sequential code.

ETS gives you an unnecessary copy. The process dictionary does the job, 
but there is only one of them and it must be a hash table. It would be 
better with
PD1 = pd:new_hash_table(),
PD2 = pd:new_tree(),
etc.

> E.F. Codd, who wrote the twelve (13, really) rules for relational 
> databases, had
> this rule as No 12:
>
> Rule 12: The nonsubversion rule:
> *If the system provides a low-level (record-at-a-time) interface, then 
> that interface *
> *cannot be used to subvert the system, for example, bypassing a 
> relational security *
> *or integrity constraint.*
> (http://en.wikipedia.org/wiki/Codd's_12_rules 
> <http://en.wikipedia.org/wiki/Codd%27s_12_rules>)
>
> The importance of this rule is often underestimated, and it is quite 
> common to
> ditch it early in favour of performance, since the consequences of 
> subverting
> the consistency model are not readily apparent until much later, if 
> even then.
>

Interesting. I am not convinced. So basically, the programmer should be 
constrained, because he/she will make mistakes?
I guess it is a question of finding the right balance. So, you think 
dirty_write was a mistake for mnesia?

cheers,

Morten.

> BR,
> Ulf W
>
> On 18 Oct 2010, at 09:31, Morten Krogh wrote:
>
>> Hi Jachym
>>
>> There is one thing that I probably haven't made clear enough, and it 
>> will answer all points basically.
>>
>> Erlang has the loop receive loop process that serializes all access 
>> to the data.
>> The equivalent in C is a single byte. The semantics of communicating 
>> with a single byre is well defined in C.
>> A C thread communicates with the byte using message passing through 
>> the memory bus. The byte only allows get and put.
>> Nothing else makes sense.
>>
>> The general picture is that you have some units of atomicity and 
>> beyond that you need to lock, coordinate access etc.
>>
>> Erlang gives you a higher control of the granularity than C, but 
>> nothing is fundamentally different. I just object to all the Erlang 
>> statements about locks, concurrency etc.
>>
>> If you want to get rid of locks and shared memory, you lose 
>> concurrency. You can't have both. You can serialize everything. In C, 
>> that would be one thread. In Erlang,
>> one process. If you want concurrency, you need smaller blocks of 
>> data, which then needs to be coordinated.
>>
>> You can imagine an axis
>>
>> concurrency ~ locks, deadlocks etc  ~ high performance   in one end.
>>
>> serialized access ~ no locks ~ single threaded  ~ simple programing 
>> model    in the other end.
>>
>>
>> C actually beats Erlang in both ends of the axis (not the programming 
>> model, of course, Erlang syntax is always simpler).
>>
>> Erlang's sweet spot is in the middle. It is so easy to package an 
>> amount of data into a serialized agent with controller logic.
>>
>> The most concurrent program is not an erlang program, that would be a 
>> C program (or even lower) with a loop on each core, where all 
>> iterations of all loops updated the memory in all kinds of ways. Each 
>> iteration of each loop is a "context switch".
>>
>> Erlang is of course very convenient. You get a whole "block of bytes" 
>> with serialized access, and a programming language to control the 
>> "gate" to the block of bytes.
>>
>> The hardware manufacturers could make something like Erlang, if they 
>> wanted to. Chop the memory into blocks, and give each block 
>> serialized access and a small programmable controller sitting at the 
>> gate of that block. They would lose performance and concurrency if 
>> they did.
>>
>> It is all the same. The only question is at what granularity you 
>> serialize your data.
>>
>> Jachym, I will not go through your points. You can imagine my 
>> response. It is basically :
>>
>> One byte in C is the equivalent of one Erlang process. One byte 
>> doesn't need any controller logic besides get and put, and a 
>> guaranteed atomicity of get and put.
>>
>> Cheers,
>>
>> Morten
>>
>>
>>
>> On 10/17/10 11:46 PM, Jachym Holecek wrote:
>>> # Morten Krogh 2010-10-17:
>>>> The problem is of course that a process like this
>>>>
>>>> loop(State) ->
>>>>     receive
>>>>     {From, get, X} ->
>>>>          From ! {result, get(X, State)},
>>>>         loop(State)
>>>>
>>>>     etc.....
>>>>
>>>> Behaves exactly like shared memory seen from other processes point 
>>>> of view.
>>>>
>>>> Writing
>>>>
>>>> Pid ! {self(), get, 27}
>>>> receive
>>>>     {result, Result} ->
>>>>         Result
>>>> end.
>>>>
>>>> is like writing
>>>>
>>>> get(X, Pid) in C (the language) where Pid is pointer to a shared data
>>>> structure in C.
>>> "Apples and oranges" as they say. In your Erlang example:
>>>
>>>   * Server explicitely agrees to provide some of its data.
>>>   * Server is free to change this decision.
>>>   * Client has to follow certain protocol to access this data.
>>>   * All accesses to "shared" data are serialized.
>>>   * It is "physically impossible" to obtain inconsistent snaphost of 
>>> the data.
>>>   * It is "physically impossible" to bypass server's control.
>>>   * This is guaranteed at language level.
>>>   * Your example is implemented completely within this well-defined 
>>> semantics.
>>>
>>> Contrast that with C:
>>>
>>>   * All threads have equal access right to whole process address space.
>>>   * Sychronization can be done, but is a mere convention 
>>> (spinlocks/mutexes).
>>>   * Data consistency can be done, but is a mere convetion (memory 
>>> barriers).
>>>   * At language level memory access semantics is more or less undefined.
>>>   * Mutexes/memory barriers have to be done in assembly, ie. outside C.
>>>
>>> A little more fun on the C side:
>>>
>>>   * Compiler may reorder loads/stores (somewhat) unless you're careful.
>>>   * CPU may reorder loads/stores (somewhat) unless you're careful.
>>>   * Cache may do whatever it wants to (somewhat) unless you're careful.
>>>
>>> Normally these thing will be taken care of by whatever threads 
>>> implementation
>>> you use (in architecture- and compiler-specific way), but if we're 
>>> talking at
>>> language level then there's no way (as far as I know) to achieve 
>>> sane shared
>>> memory semantics in C alone (if we admit the existence of concurrent 
>>> threads
>>> of execution which the language knows nothing about, of course).
>>>
>>> But sure -- you can _model_ Erlang-like processes in C, and you can 
>>> _model_
>>> C-like shared memory in Erlang (including all of the quirks above if you
>>> really wanted to). But at that point you're imposing correspondence 
>>> of some
>>> sort by brute force; as opposed to capturing inherent properties.
>>>
>>> Just my two pence...
>>>
>>> Regards,
>>> -- Jachym
>>>
>>> ________________________________________________________________
>>> erlang-questions (at) erlang.org <http://erlang.org> mailing list.
>>> See http://www.erlang.org/faq.html
>>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>>>
>>
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org <http://erlang.org> mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>>
>
> Ulf Wiger, CTO, Erlang Solutions, Ltd.
> http://erlang-solutions.com
>
>
>