[erlang-questions] Erlang newbie questions

Tue Oct 25 02:37:51 CEST 2011

Hi Garrett,

First of all thanks :-) I've edited our posting a bit to try to keep
the size down, please excuse me if I've cut too much.

Now on to the issues:

On 24 October 2011 13:57, Garrett Smith <g@REDACTED> wrote:

[snip]

>>> I use run_erl to run Erlang. This doesn't plug into typical watchdog
>>> apps, which look for pids, but it provides a way to communicate with
>>> Erlang (using to_erl) without using distributed Erlang. I've used this
>>> technique extensively and find it robust.
>>
>> I've also used run_erl and found it less than robust. I'd like to
>> highlight the following issues:
>
> I probably should have been more specific, since we have different
> interpretations of "robust".
>
> run_erl does what it claims to do without bugs (that I've
> encountered). It's an effective controller for Erlang processes that
> can obviate the need for pidfiles or distributed Erlang.

Yes, it does work without bugs, but I disagree that it's a replacement
for PID files or that it's robust in the sense that it can be used to
build robust production systems (on a Unix box).

An important reason to have a PID file is that it gives the system
administrators a way to control programs from outside them. If you're
using run_erl and to_erl and your emulator gets into a bad state such
that the Erlang shell can't respond (which I have had happen) then
to_erl is no use.

That's fine if you have a single emulator because you can use killall
or whatever to kill all beam.smp processes but what if you have more
than one? How do you tell which is which?

If there were PID file support /in the emulator itself/ and the PID
file name or path could be specified on the command line then we
wouldn't have this problem at all. You could write a simple init
script around a normal PID file, and it would fit into the usual init
system without much effort. This is what most Unix services do.
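
A rough sketch of the kind of init-script fragment this would allow,
written as the wrapper everyone currently has to build themselves. The
function names and the PID file path are hypothetical, and "sleep"
stands in for the emulator command; nothing here is an existing Erlang
feature.

```shell
#!/bin/sh
# start_with_pidfile PIDFILE COMMAND [ARGS...]
# The inner shell writes its own PID to PIDFILE and then execs COMMAND,
# so the recorded PID is the PID of COMMAND itself.
start_with_pidfile() {
    pidfile=$1
    shift
    sh -c 'echo $$ > "$0" && exec "$@"' "$pidfile" "$@" &
}

# stop_with_pidfile PIDFILE [SIGNAL]
# Signals the recorded PID (TERM by default) and removes the PID file.
stop_with_pidfile() {
    pidfile=$1
    sig=${2:-TERM}
    [ -f "$pidfile" ] || return 1
    kill -"$sig" "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
}
```

For example, "start_with_pidfile /tmp/svc.pid sleep 600" followed later
by "stop_with_pidfile /tmp/svc.pid KILL". Whether the recorded PID ends
up being the emulator's when the command is launched through run_erl is
not something this sketch establishes.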

It's certainly possible to build a solution to this: I wasn't saying
it was impossible to build something decent, just that everyone who
uses Erlang seriously on Unix will have to find a way to fix it
themselves.

So I say it's not robust because relying on to_erl in your scripts can
fail if the Erlang shell inside the emulator isn't working (or due to
corruption, see below). Having a PID file is robust because "kill -9"
is an OS operation that does not rely on anything inside the emulator
working.

[snip]

> I'm not sure what your critique here is. Surely your point isn't
> merely that run_erl uses a directory to manage its files.
>
> Take a look at /var/run on any system you have handy -- you'll see
> that lots of applications do this.
>
> For those interested, it's trivial and safe to run multiple beam
> processes, each using run_erl.

It's not that it uses a directory, it's that run_erl seems to be
designed to share a single directory between several emulators
(because it automatically allocates the pipe files with incrementing
numbers and the official "start" script has hard coded "/tmp" as the
pipe directory) and yet it doesn't clean up the pipe files after a
crash and it doesn't have a way to identify which emulator is attached
to which pipe file.

For example:

$ run_erl /tmp/pipe/ /tmp/log "erl -name foo" &
$ run_erl /tmp/pipe/ /tmp/log "erl -name bar" &

Now I have /tmp/pipe/erlang.pipe.1.[rw] and
/tmp/pipe/erlang.pipe.2.[rw] but which is which?

Actually... when I went to try that example I discovered that if you
leave the trailing slash off the pipe argument to run_erl then it
doesn't do the auto-incrementing thing at all and it does exactly what
I want: the file name you present is used or it fails (although the
error message could be better).
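
A small sketch of how that could be used to give each emulator its own
identifiable pipe. It only prints the command lines rather than running
them. The run_erl_cmd helper, the PIPE_ROOT and LOG_ROOT variables and
the per-node log directory are assumptions made for illustration; only
the "no trailing slash" behaviour comes from the experiment above.

```shell
#!/bin/sh
# Hypothetical locations; override from the environment if needed.
PIPE_ROOT=${PIPE_ROOT:-/tmp/pipe}
LOG_ROOT=${LOG_ROOT:-/tmp/log}

# run_erl_cmd NODE
# Prints a run_erl command line whose pipe argument has no trailing
# slash, so the pipe is named after the node instead of auto-numbered.
run_erl_cmd() {
    node=$1
    printf 'run_erl -daemon %s/%s %s/%s "erl -name %s"\n' \
        "$PIPE_ROOT" "$node" "$LOG_ROOT" "$node" "$node"
}

for node in foo bar; do
    run_erl_cmd "$node"
done
```

With the pipe named after the node, the "which is which?" question goes
away, at the cost of having to pick the names yourself.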

It would be nice if the documentation didn't specifically say that
this wasn't possible:

                    This is where to put the named pipe, usually /tmp/. It
                    shall be suffixed by a / (slash), i.e. not /tmp/epipies,
                    but /tmp/epipes/.

>> * The pipes aren't cleaned up if the emulator crashes or won't start.
>> Even if you run only a single emulator, it may be running on
>> /tmp/erlang.pipe.1 or /tmp/erlang.pipe.2 or whatever, depending on
>> previous crashes.
>
> This is another surprising issue, considering how commonplace it is to
> have orphaned pidfiles hanging around after a process crash.
>
> If you need to cleanup after a crash, just delete the files. If you
> need that automated, automate it.

If I need to? Isn't everyone going to want to clean up after a crash?

A service that uses a PID file will automatically clean up an orphaned
one when it's restarted, but Erlang doesn't do that for its pipes. In
fact, because the PID file has a single static name it HAS to be
cleaned up when the service is restarted, whereas Erlang's behaviour
is to auto-increment, create new files and leave the old ones behind.
Why would anyone ever want this? (Other than on an embedded box, of
course.)

I wasn't saying that it's impossible to clean up, just that I, and
every other Unix Erlang user, shouldn't have to set this up ourselves.
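
For reference, the do-it-yourself version is roughly the following
sketch. It assumes the pipe directory is dedicated to a single emulator
and that the emulator is known not to be running when it is called; the
function name is hypothetical, and the file name pattern is the
erlang.pipe.N.[rw] one shown earlier.

```shell
#!/bin/sh
# clean_stale_pipes DIR
# Removes leftover erlang.pipe.N.r / erlang.pipe.N.w named pipes from DIR.
# Assumption: no emulator is currently attached to anything in DIR.
clean_stale_pipes() {
    dir=$1
    for f in "$dir"/erlang.pipe.*.r "$dir"/erlang.pipe.*.w; do
        # -p is true only for named pipes, so regular files and
        # unmatched glob patterns are left alone.
        if [ -p "$f" ]; then
            rm -f "$f"
        fi
    done
}
```

An init script would call this just before starting the emulator, e.g.
"clean_stale_pipes /tmp/pipe".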

>> * run_erl provides only a single connection. If a program or user
>> already has the pipe open, other connects will hang or fail. A locked
>> up session will therefore prevent you from establishing another
>> session to clean it up -- or to shut down the emulator.
>
> Not only is this not a problem, it's a benefit over pidfiles and
> process signals, which in themselves don't provide any control over
> concurrent access.
>
> And if your session is locked and you need to force your way into it,
> use the -F option of to_erl.

Well, that's true about PID files, and I didn't know about -F...
possibly because there isn't any documentation for to_erl (or is
there? Neither I nor Google seemed to be able to find it.)

-F seems to make it better, but because of my next point it's still
not really any good: forcing another session out is almost certainly
going to leave some garbage in the stream, so you won't be able to use
the session you've got.

>> * run_erl's single session is like a shared terminal: it has a shared
>> input buffer and the state isn't reset or cleared between connections.
>> If you set up a script or program to use it to control an emulator,
>> what happens if someone's session (or a bug in a program) leaves an
>> unfinished command or term in the buffer? (What happens is that all
>> subsequent commands fail, that's what!)
>
> You just said this: "You know what happens when I send garbage to a
> file? I corrupt that file, that's what!"

True, but that's not all of it. As you said, it's fine that sending
garbage doesn't work, but the problem is that the garbage stays in the
connection until the emulator is killed.

Here's what I mean:

$ run_erl -daemon /tmp/pipe/ /tmp/log "erl"
$ to_erl /tmp/pipe/
Attaching to /tmp/pipe/erlang.pipe.2 (^D to exit)

1> io:fwrite("

Now at this point I use another terminal to do this (imagine this is
an init script trying to restart the service):

$ to_erl -F /tmp/pipe/

Another to_erl process already attached to pipe /tmp/pipe/erlang.pipe.2.
But we proceed anyway by force (-F).
Attaching to /tmp/pipe/erlang.pipe.2 (^D to exit)

run_erl has different version, using common protocol level 0
nit:stop().
1>

(BTW: I actually typed "init:stop()." but the "i" character was sent
to the old connection (a bug?).)

The system did not stop, and init:stop() wasn't executed... The Erlang
shell has seen something like this:

io:fwrite("init:stop().

Good luck getting that mess untangled... especially when it's a script
that needs to do it and it has no knowledge of what characters might
be in the stream.

I agree that in trivial cases with a single VM used by a single user,
to_erl is fine (and there are ways to work around this whole problem
using external C-nodes or "-remsh" type access), but again my point
wasn't that it's unusable. It was: why does everyone have to handle it
themselves?

>> * Be careful using run_erl without specifying which pipe to use. Its
>> default is to try each in turn until it gets a connect.
>
> I assume you mean to_erl -- run_erl requires a pipe dir.
>
> to_erl looks in /tmp by default -- I haven't seen it hunt for
> different locations.
>
> In any case, it's a good idea to create a directory explicitly for
> these files and specify it when using to_erl.

Agreed. The only hunting I meant was along the list of erlang.pipe.1,
erlang.pipe.2 until it finds one with a working emulator attached.

[snip]

> Use run_erl to start the VM, as per the docs:
>
> http://www.erlang.org/doc/man/run_erl.html
>
> And use to_erl to connect to that process. You'll need to specify the
> pipe dir so to_erl knows what you want to talk to.

Using run_erl doesn't make it much better if you're a newbie. If you
get it right it works but if you don't...

$ run_erl -daemon /tmp / erl
$ echo $?
0

Same problem: Exit code is success, no logs are written, no output is
written. Erlang ain't running.
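
Since the exit code can't be trusted, a start script has to check for
itself. One possible check, sketched below, is to wait for the expected
named pipe to appear. The function name and the timeout are
hypothetical, and the check only means something if no stale pipe with
the same name was left behind by an earlier run.

```shell
#!/bin/sh
# wait_for_fifo PATH [SECONDS]
# Returns 0 as soon as PATH exists as a named pipe, or 1 if it has not
# appeared after roughly SECONDS seconds (default 5).
wait_for_fifo() {
    path=$1
    remaining=${2:-5}
    while [ "$remaining" -gt 0 ]; do
        [ -p "$path" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    # Final check after the last sleep.
    [ -p "$path" ]
}
```

A start script could then do something like
"wait_for_fifo /tmp/pipe/erlang.pipe.1.w 5 || exit 1" after calling
run_erl, so that a failed start is at least reported.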

[snip]

> Also, don't kill -9 processes that you want to keep running.

Heh, of course not... unless they REALLY misbehave, in which case you
must have -9 available so that you can restart them no matter what.
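
In script form that policy is usually "ask nicely, then force". A
minimal sketch, assuming the emulator's PID is already known from
somewhere; the function name and the grace period are hypothetical.

```shell
#!/bin/sh
# stop_pid PID [GRACE_SECONDS]
# Sends TERM, polls for up to GRACE_SECONDS (default 5), and falls back
# to KILL if the process is still there.
stop_pid() {
    pid=$1
    grace=${2:-5}
    # If TERM cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        # kill -0 only tests whether the PID can still be signalled.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    # Last resort: does not depend on anything inside the process.
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

The KILL step is the part that needs nothing inside the emulator to be
working, which is exactly why it has to be available.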

[snip]

These problems, if other people agree that they are problems, seem
easy to fix (I'll do some patching myself if time permits) and fixing
them would really ease adoption of Erlang in Unix environments: that's
why I'm so keen to discuss and hopefully fix them! I don't want to see
people turned off Erlang by these external issues when once you've got
the emulator going it's so good!

Peace,
Sam.