[erlang-questions] Frying pan bug

Sat Mar 3 16:54:50 CET 2012

On Fri, Mar 2, 2012 at 11:07 PM, Richard O'Keefe <ok@REDACTED> wrote:

>
> In fact the negative integers are the only points on the real line where
> factorial is undefined.  There are, however, an infinite number of them,
> and that doesn't mean that guards are inexpressible.
>

I was thinking more along the lines of topologies that look something like:

http://www.wolframalpha.com/input/?i=1+%2F%28+x+mod+y%29

which can become very expensive to sample with a Monte Carlo method, or
more complex systems like:

http://www.wolframalpha.com/input/?i=Duffing+Differential+Equation

which in most forms can only be approximated through sampling.  There are a
lot of interesting surfaces where attempting to sample at a higher
resolution degenerates into the run-away case.  Run-away can be either
positive or negative, so a guard may not help, especially when the
topology is driven by a heuristic (i.e. I don't know the equation ahead
of time, and I can't generate guards on the fly).

> I just tried it on a Mac.  "CPU A Temperature Diode" was initially at 32C.
> Running fac(-1), it rose over a couple of minutes to 38C, and then fell
> back to 34C, where it remained.
>

I have a Mac with a busted thermal sensor and it will continue to heat up
until the whole thing locks.  Your test actually proved you have a working
thermal sensor and that the OS was safely limiting how quickly you were
wasting CPU and power.  The devices we're building do have thermal sensors,
so I can guarantee that they exist, and I can also guarantee that they are
available in user mode (I control the kernel as well).  What interests
me is that Erlang has its own scheduler and threading model on top of the
OS's, which means relying on Linux to "tune" performance this way is like
taking a big stick and beating all the Erlang processes for one bad apple.
All of our servers back off CPU frequency, and designs like big.LITTLE from
ARM are becoming more commonplace.  What do you do when the outside air
temperature is 50C, and if your CPU steps below a certain power threshold
it can no longer keep up with throughput to ensure real-time response?

> I have programs that saturate I/O without anything being wrong.

I have programs where saturating I/O causes a failure, because they can
no longer guarantee real-time response.  Having tooling in Erlang so that
one can identify a run-away I/O process would be generally useful.  The
wrongness of a program is a matter of how it fits its purpose.  You might
be able to tolerate I/O saturation.  In my case, our customers get very
angry (and litigious).
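
As a rough sketch of the kind of tooling I mean: erlang:system_monitor/2
can already report processes that get suspended on a busy port, which is
one symptom of run-away I/O.  The module name io_watch, the 100 ms long_gc
threshold, and the logging are illustrative assumptions, not taken from a
real system.

```erlang
-module(io_watch).
-export([start/0]).

%% Spawn a logger process and register it as the node's system monitor.
%% Note: the system monitor is a node-wide setting, so this replaces any
%% monitor that was installed before.
start() ->
    Pid = spawn(fun loop/0),
    %% busy_port: a process was suspended while sending to a busy port.
    %% {long_gc, 100}: a garbage collection took at least 100 ms
    %% (the threshold is an arbitrary choice for illustration).
    erlang:system_monitor(Pid, [busy_port, {long_gc, 100}]),
    Pid.

%% Every system monitor event arrives as {monitor, Pid, Tag, Info}.
loop() ->
    receive
        {monitor, Pid, Tag, Info} ->
            error_logger:warning_msg("system_monitor ~p: ~p ~p~n",
                                     [Tag, Pid, Info]),
            loop();
        stop ->
            ok
    end.
```

This only tells you which process was suspended, not how many bytes it
pushed, so it is a tripwire rather than real I/O accounting.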

> I've been reading about self-stabilising systems lately, and the rock
> bottom answer seems to be "watchdog timers".  A watchdog timer would
> certainly have caught this.
>

Watchdog timers help, but you typically need more instrumentation to make
reasonable decisions.  In one system, I have a watchdog checking each
long-running process every second, and a secondary process measuring the
average message volume processed by each entity.  We also have each process
announce memory pressure, GC state, I/O totals, and CPU load at a
per-OS-process level, which are used to determine how to dynamically route
data through the system.  Without sufficient data, a watchdog will not make
the right decision.

Basically, what I'd be interested in seeing is power, heat, time, cycles,
and I/O traffic accounting at a per-Erlang-process level, so that the
supervisors could do a better job of managing a system under stress.
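
The closest thing the VM exposes today is erlang:process_info/2, which
gives reductions (a rough stand-in for cycles), mailbox length, and memory
per process, but nothing for power or heat.  A minimal sketch of sampling
it, assuming a hypothetical module named proc_sample:

```erlang
-module(proc_sample).
-export([sample/1, busiest/2]).

%% Snapshot reductions, mailbox length and memory for one process.
%% Returns 'undefined' if the process has already exited.
sample(Pid) ->
    erlang:process_info(Pid, [reductions, message_queue_len, memory]).

%% Return up to N {ReductionDelta, Pid} pairs for the processes that
%% consumed the most reductions during IntervalMs milliseconds.
busiest(N, IntervalMs) ->
    %% Processes that exit between the two samples are filtered out,
    %% because process_info/2 returns 'undefined' for them.
    Before = [{P, R} || P <- erlang:processes(),
                        {reductions, R} <- [erlang:process_info(P, reductions)]],
    timer:sleep(IntervalMs),
    Deltas = [{R1 - R0, P} || {P, R0} <- Before,
                              {reductions, R1} <- [erlang:process_info(P, reductions)]],
    lists:sublist(lists:reverse(lists:sort(Deltas)), N).
```

A watchdog could call proc_sample:busiest(5, 1000) once a second and hand
the result to whatever decides how to route or shed load.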

-- 
-=-=-=-=-=-=-=-=-=-=- http://blog.dloh.org/