[erlang-questions] Frying pan bug

Gustav Simonsson gustav.simonsson@REDACTED
Mon Mar 5 10:26:30 CET 2012


AFAIK the temperature sensors in Intel CPU cores are designed to support
thermal throttling and thermal shutdown, not to report accurate temperatures.

Even if the sensors were fast enough to react to temperature differences
between scheduler time slices, you wouldn't see them, since the sensors
typically report only in steps of 1 degree (and the error margin can be as
high as 5-10 degrees depending on the CPU model).
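
To put rough numbers on the resolution point, here is a minimal Python sketch.
It assumes a simple first-order thermal model and a sensor that reports whole
degrees only. The function name and every constant (starting temperature,
eventual rise, thermal time constant, time slice length) are made up for
illustration and are not measurements of any real CPU or of the Erlang
scheduler.

```python
import math

def time_to_reading_change(start_c, final_rise_c, tau_s):
    """Seconds until a whole-degree sensor reading changes after a power step.

    Assumes a first-order thermal model:
        T(t) = start_c + final_rise_c * (1 - exp(-t / tau_s))
    and a sensor that reports floor(T), i.e. 1 degree C resolution.
    Returns math.inf if the temperature never reaches the next whole degree.
    """
    needed_rise = math.floor(start_c) + 1 - start_c
    if final_rise_c <= needed_rise:
        return math.inf
    return -tau_s * math.log(1.0 - needed_rise / final_rise_c)

# All values below are made-up assumptions for illustration only.
START_C = 50.0        # die temperature before the power step
FINAL_RISE_C = 10.0   # eventual temperature rise if the load continued
TAU_S = 0.1           # hypothetical thermal time constant
TIME_SLICE_S = 0.001  # hypothetical scheduler time slice

delay = time_to_reading_change(START_C, FINAL_RISE_C, TAU_S)
print(f"reading changes after {delay * 1000:.1f} ms")
print("distinguishable per slice:", delay < TIME_SLICE_S / 2)
```

With these assumed numbers the reading takes about 10.5 ms to move by one
step, far longer than half of the assumed 1 ms slice, so two processes sharing
a core could not be told apart. Different constants would shift the result,
and the 5-10 degree error margin is not modelled at all.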

Also, in modern CPUs you typically see a few degrees of difference between
cores, partly due to the error margin of the sensors' base calibration and
partly because the thermal paste and heatsink mounting are never perfectly
symmetrical over the chip surface.

// Gustav Simonsson

Sent from my PC

----- Original Message -----
> From: "Richard O'Keefe" <ok@REDACTED>
> To: "David Goehrig" <dave@REDACTED>
> Cc: "Erlang" <erlang-questions@REDACTED>
> Sent: Monday, 5 March, 2012 4:46:32 AM
> Subject: Re: [erlang-questions] Frying pan bug
> 
> 
> On 4/03/2012, at 4:54 AM, David Goehrig wrote:
> > 
> > Basically, what I'd be interested in seeing is power, heat, time,
> > cycles, and I/O traffic accounting on a per-Erlang process level,
> > so that the supervisors could do a better job at managing a system
> > under stress.
> 
> Problem 1:  the machine I'm typing on has two cores, but only one
> temperature diode.
> If Erlang process X is running on Core 0, and
>    Erlang process Y is running on Core 1, and
>    process X is doing something to make the chip overheat
> the temperature diode cannot tell me which core is producing the
> heat, so it cannot tell me which Erlang process is doing it.
> 
> Problem 2:  something I have not been able to find an answer to
> yet (and I've asked a couple of people I was sure would know)
> is how *fast* are the temperature diodes?  Or more accurately,
> given a certain power change by a core and the thermal
> characteristics of the chip as a whole, and given that the
> sensors only report to the nearest 1 degree C, how long does
> it take before a power change causes a change in the reading
> from the temperature diode?  If that delay is not less than
> half the time slicing interval used by the scheduler, it's not
> clear to me that you can discriminate two processes running on
> the same core.
> 
> I can imagine an averaging method to sort of deal with problem 2,
> but it doesn't survive problem 1.
> 
> Another alternative would be to monitor the power used by each core,
> if that is possible, but the readings I see are frankly insane.
> (As are the f_bsize results that I get back from statvfs() on a Mac;
> can the OS *really* be recommending 32 MiB as the block size for
> read() on a particular partition?  But that's another story.)
> 


