Never trust a statistics you didn't forge yourself
Andrae Muys
andrae@REDACTED
Thu Feb 23 02:48:12 CET 2006
On 23/02/2006, at 9:55 AM, Michael Suess wrote:
[snip]
> Andrae Muys wrote:
>> A truly representative survey would have identified a community of
>> users who write parallel programs, and have targeted them with
>> solicitations
>
> And this is exactly the reason why I have claimed no statistical value
> in the data whatsoever. I even write in the report itself:
>
> "For this reason, please take all results of this survey with a grain
> of salt, as they are not statistically sound! For statistical
> significance, we would have to sample a proportional part of the
> parallel programming population, and we know of no way to do so (at
> least not within our budget). It is for this reason, that you will not
> find any statistical measures applied to the data in this paper."
>
> And there you will also find the answer I would like to give to your
> claim. We know the data are not hard. And I have also done my homework
> and thought about how I could make these data more statistically
> useful, but I have come to the conclusion that I do not know how! And
> I am fairly sure that when you go further than rough sketches of your
> plans to identify, sample and contact these subcommunities, you will
> come to a similar result, at least when you consider that this is just
> a side-track of my research and that we do not have as much money as
> we sometimes wish. But maybe you can prove me wrong and do a better
> survey; I would sure be interested in the results...
>
[snip]
> Marc van Woerkom wrote:
[snip]
>> As an example that bad rankings don't cause any stir, take e.g. the
>> big language shootout, which is discussed on this list occasionally.
>
> And the reason for this could be that they do not comment on the
> results at all?
Bingo.
Your disclaimer "they are not statistically sound" was spot on. And
yet you tried to draw conclusions from the survey anyway.
Now, I am not a member of the Erlang community; I just lurk here (this
is not the place for me to explain why :). Consequently my concern is
not about any 'offense to Erlang', but rather that the paper is shoddy
science. This really is a matter of getting back to basics:
1. What was your hypothesis?
2. What was your methodology?
3. What were your results?
4. Given your methodology, can you draw meaningful conclusions from
   your results that support or contradict your hypothesis?
Hypothesis:
  1. Everyone is using MPI these days.
  2. Java threads are growing strong.
Methodology:
  Prepare a survey and publicise it by posting to C/C++/Fortran-based
  newsgroups and mailing lists.
Results:
  1. Amongst C/C++/Fortran users, MPI is going strong.
  2. Amongst C/C++/Fortran users, Java threads are not.
Conclusions:
  None, unless you redefine 'everyone' to mean C/C++/Fortran users of
  traditional computation-based libraries.
Of course, doing a meaningful survey is hard. That your alternative was
easy has no bearing on the validity of your results.
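
For what it's worth, the selection-bias problem is easy to make concrete.
Below is a small, purely hypothetical sketch in Python: the group names,
sizes and MPI-usage rates are invented for illustration and have nothing
to do with the actual survey data. It only demonstrates the mechanism: a
questionnaire publicised solely on C/C++/Fortran newsgroups estimates MPI
usage within that subcommunity, which can differ wildly from usage across
the whole parallel-programming population.

import random

# Purely illustrative sketch: all group sizes and MPI-usage rates below are
# invented for this example -- they are NOT taken from the survey.
POPULATION = {
    "c_cpp_fortran": {"size": 5000, "mpi_rate": 0.80},
    "java":          {"size": 3000, "mpi_rate": 0.15},
    "erlang_etc":    {"size": 2000, "mpi_rate": 0.02},
}

def survey(groups, total_respondents):
    """Simulate a survey whose respondents come only from `groups`,
    allocated in proportion to each group's size."""
    sizes = {g: POPULATION[g]["size"] for g in groups}
    pool = sum(sizes.values())
    hits = asked = 0
    for g in groups:
        n = round(total_respondents * sizes[g] / pool)
        hits += sum(random.random() < POPULATION[g]["mpi_rate"] for _ in range(n))
        asked += n
    return hits / asked

random.seed(1)

# True population-wide MPI usage, weighted by group size.
true_rate = (sum(g["size"] * g["mpi_rate"] for g in POPULATION.values())
             / sum(g["size"] for g in POPULATION.values()))

# A questionnaire publicised only on C/C++/Fortran newsgroups samples one
# subcommunity; a representative survey would have to reach all of them.
biased_estimate = survey(["c_cpp_fortran"], 600)
broad_estimate  = survey(list(POPULATION), 600)

print("true population rate:             %.2f" % true_rate)
print("estimate from C/C++/Fortran only: %.2f" % biased_estimate)
print("estimate from all communities:    %.2f" % broad_estimate)

With these made-up numbers the C/C++/Fortran-only sample reports roughly
80% MPI usage while the population-wide rate is closer to 45%: the same
questionnaire, a very different story, depending only on where you
publicised it.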
Andrae Muys