Never trust a statistic you didn't forge yourself
Michael Suess
msuess@REDACTED
Tue Feb 21 22:45:12 CET 2006
On Tuesday 21 February 2006 13:52, Marc van Woerkom wrote:
> Hello,
Dear Marc, dear members of the Erlang Mailing List,
I am the main author of the study that has been discussed in this thread and
also in this one:
http://www.erlang.org/ml-archive/erlang-questions/200602/msg00355.html
I have thought long about whether or not I should try to defend our study
here, especially since we seem to have raised a lot of hostility for some
reason. I will therefore start by answering the points you made in this
mail, and then add some comments on the points raised in the other thread.
> I just got a copy of your survey results and I am upset
> about it:
You are of course free to feel that way. I knew that I would upset some people
with this study, especially the ones whose "pet system" has not gotten many
votes. What surprises me, though, is that the only flames I have gotten so far
come from the Erlang community, although Erlang has gotten way more than its
share of publicity in our paper.
> First you write:
> > To raise more interest for the survey and generate submissions, we have
> > sent messages to several discussion groups, once shortly after the start
> > of the survey and once shortly before the end:
> > - Usenet: comp.parallel, comp.parallel.mpi, comp.parallel.pvm,
> >   comp.sys.super, aus.computers.parallel, comp.programming.threads
> > - Mailing Lists: beowulf@REDACTED, lam@REDACTED, omp@REDACTED,
> >   compunity@REDACTED, bspall@REDACTED
> > - Forums: CodeGuru Forum, Intel Developer Services Forum
> > - Websites: slashdot.org
>
> then
>
> > We know of one incident, where
> > an influential member of the Erlang community requested
> > members of their mailing list
> > to show their support for Erlang by participating in the
> > survey, which they promptly did.
> > Incidents like this one have of course influenced the
> > results as well.
>
> This implies that some kind of guru is able to motivate
> his minions.
This is your interpretation, and I certainly did not mean it that way. I have
taken care to make it very clear how the study was carried out and whom I
contacted. I tried to contact a mailing list or forum for every parallel
programming system that was explicitly mentioned in question two of the
survey. Erlang is not explicitly mentioned there, and I have therefore not
posted to this list. Why was it not explicitly mentioned there? Because I had
to make a subjective choice about which systems interested me most, and about
which systems I thought were most widely used. And like it or not, Erlang was
not on this list.
The fact is that Joe posted about the survey on this mailing list; you can
still find his post in the archives:
http://www.erlang.org/ml-archive/erlang-questions/200510/msg00171.html
Do any of you disagree that Erlang has gotten a higher number of votes
because of this? I do not think so. And let me get this straight: I am not
mad at Joe for doing it or anything. These votes are as valid as anyone else's,
and I am thankful to everyone who provided me with his/her opinion. But I have
an explanation for the relatively high number of votes for Erlang, and I have
been open about all other aspects of this survey. Why the heck should I keep
this one secret?
I have made it very clear in my paper that the results of this survey are not
statistically relevant. They cannot be, because I know of no way to draw a
representative sample of the parallel programming community. If you have any
idea on how to do that and have the money for it, feel free to contact me. I
have not been able to find any comparable data anywhere at all. I did the
best I could to provide at least some clues. These data are not statistics;
they are indications at best. I have made it very clear in my paper that the
data are not statistically relevant. I have been very careful with my
observations and I have been as open as I could regarding the process of
gathering the data. More I cannot do.
> I can't remember who it was who posted this to erlang
> mailing list. I was just informed that there was a survey
> and of course it made perfect sense to participate as I am
> a user of a concurrent language. That's what it is good
> for.
Sure. Thanks again for voting and writing your opinion.
> There are obviously at least two camps of concurrent
> problems - the massive number crunching ones and the ones
> where you have lots of parallel transactions in a possibly
> distributed system - that's where Erlang is strong.
>
>
> You obviously posted to
>
> - comp.parallel.pvm
> - beowulf@REDACTED
> - omp@REDACTED
>
> all from the number crunchers camp and even to
>
> - CodeGuru Forum
>
> which is a Java forum.
>
> Why did you not post to the erlang list yourself?
I think I have explained the reasons for that above.
> > We have already told you the reasons for the relatively high numbers
> > for Erlang in section 2, so please treat them with caution. What can be
> > deduced from these numbers, though, is that Erlang has a very active
> > user community and it is obviously possible to write parallel programs
> > in Erlang.
>
> You could have known before.
Yes, I could have. And since I have a colleague sitting right opposite me
who lurks on this list, I even knew it beforehand. But if I only wrote about
things I did NOT already know, my papers would be pretty bad, wouldn't they?
This paper has been downloaded more than 300 times in only two days, although
I have not even made it public. All I did was send an email to all the people
who wanted to be notified of the results. How many of these people knew about
Erlang beforehand? I bet not even half of them (although I could be wrong
here, but I bet there are no data on this either). I believe I have not
written anything bad at all about Erlang, rather the opposite. I have also
written about the active Erlang community. I do not quite understand why this
is held against me now, especially on this list and in an apparently hostile
manner.
To answer some other points:
chandru wrote:
> The cautionary statement about Erlang said that the high response rate
> was because "an influential member of the community" (Joe) posted it
> to the mailing list. So what is wrong with that? We didn't all get
> together to decide on the answers before participating in the survey!
> They themselves tried to give the survey as much publicity as they
> could.
I think I have answered this above. Nothing is wrong with that, but I still
had to mention it somehow. I did not mean to insult anyone with the way I
phrased it either.
Daniel Luna wrote:
> But this is just plain stupid.
I do not think I can say anything to counter that level of discussion.
> "At [insert mailing list THEY sent to] there are probably a lot of [insert
> programming language] programmers, so these figures cannot be taken too
> seriously."
> So the possibility that a very high percent of erlang developers are using
> it for parallel programming did not occur to the authors?
Should I really reply to this? I will try: We have tried to contact a mailing
list for every parallel programming system mentioned in question two. We have
not contacted any mailing lists for other parallel programming systems such as
Erlang. Therefore, the numbers for the systems mentioned in question two can
be compared with each other in a way, which is why they are in one figure. The
"Other" systems can likewise be compared with each other in a way. We did not
contact the community of any of the other systems, and as far as we know,
Erlang was the only "Other" system where the survey got noticed at all. This
is why the numbers for Erlang are unusually high and cannot be directly
compared to the other systems, and this is why we mention it in the paper.
Nothing more, nothing less.
When I asked for reviews of this paper, the Erlang numbers were held against
me again and again, and it was even suggested that I take them out. I decided
not to do that, because I did not want to let my or anyone else's subjective
opinion influence the results any more than necessary. Maybe I should have
just listened...
> On the other hand, this whole survey seems to be bogus. If you remove the
> 130 Erlang answers you remove more than half of the total answers (256). I
> have no idea how they manage to get any results at all from that.
As Ulf already noted in his reply, there were not 130 Erlang answers. I am not
really good at statistics, but I recall from one of my classes that anything
with fewer than 20 participants is not statistically meaningful. We have had
more than ten times that, so if you really want to complain, please complain
about the composition of our sample and not about its size, because this will
get you further in bashing the survey.
Luke Gorrie wrote:
> I can sympathise with the surveyors. They seem to be interested in
> parallel programming as in "using lots of hardware in parallel to
> solve a problem" like SETI, CGI rendering farms, etc. I bet they got a
> lot of responses from Erlang people writing internet servers & telecom
> systems which are another kettle of fish entirely.
You are right, we come more from a high performance computing background.
Nevertheless, I am interested in all kinds of parallel programming and I have
even taken a look at Erlang in the past. Unfortunately, it does not give any
speedups on multiple processors yet, but I hear that even this is changing,
making it even more interesting for us. So why should I not welcome any
answers from you people? You have a different perspective on parallel
programming than I do, but I am very willing to listen to people with
different perspectives, as quite often this leads to new insights...
Ulf Wiger wrote:
> One can of course note that while the survey form
> allowed ticking in simultaneous use of C, C++, Fortran,
> and Java, it encouraged only one mention each of a
> functional, logical or other programming language.
> I don't know how much this can be expected to have
> skewed the results in favour of the explicitly named
> languages.
I expect this to have skewed the results quite a bit in favour of these
languages, and this is the reason I have only compared the explicitly named
ones with each other, and the "others" with each other. And even in comparing
them, I have been very careful with my observations.
This mail has gotten way out of hand. I hope I have cleared things up a bit
and shown you people a little bit of my perspective on the matter.
Thanks for listening,
best regards from Germany,
Michael Suess
--
"What we do in life, echos in eternity..."
M.: msuess@REDACTED | T.: +49-561-804-6269 | F.: +49-561-804-6219
WWW: http://www.plm.eecs.uni-kassel.de/plm/index.php?id=msuess
Public PGP key and fingerprint available at above address.
Research Associate, Programming Languages / Methodologies Research Group
University of Kassel, Wilhelmshöher Allee 73, D-34121 Kassel