Never trust a statistics you didn't forge yourself

Tue Feb 21 22:45:12 CET 2006

On Tuesday 21 February 2006 13:52, Marc van Woerkom wrote:
> Hello,

Dear Marc, dear members of the Erlang Mailing List,

I am the main author of the study that has been discussed in this thread and 
also in this one:
http://www.erlang.org/ml-archive/erlang-questions/200602/msg00355.html

I have thought long about whether or not I should try to defend our study 
here, especially since we seem to have provoked a lot of hostility for 
some reason. I will start by answering the points you made in 
this mail, and then add some opinions on the points raised in the other 
thread. 

> I just got a copy of your survey results and I am upset
> about it:

You are of course free to feel that way. I knew that I would upset some people 
with this study, especially the ones whose "pet system" has not gotten many 
votes. What surprises me, though, is that the only flames I have gotten so far 
come from the Erlang community, although Erlang has gotten way more than its 
share of publicity in our paper. 

> First you write:
> > To raise more interest for the survey and generate submissions, we have
> > sent messages to several discussion groups, once shortly after the start
> > of the survey and once shortly before the end:
> > - Usenet: comp.parallel, comp.parallel.mpi, comp.parallel.pvm,
> > comp.sys.super, aus.computers.parallel, comp.programming.threads
> > - Mailing Lists: beowulf@REDACTED, lam@REDACTED, omp@REDACTED,
> > compunity@REDACTED, bspall@REDACTED
> > - Forums: CodeGuru Forum, Intel Developer Services Forum
> > - Websites: slashdot.org
>
> then
>
> > We know of one incident, where
> > an influential member of the Erlang community requested
> > members of their mailing list
> > to show their support for Erlang by participating in the
> > survey, which they promptly did.
> > Incidents like this one have of course influenced the
> > results as well.
>
> This implies that some kind of guru is able to motivate
> his minions.

This is your interpretation, and I certainly did not mean it that way. I have 
taken care to make it very clear how the study was carried out and whom I 
have contacted. I have tried to contact a mailing list or forum for every 
parallel programming system that was explicitly mentioned in survey 
question two. Erlang is not explicitly mentioned there, and I have 
therefore not posted to this list. Why was it not explicitly mentioned there? 
Because I had to make a subjective choice about which systems interested me 
most, and about which systems I thought were most widely used. And like it or 
not, Erlang was not among them.

The fact is that Joe posted about the survey on this mailing list; you can 
still find his post in the archives:
http://www.erlang.org/ml-archive/erlang-questions/200510/msg00171.html

Does any of you disagree that Erlang has gotten a higher number of votes 
because of this? I do not think so. And let me get this straight: I am not 
mad at Joe for doing it or anything. These votes are as valid as anyone else's, 
and I am thankful to everyone who provided me with his/her opinion. But I have 
an explanation for the relatively high number of votes for Erlang, and I have 
been open about all other aspects of this survey. Why the heck should I keep 
this one secret??

I have made it very clear in my paper that the results of this survey are not 
statistically relevant. They cannot be, because I know of no way to draw a 
representative sample of the parallel programming community. If you have any 
idea on how to do that and have the money for it, feel free to contact me. I 
have not been able to find any comparable data anywhere at all. I did the 
best I could to provide at least some clues. These data are not statistics; 
they are indications at best. I have made it very clear in my paper that the 
data are not statistically relevant. I have been very careful with my 
observations and I have been as open as I could regarding the process of 
gathering the data. More I cannot do.

> I can't remember who it was who posted this to erlang
> mailing list. I was just informed that there was a survey
> and of course it made perfect sense to participate as I am
> a user of a concurrent language. That's what it is good
> for.

Sure. Thanks again for voting and writing your opinion.

> There are obviously at least two camps of concurrent
> problems - the massive number crunching ones and the ones
> where you have lots of parallel transactions in a possibly
> distributed system - that's where Erlang is strong.
>
>
> You obviously posted to
>
> - comp.parallel.pvm
> - beowulf@REDACTED
> - omp@REDACTED
>
> all from the number crunchers camp and even to
>
> - CodeGuru Forum
>
> which is a Java forum.
>
> Why did you not post to the erlang list yourself?

I think I have explained the reasons for that above. 

> > We have already told you the reasons for the relatively high numbers
> > for Erlang in section 2, so please treat them with caution. What can be
> > deduced from these numbers, though, is that Erlang has a very active
> > user community
> > and it is obviously possible to
> > write parallel programs in Erlang.
>
> You could have known before.

Yes, I could have. And since I have a colleague sitting right opposite me 
who lurks on this list, I even knew it beforehand. But if I only wrote about 
things I do NOT know in my papers, they would be pretty bad, wouldn't they? 
This paper has been downloaded more than 300 times in only two days, although 
I have not even made it public. All I did was send an email to all people who 
wanted to be notified of the results. How many of these people knew about 
Erlang beforehand? I bet not even half of them (although I could be wrong 
here, but I bet there are no data on this either). I believe I have not 
written anything bad at all about Erlang, rather the opposite. I have also 
written about the active Erlang community. I do not quite understand why this 
is held against me now, especially on this list and in an apparently hostile 
manner.

To answer some other points:
chandru wrote:
> The cautionary statement about Erlang said that the high response rate
> was because "an influential member of the community" (Joe) posted it
> to the mailing list. So what is wrong with that? We didn't all get
> together to decide on the answers before participating in the survey!
> They themselves tried to give the survey as much publicity as they
> could.

I think I have answered this above. Nothing is wrong with that, but I still 
had to mention it somehow. I did not mean to insult anyone with the way I 
phrased it either. 

Daniel Luna wrote:
> But this is just plain stupid.

I do not think I can say anything to counter that level of discussion.

> "At [insert mailing list THEY sent to] there are probably a lot of [insert 
> programming language] programmers, so these figures cannot be taken too 
> seriously."

> So the possibility that a very high percent of erlang developers are using 
> it for parallel programming did not occur to the authors?

Should I really reply to this? I will try: We have tried to contact a mailing 
list for every parallel programming system mentioned in question two. We have 
not contacted any mailing lists for other parallel programming systems such as 
Erlang. Therefore, the numbers for the systems mentioned in question two can 
be compared with each other in a way, and therefore they are in one figure. 
And the "Other" systems can be compared with each other in a way. We did not 
contact the community of any of the "Other" systems, and as far as we know, 
Erlang was the only "Other" system where the survey got noticed at all. This 
is why the numbers for Erlang are unusually high and cannot be directly 
compared to the other systems, and this is why we mention it in the paper. 
Nothing more, nothing less. 

When I asked for reviews of this paper, the Erlang numbers were held against me 
again and again, and it was even suggested that I take them out. I have decided 
not to do that, because I did not want to let my or anyone else's subjective 
opinion influence the results any more than necessary. Maybe I should have 
just listened...

> On the other hand, this whole survey seems to be bogus. If you remove the 
> 130 Erlang answers you remove more than half of the total answers (256). I 
> have no idea how they manage to get any results at all from that.

As Ulf already noted in his reply, there were not 130 Erlang answers. I am not 
really good at statistics, but I recall from one of my classes that anything 
with fewer than 20 participants is not statistically meaningful. We have had 
more than ten times that, so if you really want to complain, please do 
complain about the composition of our sample, but not about its size, because 
this will get you further in bashing the survey.

Luke Gorrie wrote:
> I can sympathise with the surveyors. They seem to be interested in
> parallel programming as in "using lots of hardware in parallel to
> solve a problem" like SETI, CGI rendering farms, etc. I bet they got a
> lot of responses from Erlang people writing internet servers & telecom
> systems which are another kettle of fish entirely.

You are right, we come more from a high performance computing background. 
Nevertheless, I am interested in all kinds of parallel programming and I have 
even taken a look at Erlang in the past. Unfortunately, it does not give any 
speedups on multiple processors yet, but I hear that even this is changing, 
making it even more interesting for us. So why should I not welcome any 
answers from you people? You have a different perspective on parallel 
programming than I do, but I am very willing to listen to people with 
different perspectives, as quite often this leads to new insights...

Ulf Wiger wrote:
> One can of course note that while the survey form
> allowed ticking in simultaneous use of C, C++, Fortran,
> and Java, it encouraged only one mention each of a 
> functional, logical or other programming language.
> I don't know how much this can be expected to have 
> skewed the results in favour of the explicitly named
> languages.

I expect this to have skewed the results quite a bit in favour of these 
languages, and this is the reason I have only compared the explicitly named 
ones with each other, and the "others" with each other. And even in comparing 
them, I have been very careful with my observations.

This mail has gotten way out of hand. I hope I was able to clear things up a 
bit and show you people a little bit of my perspective on the matter. 

Thanks for listening,
best regards from Germany,
Michael Suess

-- 
"What we do in life, echos in eternity..."
M.: msuess@REDACTED | T.: +49-561-804-6269 | F.: +49-561-804-6219
WWW: http://www.plm.eecs.uni-kassel.de/plm/index.php?id=msuess
Public PGP key and fingerprint available at above address.
Research Associate, Programming Languages / Methodologies Research Group
University of Kassel, Wilhelmshöher Allee 73, D-34121 Kassel