Following on from my post about Milton and Wiseman's 1999 ganzfeld meta-analysis, I thought I'd write about another meta-analysis of the same type of experiments. In 1985 Charles Honorton published a meta-analysis of ganzfeld experiments from 1974-1982.
"The composite (Stouffer) Z score for the 28 studies is 6.6 (p < 10^ -9), and 43% of the studies were independently significant at the 5% level."
This Stouffer (unweighted) z of 6.6 is the one most commonly quoted in articles commenting on Honorton's findings. But as we saw with Milton and Wiseman's paper, choices regarding statistical measure and inclusion criteria can alter this figure quite radically.
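For readers unfamiliar with the method: the unweighted Stouffer approach simply sums the individual studies' z-scores and divides by the square root of the number of studies. A minimal sketch (the function name and the illustrative z-scores are my own, not Honorton's data):

```python
import math

def stouffer_unweighted(z_scores):
    """Combine independent z-scores with equal weight:
    Z = sum(z_i) / sqrt(k)."""
    k = len(z_scores)
    return sum(z_scores) / math.sqrt(k)

# Illustrative only: 28 studies averaging z ~1.25 each
# combine to a composite of about 6.6.
print(round(stouffer_unweighted([1.25] * 28), 1))  # 6.6
```

Note that every study counts equally here, regardless of how many trials it contained; that is the property criticised later in this post.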
The ganzfeld experiments have enough adjustable aspects that, like the sliders on a graphic equalizer, they can be tweaked to produce the desired result. It illustrates a sort of Heisenberg's Principle for statistics: the more someone knows about a particular subject, the less able they are to quantify it with any accuracy. And I should emphasise that this is the point I am trying to make: the subjective element of meta-analyses can be considerable.
The Inclusion Criteria
Honorton based his figures on a sub-set of 28 experiments taken from the 42 experiments discussed in Hyman's 1985 paper. Among these 42 experiments, a number of different scoring systems were used. This, claimed Hyman, could lead to a problem where an experiment that initially used one method of scoring later found that a different method gave a better result, and reported that instead. As Stanford (1984) summarized:
“For whatever reason, many ganzfeld researchers have, historically speaking, seemed very unsure of what method to use to evaluate overall ESP performance. Many have used at least two, and sometimes more, methods of analysis. This common failure to settle, on logical, a priori grounds, upon a single method of analysis makes it difficult to decide whether ESP has occurred in any study where multiple analyses have been used with divergent outcomes.”
Honorton agreed with Hyman that the issue of multiple analysis was a problem, and so he decided to conduct his meta-analysis using only one scoring method. Namely, the Direct Hit method, which was the most common.
However, did this really address the issue? The experiment Hyman used to illustrate the problem (York, 1980, which used Order Ranking as its main measure of success and Direct Hits as a secondary one, but reported only the statistically significant Direct Hit results) is still included in the database. So I think the problem remains, especially since a Direct Hit score can be derived from the data for other measures such as Binary Hits, Sum of Ranks or Z-score Ratings, making it too much of a temptation for an experimenter to report a significant or more positive result on this scale alongside the other measures. Honorton included any experiment that reported Direct Hits, whether they were the primary measure or not.
Honorton's choice of Direct Hits may make sense at first glance since it includes the results from the majority of experiments (ie, 28 out of 42). However, it does not include the majority of the data (835 trials out of 2,567) and it is worth looking at the data that Honorton removed.
As a whole, the missing 14 experiments contain 1,612 trials with a Stouffer z of -0.01 (ie, fractionally below chance). Eleven of the fourteen reported results in numerical form; the other three simply said the experiment was unsuccessful, so in my calculations a z-score of zero was awarded.
If we combine these fourteen with Honorton's database, the unweighted z-score falls from 6.6 to 5.2.
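To see why adding null results pulls the composite down so sharply, note that the Stouffer numerator (the sum of z-scores) barely changes, while the denominator grows from sqrt(28) to sqrt(42). A back-of-envelope sketch from the rounded figures above (the 5.2 in the text comes from the actual per-study z-scores, which this approximation won't reproduce exactly):

```python
import math

# 28 studies combining to Stouffer Z = 6.6 imply a z-score sum of:
z_sum_28 = 6.6 * math.sqrt(28)

# The 14 excluded studies had a combined Stouffer z of -0.01,
# i.e. a z-score sum of roughly:
z_sum_14 = -0.01 * math.sqrt(14)

# Pooling all 42 studies dilutes the composite:
z_all = (z_sum_28 + z_sum_14) / math.sqrt(42)
print(round(z_all, 1))  # close to, though not exactly, the 5.2 in the text
```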
The Milton and Wiseman meta-analysis was criticised for using a method of combining scores that did not take into account the size of each experiment. Since Honorton uses the same method, it seems valid to apply the same adjustment here. Once we use a weighted z-score, the result drops to 2.72 (odds of around 1 in 303).
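One standard way to weight by study size is to weight each z-score by the square root of its trial count; the "1 in N" odds figures are just the reciprocal of the one-tailed p-value. A sketch under those assumptions (function names are mine, and I don't know exactly which weighting the original software applied, so the odds come out near, not identical to, the 1 in 303 quoted above):

```python
import math
from statistics import NormalDist

def stouffer_weighted(z_scores, ns):
    """Size-weighted Stouffer combination:
    Z_w = sum(sqrt(n_i) * z_i) / sqrt(sum(n_i))."""
    num = sum(math.sqrt(n) * z for z, n in zip(z_scores, ns))
    return num / math.sqrt(sum(ns))

def odds_against_chance(z):
    """One-tailed p-value expressed as '1 in N' odds."""
    p = 1 - NormalDist().cdf(z)
    return 1 / p

# A weighted z of 2.72 corresponds to odds of roughly 1 in 306
# under this exact-normal calculation.
print(round(odds_against_chance(2.72)))
```

With equal trial counts the weighted and unweighted versions coincide; the two diverge exactly when large and small studies pull in different directions, which is the situation in this database.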
So with these two really quite uncontroversial decisions (include all data, and choose a more appropriate statistical measure) the result has fallen quite dramatically.
And once you have a certain amount of knowledge about the database, it's very easy to find more ways to push the result down even further. Now, I should reiterate that this isn't about the evidence for psi per se, but it does indicate that there is no single correct answer.
A set of results from Cambridge was famously criticised by Blackmore (as well as by Parker & Wiklund and C.E.M. Hansel) and as a result was removed by Jessica Utts in her analyses of the ganzfeld data (Utts, 1999, 2010). So if you follow the example of Utts and remove the data from Cambridge, the weighted z-score falls even further, down to 2.18 (odds of around 1 in 69).
Small scale experiments
In calculating each z-score, a normal approximation to the binomial distribution is used. This approximation is not appropriate for experiments with small numbers of trials; a common rule of thumb (the one Wikipedia suggests, so I'll use it) is that the number of trials multiplied by the chance probability (mostly 0.25 in this case) should be at least 5. At p = 0.25 that means excluding all experiments with fewer than 20 trials, which reduces the weighted z-score to 2.09 (1 in 54).
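The cutoff follows mechanically from the rule of thumb: n x 0.25 >= 5 gives n >= 20. A sketch of the z-score calculation with that validity check built in (the function name is my own):

```python
import math

def direct_hit_z(hits, trials, p=0.25):
    """Normal approximation to the binomial:
    z = (hits - n*p) / sqrt(n*p*(1-p)).
    Only reasonable when n*p >= 5, i.e. n >= 20 at p = 0.25."""
    if trials * p < 5:
        raise ValueError("too few trials for the normal approximation")
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (hits - mean) / sd

# 10 direct hits in 30 trials at 25% chance expectation:
print(round(direct_hit_z(10, 30), 2))  # 1.05
```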
In fact, it would be quite simple to write up a meta-analysis using these criteria as if they were perfectly sensible choices made by an impartial observer before any calculations were attempted. The truth is that sometimes I would try excluding a class of experiments, only to find that it pushed the result up again. I simply ignored that, and tried something else. In fact, this exercise has made me far more skeptical of meta-analyses than I am of the existence of ESP.
So, what hoops would a skeptic need to jump through in order to reduce the results to chance (or near chance)? Despite such a considerable drop so far, it is actually quite difficult to get the result down much more.
It is necessary to include all the experiments up until 1984 (ie, up to the year before the publication of Honorton's meta-analysis) and then take out two experiments by Honorton and Terry which had been criticised on methodological grounds by Kennedy.
This puts the weighted z-score at 1.78 (1 in 27) although the unweighted z-score is now, for once, lower than the weighted at 0.61 (approximately 1 in 4) so a really cheeky skeptic could reinstate the statistical measure they'd abandoned at the start because it inflated the score!
BLACKMORE, S. (1987). "A Report of a Visit to Carl Sargent's Laboratory", Journal of the Society for Psychical Research, 54, pp. 186-198.
HANSEL, C.E.M. (1985). "The Search for a Demonstration of ESP", in A Skeptic's Handbook of Parapsychology, ed. Paul Kurtz, pp. 97-128.
HONORTON, C. (1985). "Meta-Analysis of Psi Ganzfeld Research: A Response to Hyman", Journal of Parapsychology, 49, pp. 51-91.
HYMAN, R. (1985). "The Ganzfeld Psi Experiment: A Critical Appraisal", Journal of Parapsychology, 49, pp. 3-50.
KENNEDY, J.E. (1979). "Methodological Problems in Free-Response ESP Experiments", Journal of the American Society for Psychical Research, 73, pp. 1-15.
MURRAY, A.L. (2011). "The Validity of the Meta-Analytic Method in Addressing the Issue of Psi Replicability", Journal of Parapsychology, 75:2.
PARKER, A., WIKLUND, N. (1987). "The ganzfeld experiments: towards an assessment", Journal of the Society for Psychical Research, 54, pp. 261-265.
STANFORD, R.G. (1984). "Recent Ganzfeld-ESP Research: A Survey and Critical Analysis", Advances in Parapsychology, 4, pp. 83-111.
UTTS, J. (1999). "The Significance of Statistics in Mind-Matter Research", Journal of Scientific Exploration, 13:4, pp. 615-638.
UTTS, J. (2010). "The Strength of Evidence Versus the Power of Belief: Are We All Bayesians?"
YORK, M. (1977). "The defense mechanism test (DMT) as an indicator of psychic performance as measured by a free-response clairvoyance test using a ganzfeld technique", Research in Parapsychology 1976, pp. 48-49.
Software used for statistics was Meta-Analysis 5.3 by Ralf Schwarzer