103 of 105 people found the following review helpful
Format: HardcoverVerified Purchase
Tests of statistical significance are a particular tool which is appropriate in particular situations, basically to prevent you from jumping to conclusions based on too little data. Because this topic lends itself to definite rules which can be mechanically implemented, it has been prominently featured in introductory statistics courses and textbooks for 80 years. But according to the principle "if all you have is a hammer, then everything starts to look like a nail", it has become a ritual requirement for academic papers in fields such as economics, psychology and medicine to include tests of significance. As the book argues at length, this is a misplaced focus; instead of asking "can we be sure beyond reasonable doubt that the size of a certain effect is not zero" one should think about "how can we estimate the size of the effect and its real world significance". A nice touch is the authors' use of the word oomph for "size of effect".
Misplaced emphasis on tests of significance is indeed arguably one of the greatest "wrong turns" in twentieth century science. This point is widely accepted amongst academics who use statistics, but perversely the innate conservatism of authors and academic journals causes them to continue a bad tradition. All this makes a great topic for a book, which in the hands of an inspired author like Stephen Jay Gould might have become highly influential. The book under review is perfectly correct in its central logical points, and I hope it does succeed in having influence, but to my taste it's handicapped by several stylistic features.
(1) The overall combative style rapidly becomes grating.
(2) A little history -- how did this state of affairs arise? -- is reasonable, but this book has too much, with a curious emphasis on the personalities of the individuals involved, which is just distracting in a book about errors in statistical logic.
(3) The authors don't seem to have thought carefully about their target audience. For a nonspecialist audience, a lighter How to Lie With Statistics style would surely work better. For an academic audience, a more focused [logical point/example of misuse/what authors should have done] format would surely be more effective.
(4) Their analysis of the number of papers making logical errors (e.g. confusing statistical significance with real-world importance) is wonderfully convincing that this problem hasn't yet gone away. But on the point "is this just an academic game being played badly, or does it have harmful real world consequences" they assert the latter but merely give scattered examples, which are not completely convincing. If people fudge data in the traditional paradigm then surely they would fudge data in any alternate paradigm; if one researcher concludes an important real effect is "statistically insignificant" just because they didn't collect enough data, then won't another researcher be able to collect more data and thereby get the credit for proving it important? Ironically, they demonstrate that the harmful real world effect of the cult is non-zero but not how large it is ...
56 of 64 people found the following review helpful
Format: PaperbackVerified Purchase
I know and admire Deirdre McCloskey's work and I am an empirical economist who has to work every day with t and F tests and p-values. So I was quite excited when I read that this particular author had co-authored a book on this particular subject.
Unfortunately, I was quite disappointed. I was expecting either a narrative of errors made in the name of statistical significance or an in-depth analysis of what tests really mean. The authors do neither.
In the first half of the book, they superficially narrate the problems with the Vioxx clinical trials, but tell few other stories of how the standard error "costs jobs, justice and lives." A narrative along the lines of "Normal Accidents", by Charles Perrow, which documents an extensive list of accidents to tell of the perils of complexity, would have made for much better reading. After reading the book, I am none the wiser as to why or how the jobs, justice and lives were lost to statistical significance.
Alternatively, the book could have explained, in terms clear to those who do not work every day with tests, what is meant by the significance and power of a test and what these terms really mean. I have never seen an explanation of these terms that is really clear and sticks in your mind. Unfortunately this was not the case either. The authors mention that statistical significance is more complex than just p-values, affirm that most economists do not understand why, and leave it at that. They confuse more than they explain.
As a final problem, the book takes a good-versus-evil attitude that has no place in good science. Gosset is good and Fisher is bad. Please.
In conclusion, while I agree with the authors' main thesis, their book argues it very poorly, very lengthily, and very tediously.
132 of 157 people found the following review helpful
Format: Paperback
I attended a seminar by McCloskey when she announced she was working on this then-upcoming book. So I knew beforehand that its style would be more like a victim-tells-all revenge than the fun-seeking discovery typical of most popular science books. The first half of the book (up to Chapter 13) did turn out to be bitter. However, at least that part was largely based on facts, such as a comprehensive count of academic papers failing to meet certain standards. The second half of the book was devoted to the biographies of the key figures behind the rise of what the authors call the "cult of statistical significance". The book lost any pretense of integrity at that point and just started slinging mud. Gosset was portrayed as a good-natured figure who worked hard like a bee; Fisher, as a mad scientist who stole the labor of others and would attack people by any means to defend his status. At one point the authors didn't even bother to call Fisher by his name, and just referred to him as the Wasp. They also dragged Fisher's mother into the ordeal by suggesting that she was responsible for turning Fisher into the cold-hearted person they claimed he was.
I was not only disgusted by this kind of tabloid sensationalism, but also disappointed by how little useful information I got out of this long-awaited book. The authors "irrationalized" the popularization of statistical significance by framing it as the work of a cult. To further delegitimize the use of statistical significance, they argued that it is wrong to rely on it to evaluate scientific hypotheses because (1) what we really want is how likely a hypothesis is to be true given the data, not the other way around; and (2) there are other clues just as, if not more, important, especially the effect size. These could have been reasonable positions if they did not make statistical significance a scapegoat, branding it a "fallacy" just because it is defined on the likelihood of observing the data given the hypotheses. As defined, statistical significance provides a measure of precision. That's all. Just because it doesn't answer all the questions of scientific interest doesn't mean it provides no useful information, and it certainly doesn't automatically make it a fallacy. Furthermore, many hypothesis tests used in academic research are based on likelihood "ratios" rather than just the conditionals; at the very least there would be NO fallacy for believers in the Likelihood Principle. It is quite regrettable that they fail to elaborate on such crucial information, with the effect (intended or not) of making other people look stupid. As for the second point, I agree that researchers should have paid more attention to other factors, such as statistical power and sample size, IN ADDITION TO statistical significance. But I think it is misguided to hail any ban on reporting statistical significance as a heroic act of revolt, as the authors did in the book. One can report all the effect sizes he wants, but it all means nothing if his inferences merely reflect "bad luck" in sampling the wrong subjects.
If my views above are on the right track, then this book does the research community no good by martyrizing Gosset and demonizing Fisher. There has been no cult all along. If we are justified in believing that some vested interests overemphasized statistical significance to divert our attention away from the more important issues, then we should encourage people (authors and readers alike) to focus on those more important issues instead of treating statistical significance as if it were irrelevant. For a more serious and more informative discussion of this topic, I would recommend Chow's Statistical Significance: Rationale, Validity and Utility (Introducing Statistical Methods). His first chapter explains the key issues in 12 pages with more varieties of arguments and more intellectually stimulating details than what Ziliak and McCloskey attempted in 251 pages.
I give 3 stars for this book's good intent but average quality, and, on top of that, took 1 star off for its mean-spirited rhetoric.
44 of 51 people found the following review helpful
Format: PaperbackVerified Purchase
This book shows how many scientific disciplines rely way too much on the concept of statistical significance. I have read the book and I find it convincing. The authors show how the focus on statistical significance has taken attention away from 'real' significance. In other words: the focus on statistical significance often means that researchers fail to ask whether their findings matter. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. So testing for statistical significance is asking how likely it is that an effect exists at all. It tries to answer that question by looking at how precisely the effect can be measured. It does not answer at all how strong and important this effect is. And this latter question about the effect size is much more important from a scientific and a practical perspective. Statistical significance does not imply an effect is important; lack of statistical significance does not mean an effect is unimportant.
You may ask: how can an effect be important that is not statistically significant? The answer has to do with HOW a statistical significance test tries to answer the question of whether an effect exists, which is by looking at HOW PRECISELY the (presumed) effect can be measured. There are circumstances in which an effect is important yet cannot be measured precisely. This is the case when there is a lot of variability in the effect. When an effect is strong YET highly variable (for instance ranging between 30 and 70), a significance test registers that the effect cannot be measured precisely, which can lead to the conclusion: not statistically significant. At the same time, a weaker effect with lower variability (for instance ranging between 4 and 5) could be measured more precisely, which might lead to the conclusion 'statistically significant'.
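The point about precision versus importance can be sketched in a few lines of Python. The numbers below are hypothetical (mine, not from the book): a significance verdict is driven by the ratio of an effect estimate to its standard error, so a large but imprecisely measured effect can fail the test while a tiny, precisely measured one passes.

```python
# Hypothetical effect estimates with their standard errors (illustrative only).
# Significance tracks precision (estimate / standard error), not importance.
effects = {
    "large but imprecise": (50.0, 40.0),  # big effect, big uncertainty
    "small but precise": (4.5, 0.25),     # tiny effect, tiny uncertainty
}

for name, (estimate, std_err) in effects.items():
    z = estimate / std_err
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{name}: z = {z:.2f} -> {verdict}")
# The 50-unit effect comes out "not significant" (z = 1.25);
# the 4.5-unit effect comes out "significant" (z = 18.00).
```

Under these made-up numbers, the important effect is the one the test rejects, which is exactly the reviewer's scenario.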
Mind you, the book is NOT a plea against quantitative research nor statistical analysis. On the contrary. It is a plea for doing it and doing it right by bringing back focus on effect sizes in social science.
16 of 16 people found the following review helpful
Format: PaperbackVerified Purchase
Every paragraph in this book is filled with simmering outrage, and every point is made at least twenty times. The main text is 250 pages long; 25 pages would have been much better.
The thesis is interesting (and I suppose it might even be important and valuable). But the writing style is so unbearable that I cannot give this book more than 2 stars.
25 of 29 people found the following review helpful
Format: PaperbackVerified Purchase
The authors do an admirable job of exposing an important issue, but this work only identifies the problem for you and offers no solution. It goes on too long and eventually becomes a platform for the authors to gripe about the injustices that have been served on them in their careers. As we have all been indoctrinated into the "cult of significance" through the education system, it would have been nice for the authors to show us how we could do better. Many times we are asked to work statistics on numbers from disciplines of which we have very little knowledge and experience, so all we can offer is statistical significance, not material significance, and hope that the people we are working with understand the difference. Most do not and are not prepared to bridge the gap. If there are alternative techniques and methods, I am none the wiser; maybe that's my problem.
11 of 12 people found the following review helpful
Format: Paperback
This book provides strong arguments that scientists often use tests of statistical significance as a ritual that substitutes for thought about how hypotheses should be tested.
Some of the practices they criticize are clearly foolish, such as treating data which fall slightly short of providing statistically significant evidence for a hypothesis as reason for concluding the hypothesis is false. But for other practices they attack, it's unclear whether we can expect scientists to be reasonable enough to do better.
Much of the book is a history of how this situation arose. That might be valuable if it provided insights into what rules could have prevented the problems, but it is mainly devoted to identifying heroes and villains. It seems strange that economists would pay so little attention to incentives that might be responsible.
Instead of blaming the problems primarily on one influential man (R.A. Fisher), I'd suggest asking what distinguishes the areas of science where the problems are common from those where they are largely absent. It appears that the problems are worst in areas where acquiring additional data is hard and where powerful interest groups might benefit from false conclusions. That leads me to wonder whether scientists are reacting to a risk that they'll be perceived as agents of drug companies, political parties, etc.
The book sometimes mentions anti-commercial attitudes among the villains, but fails to ask whether that might be a symptom of a desire for "pure" science that is divorced from real world interests. Such a desire might cause many of the beliefs that the authors are fighting.
The book does not adequately address the concern that if scientists in those fields abandon easily applied rules, they are sufficiently vulnerable to corruption that we'd end up with less accurate conclusions.
The authors claim the problems have been getting worse, and show some measures by which that seems true. But I suspect their measures fail to capture some improvement that has been happening as the increasing pressure to follow the ritual has caused papers that would previously have been purely qualitative to use quantitative tests that reject the worst ideas.
The book seems somewhat sloppy in its analysis of specific examples. When interpreting data from a study where scientists decided there was no effect because the evidence fell somewhat short of statistical significance, it claims the data show "St. John's-wort is on average twice as helpful as the placebo". But the data would provide evidence for that only if there were data showing that the remission rate with no treatment was zero. It's likely that some or all of the alleged placebo effect was due to effects that are unrelated to treatment. And their use of the word "show" suggests stronger evidence than is provided by the data.
I'll close with two quotes that I liked from the book:
"The goal of an empirical economist should not be to determine the truthfulness of a model but rather the domain of its usefulness" - Edward E. Leamer
"The probability that an experimental design will be replicated becomes very small once such an experiment appears in print." - Thomas D. Sterling
10 of 12 people found the following review helpful
By DIV on December 15, 2010 Format: Paperback
The book touches on an important issue. I say "touches", because -- despite its considerable length -- it doesn't go into a lot of mathematical depth. Worse, the 'plain language' explanations are not as easy to follow as they should be. But worst of all, the writing style is so dreadful that this is the only book I have ever come across that I wanted to finish for the merits of its content but couldn't bear to read beyond about page 70.
Obviously that is my personal reaction to the book. Perhaps others will love it. I want to clarify that the language is technically (practically) perfect. On matters of grammar or spelling, or even graphic design I have no issue. The problem for me was the 'creative' element. It is partly the short sentences. Partly the over-the-top bluster (injection of emotion). Partly the lengthy circling around a topic without adding much information. And a lot is due to aspects I can't quite put my finger on.
Suffice it to say that the best use of the book is as a source of references to other, better-written books and articles. But for that you could browse [...] for free, which is almost as up-to-date.
--DIV
3 of 3 people found the following review helpful
Format: Paperback
Suppose I told you I had found irrefutable scientific evidence that a CEO's golf handicap affects his or her company's stock performance. Better golfers imply higher stock returns, I'd say. As evidence I would produce a huge regression model, covering thousands of companies, where golf handicap as explanation of differing stock returns was found to be statistically significant at the 1 percent level. Now promise me you will never trust any such mock science.
Stephen Ziliak and Deirdre McCloskey, two economics professors with a keen sense of the right and wrong uses of statistics, have written a truly eye-opening book. Without resorting to equations, they show that the practice known as hypothesis testing or tests of statistical significance is utterly flawed. Unfortunately, most scientists are unaware of this fact and scientific journals of the highest standing continue using it anyway.
Let it be said at once, Ziliak and McCloskey's book is not an enjoyable read. Not only for its uncomfortable implications but just as much for its poor editing. On the good side the prose is sometimes beautiful and often funny. But its points are rarely well argued, the quotations and examples are too many and too lengthy and the endless repetition of its central polemic soon becomes tedious. However, I am willing to accept these flaws as the book is too important to be written off on the grounds of style.
The authors take a searing look at the practice of labelling factors either statistically significant or insignificant. The method is often associated with the statistician Ronald A. Fisher. But the authors show how Fisher virtually abducted the thinking of William Sealy Gosset, the head brewer of the Guinness brewery, corrupting Gosset's original ideas and presenting them in a simplistic and flawed version. Gosset is the man behind the famous pseudonym Student, which gave its name to Student's t.
The biggest flaw in significance testing is to ask only whether there is a relation, not the magnitude of the assumed relation. Some factors could come out statistically significant while having negligible impact (like the golf handicap?), while others could have large impact while never achieving statistical significance. The outcome is to some extent in the hands of the researcher. Choose a big enough sample size and almost anything becomes statistically significant. Conversely, a small enough sample size makes nothing significant. The risk of manipulation by researchers is obvious as they can pick and choose between significance and insignificance by choice of sample size.
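The sample-size point can be sketched directly (the numbers are hypothetical, mine rather than the authors'): hold a tiny effect and its spread fixed, and the t-statistic grows with the square root of the sample size until "significance" arrives.

```python
import math

# A fixed, tiny effect (mean difference 0.1, standard deviation 1.0).
# t = effect / (sd / sqrt(n)) grows with sqrt(n), so enough data makes
# even a negligible effect "statistically significant".
effect, sd = 0.1, 1.0
for n in (25, 100, 10_000):
    t = effect / (sd / math.sqrt(n))
    verdict = "significant" if t > 1.96 else "not significant"
    print(f"n = {n:>6}: t = {t:5.2f} -> {verdict}")
# n = 25 and n = 100 come out "not significant"; n = 10_000 gives t = 10.00.
```

The effect never changes; only the researcher's choice of n does, which is exactly the manipulation risk described above.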
Hypothesis testing is even asking the wrong question. It asks what the likelihood would be of observing the data you have collected, assuming the null hypothesis is true. But the more relevant question concerns the likelihood of the hypothesis being true, given the observed data. This is equivalent to confusing the probability of a person being dead given that he was hanged, with the probability of him being hanged given that he is dead. Quite a difference, I would say.
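The hanged/dead analogy can be made concrete with Bayes' rule and some made-up base rates (mine, not the book's): the two conditional probabilities can differ by orders of magnitude.

```python
# Made-up base rates, for illustration only.
p_hanged = 1e-4            # probability a given person is hanged this year
p_dead = 1e-2              # probability a given person dies this year
p_dead_given_hanged = 1.0  # hanging is assumed reliably fatal

# Bayes' rule: P(hanged | dead) = P(dead | hanged) * P(hanged) / P(dead)
p_hanged_given_dead = p_dead_given_hanged * p_hanged / p_dead

print(p_dead_given_hanged)               # 1.0  -- like P(data | hypothesis)
print(round(p_hanged_given_dead, 4))     # 0.01 -- like P(hypothesis | data)
```

A significance test reports only the analogue of the first number; the second, which is what we actually care about, also depends on the base rates and can be wildly different.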
The chilling implication of Ziliak's and McCloskey's book is that it casts doubt on almost everything we regard as knowledge in society. If papers published in eminent journals like the American Economic Review make such elementary errors, how are we to trust any scientific findings at all? Let alone any of the causes and effects communicated to us by less rigorous media?
This is a review by investingbythebooks.com
3 of 3 people found the following review helpful
Format: PaperbackVerified Purchase
For me this was a matter of life and death.
My cholesterol numbers were bad, and though I told the doc I wouldn't take a statin, he looked at the chart and clucked, "with LDL this high you're at risk for pancreatitis. You'd better get those numbers down." And he wrote a prescription.
I filled it, and as usual I got the dozen pages of onionskin paper with the pharmacological details. I decided to read them, and I was flabbergasted. Translated into English, this is what it said:
We have done a Big Study, oh yes we have, and we have Numbers: look at them! And we have analyzed those numbers and we have Conclusions. And we used Statistics, so you know we must be right.
The first thing you should know is that people who took our drug died at a higher rate than people who didn't. 11% higher approximately. And if this were a court, we'd have to say that the drug is guilty on the preponderance of the evidence, because we figure the odds are about 4 to 1 that the drug did it. But our Statistics tells us to ignore anything that is not beyond reasonable doubt, and 4-to-1 doesn't make it, so we think you should ignore the fact that the drug does more harm than good.
What you should focus on is that it lowers your LDL. We showed that beyond reasonable doubt - the odds are at least 22 to 1. And low LDL is good. So our advice is, take it.
That is what it said.
The question for us all is, how did we come to this? How can the scientific hierarchy, from the FDA down to kindly Doctor Brown, think that it is anything but crazy to take this drug? Am I looking to have "his LDL was low" on my tombstone? Is there no judgement that would say that costing lives is very bad, and that lowering LDL is of no value in itself if it doesn't save lives?
The message of the book is that things are every bit as bad as you might fear.
The authors show how the pursuit of science has been shunted off into a search for "statistical significance" which has nothing whatever to do with scientific significance or importance. They give a pretty good explanation for how things got so messed up.
This book is of the highest possible importance for anyone who uses or teaches statistical inference. But anyone who knows a little statistics should definitely read the book, and anyone who knows a little math might enjoy it.
For it is engagingly written. Think of it, a book on statistical significance! Could anything be more dull? But the book has the pace of a potboiler. There are witty jokes, haikus, appeals to outrage and to laughter. There is a hero (William Gosset, aka Student) and a villain (the evil Sir Ronald Fisher). There are fables and parables.
Most importantly, there is the truth: that science based solely on rejecting the null hypothesis is sterile, unconvincing even to its practitioners, and extremely costly, both in money and in lost time and lives. You will weep when you see how thoroughly the sciences that use statistics (such as economics, psychology, sociology, medicine) have come under the grip of a Statistical-Academic Complex that persists in significance tests because significance tests get you promoted, never mind the real scientific value.
The problem is real. Mistakes are being made daily because of sloppy statistics. And remember that drug that the doc wanted me to take to ward off pancreatitis? In the fine print the study showed that the drug didn't help people with pancreatitis either. Ba-dum.