I realize that Steve blogged about this study earlier in the week, but since I also commented on this particular study as my not-so-super-secret alter ego, I figured it rated a place on SBM as well. I emphasized different aspects of the study and tried to quantify exactly why, under even the most charitable interpretation of the study possible, the effects are not clinically significant. Besides, if the level of comments and e-mails is any indication, there is sufficient interest in this particular study to rate a second post.
Not surprisingly, this study is about acupuncture. Well, it’s not exactly a study; it’s a meta-analysis that aggregates a whole lot of acupuncture studies in which this most popular of woos is administered to patients with chronic pain from a variety of causes. It’s also being promoted all over the place with painfully credulous headlines like:
- (What a horrible headline; the very definition of “helping” is to do better than placebo. This was an AP story. Where’s when you need her?)
- (Uh, no. Not quite.)
- (Et tu, Medscape?)
In deference to that last article, I was half-tempted to call this post Quackery: Acupuncture does not relieve pain. Then there were news reports like this:
And, of course, accompanying the above news segment there was a story describing a patient with chronic pain:
In January 2009 she was referred to Dr. Jun Mao, a licensed physician and acupuncturist at the University of Pennsylvania.
“The only thing I had not tried was acupuncture,” Zierler said.
Now, a new review of research suggests that this ancient technique may truly hold benefits for those suffering from certain forms of chronic pain.
In a review of 29 previous well-designed studies, which together looked at almost 18,000 patients, researchers at Memorial Sloan-Kettering Cancer Center found that acupuncture does, indeed, work for treating four chronic pain conditions: back and neck pain, osteoarthritis, chronic headache and shoulder pain.
Even “placebo” acupuncture, where the practitioner only pretends to place the needle or places the needle in a random site, is effective at relieving pain, though true acupuncture works better.
And so was born the propaganda line for this particular study, namely that it’s huge; that it is the most compelling evidence thus far that acupuncture “works”; that all that stuff about “sham acupuncture” being as good as “real acupuncture” isn’t true. But is this study strong evidence of any of this? Let’s go to the tape, as I like to say.
The study itself is from a group called the Acupuncture Trialists’ Collaboration. I don’t know about you, but the very existence of something called the Acupuncture Trialists’ Collaboration is disturbing to me. Be that as it may, the study is by Vickers et al and was just published online in the Archives of Internal Medicine.
My first inclination when reading this was to invoke a dictum that applies to all meta-analyses, no matter what the research question is: GIGO, or Garbage In, Garbage Out. On the other hand, on the surface, this meta-analysis looks like a big deal. It uses patient-level data instead of aggregated data, which allows for a better meta-analysis in most cases. It tries to restrict its included studies to those with the highest methodological quality (although that doesn’t completely inoculate it against the GIGO label, as you will see).
So what the authors did was to search MEDLINE, ClinicalTrials.gov, and the Cochrane Collaboration Central Register of Controlled Trials for studies testing acupuncture against chronic pain. They then winnowed the pile of studies they found using several criteria:
Randomized controlled trials were eligible for analysis if they included at least 1 group receiving acupuncture needling and 1 group receiving either sham (placebo) acupuncture or no-acupuncture control. The RCTs must have accrued patients with 1 of 4 indications—nonspecific back or neck pain, shoulder pain, chronic headache, or osteoarthritis—with the additional criterion that the current episode of pain must be of at least 4 weeks duration for musculoskeletal disorders. There was no restriction on the type of outcome measure, although we specified that the primary end point must be measured more than 4 weeks after the initial acupuncture treatment.
Do you see a problem yet? I do. Not all included studies were required to have a sham placebo group. That means some studies compared acupuncture with no-acupuncture controls, which could include groups that got anything from nothing at all to regular care. That’s just one problem, but it’s a big one: mixing studies that compare acupuncture to no treatment, to sham treatment, or to both amounts to comparing apples and oranges. Pooling such studies is inherently problematic.
There are other problems, but let’s first discuss what the study showed. First, Vickers et al reported that patients who underwent acupuncture had less pain. That’s true. However, I find it very hard to be impressed by these results. Indeed, they were most…underwhelming. Basically, the study reported that “real” acupuncture resulted in pain scores that were 0.23, 0.16, and 0.15 standard deviations lower than sham controls and 0.55, 0.57, and 0.42 standard deviations lower than no-acupuncture controls for back and neck pain, osteoarthritis, and chronic headaches, respectively. What does this mean? The authors themselves try to put it into context:
To give an example of what these effect sizes mean in real terms, a baseline pain score on a 0 to 100 scale for a typical RCT might be 60. Given a standard deviation of 25, follow-up scores might be 43 in a no acupuncture group, 35 in a sham acupuncture group, and 30 in patients receiving true acupuncture. If response were defined in terms of a pain reduction of 50% or more, response rates would be approximately 30%, 42.5%, and 50%, respectively.
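The authors’ response rates can be roughly reproduced under a simplifying assumption that the quoted passage doesn’t spell out: follow-up pain scores are normally distributed around each group’s mean with the same standard deviation of 25, and every patient starts at a baseline of 60, so a 50% reduction means a follow-up score of 30 or below. What follows is a minimal Python sketch under those assumptions; the names are mine, not the authors’.

```python
import math

BASELINE = 60.0              # assumed common baseline score on the 0-100 scale
SD = 25.0                    # assumed SD of follow-up scores in every group
THRESHOLD = BASELINE * 0.5   # a reduction of 50% or more means a follow-up score <= 30

# Mean follow-up scores from the authors' illustrative example.
FOLLOW_UP_MEANS = {"no acupuncture": 43.0, "sham": 35.0, "true acupuncture": 30.0}


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def response_rate(mean: float, sd: float = SD, threshold: float = THRESHOLD) -> float:
    """Share of patients at or below the response threshold, assuming follow-up
    scores are normally distributed around `mean` with standard deviation `sd`."""
    return normal_cdf((threshold - mean) / sd)


rates = {group: response_rate(m) for group, m in FOLLOW_UP_MEANS.items()}

if __name__ == "__main__":
    for group, rate in rates.items():
        print(f"{group}: {rate:.1%}")
```

Run as written, it prints roughly 30%, 42%, and 50%, close to the quoted figures. Note how the same 5-point gap in mean scores between sham and “true” acupuncture turns into a gap of roughly eight percentage points in “responders” once a cutoff is imposed; dichotomizing makes a small shift in means look bigger.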
One notes that Vickers et al have chosen a rather dramatic example, with large numbers. For patients with chronic pain, a 50% reduction in pain scores is uncommon, and the standard deviation they chose was rather large. By their own argument, even if there weren’t any methodological issues with the meta-analysis and their conclusions were completely justified, Vickers et al have unwittingly made the argument that the effect of acupuncture might be statistically significantly greater than placebo effects but that it’s almost certainly not clinically significant. What Vickers et al are arguing is that a change of 5 on a 0-100 pain scale (equivalent to a change of 0.5 on a 0-10 pain scale), on a subjective scale no less, is noticeable by patients. It’s probably not. There is a concept referred to as the “minimal clinically important difference” (MCID), defined as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate…a change in the patient’s management.” A study looking at minimal detectable and clinically relevant changes in pain scores in arthritis found a range, in absolute terms, of 6.8 to 19.9. Tubach et al assessed only the improvement aspect of the MCID, defining the minimal clinically important improvement (MCII) as the minimum improvement in the pain score reported by 75% of osteoarthritis patients who ranked their response as “good,” and reported that the MCII was -15.3 for hip osteoarthritis and -19.9 for knee osteoarthritis.
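To make the comparison concrete, here is a back-of-the-envelope conversion of the reported standardized differences into points on a 0-100 scale, using the authors’ own (arguably generous) SD of 25 and checking each against the low end of the 6.8 to 19.9 range and Tubach et al’s hip MCII of 15.3. It’s a sketch, not a reanalysis: it assumes one SD applies to every indication, and applying arthritis-derived thresholds to back pain and headache is itself a simplification.

```python
SD = 25.0  # the authors' illustrative SD on a 0-100 pain scale (assumed to apply to all indications)

# Standardized differences quoted above for "real" acupuncture.
VS_SHAM = {"back/neck pain": 0.23, "osteoarthritis": 0.16, "chronic headache": 0.15}
VS_NO_ACUPUNCTURE = {"back/neck pain": 0.55, "osteoarthritis": 0.57, "chronic headache": 0.42}

MCID_LOW = 6.8   # low end of the 6.8-19.9 range cited above (arthritis)
MCII_HIP = 15.3  # magnitude of Tubach et al's MCII for hip osteoarthritis


def to_points(effect_sd: float, sd: float = SD) -> float:
    """Convert a difference expressed in SD units into points on the 0-100 scale."""
    return effect_sd * sd


if __name__ == "__main__":
    for comparison, table in (("vs sham", VS_SHAM), ("vs no acupuncture", VS_NO_ACUPUNCTURE)):
        for indication, d in table.items():
            pts = to_points(d)
            print(f"{comparison:<18} {indication:<17} {pts:5.2f} points  "
                  f"reaches {MCID_LOW}: {pts >= MCID_LOW}  reaches {MCII_HIP}: {pts >= MCII_HIP}")
```

Under these assumptions, none of the sham-controlled differences (roughly 4 to 6 points) reaches even the low end of that range, and even the no-acupuncture comparisons, which fold placebo effects into the estimate, fall short of the 15.3-point MCII.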
Here’s a hint: -5 (the difference between sham acupuncture and “real” acupuncture) is not clinically significant. The only way you can even approach clinical significance is to compare no-acupuncture controls versus acupuncture, in which case you’re adding placebo effects into any other effect observed, even if that effect is real (which I highly doubt). Indeed, Vickers et al labor mightily to try to convince readers that this tiny effect, if it exists, is not just statistically significant, but clinically significant. They doth protest too much, methinks. In fact, I very much like how the grand master of the scientific analysis of “complementary and alternative medicine” (CAM), Edzard Ernst, put it:
Edzard Ernst, emeritus professor of complementary medicine at the University of Exeter, said the study “impressively and clearly” showed that the effects of acupuncture were mostly due to placebo. “The differences between the results obtained with real and sham acupuncture are small and not clinically relevant. Crucially, they are probably due to residual bias in these studies. Several investigations have shown that the verbal or non-verbal communication between the patient and the therapist is more important than the actual needling. If such factors would be accounted for, the effect of acupuncture on chronic pain might disappear completely.”
Which brings me to another major problem with this meta-analysis, one that I noticed and that Ernst also comments on. None of the included studies that I perused were double-blind, which means that there was the potential for observer bias to creep in. I concede that the authors did a pretty good job of excluding studies with a possibility of what is known in the biz as inadequate allocation concealment, i.e., failure to protect the randomization process so that the treatment to be allocated is not known before a subject is enrolled in the study. (In studies with subjective outcomes, like pain, unclear allocation concealment is associated with exaggerated estimates of treatment effect.) However, no attempt was made that I could identify to make sure included trials were double-blind. In studies of subjective outcomes, blinding is almost certainly as important as allocation concealment, if not more so, and double blinding is essential. As Ernst put it so well, a trial is “either both patient and therapist-blind, or not blind at all.” The investigators appear only to have assessed whether patient blinding was adequate, looking for descriptions of questionnaires in which patients were asked to guess which group they had been assigned to. Without double blinding, it’s hard to call any of the trials included in this meta-analysis “high quality.” And, yes, it is possible to double-blind acupuncture studies, as much as acupuncture fans try to argue otherwise.
Then there’s the issue of heterogeneity in the trials. The authors report a lot of heterogeneity for most of the analyses they performed but give one of the sketchiest descriptions of how they actually calculated that heterogeneity that I’ve ever seen in a meta-analysis anywhere. One wonders what the reviewers were thinking. For a paper that supposedly subscribes to the guidelines for high-quality meta-analyses, which specify calculating a statistic (I²) to describe the heterogeneity of each comparison in the meta-analysis, the authors don’t live up to those principles in at least this one respect: they don’t report that statistic. That strikes me as more sloppy than anything else, but it matters, given that the authors concede considerable heterogeneity in their studies, which makes combining them problematic.
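For readers who haven’t run into it, I² is not hard to compute from per-study effect sizes and their variances: calculate Cochran’s Q with inverse-variance weights, then take I² = max(0, (Q - df)/Q), the share of the total variation attributed to real between-study differences rather than chance. Below is a minimal Python sketch of that standard calculation; the five effect sizes are made-up illustrative numbers, not data from Vickers et al.

```python
from typing import Sequence


def heterogeneity(effects: Sequence[float], variances: Sequence[float]) -> tuple[float, int, float]:
    """Return Cochran's Q, its degrees of freedom, and I^2 (as a percentage)
    for per-study effect sizes and their within-study variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a variance")
    weights = [1.0 / v for v in variances]                  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # share of variation beyond chance
    return q, df, i2


if __name__ == "__main__":
    # Hypothetical standardized effects for five trials, each with variance 0.01.
    effects = [0.05, 0.40, 0.10, 0.35, 0.00]
    variances = [0.01] * 5
    q, df, i2 = heterogeneity(effects, variances)
    print(f"Q = {q:.1f} on {df} df, I^2 = {i2:.0f}%")
```

With these invented numbers, Q is 13.3 on 4 degrees of freedom, giving an I² of about 70%, which would conventionally be read as substantial heterogeneity. One line per comparison in a table is all it would have taken.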
Finally, there’s the issue of publication bias. Publication bias, as most of my readers probably know, is the tendency for positive studies to be more likely to be published than negative studies. That’s because scientists don’t like publishing negative studies (they seem like “failures”) and journals don’t like publishing them either (because editors don’t consider them very interesting). That’s why it’s essential that a meta-analysis include an analysis looking for publication bias. One very common way of doing this is a funnel plot. Yet there is no funnel plot included that I could find (I couldn’t get access to the supplemental material because I had to have someone e-mail the study to me and forgot to ask). Instead, the authors talk about looking at effect sizes in small studies and large studies and then calculate that “only if there were 47 unpublished RCTs with n = 100 patients showing an advantage to sham of 0.25SD would the difference between acupuncture and sham lose significance.” How they calculated this number is not described. I must say, I’ve never seen this sort of analysis in a meta-analysis before, which is why it stuck out like the proverbial sore thumb, as did the lack of a description of how this estimate was calculated. Modeling? Why 47 unpublished RCTs of 100 subjects and not a smaller number of larger RCTs? The whole thing looks like a number the authors pulled out of their nether regions and then plugged into their meta-analysis software to see if it would affect anything. In fact, I have a sneaking suspicion that they probably tried a lot of combinations in order to find the one that would make it look as though it would take a whole boatload of studies going the other way to eliminate the statistical significance of their results. Is it unfair to say so? Well, the authors have no one to blame but themselves, and if I missed the description of how that was calculated, I’ll take my lumps.
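Since the paper apparently doesn’t say how the “47 unpublished RCTs” figure was derived, here is one plausible way such a number could be generated, offered purely as an illustration and not as the authors’ method. It is a simple “fail-safe N” style sensitivity check: start from a pooled effect and its standard error, keep adding hypothetical studies of a fixed size with a fixed negative effect, re-pool with fixed-effect inverse-variance weights, and stop when the z statistic drops below 1.96. The pooled effect (0.20) and standard error (0.03) below are invented, as are the function names; the variance formula is the usual large-sample approximation for a standardized mean difference.

```python
import math


def smd_variance(d: float, n1: int, n2: int) -> float:
    """Large-sample approximation to the sampling variance of a standardized
    mean difference d from two groups of sizes n1 and n2."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def z_after_adding(k: int, pooled: float, pooled_se: float,
                   extra_effect: float, n_per_study: int) -> float:
    """Fixed-effect z statistic after adding k hypothetical studies, each with
    n_per_study patients split evenly between arms and an effect of extra_effect."""
    w0 = 1.0 / pooled_se ** 2                     # weight of the existing pooled estimate
    n1 = n_per_study // 2
    w = 1.0 / smd_variance(extra_effect, n1, n_per_study - n1)
    total_w = w0 + k * w
    new_pooled = (w0 * pooled + k * w * extra_effect) / total_w
    return new_pooled * math.sqrt(total_w)        # estimate / SE, where SE = 1/sqrt(total_w)


def fail_safe_k(pooled: float, pooled_se: float, extra_effect: float = -0.25,
                n_per_study: int = 100, z_crit: float = 1.96, max_k: int = 100_000) -> int:
    """Smallest number of hypothetical studies with effect extra_effect that
    pushes the pooled z statistic below z_crit."""
    k = 0
    while z_after_adding(k, pooled, pooled_se, extra_effect, n_per_study) >= z_crit:
        k += 1
        if k > max_k:
            raise ValueError("result did not lose significance within max_k studies")
    return k


if __name__ == "__main__":
    # Invented inputs for illustration only; not figures from Vickers et al.
    print(fail_safe_k(pooled=0.20, pooled_se=0.03))                    # studies of 100 patients
    print(fail_safe_k(pooled=0.20, pooled_se=0.03, n_per_study=400))   # studies of 400 patients
```

With those invented inputs it takes 23 studies of 100 patients, but only 6 studies of 400 patients. The answer depends entirely on the arbitrary choices of study size and assumed effect, which is exactly why any such calculation needs to be spelled out.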
In the end, I am less than impressed by this study, and it doesn’t surprise me at all that it was funded by the National Center for Complementary and Alternative Medicine (NCCAM) and the .
In fact, I’m pretty much unimpressed by the whole study, although no doubt it will be touted by acupuncturists for years to come as “proof” that acupuncture really and truly works and isn’t just placebo medicine. It doesn’t, and it is. Indeed, the study strongly suggests that any effect of acupuncture observed is almost certainly due to nonspecific and placebo effects and that the “positive” result is, as Ernst describes, likely due to small residual biases. Even if we concede that there might be a small effect of “true” acupuncture, as reported by Vickers et al, it is almost certainly a finding that is statistically significant but clinically insignificant or, as I like to put it because I like baseball analogies, a really long run for a really short slide.