PRELUDE: THE PROBLEM WITH SCREENING
If there’s one aspect of science-based medicine (SBM) that makes it hard, particularly for practitioners, it’s SBM’s continual requirement that we adjust what we do based on new information from science and clinical trials. It’s not easy for patients, either. To lay people, SBM’s greatest strength, its continual improvement and evolution as new evidence becomes available, can appear to be inconsistency, and that seeming inconsistency is all too often an opening for quackery. Even when there isn’t an opening for quackery, it can cause a lot of confusion, and physicians themselves are often resistant to changing their practice. It’s not for nothing that there’s an old joke in medical circles that no outdated medical practice completely dies until a new generation of physicians comes up through the ranks and the older physicians who believe in the practice either retire or die. There’s some truth in that. As I’ve said before, SBM is messy. In particular, the process of applying new science as the data become available to a problem that’s already as complicated as screening asymptomatic people for a disease in order to intervene earlier and, hopefully, save lives can be fraught with confusion and difficulties.
Certainly one of the most contentious issues in medicine over the last few years has been the issue of screening for various cancers. The main cancers that we most commonly subject populations to routine mass screening for include prostate, colon, cervical, and breast cancer. Because I’m a breast cancer surgeon, I most frequently have to deal with breast cancer screening, which means, in essence, screening with mammography. The reason is that mammography is inexpensive, well-tested, and, in general, very effective.
Or so we thought. Last week, yet another piece of evidence to muddle the picture was published in the New England Journal of Medicine (NEJM) and hit the news media in outlets such as the New York Times.
Before I discuss the study, let’s look at the background. As I’ve written about over the last couple of years, evidence has been accumulating that is muddying the picture regarding the benefits of screening mammography. So let’s be clear on what we are discussing here: screening mammography is different from diagnostic mammography in that it is performed at regular intervals in asymptomatic women in order to detect cancer at an earlier stage and thereby allow earlier intervention, resulting in the saving of more lives than if we waited until breast cancer produces symptoms (such as a lump) that lead to diagnosis. If a woman feels a lump or some change in her breast and undergoes mammography, that is not screening. In that case, mammography is being done for diagnostic purposes. We are not discussing diagnostic mammography. We are discussing screening mammography. I can’t emphasize that distinction enough.
What we’re discovering is not that screening mammography is ineffective, but rather that it is probably not as effective as advertised in preventing death from breast cancer, which, let’s face it, is the primary reason we subject women over age 40 to mammographic screening. The reason is that phenomena such as overdiagnosis and overtreatment, coupled with lead time and length time bias, conspire to confound what conceptually is very simple but in practice is very complex indeed: catching breast cancer at an earlier stage and thereby saving lives. Overdiagnosis, for instance, is the diagnosis of cancer that, for whatever biological reason, would never threaten the life of the patient because it either progresses so slowly that the patient dies of natural causes before it ever reaches the point of endangering the patient’s life, never progresses at all, or possibly even spontaneously regresses. Because we do not yet have reliable methods to distinguish indolent tumors from those that will grow and metastasize, we as cancer doctors have the moral obligation to treat all tumors discovered by screening as though they could potentially kill the patient. These treatments are often not benign, and can include surgery, radiation, and even chemotherapy. Overdiagnosis leads to overtreatment, and overtreatment is not a benign thing either. Unfortunately, until recently we haven’t always taken into account the potential harm from overtreatment, the biology of the various cancers, or the very nature of screening itself, which preferentially detects more indolent disease.
As a result of the accumulation of evidence suggesting less benefit from mass screening programs than we had hoped and more potential harm than we had feared, the oncology world has been rethinking screening for cancer, in particular for prostate and breast cancer. Clearly, the most problematic cancer to screen for has been prostate cancer, because the common blood test used to detect it, prostate-specific antigen (PSA), is fraught with false positives, leading to morbid surgery (prostatectomy) or somewhat less morbid radiation therapy to treat early lesions that probably would never develop into life-threatening cancer. After all, autopsy series have shown that approximately 75% of men over the age of 80 have small foci of cancer in their prostate glands, but nowhere near 75% of men die of prostate cancer. In other words, more men die with prostate cancer than from it, and most are asymptomatic. That is why the American Cancer Society no longer recommends routine PSA screening.
Screening for breast cancer is less problematic, because mammography has a lower incidence of false positives, but, as we have been discovering, it’s still problematic. As many as one in three breast cancers may be overdiagnosed by mammography; as many as one in five mammographically-detected breast cancers in asymptomatic women might spontaneously regress. As a result of accumulating evidence, last fall the United States Preventive Services Task Force (USPSTF) revised its recommendations for screening mammography to recommend that it begin at age 50 rather than age 40. The resulting kerfuffle led to emergency meetings at various cancer centers regarding how to reassure women. From my perspective, it was depressing how much we seemed to concentrate on “damage” control and protecting the current recommendations, rather than explaining the new recommendations.
Enter this new Norwegian study, hot off the presses.
DECREASED BREAST CANCER MORTALITY: DUE TO SCREENING OR BETTER TREATMENT?
I first became aware of the new study from a Google News Alert that led me to a NYT story by Gina Kolata:
A new study suggests that increased awareness and improved treatments rather than mammograms are the main force in reducing the breast cancer death rate.
Starting in their 40s or 50s, most women in this country faithfully get a mammogram every year, as recommended by health officials. But the study suggests that the decision about whether to have the screening test may now be a close call.
The study, medical experts say, is the first to assess the benefit of mammography in the context of the modern era of breast cancer treatment. While it is unlikely to settle the debate over mammograms — and experts continue to disagree about the value of the test — it indicates that improved treatments with hormonal therapy and other targeted drugs may have, in a way, washed out most of mammography’s benefits by making it less important to find cancers when they are too small to feel.
Previous studies of mammograms, done decades ago, found they reduced the breast cancer death rate by 15 to 25 percent, a meaningful amount. But that was when treatment was much less effective.
It should come as no surprise to regular readers that the commonly accepted estimate of how much mammographic screening of a population reduces the death rate from breast cancer is around 20-25%. These numbers are based on randomized clinical trials, most of which were carried out more than 20 years ago. Based on these trials, many countries instituted mass screening programs using mammography. Because it would violate clinical equipoise, given that it is generally accepted that screening mammography decreases deaths from breast cancer, there will likely never be another randomized, controlled clinical trial of mammography, even though technology has progressed and we have better treatments that may affect the outcome. That leaves observational studies, a less rigorous form of evidence, to investigate the issue. However, it is still quite possible to obtain useful data from such studies, and that is what this Norwegian group did when they examined the effect on breast cancer mortality of introducing mammographic screening programs. The results provide an unexpected, even startling, answer.
This study, Kalager et al, was performed in a very clever manner. To understand how it was carried out, it’s necessary to know a bit about Norway first. Norway is a nation of 4.8 million people that, because it has a centralized public health care system, has records far more complete and centralized than anything we have in the United States. In brief, this is a study that could never have been done in the US:
Norway, with a total population of 4.8 million, has a public health care system. Patients generally receive treatment in their county of residence, and there is no private primary care for breast cancer.8 The nationwide Cancer Registry of Norway is close to 100% complete.9,10 Patients are identified in the registry by their individually unique national registration number, which includes the date of birth. The registry runs the Breast Cancer Screening Program, which began as a pilot project in 4 of the 19 Norwegian counties in 1996. Two years later, the government decided to expand the program, and over a period of 9 years, the remaining 15 counties were enrolled in a staggered fashion.11 The rollout of the program followed no specific geographic pattern. Since 2005, all women in the country between the ages of 50 and 69 years have been invited to participate in screening mammography every 2 years.
Before enrollment in the program, each county was required to establish multidisciplinary breast-cancer management teams and breast units.12 As a result, breast-cancer management became centralized for all residents within each county, and dedicated teams of radiologists, radiologic technologists, pathologists, surgeons, oncologists, and nurses managed the care of all patients, regardless of age.
The study followed, in essence, a staggered cohort design. The authors compared rates of breast cancer deaths based on incidence in four groups: one group of women who during the years of the rollout of the mammography screening program (1996 to 2005) were living in counties with screening (the screening group); one group of women who were living in counties without screening during that time (the nonscreening group); and two groups of historical controls who from 1986 to 1995 mirrored the screening and nonscreening groups. They then analyzed data from 40,057 women with breast cancer during that period of time. In order to try to isolate the effect of the breast cancer screening program, they calculated mortality in the screening group including only deaths from breast cancer in women who had received the diagnosis after the screening program was implemented. This avoids counting deaths that occurred after the screening program was implemented but resulted from cancers diagnosed before it began.
In addition, the authors divided up Norway’s 19 counties into six regions, chosen for having entered the screening program at approximately the same time. Death rates were then compared separately for each region, allowing similar followup times for each region. In addition, this strategy allowed for the comparison of trends in mortality from breast cancer over time. They then performed these analyses:
First, we compared women in the nonscreening group with their historical counterparts to determine the temporal change in mortality that was not attributable to the introduction of the screening program and that was likely to reflect improved treatment and earlier clinical diagnosis. Then, we compared women in the screening group with their historical counterparts to determine the change in mortality after implementation of the screening program. In this second comparison, the difference in the rate of death between the two groups can be attributed both to the screening program and to temporal trends in mortality that were unrelated to the screening program. Thus, the reduction in mortality that was related to the screening program was the difference between the rate ratio for death among women in the screening group as compared with their historical counterparts and the rate ratio for death among women in the nonscreening group as compared with their historical counterparts.
To boil it down, using this method, the authors could come up with two numbers for improvement in breast cancer mortality over time: an improvement not attributable to screening, which is therefore attributable to better treatments, and an improvement that is attributable both to screening and better treatments. The result is a graph:
As can be seen, the estimate for how much of the improvement in breast cancer mortality between the two time periods in Norway is attributable to screening mammography is approximately one third of the overall improvement (10% out of an overall improvement of 28%). Moreover, statistically, there is enough uncertainty in this estimate that it could be as little as 2% of the decrease in breast cancer mortality that is due to mammographic screening. Most of this improvement in survival was observed in women with stage II tumors:
Among women between the ages of 50 and 69 years in the screening group, those with stage I tumors had a relative reduction in mortality of 16%, as compared with their historical counterparts (rate ratio, 0.84; 95% CI, 0.63 to 1.11); among women in the nonscreening group, the corresponding reduction was 13% (rate ratio, 0.87; 95% CI, 0.62 to 1.23). Among women with stage II tumors, those in the screening group had a marked 29% reduction in mortality, as compared with their historical counterparts (rate ratio, 0.71; 95% CI, 0.58 to 0.86); among women in the nonscreening group, the reduction was 7% (rate ratio, 0.93; 95% CI, 0.76 to 1.12).
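The subtraction the authors describe can be made concrete with the stage-specific rate ratios just quoted. What follows is only a minimal sketch, not the authors' code: the function name and dictionary layout are my own assumptions, and it uses the point estimates alone, ignoring the confidence intervals.

```python
# Point-estimate rate ratios (study period vs. historical counterparts)
# quoted from Kalager et al for women aged 50 to 69.
# The dictionary layout and function name are illustrative assumptions.
RATE_RATIOS = {
    "stage I": {"screening": 0.84, "nonscreening": 0.87},
    "stage II": {"screening": 0.71, "nonscreening": 0.93},
}


def reduction_attributable_to_screening(rr_screening, rr_nonscreening):
    """Difference between the two rate ratios, in percentage points.

    The nonscreening rate ratio captures the temporal trend (better
    treatment, earlier clinical diagnosis); any extra reduction seen in
    the screening group is credited to the screening program.
    """
    return round((rr_nonscreening - rr_screening) * 100, 1)


for stage, rr in RATE_RATIOS.items():
    points = reduction_attributable_to_screening(
        rr["screening"], rr["nonscreening"]
    )
    print(f"{stage}: {points} percentage points attributable to screening")
```

On these point estimates, screening accounts for about 3 percentage points of the reduction for stage I and about 22 for stage II. The confidence intervals quoted above are wide, so the stage I figure in particular is very uncertain.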
Among women with more advanced tumors (stage III and stage IV), there was no effect attributable to screening. Although there was still a 30% reduction in mortality in this group, none of it could be attributed to screening. Intuitively, this makes sense; stage III and stage IV tumors are generally detected clinically because of symptoms, not in a screening program. It also intuitively makes sense that the death rate from stage I tumors would be less affected than that from stage II tumors because (1) stage I tumors would be much more likely to be overdiagnosed by screening; (2) the followup time in this study is relatively short (8.9 years for the group with the longest followup), which may not be sufficient time for the full benefit of screening to show for earlier stage tumors; and (3) stage II tumors can still be detected mammographically in asymptomatic women but, because they are more advanced, have already shown themselves to be potentially deadly. It should also be conceded that, because of improvements in imaging and in detection of lymph node metastases (such as sentinel lymph node biopsy), this result might also be partially due to selective stage migration among those who undergo screening. If that were the case, part of the improvement in survival among those with stage II disease could be apparent rather than real.
Being retrospective, this study is, of course, not bullet-proof. As the authors themselves concede, the maximum followup time of only 8.9 years may be too short for a full benefit to have been seen. In addition, because the screening program was rolled out gradually in the counties, diagnoses were made more recently in the screening group, which may actually overestimate the benefit associated with the screening program. Finally, because the multidisciplinary breast cancer teams were established before the screening program was rolled out, it’s possible that some of the women in the nonscreening group underwent mammography through these teams, thus “contaminating” the nonscreened group and lowering the apparent benefit of screening. The authors offer several reasons why they think this latter problem is unlikely to have been significant, including the limited access to mammography before screening programs were implemented, the absence of any financial incentive for providers to order mammography, and the fact that, as expected, the implementation of the screening program in the various counties resulted in a substantial increase in the number of diagnoses of breast cancer, with no similar trend in counties that had not yet joined the screening program.
Overall, because most of the more recent observational studies of mammographic screening use historical controls without an attempt to control for the confounding variable of temporal downward trends in breast cancer mortality, the authors conclude that the benefit of screening in terms of decreasing a woman’s odds of dying from breast cancer is smaller than previously estimated. In aggregate, they estimate that, in Norway at least, approximately one third of the decrease in breast cancer mortality is due to screening, and two thirds to other factors, which (we hope) include better treatment.
MAMMOGRAPHY: A LONG RUN FOR A SHORT SLIDE?
Not surprisingly, there was an accompanying editorial. I was, however, rather surprised at who was chosen to write it, namely Dr. Gilbert Welch, a long-time critic of screening mammography. Personally, I was expecting that any accompanying editorial would be written by an advocate of mammographic screening. The crux of his argument is in this paragraph:
The juxtaposition of such a charged medical debate in the face of such an exhaustive scientific investigation is in itself instructive. For context, one trial involving fewer than 150 men who were followed for less than 2 years was sufficient to convince physicians of the value of treating severe hypertension.1 That physicians are still debating the relative merits of screening mammography despite the wealth of data suggests that the test is surely a close call, a delicate balance between modest benefit and modest harm.
Dr. Welch is referring to a trial that found a marked difference in morbidity and mortality over a relatively short followup period in men with diastolic blood pressures ranging from 115 through 129 mm Hg. The results were so clear-cut that after that study no one could argue against treating someone with a diastolic blood pressure that high. Personally, I viewed this comparison as a bit of a cheap shot by Dr. Welch in that the study he chose looked at men with a severe health condition known to predispose to stroke, myocardial infarction, and other complications. He compared this clinical situation to the screening of an asymptomatic population, where it’s known going in that it will be harder to show benefits, requiring a lot of patients and a lot of followup. That aside, though, Dr. Welch is correct that more recent evidence suggests that the benefits of mammography are more modest than we have traditionally been taught.
That brings us to the question of why this study found results different from those cited by the USPSTF when making their recommendations. Dr. Welch offers two possibilities. First, this study could be wrong. That is, of course, always possible given that it is not a prospective randomized trial. However, situations change with time, and it is no longer possible to do a randomized trial to determine whether mammography saves lives. Clinical equipoise again. So all we are left with are studies like this Norwegian study, where investigators do the best they can to reduce confounders. Looking at the design of the study, I tend to agree with Dr. Welch that it is unlikely that these confounders would be enough to account for the difference between the much higher reductions in breast cancer mortality observed in early randomized studies (the ones cited by the USPSTF) and what Kalager et al found. Most likely, Kalager et al is correct, or at least not too far off. That leaves several questions.
First and foremost, is a 10% reduction in mortality from breast cancer adequate to justify mass screening programs? Whenever I am asked a question like this (for example, about Avastin), my tendency is to respond that this is not a scientific question. It is a moral question that asks us to make a value judgment as a society. Think of it this way. It is estimated that approximately 40,000 women will die of breast cancer in 2010. If Kalager et al is correct and its results are applicable to the U.S., that means that, without screening, approximately 44,000 women would die of breast cancer. What are 4,000 lives worth? I can’t answer that question. Dr. Welch frames it a bit differently, to make the benefit seem even more modest:
If we assume that mammography screening is associated with a 10% reduction in the rate of death from breast cancer (making the optimistic assumption that all the benefit comes from screening mammograms), the 10-year risk of breast-cancer death for a 50-year-old woman in the United States is now about 4 per 1000 women.6 If we assume that this risk already incorporates the benefit of screening mammography, the risk estimate without mammography would be about 4.4 per 1000 women.
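Both Dr. Welch's per-1000 figure and the 44,000 figure above come from the same step: taking the observed number as already reflecting a 10% reduction and backing out what it would have been without screening. A minimal sketch of that arithmetic follows; the function name is my own, and the 10% default is simply the Kalager et al estimate, not a settled value.

```python
def without_screening(observed_with_screening, relative_reduction=0.10):
    """Back out the counterfactual value, assuming
    observed = counterfactual * (1 - relative_reduction)."""
    return observed_with_screening / (1 - relative_reduction)


# 10-year risk of breast cancer death per 1000 fifty-year-old women
risk = without_screening(4.0)
# estimated U.S. breast cancer deaths in 2010
deaths = without_screening(40_000)

print(f"{risk:.2f} per 1000 women; {deaths:,.0f} deaths")
```

The exact results are slightly higher than the rounded figures of about 4.4 per 1000 and about 44,000 deaths used in the text, but the difference does not change the argument.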
Benefits that look very modest when looked at as a risk per 1000, however, can produce fairly large absolute numbers when applied to millions of women. I tend to view Kalager et al’s results as the lower bound of estimates for the benefits of screening mammography. It wouldn’t surprise me if the true benefits are higher. Remember those 4,000 women, who are all someone’s mothers, wives, and/or daughters. However, so are the women potentially harmed:
Because we are all subject to framing effects, it is important to consider the reverse frame. The number of women who will not die from breast cancer rises from 995.6 to 996 per 1000 women with the addition of screening mammography. Although readers may each respond differently to these frames, both reflect the same absolute benefit: 0.4 per 1000 women. In other words, 2500 women would need to be screened over a 10-year period for 1 to avoid death from breast cancer.
What happens to the other 2499 women who had to undergo screening to achieve this benefit is also relevant. Estimates of harm vary considerably. In the United States, more than 1000 women would be expected to have at least one false positive result,7 a number that would be considerably lower in Europe.8 Less frequent but more worrisome is the problem of overdiagnosis. Somewhere between 5 and 15 women would be expected to be needlessly treated for a condition that was never going to bother them, with all the accompanying harms.9,10
Another estimate that I discussed came from an article last year by Laura Esserman, who estimated that to avert one death from breast cancer with mammographic screening for women between the ages of 50 and 70, 838 women need to be screened over 6 years for a total of 5,866 screening visits, to detect 18 invasive cancers and 6 instances of DCIS. That’s roughly three times the benefit estimated by Kalager et al.
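The number needed to screen is just the reciprocal of the absolute risk reduction, which makes the two estimates easy to put side by side. This is a rough sketch with a function name of my own choosing; note that the two figures cover different time horizons (10 years versus 6 years), so the ratio is only a ballpark comparison.

```python
def number_needed_to_screen(risk_without_per_1000, risk_with_per_1000):
    """Women screened per breast cancer death averted: the reciprocal
    of the absolute risk reduction."""
    absolute_risk_reduction = (risk_without_per_1000 - risk_with_per_1000) / 1000
    return 1 / absolute_risk_reduction


# Dr. Welch's figures: 4.4 vs. 4.0 deaths per 1000 women over 10 years
nns_welch = number_needed_to_screen(4.4, 4.0)
# Esserman's estimate: 838 women screened over 6 years
nns_esserman = 838

print(round(nns_welch))                    # women screened per death averted
print(round(nns_welch / nns_esserman, 1))  # how many times larger than Esserman's
```

A smaller number needed to screen means a larger benefit, which is why 838 versus roughly 2500 works out to about a threefold difference.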
Whatever the true benefit of mammographic screening, how we balance the potential benefits of mammographic screening with the potential harms is more of a philosophical and moral question. Science-based medicine can inform us regarding the values of mammographic screening, but it can’t tell us what we value.
A MIDDLE GROUND?
Figuring out the benefits of screening for cancer, be it with mammography or by other means for other cancers, is by its very nature a difficult issue, full of scientific and ethical confounders. Unfortunately, for screening to be applied successfully to a mass population, this scientifically muddy issue is often sold uncritically and with far more confidence than the data support. Mammography is represented uncritically as pure good, and women are told that every woman should start having mammography religiously at age 40. That’s why, when the USPSTF recommendations were released last fall, stating that screening should begin at age 50 and be offered every two years instead of every year (which, by the way, is what Norway does), the reaction against them was so vociferous, particularly given that the recommendations were released right in the midst of the debate about President Obama’s health insurance reform bill. It was science-based medicine reexamining an existing accepted belief about breast cancer and running right into the perfect storm of resistance from advocacy groups, politics, and physician practice patterns. It turns out that all too often both patients and physicians want certainty, not the ever-evolving recommendations based on new science.
Moreover, there are a lot of issues at play here, as “Dr. Len” points out:
We have moved to a different age, but I have to admit that I have a bias that we run the risk of going back in part to that future. Today, with screening mammograms we find more cancers that are smaller than can be detected by physical examination and no amount of self-awareness or physician awareness is going to change that. Patients and physicians have not suddenly become more astute in their diagnostic skills. If anything, they have become less effective in that arena.
There is also downsides of our success:
- We do find breast cancers today through mammography that years ago would never have been discovered and may never have caused a woman any difficulty.
- We do treat more women than benefit from that treatment, but we don’t have tests that can tell us which breast cancers don’t require any treatment as compared to those which are lethal.
- We have studies that show us formal, structured breast self examinations don’t appear to save lives.
- We have ethnic groups in this country that have had significant reductions in breast cancer mortality and others that have not, in part because too many don’t have access to quality medical care.
In short, this is a complicated issue. I am not certain that all of the relevant facts and considerations applicable to the circumstances in the United States are necessarily reflected in this particular study. Yet I suspect the headlines will be that mammograms don’t work or that they make little difference.
We must find our way out of these dilemmas. Messaging to the public has become garbled and confusing. The sad fact is that if we make the wrong decisions about the value of screening mammography, it will be years before we find out if we were mistaken.
“Dr. Len” has a point. However, he, too, is a bit guilty of black-and-white thinking here. He seems to think that we either preach the benefits of mammography or not. There is, however, a third way, and this was described by Drs. Kerianne H. Quanstrum and Rodney A. Hayward at the University of Michigan in an NEJM editorial published two weeks before Kalager et al.
After describing the violent reaction to the USPSTF guidelines, Quanstrum and Hayward pointed out two very simple things. First:
In reality, this independent panel, the Preventive Services Task Force, simply recommended that routine screening mammography begin at the age of 50 years, whereas women between the ages of 40 and 49 years should make individual decisions with their doctors as to whether their preferences and risk factors indicate screening at an earlier age. The panel also recommended that screening mammograms be performed every other year, which they suggested would reduce the harms of mammography by nearly half while maintaining most of the benefits provided by annual imaging.5 In short, the panel concluded that we had previously overestimated the value of mammography: that mammography is good, but not that good; that it is necessary for many women, but not all; and that it should be performed at some frequency, but perhaps not every year, for every woman.
Behind the panel’s conclusions regarding mammography lurks an unwelcome reality that our profession has often failed to acknowledge. Every medical intervention — no matter how beneficial for some patients — will provide continuously diminishing returns as the threshold for intervention is lowered. Mammography is just one case in point. For women between the ages of 40 and 49 years, the false positive rate is quite high, and the expected benefits are quite low: more than 1900 women would need to be invited for screening mammography in order to prevent just one death from breast cancer during 11 years of follow-up, at the direct cost of more than 20,000 visits for breast imaging and approximately 2000 false positive mammograms. Conversely, for women between the ages of 60 and 69 years, fewer than 400 women would need to be invited for screening in order to prevent one breast-cancer death during 13 years of follow-up, while accruing approximately 5000 visits and 400 false positive mammograms.6 In short, as the risk of breast cancer increases, the benefits of mammography increase, whereas the relative harms become progressively less significant.
Which strikes me as exactly correct. If there’s one rule of screening for any disease, it’s that the more you screen the more disease you will find, but much of it will be subclinical and much of it would likely never have killed the patient. Another rule is the law of diminishing returns. This leads to an observation, namely that for any screening test, there will be a population for whom the benefits clearly outweigh the risks, one for whom the risks clearly outweigh the benefits, and one for whom the answer is not so clear. What we have now is a model in which a threshold is set and there is in essence a binary system: Screen or don’t screen. What Quanstrum and Hayward argue is that there should be three options, which they represent thusly:
And, it seems to me, that’s all that the USPSTF was suggesting: That for women without other risk factors for breast cancer between the ages of 50 and 70 the benefits of mammography clearly outweigh the risks. For women without other risk factors for breast cancer under age 40, the risks outweigh the benefits. For women between 40 and 50, the answer is unclear, and these women should discuss the issue with their doctors and come to a decision based on a frank discussion of the benefits and risks, acknowledging that screening may not be right for some women under 50, while for others it is. The problem is, binary decision-making is easier than more complicated models.
In the end, that’s probably what drives me the most batty about debates over screening for breast cancer. It’s not a black-and-white question, but advocates and physicians who become invested in the status quo sell it as such. Evidence and the science change, but policy recommendations become fossilized because certainty is perceived as being better than nuance. I will admit that three or four years ago I probably would have been one of the docs circling the wagons in the face of these new studies. No longer. I also detest the “other side,” who represent mammographic screening as useless because the benefits appear to be more modest than we originally believed. I believe that patients are far more capable than we give them credit for of understanding and acknowledging that there are gray areas in medicine. We should not be selling certainty when there is none.