By Hana Kiros, MIT Technology Review
Radiologists assisted by an AI screen for breast cancer more successfully than they do when they work alone, according to new research. That same AI also produces more accurate results in the hands of a radiologist than it does when operating solo.
The large-scale study, published this month in The Lancet Digital Health, is the first to directly compare an AI’s performance in breast cancer screening according to whether it’s used alone or to assist a human expert. The hope is that such AI systems could save lives by detecting cancers doctors miss, free up radiologists to see more patients, and ease the burden in places where there is a dire lack of specialists.
The software being tested comes from Vara, a startup based in Germany that also led the study. The company’s AI is already used in over a fourth of Germany’s breast cancer screening centers and was introduced earlier this year to a hospital in Mexico and another in Greece.
The Vara team, with help from radiologists at the Essen University Hospital in Germany and the Memorial Sloan Kettering Cancer Center in New York, tested two approaches. In the first, the AI works alone to analyze mammograms. In the other, the AI automatically distinguishes between scans it thinks look normal and those that raise a concern. It refers the latter to a radiologist, who would review them before seeing the AI’s assessment. Then the AI would issue a warning if it detected cancer when the doctor did not.
To train the neural network, Vara fed the AI data from over 367,000 mammograms—including radiologists’ notes, original assessments, and information on whether the patient ultimately had cancer—to learn how to place these scans into one of three buckets: “confident normal,” “not confident” (in which no prediction is given), and “confident cancer.” The conclusions from both approaches were then compared with the decisions real radiologists originally made on 82,851 mammograms sourced from screening centers that didn’t contribute scans used to train the AI.
The second approach—doctor and AI working together—was 3.6% better at detecting breast cancer than a doctor working alone, and raised fewer false alarms. It accomplished this while automatically setting aside scans it classified as confidently normal, which amounted to 63% of all mammograms. This intense streamlining could slash radiologists’ workloads.
After breast cancer screenings, patients with a normal scan are sent on their way, while an abnormal or unclear scan triggers follow-up testing. But radiologists examining mammograms miss 1 in 8 cancers. Fatigue, overwork, and even the time of day all affect how well radiologists can identify tumors as they view thousands of scans. Signs that are visually subtle are also generally less likely to set off alarms, and dense breast tissue—found mostly in younger patients—makes signs of cancer harder to see.
Radiologists using the AI in the real world are required by German law to look at every mammogram, at least glancing at those the AI calls fine. The AI still lends them a hand by pre-filling reports on scans labeled normal, though the radiologist can always reject the AI’s call.
Thilo Töllner, a radiologist who heads a German breast cancer screening center, has used the program for two years. He’s sometimes disagreed when the AI classified scans as confident normal and manually filled out reports to reflect a different conclusion, but he says “normals are almost always normal.” Mostly, “you just have to press enter.”
Mammograms the AI has labeled as ambiguous or “confident cancer” are referred to a radiologist—but only after the doctor has offered an initial, independent assessment.
Radiologists classify mammograms on a 0 to 6 scale known as BI-RADS, where lower is better. A score of 3 indicates that something is probably benign, but worth checking up on. If Vara has assigned a BI-RADS score of 3 or higher to a mammogram the radiologist labels normal, a warning appears.
AI generally excels at image classification. So why did Vara’s AI on its own underperform a lone doctor? Part of the problem is that a mammogram alone can’t determine whether someone has cancer—that requires removing and testing the abnormal-looking tissue. Instead, the AI examines mammograms for hints.
Christian Leibig, lead author on the study and director of machine learning at Vara, says that mammograms of healthy and cancerous breasts can look very similar, and both types of scans can present a wide range of visual results. This complicates AI training. So does the low prevalence of cancer in breast screenings (according to Leibig, “in Germany, it’s roughly six in 1,000”). Because AIs trained to catch cancer are mostly trained on healthy breast scans, they can be prone to false positives.
The study tested the AI only on past mammogram decisions and assumed that radiologists would agree with the AI each time it issued a decision of “confident normal” or “confident cancer.” When the AI was unsure, the study defaulted to the original radiologist’s reading. That means it couldn’t test how using AI affects radiologists’ decisions—and whether any such changes may create new risks. Töllner admits he spends less time scrutinizing scans Vara labels normal than those it deems suspicious. “You get quicker with the normals because you get confident with the system,” he says.
Click here to read the full article on MIT Technology Review.