According to a new study published this morning in Nature Medicine, an algorithm built on a deep neural network performed on par with board-certified cardiologists at annotating 12 different types of heart rhythm.
Researchers from Stanford University and iRhythm collaborated on the study, which describes an algorithm trained on 91,232 30-second single-lead ECG recordings from 53,877 patients, captured with iRhythm’s Zio monitoring patch, the company’s signature product.
“Finding arrhythmias and this task of annotating and accurately locating arrhythmias is often kind of a needle in a haystack problem,” Mark Day, iRhythm’s VP of research and development, told MobiHealthNews. “When you consider the service that we provide, we record up to 14 days of data. And in that time even a healthy patient would normally have 1.5 million heartbeats. … What this algorithm means is we have the opportunity to provide cardiologist-level annotation of all this data and finding those small periods that are extremely meaningful from a treatment perspective.”
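Day’s 1.5 million figure holds up as a rough estimate. Here is a back-of-the-envelope sketch, assuming an average heart rate of 75 beats per minute (our assumption, not a number from iRhythm; real rates vary through the day and between patients):

```python
# Back-of-the-envelope check of the "1.5 million heartbeats" figure.
# ASSUMPTION: an average heart rate of 75 beats per minute over the whole recording.
DAYS_RECORDED = 14
AVG_BEATS_PER_MINUTE = 75


def heartbeats_recorded(days: int, bpm: float) -> float:
    """Approximate number of heartbeats captured over `days` of continuous monitoring."""
    minutes = days * 24 * 60
    return minutes * bpm


total = heartbeats_recorded(DAYS_RECORDED, AVG_BEATS_PER_MINUTE)
print(f"{total:,.0f} beats")  # 1,512,000 beats
```

At 60 beats per minute the same calculation gives about 1.2 million, so the order of magnitude holds across ordinary resting rates.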
Just because the algorithm performed on par with cardiologists doesn’t mean that it’s designed or destined to replace them, Day noted.
“This is by no means to say that the cardiologist’s role is not critical,” he said. “Their role is much more valuable focused on developing patient care plans and really supporting the treatment plan that comes from identifying and diagnosing the arrhythmia.”
A test dataset of 328 ECG snippets from different patients was analyzed by the algorithm, by a consensus committee of cardiologists, and by individual cardiologists, with the consensus committee’s annotations serving as the gold standard.
“The average F score, which is the harmonic mean of the positive predictive value and sensitivity, for the [deep neural network] (0.837) exceeded that of average cardiologists (0.780),” researchers wrote. “With specificity fixed at the average specificity achieved by cardiologists, the sensitivity of the DNN exceeded the average cardiologist sensitivity for all rhythm classes.”
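The two metrics in that passage can be made concrete. The sketch below is a generic illustration, not the study’s code: `f1_score` computes the F score for one rhythm class from hypothetical true-positive, false-positive and false-negative counts, and `sensitivity_at_specificity` shows one common way to read off sensitivity at a fixed specificity, assuming the model outputs a score per record for that class (treated one-vs-rest). Function names, counts and scores are all our own.

```python
# Generic sketch of the two metrics quoted above; NOT the study's code.
# All counts and scores below are made-up illustration values.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if tp == 0:
        return 0.0  # no true positives: PPV and sensitivity are zero or undefined
    ppv = tp / (tp + fp)          # positive predictive value
    sensitivity = tp / (tp + fn)  # true positive rate
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def sensitivity_at_specificity(pos_scores, neg_scores, target_spec):
    """Sensitivity at the lowest threshold whose specificity meets target_spec.

    A record is called positive when its score >= threshold. Raising the
    threshold can only raise specificity and lower sensitivity, so the lowest
    threshold that reaches the target keeps the most sensitivity.
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores)) + [float("inf")]
    for t in thresholds:
        spec = sum(s < t for s in neg_scores) / len(neg_scores)
        if spec >= target_spec:
            return sum(s >= t for s in pos_scores) / len(pos_scores)
    return 0.0  # only reached if target_spec > 1


print(round(f1_score(tp=8, fp=2, fn=4), 3))  # 0.727
print(sensitivity_at_specificity([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1], 0.75))  # 1.0
```

A model that outputs scores can be moved along this threshold sweep, while each cardiologist’s labels give a single fixed operating point; pinning the model to the cardiologists’ average specificity lets the two sensitivities be compared on equal terms.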
How it was done
Stanford and iRhythm recruited nine board-certified cardiologists, eight of whom subspecialized in arrhythmias, and split them into three teams of three; each team reviewed about 110 of the test ECGs.
“And they had to sit down and look at them and argue as a group and agree on a designation for the annotations of that 30 seconds,” Day said. “Those annotations were sort of treated as the gold standard. The other six cardiologists who weren’t participating in that group were then asked to review those same records, those same test sets and provide their individual annotations. So in essence we were comparing individual cardiologists, six of them, to a consensus average of three cardiologists. And then we obviously compared the algorithm’s output to the average output of the six cardiologists.”
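The comparison Day describes can be sketched in a few lines. Everything here is hypothetical: the labels are invented, two individual cardiologists stand in for the six, only three of the 12 classes appear, and the score is an unweighted mean of per-class F1 (the study’s exact averaging may differ). The consensus labels act as ground truth for the model and for each individual reader, and the readers’ scores are then averaged.

```python
from statistics import mean


def per_class_f1(gold, predicted, label):
    """F1 for one rhythm class, treating that class as 'positive' (one-vs-rest)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    # 2TP / (2TP + FP + FN) equals the harmonic mean of PPV and sensitivity.
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def average_f1(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in the gold labels."""
    return mean(per_class_f1(gold, predicted, lab) for lab in sorted(set(gold)))


# Hypothetical annotations for six 30-second records.
consensus = ["sinus", "AF", "sinus", "artifact", "AF", "sinus"]  # gold standard
model = ["sinus", "AF", "sinus", "artifact", "sinus", "sinus"]
readers = [
    ["sinus", "AF", "AF", "artifact", "AF", "sinus"],           # cardiologist 1
    ["sinus", "sinus", "sinus", "sinus", "AF", "sinus"],        # cardiologist 2
]

model_score = average_f1(consensus, model)
reader_avg = mean(average_f1(consensus, r) for r in readers)
print(f"model {model_score:.3f}, average cardiologist {reader_avg:.3f}")
# model 0.841, average cardiologist 0.669
```

Scoring every reader against the same consensus, rather than against each other, is what makes the individual scores directly comparable with the model’s.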
Cardiologists and the algorithm looked for 10 of the most common arrhythmias, plus sinus rhythm (a normal ECG) and artifact (noise that obscures the signal, such as from patient movement or poor electrode contact), for a total of 12 possible classifications.
What’s the history
This work builds on a paper the same team posted to arXiv in mid-2017.
As we noted at the time, companies like AliveCor and Apple have done a good deal of work on using machine learning to detect arrhythmias from ECG data. Apple has even, famously, built this feature into the latest version of watchOS. But those efforts have mostly focused on a single arrhythmia class: atrial fibrillation, or AF.
“There’s certainly other work and we certainly respect the predecessors in this space that many entities have been working on algorithms that distinguish between AF and non-AF. That’s been the typical approach,” Day said. “And that’s meaningful. Certainly it’s a major problem. But what really needs to happen is to develop an algorithm more like what a cardiologist provides, which is much more nuanced. It’s not just AF or not AF. And in that sense, the deep neural network can annotate 12 different types of rhythms, 10 different arrhythmias as well as sinus and artifact. AF is certainly one of them, but it’s a much bigger problem to differentiate between 10 different things as opposed to two different things.”
On the record
“What this means is certainly something that will play out over time,” Day said. “This is a study that was looking at 30-second ECG strips to allow that clinical validation that they performed, but if proven out and scaled into clinical use, the impact is one of really allowing better clinical outcomes, not just for physicians in terms of being able to make more efficient use of their time, but also for patients in terms of having better and faster and more effective clinical treatments.”