Google researchers use deep learning to detect diabetic retinopathy with upwards of 90 percent accuracy

By Jonah Comstock

A team of Google researchers has published a paper in the Journal of the American Medical Association showing that Google's deep learning algorithm, trained on a large data set of fundus images, can detect diabetic retinopathy with better than 90 percent sensitivity and specificity.

"These results demonstrate that deep neural networks can be trained, using large data sets and without having to specify lesion-based features, to identify diabetic retinopathy or diabetic macular edema in retinal fundus images with high sensitivity and high specificity," researchers write in the paper. "This automated system for the detection of diabetic retinopathy offers several advantages, including consistency of interpretation (because a machine will make the same prediction on a specific image every time), high sensitivity and specificity, and near instantaneous reporting of results. In addition, because an algorithm can have multiple operating points, its sensitivity and specificity can be tuned to match requirements for specific clinical settings, such as high sensitivity for a screening setting."
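A minimal Python sketch can show what tuning an operating point means. It assumes the model outputs one score per image and that an image is called referable when its score meets a chosen threshold. The function names and the toy scores and labels below are hypothetical, made up for illustration. They are not Google's code or data from the study.

```python
from typing import List, Optional, Tuple


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as referable.

    labels: 1 = referable disease present (per graders), 0 = absent.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if label == 1:
            if predicted:
                tp += 1
            else:
                fn += 1
        else:
            if predicted:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def pick_operating_point(
    scores: List[float], labels: List[int], min_sensitivity: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for the highest threshold
    that still meets min_sensitivity.

    Raising the threshold can only lower sensitivity and raise specificity,
    so the highest qualifying threshold gives the best specificity subject
    to the sensitivity target (e.g. a screening setting).
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        if sens >= min_sensitivity:
            best = (t, sens, spec)
    return best


# Hypothetical toy data: model scores and grader labels for eight images.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(sensitivity_specificity(scores, labels, 0.5))  # (0.75, 0.75)
print(pick_operating_point(scores, labels, 1.0))     # (0.3, 1.0, 0.5)
print(sensitivity_specificity(scores, labels, 0.8))  # (0.5, 1.0)
```

The same model yields different trade-offs depending only on the threshold. On this toy data, a screening-oriented threshold of 0.3 catches every diseased image at the cost of more false alarms, while a threshold of 0.8 produces no false alarms but misses half the diseased images.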

While other forms of machine learning have been used to diagnose diabetic retinopathy in the past, deep learning is a purer form of artificial intelligence in that it isn't given any guidance to look for particular features, such as specific lesions. Instead, it learns on its own from nothing but the images and the grades assigned to them. In this case, the training data set consisted of 128,175 images, each graded three to seven times by licensed ophthalmologists.

The algorithm was then tested on 9,963 deidentified images retrospectively obtained from EyePACS in the United States and three eye hospitals in India. A second, publicly available research data set of 1,748 images was also used. Accuracy was determined by comparing the algorithm's diagnoses with those made by a panel of at least seven US board-certified ophthalmologists. At an operating point chosen for high sensitivity, the algorithm achieved 97.5 percent sensitivity and 93.4 percent specificity on the first data set, and 96.1 percent sensitivity and 93.9 percent specificity on the second.

"Automated grading of diabetic retinopathy has potential benefits such as increasing efficiency, reproducibility, and coverage of screening programs; reducing barriers to access; and improving patient outcomes by providing early detection and treatment," researchers wrote. "To maximize the clinical utility of automated grading, an algorithm to detect referable diabetic retinopathy is needed."

In both the JAMA paper and an explanatory blog post published the same day, researchers noted that there is more work to do.

"Interpretation of a 2D fundus photograph, which we demonstrate in this paper, is only one part in a multi-step process that leads to a diagnosis for diabetic eye disease," researcher Lily Peng wrote on the blog. "In some cases, doctors use a 3D imaging technology, Optical Coherence Tomography (OCT), to examine various layers of a retina in detail. Applying machine learning to this 3D imaging modality is already underway, led by our colleagues at DeepMind. In the future, these two complementary methods might be used together to assist doctors in the diagnosis of a wide spectrum of eye diseases."

And, of course, there are limitations to the accuracy of the algorithm. Since it's unclear, even to its creators, exactly how a deep learning algorithm arrives at its predictions, the neural net could be relying on some feature of the photographs in the data set that isn't transferable to all retinal photography.

"Because the network 'learned' the features that were most predictive for the referability implicitly, it is possible that the algorithm is using features previously unknown to or ignored by humans," researchers wrote. "Although this study used images from a variety of clinical settings (hundreds of clinical sites: three in India, hundreds in the United States, and three in France) with a range of camera types to mitigate the risk that the algorithm is using anomalies in data acquisition to make predictions, the exact features being used are still unknown. Understanding what a deep neural net uses to make predictions is a very active area of research within the larger machine learning community."

Finally, the algorithm is very good at what it was trained to do, but that's a long way from standing in as a replacement for a specialist.

"The algorithm has been trained to identify only diabetic retinopathy and diabetic macular edema," researchers wrote. "It may miss nondiabetic retinopathy lesions that it was not trained to identify. Hence, this algorithm is not a replacement for a comprehensive eye examination, which has many components, such as visual acuity, refraction, slitlamp examination, and eye pressure measurements."