A new peer-reviewed study testing the coverage, accuracy and safety of eight online symptom assessment apps has found that the performance of apps varies widely, with only a handful performing close to the levels of human general practitioners (GPs).
Published today in BMJ Open, the study is the first of its kind to be published since 2015 and was conducted by a team of doctors and scientists led by global digital health company Ada Health.
Eight symptom assessment apps were tested: Ada, Babylon, Buoy, K Health, Mediktor, Symptomate, WebMD, and Your.MD.
WHY IT MATTERS
Coverage is an important measure for digital health tools that might be deployed at scale, since a tool with low coverage may, for example, exclude users who are too young, too old, pregnant, or living with a pre-existing mental health condition.
The study looked at how comprehensively the apps covered possible conditions and user types. The most comprehensive app was Ada, which provided a condition suggestion 99% of the time. The other apps tested provided a suggestion 69.5% of the time on average, with the lowest scoring just 51.5%. Human GPs provided 100% coverage.
The study also found that the apps’ clinical accuracy was highly variable. Ada was rated as the most accurate, suggesting the right condition in its top three suggestions 71% of the time. The average across all the other apps was just 38%, with scores ranging from 23.5% to 43%.
Apart from Ada, the apps did not correctly identify the possible conditions in the majority of cases. Human GPs were the most accurate, with 82% accuracy.
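To make the two headline measures concrete, the following is a minimal sketch of how coverage and top-three accuracy could be calculated from a set of test cases. The case data, the function names and the choice to count cases with no suggestion as misses are all illustrative assumptions, not details taken from the study.

```python
from typing import List, Sequence, Tuple

# Each case pairs the correct condition with the app's ranked suggestions.
# An empty list means the app gave no suggestion for that case.
Case = Tuple[str, Sequence[str]]

# Hypothetical cases for illustration only; not data from the study.
CASES: List[Case] = [
    ("migraine", ["tension headache", "migraine", "sinusitis"]),
    ("appendicitis", ["gastroenteritis", "constipation", "IBS", "appendicitis"]),
    ("asthma", []),
    ("otitis media", ["otitis media"]),
]


def coverage(cases: Sequence[Case]) -> float:
    """Share of cases for which at least one condition was suggested."""
    covered = sum(1 for _, suggestions in cases if suggestions)
    return covered / len(cases)


def top_k_accuracy(cases: Sequence[Case], k: int = 3) -> float:
    """Share of all cases where the correct condition is in the first k suggestions.

    Assumption: cases with no suggestion count as misses.
    """
    hits = sum(1 for truth, suggestions in cases if truth in suggestions[:k])
    return hits / len(cases)
```

On these four invented cases, `coverage(CASES)` is 0.75 and `top_k_accuracy(CASES, k=3)` is 0.5. The second case shows why the cut-off matters: the correct condition is suggested, but only in fourth place.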
The study also assessed the safety of the apps’ advice by examining whether the guidance they provided, such as staying at home to manage symptoms or going to see a doctor, was considered to have the appropriate level of urgency.
While most apps gave safe advice in the majority of cases, only three apps performed close to the level of human GPs: Ada, Babylon, and Symptomate. Although all the apps assessed scored above 80% on safety, compared to 97% for human GPs, even a small disparity in the safety of advice could have a major impact on patient outcomes if the apps were deployed at scale.
THE LARGER CONTEXT
At the beginning of the year, Babylon unveiled a ten-year partnership with the Royal Wolverhampton NHS Trust (RWT) to launch an integrated digital health system. In June, Babylon admitted that a data breach had allowed a patient to access recordings of another patient's consultations via the GP at Hand app.
In related news, Google is rolling out a new research app that shows participants how data is driving health insights, in a bid to boost clinical research participation and engagement.
ON THE RECORD
Dr Claire Novorol, co-founder and chief medical officer, Ada Health: “Symptom assessment apps have seen rapid uptake by users in recent years as they are easy to use, convenient and can provide invaluable guidance and peace of mind. When used in a clinical setting to support - rather than replace - doctors, they also have huge potential to reduce the burden on strained healthcare systems and improve outcomes.
“This peer-reviewed study provides important new insights into the development and performance of these tools. In particular, it shows that there is still much work to be done to make sure that these technologies are being built to be inclusive and to cover all patients. We believe this is vital if symptom assessment apps are to fulfil their potential: human doctors don’t have the luxury of cherry-picking which patients they help and digital health must be held to the same standard.”