Is peer-reviewed science too slow to track wearable accuracy?

By Jonah Comstock

Fitness wearables are big these days. Going on a tenth of the population has them, their visibility is on the rise through TV commercials, and even the President is talking about getting one. And, of course, the upcoming Apple Watch will feature fitness tracking functionality. So it's no surprise that headlines that take the wind out of wearables are, as Wired's Brent Rose puts it in a new piece, "catnip for journalists."

Wired penned a response to a widely reported February study from the University of Pennsylvania, which suggested that wearables are less accurate than the activity trackers built into phones. We wrote our own response asking whether scientific accuracy was really the most important thing about wearables, as opposed to the potential for engagement. For its part, Wired didn't just dispute the science in the study; it shot back with a less rigorous, but possibly better designed, study of its own and found the opposite results.

Wired's number one complaint about the study was that it used devices that were several years out of date, so they repeated the experiment with newer trackers. Second, the original researchers tracked steps on a treadmill, which is questionable as a proxy for real-world walking, so Wired tracked steps around a baseball field. Making these two changes, Rose found that the wearable devices were more accurate than the phones, with the Fitbit Charge HR the most accurate of all.

But the Wired experiment wasn't all good news for fitness trackers. Wired also tested the contention that fitness trackers are easy to fool with arms-only motions like whittling. Rose did just that for five minutes and found that it's still a big problem for wrist-worn devices: they recorded an average of 600 steps during those five stationary minutes.

So according to Wired, fitness trackers are actually more accurate than smartphones at counting steps, unless the user is also moving their arms around a lot, in which case the phone is a better bet. But Rose's final point is that caloric burn is an even more important metric than steps for serious fitness trackers. And because new wearables can monitor heart rate, they have an edge over phones in that category.

That edge wasn't conclusively demonstrated, though. For step counting, Wired compared all the devices to a baseline of actually physically counting the steps out loud. But for their caloric burn test, there was no baseline. So all Wired could conclude was that the devices in their study with heart rate tracking capabilities (the Fitbit and the Basis Peak) returned higher caloric burn numbers than those that estimated caloric burn purely based on activity.

Interestingly, another study, this one from the American Council on Exercise, came out just a few weeks prior to the University of Pennsylvania study. It actually looked at the accuracy of caloric burn tracking and found that fitness devices were even worse at it than at step counting. Researchers used a portable metabolic analyzer to check those values. But, like the University of Pennsylvania study, the American Council on Exercise study used old devices, ones that didn't even have heart rate tracking.

The story of the University of Pennsylvania study, which ran in JAMA, the response to it, and especially Wired's valid objections all highlight some big problems, and they're not new problems to digital health. One is that fitness wearables are hot, and any published study about them -- especially one with a counterintuitive finding -- is likely to get reported a lot. It's easy for a complicated story to get distorted in today's clickbait headline world, as we discovered a few weeks ago when a report about how the Apple Watch was developed somehow turned into a persistent rumor that previously announced health features had been axed.

The second problem this story highlights is that in many ways the prevailing scientific model of randomized controlled trials and peer-reviewed journals is just too slow to meaningfully evaluate consumer activity trackers. A similar complaint persists on the more clinical side of mobile health, where the slow pace of randomized controlled trials can hold developers back. At the mHealth Summit in 2012, the Centre for Global eHealth Innovation's Joseph Cafazzo spoke at an event about an app called Bant his company was developing.

"I think the RCT could [be published] in 2015," he said at the time. "But honestly, we're learning things through small pilots that can get apps into the field right now. In 2015, we want to see Bant further along than it is now. In the end, I haven't had one parent say 'I can't wait for that RCT to be over so my kid can get this app.' We'll do the RCTs, but we have to be a lot more nimble for the purposes of these apps."

According to comments on our story, the Quantified Self Institute has its own take on this research coming out soon. And in general, more research would be appreciated, especially research verifying the accuracy of caloric burn estimates that use heart rate tracking against a scientifically accepted baseline. But we may just have to accept that peer-reviewed science will always be several steps behind consumer technology life cycles.