
The charm of fitness-tracking smartwatches lies in their ability to decode our complex bodies into clear, understandable data. However, we may be deceiving ourselves if we think these gadgets are always accurate. A recent scientific study reveals that wearables often make mistakes, and it might be impossible to ever truly gauge their precision.
This won’t surprise regular Mytour readers. We've previously pointed out that some smartwatch metrics are more dependable than others, with calorie burn being one of the least accurate. Heart rate variability readings, on the other hand, differ from device to device, yet recovery-centric gadgets all tend to follow a similar trend, if you believe my homemade study with a sample size of one.
So, what can we conclude about the accuracy of smartwatches, and why is it so difficult to provide a clear answer? That’s the central question tackled by a recent study from sports and data scientists in Ireland. The study, an umbrella review (a review of existing systematic reviews and meta-analyses), gathered all available published data on consumer wearables. Here are some of their findings.
Research becomes obsolete the moment it’s published.
One might expect that companies like Apple, Garmin, or Fitbit would conduct thorough research on their technology before it’s released to the public. While they likely do such studies in-house, their main focus is on launching and selling products, not necessarily on validating their accuracy against other brands.
The studies we have are typically carried out by researchers, and these studies usually start once the wearables are already on the market. It takes a minimum of two years to complete a study on a new smartwatch and publish it, and by that time, the smartwatch is no longer considered 'new.'
This recent analysis, published in July 2024, relied on the most up-to-date meta-analyses, which in turn drew on the latest studies available to them. Which fitness watch models were included in these analyses? I checked the supplementary tables for the newest model covered from each major brand and found the following:
Fitbit’s Charge 4 (with the Charge 6 launched last year)
Apple Watch Series 6 (the latest is Series 9, which launched last year alongside the Ultra 2)
Garmin’s Fenix 5 (the Fenix 8 was released recently)
Garmin’s Forerunner 245 (still popular, though the 255 and 265 have since launched, with the 265 already a year and a half old)
Oura’s generation 2 ring (now at gen3)
Whoop 3.0 (currently at model 4.0)
To understand how the Apple Watch Ultra 2 compares to devices like the Charge 6, Forerunner 265, or Whoop 4.0, you'll have to wait a few more years. By then, new versions will likely have been released, making direct comparisons outdated.
Accuracy studies aren't always conducted in a uniform manner.
The studies differ so much that it's hard to compare them, even when examining older models of devices. For instance, the umbrella review found that in many studies wearables tended to underestimate heart rate and overestimate sleep time. However, the authors noted that they couldn't definitively state whether wearables in general tend to overestimate or underestimate these metrics, as the studies used different devices and reference standards.
According to the umbrella review, "This research highlights the complex variability across devices, outcomes, user contexts, and reference standards," which makes it difficult to assess the accuracy of wearables. In short, there's not enough data to provide clear answers when shopping for a new device.
Which metrics performed the best and which the worst?
While we should take these results with a pinch of salt, it’s still worth exploring the key findings of the umbrella review. Here are some recurring themes, though we can't claim they apply universally:
Heart rate was typically accurate within +/- 3% of the true value. This isn’t too bad, but that still leaves a 6% spread between the lowest and highest possible readings, which could be problematic when you're trying to stay within a 10-point range for your heart rate.
Heart rate variability was rated as “very good to excellent” at rest, but accuracy declined during physical activity.
Energy expenditure (calories burned) wasn’t great, as expected. Devices sometimes underestimated by 21%, and at other times overestimated by 14%.
Step counts also showed significant variation, with some readings as much as 9% lower than the actual count, and others 12% higher.
Sleep duration was generally overestimated, while sleep latency (how long it takes to fall asleep) was typically underestimated.
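To make those percentages more concrete, here is a minimal Python sketch that turns the error ranges quoted above into the span of readings a device might show for a given true value. The dictionary name, the function name, and the example true values (150 bpm, 500 calories, 10,000 steps) are my own choices for illustration, and the ranges are the review's summary figures, not guarantees for any particular device.

```python
# Error ranges quoted from the umbrella review, expressed as fractions of
# the true value: (largest underestimate, largest overestimate).
# These are summary figures, not per-device specifications.
ERROR_RANGES = {
    "heart_rate": (-0.03, 0.03),
    "energy_expenditure": (-0.21, 0.14),
    "step_count": (-0.09, 0.12),
}


def reading_range(metric, true_value):
    """Return the (lowest, highest) reading implied by the quoted error range."""
    low, high = ERROR_RANGES[metric]
    return true_value * (1 + low), true_value * (1 + high)


# Hypothetical true values, chosen only to illustrate the arithmetic.
examples = [
    ("heart_rate", 150),           # beats per minute
    ("energy_expenditure", 500),   # calories
    ("step_count", 10_000),        # steps
]

for metric, true_value in examples:
    low, high = reading_range(metric, true_value)
    print(f"{metric}: true value {true_value}, device could read {low:.1f} to {high:.1f}")
```

Under these assumptions, a true heart rate of 150 bpm could show up as anything from 145.5 to 154.5, a 500-calorie workout as 395 to 570, and a 10,000-step day as 9,100 to 11,200.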
Rather than obsessing over accuracy, it's more important to assess whether something is useful.
I don’t judge wearables based on accuracy but on whether they’re useful. You might recall from my comparison of Whoop, Garmin, and Oura that each device reported different resting heart rate and heart rate variability figures, but all were able to track the same trend, which provided me with valuable insights into when my body was properly rested and recovered versus when it wasn’t.
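The idea that two devices can disagree on the number while agreeing on the trend can be sketched in a few lines of Python. The readings below are invented for illustration and are not measurements from any real device; the second "device" is simply assumed to read a constant 4 bpm higher than the first. The `pearson` helper is a plain hand-written Pearson correlation, not anything taken from a wearable's software.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented resting heart rate readings (bpm) for one week. Not real data.
device_a = [52, 53, 51, 56, 58, 54, 52]
# Hypothetical second device that always reads 4 bpm higher.
device_b = [reading + 4 for reading in device_a]

# Average disagreement between the two devices, in bpm.
mean_gap = sum(b - a for a, b in zip(device_a, device_b)) / len(device_a)
# How closely the day-to-day pattern of one device follows the other.
trend_agreement = pearson(device_a, device_b)

print(f"Average gap between devices: {mean_gap:.1f} bpm")
print(f"Trend agreement (Pearson r): {trend_agreement:.3f}")
```

In this made-up case the devices never show the same number, yet their day-to-day pattern matches almost perfectly, which is the sense in which a tracker can be useful without being accurate. Real devices would not differ by a tidy constant, so the agreement would be lower.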
This focus on usefulness is why I advise people to avoid fixating on calorie burn. If you really want to know how many calories you should consume to maintain your weight, it’s better to track your calorie intake alongside your weight. Similarly, instead of relying on a watch’s estimate of being in zone 2 during exercise, you can use other indicators, such as your breathing and your internal thoughts (“Oh no, how much longer?”), to gauge how hard you're working.
While we may not be able to verify the precision of every tracker, I recognize that accuracy is crucial for many people shopping for smartwatches and fitness trackers, so I’ll keep addressing it when it’s relevant. A GPS-enabled watch should accurately display the street you're running on, and a heart rate sensor shouldn’t mistake your running cadence for your heart rate. However, the key question to ask about a wearable isn’t whether its metrics are accurate, but whether they are useful, even if they might not be entirely accurate.