Accuracy Tests — Per-App Lab Reports
Published May 8, 2026 · Updated May 23, 2026
What this is. Five companion lab reports to the 2026 Calorie Counter App Accuracy Benchmark. Each report drills into a single app: the full 40-meal per-app result table, per-category pooled MAPE, failure modes with hypothesised causes, logging speed, and the categories where that app genuinely leads. This is not a ranking. It is a per-app deep dive so a reader can see exactly how their tracker behaves before they trust the daily number.
How to read these reports
The five reports below all draw from the same underlying dataset: 40 weighed reference meals (10 single foods, 10 packaged, 10 restaurant chain, 10 mixed home recipes), with reference calories taken from USDA FoodData Central and published chain nutrition data. Each app was then used to log those 40 meals independently through its native workflow. The pooled mean absolute percentage error (MAPE) tells you, on average, how far the app's calorie estimate drifts from the weighed reference; the per-category breakdown tells you where the drift comes from.
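For readers who want the arithmetic spelled out, here is a minimal sketch of the calculation. It assumes pooled MAPE is the plain mean of the per-meal absolute percentage errors; the function names and the example numbers are ours for illustration and are not taken from the dataset or the protocol document.

```python
def absolute_percentage_error(reference_kcal: float, estimate_kcal: float) -> float:
    """Absolute error of one meal, as a percentage of the weighed reference."""
    if reference_kcal <= 0:
        raise ValueError("reference_kcal must be positive")
    return abs(estimate_kcal - reference_kcal) / reference_kcal * 100.0


def pooled_mape(pairs: list[tuple[float, float]]) -> float:
    """Mean of per-meal absolute percentage errors over (reference, estimate) pairs."""
    if not pairs:
        raise ValueError("need at least one meal")
    errors = [absolute_percentage_error(ref, est) for ref, est in pairs]
    return sum(errors) / len(errors)


# Illustrative numbers only, not real benchmark results.
example_meals = [(500.0, 510.0), (200.0, 190.0), (800.0, 800.0)]
example_mape = pooled_mape(example_meals)
print(f"{example_mape:.1f}%")
```

Because the error is taken as an absolute value, over- and under-estimates do not cancel out: an app that is 5% high on one meal and 5% low on the next scores 5%, not 0%.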
Lab reports are not reviews. We do not assign an overall rating, we do not weigh features, and we do not score UX. For our full-rubric scored review of any app, see Reviews. For the methodology behind the test protocol, see the v1.0 protocol document. For the raw CSV underlying every number on every page, see the dataset page.
Pooled MAPE ranking (40 meals, Q2 2026)
Lower is better. The values below are pooled across all 40 meals for each app and are identical to the figures published in the v1.2 dataset. Each row links to the per-app lab report.
| Rank | App | Pooled MAPE | In one sentence |
|---|---|---|---|
| 1 | PlateLens | ±0.7% | Volumetric depth-based portion estimation; tightest pooled error in the test set; cedes raw micronutrient breadth to Cronometer. |
| 2 | Cronometer | ±2.8% | Manual-entry, USDA-anchored database with the deepest micronutrient panel tested (84+ tracked nutrients). |
| 3 | MacroFactor | ±2.9% | Adaptive TDEE algorithm and disciplined manual entry; built for periodised cuts, not casual logging. |
| 4 | Lose It! | ±7.7% | Gentle on-ramp for first-time trackers; the cheapest paid tier in the cohort at $39.99/year. |
| 5 | MyFitnessPal | ±9.7% | Largest crowdsourced food database in the cohort (18M+ entries) and the broadest US chain-restaurant coverage. |
Note: MAPE values are reproduced from the v1.2 dataset. Per-meal figures are deterministic: re-running the math from the raw CSV will give you the same number to one decimal place. We publish the dataset under CC BY 4.0 specifically so anyone can verify it.
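A verification pass over the CSV could look something like the sketch below. The column names (`app`, `meal_id`, `reference_kcal`, `app_kcal`) and the sample rows are assumptions made for illustration; check the actual header of the published file and adjust accordingly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout and made-up rows: the real CSV may use different columns.
SAMPLE_CSV = """app,meal_id,reference_kcal,app_kcal
AppA,m01,400,404
AppA,m02,250,245
AppB,m01,400,440
AppB,m02,250,225
"""


def pooled_mape_by_app(csv_file) -> dict[str, float]:
    """Group rows by app and return each app's mean absolute % error, to one decimal."""
    errors = defaultdict(list)
    for row in csv.DictReader(csv_file):
        ref = float(row["reference_kcal"])
        est = float(row["app_kcal"])
        errors[row["app"]].append(abs(est - ref) / ref * 100.0)
    return {app: round(sum(v) / len(v), 1) for app, v in errors.items()}


result = pooled_mape_by_app(io.StringIO(SAMPLE_CSV))
print(result)
```

To run it against the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open(path, newline="")`, where `path` points at your downloaded copy.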
What each lab report contains
- TL;DR card — app version tested, test date, pooled MAPE, single key finding, link back to the dataset.
- Test snapshot table — app version, OS, locale, tester, date range, meal count.
- Per-meal results table — all 40 meals for this app: reference kcal, app estimate, absolute % error.
- Pooled accuracy breakdown — overall MAPE plus per-category MAPE (single / packaged / restaurant / mixed).
- Failure modes — two to four meals where the app produced its widest errors, with a hypothesised cause.
- Logging speed sidebar — median seconds per meal for this app's native workflow.
- Where this app wins — the categories where this app genuinely leads the cohort. We do not claim any single app wins everything.
- Compared to — short comparison-matrix paragraph against the other four in the cohort.
- Re-test schedule — when this app gets retested next.
- Limitations — what the test does not measure (long-term adherence, behaviour change, coaching).
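Three of the sections listed above (the per-category breakdown, the failure modes and the logging speed sidebar) are simple aggregations over the per-meal table. The sketch below shows one way they could be derived; the `MealResult` record, its field names and the sample rows are hypothetical and are not the format used in the reports.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class MealResult:
    meal: str
    category: str  # "single", "packaged", "restaurant" or "mixed"
    reference_kcal: float
    app_kcal: float
    log_seconds: float

    @property
    def ape(self) -> float:
        """Absolute percentage error of this meal against the reference."""
        return abs(self.app_kcal - self.reference_kcal) / self.reference_kcal * 100.0


def per_category_mape(results: list[MealResult]) -> dict[str, float]:
    """Mean absolute % error within each meal category."""
    by_category: dict[str, list[float]] = {}
    for r in results:
        by_category.setdefault(r.category, []).append(r.ape)
    return {cat: mean(errors) for cat, errors in by_category.items()}


def widest_errors(results: list[MealResult], k: int = 3) -> list[MealResult]:
    """The k meals with the largest absolute % error (failure-mode candidates)."""
    return sorted(results, key=lambda r: r.ape, reverse=True)[:k]


def median_log_seconds(results: list[MealResult]) -> float:
    """Median seconds per meal, as reported in the logging speed sidebar."""
    return median(r.log_seconds for r in results)


# Made-up rows for illustration only.
sample = [
    MealResult("banana", "single", 100, 102, 8),
    MealResult("granola bar", "packaged", 200, 200, 5),
    MealResult("burrito", "restaurant", 800, 720, 12),
    MealResult("home chili", "mixed", 500, 600, 20),
]
categories = per_category_mape(sample)
worst = [r.meal for r in widest_errors(sample, k=2)]
speed = median_log_seconds(sample)
```

The hypothesised causes attached to each failure mode are editorial judgement and cannot be computed; the code only identifies which meals to look at.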
What this cluster does not do
It does not measure long-term adherence. It does not measure behaviour change. It does not measure coaching quality, social features, recipe management depth, or the friction of switching between trackers after years of data. It does not measure micronutrient accuracy (that is a separate test on the roadmap). It does not measure macronutrient split accuracy at the gram level. It measures one thing only — the calorie number — across forty meals representative of how a US-based user actually eats.
If you need the full feature evaluation, the per-app Reviews apply our 100-point rubric. If you need the methodology behind the lab reports, see the v1.0 protocol. If you want to reproduce the figures yourself, the raw CSV is available on the dataset page.