// Independent Testing · No Affiliates · No Sponsored Placements Methodology · Editorial
// PROTOCOL — CTL-PHOTO-v1.0

AI Food-Photo Logging Methodology

Sub-protocol of the Calorie Tracker Lab rubric · Last updated May 23, 2026 · Lead: Vincent Okonkwo · Statistics: Yuki Nakamura

Scope. This document specifies the 30-plated-meal photo-AI benchmark used to score every app on Calorie Tracker Lab that offers a photo-based logging workflow. It produces the AI-photo-recognition sub-score that feeds the composite. Photo-AI accuracy is measured independently of the broader calorie accuracy protocol because photo-AI is its own pipeline with its own failure modes.

1. Why a separate photo-AI protocol

Photo-AI logging is the workflow most vulnerable to silent, confident error. A barcode mis-resolution can be caught when the user notices the wrong package on screen. A manual entry can be caught when the user types and reviews. A photo-AI estimate is, by design, the workflow with the lowest user vigilance: the user took a picture and accepted what the app said. When the app says "Caesar salad with grilled chicken: 480 kcal" and the plate is actually "fettuccine alfredo with shrimp: 1,140 kcal," the user never sees the 660 kcal error. After three weeks of silent errors on that scale, an intended 500 kcal/day deficit has in practice been a 200 kcal/day surplus.
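The arithmetic behind that example can be written out. The sketch below is illustrative only: the variable names are ours, and the 700 kcal/day figure is simply what the stated deficit and surplus imply, not a measured value.

```python
# Illustrative arithmetic only; names are hypothetical, figures come from the text above.
logged_kcal = 480      # what the app reported (Caesar salad with grilled chicken)
actual_kcal = 1140     # what was on the plate (fettuccine alfredo with shrimp)

# Calories eaten but never logged for this one meal.
unlogged_kcal = actual_kcal - logged_kcal

planned_deficit = 500  # kcal/day the user believes they are running
actual_surplus = 200   # kcal/day the user actually ends up at

# Average daily under-logging needed to turn the planned deficit into that surplus.
implied_daily_underlog = planned_deficit + actual_surplus

print(unlogged_kcal)           # 660
print(implied_daily_underlog)  # 700
```

One mis-identified meal of this size per day is nearly enough on its own to account for the gap.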

The photo-AI benchmark therefore separates three measurable failure modes — identification, portion estimation, and final calorie estimation — and scores each independently rather than collapsing them into a single number. A photo-AI app can identify a dish correctly and still mis-portion it badly; another can portion accurately but mis-identify the dish; we want to see both.

2. The 30-plated-meal sample

The benchmark battery is 30 plated meals composed and weighed in-lab. The 30-meal count is a practical compromise between statistical power and cost: n=30 gives a workable confidence interval on the pooled MAPE while keeping the cost of standardising plating and lighting for each meal within the budget for a monthly retest cadence.
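To give a feel for what a confidence interval on a 30-meal MAPE looks like, here is a minimal sketch using a percentile bootstrap. The protocol does not state which interval method the lab uses, so the bootstrap is our assumption, and the per-meal errors below are synthetic placeholders, not lab results.

```python
import random
import statistics

def mape(apes):
    """Pooled MAPE: the mean of per-meal absolute percentage errors (in percent)."""
    return statistics.fmean(apes)

def bootstrap_ci(apes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the pooled MAPE."""
    rng = random.Random(seed)
    n = len(apes)
    # Resample meals with replacement and recompute the MAPE each time.
    means = sorted(
        mape([apes[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Synthetic placeholder APEs for 30 meals (NOT lab data).
_rng = random.Random(42)
synthetic_apes = [abs(_rng.gauss(15, 10)) for _ in range(30)]

point = mape(synthetic_apes)
low, high = bootstrap_ci(synthetic_apes)
print(f"MAPE {point:.1f}% (95% CI {low:.1f} to {high:.1f})")
```

Resampling whole meals, rather than individual components, keeps each meal's error intact, which matches how the MAPE is pooled.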

| Difficulty tier | n | Examples | What it stress-tests |
| --- | --- | --- | --- |
| Tier 1: single principal item | 10 | 6 oz grilled chicken breast on white plate; medium banana on white plate; 1 cup cooked white rice in bowl; whole avocado halved; 100 g almonds in bowl | Baseline dish recognition under near-laboratory conditions. An app that misses Tier 1 has structural recognition problems. |
| Tier 2: composed plate, separable components | 10 | Chicken-rice-broccoli plate (components visually distinct); turkey sandwich + side salad; salmon + roasted potatoes + green beans; oatmeal bowl with sliced strawberries, almond butter dollop, chia sprinkle | Multi-item recognition, per-item portion judgement, summation logic. |
| Tier 3: composite dish, ingredients fused | 10 | Lasagna (hidden ricotta, hidden béchamel); chicken tikka masala over basmati (cream-based sauce); vegetable stir-fry (oil load not visible); Caesar salad (dressing volume not visible); shakshuka (hidden olive oil) | Inferential reasoning about hidden fat, sauce, oil, and cooking-method calorie load; this is where photo-AI typically fails hardest. |

The full 30-meal photo log — each meal's weighed component breakdown, USDA-anchored reference kcal, and reference photo — is published as an open dataset (CC BY 4.0) alongside the per-app photo-AI results.
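As a rough illustration of how a reference kcal value can be derived from a weighed component breakdown, consider the sketch below. The record layout and field names are hypothetical (the dataset's actual schema is not described here), and the weights and energy densities are illustrative round numbers rather than entries from the published dataset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    weight_g: float       # weighed in-lab before plating
    kcal_per_100g: float  # reference energy density (illustrative value)

def reference_kcal(components):
    """Sum each component's weight times its energy density."""
    return sum(c.weight_g * c.kcal_per_100g / 100 for c in components)

# Hypothetical Tier 2 plate; values are placeholders, not dataset entries.
plate = [
    Component("chicken breast, grilled", 170.0, 165.0),
    Component("white rice, cooked", 158.0, 130.0),
    Component("broccoli, steamed", 90.0, 35.0),
]

total = reference_kcal(plate)
print(round(total, 1))  # 517.4
```

Keeping the per-component weights alongside the total is what lets portion accuracy and calorie accuracy be scored separately later on.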

3. Standardised plating, distance, lighting

Photo-AI performance depends heavily on the input image. To isolate model performance from input variability, every test photo is captured under fixed conditions. (Real-world degradation under varying conditions is characterised in a separate "field condition" sub-benchmark, summarised in §7.)

| Fixture | Spec |
| --- | --- |
| Plate | 10" round matte white ceramic, edge-to-edge unbordered. Same plate for every Tier 1 and Tier 2 meal. Bowls (matte white, 6.5") for bowl-format meals. |
| Background | Matte white photography sweep, no surrounding objects, no utensils in frame unless a utensil is part of the meal-component analysis. |
| Lighting | Aputure Amaran 60d daylight-balanced LED panel, 5600K, 80% diffuser, positioned 1.2 m above the plate at a 75° angle from horizontal. Light meter reads 850 lux ±50 lux at the plate surface. |
| Camera distance | 35 cm from lens to plate centre. Phone mounted on Manfrotto Pixi mini tripod with extension arm; the phone is not hand-held, which removes tester-side framing variability. |
| Camera angle | Top-down, 90° to plate plane (overhead). A separate "user-realistic" 45° angle pass is captured for the field-condition sub-benchmark. |
| Device | iPhone 15 Pro, iOS 18.3, native camera resolution, no zoom, HDR on (default user behaviour). |
| Plate composition | Each meal's components are weighed individually before plating; plating arrangement is documented in the dataset's per-meal reference photo so retests can reproduce the exact arrangement. |
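The fixed conditions above lend themselves to a machine-checkable spec. The following is a minimal sketch, assuming a hypothetical `StudioSpec` record and `capture_deviations` helper of our own naming; only the lux tolerance is stated in the table, so distance and angle are compared exactly here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StudioSpec:
    """Studio capture conditions taken from the fixture table (names are ours)."""
    target_lux: float = 850.0
    lux_tolerance: float = 50.0
    colour_temp_k: int = 5600
    light_height_m: float = 1.2
    camera_distance_cm: float = 35.0
    camera_angle_deg: float = 90.0

def capture_deviations(spec, measured_lux, distance_cm, angle_deg):
    """Return a list of deviations from the spec; an empty list means the shot conforms."""
    problems = []
    if abs(measured_lux - spec.target_lux) > spec.lux_tolerance:
        problems.append(
            f"lux {measured_lux} outside {spec.target_lux} +/- {spec.lux_tolerance}"
        )
    # No tolerance is given for distance or angle, so this sketch requires an exact match.
    if distance_cm != spec.camera_distance_cm:
        problems.append(f"distance {distance_cm} cm, expected {spec.camera_distance_cm} cm")
    if angle_deg != spec.camera_angle_deg:
        problems.append(f"angle {angle_deg} deg, expected {spec.camera_angle_deg} deg")
    return problems

spec = StudioSpec()
print(capture_deviations(spec, measured_lux=870, distance_cm=35.0, angle_deg=90.0))
print(capture_deviations(spec, measured_lux=920, distance_cm=35.0, angle_deg=45.0))
```

A check like this could run before each capture session so that out-of-spec photos never enter the scored battery.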

4. Per-app workflow

Each app is tested on its single-photo native workflow: open the app's photo logging surface, capture (or upload — see below) one image, accept the app's first portion-estimate suggestion without manual correction. The benchmark explicitly does not use multi-photo workflows or correction loops, because the point is to measure the workflow a typical user actually runs — one photo, accept, log.

Mechanical details:

5. Per-meal scoring

For each (app × meal) pair, three independent sub-scores are recorded:

| Sub-score | Definition | Pass criterion |
| --- | --- | --- |
| Identification accuracy | Did the app correctly name the principal dish (Tier 1, Tier 2) or correctly name the composite dish (Tier 3)? | Top-1 returned dish name matches the canonical dish name (case-insensitive, allowing common synonyms: "salmon" ≈ "grilled salmon"; "chicken tikka masala" ≠ "butter chicken"). Adjudicated against a fixed synonym list published in the dataset. |
| Portion accuracy | Is the app's estimated portion weight within ±20% of the weighed truth? | abs(estimated_g − weighed_g) / weighed_g ≤ 0.20. The ±20% threshold matches the FDA manufacturer-tolerance benchmark and is the conventional pass threshold in academic dietary-assessment validation literature. |
| Calorie accuracy | How far is the app's final logged kcal from the USDA-anchored reference? | Reported as continuous APE per meal and pooled across the 30 meals as photo-AI MAPE; no per-meal pass/fail threshold. |
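The three criteria can be sketched as small functions. This is an illustration under assumptions, not the lab's scoring code: the function names are ours, and the synonym mapping is a toy stand-in for the fixed synonym list published in the dataset.

```python
def identification_pass(returned_name, canonical_name, synonyms):
    """Case-insensitive top-1 match against the canonical name or a listed synonym."""
    returned = returned_name.strip().lower()
    canonical = canonical_name.strip().lower()
    accepted = {canonical} | {s.lower() for s in synonyms.get(canonical, ())}
    return returned in accepted

def portion_pass(estimated_g, weighed_g, tolerance=0.20):
    """True when the relative portion error is within the tolerance."""
    return abs(estimated_g - weighed_g) / weighed_g <= tolerance

def ape(logged_kcal, reference_kcal):
    """Absolute percentage error of the logged kcal, in percent."""
    return abs(logged_kcal - reference_kcal) / reference_kcal * 100

# Toy synonym list (hypothetical; the real list ships with the dataset).
synonyms = {"salmon": ["grilled salmon"]}

print(identification_pass("Grilled Salmon", "salmon", synonyms))                # True
print(identification_pass("butter chicken", "chicken tikka masala", synonyms))  # False
print(portion_pass(estimated_g=170, weighed_g=150))                             # True
print(portion_pass(estimated_g=190, weighed_g=150))                             # False
print(round(ape(logged_kcal=480, reference_kcal=1140), 1))                      # 57.9
```

Note that `ape` returns a continuous value with no threshold applied, which mirrors the table: calorie accuracy has no per-meal pass/fail.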

The three sub-scores are deliberately not collapsed into a single per-meal pass/fail because they describe different failure modes. An app that identifies a chicken-and-rice plate correctly, estimates the rice portion within 5%, and still misses calories by 25% (because its USDA mapping for "rice, cooked" is wrong) tells you something different from an app that mis-identifies the dish as "fried rice" and is wrong by 25% as a result.
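One way to keep the sub-scores apart at the reporting stage is to aggregate each one across meals on its own. The sketch below assumes a hypothetical `MealResult` record and uses two placeholder meals that echo the 25% example; none of the values are lab results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MealResult:
    meal_id: str
    identified: bool   # identification sub-score
    portion_ok: bool   # portion sub-score (within ±20%)
    ape_pct: float     # calorie sub-score, continuous

def summarise(results):
    """Aggregate each sub-score separately; nothing is merged into one number."""
    n = len(results)
    return {
        "identification_rate": sum(r.identified for r in results) / n,
        "portion_pass_rate": sum(r.portion_ok for r in results) / n,
        "mape_pct": sum(r.ape_pct for r in results) / n,
    }

# Placeholder results: same calorie error, different underlying failure.
results = [
    MealResult("chicken-rice", identified=True, portion_ok=True, ape_pct=25.0),
    MealResult("chicken-rice-misread", identified=False, portion_ok=False, ape_pct=25.0),
]

summary = summarise(results)
print(summary)
```

Both placeholder meals contribute the same 25% to the MAPE, yet the identification and portion rates show that the errors have different causes.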

6. Composite-meal subscore (Tier 3)

Tier 3 meals (lasagna, tikka masala, stir-fry, Caesar, shakshuka, and the remaining five composite dishes in the battery) are scored on an additional composite-meal subscore that captures the photo-AI pipeline's reasoning about hidden ingredients:

The composite-meal subscore is reported separately in the per-app photo-AI accuracy report and does not pool into the headline photo-AI MAPE; it captures qualitative reasoning failure modes that the pooled MAPE statistic does not surface.

7. Field-condition sub-benchmark

Real users do not photograph their dinner under studio lighting. A parallel field-condition sub-benchmark captures the same 30 meals under three additional condition sets: bright daylight (window-side, 11 am, north-facing), restaurant dim (250 lux, warm 3000K overhead), and kitchen overhead (typical 4000K LED, 400 lux). All field photos are taken hand-held at a 45° angle to simulate user behaviour. Field-condition results are reported separately to characterise photo-AI degradation; they are excluded from the headline benchmark so that the studio signal stays clean.
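Degradation can be expressed as the difference between each field condition's MAPE and the studio MAPE. The sketch below is a minimal illustration: the condition keys and the `degradation` helper are our own naming, and the MAPE inputs are placeholders rather than published figures.

```python
# Field condition sets as described above; lux is None where the text gives no figure.
FIELD_CONDITIONS = {
    "bright_daylight": {"lux": None, "kelvin": None, "note": "window-side, 11 am, north-facing"},
    "restaurant_dim": {"lux": 250, "kelvin": 3000, "note": "warm overhead"},
    "kitchen_overhead": {"lux": 400, "kelvin": 4000, "note": "typical LED"},
}

def degradation(studio_mape, field_mape_by_condition):
    """Percentage-point increase in MAPE for each field condition versus studio."""
    return {
        condition: field_mape - studio_mape
        for condition, field_mape in field_mape_by_condition.items()
    }

# Placeholder MAPE values (NOT lab results).
example = degradation(
    studio_mape=10.0,
    field_mape_by_condition={"restaurant_dim": 18.0, "kitchen_overhead": 12.5},
)
print(example)  # {'restaurant_dim': 8.0, 'kitchen_overhead': 2.5}
```

Reporting the delta rather than the raw field MAPE makes apps with different studio baselines easier to compare.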

8. App version pinning + retest cadence

Photo-AI apps ship model updates frequently: sometimes monthly, occasionally weekly via server-side model swaps that do not bump the app's version string. This creates a measurement problem that the lab handles in two ways:

The dataset preserves all prior monthly releases; we do not silently overwrite published photo-AI numbers when a new model ships.
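The no-overwrite rule can be pictured as an append-only archive keyed by release and app. This is a sketch under assumptions, not a description of the lab's storage: the `ResultsArchive` class, the release labels, and the app name are all hypothetical.

```python
class ResultsArchive:
    """Append-only store: a published (release, app) result can never be replaced."""

    def __init__(self):
        self._releases = {}

    def publish(self, release, app, mape_pct):
        key = (release, app)
        if key in self._releases:
            # Refuse to overwrite; a new model gets a new release entry instead.
            raise ValueError(f"{app} already has a published result for {release}")
        self._releases[key] = mape_pct

    def history(self, app):
        """All published results for one app, ordered by release label."""
        return {
            release: value
            for (release, name), value in sorted(self._releases.items())
            if name == app
        }

archive = ResultsArchive()
archive.publish("2026-04", "ExampleApp", 12.0)  # placeholder values
archive.publish("2026-05", "ExampleApp", 9.5)
print(archive.history("ExampleApp"))
```

Because earlier entries stay in place, a reader can see how an app's numbers moved as its model changed.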

9. Current cycle: CTL-PHOTO-2026-Q2 (May release)

The current photo-AI benchmark cycle (CTL-PHOTO-2026-Q2, May 2026 release) ran the standardised studio battery against the four apps with active photo-AI offerings in the US App Store as of 10 May 2026. Headline pooled photo-AI MAPE values:

These figures align with the broader CTL-BENCH-2026-Q2 accuracy benchmark (40-meal mixed-workflow), which finds the same rank order under the lab's no-manual-correction protocol.

10. Limitations

Related protocols