Replaces placeholder ota_verify_signature_with_key with real mbedtls
ECDSA verify; adds 4-case native test suite with generated P-256 vectors.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Implements /ota/check (version comparison + sig_b64 payload) and
/ota/firmware (binary stream) using the same _impl pattern as
camera_endpoint.py. HMAC auth left commented pending main app wiring.
6/6 tests passing.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Generates firmware signing keypair; private key stays in gitignored
secrets/, public key written as 65-byte C array to
firmware/lib/ota_updater/ota_pubkey.h for compile-time OTA verification.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
device-id, location-id, wifi-ssid, and wifi-password were interpolated
directly into the NVS partition CSV. A value containing comma, double
quote, CR, or LF would split the field/row and silently provision the
wrong NVS keys — easiest concrete failure: a Wi-Fi password containing
a comma. Validate operator-supplied strings before generating the CSV.
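
A minimal sketch of the kind of check described above, not the actual code in tools/flash_device.py. The helper name validate_nvs_value and the error wording are hypothetical; only the four rejected characters come from this message.

```python
# Characters that would split a field or a row in the NVS partition CSV.
FORBIDDEN_CSV_CHARS = (",", '"', "\r", "\n")


def validate_nvs_value(name: str, value: str) -> str:
    """Return value unchanged, or raise ValueError if it could break a CSV row.

    `name` is only used to make the error message readable for the operator.
    """
    for ch in FORBIDDEN_CSV_CHARS:
        if ch in value:
            raise ValueError(f"{name} contains forbidden character {ch!r}")
    return value
```

Rejecting the value outright, rather than quoting or escaping it, keeps the generated CSV trivially parseable at the cost of disallowing a few legal Wi-Fi passwords.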
Add an empty tools/__init__.py so the regression tests can import the
helper as 'tools.flash_device' (matches the existing 'server.*' test
pattern).
Found via adversarial review (run 2026-05-01-192928, gpt-5.5 reviewer).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A misbehaving or clock-broken device could submit period_end <=
period_start, polluting the camera_records table with zero-length or
inverted windows that corrupt downstream hourly analytics. Add a
Pydantic model_validator so the request is rejected at the API
boundary instead of silently persisting bad ranges.
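
The rule itself can be sketched with a stdlib dataclass standing in for the Pydantic model (a deliberate substitution so the example has no third-party dependency). The class name CameraRecord and the error text are hypothetical; the field names period_start and period_end come from this message.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CameraRecord:
    """Stand-in for the request model; only the window fields are shown."""

    period_start: datetime
    period_end: datetime

    def __post_init__(self) -> None:
        # Reject zero-length and inverted windows at construction time.
        if self.period_end <= self.period_start:
            raise ValueError("period_end must be after period_start")
```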
Found via adversarial review (run 2026-05-01-191359, both reviewers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
net_guard_tick() compared absolute uint32_t millis() values:

    if (millis() < s_next_retry_ms) return;

This is broken across the ~49.7-day millis() wrap: depending on which
side of the wrap each value lands, retries either tight-loop or stall
indefinitely. The device is designed for multi-month uptime, so this
is a real production case, not a theoretical one.
Replace with the standard wrap-safe pattern using a signed difference.
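
For illustration, a Python model of the uint32_t arithmetic behind the signed-difference pattern. The firmware is C++; the function names here are hypothetical, and the masking emulates what the cast (int32_t)(now - deadline) does on the device.

```python
UINT32_MASK = 0xFFFFFFFF


def signed_diff32(now: int, deadline: int) -> int:
    """Emulate (int32_t)(now - deadline) for two uint32_t tick values."""
    d = (now - deadline) & UINT32_MASK
    # Reinterpret the top bit as the sign, as a two's-complement cast would.
    return d - 0x100000000 if d & 0x80000000 else d


def deadline_reached(now: int, deadline: int) -> bool:
    """Wrap-safe replacement for the naive `now >= deadline` comparison."""
    return signed_diff32(now, deadline) >= 0
```

The comparison stays correct as long as the deadline is scheduled less than 2^31 ms (about 24.8 days) ahead of the current tick.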
Found via adversarial review (run 2026-05-01-202910, gpt-5.5 reviewer).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hmac_sign() previously trusted whatever secret_hex came out of NVS:
- Lengths >128 chars overflowed the fixed 64-byte stack buffer in
hex_to_bytes (out_len was unbounded).
- Non-hex characters were silently decoded to 0 via strtol with no
end-pointer check, producing signatures under a corrupted key.
- Empty secrets fell through to mbedtls_md_hmac_starts with len=0.
flash_device.py now rejects malformed --hmac-secret at provision time,
but hmac_sign should also refuse to sign under a malformed key regardless
of how it ended up in NVS (legacy provisioning, partial flash, etc.).
Add length, hex-charset, and even-length validation; make hex_to_bytes
return bool and have hmac_sign return empty HString on any failure
(callers already treat empty as failure via post_json_once).
Found via adversarial review (run 2026-05-01-202910, both reviewers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add patterns for *secret* files (e.g. operator-saved HMAC secrets at
repo root), __pycache__/ directories, and .adversarial-review/ run
artifacts so they don't get accidentally committed via 'git add -A'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
store_heartbeat_diagnostics() unconditionally SET each diagnostic column
to its parameter, so a v1.0.0 heartbeat (which omits the five v1.1.0
fields and leaves them as None after Pydantic parsing) erased previously
stored diagnostics for that device. Wrap each parameter in
COALESCE(?, column_name) so omitted fields preserve the existing value.
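
A self-contained sketch of the COALESCE pattern using the stdlib sqlite3 module. The table name, the two column names, and the device id are hypothetical; the real helper covers five diagnostic columns.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with one device that already has diagnostics."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices ("
        "device_id TEXT PRIMARY KEY, free_heap INTEGER, reset_reason INTEGER)"
    )
    conn.execute("INSERT INTO devices VALUES ('dc-0001', 180000, 1)")
    return conn


def store_diagnostics(conn, device_id, free_heap=None, reset_reason=None):
    """Update only the columns whose parameter is not None."""
    conn.execute(
        "UPDATE devices SET "
        "free_heap = COALESCE(?, free_heap), "
        "reset_reason = COALESCE(?, reset_reason) "
        "WHERE device_id = ?",
        (free_heap, reset_reason, device_id),
    )


def read_diagnostics(conn, device_id):
    return conn.execute(
        "SELECT free_heap, reset_reason FROM devices WHERE device_id = ?",
        (device_id,),
    ).fetchone()
```

One consequence of this pattern is that a client can no longer clear a column by sending null; that trade-off is acceptable here because None means "field omitted".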
Found via adversarial review (gpt-5.5 reviewer, run 2026-05-01-191359).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
--hmac-secret accepted any string and passed it through to NVS, silently
producing a device that cannot authenticate to the server. Reject anything
that isn't exactly 64 hex characters (32 bytes) before generating the NVS
image. Auto-generated secrets are validated too as a defensive check.
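
A minimal sketch of the "exactly 64 hex characters" rule. The function name and error text are hypothetical, not taken from flash_device.py.

```python
import re

# 32 bytes encoded as hex: exactly 64 characters from [0-9a-fA-F].
_HMAC_SECRET_RE = re.compile(r"[0-9a-fA-F]{64}")


def validate_hmac_secret(secret: str) -> str:
    """Return the secret unchanged, or raise ValueError if it is malformed."""
    # fullmatch (not match) so a trailing newline or extra characters fail.
    if not _HMAC_SECRET_RE.fullmatch(secret):
        raise ValueError("--hmac-secret must be exactly 64 hex characters")
    return secret
```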
Found via adversarial review (both reviewers, run 2026-05-01-192928).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reporter_flush() snapshotted the buffers under lock, released the lock
to POST, then unconditionally cleared the entire buffer on success.
Records appended by reporter_submit_*() during the in-flight POST were
silently erased. Replace clear() with erase() of just the snapshotted
prefix so concurrent appends survive.
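
The snapshot-then-erase-prefix idea, modelled in Python for illustration. The firmware is C++ and uses erase() on its own buffers; the class and method names below are hypothetical.

```python
import threading


class ReportBuffer:
    """Append-only buffer whose flush removes only what was actually sent."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items = []

    def submit(self, record) -> None:
        with self._lock:
            self._items.append(record)

    def flush(self, post) -> bool:
        """Send a snapshot via post(batch); on success drop just that prefix."""
        with self._lock:
            snapshot = list(self._items)
        if not snapshot:
            return True
        if not post(snapshot):  # lock is released while the POST is in flight
            return False
        with self._lock:
            # Records appended during the POST sit after the snapshot prefix.
            del self._items[: len(snapshot)]
        return True

    def pending(self) -> list:
        with self._lock:
            return list(self._items)
```

This relies on the buffer being append-only between snapshot and erase, so the first len(snapshot) entries are still exactly the ones that were sent.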
Found via adversarial review (gpt-5.5 reviewer, run 2026-05-01-190903).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NimBLE-Arduino 1.4.2 had an init/fire race in its FreeRTOS callout porting
layer where os_callout_timer_cb dispatched a queued TimerHandle expiry
against a not-yet-initialized event (NULL fn pointer), causing PC=0
InstrFetchProhibited within ~1s of boot when the camera task starved the
timer service. Confirmed by ets_printf instrumentation. Upgrading to
^2.0.0 rewrites the porting layer and eliminates the race; verified clean
on the customer network for 1+ hour.
Also rolls in DNS-resilience work that surfaced the BLE crash during
provisioning: pin lwIP/esp-netif resolvers to 1.1.1.1/8.8.8.8 across DHCP
renewals, add three-tier resolver fallback in reporter with a hardcoded
IP of last resort, and switch to raw WiFiClient with manual Host header
to bypass HTTPClient's brittle DNS path.
Migration touches for NimBLE 2.x:
- NimBLEAdvertisedDeviceCallbacks -> NimBLEScanCallbacks
- onResult signature now takes const NimBLEAdvertisedDevice*
- setAdvertisedDeviceCallbacks -> setScanCallbacks
- start(0, nullptr, false) -> start(0, false, false)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 2 now shows openssl rand -hex 32 (with python and /dev/urandom
fallbacks) and writes to .agent/dc-<id>-secret with chmod 600, so the
flash_device.py example can read $(cat ...) the same way the known-good
dc-0002 command does.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the printed materials shipped with each device:
- retailer-setup-guide.docx — non-technical 1-2 page setup guide
- retailer-setup-guide.py — generator script for the .docx
- doorcounter-repo-qr.png — QR code linking to the public Gitea repo
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sourced parts table (M5 TimerCamera-F, USB cable, 5V adapter), the
~750 mW measured power draw, the 3-5s detection latency caveat, and a
six-step Quick Start aimed at semi-technical operators deploying their
own device.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Config-load and camera-init FATAL branches now reboot (3s LED signal
before restart) instead of hanging forever. Matches the enum name
REBOOT_FATAL_* and makes camera-init failures diagnosable via the
next boot's heartbeat recent_events. Config failures produce a
visible reboot loop rather than a silent hang.
- Emit EVT_NTP_SYNC(seconds_since_boot) on the first NTP-synced
reporter iteration so slow / failed NTP sync is a visible signal in
the heartbeat's recent_events window.
- README "Deploying firmware 1.1" now opens with a "Before you flash"
warning directing the operator to land server-side heartbeat
schema changes first (migration 005 + stub integration) to avoid a
strict-schema 4xx reboot loop after deployment.
Two FATAL while(true) hangs in main.cpp (config load fail, camera init
fail) previously relied on the hardware watchdog to reboot the device,
leaving the cause invisible beyond a generic TWDT reset reason. Now
each path logs EVT_REBOOT with REBOOT_FATAL_CONFIG or REBOOT_FATAL_CAMERA
before hanging, so the next heartbeat's recent_events surfaces which
branch hung. Server-side decoder updated for the two new enum values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flash command, expected first-boot behavior, per-feature summary of the
1.1 release, 24-hour field-check playbook, and a reference table for
decoding the heartbeat's recent_events array.
The real server lives in a separate repo; this repo carries reference
stubs for each endpoint (see camera_endpoint.py precedent). Adds the
Pydantic extension, persistence helper, migration 005, and tests that
the real server can copy when adding diagnostic-field support.
Matches the firmware v1.1.0 heartbeat payload shape. Old-shape
payloads (firmware v1.0.0) continue to parse cleanly with the new
fields defaulting to None.
Heartbeat v1.1.0 now carries heap stats (free + min_free since boot),
esp_reset_reason(), last WiFi disconnect code, and the last 8
persisted event-log entries. Makes field failures diagnosable
server-side without retrieving the device: the post-reboot heartbeat
will include EVT_BOOT with reset reason and whatever EVT_WIFI_DOWN
or EVT_HTTP_FAIL entries preceded it.
Reporter task counts consecutive heartbeat failures from the bool
returned by reporter_heartbeat (Task 5). After 6 consecutive misses
(~6 hours at the hourly cadence) the device logs EVT_HEARTBEAT_MISS
then EVT_REBOOT(REBOOT_HEARTBEAT_MISS) and restarts, giving the whole
network stack a clean reinitialization. The 200ms delay before the
restart lets NVS commit the REBOOT entry so the next boot can report
it via EVT_BOOT + esp_reset_reason().
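
A small model of the consecutive-miss policy, for illustration only. The class name is hypothetical, and resetting the counter on any successful heartbeat is an assumption implied by "consecutive" rather than stated above.

```python
MAX_CONSECUTIVE_MISSES = 6  # threshold taken from this message


class HeartbeatWatch:
    """Track consecutive heartbeat failures and signal when to restart."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_MISSES) -> None:
        self.limit = limit
        self.misses = 0

    def record(self, ok: bool) -> bool:
        """Feed one heartbeat result; return True when a restart is due."""
        if ok:
            self.misses = 0  # assumption: one success clears the streak
            return False
        self.misses += 1
        return self.misses >= self.limit
```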
A 30s task watchdog (TWDT) subscribes all three long-running tasks and panics on hang.
The reporter task's retry loop explicitly feeds between attempts so
the 3-try sequence (worst case 52s) does not itself trip the dog.
Reset reason on next boot is visible via esp_reset_reason() which
EVT_BOOT already logs.
Unbounded TLS/HTTP POSTs were blocking the reporter task indefinitely
on weak WiFi. Now: 5s connect timeout, 10s response timeout, 3 attempts
with 0/2s/5s backoff. Every attempt logs HTTP_OK or HTTP_FAIL to the
event log. reporter_heartbeat now returns bool so the caller can count
consecutive misses.
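
The timing budget can be checked with a short sketch. It assumes "0/2s/5s backoff" means the delay applied before attempts 1, 2, and 3, and that a worst-case attempt uses the full connect plus response timeout; the function names are hypothetical.

```python
import time

CONNECT_TIMEOUT_S = 5
RESPONSE_TIMEOUT_S = 10
BACKOFF_S = (0, 2, 5)  # delay before each of the three attempts


def worst_case_seconds() -> int:
    """Upper bound for one full retry sequence if every attempt times out."""
    per_attempt = CONNECT_TIMEOUT_S + RESPONSE_TIMEOUT_S
    return sum(delay + per_attempt for delay in BACKOFF_S)


def post_with_retry(attempt, sleep=time.sleep) -> bool:
    """Call attempt() up to three times; return True on the first success."""
    for delay in BACKOFF_S:
        if delay:
            sleep(delay)
        if attempt():
            return True
    return False
```

Under these assumptions the bound is 3 x 15s + 7s = 52s, which exceeds a 30s watchdog window unless the loop feeds the watchdog between attempts.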
- net_guard_tick now detects status-vs-event divergence. If s_up is
true but WiFi.status() says otherwise (rare: driver wedge, silent
RF failure), force DOWN state and schedule reconnect. Uses 0xFF
disconnect reason so the event log distinguishes this path.
- Forward-declare DeviceConfig in net_guard.h so consumers that don't
call net_guard_start don't transitively pull config.h.
loop() no longer blocks for 5s after a disconnect; reconnect is
scheduled from the WiFi event handler with exponential backoff.
Buffered reports flush on every clean UP transition.
Mutex take in event_log_write and event_log_read_recent switched
from portMAX_DELAY to pdMS_TO_TICKS(50) with skip-on-timeout. Prevents
the high-priority WiFi event task from stalling on NVS writes; losing a
diagnostic entry under contention is preferable to dropping WiFi events.
- Seed s_up from WiFi.status() in net_guard_start so the first
STA_GOT_IP (fired during setup's busy-wait, before onEvent was
registered) is not missed — prevents a reconnect flap on every boot.
- Drop WiFi.disconnect() from net_guard_tick; WiFi.begin() alone
re-associates cleanly and avoids a spurious STA_DISCONNECTED that
was double-logging EVT_WIFI_DOWN on every retry.
- Re-check s_up after the millis() timing gate to close the
GOT_IP-vs-tick race.
- Document the volatile-only shared-state contract.
net_guard registers WiFi.onEvent() so disconnects are handled
immediately instead of polled every 1s. Backoff 1s->2s->4s->...->60s cap.
Every up/down transition is logged to the event log with the disconnect
reason code, so field failures are diagnosable.
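
The backoff sequence above, written out as a sketch. The function name and millisecond units are assumptions; only the doubling from 1s and the 60s cap come from this message.

```python
def backoff_schedule(n: int, initial_ms: int = 1000, cap_ms: int = 60000) -> list:
    """Return the first n reconnect delays: doubling from initial_ms up to cap_ms."""
    delays = []
    delay = initial_ms
    for _ in range(n):
        delays.append(delay)
        delay = min(delay * 2, cap_ms)
    return delays
```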
Every boot logs EVT_BOOT with esp_reset_reason(); every deliberate
ESP.restart() is preceded by EVT_REBOOT with a reason code. This
gives us a persistent answer to 'why did the device just reboot?'.
Exercises the slot-scan logic in event_log_init(): after a simulated
reboot (RAM state cleared, NVS slots preserved) the module must
resume with the correct head/cnt so newest-first read order is
unchanged and subsequent writes continue the seq monotonically.
Adds native-only event_log_test_simulate_reboot() helper. Lifts the
slot-scan loop out of the #ifdef ARDUINO guard so the native stub
exercises the same recovery path as production; the platform-specific
NVS setup remains guarded.
- Remove monotonic counter writes to NVS (stop burning flash on every
event). Derive head and cnt by scanning slots on boot.
- Widen seq to uint32 so slot scan works across multi-year lifetimes.
- Add FreeRTOS mutex around write/read so WiFi event handlers can
safely call event_log_write from another task.
- Check Preferences.begin() return; disable logging if NVS unavailable.
- Extract NTP_SYNC_THRESHOLD constant; drop misleading native uptime.
- Add tests for empty read, max_entries truncation, real-path hash.
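
A sketch of how head and cnt can be derived from a slot scan instead of a stored counter. The slot representation (a list of sequence numbers, None for empty) and the function name are hypothetical simplifications of the NVS layout.

```python
SLOT_COUNT = 32


def recover_ring_state(slots):
    """Return (head, cnt, next_seq) from the seq stored in each slot.

    head is the index the next write should use, cnt is the number of
    occupied slots, and next_seq continues the sequence monotonically.
    """
    used = [(seq, idx) for idx, seq in enumerate(slots) if seq is not None]
    if not used:
        return 0, 0, 0
    newest_seq, newest_idx = max(used)  # highest seq marks the newest entry
    return (newest_idx + 1) % len(slots), len(used), newest_seq + 1
```

The scan costs one read per slot at boot and removes the per-event counter write, which is the flash-wear saving described in the first bullet.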
Persistent 32-slot ring buffer of tagged diagnostic events (boot, wifi
up/down, http ok/fail, heartbeat miss, reboot). Used to diagnose field
failures post-hoc via the heartbeat payload, without needing serial
access. A native stub lets the policy be unit-tested.
Replace per-track line-crossing counter with a single event state machine
gated by foreground pixel count (ENTER=250, EXIT=150) and finalized by
quiet-exit or timeout. Direction inferred from centroid excursion
(up_score vs down_score) on quiet-exit fires, and from net displacement
(last_c vs first_c) on timeout fires.
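
A simplified Python sketch of the hysteresis gate only; direction scoring is omitted. ENTER=250 and EXIT=150 come from this message. The MAX_FRAMES value, the single-frame quiet-exit rule, and the inclusive threshold comparison are assumptions made for illustration; the real constants live in firmware/lib/cv/cv.h.

```python
ENTER_PX = 250   # foreground pixel count that opens an event
EXIT_PX = 150    # count below which the event closes (quiet-exit)
MAX_FRAMES = 40  # hypothetical timeout, in frames


def segment_events(fg_counts, max_frames=MAX_FRAMES):
    """Split a per-frame foreground count series into (start, end, reason)."""
    events = []
    start = None
    for i, fg in enumerate(fg_counts):
        if start is None:
            if fg >= ENTER_PX:
                start = i
        elif fg < EXIT_PX:
            events.append((start, i, "quiet-exit"))
            start = None
        elif i - start + 1 >= max_frames:
            events.append((start, i, "timeout"))
            start = None
    return events
```

The gap between the two thresholds is what keeps a count hovering around 200 from opening and closing an event on every frame.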
Tuning reflects bench data at the intended 7' overhead mount: walkers
produce smaller centroid excursions than originally modelled, so
EXTENT gates, MIN_TRAJ, MAX_FRAMES and REFRACTORY were all relaxed from
their initial guesses. Constants and rationale live in firmware/lib/cv/cv.h.
Bench results (8 isolated walks, 4 entries + 4 exits):
* Event detection: 8/8 (100%)
* Aggregate entries+exits split: 4+4 (matches)
* Per-walk direction labelling: 4/8 (50%)
Document explicitly that per-walk direction is unreliable at this mount
and that downstream analytics should trust only gross traffic
(entries + exits). Recovering direction would require a physical mount
change or a richer signal; both are out of scope for v1.
Tooling:
* tools/replay_logs.py — replay event state machine against captured
[F] diagnostic lines, for offline tuning without flash-test loops.
* firmware/src/main_capture.cpp + tools/capture_frames.py +
tools/replay_frames.py — raw-frame capture firmware and Python port
of the detector, kept in tree for future iteration even though the
TimerCamera-F serial driver stripped specific byte ranges in testing
and log-based replay became the working path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A single person walking under the overhead camera was generating both an
entry and an exit within a few seconds — the line-crossing logic treated
a blob's traversal into one side of the frame and out the other as two
separate events whenever the track spawned near the line, oscillated
against shadows, or churned at creation.
Replaced line-crossing semantics with directional traversal:
- Each track records spawn_y at creation and a counted flag.
- An event fires only if the track is not yet counted, spawned firmly on
one side of the line (|spawn_y - line_y| > CV_TRAVERSAL_MARGIN_PX),
and is now firmly on the opposite side. Direction of travel determines
entry vs exit. The track is then flagged counted: one trip, one count.
- Cooldown remains as a secondary safety net.
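
The traversal rule above, sketched in Python. The LINE_Y and MARGIN_PX values are hypothetical, and mapping increasing y to "entry" is an assumption; the firmware decides that mapping.

```python
from dataclasses import dataclass

LINE_Y = 48     # hypothetical counting-line row
MARGIN_PX = 8   # hypothetical value for CV_TRAVERSAL_MARGIN_PX


@dataclass
class Track:
    spawn_y: float
    counted: bool = False


def side(y: float) -> int:
    """-1 or +1 when y is firmly on one side of the line, 0 inside the margin."""
    if y < LINE_Y - MARGIN_PX:
        return -1
    if y > LINE_Y + MARGIN_PX:
        return 1
    return 0


def update(track: Track, y: float):
    """Return 'entry' or 'exit' once per track, else None."""
    if track.counted:
        return None
    spawn_side, now_side = side(track.spawn_y), side(y)
    if spawn_side == 0 or now_side == 0 or spawn_side == now_side:
        return None
    track.counted = True
    return "entry" if now_side > spawn_side else "exit"
```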
main.cpp: single/double LED pulse on entry/exit detections. Saves and
restores the current LED state so upload (yellow-on) and no-WiFi
indicators aren't clobbered.
Tests updated to walk blobs beyond the margin and register two new
cases: wobble-at-line doesn't count, and a reversed full traversal
doesn't double-count on the same track.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README: note NVS may be cleared by firmware uploads (requires re-running
flash_device.py); new Troubleshooting table covering the fast-blink fatal
state, captive-portal fallback, and no-counts cases.
- tools/serial_monitor.py: ESP32 RTS/DTR reset + serial capture with
per-line elapsed-time prefix. Used to distinguish "unprovisioned" vs
"WiFi failed" boot states (fast-blink LED alone is ambiguous).
- README project-tree updated to include lib/cv, docs/server-prompt-…,
and the new tool.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a blob briefly drops below CV_MIN_BLOB_PX, its track is killed and respawns,
causing the same person to generate multiple counts per visit (~50/min observed
in field). Add a per-direction cooldown (default 5 frames ≈ 1 s @ 5 fps) that
drops subsequent entries (or exits) within the window of the last counted one.
Entry and exit cooldowns are tracked independently.
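
A sketch of the per-direction cooldown, assuming frames are numbered monotonically and that a dropped crossing does not extend the window (the window is measured from the last counted one). The class name is hypothetical.

```python
COOLDOWN_FRAMES = 5  # default from this message


class CrossingCooldown:
    """Drop repeat crossings in the same direction within the cooldown."""

    def __init__(self, cooldown: int = COOLDOWN_FRAMES) -> None:
        self.cooldown = cooldown
        self.last_counted = {"entry": None, "exit": None}

    def accept(self, direction: str, frame: int) -> bool:
        """Return True if this crossing should be counted."""
        last = self.last_counted[direction]
        if last is not None and frame - last < self.cooldown:
            return False  # inside the window of the last counted crossing
        self.last_counted[direction] = frame
        return True
```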
Fixed at compile time for now; exposing as a server-push tunable is deferred
until the server-push-config branch lands. See
docs/server-prompt-crossing-cooldown.md for the server-side coordination notes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Reduce debug level to 1 (errors only) for production builds
- Replace BLE pause/resume with full deinit/reinit during HTTP uploads (~25KB freed)
- Add 60s boot report delay for fast post-deploy connectivity verification
- Add device_id to BLE batch and heartbeat request bodies
- Correct API host to http:// (plain HTTP, not HTTPS)
- Add HTTP response logging and CV entry/exit serial logging
- Create root README.md with operator setup and architecture overview
- Update design spec: HMAC format, BLE memory approach, request body shapes, reporting intervals
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
loopTask: cv_init() created a CVState{} temporary (9KB background
array) on the stack — fixed by initializing members directly.
cam task: cv_process() had uint8_t fg[CV_PIXELS] (9KB) as a local
variable — made static, matching the existing fg_copy fix.
cam task stack bumped from 4096 to 8192 for headroom.
Also: switch to 4MB OTA partition table (TimerCamera-F has 4MB flash,
not 8MB), add CONFIG_ARDUINO_LOOP_STACK_SIZE=16384 build flag,
upload_speed=115200 and --no-stub for reliable CH340 flashing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- hmac_sign now takes method+path instead of device_id; builds message as
method\npath\ntimestamp\nhex(sha256(body)) per server verify_device_hmac
- reporter: header renamed X-HMAC-Signature → X-Signature; passes "POST"+path
- test vector regenerated against new message format; timestamp-diff test updated
- .size() → .length() throughout (Arduino String has no size())
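
A Python sketch of the message format in the first bullet, as a server-side check might compute it. HMAC-SHA256 as the MAC, the hex-decoded key, and the hex-encoded signature are assumptions; the function names are hypothetical and this is not the server's verify_device_hmac.

```python
import hashlib
import hmac


def build_message(method: str, path: str, timestamp: str, body: bytes) -> bytes:
    """Return method\\npath\\ntimestamp\\nhex(sha256(body)) as bytes."""
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method}\n{path}\n{timestamp}\n{body_hash}".encode("utf-8")


def sign(secret_hex: str, method: str, path: str, timestamp: str, body: bytes) -> str:
    """Sign the canonical message; assumes HMAC-SHA256 with a hex-decoded key."""
    key = bytes.fromhex(secret_hex)
    message = build_message(method, path, timestamp, body)
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(secret_hex, method, path, timestamp, body, signature) -> bool:
    """Constant-time comparison against the expected signature."""
    expected = sign(secret_hex, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

Hashing the body first keeps the signed message short and newline-free regardless of what the JSON payload contains.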
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>