Commit Graph

29 Commits

Author SHA1 Message Date
a0eee0e6d4 fix(firmware): preserve buffered records appended during flush POST
reporter_flush() snapshotted the buffers under lock, released the lock
to POST, then unconditionally cleared the entire buffer on success.
Records appended by reporter_submit_*() during the in-flight POST were
silently erased. Replace clear() with erase() of just the snapshotted
prefix so concurrent appends survive.
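
A minimal sketch of the snapshot-then-erase-prefix pattern described above, written in portable C++ rather than the firmware's actual code. The Record type, the Reporter class, and the post callback are hypothetical stand-ins; the sketch also assumes writers only append at the back of the buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a buffered report record.
struct Record { int value; };

class Reporter {
public:
    // Append a record; safe to call while a flush is in flight.
    void submit(const Record& r) {
        std::lock_guard<std::mutex> lock(mu_);
        buf_.push_back(r);
    }

    // `post` stands in for the HTTP POST and returns true on success.
    bool flush(const std::function<bool(const std::vector<Record>&)>& post) {
        std::vector<Record> snapshot;
        {
            std::lock_guard<std::mutex> lock(mu_);
            snapshot = buf_;               // snapshot under lock
        }
        if (snapshot.empty()) return true;
        if (!post(snapshot)) return false; // lock not held; buffer kept for retry

        std::lock_guard<std::mutex> lock(mu_);
        // Erase only the snapshotted prefix. Assumes writers only append at
        // the back, so the first snapshot.size() entries are the ones sent.
        const std::size_t n = std::min(snapshot.size(), buf_.size());
        buf_.erase(buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(n));
        return true;
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> lock(mu_);
        return buf_.size();
    }

private:
    std::mutex mu_;
    std::vector<Record> buf_;
};
```

A record submitted from inside the post callback (simulating an append during the in-flight POST) remains pending after a successful flush, whereas an unconditional clear() would have dropped it.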

Found via adversarial review (gpt-5.5 reviewer, run 2026-05-01-190903).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:19:11 -07:00
a585a56cff fix(firmware): upgrade NimBLE to 2.x + DNS fallback for unreliable resolvers
NimBLE-Arduino 1.4.2 had an init/fire race in its FreeRTOS callout porting
layer where os_callout_timer_cb dispatched a queued TimerHandle expiry
against a not-yet-initialized event (NULL fn pointer), causing PC=0
InstrFetchProhibited within ~1s of boot when the camera task starved the
timer service. Confirmed by ets_printf instrumentation. Upgrading to
^2.0.0 rewrites the porting layer and eliminates the race; verified clean
on the customer network for 1+ hour.

Also rolls in DNS-resilience work that surfaced the BLE crash during
provisioning: pin lwIP/esp-netif resolvers to 1.1.1.1/8.8.8.8 across DHCP
renewals, add three-tier resolver fallback in reporter with a hardcoded
IP of last resort, and switch to raw WiFiClient with manual Host header
to bypass HTTPClient's brittle DNS path.
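
The tiered fallback can be sketched as an ordered list of resolver attempts ending in a fixed address. This is an illustration only: the Resolver type, resolve_with_fallback, and the addresses used in any caller are hypothetical, and the commit does not say which resolvers form the first tiers.

```cpp
#include <functional>
#include <string>
#include <vector>

// A resolver tier: writes the address to `ip_out` and returns true on success.
// (Hypothetical interface; the firmware's real resolver calls are not shown.)
using Resolver = std::function<bool(const std::string& host, std::string& ip_out)>;

// Try each tier in order; fall back to a hardcoded IP if all of them fail.
std::string resolve_with_fallback(const std::string& host,
                                  const std::vector<Resolver>& tiers,
                                  const std::string& last_resort_ip) {
    for (const Resolver& tier : tiers) {
        std::string ip;
        if (tier(host, ip) && !ip.empty()) return ip;
    }
    return last_resort_ip;
}
```

Connecting to the returned address with a raw client then requires sending the original hostname in the Host header, which matches the manual Host header mentioned above.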

Migration touches for NimBLE 2.x:
- NimBLEAdvertisedDeviceCallbacks -> NimBLEScanCallbacks
- onResult signature now takes const NimBLEAdvertisedDevice*
- setAdvertisedDeviceCallbacks -> setScanCallbacks
- start(0, nullptr, false) -> start(0, false, false)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:34:17 -07:00
a795cfa0ad fix(firmware): reboot on FATAL failures + emit NTP_SYNC + server-coord warning
- Config-load and camera-init FATAL branches now reboot (3s LED signal
  before restart) instead of hanging forever. Matches the enum name
  REBOOT_FATAL_* and makes camera-init failures diagnosable via the
  next boot's heartbeat recent_events. Config failures produce a
  visible reboot loop rather than a silent hang.
- Emit EVT_NTP_SYNC(seconds_since_boot) on the first NTP-synced
  reporter iteration so slow / failed NTP sync is a visible signal in
  the heartbeat's recent_events window.
- README "Deploying firmware 1.1" now opens with a "Before you flash"
  warning directing the operator to land server-side heartbeat
  schema changes first (migration 005 + stub integration) to avoid a
  strict-schema 4xx reboot loop after deployment.
2026-04-23 14:10:32 -07:00
d943b3df5a feat(firmware): log reason before FATAL hang loops
Two FATAL while(true) hangs in main.cpp (config load fail, camera init
fail) previously relied on the hardware watchdog to reboot the device,
leaving the cause invisible beyond a generic TWDT reset reason. Now
each path logs EVT_REBOOT with REBOOT_FATAL_CONFIG or REBOOT_FATAL_CAMERA
before hanging, so the next heartbeat's recent_events surfaces which
branch hung. Server-side decoder updated for the two new enum values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:03:57 -07:00
5c9f5df0ce feat(firmware): include diagnostics in heartbeat payload
Heartbeat v1.1.0 now carries heap stats (free + min_free since boot),
esp_reset_reason(), last WiFi disconnect code, and the last 8
persisted event-log entries. Makes field failures diagnosable
server-side without retrieving the device: the post-reboot heartbeat
will include EVT_BOOT with reset reason and whatever EVT_WIFI_DOWN
or EVT_HTTP_FAIL entries preceded it.
2026-04-23 13:54:55 -07:00
f08f70a8fb feat(firmware): software heartbeat-miss watchdog reboots after 6h offline
Reporter task counts consecutive heartbeat failures from the bool
returned by reporter_heartbeat (Task 5). After 6 consecutive misses
(~6 hours at the hourly cadence) the device logs EVT_HEARTBEAT_MISS
then EVT_REBOOT(REBOOT_HEARTBEAT_MISS) and restarts, giving the whole
network stack a clean reinitialization. The 200ms delay before the
restart lets NVS commit the REBOOT entry so the next boot can report
it via EVT_BOOT + esp_reset_reason().
2026-04-23 13:52:07 -07:00
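
A minimal sketch of the consecutive-miss counter from f08f70a8fb, assuming a success resets the count (implied by "consecutive"). The class name and interface are hypothetical; event logging, the 200ms delay, and the restart itself are left to the caller.

```cpp
// Counts consecutive heartbeat failures and signals when to reboot.
class HeartbeatMissWatchdog {
public:
    explicit HeartbeatMissWatchdog(int limit) : limit_(limit) {}

    // Feed in the bool returned by the heartbeat call.
    // Returns true once `limit` consecutive misses have been seen.
    bool record(bool heartbeat_ok) {
        if (heartbeat_ok) {
            misses_ = 0;   // any success breaks the streak
            return false;
        }
        ++misses_;
        return misses_ >= limit_;
    }

    int misses() const { return misses_; }

private:
    int limit_;
    int misses_ = 0;
};
```

With a limit of 6 and an hourly heartbeat, record() first returns true on the sixth consecutive failure, which is roughly six hours offline.
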
7b546d0ed7 feat(firmware): enable task watchdog on camera/reporter/loop tasks
30s TWDT subscribes all three long-running tasks and panics on hang.
The reporter task's retry loop explicitly feeds between attempts so
the 3-try sequence (worst case 52s) does not itself trip the dog.
Reset reason on next boot is visible via esp_reset_reason() which
EVT_BOOT already logs.
2026-04-23 13:49:05 -07:00
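
One reading that reproduces the 52s worst case in 7b546d0ed7 uses the timeouts from 8f8ad0b1b0: each of the 3 attempts can spend 5s connecting plus 10s waiting for a response, and the backoffs add 0s, 2s and 5s. The constant names below are hypothetical, and this decomposition is an assumption rather than something the commit states.

```cpp
constexpr int kAttempts = 3;
constexpr int kConnectTimeoutS = 5;
constexpr int kResponseTimeoutS = 10;
constexpr int kBackoffS[kAttempts] = {0, 2, 5};
constexpr int kTwdtTimeoutS = 30;

// Sum of backoff plus both timeouts over every attempt.
constexpr int worst_case_seconds() {
    int total = 0;
    for (int i = 0; i < kAttempts; ++i) {
        total += kBackoffS[i] + kConnectTimeoutS + kResponseTimeoutS;
    }
    return total;
}

// 3 * (5 + 10) + (0 + 2 + 5) = 52
static_assert(worst_case_seconds() == 52, "worst case should be 52s");
// The whole sequence outlasts the watchdog, so it must be fed between attempts.
static_assert(worst_case_seconds() > kTwdtTimeoutS, "sequence exceeds the TWDT timeout");
```
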
8f8ad0b1b0 fix(firmware): add HTTP timeouts + 3-try retry, report heartbeat status
Unbounded TLS/HTTP POSTs were blocking the reporter task indefinitely
on weak WiFi. Now: 5s connect timeout, 10s response timeout, 3 attempts
with 0/2s/5s backoff. Every attempt logs HTTP_OK or HTTP_FAIL to the
event log. reporter_heartbeat now returns bool so the caller can count
consecutive misses.
2026-04-23 13:44:17 -07:00
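
The 3-try sequence with 0/2s/5s backoff in 8f8ad0b1b0 can be sketched as follows. The attempt and sleep callbacks are hypothetical injection points so the loop runs off-device; the real code would perform the POST with its connect and response timeouts and log HTTP_OK or HTTP_FAIL per attempt.

```cpp
#include <functional>

struct RetryResult {
    bool ok;       // true if any attempt succeeded
    int attempts;  // number of attempts actually made
};

// Waits the scheduled backoff before each attempt and stops at the first success.
RetryResult post_with_retry(const std::function<bool()>& attempt,
                            const std::function<void(int)>& sleep_seconds) {
    static const int kBackoffS[] = {0, 2, 5};
    int tries = 0;
    for (int delay : kBackoffS) {
        if (delay > 0) sleep_seconds(delay);
        ++tries;
        if (attempt()) return {true, tries};
    }
    return {false, tries};
}
```

Returning the outcome to the caller mirrors reporter_heartbeat returning bool so that consecutive misses can be counted.
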
af3067d481 refactor(firmware): drive WiFi reconnect from net_guard events
loop() no longer blocks for 5s after a disconnect; reconnect is
scheduled from the WiFi event handler with exponential backoff.
Buffered reports flush on every clean UP transition.
2026-04-23 13:36:29 -07:00
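
A sketch of a capped exponential backoff schedule such as the one af3067d481 describes for reconnects. The commit gives no base delay or cap, so the 1s base and 60s cap below are assumptions, as is the function name.

```cpp
#include <cstdint>

// Delay before reconnect attempt number `attempt` (0-based): base * 2^attempt,
// clamped to cap_ms. Doubling stops once the cap is reached, so it cannot overflow.
std::uint32_t reconnect_delay_ms(unsigned attempt,
                                 std::uint32_t base_ms = 1000,
                                 std::uint32_t cap_ms = 60000) {
    std::uint64_t delay = base_ms;
    for (unsigned i = 0; i < attempt && delay < cap_ms; ++i) {
        delay *= 2;
    }
    return static_cast<std::uint32_t>(delay < cap_ms ? delay : cap_ms);
}
```

Scheduling the next attempt from the WiFi event handler with this delay, instead of sleeping in loop(), is what keeps the main loop from blocking.
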
95724bf3ff feat(firmware): log boot and reboot reason to event log
Every boot logs EVT_BOOT with esp_reset_reason(); every deliberate
ESP.restart() is preceded by EVT_REBOOT with a reason code. This
gives us a persistent answer to 'why did the device just reboot?'.
2026-04-23 13:21:23 -07:00
a37207b6ff feat: event-based walker detector tuned to real 7' overhead mount
Replace per-track line-crossing counter with a single event state machine
gated by foreground pixel count (ENTER=250, EXIT=150) and finalized by
quiet-exit or timeout. Direction inferred from centroid excursion
(up_score vs down_score) on quiet-exit fires, and from net displacement
(last_c vs first_c) on timeout fires.
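
The ENTER/EXIT gating above is a hysteresis gate, sketched here in simplified form. Only the 250/150 thresholds come from the commit; the class, the comparison operators, the single-frame quiet exit, and the max_frames parameter are assumptions, and direction scoring is omitted.

```cpp
enum class Fire { None, QuietExit, Timeout };

// Opens an event when foreground pixels reach `enter`, and closes it when they
// drop below `exit` (quiet exit) or when the event lasts `max_frames` frames.
class EventGate {
public:
    EventGate(int enter, int exit, int max_frames)
        : enter_(enter), exit_(exit), max_frames_(max_frames) {}

    Fire step(int fg_pixels) {
        if (!active_) {
            if (fg_pixels >= enter_) {
                active_ = true;
                frames_ = 1;
            }
            return Fire::None;
        }
        ++frames_;
        if (fg_pixels < exit_) {
            active_ = false;
            return Fire::QuietExit;
        }
        if (frames_ >= max_frames_) {
            active_ = false;
            return Fire::Timeout;
        }
        return Fire::None;
    }

    bool active() const { return active_; }

private:
    int enter_;
    int exit_;
    int max_frames_;
    bool active_ = false;
    int frames_ = 0;
};
```

Because EXIT is lower than ENTER, a count that hovers between 150 and 250 keeps an open event open without being able to start a new one, which avoids chatter around a single threshold.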

Tuning reflects bench data at the intended 7' overhead mount: walkers
produce smaller centroid excursions than originally modelled, so
EXTENT gates, MIN_TRAJ, MAX_FRAMES and REFRACTORY were all relaxed from
their initial guesses. Constants and rationale live in firmware/lib/cv/cv.h.

Bench results (8 isolated walks, 4 entries + 4 exits):
  * Event detection: 8/8 (100%)
  * Aggregate entries+exits split: 4+4 (matches)
  * Per-walk direction labelling: 4/8 (50%)

Document explicitly that per-walk direction is unreliable at this mount
and that downstream analytics should trust only gross traffic
(entries + exits). Recovering direction would require a physical mount
change or a richer signal; both are out of scope for v1.

Tooling:
  * tools/replay_logs.py — replay event state machine against captured
    [F] diagnostic lines, for offline tuning without flash-test loops.
  * firmware/src/main_capture.cpp + tools/capture_frames.py +
    tools/replay_frames.py — raw-frame capture firmware and Python port
    of the detector, kept in tree for future iteration even though the
    TimerCamera-F serial driver stripped specific byte ranges in testing
    and log-based replay became the working path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 16:03:36 -07:00
3b471992f2 feat(cv): directional once-per-track counting + detection LED blinks
A single person walking under the overhead camera was generating both an
entry and an exit within a few seconds — the line-crossing logic treated
a blob's traversal into one side of the frame and out the other as two
separate events whenever the track spawned near the line, oscillated
against shadows, or churned at creation.

Replaced line-crossing semantics with directional traversal:
- Each track records spawn_y at creation and a counted flag.
- An event fires only if the track is not yet counted, spawned firmly on
  one side of the line (|spawn_y - line_y| > CV_TRAVERSAL_MARGIN_PX),
  and is now firmly on the opposite side. Direction of travel determines
  entry vs exit. The track is then flagged counted: one trip, one count.
- Cooldown remains as a secondary safety net.
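
The traversal rule in the list above can be sketched as a single check per track update. The Track struct and function name are hypothetical, margin_px stands in for CV_TRAVERSAL_MARGIN_PX, and mapping increasing y to "entry" is an assumption the commit does not confirm.

```cpp
#include <cstdlib>

enum class Crossing { None, Entry, Exit };

struct Track {
    int spawn_y;           // y position recorded at track creation
    bool counted = false;  // set once the track has produced an event
};

// Fires at most once per track, and only for a firm side-to-side traversal.
Crossing check_traversal(Track& track, int y, int line_y, int margin_px) {
    if (track.counted) return Crossing::None;
    const int spawn_off = track.spawn_y - line_y;
    const int now_off = y - line_y;
    if (std::abs(spawn_off) <= margin_px) return Crossing::None;     // spawned too near the line
    if (std::abs(now_off) <= margin_px) return Crossing::None;       // not yet firmly across
    if ((spawn_off < 0) == (now_off < 0)) return Crossing::None;     // still on the spawn side
    track.counted = true;
    // Assumed mapping: spawning above the line and moving down is an entry.
    return spawn_off < 0 ? Crossing::Entry : Crossing::Exit;
}
```

A track that wobbles within the margin never fires, and a track that reverses after being counted cannot fire again, which corresponds to the two new test cases the commit mentions.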

main.cpp: single/double LED pulse on entry/exit detections. Saves and
restores the current LED state so upload (yellow-on) and no-WiFi
indicators aren't clobbered.

Tests updated to walk blobs beyond the margin and register two new
cases: wobble-at-line doesn't count, and a reversed full traversal
doesn't double-count on the same track.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 09:46:59 -07:00
9d5b588231 feat: production-ready firmware with BLE memory management, device_id fixes, and docs
- Reduce debug level to 1 (errors only) for production builds
- Replace BLE pause/resume with full deinit/reinit during HTTP uploads (~25KB freed)
- Add 60s boot report delay for fast post-deploy connectivity verification
- Add device_id to BLE batch and heartbeat request bodies
- Correct API host to http:// (plain HTTP, not HTTPS)
- Add HTTP response logging and CV entry/exit serial logging
- Create root README.md with operator setup and architecture overview
- Update design spec: HMAC format, BLE memory approach, request body shapes, reporting intervals

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 11:13:50 -07:00
4b671843b3 fix: three stack overflows crashing firmware on TimerCamera-F
loopTask: cv_init() created a CVState{} temporary (9KB background
array) on the stack — fixed by initializing members directly.

cam task: cv_process() had uint8_t fg[CV_PIXELS] (9KB) as a local
variable — made static, matching the existing fg_copy fix.

cam task stack bumped from 4096 to 8192 for headroom.

Also: switch to 4MB OTA partition table (TimerCamera-F has 4MB flash,
not 8MB), add CONFIG_ARDUINO_LOOP_STACK_SIZE=16384 build flag,
upload_speed=115200 and --no-stub for reliable CH340 flashing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 10:58:06 -07:00
135eb3b46c fix: HMAC format — match server POST\npath\ntimestamp\nsha256(body) scheme
- hmac_sign now takes method+path instead of device_id; builds message as
  method\npath\ntimestamp\nhex(sha256(body)) per server verify_device_hmac
- reporter: header renamed X-HMAC-Signature → X-Signature; passes "POST"+path
- test vector regenerated against new message format; timestamp-diff test updated
- .size() → .length() throughout (Arduino String has no size())
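
The signing message layout described above (method, path, timestamp and hex SHA-256 of the body, joined by newlines) can be sketched in portable C++. The function names are hypothetical, the digest is taken as an input rather than computed, and lowercase hex plus a decimal Unix timestamp are assumptions about the server's verify_device_hmac.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Lowercase hex encoding of a 32-byte SHA-256 digest.
std::string to_hex(const std::array<std::uint8_t, 32>& digest) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(64);
    for (std::uint8_t b : digest) {
        out.push_back(kDigits[b >> 4]);
        out.push_back(kDigits[b & 0x0F]);
    }
    return out;
}

// Builds "method\npath\ntimestamp\nhex(sha256(body))"; the HMAC is then
// computed over this string with the device key (not shown here).
std::string build_signing_message(const std::string& method,
                                  const std::string& path,
                                  std::uint64_t timestamp,
                                  const std::array<std::uint8_t, 32>& body_sha256) {
    return method + "\n" + path + "\n" + std::to_string(timestamp) + "\n" +
           to_hex(body_sha256);
}
```

Hashing the body first keeps the signed message short and fixed in shape regardless of payload size.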

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:47:13 -07:00
8a00665e4c fix: ArduinoOTA init, reporter mutex, BLE lock scope, NVS type
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:33:23 -07:00
36f4becbe9 fix: camera downscale — centered crop, explicit PSRAM frame buffer
2026-04-14 09:30:20 -07:00
121f7a0a0a fix: main.cpp — static frame buffer, mutex for cv state, NTP init guard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 06:59:09 -07:00
49da51bc05 feat: main.cpp — FreeRTOS tasks, LED indicators, factory reset
Replaces empty stub with full application: camera+CV task on core 1 at
5 fps, hourly reporter task on core 0, WiFi reconnect loop, 5-second
factory reset via BOOT button (GPIO37), LED on GPIO2 for status.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 06:56:59 -07:00
29737d735a feat: WiFiManager captive portal provisioning
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 06:44:06 -07:00
988443f207 fix: reporter — correct re-buffer on POST failure, NTP guard, TLS note
- reporter_submit_camera/ble: cap batch to REPORTER_MAX_BUFFER before
  POST and assign whole capped batch back to buffer on failure, fixing
  silent record drop when batch > buffer capacity
- post_json: reject sends when ts < 1700000000 (clock not NTP-synced)
- post_json: add comment documenting intentional no-cert-validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 06:33:00 -07:00
244426ec8b feat: reporter — HMAC-signed hourly POST with 24-record offline buffer
Fix Arduino String .size() → .length() in hmac.cpp (pre-existing bug surfaced by compilation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 06:28:24 -07:00
6422e052df fix: ble_scanner sha256_prefix — guard mbedTLS null info and setup failure
2026-04-14 06:26:20 -07:00
ccbbf689cf feat: BLE passive scanner with RSSI bucketing and MAC hashing
Add passive BLE scan module using NimBLE for WiFi coexistence. Tracks
unique devices per hour with SHA256-hashed MACs, RSSI bucketing
(near/mid/far), max concurrent count, and thread-safe collect/reset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 06:24:49 -07:00
29808e07a6 fix: camera — null-check sensor handle before set_vflip/set_hmirror
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 17:34:38 -07:00
99756bdbaf feat: camera module — OV3660 init and 96x96 grayscale capture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 17:33:10 -07:00
74bff0912b fix: config_save_wifi — always write both credentials
Replace short-circuit boolean evaluation of putString return values with
separate size_t variables so both writes always execute regardless of
whether the first succeeds.
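
The short-circuit pitfall fixed here can be shown off-device with an injected write function. PutFn stands in for a putString-style call that returns the number of bytes written; the function names and the "ssid"/"pass" keys are hypothetical.

```cpp
#include <cstddef>
#include <functional>

// Stand-in for a putString-style write: returns bytes written, 0 on failure.
using PutFn = std::function<std::size_t(const char* key, const char* value)>;

// Buggy shape: if the first write returns 0, && short-circuits and the
// second write is never attempted.
bool save_short_circuit(const PutFn& put, const char* ssid, const char* pass) {
    return put("ssid", ssid) > 0 && put("pass", pass) > 0;
}

// Fixed shape: both writes always run; the results are combined afterwards.
bool save_both(const PutFn& put, const char* ssid, const char* pass) {
    const std::size_t ssid_written = put("ssid", ssid);
    const std::size_t pass_written = put("pass", pass);
    return ssid_written > 0 && pass_written > 0;
}
```

Both versions report failure when the first write fails, but only the second still attempts to store the password, so the two credentials are not left out of step by evaluation order.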

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:05:26 -07:00
d5afd0bd87 feat: config module — NVS read/write via Preferences
Add config.h/config.cpp for DeviceConfig NVS persistence using Arduino
Preferences library. Add minimal main.cpp stub. Fix partition table
overlap (nvs 0x6000→0x5000, otadata 0xf000→0xe000) so firmware builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:02:28 -07:00
6c46ea26ab chore: init PlatformIO project for TimerCamera-F
2026-04-13 13:31:38 -07:00