Peter Woolery 2d95069bd1 docs: network-resilience firmware 1.1 deployment + field diagnostic guide
Flash command, expected first-boot behavior, per-feature summary of the
1.1 release, 24-hour field-check playbook, and a reference table for
decoding the heartbeat's recent_events array.
2026-04-23 14:02:09 -07:00

DoorCounter

Retail door traffic counter using M5Stack TimerCamera-F (ESP32 + OV3660). Counts walker traversals via overhead camera CV, passively scans BLE foot traffic, and reports hourly to logs.research.bike.

Known limitation — directional accuracy. This firmware reports counts as {entries, exits} for API compatibility, but per-walk direction labelling is not reliable at the current mount (7' overhead, straight down). In bench testing, event detection was 100% (8/8 walks detected) while per-walk direction matched the physical walk only ~50% of the time — the centroid trajectories produced by entries and exits were nearly indistinguishable. The number to trust is gross traffic: entries + exits ≈ total walkers through the doorway. The directional split is an unreliable best-effort heuristic. See Directional counting for why.

Hardware

  • Device: M5Stack TimerCamera-F (ESP32-S, OV3660, PSRAM, WiFi/BLE)
  • Mount: Overhead, camera pointing straight down, centered above doorway
  • Power: USB (any phone charger)

Firmware

Built with PlatformIO. Target: timercam.

cd firmware
pio run -t upload --upload-port /dev/ttyUSB0

What it does

| Module | Behavior |
| --- | --- |
| CV pipeline | 5 fps, 96×96 grayscale, event-based walker detector (foreground-count state machine; centroid-trajectory direction heuristic) with post-fire refractory period |
| Detection LED | Single blink on entry, double blink on exit (preserves upload/no-WiFi status LED) |
| BLE scanner | Continuous passive scan; deinits during hourly upload to free heap |
| Reporter | Hourly HMAC-signed POST; 60s boot report for fast connectivity check |
| Provisioning | Captive portal AP on first boot for WiFi setup |
| OTA | Arduino OTA; operator push via ota_push.py |

Reporting intervals

  • First report: 60 seconds after NTP sync (connectivity check)
  • Subsequent reports: every 3600 seconds

Counting model — event-based walker detector

The CV pipeline is a single event state machine (no per-blob tracking for counting). Per-frame foreground pixel count gates event start and end; centroid trajectory within the active event decides direction.

Event lifecycle:

  1. Idle → Active: fg_count ≥ CV_EVENT_ENTER_THRESH (250 px) fires event start. Background updates freeze while the event is active so the walker does not get absorbed into the baseline.
  2. Active accumulation: every frame updates first_c (once), min_c, max_c, last_c, min_y_seen, max_y_seen, and the frame count.
  3. Active → End (either):
    • Quiet exit: fg_count < CV_EVENT_EXIT_THRESH (150 px) for CV_EVENT_QUIET_FRAMES (3) consecutive frames — walker has left.
    • Timeout: event_frame_count > CV_EVENT_MAX_FRAMES (25 frames ≈ 5s).
  4. On end, the event is finalized: gated by minimum duration, vertical extent (must span a large fraction of the frame), and minimum centroid trajectory magnitude. Background snaps to the current frame.
  5. A refractory period (CV_EVENT_REFRACTORY_FRAMES = 10 ≈ 2s) after a fire blocks a new event from starting — absorbs residual lingering motion that would otherwise double-count.
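
The lifecycle above can be modelled in a few lines of Python. This is a minimal sketch for reasoning about the thresholds, not the firmware's implementation: the `EventGate` class is hypothetical, it omits the finalize gates from step 4, and it applies the refractory period after every event end rather than only after a fire.

```python
from typing import Optional

# Values quoted in the lifecycle above; names mirror the firmware constants.
CV_EVENT_ENTER_THRESH = 250
CV_EVENT_EXIT_THRESH = 150
CV_EVENT_QUIET_FRAMES = 3
CV_EVENT_MAX_FRAMES = 25
CV_EVENT_REFRACTORY_FRAMES = 10


class EventGate:
    """Hypothetical idle/active/refractory gate driven by foreground counts."""

    def __init__(self) -> None:
        self.active = False
        self.frame_count = 0
        self.quiet_frames = 0
        self.refractory_left = 0

    def step(self, fg_count: int) -> Optional[str]:
        """Feed one frame; return 'quiet' or 'timeout' when an event ends."""
        if not self.active:
            if self.refractory_left > 0:
                # Refractory: ignore foreground until the window has elapsed.
                self.refractory_left -= 1
                return None
            if fg_count >= CV_EVENT_ENTER_THRESH:
                self.active = True
                self.frame_count = 1
                self.quiet_frames = 0
            return None

        self.frame_count += 1
        # Quiet frames must be consecutive, so any busy frame resets the run.
        if fg_count < CV_EVENT_EXIT_THRESH:
            self.quiet_frames += 1
        else:
            self.quiet_frames = 0

        if self.quiet_frames >= CV_EVENT_QUIET_FRAMES:
            return self._end("quiet")
        if self.frame_count > CV_EVENT_MAX_FRAMES:
            return self._end("timeout")
        return None

    def _end(self, reason: str) -> str:
        self.active = False
        self.refractory_left = CV_EVENT_REFRACTORY_FRAMES
        return reason
```

Feeding the gate a recorded sequence of `fg_count` values is a cheap way to see how a threshold change would alter event boundaries before reflashing a device.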

Direction heuristic (applied only if the event passes all gates):

  • up_score = first_c - min_c (how far centroid excursed upward)
  • down_score = max_c - first_c (how far it excursed downward)
  • Quiet-exit events: is_entry = (up_score ≥ down_score)
  • Timeout events: is_entry = (last_c < first_c) — net displacement is more reliable than excursion when the walker is still in frame at timeout.

Per-mount convention: centroid moving up through the frame (y decreasing) = entry into the store.
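
As a compact restatement of the heuristic, the sketch below reads both scores as differences from the first centroid (y decreasing means "up"). The function name `classify_direction` and its signature are illustrative and are not taken from cv.cpp; the real logic lives in the finalize code.

```python
def classify_direction(first_c: float, min_c: float, max_c: float,
                       last_c: float, timed_out: bool) -> bool:
    """Return True for an entry, False for an exit (per-mount convention)."""
    if timed_out:
        # Timeout events: use net displacement, since the walker may still
        # be in frame and peak excursion is less informative.
        return last_c < first_c
    up_score = first_c - min_c      # excursion toward the top of the frame
    down_score = max_c - first_c    # excursion toward the bottom
    return up_score >= down_score   # ties resolve to entry


# A quiet-exit event whose centroid rose 28 px and fell 12 px reads as entry.
print(classify_direction(48, 20, 60, 50, timed_out=False))
```

Note that ties resolve to "entry", which is one more reason to treat the directional split as best-effort.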

Directional counting — known limitation

Per-walk direction labelling is unreliable at the current mount. In bench testing (8 alternating entry/exit walks at 4s intervals, 7' overhead mount pointing straight down):

  • Event detection: 8/8 (100%) — every walk produced exactly one event.
  • Aggregate split: 4 entries + 4 exits — matches the 4+4 ground truth.
  • Per-walk direction: 4/8 (50%) — essentially a coin flip.

At this mount, entries and exits produce nearly identical centroid trajectories: both begin near mid-frame (walker is already large when fg_count crosses 250), both reach a peak excursion toward the top, and both end near mid-frame (walker's tail is still visible when fg_count drops below 150). No heuristic over the recorded centroid statistics separates them with better than ~50% accuracy on alternating walks.

What we ship, and what the server should trust:

  • Gross traffic (entries + exits) is accurate. This is the number downstream analytics should use as "people through the door this hour."
  • Directional split is reported but unreliable. Treat individual entries and exits values as a best-effort labelling. Do not infer net flow or dwell from them.
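
On the server side, that guidance amounts to summing the two fields and ignoring the split. A minimal sketch, assuming each hourly report is available as a dict with `entries` and `exits` keys (the exact payload shape is defined in the design spec, not here):

```python
from typing import Dict, Iterable


def gross_traffic(report: Dict[str, int]) -> int:
    """People through the door for one report: entries + exits."""
    return int(report["entries"]) + int(report["exits"])


def daily_gross(reports: Iterable[Dict[str, int]]) -> int:
    """Sum gross traffic over a sequence of hourly reports."""
    return sum(gross_traffic(r) for r in reports)
```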

Recovering per-walk direction would require either a physical change (raise or tilt the camera so walkers enter/leave through the frame edges) or a richer signal than centroid statistics (e.g. time-resolved optical flow, or a second sensor). That work is out of scope for v1.

See firmware/lib/cv/cv.h for tuning constants and cv.cpp for the finalize logic.

Operator Setup

1. Flash firmware

cd firmware
pio run -t upload --upload-port /dev/ttyUSB0

2. Provision device identity

python tools/flash_device.py \
  --port /dev/ttyUSB0 \
  --device-id dc-0042 \
  --location-id retailer-123 \
  --hmac-secret <32-byte-hex> \
  --wifi-ssid "StoreWiFi" \
  --wifi-password "secret"

WiFi credentials are optional. If omitted, the device starts the captive portal on boot.

Re-provision after firmware uploads. Flashing firmware via pio run -t upload may clear the NVS partition on this board. If the device boots into a ~1 Hz LED blink (the "not provisioned" fatal state) after a firmware update, re-run flash_device.py with the same credentials. See Troubleshooting.

3. OTA updates

python tools/ota_push.py \
  --host dc-0042.local \
  --firmware firmware/.pio/build/timercam/firmware.bin

End User Setup

  1. Mount device overhead, camera pointing straight down
  2. Plug into USB power
  3. Connect phone to DoorCounter-Setup WiFi
  4. Browser opens automatically → enter store WiFi password → done

LED indicators: Red = no WiFi · Blue = counting · Yellow = uploading · Brief flash (×1) on entry · Brief flash (×2) on exit

API

Endpoint: http://logs.research.bike

| Endpoint | Data |
| --- | --- |
| POST /api/v1/camera/events/batch | Hourly entry/exit counts |
| POST /api/v1/events/batch | Hourly BLE proximity records |
| POST /api/v1/heartbeat | Device health (uptime, RSSI, pending records) |

All requests are HMAC-SHA256 signed. See design spec for full API shapes and auth scheme.
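
For orientation only, here is what HMAC-SHA256 signing and verification look like with Python's standard library. Everything about the scheme below is an assumption: that the signature covers the raw request body, that it is hex-encoded, and that the key is the hex string passed to `--hmac-secret`. The design spec is the authority on what is actually signed and how it is transmitted.

```python
import hashlib
import hmac


def sign_body(secret_hex: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the body (assumed scheme, see the design spec)."""
    key = bytes.fromhex(secret_hex)
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_body(secret_hex: str, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison, to avoid leaking the match position."""
    return hmac.compare_digest(sign_body(secret_hex, body), signature_hex)
```

Whatever the real scheme turns out to be, the verifier should use `hmac.compare_digest` rather than `==`.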

Project Structure

DoorCounter/
├── firmware/
│   ├── platformio.ini
│   ├── lib/
│   │   ├── cv/            — CV pipeline (event state machine, centroid-trajectory direction)
│   │   └── hmac/          — HMAC-SHA256 signing library
│   └── src/
│       ├── main.cpp       — FreeRTOS tasks, boot sequence
│       ├── config.*       — NVS read/write
│       ├── provisioning.* — captive portal
│       ├── camera.*       — frame capture + CV pipeline
│       ├── ble_scanner.*  — BLE passive scan
│       └── reporter.*     — hourly batch POST + local buffer
├── tools/
│   ├── flash_device.py    — NVS provisioning script
│   ├── ota_push.py        — OTA push script
│   └── serial_monitor.py  — reset + read serial with timestamps (diagnostic)
├── docs/
│   ├── server-prompt-crossing-cooldown.md — server-side coordination notes
│   └── superpowers/specs/2026-04-13-door-counter-design.md
└── server/                — API server (separate deployment)

Troubleshooting

| Symptom | Likely cause | Remedy |
| --- | --- | --- |
| ~1 Hz LED blink after boot, no serial beyond esp_core_dump_flash: No core dump partition found! | NVS missing device_id / location_id / hmac_secret. Commonly triggered by a firmware upload wiping NVS. | Re-run flash_device.py with the device's known credentials. |
| Device stays on DoorCounter-Setup AP instead of joining customer WiFi | SSID/password in NVS wrong, or network out of range. | Connect phone to DoorCounter-Setup → captive portal → re-enter WiFi. Or reflash NVS with correct --wifi-ssid / --wifi-password. |
| No entries/exits counted at a doorway with known foot traffic | WiFi captive portal still up (camera task starts only after connect); or camera blocked/unfocused. | Check LED: solid on = booting/uploading, off = counting. Run serial_monitor.py to see [CV] entry/exit log lines. |

Capture a boot log with timestamps:

python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30

Deploying firmware 1.1 (network resilience)

Flash command

cd firmware && pio run -e timercam -t upload

Expected first boot

On the serial log (115200 baud), the device prints the boot banner, then initializes event_log, then records the reset reason via EVT_BOOT. The first heartbeat fires roughly 60-70s after power-on (15s WiFi busy-wait + NTP sync + 60s BOOT_REPORT_DELAY_S). Monitor with pio device monitor or:

python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 90

What's new in 1.1

  • Event-driven WiFi reconnect with 1s→60s exponential backoff (net_guard module); disconnect reasons logged.
  • HTTP timeouts (5s connect / 10s response) + 3-try retry on every POST.
  • ESP-IDF Task Watchdog (30s) on camera, reporter, and loop tasks; panic → reboot → reason surfaces in the next heartbeat.
  • Software heartbeat-miss watchdog: 6 consecutive missed heartbeats (~6 h) triggers a clean reboot.
  • Persistent NVS event-log ring buffer (32 entries) surfaced in the heartbeat's recent_events field.
  • New heartbeat fields: reset_reason, heap_free, heap_min_free, last_disconnect_code, recent_events.
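
To make the reconnect and retry behaviour concrete, the sketch below computes a 1s→60s backoff schedule and wraps a call in a 3-try retry. It assumes a doubling factor and no jitter, neither of which is stated above, and the function names are illustrative rather than taken from the net_guard module.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(base_s: float = 1.0, cap_s: float = 60.0,
                   attempts: int = 8) -> List[float]:
    """Delay before each reconnect attempt, doubling until capped."""
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap_s)
    return delays


def with_retries(send: Callable[[], T], tries: int = 3) -> T:
    """Call send() up to `tries` times, re-raising the last OSError."""
    last_error = None
    for _ in range(tries):
        try:
            return send()
        except OSError as exc:  # network-level failure; try again
            last_error = exc
    raise last_error


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Under these assumptions the schedule reaches the 60s cap on the seventh attempt, so a device that has been offline for several minutes retries about once a minute.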

24-hour field checks

After deploying a device, run through this checklist against the server's heartbeat records at the 24-hour mark:

  • Heartbeat count ≥ 22 — ≥ 92% uptime across 24 h at the hourly cadence.
  • No sustained t=6 (EVT_HEARTBEAT_MISS) entries in recent_events — transient singletons are expected; repeated misses indicate a sticky network problem worth investigating.
  • heap_min_free stable day over day — a downward drift indicates a leak. Alert threshold: min-free drops by more than 20% vs baseline.
  • last_disconnect_code matches known AP behavior — reason 8 (assoc lost) and reason 15 (4-way handshake timeout) are common on busy APs; recurring reason 200+ indicates a firmware bug.
  • reset_reason has no unexpected values — see table below.
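
Values are the numeric esp_reset_reason_t codes returned by esp_reset_reason() in ESP-IDF.

| reset_reason | Meaning | Expected? |
| --- | --- | --- |
| 1 | Power-on (ESP_RST_POWERON) | Normal immediately after a deployment. |
| 3 | Software reset (our ESP.restart(); ESP_RST_SW) | Correlate with EVT_REBOOT in recent_events. |
| 6 | Task watchdog (ESP_RST_TASK_WDT) | Investigate: a task hung for 30s. |
| 9 | Brownout (ESP_RST_BROWNOUT) | Investigate power supply / USB cable. |
| 10 | SDIO reset (ESP_RST_SDIO) | Unusual; investigate. |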
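
Parts of this checklist are easy to automate. The sketch below covers the heartbeat count, the heap drift threshold, and repeated heartbeat misses. It assumes heartbeat records arrive as dicts ordered oldest to newest with `heap_min_free` and `recent_events` keys, and it treats more than one miss entry as "sustained"; both the record shape and that cutoff are assumptions to adjust against the real server schema.

```python
from typing import Dict, List

MIN_HEARTBEATS_24H = 22   # checklist: at least 22 of 24 hourly heartbeats
HEAP_DROP_ALERT = 0.20    # checklist: alert above a 20% drop vs baseline
EVT_HEARTBEAT_MISS = 6    # tag value used in recent_events


def check_24h(heartbeats: List[Dict], baseline_heap_min_free: int) -> List[str]:
    """Return human-readable findings; an empty list means all checks passed."""
    findings = []
    if len(heartbeats) < MIN_HEARTBEATS_24H:
        findings.append(f"only {len(heartbeats)} heartbeats in 24 h")
    if not heartbeats:
        return findings

    lowest = min(hb["heap_min_free"] for hb in heartbeats)
    drop = (baseline_heap_min_free - lowest) / baseline_heap_min_free
    if drop > HEAP_DROP_ALERT:
        findings.append(f"heap_min_free dropped {drop:.0%} vs baseline")

    # Assumption: newest record is last; >1 miss entry counts as sustained.
    misses = [e for e in heartbeats[-1].get("recent_events", [])
              if e.get("t") == EVT_HEARTBEAT_MISS]
    if len(misses) > 1:
        findings.append(f"{len(misses)} heartbeat-miss events in recent_events")
    return findings
```

The `reset_reason` and `last_disconnect_code` checks are left out on purpose, since what counts as "expected" depends on the site and the deployment history.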

Decoding recent_events

The recent_events array is a ring buffer of {t, d0, d1, ts} entries. Tag definitions live in firmware/lib/event_log/event_log.h:

| t | Event | d0 | d1 |
| --- | --- | --- | --- |
| 1 | EVT_BOOT | esp_reset_reason() | |
| 2 | EVT_WIFI_UP | RSSI | |
| 3 | EVT_WIFI_DOWN | disconnect reason code; 0xFF = silent-death fallback | |
| 4 | EVT_HTTP_OK | fnv1a-16 path hash | elapsed ms (capped at 65535) |
| 5 | EVT_HTTP_FAIL | path hash | HTTP status or negative errno cast to uint16 |
| 6 | EVT_HEARTBEAT_MISS | consecutive miss count | |
| 7 | EVT_NTP_SYNC | reserved | |
| 8 | EVT_REBOOT | RebootReason: 1=HEARTBEAT_MISS, 2=FACTORY_RESET, 3=OTA, 4=WIFI_REPROV | |

Server-side decoder tables (EVENT_TAG_DECODER, REBOOT_REASON_DECODER) live in server/heartbeat_diagnostics_stub.py.
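
For a quick look at a heartbeat without the server code at hand, a decoder along these lines is enough. It is a sketch, not a copy of heartbeat_diagnostics_stub.py: the dict and function names are hypothetical, the tag and reason values come from the table above, and it assumes the reboot reason is carried in `d0`.

```python
from typing import Dict

EVENT_TAGS = {
    1: "EVT_BOOT",
    2: "EVT_WIFI_UP",
    3: "EVT_WIFI_DOWN",
    4: "EVT_HTTP_OK",
    5: "EVT_HTTP_FAIL",
    6: "EVT_HEARTBEAT_MISS",
    7: "EVT_NTP_SYNC",
    8: "EVT_REBOOT",
}

REBOOT_REASONS = {
    1: "HEARTBEAT_MISS",
    2: "FACTORY_RESET",
    3: "OTA",
    4: "WIFI_REPROV",
}


def describe_event(event: Dict[str, int]) -> str:
    """Render one {t, d0, d1, ts} entry as a short label."""
    tag = event["t"]
    name = EVENT_TAGS.get(tag, f"UNKNOWN({tag})")
    d0 = event.get("d0", 0)
    d1 = event.get("d1", 0)
    if name == "EVT_REBOOT":
        # Assumption: the RebootReason value is stored in d0.
        reason = REBOOT_REASONS.get(d0, f"UNKNOWN({d0})")
        return f"{name} reason={reason}"
    return f"{name} d0={d0} d1={d1}"


print(describe_event({"t": 8, "d0": 3, "d1": 0, "ts": 0}))  # EVT_REBOOT reason=OTA
```

Unknown tags and reasons are rendered rather than raised, so a heartbeat from newer firmware still decodes.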