DoorCounter

Retail door traffic counter using M5Stack TimerCamera-F (ESP32 + OV3660). Counts walker traversals via overhead camera CV, passively scans BLE foot traffic, and reports hourly to logs.research.bike.

Known limitations.

  • Directional accuracy. Counts are reported as {entries, exits} for API compatibility, but per-walk direction labelling is not reliable at the current mount (7' overhead, straight down). Bench testing: event detection 100% (8/8), per-walk direction ~50% (coin flip). Trust gross traffic: entries + exits ≈ total walkers. See Directional counting.
  • Detection latency. A walker takes 3–5 seconds from entering the FOV to being registered as a count, because the state machine waits for the walker to clear the frame (or a 5s timeout) before finalizing. Counts are not instantaneous; hourly aggregation is the intended consumption mode.

Hardware

| Component | Source | Notes |
| --- | --- | --- |
| Camera | M5Stack TimerCamera-F (OV3660 fisheye, PSRAM) | ESP32 + WiFi/BLE on board |
| USB cable | USB-A → USB-C, right-angle | Right-angle plug helps with overhead mounts |
| Power supply | 5V USB wall adapter | Any 5V/1A+ USB charger works |

  • Mount: Overhead, camera pointing straight down, centered above doorway (~7' / 2.1m height)
  • Power draw: ~750 mW measured at the wall (camera + WiFi + BLE all active). Runs cool: fanless, can be sealed in a small enclosure. That is about 6.6 kWh per year (0.75 W × 8,760 h), or roughly $1 per year at typical US residential rates.

Quick Start (semi-technical)

The fastest path from "box arrived" to "counts in the dashboard." Comfortable with a terminal but not necessarily an embedded developer? Start here.

You will need: the camera + cable + power supply listed above, a Linux/macOS computer with USB, and ~20 minutes.

1. Install the toolchain (one-time)

# Python 3.10+ and pip
pip install --user platformio esptool esp-idf-nvs-partition-gen

PlatformIO installs the ESP32 compiler on first build — expect a few minutes the first time.

2. Clone this repo

git clone https://github.com/<your-org>/DoorCounter.git
cd DoorCounter

3. Plug the camera in

Connect the USB-C cable to the TimerCamera and the other end to your computer. On Linux it appears as /dev/ttyUSB0; on macOS as /dev/tty.usbserial-*. If you don't see it, install CP210x USB drivers.

4. Flash the firmware

cd firmware
pio run -t upload --upload-port /dev/ttyUSB0

5. Provision the device with its credentials

Pick a unique device ID (e.g. dc-0001), a location ID, and generate a 32-byte HMAC secret. The server admin must record this same secret — counts won't be accepted without it.

# Generate a fresh secret
openssl rand -hex 32 > my-device-secret.txt

# Provision
python tools/flash_device.py \
  --port /dev/ttyUSB0 \
  --device-id dc-0001 \
  --location-id my-store \
  --hmac-secret "$(cat my-device-secret.txt)" \
  --wifi-ssid "MyStoreWiFi" \
  --wifi-password "wifi-password-here"

If you skip --wifi-ssid/--wifi-password, the device opens a DoorCounter-Setup WiFi access point on boot. Connect a phone to it and enter the credentials in the captive portal.

6. Mount the device

  1. Position above the doorway, camera lens pointing straight down (~7' / 2.1m up).
  2. Plug into the wall adapter — that's it. The LED turns red while joining WiFi, then off once it's counting.
  3. First heartbeat lands at the server roughly 60-70 seconds after power-on (60 seconds after NTP sync); first hourly count batch arrives at the top of the next hour.

What "working" looks like

  • LED behavior: off = counting normally · red = no WiFi · yellow = uploading · brief flash when a walker is registered (1 flash = entry, 2 flashes = exit).
  • A walker takes 3–5 seconds from entering the FOV to triggering the LED flash; this is normal.
  • Hourly uploads to logs.research.bike (or your configured server) include the entry/exit counts since the last report.

If something is off

| Symptom | Try |
| --- | --- |
| Red LED stays on | Wrong WiFi password: re-run step 5, or use the DoorCounter-Setup captive portal. |
| LED blinks ~1 Hz forever (or device reboots in a loop) | NVS got wiped: re-run step 5 with the same credentials. |
| No counts appearing on server | Run python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30 and watch for [CV] entry/exit lines as you walk under it. |

For deeper troubleshooting see Troubleshooting and Operator Setup.

Firmware

Built with PlatformIO. Target: timercam.

cd firmware
pio run -t upload --upload-port /dev/ttyUSB0

What it does

| Module | Behavior |
| --- | --- |
| CV pipeline | 5 fps, 96×96 grayscale, event-based walker detector (foreground-count state machine; centroid-trajectory direction heuristic) with post-fire refractory period |
| Detection LED | Single blink on entry, double blink on exit (preserves upload/no-WiFi status LED) |
| BLE scanner | Continuous passive scan; deinits during hourly upload to free heap |
| Reporter | Hourly HMAC-signed POST; 60s boot report for fast connectivity check |
| Provisioning | Captive portal AP on first boot for WiFi setup |
| OTA | Arduino OTA; operator push via ota_push.py |

Reporting intervals

  • First report: 60 seconds after NTP sync (connectivity check)
  • Subsequent reports: every 3600 seconds

Counting model — event-based walker detector

The CV pipeline is a single event state machine (no per-blob tracking for counting). Per-frame foreground pixel count gates event start and end; centroid trajectory within the active event decides direction.

Event lifecycle:

  1. Idle → Active: fg_count ≥ CV_EVENT_ENTER_THRESH (250 px) fires event start. Background updates freeze while the event is active so the walker does not get absorbed into the baseline.
  2. Active accumulation: every frame updates first_c (once), min_c, max_c, last_c, min_y_seen, max_y_seen, and the frame count.
  3. Active → End (either):
    • Quiet exit: fg_count < CV_EVENT_EXIT_THRESH (150 px) for CV_EVENT_QUIET_FRAMES (3) consecutive frames — walker has left.
    • Timeout: event_frame_count > CV_EVENT_MAX_FRAMES (25 frames ≈ 5s).
  4. On end, the event is finalized: gated by minimum duration, vertical extent (must span a large fraction of the frame), and minimum centroid trajectory magnitude. Background snaps to the current frame.
  5. A refractory period (CV_EVENT_REFRACTORY_FRAMES = 10 ≈ 2s) after a fire blocks a new event from starting — absorbs residual lingering motion that would otherwise double-count.
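
The lifecycle above can be modelled in a few lines of Python for offline experiments with recorded fg_count traces. This is a minimal sketch, not the firmware's code: the class and method names are hypothetical, the finalize gates and background handling are left out, and the refractory period is started on every event end for simplicity (the firmware applies it after a fire). The constants are the values quoted above; cv.h remains the authority.

```python
from dataclasses import dataclass

# Values quoted in the lifecycle description; the real constants live in cv.h.
CV_EVENT_ENTER_THRESH = 250       # px, idle -> active
CV_EVENT_EXIT_THRESH = 150        # px, counts toward a quiet exit
CV_EVENT_QUIET_FRAMES = 3         # consecutive quiet frames to end an event
CV_EVENT_MAX_FRAMES = 25          # about 5 s at 5 fps
CV_EVENT_REFRACTORY_FRAMES = 10   # about 2 s at 5 fps


@dataclass
class EventGate:
    """Hypothetical model of the idle / active / refractory gating only."""
    active: bool = False
    frames: int = 0
    quiet: int = 0
    refractory: int = 0

    def step(self, fg_count):
        """Feed one frame's foreground pixel count.

        Returns "quiet" or "timeout" on the frame that ends an event,
        otherwise None.
        """
        if not self.active:
            if self.refractory > 0:
                # Still in the refractory window: ignore motion entirely.
                self.refractory -= 1
                return None
            if fg_count >= CV_EVENT_ENTER_THRESH:
                self.active = True
                self.frames = 1
                self.quiet = 0
            return None

        self.frames += 1
        self.quiet = self.quiet + 1 if fg_count < CV_EVENT_EXIT_THRESH else 0
        if self.quiet >= CV_EVENT_QUIET_FRAMES:
            return self._end("quiet")
        if self.frames > CV_EVENT_MAX_FRAMES:
            return self._end("timeout")
        return None

    def _end(self, reason):
        self.active = False
        # Simplification: refractory starts on every end, not only on a fire.
        self.refractory = CV_EVENT_REFRACTORY_FRAMES
        return reason
```

Feeding the trace [0, 300, 300, 100, 100, 100] ends with a "quiet" result on the last frame; a walker who never leaves the frame ends with "timeout" once the frame count exceeds CV_EVENT_MAX_FRAMES.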

Direction heuristic (applied only if the event passes all gates):

  • up_score = first_c - min_c (how far centroid excursed upward)
  • down_score = max_c - first_c (how far it excursed downward)
  • Quiet-exit events: is_entry = (up_score ≥ down_score)
  • Timeout events: is_entry = (last_c < first_c) — net displacement is more reliable than excursion when the walker is still in frame at timeout.

Per-mount convention: centroid moving up through the frame (y decreasing) = entry into the store.
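
For reference, the heuristic reads as follows in Python. This is a sketch of the rules listed above, not the code in cv.cpp; the function name and argument order are hypothetical, and all centroid values are assumed to be y coordinates where smaller means higher in the frame.

```python
def classify_direction(first_c, min_c, max_c, last_c, timed_out):
    """Return True for an entry, False for an exit.

    Centroids are y coordinates; y decreasing means moving up through
    the frame, which this mount treats as entering the store.
    """
    if timed_out:
        # Walker still in frame at timeout: use net displacement.
        return last_c < first_c
    up_score = first_c - min_c      # peak excursion toward the top
    down_score = max_c - first_c    # peak excursion toward the bottom
    return up_score >= down_score
```

A quiet-exit event that started at y=48, peaked at y=20 and dipped to y=55 is labelled an entry (up_score 28 vs down_score 7). As the next section explains, this labelling is only about as good as a coin flip at the current mount.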

Directional counting — known limitation

Per-walk direction labelling is unreliable at the current mount. In bench testing (8 alternating entry/exit walks at 4s intervals, 7' overhead mount pointing straight down):

  • Event detection: 8/8 (100%) — every walk produced exactly one event.
  • Aggregate split: 4 entries + 4 exits — matches the 4+4 ground truth.
  • Per-walk direction: 4/8 (50%) — essentially a coin flip.

At this mount, entries and exits produce nearly identical centroid trajectories: both begin near mid-frame (walker is already large when fg_count crosses 250), both reach a peak excursion toward the top, and both end near mid-frame (walker's tail is still visible when fg_count drops below 150). No heuristic over the recorded centroid statistics separates them with better than ~50% accuracy on alternating walks.

What we ship, and what the server should trust:

  • Gross traffic (entries + exits) is accurate. This is the number downstream analytics should use as "people through the door this hour."
  • Directional split is reported but unreliable. Treat individual entries and exits values as a best-effort labelling. Do not infer net flow or dwell from them.

To actually recover per-walk direction would require either a physical change (raise or tilt the camera so walkers enter/leave through the frame edges) or a richer signal than centroid statistics (e.g. time-resolved optical flow, or a second sensor). That work is out of scope for v1.

See firmware/lib/cv/cv.h for tuning constants and cv.cpp for the finalize logic.

Operator Setup

1. Flash firmware

cd firmware
pio run -t upload --upload-port /dev/ttyUSB0

2. Provision device identity

Generate a fresh 32-byte HMAC secret (64 hex chars) and stash it where you won't lose it — the server must store the same value or counts will be rejected:

# Generate and save (one device per file; never commit these)
mkdir -p .agent
openssl rand -hex 32 > .agent/dc-0042-secret
chmod 600 .agent/dc-0042-secret

No openssl? Equivalents:

  • python3 -c 'import secrets; print(secrets.token_hex(32))'
  • head -c 32 /dev/urandom | xxd -p -c 64

Then provision:

python tools/flash_device.py \
  --port /dev/ttyUSB0 \
  --device-id dc-0042 \
  --location-id retailer-123 \
  --hmac-secret "$(cat .agent/dc-0042-secret)" \
  --wifi-ssid "StoreWiFi" \
  --wifi-password "secret"

WiFi credentials are optional; if omitted, the device starts the captive portal on boot.

Known-good command for dc-0002 (dev device at research.bike):

python tools/flash_device.py \
  --port /dev/ttyUSB0 \
  --device-id dc-0002 \
  --location-id retailer-123 \
  --hmac-secret "$(cat .agent/dc-0002-secret)" \
  --wifi-ssid Elly-Fi \
  --wifi-password "<ask>" \
  --line-offset 50

Secret is stored in .agent/dc-0002-secret (gitignored). Server must already know this secret — do not rotate without updating the server side.

Re-provision after firmware uploads. Flashing firmware via pio run -t upload may clear the NVS partition on this board.

  • FW 1.0: device boots into a ~1 Hz LED blink (hang in "not provisioned" fatal).
  • FW 1.1+: device reboot-loops with FATAL: device_id/location_id/hmac_secret not provisioned followed by rst:0xc (SW_CPU_RESET) (FATAL paths now reboot instead of hang).

Either way, re-run flash_device.py with the same credentials. See Troubleshooting.

3. OTA updates

python tools/ota_push.py \
  --host dc-0042.local \
  --firmware firmware/.pio/build/timercam/firmware.bin

End User Setup

  1. Mount device overhead, camera pointing straight down
  2. Plug into USB power
  3. Connect phone to DoorCounter-Setup WiFi
  4. Browser opens automatically → enter store WiFi password → done

LED indicators: Off = counting · Red = no WiFi · Yellow = uploading · Brief flash (×1) on entry · Brief flash (×2) on exit

API

Endpoint: http://logs.research.bike

| Endpoint | Data |
| --- | --- |
| POST /api/v1/camera/events/batch | Hourly entry/exit counts |
| POST /api/v1/events/batch | Hourly BLE proximity records |
| POST /api/v1/heartbeat | Device health (uptime, RSSI, pending records) |

All requests are HMAC-SHA256 signed. See design spec for full API shapes and auth scheme.
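
As a rough illustration of what HMAC-SHA256 signing involves, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption: that the 64-hex-char secret is decoded to raw bytes before use, that the signature covers the raw request body alone, and that it is carried as a hex string. The design spec defines the actual canonical string and header, so treat the function names here as hypothetical.

```python
import hashlib
import hmac


def sign_body(secret_hex, body):
    """Return a hex HMAC-SHA256 of `body` (bytes) under the device secret.

    Assumption: the provisioning secret is hex and is decoded to raw bytes.
    """
    key = bytes.fromhex(secret_hex)
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_body(secret_hex, body, signature_hex):
    """Constant-time comparison of an incoming signature against ours."""
    expected = sign_body(secret_hex, body)
    return hmac.compare_digest(expected, signature_hex)
```

The server-side check uses hmac.compare_digest rather than == so that comparison time does not leak how many leading characters matched.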

Project Structure

DoorCounter/
├── firmware/
│   ├── platformio.ini
│   ├── lib/
│   │   ├── cv/            — CV pipeline (event state machine, centroid-trajectory direction)
│   │   └── hmac/          — HMAC-SHA256 signing library
│   └── src/
│       ├── main.cpp       — FreeRTOS tasks, boot sequence
│       ├── config.*       — NVS read/write
│       ├── provisioning.* — captive portal
│       ├── camera.*       — frame capture + CV pipeline
│       ├── ble_scanner.*  — BLE passive scan
│       └── reporter.*     — hourly batch POST + local buffer
├── tools/
│   ├── flash_device.py    — NVS provisioning script
│   ├── ota_push.py        — OTA push script
│   └── serial_monitor.py  — reset + read serial with timestamps (diagnostic)
├── docs/
│   ├── server-prompt-crossing-cooldown.md — server-side coordination notes
│   └── superpowers/specs/2026-04-13-door-counter-design.md
└── server/                — API server (separate deployment)

Troubleshooting

| Symptom | Likely cause | Remedy |
| --- | --- | --- |
| ~1 Hz LED blink after boot (FW 1.0), OR reboot loop with FATAL: device_id/location_id/hmac_secret not provisioned followed by rst:0xc (SW_CPU_RESET) (FW 1.1+) | NVS missing device_id / location_id / hmac_secret. Commonly triggered by a firmware upload wiping NVS. FW 1.1+ reboots on FATAL instead of hanging. | Re-run flash_device.py with the device's known credentials (see section 2 for dc-0002). |
| Device stays on DoorCounter-Setup AP instead of joining customer WiFi | SSID/password in NVS wrong, or network out of range. | Connect phone to DoorCounter-Setup → captive portal → re-enter WiFi. Or reflash NVS with correct --wifi-ssid / --wifi-password. |
| No entries/exits counted for a known-walking doorway | WiFi captive portal still up (camera task starts only after connect); or camera blocked/unfocused. | Check LED: solid on = booting/uploading, off = counting. Run serial_monitor.py to see [CV] entry/exit log lines. |

Capture a boot log with timestamps:

python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30

Deploying firmware 1.1 (network resilience)

Before you flash

Firmware 1.1 adds five new fields to the POST /api/v1/heartbeat payload (reset_reason, heap_free, heap_min_free, last_disconnect_code, recent_events). The real server must accept these optional fields before you deploy firmware 1.1, or strict-schema validation will 4xx every heartbeat; after 6 consecutive misses (~6h) the heartbeat-miss watchdog will reboot the device, producing a reboot loop.

Reference migration and handler code for the real server are in this repo:

  • server/heartbeat_diagnostics_stub.py — Pydantic model extensions, store_heartbeat_diagnostics() helper, and EVENT_TAG_DECODER / REBOOT_REASON_DECODER reference tables.
  • server/migrations/005_heartbeat_diagnostics.sql — adds five nullable columns to the heartbeats table (adjust table name to match the real server's schema).

Copy the stub additions into the production server repo, run the migration, and confirm a v1.1.0-shape heartbeat returns 200 before you flash any device.
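
One detail to get right when storing the new fields: a v1.0 heartbeat omits all five, so a plain SET would overwrite a device's stored v1.1 diagnostics with NULL. Wrapping each parameter in COALESCE(?, column_name) keeps the existing value when the field is absent. The sketch below shows the pattern with sqlite3; the table layout (one row per device), the helper names, and the JSON encoding of recent_events are assumptions for illustration, not the contents of the stub or the migration.

```python
import json
import sqlite3

DIAGNOSTIC_COLUMNS = (
    "reset_reason", "heap_free", "heap_min_free",
    "last_disconnect_code", "recent_events",
)


def make_db():
    """In-memory stand-in for the real heartbeats table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE heartbeats ("
        " device_id TEXT PRIMARY KEY,"
        " reset_reason INTEGER, heap_free INTEGER, heap_min_free INTEGER,"
        " last_disconnect_code INTEGER, recent_events TEXT)"
    )
    return db


def store_diagnostics_sketch(db, device_id, payload):
    """Update diagnostic columns, preserving stored values for omitted fields."""
    values = [payload.get(col) for col in DIAGNOSTIC_COLUMNS]
    if values[4] is not None:
        values[4] = json.dumps(values[4])   # recent_events stored as JSON text
    db.execute(
        "INSERT OR IGNORE INTO heartbeats (device_id) VALUES (?)", (device_id,)
    )
    # COALESCE(?, col) evaluates to col itself when the parameter is NULL.
    assignments = ", ".join(
        f"{col} = COALESCE(?, {col})" for col in DIAGNOSTIC_COLUMNS
    )
    db.execute(
        f"UPDATE heartbeats SET {assignments} WHERE device_id = ?",
        (*values, device_id),
    )
```

With this in place, a v1.1 heartbeat followed by a v1.0 heartbeat (an empty diagnostics payload) leaves the v1.1 values intact. The trade-off is that a device cannot deliberately clear a field by sending null.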

Flash command

cd firmware && pio run -e timercam -t upload

If the device reboot-loops after flashing with FATAL: device_id/location_id/hmac_secret not provisioned, NVS was wiped. Re-run flash_device.py (see section 2). FW 1.1 turned the old FW 1.0 LED-blink hang into an explicit reboot loop; same root cause, same fix.

Expected first boot

On the serial log (115200 baud), the device prints the boot banner, then initializes event_log, then records the reset reason via EVT_BOOT. The first heartbeat fires roughly 60-70s after power-on (15s WiFi busy-wait + NTP sync + 60s BOOT_REPORT_DELAY_S). Monitor with pio device monitor or:

python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 90

What's new in 1.1

  • Event-driven WiFi reconnect with 1s→60s exponential backoff (net_guard module); disconnect reasons logged.
  • HTTP timeouts (5s connect / 10s response) + 3-try retry on every POST.
  • ESP-IDF Task Watchdog (30s) on camera, reporter, and loop tasks; panic → reboot → reason surfaces in the next heartbeat.
  • Software heartbeat-miss watchdog: 6 consecutive missed heartbeats (~6 h) triggers a clean reboot.
  • Persistent NVS event-log ring buffer (32 entries) surfaced in the heartbeat's recent_events field.
  • New heartbeat fields: reset_reason, heap_free, heap_min_free, last_disconnect_code, recent_events.
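
To make the reconnect schedule concrete, here is a small Python sketch of a 1s→60s exponential backoff. The doubling factor and the function name are assumptions; the list above only states the endpoints, and net_guard's actual step sizes may differ.

```python
def backoff_delays(attempts, base_s=1.0, cap_s=60.0, factor=2.0):
    """Return the wait before each of `attempts` reconnect tries, in seconds.

    Assumes the delay multiplies by `factor` after every failure and is
    clamped at `cap_s`.
    """
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))
        delay = min(delay * factor, cap_s)
    return delays
```

Under these assumptions the first eight waits are 1, 2, 4, 8, 16, 32, 60 and 60 seconds, so a device reaches the 60s ceiling after about a minute of failed attempts.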

24-hour field checks

After deploying a device, run through this checklist against the server's heartbeat records at the 24-hour mark:

  • Heartbeat count ≥ 22 — ≥ 92% uptime across 24 h at the hourly cadence.
  • No sustained t=6 (EVT_HEARTBEAT_MISS) entries in recent_events — transient singletons are expected; repeated misses indicate a sticky network problem worth investigating.
  • heap_min_free stable day over day — a downward drift indicates a leak. Alert threshold: min-free drops by more than 20% vs baseline.
  • last_disconnect_code matches known AP behavior — reason 8 (assoc lost) and reason 15 (4-way handshake timeout) are common on busy APs; recurring reason 200+ indicates a firmware bug.
  • reset_reason has no unexpected values — see table below.
| reset_reason | Meaning | Expected? |
| --- | --- | --- |
| 1 | Power-on | Normal immediately after a deployment. |
| 4 | Software reset (our ESP.restart()) | Correlate with EVT_REBOOT in recent_events. |
| 6 | Task watchdog | Investigate: a task hung for 30s. |
| 7 | Brownout | Investigate power supply / USB cable. |
| 8 | SDIO reset | Unusual: investigate. |
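
Parts of this checklist can be automated. The following Python sketch checks the heartbeat count, the heap_min_free threshold, reset reasons and heartbeat misses for one device. It assumes the last 24 hours of heartbeats are available as a list of dicts shaped like the v1.1 payload, takes the reset_reason codes straight from the table above, and interprets "sustained" misses as any EVT_HEARTBEAT_MISS entry with a consecutive count of 2 or more; all three are assumptions, and the function name is hypothetical.

```python
# Codes copied from the reset_reason table above; 1 and 4 are treated as benign.
INVESTIGATE_RESET_REASONS = {6: "task watchdog", 7: "brownout", 8: "SDIO reset"}
EVT_HEARTBEAT_MISS = 6


def field_check(heartbeats, baseline_heap_min_free):
    """Return a list of problems found in one device's last 24 h of heartbeats."""
    problems = []

    if len(heartbeats) < 22:
        problems.append(f"only {len(heartbeats)} heartbeats in 24 h (want >= 22)")

    mins = [hb["heap_min_free"] for hb in heartbeats
            if hb.get("heap_min_free") is not None]
    if mins and min(mins) < 0.8 * baseline_heap_min_free:
        problems.append("heap_min_free dropped more than 20% below baseline")

    seen = {hb.get("reset_reason") for hb in heartbeats}
    for code in sorted(seen & set(INVESTIGATE_RESET_REASONS)):
        problems.append(f"reset_reason {code} ({INVESTIGATE_RESET_REASONS[code]})")

    # Assumption: d0 >= 2 on a miss event means the misses were consecutive.
    repeated = any(
        ev.get("t") == EVT_HEARTBEAT_MISS and ev.get("d0", 0) >= 2
        for hb in heartbeats
        for ev in (hb.get("recent_events") or [])
    )
    if repeated:
        problems.append("repeated heartbeat misses in recent_events")

    return problems
```

An empty list means the device passed these checks; last_disconnect_code still needs a human who knows the site's access point behavior.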

Decoding recent_events

The recent_events array is a ring buffer of {t, d0, d1, ts} entries. Tag definitions live in firmware/lib/event_log/event_log.h:

| t | Event | d0 | d1 |
| --- | --- | --- | --- |
| 1 | EVT_BOOT | esp_reset_reason() | |
| 2 | EVT_WIFI_UP | RSSI | |
| 3 | EVT_WIFI_DOWN | disconnect reason code; 0xFF = silent-death fallback | |
| 4 | EVT_HTTP_OK | fnv1a-16 path hash | elapsed ms (capped at 65535) |
| 5 | EVT_HTTP_FAIL | path hash | HTTP status or negative errno cast to uint16 |
| 6 | EVT_HEARTBEAT_MISS | consecutive miss count | |
| 7 | EVT_NTP_SYNC | reserved | |
| 8 | EVT_REBOOT | RebootReason: 1=HEARTBEAT_MISS, 2=FACTORY_RESET, 3=OTA, 4=WIFI_REPROV | |

Server-side decoder tables (EVENT_TAG_DECODER, REBOOT_REASON_DECODER) live in server/heartbeat_diagnostics_stub.py.
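
For quick inspection of a heartbeat without the server code at hand, the tag table can be turned into a small Python decoder. This is a sketch built only from the table above: the dictionary and function names are hypothetical (the real tables are EVENT_TAG_DECODER and REBOOT_REASON_DECODER in the stub), and it assumes single-value events carry their value in d0.

```python
EVENT_NAMES = {
    1: "EVT_BOOT", 2: "EVT_WIFI_UP", 3: "EVT_WIFI_DOWN", 4: "EVT_HTTP_OK",
    5: "EVT_HTTP_FAIL", 6: "EVT_HEARTBEAT_MISS", 7: "EVT_NTP_SYNC",
    8: "EVT_REBOOT",
}
REBOOT_REASONS = {1: "HEARTBEAT_MISS", 2: "FACTORY_RESET", 3: "OTA", 4: "WIFI_REPROV"}


def describe_event(entry):
    """Render one {t, d0, d1, ts} entry as a readable string."""
    t = entry.get("t")
    d0 = entry.get("d0", 0)
    d1 = entry.get("d1", 0)
    name = EVENT_NAMES.get(t, f"UNKNOWN({t})")
    if t == 1:
        detail = f"reset reason {d0}"
    elif t == 2:
        detail = f"RSSI {d0}"
    elif t == 3:
        detail = "silent-death fallback" if d0 == 0xFF else f"disconnect reason {d0}"
    elif t == 4:
        detail = f"path hash 0x{d0:04x}, {d1} ms"
    elif t == 5:
        detail = f"path hash 0x{d0:04x}, status {d1}"
    elif t == 6:
        detail = f"{d0} consecutive misses"
    elif t == 8:
        detail = REBOOT_REASONS.get(d0, f"unknown reason {d0}")
    else:
        detail = f"d0={d0} d1={d1}"
    return f"{name}: {detail}"
```

Mapping describe_event over a heartbeat's recent_events array gives a timeline that can be read alongside the reset_reason table.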
