Peter Woolery 461ed7d888 docs(readme): add HMAC secret generation command to operator setup
Step 2 now shows openssl rand -hex 32 (with python and /dev/urandom
fallbacks) and writes to .agent/dc-<id>-secret with chmod 600, so the
flash_device.py example can read $(cat ...) the same way the known-good
dc-0002 command does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:45:08 -07:00

# DoorCounter
Retail door traffic counter using M5Stack TimerCamera-F (ESP32 + OV3660). Counts walker traversals via overhead camera CV, passively scans BLE foot traffic, and reports hourly to `logs.research.bike`.
> **Known limitations.**
> - **Directional accuracy.** Counts are reported as `{entries, exits}` for API compatibility, but **per-walk direction labelling is not reliable at the current mount (7' overhead, straight down).** Bench testing: event detection 100% (8/8), per-walk direction ~50% (coin flip). **Trust gross traffic: `entries + exits` ≈ total walkers.** See [Directional counting](#directional-counting--known-limitation).
> - **Detection latency.** A walker takes **3-5 seconds** from entering the FOV to being registered as a count: the state machine waits for the walker to clear the frame (or hit the 5s timeout) before finalizing. Counts are not instantaneous; hourly aggregation is the intended consumption mode.
## Hardware
| Component | Source | Notes |
|-----------|--------|-------|
| **Camera** | [M5Stack TimerCamera-F (OV3660 fisheye, PSRAM)](https://shop.m5stack.com/products/esp32-psram-timer-camera-fisheye-ov3660) | ESP32 + WiFi/BLE on board |
| **USB cable** | [USB-A → USB-C, right-angle](https://www.amazon.com/dp/B0DWMPVP4F) | Right-angle plug helps with overhead mounts |
| **Power supply** | [5V USB wall adapter](https://www.amazon.com/dp/B0B2WLSY9D) | Any 5V/1A+ USB charger works |
- **Mount**: Overhead, camera pointing straight down, centered above doorway (~7' / 2.1m height)
- **Power draw**: **~750 mW measured at the wall** (camera + WiFi + BLE all active). Runs cool — fanless, can be sealed in a small enclosure. That works out to about 6.6 kWh a year, roughly $1 at typical US residential rates.
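The annual figure follows from running continuously at the measured draw. A quick check, assuming a flat tariff of $0.17/kWh (an assumption, not a figure from this README; substitute your own rate):

```python
# Back-of-envelope annual energy use and cost for one DoorCounter.
POWER_W = 0.75              # measured at the wall (see Power draw above)
HOURS_PER_YEAR = 24 * 365
RATE_USD_PER_KWH = 0.17     # ASSUMED flat residential rate; adjust locally


def annual_kwh(power_w: float) -> float:
    """Energy consumed by a year of continuous operation, in kWh."""
    return power_w * HOURS_PER_YEAR / 1000


def annual_cost_usd(power_w: float, rate_usd_per_kwh: float) -> float:
    """Yearly cost at a flat per-kWh rate."""
    return annual_kwh(power_w) * rate_usd_per_kwh


if __name__ == "__main__":
    kwh = annual_kwh(POWER_W)
    cost = annual_cost_usd(POWER_W, RATE_USD_PER_KWH)
    print(f"{kwh:.2f} kWh/yr, ${cost:.2f}/yr")  # 6.57 kWh/yr, $1.12/yr
```

Cost scales linearly with the rate, so at twice the assumed tariff the device costs about $2.25 a year.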
## Quick Start (semi-technical)
The fastest path from "box arrived" to "counts in the dashboard." Comfortable with a terminal but not necessarily an embedded developer? Start here.
**You will need**: the camera + cable + power supply listed above, a Linux/macOS computer with USB, and ~20 minutes.
### 1. Install the toolchain (one-time)
```bash
# Python 3.10+ and pip
pip install --user platformio esptool esp-idf-nvs-partition-gen
```
PlatformIO installs the ESP32 compiler on first build — expect a few minutes the first time.
### 2. Clone this repo
```bash
git clone https://github.com/<your-org>/DoorCounter.git
cd DoorCounter
```
### 3. Plug the camera in
Connect the USB-C cable to the TimerCamera and the other end to your computer. On Linux it appears as `/dev/ttyUSB0`; on macOS as `/dev/tty.usbserial-*`. If you don't see it, install [CP210x USB drivers](https://www.silabs.com/developer-tools/usb-to-uart-bridge-vcp-drivers).
### 4. Flash the firmware
```bash
cd firmware
pio run -t upload --upload-port /dev/ttyUSB0
cd ..   # back to the repo root; step 5 runs tools/flash_device.py from here
```
### 5. Provision the device with its credentials
Pick a unique device ID (e.g. `dc-0001`), a location ID, and generate a 32-byte HMAC secret. The server admin must record this same secret — counts won't be accepted without it.
```bash
# Generate a fresh secret
openssl rand -hex 32 > my-device-secret.txt
# Provision
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0001 \
--location-id my-store \
--hmac-secret "$(cat my-device-secret.txt)" \
--wifi-ssid "MyStoreWiFi" \
--wifi-password "wifi-password-here"
```
> If you skip `--wifi-ssid`/`--wifi-password`, the device opens a `DoorCounter-Setup` WiFi access point on boot. Connect a phone to it and enter the credentials in the captive portal.
### 6. Mount the device
1. Position above the doorway, camera lens pointing straight down (~7' / 2.1m up).
2. Plug into the wall adapter — that's it. The LED turns red while joining WiFi, then off once it's counting.
3. First heartbeat lands at the server about a minute after power-on (a little longer while WiFi and NTP come up); first hourly count batch arrives at the top of the next hour.
### What "working" looks like
- LED behavior: **off** = counting normally · **red** = no WiFi · **yellow** = uploading · **brief flash** when a walker is registered (1 flash = entry, 2 flashes = exit).
- A walker takes 3-5 seconds from entering the FOV to triggering the LED flash. This is normal.
- Hourly uploads to `logs.research.bike` (or your configured server) include the entry/exit counts since the last report.
### If something is off
| Symptom | Try |
|---------|-----|
| Red LED stays on | Wrong WiFi password — re-run step 5, or use the `DoorCounter-Setup` captive portal. |
| LED blinks ~1 Hz forever (or device reboots in a loop) | NVS got wiped — re-run step 5 with the same credentials. |
| No counts appearing on server | Run `python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30` and watch for `[CV] entry/exit` lines as you walk under it. |
For deeper troubleshooting see [Troubleshooting](#troubleshooting) and [Operator Setup](#operator-setup).
## Firmware
Built with PlatformIO. Target: `timercam`.
```bash
cd firmware
pio run -t upload --upload-port /dev/ttyUSB0
```
### What it does
| Module | Behavior |
|--------|----------|
| CV pipeline | 5 fps, 96×96 grayscale, event-based walker detector (foreground-count state machine; centroid-trajectory direction heuristic) with post-fire refractory period |
| Detection LED | Single blink on entry, double blink on exit (preserves upload/no-WiFi status LED) |
| BLE scanner | Continuous passive scan; deinits during hourly upload to free heap |
| Reporter | Hourly HMAC-signed POST; 60s boot report for fast connectivity check |
| Provisioning | Captive portal AP on first boot for WiFi setup |
| OTA | Arduino OTA; operator push via `ota_push.py` |
### Reporting intervals
- **First report**: 60 seconds after NTP sync (connectivity check)
- **Subsequent reports**: every 3600 seconds
### Counting model — event-based walker detector
The CV pipeline is a **single event state machine** (no per-blob tracking
for counting). Per-frame foreground pixel count gates event start and end;
centroid trajectory within the active event decides direction.
**Event lifecycle:**
1. **Idle → Active**: `fg_count ≥ CV_EVENT_ENTER_THRESH` (250 px) fires event start.
Background updates freeze while the event is active so the walker does
not get absorbed into the baseline.
2. **Active accumulation**: every frame updates `first_c` (once), `min_c`,
`max_c`, `last_c`, `min_y_seen`, `max_y_seen`, and the frame count.
3. **Active → End** (either):
- **Quiet exit**: `fg_count < CV_EVENT_EXIT_THRESH` (150 px) for
`CV_EVENT_QUIET_FRAMES` (3) consecutive frames — walker has left.
- **Timeout**: `event_frame_count > CV_EVENT_MAX_FRAMES` (25 frames ≈ 5s).
4. On end, the event is finalized: gated by minimum duration, vertical
extent (must span a large fraction of the frame), and minimum centroid
trajectory magnitude. Background snaps to the current frame.
5. A **refractory period** (`CV_EVENT_REFRACTORY_FRAMES` = 10 ≈ 2s) after
a fire blocks a new event from starting — absorbs residual lingering
motion that would otherwise double-count.
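The lifecycle above can be modeled off-device for reasoning about the thresholds. This Python sketch is not the firmware (the C++ in `cv.cpp` is authoritative): it consumes precomputed per-frame foreground statistics instead of pixels, omits the background model (freeze and snap), and uses hypothetical values for the three finalize gates, which the README names but does not quantify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Thresholds as named in the README; cv.h holds the real values.
CV_EVENT_ENTER_THRESH = 250       # px: Idle -> Active
CV_EVENT_EXIT_THRESH = 150        # px: a frame below this is "quiet"
CV_EVENT_QUIET_FRAMES = 3         # consecutive quiet frames -> quiet exit
CV_EVENT_MAX_FRAMES = 25          # ~5 s at 5 fps -> timeout
CV_EVENT_REFRACTORY_FRAMES = 10   # ~2 s at 5 fps after a fire

# HYPOTHETICAL gate values: the README names these gates but gives no numbers.
MIN_EVENT_FRAMES = 3
MIN_Y_EXTENT = 48                 # px, half of the 96-px frame
MIN_TRAJECTORY = 10.0             # px of centroid excursion


@dataclass
class Frame:
    fg_count: int    # foreground pixels in this frame
    cy: float        # foreground centroid row (0 = top of frame)
    y_top: int       # topmost foreground row
    y_bottom: int    # bottommost foreground row


@dataclass
class Event:
    first_c: float
    min_c: float
    max_c: float
    last_c: float
    min_y_seen: int
    max_y_seen: int
    frames: int = 0
    quiet: int = 0

    def update(self, f: Frame) -> None:
        self.min_c = min(self.min_c, f.cy)
        self.max_c = max(self.max_c, f.cy)
        self.last_c = f.cy
        self.min_y_seen = min(self.min_y_seen, f.y_top)
        self.max_y_seen = max(self.max_y_seen, f.y_bottom)
        self.frames += 1


class EventDetector:
    def __init__(self) -> None:
        self.event: Optional[Event] = None
        self.refractory = 0

    def step(self, f: Frame) -> Optional[Tuple[Event, bool]]:
        """Feed one frame; return (event, timed_out) when an event ends and
        passes every gate, else None."""
        if self.event is None:
            if self.refractory > 0:       # refractory: ignore lingering motion
                self.refractory -= 1
                return None
            if f.fg_count >= CV_EVENT_ENTER_THRESH:   # Idle -> Active
                self.event = Event(f.cy, f.cy, f.cy, f.cy, f.y_top, f.y_bottom)
                self.event.update(f)
            return None

        ev = self.event
        ev.update(f)                                  # active accumulation
        ev.quiet = ev.quiet + 1 if f.fg_count < CV_EVENT_EXIT_THRESH else 0
        timed_out = ev.frames > CV_EVENT_MAX_FRAMES
        if ev.quiet < CV_EVENT_QUIET_FRAMES and not timed_out:
            return None

        self.event = None                             # Active -> End
        trajectory = max(ev.first_c - ev.min_c, ev.max_c - ev.first_c)
        if (ev.frames < MIN_EVENT_FRAMES
                or ev.max_y_seen - ev.min_y_seen < MIN_Y_EXTENT
                or trajectory < MIN_TRAJECTORY):
            return None                               # rejected: no fire
        self.refractory = CV_EVENT_REFRACTORY_FRAMES
        return ev, timed_out
```

Fed one walker's frames, `step` returns `None` until the event finalizes, returns the event exactly once, and then swallows the next ten idle frames through the refractory counter. A rejected event does not arm the refractory period, so a real walker right behind a spurious blip is still counted.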
**Direction heuristic** (applied only if the event passes all gates):
- `up_score = first_c - min_c` (how far centroid excursed upward)
- `down_score = max_c - first_c` (how far it excursed downward)
- Quiet-exit events: `is_entry = (up_score ≥ down_score)`
- Timeout events: `is_entry = (last_c < first_c)` — net displacement is
more reliable than excursion when the walker is still in frame at timeout.
Per-mount convention: centroid moving **up through the frame** (y decreasing)
= **entry** into the store.
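The two rules are small enough to transcribe directly. A Python rendering for experimenting with recorded centroid statistics (the C++ in `cv.cpp` remains authoritative):

```python
def classify_direction(first_c: float, min_c: float, max_c: float,
                       last_c: float, timed_out: bool) -> bool:
    """Return True for an entry (centroid moving up the frame, y decreasing)."""
    if timed_out:
        # Walker still in frame at timeout: net displacement is more reliable.
        return last_c < first_c
    up_score = first_c - min_c      # how far the centroid excursed upward
    down_score = max_c - first_c    # how far it excursed downward
    return up_score >= down_score   # ties are labelled as entries
```

This also shows why the split is a coin flip at the current mount: with made-up but representative numbers, an entry (first 48, min 15, max 50, last 47) and an exit (first 47, min 18, max 49, last 48) both start and end near mid-frame with a peak toward the top, and both come out `True`.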
### Directional counting — known limitation
**Per-walk direction labelling is unreliable at the current mount.** In
bench testing (8 alternating entry/exit walks at 4s intervals, 7' overhead
mount pointing straight down):
- **Event detection**: 8/8 (100%) — every walk produced exactly one event.
- **Aggregate split**: 4 entries + 4 exits — matches the 4+4 ground truth.
- **Per-walk direction**: 4/8 (50%) — essentially a coin flip.
At this mount, entries and exits produce nearly identical centroid
trajectories: both begin near mid-frame (walker is already large when
`fg_count` crosses 250), both reach a peak excursion toward the top, and
both end near mid-frame (walker's tail is still visible when `fg_count`
drops below 150). No heuristic over the recorded centroid statistics
separates them with better than ~50% accuracy on alternating walks.
**What we ship, and what the server should trust:**
- **Gross traffic (`entries + exits`) is accurate.** This is the number
downstream analytics should use as "people through the door this hour."
- **Directional split is reported but unreliable.** Treat individual
`entries` and `exits` values as a best-effort labelling. Do not infer
net flow or dwell from them.
To actually recover per-walk direction would require either a physical
change (raise or tilt the camera so walkers enter/leave through the frame
edges) or a richer signal than centroid statistics (e.g. time-resolved
optical flow, or a second sensor). That work is out of scope for v1.
See `firmware/lib/cv/cv.h` for tuning constants and `cv.cpp` for the
finalize logic.
## Operator Setup
### 1. Flash firmware
```bash
cd firmware
pio run -t upload --upload-port /dev/ttyUSB0
cd ..   # back to the repo root; flash_device.py lives in tools/
```
### 2. Provision device identity
Generate a fresh 32-byte HMAC secret (64 hex chars) and stash it where you
won't lose it — the server must store the same value or counts will be
rejected:
```bash
# Generate and save (one device per file; never commit these)
mkdir -p .agent
openssl rand -hex 32 > .agent/dc-0042-secret
chmod 600 .agent/dc-0042-secret
```
> No `openssl`? Equivalents:
> - `python3 -c 'import secrets; print(secrets.token_hex(32))'`
> - `head -c 32 /dev/urandom | xxd -p -c 64`
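A pre-flight check before provisioning can catch a mangled secret file (for example, `xxd -p` without `-c 64` wraps at 30 bytes per line, splitting the secret across two lines) or a skipped `chmod 600`. This is a hypothetical helper, not one of the repo's tools:

```python
import os
import re
import stat
import sys

HEX64 = re.compile(r"[0-9a-fA-F]{64}")  # 32 bytes -> exactly 64 hex digits


def check_secret_file(path: str) -> str:
    """Return the secret if the file holds exactly 64 hex digits.

    Raises ValueError on bad contents and PermissionError if group/other
    can access the file (i.e. chmod 600 was skipped).
    """
    with open(path, encoding="ascii") as fh:
        secret = fh.read().strip()      # the generators all append a newline
    if not HEX64.fullmatch(secret):
        raise ValueError(f"{path}: want 64 hex digits, found {len(secret)} chars")
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path}: mode {oct(mode)}; run chmod 600")
    return secret


if __name__ == "__main__":
    check_secret_file(sys.argv[1])
    print("ok")
```

The secret is returned unmodified (no case folding), since the firmware and server must agree on it byte for byte.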
Then provision:
```bash
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0042 \
--location-id retailer-123 \
--hmac-secret "$(cat .agent/dc-0042-secret)" \
--wifi-ssid "StoreWiFi" \
--wifi-password "secret"
```
WiFi credentials are optional — if omitted, device starts captive portal on boot.
**Known-good command for dc-0002** (dev device at research.bike):
```bash
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0002 \
--location-id retailer-123 \
--hmac-secret "$(cat .agent/dc-0002-secret)" \
--wifi-ssid Elly-Fi \
--wifi-password <ask> \
--line-offset 50
```
Replace `<ask>` with the network's WiFi password before running. The secret is stored in `.agent/dc-0002-secret` (gitignored). The server must already
know this secret; do not rotate it without updating the server side.
> **Re-provision after firmware uploads.** Flashing firmware via
> `pio run -t upload` may clear the NVS partition on this board.
> - **FW 1.0**: device boots into a ~1 Hz LED blink (hang in "not provisioned" fatal).
> - **FW 1.1+**: device reboot-loops with `FATAL: device_id/location_id/hmac_secret not provisioned`
> followed by `rst:0xc (SW_CPU_RESET)` (FATAL paths now reboot instead of hang).
>
> Either way, re-run `flash_device.py` with the same credentials. See
> [Troubleshooting](#troubleshooting).
### 3. OTA updates
```bash
python tools/ota_push.py \
--host dc-0042.local \
--firmware firmware/.pio/build/timercam/firmware.bin
```
## End User Setup
1. Mount device overhead, camera pointing straight down
2. Plug into USB power
3. Connect phone to `DoorCounter-Setup` WiFi
4. Browser opens automatically → enter store WiFi password → done
**LED indicators**: Red = no WiFi · Off = counting · Yellow = uploading · Brief flash (×1) on entry · Brief flash (×2) on exit
## API
Base URL: `http://logs.research.bike`
| Endpoint | Data |
|----------|------|
| `POST /api/v1/camera/events/batch` | Hourly entry/exit counts |
| `POST /api/v1/events/batch` | Hourly BLE proximity records |
| `POST /api/v1/heartbeat` | Device health (uptime, RSSI, pending records) |
All requests are HMAC-SHA256 signed. See [design spec](docs/superpowers/specs/2026-04-13-door-counter-design.md) for full API shapes and auth scheme.
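The design spec is the authority on exactly what gets signed. As a sketch only, assuming the key is the 32 raw bytes decoded from the 64-hex-char secret and the message is the raw request body (header names, timestamps, and replay protection are left to the spec and omitted here):

```python
import hashlib
import hmac


def sign_body(secret_hex: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the request body.

    ASSUMPTIONS: key = bytes.fromhex(secret); message = body only.
    """
    key = bytes.fromhex(secret_hex)
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(secret_hex: str, body: bytes, signature_hex: str) -> bool:
    """Server-side check; compare_digest avoids a timing side channel."""
    return hmac.compare_digest(sign_body(secret_hex, body), signature_hex.lower())
```

On the server, verification should run over the exact bytes received, before JSON parsing, because re-serializing can change whitespace or key order and break the signature.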
## Project Structure
```
DoorCounter/
├── firmware/
│ ├── platformio.ini
│ ├── lib/
│ │ ├── cv/ — CV pipeline (event state machine, centroid-trajectory direction)
│ │ └── hmac/ — HMAC-SHA256 signing library
│ └── src/
│ ├── main.cpp — FreeRTOS tasks, boot sequence
│ ├── config.* — NVS read/write
│ ├── provisioning.* — captive portal
│ ├── camera.* — frame capture + CV pipeline
│ ├── ble_scanner.* — BLE passive scan
│ └── reporter.* — hourly batch POST + local buffer
├── tools/
│ ├── flash_device.py — NVS provisioning script
│ ├── ota_push.py — OTA push script
│ └── serial_monitor.py — reset + read serial with timestamps (diagnostic)
├── docs/
│ ├── server-prompt-crossing-cooldown.md — server-side coordination notes
│ └── superpowers/specs/2026-04-13-door-counter-design.md
└── server/ — API server (separate deployment)
```
## Troubleshooting
| Symptom | Likely cause | Remedy |
|---------|--------------|--------|
| ~1 Hz LED blink after boot (FW 1.0), OR reboot loop with `FATAL: device_id/location_id/hmac_secret not provisioned` → `rst:0xc (SW_CPU_RESET)` (FW 1.1+) | NVS missing `device_id` / `location_id` / `hmac_secret`. Commonly triggered by a firmware upload wiping NVS. FW 1.1+ reboots on FATAL instead of hanging. | Re-run `flash_device.py` with the device's known credentials (see [Operator Setup step 2](#2-provision-device-identity), which includes the known-good dc-0002 command). |
| Device stays on `DoorCounter-Setup` AP instead of joining customer WiFi | SSID/password in NVS wrong, or network out of range. | Connect phone to `DoorCounter-Setup` → captive portal → re-enter WiFi. Or reflash NVS with correct `--wifi-ssid` / `--wifi-password`. |
| No entries/exits counted for a known-walking doorway | WiFi captive portal still up (camera task starts only after connect); or camera blocked/unfocused. | Check the LED: red = not on WiFi yet, yellow = uploading, off = counting. Run `serial_monitor.py` to see `[CV] entry/exit` log lines. |
Capture a boot log with timestamps:
```bash
python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30
```
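To tally a walk test, the monitor output can be piped through a small counter. This is a hypothetical helper (call it `count_cv.py`); it assumes `serial_monitor.py` writes the log to stdout and that event lines contain `[CV] entry` or `[CV] exit` as described above:

```python
import re
import sys
from collections import Counter

# ASSUMED log shape: only the "[CV] entry" / "[CV] exit" prefix is documented.
CV_LINE = re.compile(r"\[CV\]\s+(entry|exit)\b")


def count_cv_events(lines) -> Counter:
    """Count entry/exit log lines in any iterable of strings."""
    counts = Counter()
    for line in lines:
        m = CV_LINE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    c = count_cv_events(sys.stdin)
    print(f"entries={c['entry']} exits={c['exit']} total={c['entry'] + c['exit']}")
```

Usage: `python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 60 | python count_cv.py`. Given the directional limitation, compare `total` against the number of walks, not the entry/exit split.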
## Deploying firmware 1.1 (network resilience)
### Before you flash
Firmware 1.1 adds five new fields to the `POST /api/v1/heartbeat` payload
(`reset_reason`, `heap_free`, `heap_min_free`, `last_disconnect_code`,
`recent_events`). **The real server must accept these optional fields before
you deploy firmware 1.1**, or strict-schema validation will 4xx every
heartbeat; after 6 consecutive misses (~6h) the heartbeat-miss watchdog
will reboot the device, producing a reboot loop.
Reference migration and handler code for the real server are in this repo:
- `server/heartbeat_diagnostics_stub.py` — Pydantic model extensions,
`store_heartbeat_diagnostics()` helper, and `EVENT_TAG_DECODER` /
`REBOOT_REASON_DECODER` reference tables.
- `server/migrations/005_heartbeat_diagnostics.sql` — adds five nullable
columns to the `heartbeats` table (adjust table name to match the real
server's schema).
Copy the stub additions into the production server repo, run the
migration, and confirm a v1.1.0-shape heartbeat returns 200 before you
flash any device.
### Flash command
```bash
cd firmware && pio run -e timercam -t upload
```
> **If the device reboot-loops after flashing** with `FATAL:
> device_id/location_id/hmac_secret not provisioned`, NVS was wiped. Re-run
> `flash_device.py` (see [section 2](#2-provision-device-identity)). FW 1.1
> turned the old FW 1.0 LED-blink hang into an explicit reboot loop; same
> root cause, same fix.
### Expected first boot
On the serial log (115200 baud), the device prints the boot banner, then
initializes `event_log`, then records the reset reason via `EVT_BOOT`.
The first heartbeat fires roughly 60-75s after power-on (WiFi busy-wait of up to 15s +
NTP sync + 60s `BOOT_REPORT_DELAY_S`). Monitor with
`pio device monitor` or:
```bash
python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 90
```
### What's new in 1.1
- Event-driven WiFi reconnect with 1s→60s exponential backoff (`net_guard` module); disconnect reasons logged.
- HTTP timeouts (5s connect / 10s response) + 3-try retry on every POST.
- ESP-IDF Task Watchdog (30s) on camera, reporter, and loop tasks; panic → reboot → reason surfaces in the next heartbeat.
- Software heartbeat-miss watchdog: 6 consecutive missed heartbeats (~6 h) triggers a clean reboot.
- Persistent NVS event-log ring buffer (32 entries) surfaced in the heartbeat's `recent_events` field.
- New heartbeat fields: `reset_reason`, `heap_free`, `heap_min_free`, `last_disconnect_code`, `recent_events`.
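The reconnect backoff and the heartbeat-miss watchdog interact with the server-migration hazard above. A sketch of both, assuming the backoff doubles from 1s with no jitter (the README gives only the 1s and 60s endpoints) and that a heartbeat counts as missed when its POST still fails after the retries:

```python
from itertools import islice


def backoff_delays(first_s: float = 1.0, cap_s: float = 60.0):
    """Yield WiFi reconnect delays: 1, 2, 4, ... s, then 60 s forever.

    ASSUMPTION: doubling and no jitter are guesses, not read from net_guard.
    """
    delay = first_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * 2, cap_s)


class HeartbeatMissWatchdog:
    """Request a reboot after `limit` consecutive failed heartbeats (README: 6, ~6 h)."""

    def __init__(self, limit: int = 6) -> None:
        self.limit = limit
        self.misses = 0

    def record(self, ok: bool) -> bool:
        """Record one hourly heartbeat attempt; True means reboot now."""
        self.misses = 0 if ok else self.misses + 1
        return self.misses >= self.limit


if __name__ == "__main__":
    print(list(islice(backoff_delays(), 8)))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Against a server that rejects every FW 1.1 heartbeat, `record(False)` is called every hour, so the reboot fires about six hours after each boot: the reboot loop described under "Before you flash". A single successful heartbeat resets the counter.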
### 24-hour field checks
After deploying a device, run through this checklist against the server's
heartbeat records at the 24-hour mark:
- **Heartbeat count ≥ 22** — at least 22 of the 24 expected hourly heartbeats (≈ 92% uptime over 24 h).
- **No sustained `t=6` (EVT_HEARTBEAT_MISS) entries in `recent_events`** — transient singletons are expected; repeated misses indicate a sticky network problem worth investigating.
- **`heap_min_free` stable day over day** — a downward drift indicates a leak. Alert threshold: min-free drops by more than 20% vs baseline.
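- **`last_disconnect_code` matches known AP behavior** — reason 8 (`ASSOC_LEAVE`) and reason 15 (4-way handshake timeout) are common on busy APs. Codes 200+ are ESP-IDF-specific (200 = beacon timeout, 201 = no AP found, 202 = auth fail, 203 = assoc fail, 204 = handshake timeout); if they recur, check AP range and configuration first, then the firmware.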
- **`reset_reason` has no unexpected values** — see table below.
| `reset_reason` | Meaning | Expected? |
|----------------|---------|-----------|
| 1 | Power-on (`ESP_RST_POWERON`) | Normal immediately after a deployment. |
| 3 | Software reset (`ESP_RST_SW`, our `ESP.restart()`) | Correlate with `EVT_REBOOT` in `recent_events`. |
| 4 | Panic / exception (`ESP_RST_PANIC`) | Investigate: the firmware crashed. |
| 6 | Task watchdog (`ESP_RST_TASK_WDT`) | Investigate: a task hung for 30s. |
| 9 | Brownout (`ESP_RST_BROWNOUT`) | Investigate power supply / USB cable. |
| 10 | SDIO reset (`ESP_RST_SDIO`) | Unusual; investigate. |
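Most of the checklist can be automated against stored heartbeats. A sketch, assuming each heartbeat is a dict carrying the FW 1.1 fields named above. Reading "sustained" as a miss streak of 2 or more in an `EVT_HEARTBEAT_MISS` entry's `d0` is a hypothetical threshold; only power-on is whitelisted by default; and the `last_disconnect_code` check is left manual because it depends on the AP:

```python
EVT_HEARTBEAT_MISS = 6          # `t` tag from event_log.h, per the README
MIN_HEARTBEATS_24H = 22         # ~92% of 24 hourly heartbeats
HEAP_DRIFT_LIMIT = 0.20         # alert if min-free drops >20% vs baseline
MISS_STREAK_LIMIT = 2           # HYPOTHETICAL reading of "sustained"


def check_24h(heartbeats: list[dict], baseline_heap_min_free: int,
              expected_resets: frozenset = frozenset({1})) -> list[str]:
    """Return the problems found; an empty list means the device passes.

    Only power-on (1) is whitelisted by default; pass more reset codes
    if restarts were planned during the window.
    """
    problems = []
    if len(heartbeats) < MIN_HEARTBEATS_24H:
        problems.append(f"only {len(heartbeats)} heartbeats (< {MIN_HEARTBEATS_24H})")

    # d0 of EVT_HEARTBEAT_MISS is the consecutive-miss count. The same ring-buffer
    # entry reappears in later heartbeats, so take the max instead of counting.
    streak = max((e["d0"] for hb in heartbeats for e in hb.get("recent_events") or []
                  if e["t"] == EVT_HEARTBEAT_MISS), default=0)
    if streak >= MISS_STREAK_LIMIT:
        problems.append(f"heartbeat-miss streak of {streak}")

    mins = [hb["heap_min_free"] for hb in heartbeats if hb.get("heap_min_free") is not None]
    if mins and min(mins) < (1 - HEAP_DRIFT_LIMIT) * baseline_heap_min_free:
        problems.append(f"heap_min_free fell to {min(mins)} (baseline {baseline_heap_min_free})")

    odd = {hb.get("reset_reason") for hb in heartbeats} - set(expected_resets) - {None}
    if odd:
        problems.append(f"unexpected reset_reason values: {sorted(odd)}")
    return problems
```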
### Decoding recent_events
The `recent_events` array is a ring buffer of `{t, d0, d1, ts}` entries.
Tag definitions live in `firmware/lib/event_log/event_log.h`:
| `t` | Event | `d0` | `d1` |
|-----|-------|------|------|
| 1 | `EVT_BOOT` | `esp_reset_reason()` | — |
| 2 | `EVT_WIFI_UP` | RSSI | — |
| 3 | `EVT_WIFI_DOWN` | disconnect reason code; `0xFF` = silent-death fallback | — |
| 4 | `EVT_HTTP_OK` | fnv1a-16 path hash | elapsed ms (capped at 65535) |
| 5 | `EVT_HTTP_FAIL` | path hash | HTTP status or negative errno cast to `uint16` |
| 6 | `EVT_HEARTBEAT_MISS` | consecutive miss count | — |
| 7 | `EVT_NTP_SYNC` | reserved | — |
| 8 | `EVT_REBOOT` | `RebootReason`: 1=HEARTBEAT_MISS, 2=FACTORY_RESET, 3=OTA, 4=WIFI_REPROV | — |
Server-side decoder tables (`EVENT_TAG_DECODER`, `REBOOT_REASON_DECODER`)
live in `server/heartbeat_diagnostics_stub.py`.
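For ad-hoc inspection, a decoder sketch for the table above. Tag and reboot-reason numbers come from the table; the path-hash scheme (32-bit FNV-1a xor-folded to 16 bits, over the bare API path) is an assumption to confirm against `event_log.h` before trusting the path lookup:

```python
EVENT_TAGS = {
    1: "EVT_BOOT", 2: "EVT_WIFI_UP", 3: "EVT_WIFI_DOWN", 4: "EVT_HTTP_OK",
    5: "EVT_HTTP_FAIL", 6: "EVT_HEARTBEAT_MISS", 7: "EVT_NTP_SYNC", 8: "EVT_REBOOT",
}
REBOOT_REASONS = {1: "HEARTBEAT_MISS", 2: "FACTORY_RESET", 3: "OTA", 4: "WIFI_REPROV"}
KNOWN_PATHS = (
    "/api/v1/camera/events/batch",
    "/api/v1/events/batch",
    "/api/v1/heartbeat",
)


def fnv1a_16(path: str) -> int:
    """HYPOTHETICAL: 32-bit FNV-1a xor-folded to 16 bits. The README only
    says "fnv1a-16"; check the firmware's folding and hashed string."""
    h = 0x811C9DC5                            # FNV-1a 32-bit offset basis
    for byte in path.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF     # FNV 32-bit prime, mod 2**32
    return (h >> 16) ^ (h & 0xFFFF)


PATH_BY_HASH = {fnv1a_16(p): p for p in KNOWN_PATHS}


def as_int16(u: int) -> int:
    """Undo a negative errno cast to uint16 (0xFFF5 -> -11); HTTP statuses pass through."""
    return u - 0x10000 if u >= 0x8000 else u


def describe(ev: dict) -> str:
    """Render one {t, d0, d1, ts} entry as a short string."""
    t, d0, d1 = ev["t"], ev.get("d0", 0), ev.get("d1", 0)
    name = EVENT_TAGS.get(t, f"UNKNOWN_TAG_{t}")
    if t == 3:
        return f"{name} reason=" + ("silent-death fallback" if d0 == 0xFF else str(d0))
    if t in (4, 5):
        path = PATH_BY_HASH.get(d0, f"path#{d0:04x}")
        if t == 4:
            return f"{name} {path} {d1} ms"
        return f"{name} {path} status={as_int16(d1)}"
    if t == 6:
        return f"{name} consecutive={d0}"
    if t == 8:
        return f"{name} {REBOOT_REASONS.get(d0, d0)}"
    return f"{name} d0={d0}"
```

Unknown path hashes fall back to `path#xxxx`, so a mismatch in the hashing assumption degrades the output instead of breaking the decoder.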