Compare commits

44 Commits

Author SHA1 Message Date
d2c2d97fb7 feat(ota): harden OTA apply flow + bump firmware to 1.0.1
End-to-end OTA verified on dc-0002 after resolving server-side schema
mismatch (server now emits update/size/sig_b64 alongside existing fields).

Firmware changes:
- Bump FW_VERSION 1.0.0 -> 1.0.1
- Replace log_i/w/e with Serial.printf in ota_updater so output appears
  regardless of CORE_DEBUG_LEVEL (the prior macros were silent in prod)
- Log partition labels/offsets, per-128KB progress, computed sha256,
  HTTP errors with body, esp_ota_* errors by name, Content-Length vs
  expected size
- Check esp_ota_write return value (previously ignored -- silent
  partition corruption on write failure) and abort cleanly on error
- Reject update if expected_size > target partition size
- Serial.flush() + 500ms delay before esp_restart() so the final log
  line escapes the UART
- Boot-time: log running partition label/offset/state + FW_VERSION,
  and call esp_ota_mark_app_valid_cancel_rollback() on PENDING_VERIFY
  to prevent silent rollback after a successful OTA

Docs:
- Rewrite docs/ota-deployment-status.md to reflect resolved state,
  document the schema fix and the .bin/.sig co-deploy invariant
2026-05-14 12:21:52 -07:00
5ec678dfa3 fix: tighten version parsing, propagate HMAC sign failure, add deployment docs
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:26:44 -07:00
5cf122b922 feat(firmware): wire OTA updater into main loop with 6-hour polling task
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:22:29 -07:00
a21dcfa349 feat(firmware): implement OTA download, ECDSA verify, and flash
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:18:44 -07:00
66e6808e13 feat(firmware): implement ECDSA P-256 signature verification in OTA library
Replaces placeholder ota_verify_signature_with_key with real mbedtls
ECDSA verify; adds 4-case native test suite with generated P-256 vectors.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:15:52 -07:00
8b1fd10db7 feat(firmware): add OTA updater library skeleton with version comparison
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:59:02 -07:00
f37e0d6b07 feat(tools): add firmware deploy tool (sign + stage for server)
2026-05-11 06:55:44 -07:00
81bcc12f2f fix(server): add error handling for malformed OTA manifest and missing sig file
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:54:26 -07:00
d9a242a5fa feat(server): add OTA check and firmware download endpoints
Implements /ota/check (version comparison + sig_b64 payload) and
/ota/firmware (binary stream) using the same _impl pattern as
camera_endpoint.py. HMAC auth left commented pending main app wiring.
6/6 tests passing.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:52:46 -07:00
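The version comparison behind `/ota/check` can be sketched as follows. This is a minimal illustration, assuming versions are strict `MAJOR.MINOR.PATCH` strings (as `1.0.0 -> 1.0.1` suggests); `parse_version` and `update_available` are hypothetical names, not functions from this repo.

```python
import re

# Assumption: FW_VERSION is exactly three dot-separated decimal fields.
_VERSION_RE = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)")


def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints, rejecting anything else."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def update_available(device_version: str, server_version: str) -> bool:
    """True only when the server's version is strictly newer than the device's."""
    return parse_version(server_version) > parse_version(device_version)
```

Comparing integer tuples rather than raw strings keeps `1.0.10` newer than `1.0.9`.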
87b30a64b2 fix(tools): add key type validation and tighten test assertions in sign_firmware
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:50:51 -07:00
031426e364 feat(tools): add ECDSA P-256 firmware signing tool
2026-05-11 06:49:15 -07:00
437f73739f feat(tools): add ECDSA P-256 key generation tool and public key header
Generates firmware signing keypair; private key stays in gitignored
secrets/, public key written as 65-byte C array to
firmware/lib/ota_updater/ota_pubkey.h for compile-time OTA verification.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:47:10 -07:00
21a3c646aa docs(firmware): document FW_VERSION format constraint for OTA version compare
2026-05-11 06:45:53 -07:00
81dc96b100 feat(firmware): add FW_VERSION constant
2026-05-11 06:44:59 -07:00
56fc58b843 fix(tools): reject CSV metacharacters in flash_device.py inputs
device-id, location-id, wifi-ssid, and wifi-password were interpolated
directly into the NVS partition CSV. A value containing comma, double
quote, CR, or LF would split the field/row and silently provision the
wrong NVS keys — easiest concrete failure: a Wi-Fi password containing
a comma. Validate operator-supplied strings before generating the CSV.

Add an empty tools/__init__.py so the regression tests can import the
helper as 'tools.flash_device' (matches the existing 'server.*' test
pattern).

Found via adversarial review (run 2026-05-01-192928, gpt-5.5 reviewer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:44:57 -07:00
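A minimal sketch of the kind of check this commit describes, assuming each operator-supplied value is validated before it is written into the NVS CSV. `validate_csv_field` is a hypothetical name; the real helper in `tools/flash_device.py` may be structured differently.

```python
# Characters that would split a field or row in the NVS partition CSV.
CSV_METACHARACTERS = (",", '"', "\r", "\n")


def validate_csv_field(name: str, value: str) -> str:
    """Return value unchanged, or raise ValueError if it would break a CSV row."""
    for ch in CSV_METACHARACTERS:
        if ch in value:
            raise ValueError(f"{name} contains forbidden character {ch!r}")
    return value
```

Rejecting the value outright, rather than quoting or escaping it, avoids depending on how the downstream CSV parser handles quoted fields.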
641ab29277 fix(server): reject inverted period_start/period_end in CameraRecord
A misbehaving or clock-broken device could submit period_end <=
period_start, polluting the camera_records table with zero-length or
inverted windows that corrupt downstream hourly analytics. Add a
Pydantic model_validator so the request is rejected at the API
boundary instead of silently persisting bad ranges.

Found via adversarial review (run 2026-05-01-191359, both reviewers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:44:57 -07:00
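The boundary check can be illustrated without Pydantic. The sketch below uses a standard-library dataclass as a stand-in for the `model_validator` the commit mentions; `CameraRecordWindow` is a hypothetical name and carries only the two fields relevant here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CameraRecordWindow:
    period_start: datetime
    period_end: datetime

    def __post_init__(self):
        # Reject zero-length and inverted windows at construction time.
        if self.period_end <= self.period_start:
            raise ValueError("period_end must be after period_start")
```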
8342904488 fix(firmware/lib): wrap-safe millis() comparison in net_guard reconnect timer
net_guard_tick() compared absolute uint32_t millis() values:
  if (millis() < s_next_retry_ms) return;
This is broken across the ~49.7-day millis() wrap: depending on which
side of the wrap each value lands, retries either tight-loop or stall
indefinitely. The device is designed for multi-month uptime, so this
is a real production case, not a theoretical one.

Replace with the standard wrap-safe pattern using a signed difference.

Found via adversarial review (run 2026-05-01-202910, gpt-5.5 reviewer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:36:06 -07:00
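The signed-difference pattern is usually written in C as `(int32_t)(now - deadline) >= 0`. The sketch below emulates it in Python with explicit 32-bit masking so the wrap cases can be checked on a desktop; `retry_due` is a hypothetical name, and the firmware's actual code is not shown in this diff.

```python
UINT32_MASK = 0xFFFFFFFF


def retry_due(now_ms: int, next_retry_ms: int) -> bool:
    """Wrap-safe 'now >= deadline' for 32-bit millisecond counters."""
    # Unsigned 32-bit subtraction, as the hardware counter would do it.
    diff = (now_ms - next_retry_ms) & UINT32_MASK
    # Reinterpreted as signed 32-bit, values below 2**31 are non-negative.
    return diff < 0x80000000
```

The comparison stays correct as long as the two timestamps are less than about 24.8 days apart (half the counter range), which holds for a reconnect timer capped at seconds or minutes.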
ef00afb14e fix(firmware/lib): validate HMAC secret length and hex format before signing
hmac_sign() previously trusted whatever secret_hex came out of NVS:
- Lengths >128 chars overflowed the fixed 64-byte stack buffer in
  hex_to_bytes (out_len was unbounded).
- Non-hex characters were silently decoded to 0 via strtol with no
  end-pointer check, producing signatures under a corrupted key.
- Empty secrets fell through to mbedtls_md_hmac_starts with len=0.

flash_device.py now rejects malformed --hmac-secret at provision time,
but hmac_sign should also refuse to sign under a malformed key regardless
of how it ended up in NVS (legacy provisioning, partial flash, etc.).

Add length, hex-charset, and even-length validation; make hex_to_bytes
return bool and have hmac_sign return empty HString on any failure
(callers already treat empty as failure via post_json_once).

Found via adversarial review (run 2026-05-01-202910, both reviewers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:36:06 -07:00
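The three checks listed above (length, hex charset, even length) can be sketched in Python. This only mirrors the described behavior and is not the firmware's C++ code; returning `None` stands in for the `bool` failure return, and `MAX_KEY_BYTES` is taken from the 64-byte buffer mentioned in the commit.

```python
import string

MAX_KEY_BYTES = 64  # size of the fixed buffer described in the commit
_HEX_CHARS = set(string.hexdigits)


def hex_to_bytes(secret_hex: str):
    """Decode a hex secret, or return None if it is malformed."""
    if not secret_hex:
        return None  # empty secret
    if len(secret_hex) % 2 != 0:
        return None  # odd number of hex digits
    if len(secret_hex) > 2 * MAX_KEY_BYTES:
        return None  # would not fit in the key buffer
    if any(ch not in _HEX_CHARS for ch in secret_hex):
        return None  # non-hex character
    return bytes.fromhex(secret_hex)
```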
96ede7c999 chore: gitignore secrets, pycache, and adversarial-review artifacts
Add patterns for *secret* files (e.g. operator-saved HMAC secrets at
repo root), __pycache__/ directories, and .adversarial-review/ run
artifacts so they don't get accidentally committed via 'git add -A'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:21:15 -07:00
e2dbe6a2d5 fix(server): COALESCE diagnostic columns so v1.0 heartbeats don't clear v1.1 data
store_heartbeat_diagnostics() unconditionally SET each diagnostic column
to its parameter, so a v1.0.0 heartbeat (which omits the five v1.1.0
fields and leaves them as None after Pydantic parsing) erased previously
stored diagnostics for that device. Wrap each parameter in
COALESCE(?, column_name) so omitted fields preserve the existing value.

Found via adversarial review (gpt-5.5 reviewer, run 2026-05-01-191359).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:19:23 -07:00
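The effect of `COALESCE(?, column_name)` can be reproduced with an in-memory SQLite database. The table layout and `store_diagnostics` below are illustrative assumptions, not the real server schema or the `store_heartbeat_diagnostics()` helper.

```python
import sqlite3


def store_diagnostics(conn, device_id, heap_free=None, reset_reason=None):
    """Update diagnostic columns; a None parameter keeps the stored value."""
    conn.execute(
        "UPDATE heartbeats SET "
        "heap_free = COALESCE(?, heap_free), "
        "reset_reason = COALESCE(?, reset_reason) "
        "WHERE device_id = ?",
        (heap_free, reset_reason, device_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE heartbeats ("
    "device_id TEXT PRIMARY KEY, heap_free INTEGER, reset_reason INTEGER)"
)
conn.execute("INSERT INTO heartbeats (device_id) VALUES ('dc-0002')")

store_diagnostics(conn, "dc-0002", heap_free=120000, reset_reason=1)  # v1.1.0 shape
store_diagnostics(conn, "dc-0002")  # v1.0.0 shape: diagnostic fields omitted

row = conn.execute(
    "SELECT heap_free, reset_reason FROM heartbeats WHERE device_id = 'dc-0002'"
).fetchone()
```

After the second call, `row` still holds the values written by the first. One consequence of this design is that a client can no longer clear a column by sending `None`.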
2226c1b4ca fix(tools): validate flash_device.py HMAC secret format before flashing
--hmac-secret accepted any string and passed it through to NVS, silently
producing a device that cannot authenticate to the server. Reject anything
that isn't exactly 64 hex characters (32 bytes) before generating the NVS
image. Auto-generated secrets are validated too as a defensive check.

Found via adversarial review (both reviewers, run 2026-05-01-192928).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:19:16 -07:00
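A sketch of the "exactly 64 hex characters" rule, assuming a regular-expression check; the function names are hypothetical and the real validation in `flash_device.py` may differ.

```python
import re
import secrets

# 32 bytes encoded as hex is exactly 64 characters.
_SECRET_RE = re.compile(r"[0-9a-fA-F]{64}")


def generate_hmac_secret() -> str:
    """Generate a fresh 32-byte secret as 64 lowercase hex characters."""
    return secrets.token_hex(32)


def is_valid_hmac_secret(value: str) -> bool:
    """True only for strings made of exactly 64 hex characters."""
    return _SECRET_RE.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` means a trailing newline, for example from reading a file without stripping it, is rejected instead of silently accepted.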
a0eee0e6d4 fix(firmware): preserve buffered records appended during flush POST
reporter_flush() snapshotted the buffers under lock, released the lock
to POST, then unconditionally cleared the entire buffer on success.
Records appended by reporter_submit_*() during the in-flight POST were
silently erased. Replace clear() with erase() of just the snapshotted
prefix so concurrent appends survive.

Found via adversarial review (gpt-5.5 reviewer, run 2026-05-01-190903).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:19:11 -07:00
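The snapshot-then-erase-prefix idea can be sketched in Python. `Reporter` here is a hypothetical stand-in for the firmware module, and the sketch assumes concurrent writers only ever append, so the snapshotted records remain a prefix of the buffer.

```python
import threading


class Reporter:
    def __init__(self, post):
        self._lock = threading.Lock()
        self._buffer = []
        self._post = post  # callable taking a list of records, returns bool

    def submit(self, record):
        with self._lock:
            self._buffer.append(record)

    def flush(self):
        with self._lock:
            snapshot = list(self._buffer)
        if not snapshot:
            return True
        ok = self._post(snapshot)  # lock is NOT held during the POST
        if ok:
            with self._lock:
                # Erase only what was sent; later appends survive.
                del self._buffer[:len(snapshot)]
        return ok

    def pending(self):
        with self._lock:
            return list(self._buffer)
```

With the earlier `clear()` behavior, any record submitted while `_post` was running would have been dropped on success.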
a585a56cff fix(firmware): upgrade NimBLE to 2.x + DNS fallback for unreliable resolvers
NimBLE-Arduino 1.4.2 had an init/fire race in its FreeRTOS callout porting
layer where os_callout_timer_cb dispatched a queued TimerHandle expiry
against a not-yet-initialized event (NULL fn pointer), causing PC=0
InstrFetchProhibited within ~1s of boot when the camera task starved the
timer service. Confirmed by ets_printf instrumentation. Upgrading to
^2.0.0 rewrites the porting layer and eliminates the race; verified clean
on the customer network for 1+ hour.

Also rolls in DNS-resilience work that surfaced the BLE crash during
provisioning: pin lwIP/esp-netif resolvers to 1.1.1.1/8.8.8.8 across DHCP
renewals, add three-tier resolver fallback in reporter with a hardcoded
IP of last resort, and switch to raw WiFiClient with manual Host header
to bypass HTTPClient's brittle DNS path.

Migration touches for NimBLE 2.x:
- NimBLEAdvertisedDeviceCallbacks -> NimBLEScanCallbacks
- onResult signature now takes const NimBLEAdvertisedDevice*
- setAdvertisedDeviceCallbacks -> setScanCallbacks
- start(0, nullptr, false) -> start(0, false, false)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:34:17 -07:00
461ed7d888 docs(readme): add HMAC secret generation command to operator setup
Step 2 now shows openssl rand -hex 32 (with python and /dev/urandom
fallbacks) and writes to .agent/dc-<id>-secret with chmod 600, so the
flash_device.py example can read $(cat ...) the same way the known-good
dc-0002 command does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:45:08 -07:00
259256a550 docs: retailer packet — setup guide (.docx) + repo QR code
Adds the printed materials shipped with each device:
- retailer-setup-guide.docx — non-technical 1-2 page setup guide
- retailer-setup-guide.py — generator script for the .docx
- doorcounter-repo-qr.png — QR code linking to the public Gitea repo

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:38:22 -07:00
be44299d3e docs(readme): add quick-start, hardware sources, power draw + latency notes
Adds a sourced parts table (M5 TimerCamera-F, USB cable, 5V adapter), the
~750 mW measured power draw, the 3-5s detection latency caveat, and a
six-step Quick Start aimed at semi-technical operators deploying their
own device.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:26:45 -07:00
268b595340 Merge branch 'feat/network-resilience'
Network resilience hardening: NVS event-log ring buffer, event-driven
WiFi reconnect with backoff, HTTP timeouts + retry, task watchdog,
software heartbeat-miss watchdog (6h), EVT_BOOT/EVT_REBOOT logging,
heartbeat v1.1.0 diagnostic payload, server stub + migration, docs.
2026-04-23 14:12:40 -07:00
a795cfa0ad fix(firmware): reboot on FATAL failures + emit NTP_SYNC + server-coord warning
- Config-load and camera-init FATAL branches now reboot (3s LED signal
  before restart) instead of hanging forever. Matches the enum name
  REBOOT_FATAL_* and makes camera-init failures diagnosable via the
  next boot's heartbeat recent_events. Config failures produce a
  visible reboot loop rather than a silent hang.
- Emit EVT_NTP_SYNC(seconds_since_boot) on the first NTP-synced
  reporter iteration so slow / failed NTP sync is a visible signal in
  the heartbeat's recent_events window.
- README "Deploying firmware 1.1" now opens with a "Before you flash"
  warning directing the operator to land server-side heartbeat
  schema changes first (migration 005 + stub integration) to avoid a
  strict-schema 4xx reboot loop after deployment.
2026-04-23 14:10:32 -07:00
d943b3df5a feat(firmware): log reason before FATAL hang loops
Two FATAL while(true) hangs in main.cpp (config load fail, camera init
fail) previously relied on the hardware watchdog to reboot the device,
leaving the cause invisible beyond a generic TWDT reset reason. Now
each path logs EVT_REBOOT with REBOOT_FATAL_CONFIG or REBOOT_FATAL_CAMERA
before hanging, so the next heartbeat's recent_events surfaces which
branch hung. Server-side decoder updated for the two new enum values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:03:57 -07:00
2d95069bd1 docs: network-resilience firmware 1.1 deployment + field diagnostic guide
Flash command, expected first-boot behavior, per-feature summary of the
1.1 release, 24-hour field-check playbook, and a reference table for
decoding the heartbeat's recent_events array.
2026-04-23 14:02:09 -07:00
867e90b1f6 feat(server): heartbeat-diagnostics stub + migration for real server import
The real server lives in a separate repo; this repo carries reference
stubs for each endpoint (see camera_endpoint.py precedent). Adds the
Pydantic extension, persistence helper, migration 005, and tests that
the real server can copy when adding diagnostic-field support.

Matches the firmware v1.1.0 heartbeat payload shape. Old-shape
payloads (firmware v1.0.0) continue to parse cleanly with the new
fields defaulting to None.
2026-04-23 13:59:31 -07:00
5c9f5df0ce feat(firmware): include diagnostics in heartbeat payload
Heartbeat v1.1.0 now carries heap stats (free + min_free since boot),
esp_reset_reason(), last WiFi disconnect code, and the last 8
persisted event-log entries. Makes field failures diagnosable
server-side without retrieving the device: the post-reboot heartbeat
will include EVT_BOOT with reset reason and whatever EVT_WIFI_DOWN
or EVT_HTTP_FAIL entries preceded it.
2026-04-23 13:54:55 -07:00
f08f70a8fb feat(firmware): software heartbeat-miss watchdog reboots after 6h offline
Reporter task counts consecutive heartbeat failures from the bool
returned by reporter_heartbeat (Task 5). After 6 consecutive misses
(~6 hours at the hourly cadence) the device logs EVT_HEARTBEAT_MISS
then EVT_REBOOT(REBOOT_HEARTBEAT_MISS) and restarts, giving the whole
network stack a clean reinitialization. The 200ms delay before the
restart lets NVS commit the REBOOT entry so the next boot can report
it via EVT_BOOT + esp_reset_reason().
2026-04-23 13:52:07 -07:00
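The consecutive-miss counting can be sketched as a small state machine. `HeartbeatWatchdog` is a hypothetical name; the sketch assumes any successful heartbeat resets the counter, which is what "consecutive" implies.

```python
MISS_LIMIT = 6  # about 6 hours at the hourly cadence


class HeartbeatWatchdog:
    def __init__(self):
        self.misses = 0

    def record(self, heartbeat_ok: bool) -> bool:
        """Record one heartbeat result; return True when a reboot is due."""
        if heartbeat_ok:
            self.misses = 0
            return False
        self.misses += 1
        return self.misses >= MISS_LIMIT
```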
7b546d0ed7 feat(firmware): enable task watchdog on camera/reporter/loop tasks
30s TWDT subscribes all three long-running tasks and panics on hang.
The reporter task's retry loop explicitly feeds between attempts so
the 3-try sequence (worst case 52s) does not itself trip the dog.
Reset reason on next boot is visible via esp_reset_reason() which
EVT_BOOT already logs.
2026-04-23 13:49:05 -07:00
8f8ad0b1b0 fix(firmware): add HTTP timeouts + 3-try retry, report heartbeat status
Unbounded TLS/HTTP POSTs were blocking the reporter task indefinitely
on weak WiFi. Now: 5s connect timeout, 10s response timeout, 3 attempts
with 0/2s/5s backoff. Every attempt logs HTTP_OK or HTTP_FAIL to the
event log. reporter_heartbeat now returns bool so the caller can count
consecutive misses.
2026-04-23 13:44:17 -07:00
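A sketch of the retry schedule, assuming `send_once` is a caller-supplied function that performs one POST and returns `True` on success. The names are hypothetical. The worst-case arithmetic assumes every attempt burns both timeouts in full, which is one way to arrive at the 52s figure quoted in the task-watchdog commit.

```python
import time

BACKOFF_S = (0, 2, 5)      # delay before attempts 1, 2, 3
CONNECT_TIMEOUT_S = 5
RESPONSE_TIMEOUT_S = 10


def post_with_retry(send_once, sleep=time.sleep) -> bool:
    """Try up to len(BACKOFF_S) times, sleeping before each retry."""
    for delay in BACKOFF_S:
        if delay:
            sleep(delay)
        if send_once():
            return True
    return False


def worst_case_seconds() -> int:
    """Upper bound if every attempt exhausts both timeouts."""
    per_attempt = CONNECT_TIMEOUT_S + RESPONSE_TIMEOUT_S
    return len(BACKOFF_S) * per_attempt + sum(BACKOFF_S)
```

Here `worst_case_seconds()` evaluates 3 × (5 + 10) + (0 + 2 + 5) = 52.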
57129ba078 fix(firmware): net_guard silent-wifi-death fallback + header hygiene
- net_guard_tick now detects status-vs-event divergence. If s_up is
  true but WiFi.status() says otherwise (rare: driver wedge, silent
  RF failure), force DOWN state and schedule reconnect. Uses 0xFF
  disconnect reason so the event log distinguishes this path.
- Forward-declare DeviceConfig in net_guard.h so consumers that don't
  call net_guard_start don't transitively pull config.h.
2026-04-23 13:41:53 -07:00
af3067d481 refactor(firmware): drive WiFi reconnect from net_guard events
loop() no longer blocks for 5s after a disconnect; reconnect is
scheduled from the WiFi event handler with exponential backoff.
Buffered reports flush on every clean UP transition.
2026-04-23 13:36:29 -07:00
cfa0d2563f fix(firmware): event_log bounded mutex wait, skip on contention
Mutex take in event_log_write and event_log_read_recent switched
from portMAX_DELAY to pdMS_TO_TICKS(50) with skip-on-timeout. Prevents
the high-priority WiFi event task from stalling on NVS writes; diag
loss under contention is preferable to dropped WiFi events.
2026-04-23 13:31:54 -07:00
84d9ba349b fix(firmware): net_guard boot-state seed + no spurious disconnect
- Seed s_up from WiFi.status() in net_guard_start so the first
  STA_GOT_IP (fired during setup's busy-wait, before onEvent was
  registered) is not missed — prevents a reconnect flap on every boot.
- Drop WiFi.disconnect() from net_guard_tick; WiFi.begin() alone
  re-associates cleanly and avoids a spurious STA_DISCONNECTED that
  was double-logging EVT_WIFI_DOWN on every retry.
- Re-check s_up after the millis() timing gate to close the
  GOT_IP-vs-tick race.
- Document the volatile-only shared-state contract.
2026-04-23 13:31:47 -07:00
9f293b4639 feat(firmware): event-driven WiFi reconnect with exponential backoff
net_guard registers WiFi.onEvent() so disconnects are handled
immediately instead of polled every 1s. Backoff 1s->2s->4s->...->60s cap.
Every up/down transition is logged to the event log with the disconnect
reason code, so field failures are diagnosable.
2026-04-23 13:26:10 -07:00
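The backoff schedule (1s, doubling, capped at 60s) can be written as a pure function. `backoff_ms` is a hypothetical name, and the sketch assumes the delay depends only on the number of failed attempts so far.

```python
BASE_MS = 1000
CAP_MS = 60000


def backoff_ms(failed_attempts: int) -> int:
    """Delay before the next reconnect attempt, in milliseconds."""
    if failed_attempts < 0:
        raise ValueError("failed_attempts must be non-negative")
    return min(BASE_MS * 2 ** failed_attempts, CAP_MS)
```

In C++ the same computation needs a guard on the shift count so the intermediate value cannot overflow before the cap is applied.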
95724bf3ff feat(firmware): log boot and reboot reason to event log
Every boot logs EVT_BOOT with esp_reset_reason(); every deliberate
ESP.restart() is preceded by EVT_REBOOT with a reason code. This
gives us a persistent answer to 'why did the device just reboot?'.
2026-04-23 13:21:23 -07:00
9eb1e19651 test(firmware): event_log boot recovery — partial fill and post-wrap
Exercises the slot-scan logic in event_log_init(): after a simulated
reboot (RAM state cleared, NVS slots preserved) the module must
resume with the correct head/cnt so newest-first read order is
unchanged and subsequent writes continue the seq monotonically.

Adds native-only event_log_test_simulate_reboot() helper. Lifts the
slot-scan loop out of the #ifdef ARDUINO guard so the native stub
exercises the same recovery path as production; the platform-specific
NVS setup remains guarded.
2026-04-23 13:18:08 -07:00
95f91d3656 fix(firmware): event_log thread safety and NVS wear
- Remove monotonic counter writes to NVS (stop burning flash on every
  event). Derive head and cnt by scanning slots on boot.
- Widen seq to uint32 so slot scan works across multi-year lifetimes.
- Add FreeRTOS mutex around write/read so WiFi event handlers can
  safely call event_log_write from another task.
- Check Preferences.begin() return; disable logging if NVS unavailable.
- Extract NTP_SYNC_THRESHOLD constant; drop misleading native uptime.
- Add tests for empty read, max_entries truncation, real-path hash.
2026-04-23 13:13:21 -07:00
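Deriving `head` and `cnt` from a slot scan, instead of persisting a counter, can be sketched as below. This is a Python model under stated assumptions: each slot stores a `(seq, tag)` pair or `None`, `seq` never wraps, and `EventLog` is a hypothetical name rather than the firmware's API.

```python
SLOT_COUNT = 32


class EventLog:
    def __init__(self, slots=None):
        # `slots` models the persisted NVS contents; RAM state is rebuilt from it.
        self.slots = slots if slots is not None else [None] * SLOT_COUNT
        self._recover()

    def _recover(self):
        occupied = [(e[0], i) for i, e in enumerate(self.slots) if e is not None]
        self.count = len(occupied)
        if occupied:
            newest_seq, newest_idx = max(occupied)  # highest seq is the newest write
            self.head = (newest_idx + 1) % SLOT_COUNT
            self.next_seq = newest_seq + 1
        else:
            self.head = 0
            self.next_seq = 0

    def write(self, tag):
        self.slots[self.head] = (self.next_seq, tag)
        self.head = (self.head + 1) % SLOT_COUNT
        self.next_seq += 1
        self.count = min(self.count + 1, SLOT_COUNT)

    def read_recent(self, max_entries):
        """Return up to max_entries tags, newest first."""
        out = []
        idx = self.head
        for _ in range(min(max_entries, self.count)):
            idx = (idx - 1) % SLOT_COUNT
            out.append(self.slots[idx][1])
        return out
```

Constructing a second `EventLog` from the first one's `slots` plays the role of a reboot: only the persisted slots survive, and the write position is recovered from the highest `seq`.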
9232766e60 feat(firmware): add NVS-backed event log ring buffer
Persistent 32-slot ring buffer of tagged diagnostic events (boot, wifi
up/down, http ok/fail, heartbeat miss, reboot). Used to diagnose field
failures post-hoc via the heartbeat payload, without needing serial
access. A native stub lets policy be unit-tested.
2026-04-23 13:06:38 -07:00
44 changed files with 3933 additions and 69 deletions

5
.gitignore vendored

@@ -1,6 +1,11 @@
.worktrees/
.agent/
.claude/
.adversarial-review/
graphify-out/
firmware/.pio/
*.log
*secret*
__pycache__/
secrets/
server/firmware/

231
README.md

@@ -2,13 +2,95 @@
Retail door traffic counter using M5Stack TimerCamera-F (ESP32 + OV3660). Counts walker traversals via overhead camera CV, passively scans BLE foot traffic, and reports hourly to `logs.research.bike`.
> **Known limitation — directional accuracy.** This firmware reports counts as `{entries, exits}` for API compatibility, but **per-walk direction labelling is not reliable at the current mount (7' overhead, straight down).** In bench testing, event detection was 100% (8/8 walks detected) while per-walk direction matched the physical walk only ~50% of the time — the centroid trajectories produced by entries and exits were nearly indistinguishable. **The number to trust is gross traffic: `entries + exits` ≈ total walkers through the doorway.** The directional split is an unreliable best-effort heuristic. See [Directional counting](#directional-counting) for why.
> **Known limitations.**
> - **Directional accuracy.** Counts are reported as `{entries, exits}` for API compatibility, but **per-walk direction labelling is not reliable at the current mount (7' overhead, straight down).** Bench testing: event detection 100% (8/8), per-walk direction ~50% (coin flip). **Trust gross traffic: `entries + exits` ≈ total walkers.** See [Directional counting](#directional-counting).
> - **Detection latency.** A walker takes **3–5 seconds** from entering the FOV to being registered as a count: the state machine waits for the walker to clear the frame (or a 5s timeout) before finalizing. Counts are not instantaneous; hourly aggregation is the intended consumption mode.
## Hardware
- **Device**: M5Stack TimerCamera-F (ESP32-S, OV3660, PSRAM, WiFi/BLE)
- **Mount**: Overhead, camera pointing straight down, centered above doorway
- **Power**: USB (any phone charger)
| Component | Source | Notes |
|-----------|--------|-------|
| **Camera** | [M5Stack TimerCamera-F (OV3660 fisheye, PSRAM)](https://shop.m5stack.com/products/esp32-psram-timer-camera-fisheye-ov3660) | ESP32 + WiFi/BLE on board |
| **USB cable** | [USB-A → USB-C, right-angle](https://www.amazon.com/dp/B0DWMPVP4F) | Right-angle plug helps with overhead mounts |
| **Power supply** | [5V USB wall adapter](https://www.amazon.com/dp/B0B2WLSY9D) | Any 5V/1A+ USB charger works |
- **Mount**: Overhead, camera pointing straight down, centered above doorway (~7' / 2.1m height)
- **Power draw**: **~750 mW measured at the wall** (camera + WiFi + BLE all active). Runs cool — fanless, can be sealed in a small enclosure. Annual energy cost at US residential rates is well under $1.
## Quick Start (semi-technical)
The fastest path from "box arrived" to "counts in the dashboard." Comfortable with a terminal but not necessarily an embedded developer? Start here.
**You will need**: the camera + cable + power supply listed above, a Linux/macOS computer with USB, and ~20 minutes.
### 1. Install the toolchain (one-time)
```bash
# Python 3.10+ and pip
pip install --user platformio esptool esp-idf-nvs-partition-gen
```
PlatformIO installs the ESP32 compiler on first build — expect a few minutes the first time.
### 2. Clone this repo
```bash
git clone https://github.com/<your-org>/DoorCounter.git
cd DoorCounter
```
### 3. Plug the camera in
Connect the USB-C cable to the TimerCamera and the other end to your computer. On Linux it appears as `/dev/ttyUSB0`; on macOS as `/dev/tty.usbserial-*`. If you don't see it, install [CP210x USB drivers](https://www.silabs.com/developer-tools/usb-to-uart-bridge-vcp-drivers).
### 4. Flash the firmware
```bash
cd firmware
pio run -t upload --upload-port /dev/ttyUSB0
```
### 5. Provision the device with its credentials
Pick a unique device ID (e.g. `dc-0001`), a location ID, and generate a 32-byte HMAC secret. The server admin must record this same secret — counts won't be accepted without it.
```bash
# Generate a fresh secret
openssl rand -hex 32 > my-device-secret.txt
# Provision
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0001 \
--location-id my-store \
--hmac-secret "$(cat my-device-secret.txt)" \
--wifi-ssid "MyStoreWiFi" \
--wifi-password "wifi-password-here"
```
> If you skip `--wifi-ssid`/`--wifi-password`, the device opens a `DoorCounter-Setup` WiFi access point on boot. Connect a phone to it and enter the credentials in the captive portal.
### 6. Mount the device
1. Position above the doorway, camera lens pointing straight down (~7' / 2.1m up).
2. Plug into the wall adapter — that's it. The LED turns red while joining WiFi, then off once it's counting.
3. First heartbeat lands at the server within ~60 seconds; first hourly count batch arrives at the top of the next hour.
### What "working" looks like
- LED behavior: **off** = counting normally · **red** = no WiFi · **yellow** = uploading · **brief flash** when a walker is registered (1 flash = entry, 2 flashes = exit).
- A walker takes 3–5 seconds from entering the FOV to triggering the LED flash; this is normal.
- Hourly uploads to `logs.research.bike` (or your configured server) include the entry/exit counts since the last report.
### If something is off
| Symptom | Try |
|---------|-----|
| Red LED stays on | Wrong WiFi password — re-run step 5, or use the `DoorCounter-Setup` captive portal. |
| LED blinks ~1 Hz forever (or device reboots in a loop) | NVS got wiped — re-run step 5 with the same credentials. |
| No counts appearing on server | Run `python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30` and watch for `[CV] entry/exit` lines as you walk under it. |
For deeper troubleshooting see [Troubleshooting](#troubleshooting) and [Operator Setup](#operator-setup).
## Firmware
@@ -111,22 +193,58 @@ pio run -t upload --upload-port /dev/ttyUSB0
### 2. Provision device identity
Generate a fresh 32-byte HMAC secret (64 hex chars) and stash it where you
won't lose it — the server must store the same value or counts will be
rejected:
```bash
# Generate and save (one device per file; never commit these)
mkdir -p .agent
openssl rand -hex 32 > .agent/dc-0042-secret
chmod 600 .agent/dc-0042-secret
```
> No `openssl`? Equivalents:
> - `python3 -c 'import secrets; print(secrets.token_hex(32))'`
> - `head -c 32 /dev/urandom | xxd -p -c 64`
Then provision:
```bash
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0042 \
--location-id retailer-123 \
--hmac-secret <32-byte-hex> \
--hmac-secret "$(cat .agent/dc-0042-secret)" \
--wifi-ssid "StoreWiFi" \
--wifi-password "secret"
```
WiFi credentials are optional — if omitted, device starts captive portal on boot.
**Known-good command for dc-0002** (dev device at research.bike):
```bash
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0002 \
--location-id retailer-123 \
--hmac-secret "$(cat .agent/dc-0002-secret)" \
--wifi-ssid Elly-Fi \
--wifi-password <ask> \
--line-offset 50
```
Secret is stored in `.agent/dc-0002-secret` (gitignored). Server must already
know this secret — do not rotate without updating the server side.
> **Re-provision after firmware uploads.** Flashing firmware via
> `pio run -t upload` may clear the NVS partition on this board. If the device
> boots into a ~1 Hz LED blink (the "not provisioned" fatal state) after a
> firmware update, re-run `flash_device.py` with the same credentials. See
> `pio run -t upload` may clear the NVS partition on this board.
> - **FW 1.0**: device boots into a ~1 Hz LED blink (hang in "not provisioned" fatal).
> - **FW 1.1+**: device reboot-loops with `FATAL: device_id/location_id/hmac_secret not provisioned`
> followed by `rst:0xc (SW_CPU_RESET)` (FATAL paths now reboot instead of hang).
>
> Either way, re-run `flash_device.py` with the same credentials. See
> [Troubleshooting](#troubleshooting).
### 3. OTA updates
@@ -188,7 +306,7 @@ DoorCounter/
| Symptom | Likely cause | Remedy |
|---------|--------------|--------|
| ~1 Hz LED blink after boot, no serial beyond `esp_core_dump_flash: No core dump partition found!` | NVS missing `device_id` / `location_id` / `hmac_secret`. Commonly triggered by a firmware upload wiping NVS. | Re-run `flash_device.py` with the device's known credentials. |
| ~1 Hz LED blink after boot (FW 1.0), OR reboot loop with `FATAL: device_id/location_id/hmac_secret not provisioned` followed by `rst:0xc (SW_CPU_RESET)` (FW 1.1+) | NVS missing `device_id` / `location_id` / `hmac_secret`. Commonly triggered by a firmware upload wiping NVS. FW 1.1+ reboots on FATAL instead of hanging. | Re-run `flash_device.py` with the device's known credentials (see section 2 for dc-0002). |
| Device stays on `DoorCounter-Setup` AP instead of joining customer WiFi | SSID/password in NVS wrong, or network out of range. | Connect phone to `DoorCounter-Setup` → captive portal → re-enter WiFi. Or reflash NVS with correct `--wifi-ssid` / `--wifi-password`. |
| No entries/exits counted for a known-walking doorway | WiFi captive portal still up (camera task starts only after connect); or camera blocked/unfocused. | Check LED: solid on = booting/uploading, off = counting. Run `serial_monitor.py` to see `[CV] entry/exit` log lines. |
@@ -197,3 +315,98 @@ Capture a boot log with timestamps:
```bash
python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30
```
## Deploying firmware 1.1 (network resilience)
### Before you flash
Firmware 1.1 adds five new fields to the `POST /api/v1/heartbeat` payload
(`reset_reason`, `heap_free`, `heap_min_free`, `last_disconnect_code`,
`recent_events`). **The real server must accept these optional fields before
you deploy firmware 1.1**, or strict-schema validation will 4xx every
heartbeat; after 6 consecutive misses (~6h) the heartbeat-miss watchdog
will reboot the device, producing a reboot loop.
Reference migration and handler code for the real server are in this repo:
- `server/heartbeat_diagnostics_stub.py` — Pydantic model extensions,
`store_heartbeat_diagnostics()` helper, and `EVENT_TAG_DECODER` /
`REBOOT_REASON_DECODER` reference tables.
- `server/migrations/005_heartbeat_diagnostics.sql` — adds five nullable
columns to the `heartbeats` table (adjust table name to match the real
server's schema).
Copy the stub additions into the production server repo, run the
migration, and confirm a v1.1.0-shape heartbeat returns 200 before you
flash any device.
### Flash command
```bash
cd firmware && pio run -e timercam -t upload
```
> **If the device reboot-loops after flashing** with `FATAL:
> device_id/location_id/hmac_secret not provisioned`, NVS was wiped. Re-run
> `flash_device.py` (see [section 2](#2-provision-device-identity)). FW 1.1
> turned the old FW 1.0 LED-blink hang into an explicit reboot loop; same
> root cause, same fix.
### Expected first boot
On the serial log (115200 baud), the device prints the boot banner, then
initializes `event_log`, then records the reset reason via `EVT_BOOT`.
The first heartbeat fires roughly 60-70s after power-on (15s WiFi
busy-wait + NTP sync + 60s `BOOT_REPORT_DELAY_S`). Monitor with
`pio device monitor` or:
```bash
python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 90
```
### What's new in 1.1
- Event-driven WiFi reconnect with 1s→60s exponential backoff (`net_guard` module); disconnect reasons logged.
- HTTP timeouts (5s connect / 10s response) + 3-try retry on every POST.
- ESP-IDF Task Watchdog (30s) on camera, reporter, and loop tasks; panic → reboot → reason surfaces in the next heartbeat.
- Software heartbeat-miss watchdog: 6 consecutive missed heartbeats (~6 h) triggers a clean reboot.
- Persistent NVS event-log ring buffer (32 entries) surfaced in the heartbeat's `recent_events` field.
- New heartbeat fields: `reset_reason`, `heap_free`, `heap_min_free`, `last_disconnect_code`, `recent_events`.
### 24-hour field checks
After deploying a device, run through this checklist against the server's
heartbeat records at the 24-hour mark:
- **Heartbeat count ≥ 22** — ≥ 92% uptime across 24 h at the hourly cadence.
- **No sustained `t=6` (EVT_HEARTBEAT_MISS) entries in `recent_events`** — transient singletons are expected; repeated misses indicate a sticky network problem worth investigating.
- **`heap_min_free` stable day over day** — a downward drift indicates a leak. Alert threshold: min-free drops by more than 20% vs baseline.
- **`last_disconnect_code` matches known AP behavior** — reason 8 (`WIFI_REASON_ASSOC_LEAVE`, station left the AP) and reason 15 (4-way handshake timeout) are common on busy APs; recurring reason 200+ indicates a firmware bug.
- **`reset_reason` has no unexpected values** — see table below.
| `reset_reason` | Meaning | Expected? |
|----------------|---------|-----------|
| 1 | Power-on | Normal immediately after a deployment. |
| 3 | Software reset (our `ESP.restart()`) | Correlate with `EVT_REBOOT` in `recent_events`. |
| 6 | Task watchdog | Investigate — a task hung for 30s. |
| 9 | Brownout | Investigate power supply / USB cable. |
| 10 | SDIO reset | Unusual — investigate. |
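The checklist above could be automated along the following lines. This is a minimal sketch, not the server's implementation: the `Heartbeat` class, the `field_check` function, and the rule that a miss streak of 2 or more counts as "sustained" are all assumptions made for illustration. Only the field names and thresholds (22 heartbeats, 20% heap drop) come from the checklist.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    # Hypothetical record shape; field names follow the 1.1 heartbeat fields.
    reset_reason: int
    heap_min_free: int
    recent_events: list = field(default_factory=list)  # dicts with t, d0, d1, ts


def field_check(heartbeats, baseline_heap_min_free, expected_reset_reasons):
    """Return a list of human-readable findings; an empty list means all checks passed."""
    findings = []
    if len(heartbeats) < 22:
        findings.append(f"heartbeat count {len(heartbeats)} < 22")
    if heartbeats:
        lowest = min(hb.heap_min_free for hb in heartbeats)
        if lowest < 0.8 * baseline_heap_min_free:
            findings.append(f"heap_min_free {lowest} is more than 20% below baseline")
    unexpected = sorted({hb.reset_reason for hb in heartbeats} - set(expected_reset_reasons))
    if unexpected:
        findings.append(f"unexpected reset_reason values: {unexpected}")
    # Assumption: d0 of a t=6 entry is the consecutive miss count; 2+ is "sustained".
    worst_streak = max(
        (ev["d0"] for hb in heartbeats for ev in hb.recent_events if ev["t"] == 6),
        default=0,
    )
    if worst_streak >= 2:
        findings.append(f"sustained heartbeat misses (streak of {worst_streak})")
    return findings


healthy = [Heartbeat(reset_reason=1, heap_min_free=150_000) for _ in range(24)]
leaky = [Heartbeat(reset_reason=1, heap_min_free=100_000) for _ in range(20)]
print(field_check(healthy, 150_000, {1}))       # []
print(len(field_check(leaky, 150_000, {1})))    # 2
```

The expected reset reasons are passed in by the caller rather than hard-coded, so the sketch stays valid whichever values the deployment treats as normal.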
### Decoding recent_events
The `recent_events` array is a ring buffer of `{t, d0, d1, ts}` entries.
Tag definitions live in `firmware/lib/event_log/event_log.h`:
| `t` | Event | `d0` | `d1` |
|-----|-------|------|------|
| 1 | `EVT_BOOT` | `esp_reset_reason()` | — |
| 2 | `EVT_WIFI_UP` | RSSI | — |
| 3 | `EVT_WIFI_DOWN` | disconnect reason code; `0xFF` = silent-death fallback | — |
| 4 | `EVT_HTTP_OK` | fnv1a-16 path hash | elapsed ms (capped at 65535) |
| 5 | `EVT_HTTP_FAIL` | path hash | HTTP status or negative errno cast to `uint16` |
| 6 | `EVT_HEARTBEAT_MISS` | consecutive miss count | — |
| 7 | `EVT_NTP_SYNC` | seconds since boot | — |
| 8 | `EVT_REBOOT` | `RebootReason`: 1=HEARTBEAT_MISS, 2=FACTORY_RESET, 3=OTA, 4=WIFI_REPROV, 5=FATAL_CONFIG, 6=FATAL_CAMERA | — |
Server-side decoder tables (`EVENT_TAG_DECODER`, `REBOOT_REASON_DECODER`)
live in `server/heartbeat_diagnostics_stub.py`.
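As a rough illustration of how a `recent_events` entry might be decoded, the sketch below maps `t` to the event name and resolves the path hash in `d0` for HTTP events. The hash follows `event_log_path_hash` in `firmware/lib/event_log/event_log.cpp` (32-bit FNV-1a folded to 16 bits). The `decode_event` helper and the `known_paths` argument are hypothetical and are not the contents of `EVENT_TAG_DECODER`; whether the firmware hashes exactly the path string used here is also an assumption.

```python
EVENT_NAMES = {
    1: "EVT_BOOT", 2: "EVT_WIFI_UP", 3: "EVT_WIFI_DOWN", 4: "EVT_HTTP_OK",
    5: "EVT_HTTP_FAIL", 6: "EVT_HEARTBEAT_MISS", 7: "EVT_NTP_SYNC", 8: "EVT_REBOOT",
}


def path_hash(path: str) -> int:
    """32-bit FNV-1a over the path bytes, folded to 16 bits by XOR of the two halves."""
    h = 0x811C9DC5
    for byte in path.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # emulate uint32_t wraparound
    return ((h >> 16) ^ (h & 0xFFFF)) & 0xFFFF


def decode_event(entry, known_paths=()):
    """Render one {t, d0, d1, ts} entry as a readable string."""
    by_hash = {path_hash(p): p for p in known_paths}
    tag = entry["t"]
    name = EVENT_NAMES.get(tag, f"UNKNOWN_{tag}")
    d0 = entry["d0"]
    if tag in (4, 5):  # HTTP events carry a path hash in d0
        d0 = by_hash.get(d0, f"path#{d0:04x}")
    return f"{name} d0={d0} d1={entry['d1']} ts={entry['ts']}"


sample = {"t": 4, "d0": path_hash("/events/batch"), "d1": 812, "ts": 0}
print(decode_event(sample, ["/events/batch"]))  # EVT_HTTP_OK d0=/events/batch d1=812 ts=0
```

Because the hash is only 16 bits wide, two paths can collide; a lookup table built from the small set of paths the firmware actually posts to keeps that risk low.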

Binary file not shown.

@@ -0,0 +1,88 @@
# OTA Deployment — Status
## Current state (2026-05-14)
**End-to-end OTA verified working on `dc-0002`.** Device polled `engagement-api-1`, received a signed manifest, downloaded and verified firmware 1.0.1, set the alternate boot partition, rebooted, and came up reporting `fw=1.0.1`.
## What's deployed
- **Branch `feat/pull-ota-code-signing`** merged to `main` (13 commits, 17 new files, 936 LOC).
- **Signing toolchain**: `tools/gen_signing_key.py`, `tools/sign_firmware.py`, `tools/deploy_firmware.py`.
- **Firmware OTA library**: `firmware/lib/ota_updater/`.
- **Signing key**: `secrets/firmware_signing_key.pem` (gitignored). Public key committed at `firmware/lib/ota_updater/ota_pubkey.h`.
- **Live OTA handler**: served by `engagement-api-1` Docker service (source not in this repo). The stub at `server/ota_endpoint.py` is unwired and not the one responding to devices.
- **Configurable poll interval** via NVS key `ota_interval`. Provision with `flash_device.py --ota-interval-seconds N`. Min 10 s, default 21600 (6 h).
## Issues resolved
### 1. HMAC format mismatch (resolved 2026-05-13)
Firmware OTA updater was using `X-HMAC-Signature` header + `millis()`-derived timestamp; the reporter component used `X-Signature` + `time(nullptr)`. Server expected the reporter format. Fixed by aligning the OTA updater to the same canonical scheme as the reporter (`firmware/lib/ota_updater/ota_updater.cpp` `add_hmac_headers`).
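For reference, the canonical scheme can be sketched in Python as it might look on the verifying side. The message layout (`method\npath\ntimestamp\nbody_hash_hex`, keyed with the hex-decoded secret) follows `firmware/lib/hmac/hmac.cpp`. The assumptions here are that `body_hash_hex` is the SHA-256 hex digest of the request body and that the signature travels as lowercase hex. The function names are illustrative.

```python
import hashlib
import hmac


def sign_request(secret_hex, method, path, timestamp, body=b""):
    """Compute the request signature; reject malformed secrets as hmac_sign does."""
    if not secret_hex or len(secret_hex) > 128 or len(secret_hex) % 2 != 0:
        raise ValueError("malformed secret: empty, oversized, or odd-length hex")
    secret = bytes.fromhex(secret_hex)  # raises ValueError on non-hex characters
    body_hash_hex = hashlib.sha256(body).hexdigest()  # assumption: SHA-256 hex of the body
    message = f"{method}\n{path}\n{timestamp}\n{body_hash_hex}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret_hex, method, path, timestamp, body, signature_hex):
    """Constant-time comparison against the recomputed signature."""
    expected = sign_request(secret_hex, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature_hex)


demo_secret = "00" * 32  # placeholder key, not a real device secret
demo_sig = sign_request(demo_secret, "GET", "/ota/check", 1747000000)
print(verify_request(demo_secret, "GET", "/ota/check", 1747000000, b"", demo_sig))  # True
```

A `millis()`-derived timestamp fails this check for two reasons: the server rebuilds the message from the wall-clock value it receives, and a freshness window (if the server enforces one) would reject a value that counts from boot.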
### 2. `/ota/check` JSON schema mismatch (resolved 2026-05-14)
Server was emitting `{update_available, sha256, url}` but firmware reads `{update, size, sig_b64}`. Device silently decided "up to date" every poll because `doc["update"]` defaulted to `false`. Fixed server-side: the `/ota/check` response now also includes the fields the firmware needs. Firmware schema remains the source of truth.
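A small server-side guard along these lines could catch a regression of this mismatch before devices see it. It is a sketch under assumptions: the function name is hypothetical, and it assumes `size` and `sig_b64` are only required when `update` is true.

```python
def manifest_problems(doc):
    """Return the reasons a /ota/check response would be unusable by the firmware."""
    problems = []
    if not isinstance(doc.get("update"), bool):
        # The firmware treats a missing 'update' as false, so it would silently skip.
        problems.append("'update' missing or not a boolean")
        return problems
    if doc["update"]:
        size = doc.get("size")
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            problems.append("'size' missing or not a positive integer")
        sig = doc.get("sig_b64")
        if not isinstance(sig, str) or not sig:
            problems.append("'sig_b64' missing or empty")
    return problems


# Placeholder values; only the key names come from the issue description.
old_shape = {"update_available": True, "sha256": "ab" * 32, "url": "http://example.invalid/fw.bin"}
new_shape = dict(old_shape, update=True, size=936_000, sig_b64="c2ln")
print(manifest_problems(old_shape))  # ["'update' missing or not a boolean"]
print(manifest_problems(new_shape))  # []
```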
### 3. Signed firmware artifact pipeline (resolved 2026-05-14)
Deploy flow now bumps `FW_VERSION` → builds → copies `.pio/build/timercam/firmware.bin` to `firmware-<version>.bin` → signs with `tools/sign_firmware.py` → SCPs both `.bin` and `.bin.sig` to `root@nginx:/root/engagement-api/firmware/`. Server team updates `firmware_releases.sha256` to match the new binary.
**Gotcha:** the `.bin` and `.bin.sig` must always be deployed together. The signature covers the exact bytes of the `.bin`; replacing one without the other leaves the server in an inconsistent state, and devices will reject the update with `SIGNATURE INVALID`.
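One way to make the co-deploy rule harder to break is a pre-upload check that refuses to proceed unless both files are present, and that prints the digest needed for `firmware_releases.sha256`. The sketch below is hypothetical and is not part of `tools/deploy_firmware.py`. It only checks presence and computes the hash; it does not verify the signature itself.

```python
import hashlib
import tempfile
from pathlib import Path


def release_pair(bin_path):
    """Return name, size and sha256 for an image, refusing to proceed without its .sig."""
    bin_path = Path(bin_path)
    sig_path = bin_path.with_name(bin_path.name + ".sig")
    for required in (bin_path, sig_path):
        if not required.is_file() or required.stat().st_size == 0:
            raise FileNotFoundError(f"missing or empty: {required}")
    image = bin_path.read_bytes()
    return {
        "bin": bin_path.name,
        "sig": sig_path.name,
        "size": len(image),
        "sha256": hashlib.sha256(image).hexdigest(),
    }


# Demonstration with stand-in files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    image_path = Path(tmp) / "firmware-1.0.1.bin"
    image_path.write_bytes(b"\x00" * 16)                          # stand-in image bytes
    (Path(tmp) / "firmware-1.0.1.bin.sig").write_bytes(b"sig")    # stand-in signature
    info = release_pair(image_path)
    print(info["bin"], info["sig"], info["size"])  # firmware-1.0.1.bin firmware-1.0.1.bin.sig 16
```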
## Hardening added this session
### Firmware logging (`firmware/lib/ota_updater/ota_updater.cpp`, `firmware/src/main.cpp`)
The previous `log_i/w/e` macros were silenced by the default `CORE_DEBUG_LEVEL`. Replaced with `Serial.printf` so output appears regardless of log level. Now logs at every step:
- `[OTA] task started, interval=N ms`
- Per-tick WiFi status
- Full check URL + HMAC header preview (device id, ts, sig prefix)
- HTTP response code + error body on non-200
- JSON parse errors
- "Up to date" decision
- Partition labels and offsets (running + target)
- Per-128 KB download progress
- Total bytes + elapsed ms
- Computed sha256 of the downloaded image (compare against server `X-SHA256`)
- Signature verify result
- `esp_ota_end` / `esp_ota_set_boot_partition` errors by name
- `Serial.flush()` followed by a 500 ms `delay()` before `esp_restart()`, so the final log line escapes the UART
### Boot-time partition state (`firmware/src/main.cpp`)
Logs `running partition '<label>' (off=0x…) state=N fw=…` at every boot. If `state == ESP_OTA_IMG_PENDING_VERIFY` (1), calls `esp_ota_mark_app_valid_cancel_rollback()` to prevent the bootloader from reverting on the next reboot. Harmless no-op when rollback isn't enabled, but eliminates a class of silent OTA failures.
### `esp_ota_write` return value (`firmware/lib/ota_updater/ota_updater.cpp`)
Previously ignored — a failed write would silently corrupt the new partition and the device would still try to boot from it. Now checked, aborts the OTA cleanly, and logs the failing offset.
### Partition size pre-check
Reject the update before `esp_ota_begin` if `expected_size > target->size`.
## Verifying a deployment
After a server push, watch the device's serial output on the next OTA tick:
```
[OTA] tick: WiFi connected, running check
[OTA] check → GET http://logs.research.bike:80/ota/check?version=X.Y.Z
[OTA] check response: HTTP 200
[OTA] Update: X.Y.Z → A.B.C (N bytes)
[OTA] running='app0' (off=…), target='app1' (off=…)
[OTA] progress: N/N bytes
[OTA] sha256(image)=<hex> ← must match server X-SHA256
[OTA] signature OK
[OTA] boot partition set to 'app1' — rebooting in 500 ms
```
Then on reboot:
```
[BOOT] running partition 'app1' (off=…) state=N fw=A.B.C
```
The `fw=A.B.C` line is the success signal — it reflects the `FW_VERSION` macro baked into the freshly-booted image, not just what the device claims to be running.
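When watching many devices, the success signal can be checked mechanically from captured serial output. The sketch below assumes the boot line keeps exactly the format shown above. The regular expression, function name, and sample offset are illustrative.

```python
import re

# Matches: [BOOT] running partition 'app1' (off=...) state=N fw=A.B.C
BOOT_RE = re.compile(
    r"\[BOOT\] running partition '(?P<label>[^']+)' "
    r"\(off=(?P<off>[^)]*)\) state=(?P<state>\S+) fw=(?P<fw>\S+)"
)


def booted_version(serial_lines):
    """Return the fw value from the first boot banner found, or None."""
    for line in serial_lines:
        match = BOOT_RE.search(line)
        if match:
            return match.group("fw")
    return None


def ota_succeeded(serial_lines, expected_fw):
    """True only if a boot banner was seen and it reports the expected version."""
    return booted_version(serial_lines) == expected_fw


log = [
    "[OTA] signature OK",
    "[BOOT] running partition 'app1' (off=0x150000) state=2 fw=1.0.1",  # sample values
]
print(ota_succeeded(log, "1.0.1"))  # True
```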
## Quick reference
- Plan: `docs/superpowers/plans/2026-05-10-pull-ota-code-signing.md`
- Firmware version: `firmware/include/version.h`
- OTA library: `firmware/lib/ota_updater/`
- HMAC implementation: `firmware/lib/hmac/hmac.cpp`
- Provisioning tool: `tools/flash_device.py`
- Signing tools: `tools/gen_signing_key.py`, `tools/sign_firmware.py`, `tools/deploy_firmware.py`
- Server deploy path: `root@nginx:/root/engagement-api/firmware/` (per server team runbook)

Binary file not shown.

@@ -0,0 +1,133 @@
from docx import Document
from docx.shared import Pt, Inches, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
for section in doc.sections:
    section.top_margin = Inches(0.6)
    section.bottom_margin = Inches(0.6)
    section.left_margin = Inches(0.8)
    section.right_margin = Inches(0.8)

style = doc.styles['Normal']
style.font.name = 'Calibri'
style.font.size = Pt(11)


def heading(text, size=18, color=(0x1F, 0x3A, 0x5F), space_before=6, space_after=4):
    p = doc.add_paragraph()
    p.paragraph_format.space_before = Pt(space_before)
    p.paragraph_format.space_after = Pt(space_after)
    run = p.add_run(text)
    run.bold = True
    run.font.size = Pt(size)
    run.font.color.rgb = RGBColor(*color)
    return p


def subheading(text):
    return heading(text, size=13, color=(0x1F, 0x3A, 0x5F), space_before=8, space_after=2)


def body(text, bold_lead=None):
    p = doc.add_paragraph()
    p.paragraph_format.space_after = Pt(4)
    if bold_lead:
        r = p.add_run(bold_lead)
        r.bold = True
        p.add_run(text)
    else:
        p.add_run(text)
    return p


def bullet(text, bold_lead=None):
    p = doc.add_paragraph(style='List Bullet')
    p.paragraph_format.space_after = Pt(2)
    if bold_lead:
        r = p.add_run(bold_lead)
        r.bold = True
        p.add_run(text)
    else:
        p.add_run(text)
    return p

# ---------- Title ----------
title = doc.add_paragraph()
title.alignment = WD_ALIGN_PARAGRAPH.CENTER
tr = title.add_run('DoorCounter')
tr.bold = True
tr.font.size = Pt(28)
tr.font.color.rgb = RGBColor(0x1F, 0x3A, 0x5F)
sub = doc.add_paragraph()
sub.alignment = WD_ALIGN_PARAGRAPH.CENTER
sr = sub.add_run('A simple, private way to count visitors to your store')
sr.italic = True
sr.font.size = Pt(13)
sr.font.color.rgb = RGBColor(0x55, 0x55, 0x55)
sub.paragraph_format.space_after = Pt(10)
# ---------- What it is ----------
heading('What is in the box?', size=14)
bullet('A small camera (about the size of a matchbox)', bold_lead='Camera — ')
bullet('A USB cable to power it', bold_lead='Cable — ')
bullet('A small wall plug', bold_lead='Power adapter — ')
body('That\'s it. There is nothing to install on your computer or phone, no software to log into, and no monthly fee.')
# ---------- What it does ----------
heading('What does it do?', size=14)
body('The camera mounts above your front door, pointing straight down at the floor. Whenever someone walks underneath, it counts them. Once an hour, it sends the count to us so we can share visitor traffic reports with you.')
p = doc.add_paragraph()
p.paragraph_format.space_after = Pt(4)
r = p.add_run('Your privacy is protected. ')
r.bold = True
p.add_run('The camera looks straight down at the top of people\'s heads — it cannot see faces. No video or photos are ever saved or sent anywhere. Only the count of how many people walked through.')
# ---------- Setup ----------
heading('How do I set it up?', size=14)
body('The whole process takes about 5 minutes. You will need a stepladder and your store\'s WiFi password.')
subheading('Step 1 — Mount the camera above your door')
body('Use the included double-sided tape (or a screw, if you prefer) to stick the camera to the ceiling, directly above where people walk through your front door. The lens should point straight down at the floor. Aim for roughly 7 feet (about 2 meters) above the floor — most ceilings work fine.')
subheading('Step 2 — Plug it in')
body('Connect the USB cable to the camera and to the wall plug. Plug the wall plug into any standard outlet. The camera will turn on automatically — you will see a small red light.')
subheading('Step 3 — Connect it to your WiFi')
body('Take out your phone and open its WiFi settings. You will see a new network called "DoorCounter-Setup". Connect to it. Your phone will automatically open a setup page — enter your store\'s WiFi name and password, then tap Save.')
body('After about 30 seconds, the red light on the camera will turn off. That means it is connected and counting. You are done!', bold_lead='')
# ---------- Day to day ----------
heading('What do I do day-to-day?', size=14)
body('Nothing. The camera works on its own, 24 hours a day. It uses about as much electricity as a nightlight (less than $1 per year), runs cool, and never needs to be touched.')
p = doc.add_paragraph()
p.paragraph_format.space_after = Pt(4)
r = p.add_run('A small light blinks each time someone walks through. ')
r.bold = True
p.add_run('You may notice the count happens 35 seconds after the person passes — that is normal.')
# ---------- Troubleshooting ----------
heading('If something seems wrong', size=14)
bullet('your WiFi password is probably wrong, or the WiFi network is out of range. Reconnect your phone to "DoorCounter-Setup" and re-enter the password.', bold_lead='Red light stays on — ')
bullet('unplug it for 10 seconds and plug it back in.', bold_lead='No light at all — ')
bullet('please contact us using the information below.', bold_lead='Anything else — ')
# ---------- Contact ----------
heading('Questions?', size=14)
body('We are happy to help. Reach out anytime:')
bullet('peter@research.bike', bold_lead='Email: ')
bullet('https://git.research.bike/Bicycle_Market_Research/DoorCounter', bold_lead='Project page: ')
footer = doc.add_paragraph()
footer.alignment = WD_ALIGN_PARAGRAPH.CENTER
fr = footer.add_run('Thank you for participating in our retail traffic study.')
fr.italic = True
fr.font.size = Pt(10)
fr.font.color.rgb = RGBColor(0x77, 0x77, 0x77)
footer.paragraph_format.space_before = Pt(12)
import sys
out = sys.argv[1] if len(sys.argv) > 1 else 'retailer-setup-guide.docx'
doc.save(out)
print(f"wrote {out}")

File diff suppressed because it is too large

@@ -0,0 +1,189 @@
# BLE / NimBLE Timer-Callout Crash — Handoff
**Date opened:** 2026-05-01
**Status:** Resolved 2026-05-01 by upgrading `h2zero/NimBLE-Arduino` from `^1.4.2` to `^2.0.0` (`firmware/platformio.ini:24`). BLE scanning re-enabled via `BLE_SCANNING_ENABLED 1` (`firmware/src/main.cpp:30`). Verified clean on customer network for 1+ hour with no panics.
**Goal:** Re-enable BLE scanning without the device crashing within ~1s of boot.
**Confirmed root cause:** Instrumented `os_callout_timer_cb` with `ets_printf` and observed the very first callout fire on the direct-call path with both `evq=NULL` and `fn=NULL`, while the same `co` address later (post-init) showed valid `evq` and `fn`. Same callout struct reused — classic NimBLE 1.x callout init/fire race where the FreeRTOS `TimerHandle_t` had a queued expiry against a not-yet-initialized event. NimBLE 2.x rewrote the porting layer; the race is gone.
**Migration touches (NimBLE 1.x → 2.x):**
- `NimBLEAdvertisedDeviceCallbacks` → `NimBLEScanCallbacks`
- `onResult(NimBLEAdvertisedDevice*)` → `onResult(const NimBLEAdvertisedDevice*)`
- `setAdvertisedDeviceCallbacks(cb, true)` → `setScanCallbacks(cb, true)`
- `start(0, nullptr, false)` → `start(0, false, false)` (signature: `duration, isContinue, restart`)
BLE was working before today's customer-site provisioning trip. The crash is reliably reproducible on the current build at the customer location whenever `BLE_SCANNING_ENABLED` is set back to `1`. It may or may not reproduce on a quieter network — the camera task's CPU-starvation pattern is shared, but the crash window's exact trigger is unconfirmed.
---
## Symptom
Within ~1s of boot, after several `cam_hal: EV-VSYNC-OVF` lines from the camera driver:
```
Guru Meditation Error: Core 0 panic'ed (InstrFetchProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x00000000 PS : 0x00060630 A0 : 0x8009a9af A1 : 0x3ffbd6e0
A2 : 0x3fff1ef8 A3 : 0x00000001 ...
A8 : 0x800f2ebc ...
EXCCAUSE: 0x00000014 EXCVADDR: 0x00000000
Backtrace: 0xfffffffd:0x3ffbd6e0 0x4009a9ac:0x3ffbd700
```
Decoded with `~/.platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-addr2line -e .pio/build/timercam/firmware.elf -pfiC 0x4009a9ac 0x400f2ebc`:
```
prvProcessReceivedCommands at freertos/timers.c:852
(inlined by) prvTimerTask at freertos/timers.c:600
os_callout_timer_cb at NimBLE-Arduino/.../npl_os_freertos.c:1742
```
`PC=0` + `EXCCAUSE=0x14` (InstrFetchProhibited) = jump-to-NULL. The FreeRTOS timer-service task is dispatching a NimBLE callout whose callback function pointer is NULL.
The relevant NimBLE source:
```c
// firmware/.pio/libdeps/timercam/NimBLE-Arduino/src/nimble/porting/npl/freertos/src/npl_os_freertos.c:1729-1742
static void
os_callout_timer_cb(TimerHandle_t timer)
{
struct ble_npl_callout *co;
co = pvTimerGetTimerID(timer);
assert(co);
if (co->evq) {
ble_npl_eventq_put(co->evq, &co->ev);
} else {
co->ev.fn(&co->ev); // <-- co->ev.fn is NULL
}
}
```
Either `co->ev.fn` is genuinely NULL on the direct-call path, OR — given the addr2line frame is a few lines off and the callsite is ambiguous — the FreeRTOS timer's own callback pointer (`pxTimer->pxCallbackFunction`) is NULL inside `prvProcessReceivedCommands`. Both indicate a callout/timer being freed or zeroed while the FreeRTOS timer service still has a command queued for it.
---
## Environment
- Board: M5Stack TimerCam-F (ESP32-D0WDQ6-V3, dual-core 240 MHz, 4MB flash).
- BLE library: `h2zero/NimBLE-Arduino@^1.4.2` (`firmware/platformio.ini`). 1.4.2 is end-of-life on the 1.x branch; 2.x exists with breaking API changes.
- Camera: OV3660 via `esp32-camera` driver, 96×96 grayscale @ 5 FPS.
- BLE scan: passive, low-overhead, hash-collected by `firmware/src/ble_scanner.cpp`.
- Tasks: `task_camera` (core 1, prio 2, 8KB stack), `task_reporter` (core 0, prio 1, 8KB stack), Arduino loop (default).
- The camera task triggers `cam_hal: EV-VSYNC-OVF` whenever frame capture overlaps another long operation — this consistently precedes the crash in logs.
---
## What's been ruled out
1. **DNS / network code** — entirely unrelated. DNS path works in production via the new fallback-IP machinery (`firmware/src/reporter.cpp` `resolve_api_ip` and `firmware/src/reporter.h` `REPORTER_API_FALLBACK_IP`). Do not regress this; it shipped with reports working at the customer site.
2. **Our BLE app code** — the backtrace stays inside the FreeRTOS timer service and NimBLE's own porting layer; nothing in `ble_scanner.cpp` is on the call stack. The bug is in vendored NimBLE.
3. **Memory corruption from our side** — `A2 = 0x3fff1ef8` is a normal heap address, no obvious overrun pattern. Heap is healthy at the time (we'd see a different fault otherwise).
4. **Stack overflow** — A1 = 0x3ffbd6e0 is well within the FreeRTOS timer-service task's stack range; no canary smash log.
---
## What changed today
| File | Change | Keep? |
|---|---|---|
| `firmware/src/main.cpp` | Added `BLE_SCANNING_ENABLED 0` gate; all `ble_scanner_*` callsites compile out; `BLEHourlyRecord` zero-stubbed when off | Keep until crash fixed; flip to `1` to reproduce |
| `firmware/src/main.cpp` | Removed verbose `[F]`/`[CV] spawn` per-frame logging; kept entry/exit + heartbeat | Keep |
| `firmware/src/ble_scanner.cpp` | Removed `[BLE] new device:` per-discovery log | Keep |
| `firmware/src/reporter.{h,cpp}` | DNS resolution with fallback IP, raw `WiFiClient` HTTP, manual `Host:` header | Keep — production fix |
| `firmware/lib/net_guard/net_guard.{h,cpp}` | DNS pin to 1.1.1.1/8.8.8.8 at lwIP + esp-netif layers; `net_guard_dump_dns` diagnostic | Keep |
---
## Reproduction
1. `cd firmware && pio run -e timercam`.
2. Edit `firmware/src/main.cpp`, set `#define BLE_SCANNING_ENABLED 1`. Rebuild.
3. Flash a TimerCam: `python tools/flash_device.py --port /dev/ttyUSB0 --device-id dc-XXXX --location-id <loc> --hmac-secret <secret> --wifi-ssid "<ssid>" --wifi-password "<pw>"`.
4. `pio device monitor --port /dev/ttyUSB0 --baud 115200`.
5. Wait ≤30s. Expect the `Guru Meditation Error: Core 0 panic'ed (InstrFetchProhibited)` traceback above.
Crash is **deterministic** on the customer's network (Elly-Fi). Worth retesting on a quiet desk network — if it doesn't repro there, the trigger is camera-task starvation interacting with NimBLE timers, not a pure NimBLE bug.
To decode any future crash backtrace:
```sh
~/.platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-addr2line \
-e firmware/.pio/build/timercam/firmware.elf -pfiC <addr1> <addr2> ...
```
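To avoid copying addresses by hand, the program-counter values can be pulled out of a `Backtrace:` line with a few lines of Python and pasted into the command above. This is a sketch: the helper name is made up, and keeping only addresses that begin with `0x40` is an assumption used to drop obviously bogus frames such as `0xfffffffd`.

```python
import re

PAIR_RE = re.compile(r"(0x[0-9a-fA-F]{8}):0x[0-9a-fA-F]{8}")


def backtrace_pcs(line):
    """Return the PC half of each PC:SP pair, keeping only plausible code addresses."""
    if "Backtrace:" not in line:
        return []
    frames = PAIR_RE.findall(line.split("Backtrace:", 1)[1])
    # Assumption: executable code lives at 0x40xxxxxx; anything else is skipped.
    return [pc for pc in frames if pc.lower().startswith("0x40")]


pcs = backtrace_pcs("Backtrace: 0xfffffffd:0x3ffbd6e0 0x4009a9ac:0x3ffbd700")
print(" ".join(pcs))  # 0x4009a9ac
```

Return addresses held in registers (such as `A0`) are not part of the `Backtrace:` line and still need to be added manually if wanted.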
---
## Investigation paths, in order of effort/confidence
### 1. Confirm the failing call site (cheap, do this first)
The addr2line line numbers can be off by ±3 due to inlining. Add a temporary `ets_printf` patch to `os_callout_timer_cb` in `npl_os_freertos.c` that logs `co`, `co->evq`, and `co->ev.fn` on entry (`Serial.printf` is not available in a C source file), then reproduce. That establishes whether `co->ev.fn` is NULL on the direct-call path or whether this is a FreeRTOS-level issue (queued command for a deleted timer).
If `evq != NULL` and we still crash, the NULL is in the queued event dispatcher (a different code path; pivot the investigation).
### 2. Try upgrading NimBLE-Arduino to 2.x (medium effort, likely-fix)
`platformio.ini` has `h2zero/NimBLE-Arduino@^1.4.2`. 2.x rewrote the porting layer significantly. Breaking API changes — `NimBLEAdvertisedDeviceCallbacks` was renamed/restructured. Touch points: `firmware/src/ble_scanner.cpp` (the only file that uses NimBLE).
Try: pin `^2.0.0`, fix the API breakage in `ble_scanner.cpp` (it's <100 lines). If 2.x crashes too, the issue is independent of NimBLE version → pivot to (3) or (4).
### 3. Reduce camera-task starvation (cheap, may be sufficient)
The `EV-VSYNC-OVF` lines are the canary. The camera task pins core 1 at priority 2 doing CV processing every 200ms. NimBLE host task runs on core 0 by default but the FreeRTOS timer service task is core-agnostic and may be starved during long CV passes that hold a mutex.
Things to try in `firmware/src/main.cpp`:
- Lower `CAM_FPS` from 5 to 3, see if VSYNC-OVF still appears.
- Move CV processing off the capture path (capture into a queue, process at lower priority).
- Raise FreeRTOS timer-service task priority via `configTIMER_TASK_PRIORITY` (`CONFIG_FREERTOS_TIMER_TASK_PRIORITY` in sdkconfig).
- Confirm NimBLE host task pinning — `CONFIG_BT_NIMBLE_PINNED_TO_CORE` should be 0 or 1 (not unpinned).
### 4. Local NULL-guard patch (last resort, masks the bug)
If upgrade is blocked and starvation reduction isn't enough, patch the vendored source:
```c
// npl_os_freertos.c:1740
} else {
if (co->ev.fn) co->ev.fn(&co->ev);
}
```
This silences the crash but silently drops the event whose callback was NULL. The dropped events are likely scan-result deliveries; we'd undercount BLE devices but not crash. Acceptable as a stopgap with a `// TODO: remove when NimBLE upgraded` and a note in this doc.
Caveat: vendored library files in `.pio/libdeps/` get blown away by clean builds. Either copy NimBLE into `firmware/lib/` and pin it (vendored), or use `lib_archive` + a post-install script. Don't ship a build that depends on an unpinned hand-edit.
### 5. Replace BLE stack (high effort)
If 2.x also crashes and starvation reduction doesn't help, switch to the IDF-native bluedroid stack via the Arduino-ESP32 `BLEDevice` API. Larger memory footprint (~30KB more heap) but a different lifecycle model — won't share NimBLE's bug.
---
## Constraints / things not to break
- `firmware/src/reporter.cpp` DNS path with `REPORTER_API_FALLBACK_IP` — production fix, must keep working. Do not regress to `HTTPClient`.
- `BLE_SCANNING_ENABLED 0` is the **shipping default** until this is resolved. Devices in the field rely on this; flip to `1` only in dev builds.
- `firmware/lib/net_guard/net_guard.cpp` `net_guard_pin_dns()` is called both at boot and on every WiFi reconnect; if reorganizing net_guard, preserve both call sites.
- The `ble_scanner` module supports `ble_scanner_pause`/`resume` for OTA — verify it still works after any NimBLE upgrade (`ArduinoOTA.onStart` hook in `main.cpp:248`).
---
## Open questions
- Does the crash repro on a quiet network with no `EV-VSYNC-OVF`? (Determines whether starvation is necessary vs sufficient.)
- Was BLE working in a previous build, and on which NimBLE version? The earliest BLE-related commit found so far is well before today; a binary search across firmware commits with BLE enabled would identify the regression boundary if the regression is in our code.
- Does the customer site have an unusual RF environment (very dense BLE) that increases the callout-churn rate, making the race more likely? Worth a `nimble_scan_event` count log during a 60s capture window.
---
## Quick verification once you think it's fixed
1. Set `BLE_SCANNING_ENABLED 1`, rebuild, flash.
2. Run for at least 10 minutes on the customer network — the original crash hit within ~1s, so 10 min with no panic is strong evidence.
3. Confirm a successful hourly cycle: `[CV] entry/exit`, then `[HTTP] POST .../events/batch ... -> 200`, BLE record with non-zero `unique_devices`.
4. Run a second device side-by-side; confirm no cross-device interference.
When done, set `BLE_SCANNING_ENABLED 1` as the default and remove the gate (keep the comment block as institutional memory of the bug).

@@ -0,0 +1,3 @@
#pragma once
// Format: MAJOR.MINOR.PATCH (SemVer) — OTA version compare uses sscanf("%d.%d.%d")
#define FW_VERSION "1.0.1"

@@ -0,0 +1,156 @@
// firmware/lib/event_log/event_log.cpp
#include "event_log.h"
#include <string.h>
#include <stdio.h>
#ifdef ARDUINO
#include <Arduino.h>
#include <Preferences.h>
#include <time.h>
#include <freertos/FreeRTOS.h>
#include <freertos/semphr.h>
static Preferences s_prefs;
static const char* NVS_NS = "evlog";
static bool s_ok = false;
static SemaphoreHandle_t s_mutex = nullptr;
static uint32_t g_head = 0; // next write slot (0..31), RAM-only
static uint32_t g_cnt = 0; // total writes since boot scan, RAM-only
static constexpr time_t NTP_SYNC_THRESHOLD = 1700000000; // 2023-11-14
#else
// Native build: in-memory stub
#include <cstdint>
static uint8_t g_slots[32 * 32];
static uint32_t g_head = 0;
static uint32_t g_cnt = 0;
extern "C" void event_log_test_reset() {
memset(g_slots, 0, sizeof(g_slots));
g_head = 0;
g_cnt = 0;
}
extern "C" void event_log_test_simulate_reboot() {
// Simulate device reboot: clear in-RAM state, keep persistent slots.
g_head = 0;
g_cnt = 0;
}
#endif
static const size_t SLOTS = 32;
static const size_t SLOT_SIZE = sizeof(EventLogEntry);
uint16_t event_log_path_hash(const char* path) {
// fnv1a-16 (fold 32-bit fnv1a down to 16 bits)
uint32_t h = 0x811c9dc5u;
while (*path) { h ^= (uint8_t)*path++; h *= 0x01000193u; }
return (uint16_t)((h >> 16) ^ (h & 0xFFFF));
}
static void slot_write(size_t idx, const EventLogEntry& e) {
#ifdef ARDUINO
char key[8]; snprintf(key, sizeof(key), "s%u", (unsigned)idx);
s_prefs.putBytes(key, &e, SLOT_SIZE);
#else
memcpy(&g_slots[idx * SLOT_SIZE], &e, SLOT_SIZE);
#endif
}
static bool slot_read(size_t idx, EventLogEntry& e) {
#ifdef ARDUINO
char key[8]; snprintf(key, sizeof(key), "s%u", (unsigned)idx);
size_t n = s_prefs.getBytes(key, &e, SLOT_SIZE);
return n == SLOT_SIZE;
#else
memcpy(&e, &g_slots[idx * SLOT_SIZE], SLOT_SIZE);
return true;
#endif
}
void event_log_init() {
#ifdef ARDUINO
if (s_mutex == nullptr) {
s_mutex = xSemaphoreCreateMutex();
}
s_ok = s_prefs.begin(NVS_NS, /*readOnly=*/false);
if (!s_ok) {
Serial.println("[evlog] NVS begin failed");
return;
}
#endif
// Scan all 32 slots; locate the one with the largest seq.
// Empty log: every slot tag == 0 (not a valid EventLogTag, which starts at 1).
uint32_t max_seq = 0;
int max_idx = -1;
bool any_valid = false;
for (size_t i = 0; i < SLOTS; i++) {
EventLogEntry e = {};
if (!slot_read(i, e)) continue;
if (e.tag == 0) continue;
any_valid = true;
if (max_idx < 0 || e.seq >= max_seq) {
max_seq = e.seq;
max_idx = (int)i;
}
}
if (any_valid) {
g_head = (uint32_t)((max_idx + 1) % SLOTS);
g_cnt = max_seq + 1;
} else {
g_head = 0;
g_cnt = 0;
}
}
void event_log_write(EventLogTag tag, uint16_t data0, uint16_t data1) {
#ifdef ARDUINO
if (!s_ok) return;
// Bounded wait: skip on contention rather than stall the calling task.
// This matters because event_log_write runs from the WiFi event task
// (priority 23); blocking it on a 10-100ms NVS write can overflow the
// event queue. Diagnostic loss is preferable to dropped WiFi events.
if (s_mutex && xSemaphoreTake(s_mutex, pdMS_TO_TICKS(50)) != pdTRUE) return;
EventLogEntry e = {};
time_t now = time(nullptr);
e.ts_unix = (now > NTP_SYNC_THRESHOLD) ? (uint32_t)now : 0;
e.uptime_s = (uint32_t)(millis() / 1000);
e.tag = (uint8_t)tag;
e.data0 = data0;
e.data1 = data1;
e.seq = g_cnt;
slot_write(g_head % SLOTS, e);
g_head = (g_head + 1) % SLOTS;
g_cnt = g_cnt + 1;
if (s_mutex) xSemaphoreGive(s_mutex);
#else
EventLogEntry e = {};
e.ts_unix = 0;
e.uptime_s = 0;
e.tag = (uint8_t)tag;
e.data0 = data0;
e.data1 = data1;
e.seq = g_cnt;
slot_write(g_head % SLOTS, e);
g_head = (g_head + 1) % SLOTS;
g_cnt = g_cnt + 1;
#endif
}
size_t event_log_read_recent(EventLogEntry* out, size_t max_entries) {
#ifdef ARDUINO
if (!s_ok) return 0;
// Bounded wait to match event_log_write. Reads are slower (32 NVS gets),
// but returning 0 entries under contention beats blocking the caller.
if (s_mutex && xSemaphoreTake(s_mutex, pdMS_TO_TICKS(50)) != pdTRUE) return 0;
#endif
uint32_t head = g_head;
uint32_t cnt = g_cnt;
size_t available = (cnt < SLOTS) ? (size_t)cnt : SLOTS;
size_t n = (max_entries < available) ? max_entries : available;
for (size_t i = 0; i < n; i++) {
// newest is at (head - 1), then (head - 2), ... modulo SLOTS
size_t idx = (head + SLOTS - 1 - i) % SLOTS;
slot_read(idx, out[i]);
}
#ifdef ARDUINO
if (s_mutex) xSemaphoreGive(s_mutex);
#endif
return n;
}

@@ -0,0 +1,48 @@
// firmware/lib/event_log/event_log.h
#pragma once
#include <stdint.h>
#include <stddef.h>
enum EventLogTag : uint8_t {
EVT_BOOT = 1, // data0 = esp_reset_reason() value
EVT_WIFI_UP = 2, // data0 = rssi (signed, cast)
EVT_WIFI_DOWN = 3, // data0 = disconnect reason code
EVT_HTTP_OK = 4, // data0 = path hash (fnv1a16), data1 = elapsed_ms
EVT_HTTP_FAIL = 5, // data0 = path hash, data1 = (http_code or negative errno)
EVT_HEARTBEAT_MISS = 6, // data0 = consecutive miss count
EVT_NTP_SYNC = 7, // data0 = seconds since boot
EVT_REBOOT = 8, // data0 = reason enum (defined below)
};
enum RebootReason : uint8_t {
REBOOT_HEARTBEAT_MISS = 1,
REBOOT_FACTORY_RESET = 2,
REBOOT_OTA = 3,
REBOOT_WIFI_REPROV = 4,
REBOOT_FATAL_CONFIG = 5,
REBOOT_FATAL_CAMERA = 6,
};
struct EventLogEntry {
uint32_t ts_unix; // 0 if NTP not synced yet; readers should fall back to uptime_s
uint32_t uptime_s; // millis()/1000 at log time
uint16_t data0;
uint16_t data1;
uint8_t tag; // EventLogTag
uint32_t seq; // widened; survives multi-year event rates
uint8_t _pad[15]; // pad to 32 bytes for fixed slot size
} __attribute__((packed));
static_assert(sizeof(EventLogEntry) == 32, "EventLogEntry must be 32 bytes");
// NVS-backed 32-slot ring buffer. Safe to call before NTP sync.
// Call exactly once from application setup, before any task writes events.
void event_log_init();
// Safe to call from any FreeRTOS task after event_log_init().
// Bounded mutex wait (~50ms) — will silently skip on contention rather than
// block the calling task. Acceptable for diagnostic logging.
void event_log_write(EventLogTag tag, uint16_t data0 = 0, uint16_t data1 = 0);
// Same bounded-wait contract as event_log_write: returns 0 on mutex timeout.
size_t event_log_read_recent(EventLogEntry* out, size_t max_entries);
uint16_t event_log_path_hash(const char* path); // fnv1a16 — exposed for tests

@@ -14,12 +14,21 @@ static HString bytes_to_hex(const uint8_t* bytes, size_t len) {
     return out;
 }
-static void hex_to_bytes(const HString& hex, uint8_t* out, size_t out_len) {
-    if (hex.length() % 2 != 0) return; // malformed — odd-length hex
-    for (size_t i = 0; i < out_len && (i * 2 + 1) < hex.length(); i++) {
-        char byte_str[3] = {hex[i*2], hex[i*2+1], 0};
+static bool is_hex_char(char c) {
+    return (c >= '0' && c <= '9') ||
+           (c >= 'a' && c <= 'f') ||
+           (c >= 'A' && c <= 'F');
+}
+static bool hex_to_bytes(const HString& hex, uint8_t* out, size_t out_len) {
+    if (hex.length() != out_len * 2) return false;
+    for (size_t i = 0; i < out_len; i++) {
+        char a = hex[i*2], b = hex[i*2+1];
+        if (!is_hex_char(a) || !is_hex_char(b)) return false;
+        char byte_str[3] = {a, b, 0};
         out[i] = (uint8_t)strtol(byte_str, nullptr, 16);
     }
+    return true;
 }
 static bool sha256(const uint8_t* data, size_t len, uint8_t out[32]) {
@@ -52,10 +61,20 @@ HString hmac_sign(const HString& secret_hex,
snprintf(ts_buf, sizeof(ts_buf), "%u", (unsigned)timestamp);
HString message = method + "\n" + path + "\n" + ts_buf + "\n" + body_hash_hex;
// 3. Decode secret from hex
// 3. Decode secret from hex. Reject empty / odd-length / oversized /
// non-hex inputs — flash_device.py validates at provision time, but
// hmac_sign refuses to sign under a malformed key regardless of how it
// ended up in NVS (legacy provisioning, NVS corruption, etc.).
if (secret_hex.length() == 0 ||
secret_hex.length() > 128 ||
secret_hex.length() % 2 != 0) {
return HString{};
}
size_t secret_len = secret_hex.length() / 2;
uint8_t secret[64] = {};
hex_to_bytes(secret_hex, secret, secret_len);
if (!hex_to_bytes(secret_hex, secret, secret_len)) {
return HString{};
}
// 4. HMAC-SHA256(secret, message)
uint8_t hmac_result[32];
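
The stricter `hex_to_bytes` in this hunk rejects a wrong length or any non-hex character instead of decoding whatever it can. The same contract can be sketched off-device as follows. This version assumes `std::string` in place of `HString` and replaces `strtol` with a nibble lookup; `hex_nibble` and `hex_to_bytes_strict` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the value of one hex digit, or -1 if c is not a hex digit.
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes exactly out_len bytes; rejects wrong-length and non-hex input.
static bool hex_to_bytes_strict(const std::string& hex, uint8_t* out, size_t out_len) {
    if (hex.size() != out_len * 2) return false;
    for (size_t i = 0; i < out_len; i++) {
        int hi = hex_nibble(hex[i * 2]);
        int lo = hex_nibble(hex[i * 2 + 1]);
        if (hi < 0 || lo < 0) return false;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return true;
}
```

Returning `bool` lets the caller refuse to sign, which is what the hunk's `hmac_sign` does when decoding fails.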


@@ -0,0 +1,6 @@
{
"name": "net_guard",
"build": {
"flags": ["-I$PROJECT_SRC_DIR"]
}
}


@@ -0,0 +1,142 @@
// firmware/lib/net_guard/net_guard.cpp
#include "net_guard.h"
uint32_t net_guard_next_backoff_ms(uint32_t attempt) {
if (attempt >= 6) return 60000;
return 1000u * (1u << attempt);
}
#ifdef ARDUINO
#include "config.h"
#include <WiFi.h>
#include <Arduino.h>
#include <lwip/dns.h>
#include <esp_netif.h>
#include "event_log.h"
// Both lwIP's ip_addr_t and esp-netif's esp_ip_addr_t share the same in-memory
// layout for IPv4, but the C++ types differ. Take the raw u32 to sidestep it.
static String fmt_v4(uint32_t addr_be) {
if (addr_be == 0) return String("0.0.0.0");
char b[16];
snprintf(b, sizeof(b), "%u.%u.%u.%u",
(unsigned)((addr_be >> 0) & 0xFF),
(unsigned)((addr_be >> 8) & 0xFF),
(unsigned)((addr_be >> 16) & 0xFF),
(unsigned)((addr_be >> 24) & 0xFF));
return String(b);
}
void net_guard_dump_dns(const char* tag) {
const ip_addr_t* d0 = dns_getserver(0);
const ip_addr_t* d1 = dns_getserver(1);
Serial.printf("[DNS] %s lwip: %s , %s\n", tag,
fmt_v4(d0 ? ip_2_ip4(d0)->addr : 0).c_str(),
fmt_v4(d1 ? ip_2_ip4(d1)->addr : 0).c_str());
esp_netif_t* sta = esp_netif_get_handle_from_ifkey("WIFI_STA_DEF");
if (sta) {
esp_netif_dns_info_t main_dns{}, backup_dns{};
esp_netif_get_dns_info(sta, ESP_NETIF_DNS_MAIN, &main_dns);
esp_netif_get_dns_info(sta, ESP_NETIF_DNS_BACKUP, &backup_dns);
Serial.printf("[DNS] %s netif: %s , %s\n", tag,
fmt_v4(main_dns.ip.u_addr.ip4.addr).c_str(),
fmt_v4(backup_dns.ip.u_addr.ip4.addr).c_str());
} else {
Serial.printf("[DNS] %s netif: <no STA handle>\n", tag);
}
}
void net_guard_pin_dns() {
ip_addr_t d1, d2;
IP_ADDR4(&d1, 1, 1, 1, 1);
IP_ADDR4(&d2, 8, 8, 8, 8);
dns_setserver(0, &d1);
dns_setserver(1, &d2);
// Also push through the esp_netif layer. dns_setserver() writes the
// global lwIP table directly; esp_netif_set_dns_info() is what the
// DHCP client itself calls, so writing here prevents the next DHCP
// event from silently overwriting our pin.
esp_netif_t* sta = esp_netif_get_handle_from_ifkey("WIFI_STA_DEF");
if (sta) {
esp_netif_dns_info_t info{};
IP_ADDR4(&info.ip, 1, 1, 1, 1);
esp_netif_set_dns_info(sta, ESP_NETIF_DNS_MAIN, &info);
IP_ADDR4(&info.ip, 8, 8, 8, 8);
esp_netif_set_dns_info(sta, ESP_NETIF_DNS_BACKUP, &info);
}
net_guard_dump_dns("pinned");
}
// Shared with the WiFi event task. 32-bit aligned loads/stores are atomic on
// Xtensa; volatile suffices. Tick re-evaluates every loop iteration, so stale
// reads self-correct within ~200ms.
static const DeviceConfig* s_cfg = nullptr;
static volatile uint8_t s_last_disconnect = 0;
static volatile bool s_up = false;
static volatile uint32_t s_attempts = 0;
static volatile uint32_t s_next_retry_ms = 0;
static void on_wifi_event(WiFiEvent_t event, WiFiEventInfo_t info) {
switch (event) {
case ARDUINO_EVENT_WIFI_STA_GOT_IP:
// Override DHCP-supplied DNS. Some routers return TC=1 for short
// answers (forcing TCP fallback that lwIP can't follow), or hand
// out an unreachable resolver. Pin to public resolvers so
// hostByName() never depends on the local network's DNS quality.
net_guard_pin_dns();
s_up = true;
s_attempts = 0;
s_next_retry_ms = 0;
event_log_write(EVT_WIFI_UP, (uint16_t)(int16_t)WiFi.RSSI(), 0);
break;
case ARDUINO_EVENT_WIFI_STA_DISCONNECTED:
s_up = false;
s_last_disconnect = (uint8_t)info.wifi_sta_disconnected.reason;
event_log_write(EVT_WIFI_DOWN, s_last_disconnect, 0);
s_next_retry_ms = millis() + net_guard_next_backoff_ms(s_attempts);
break;
default: break;
}
}
void net_guard_start(const DeviceConfig& cfg) {
s_cfg = &cfg;
// Seed s_up from the current WiFi state. setup()'s busy-wait on
// WiFi.begin() can produce a STA_GOT_IP before onEvent() is registered;
// without this seed, the first tick would force a spurious reconnect.
if (WiFi.status() == WL_CONNECTED) s_up = true;
WiFi.onEvent(on_wifi_event);
WiFi.setAutoReconnect(false); // we drive reconnect ourselves
}
bool net_guard_is_up() { return s_up; }
uint8_t net_guard_last_disconnect_reason() { return s_last_disconnect; }
extern "C" void net_guard_tick() {
// Watchdog against silent WiFi death: if we think we're up but the radio
// disagrees, force the DOWN state so reconnect scheduling kicks in.
if (s_up && WiFi.status() != WL_CONNECTED) {
s_up = false;
s_last_disconnect = 0xFF; // 0xFF = "silent death, no event"
event_log_write(EVT_WIFI_DOWN, s_last_disconnect, 0);
s_next_retry_ms = millis() + net_guard_next_backoff_ms(s_attempts);
}
if (s_up || s_cfg == nullptr) return;
// Wrap-safe: signed difference handles the ~49.7-day millis() wrap. The
// device is meant to run for months between reboots, so absolute compare
// (millis() < s_next_retry_ms) would either tight-loop retries across the
// wrap or stall them until millis() climbed back past an old high mark.
if ((int32_t)(millis() - s_next_retry_ms) < 0) return;
if (s_up) return; // re-check after the timing gate — closes GOT_IP-vs-tick race
s_attempts++;
// WiFi.begin() alone re-associates cleanly; a prior WiFi.disconnect() call
// synchronously emits STA_DISCONNECTED on the event task, which would
// double-log EVT_WIFI_DOWN (reason=ASSOC_LEAVE) on every retry.
WiFi.begin(s_cfg->wifi_ssid.c_str(), s_cfg->wifi_pass.c_str());
s_next_retry_ms = millis() + net_guard_next_backoff_ms(s_attempts);
}
#endif
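
The wrap-safe timing gate in `net_guard_tick()` relies on unsigned subtraction followed by a signed cast. A minimal sketch of that comparison in isolation, with the hypothetical helper name `deadline_reached`:

```cpp
#include <cassert>
#include <cstdint>

// True once now_ms has reached deadline_ms, even if the 32-bit millisecond
// counter wrapped in between. Valid while the two values are less than
// 2^31 ms (about 24.8 days) apart.
static bool deadline_reached(uint32_t now_ms, uint32_t deadline_ms) {
    return (int32_t)(now_ms - deadline_ms) >= 0;
}
```

For example, with a deadline of `0xFFFFFFF0` and a counter that has wrapped to `5`, the unsigned difference is 21, so the deadline counts as reached. A plain `now_ms >= deadline_ms` would report it as still pending.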


@@ -0,0 +1,33 @@
// firmware/lib/net_guard/net_guard.h
#pragma once
#include <stdint.h>
// Exponential backoff: 1s, 2s, 4s, 8s, 16s, 32s, 60s, 60s, ...
// attempt 0 -> 1000ms, clamped at 60000ms.
uint32_t net_guard_next_backoff_ms(uint32_t attempt);
#ifdef ARDUINO
struct DeviceConfig; // forward-decl; only net_guard_start needs the full type
// Registers the WiFi.onEvent() handler and disables the Arduino auto-reconnect;
// reconnects are then driven by net_guard_tick().
// Call once after the boot-time WiFi.begin() has connected.
void net_guard_start(const DeviceConfig& cfg);
// True iff WiFi is currently associated with IP.
bool net_guard_is_up();
// Last disconnect reason code from WIFI_EVENT_STA_DISCONNECTED (0 = none).
uint8_t net_guard_last_disconnect_reason();
// Non-blocking tick called from loop(); kicks reconnect if due.
extern "C" void net_guard_tick();
// Override DHCP-supplied DNS with public resolvers (1.1.1.1, 8.8.8.8).
// Idempotent; safe to call repeatedly. net_guard re-applies on every GOT_IP,
// but main.cpp must call it once for the boot association (which completes
// before net_guard_start() registers its event handler).
void net_guard_pin_dns();
// Diagnostic: print current DNS table state from both lwIP and esp_netif.
void net_guard_dump_dns(const char* tag);
#endif
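
The backoff schedule documented in this header can be checked on a host build. The sketch below repeats the formula from `net_guard.cpp` under the local name `backoff_ms` and adds a hypothetical helper, `backoff_schedule`, that lists the delays for the first `n` attempts:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same formula as net_guard_next_backoff_ms in net_guard.cpp.
static uint32_t backoff_ms(uint32_t attempt) {
    if (attempt >= 6) return 60000;      // clamp before shifting
    return 1000u * (1u << attempt);
}

// Delays scheduled for the first n attempts (attempt 0 is the first one).
static std::vector<uint32_t> backoff_schedule(uint32_t n) {
    std::vector<uint32_t> out;
    for (uint32_t a = 0; a < n; a++) out.push_back(backoff_ms(a));
    return out;
}
```

Checking the clamp before the shift also keeps `1u << attempt` well defined for large attempt counts, since shifting a 32-bit value by 32 or more is undefined behavior.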


@@ -0,0 +1,6 @@
{
"name": "ota_updater",
"build": {
"flags": ["-I$PROJECT_INCLUDE_DIR"]
}
}


@@ -0,0 +1,4 @@
#pragma once
// Auto-generated by tools/gen_signing_key.py — DO NOT EDIT
// ECDSA P-256 public key, uncompressed X9.62 (04 || X || Y)
static const uint8_t kOtaPublicKey[65] = {0x04, 0x1c, 0x92, 0x43, 0x23, 0xe9, 0xac, 0xd1, 0xe8, 0x05, 0x32, 0x49, 0x39, 0x12, 0x95, 0xb2, 0x0a, 0x3e, 0xfb, 0x9d, 0xdf, 0xee, 0xd1, 0x98, 0x87, 0x97, 0xa3, 0xb8, 0xcb, 0x2b, 0xa6, 0x06, 0xe0, 0x83, 0x32, 0x71, 0xd2, 0x5f, 0x80, 0x40, 0x68, 0xcd, 0x00, 0xe5, 0x0e, 0xba, 0x13, 0xf6, 0x97, 0x43, 0x6f, 0xe6, 0x4f, 0xd0, 0x95, 0x53, 0x0e, 0xd7, 0x9a, 0x8a, 0x2e, 0x25, 0x52, 0xb4, 0xaf};


@@ -0,0 +1,319 @@
// firmware/lib/ota_updater/ota_updater.cpp
#include "ota_updater.h"
#include <stdio.h>
#include <string.h>
#include <mbedtls/ecdsa.h>
#include <mbedtls/ecp.h>
#include <mbedtls/bignum.h>
// ── version comparison ─────────────────────────────────────────────────────
bool ota_version_newer(const char* current, const char* remote) {
int ca = 0, cb = 0, cc = 0;
int ra = 0, rb = 0, rc = 0;
if (sscanf(current, "%d.%d.%d", &ca, &cb, &cc) != 3) return false;
if (sscanf(remote, "%d.%d.%d", &ra, &rb, &rc) != 3) return false;
if (ra != ca) return ra > ca;
if (rb != cb) return rb > cb;
return rc > cc;
}
// ── signature verification ─────────────────────────────────────────────────
bool ota_verify_signature_with_key(const uint8_t hash32[32], const uint8_t sig64[64],
const uint8_t pubkey65[65]) {
mbedtls_ecp_group grp;
mbedtls_ecp_point Q;
mbedtls_mpi r, s;
mbedtls_ecp_group_init(&grp);
mbedtls_ecp_point_init(&Q);
mbedtls_mpi_init(&r);
mbedtls_mpi_init(&s);
bool ok = false;
if (mbedtls_ecp_group_load(&grp, MBEDTLS_ECP_DP_SECP256R1) == 0 &&
mbedtls_ecp_point_read_binary(&grp, &Q, pubkey65, 65) == 0 &&
mbedtls_mpi_read_binary(&r, sig64, 32) == 0 &&
mbedtls_mpi_read_binary(&s, sig64 + 32, 32) == 0 &&
mbedtls_ecdsa_verify(&grp, hash32, 32, &Q, &r, &s) == 0) {
ok = true;
}
mbedtls_ecp_group_free(&grp);
mbedtls_ecp_point_free(&Q);
mbedtls_mpi_free(&r);
mbedtls_mpi_free(&s);
return ok;
}
// ── device-only code ───────────────────────────────────────────────────────
#ifndef NATIVE_TEST
#include <Arduino.h>
#include <time.h>
#include <HTTPClient.h>
#include <WiFi.h>
#include <ArduinoJson.h>
#include <esp_ota_ops.h>
#include <mbedtls/sha256.h>
#include <mbedtls/base64.h>
#include "hmac.h"
#include "ota_pubkey.h"
#include "version.h"
bool ota_verify_signature(const uint8_t hash32[32], const uint8_t sig64[64]) {
return ota_verify_signature_with_key(hash32, sig64, kOtaPublicKey);
}
static const char* s_server_base = nullptr;
static const char* s_device_id = nullptr;
static const char* s_hmac_secret = nullptr;
static uint32_t s_interval_ms = 21600000UL; // 6 h default
static uint32_t s_last_check_ms = 0;
void ota_updater_init(const char* server_base, const char* device_id,
const char* hmac_secret, uint32_t check_interval_ms) {
s_server_base = server_base;
s_device_id = device_id;
s_hmac_secret = hmac_secret;
s_interval_ms = check_interval_ms;
s_last_check_ms = 0; // force first check on next call
}
static bool add_hmac_headers(HTTPClient& http, const char* method, const char* path) {
uint32_t ts = (uint32_t)time(nullptr);
if (ts < 1700000000UL) {
Serial.printf("[OTA] Clock not synced (ts=%u) — skipping HMAC sign\n", (unsigned)ts);
return false;
}
String sig = hmac_sign(s_hmac_secret, method, path, ts, "");
if (sig.isEmpty()) {
Serial.println("[OTA] HMAC sign failed");
return false;
}
Serial.printf("[OTA] HMAC headers: device=%s ts=%u sig=%s...\n",
s_device_id, (unsigned)ts, sig.substring(0, 12).c_str());
http.addHeader("X-Device-Id", s_device_id);
http.addHeader("X-Timestamp", String(ts));
http.addHeader("X-Signature", sig);
return true;
}
static bool download_and_flash(const char* fw_url, size_t expected_size,
const uint8_t sig64[64]) {
const esp_partition_t* running = esp_ota_get_running_partition();
const esp_partition_t* target = esp_ota_get_next_update_partition(nullptr);
if (!target) {
Serial.println("[OTA] No update partition found");
return false;
}
Serial.printf("[OTA] running='%s' (off=0x%x sz=0x%x), target='%s' (off=0x%x sz=0x%x)\n",
running ? running->label : "?",
running ? (unsigned)running->address : 0,
running ? (unsigned)running->size : 0,
target->label,
(unsigned)target->address, (unsigned)target->size);
if (expected_size > target->size) {
Serial.printf("[OTA] image (%zu) larger than partition (%u)\n",
expected_size, (unsigned)target->size);
return false;
}
esp_ota_handle_t handle;
esp_err_t er = esp_ota_begin(target, OTA_WITH_SEQUENTIAL_WRITES, &handle);
if (er != ESP_OK) {
Serial.printf("[OTA] esp_ota_begin failed: %s\n", esp_err_to_name(er));
return false;
}
mbedtls_sha256_context sha_ctx;
mbedtls_sha256_init(&sha_ctx);
mbedtls_sha256_starts(&sha_ctx, 0);
HTTPClient http;
http.begin(fw_url);
http.setTimeout(30000);
if (!add_hmac_headers(http, "GET", "/ota/firmware")) {
Serial.println("[OTA] Aborting firmware download: HMAC sign failed");
mbedtls_sha256_free(&sha_ctx);
esp_ota_abort(handle);
return false;
}
Serial.printf("[OTA] downloading firmware: %s\n", fw_url);
int code = http.GET();
Serial.printf("[OTA] firmware response: HTTP %d\n", code);
if (code != HTTP_CODE_OK) {
String body = http.getString();
Serial.printf("[OTA] error body: %s\n", body.c_str());
http.end();
mbedtls_sha256_free(&sha_ctx);
esp_ota_abort(handle);
return false;
}
int content_len = http.getSize();
Serial.printf("[OTA] Content-Length: %d (expected %zu)\n",
content_len, expected_size);
WiFiClient* stream = http.getStreamPtr();
uint8_t buf[4096];
size_t written = 0;
size_t last_log_at = 0;
bool write_failed = false;
uint32_t start_ms = millis();
while (written < expected_size) {
size_t want = min((size_t)sizeof(buf), expected_size - written);
int got = stream->readBytes(buf, want);
if (got <= 0) {
Serial.printf("[OTA] stream ended at %zu/%zu bytes (readBytes=%d)\n",
written, expected_size, got);
break;
}
esp_err_t we = esp_ota_write(handle, buf, (size_t)got);
if (we != ESP_OK) {
Serial.printf("[OTA] esp_ota_write failed at offset %zu: %s\n",
written, esp_err_to_name(we));
write_failed = true;
break;
}
mbedtls_sha256_update(&sha_ctx, buf, (size_t)got);
written += (size_t)got;
if (written - last_log_at >= 131072 || written == expected_size) {
Serial.printf("[OTA] progress: %zu/%zu bytes\n", written, expected_size);
last_log_at = written;
}
}
uint32_t elapsed_ms = millis() - start_ms;
http.end();
Serial.printf("[OTA] download done: %zu bytes in %u ms\n",
written, (unsigned)elapsed_ms);
uint8_t hash[32];
mbedtls_sha256_finish(&sha_ctx, hash);
mbedtls_sha256_free(&sha_ctx);
char hex[65];
for (int i = 0; i < 32; i++) snprintf(hex + i*2, 3, "%02x", hash[i]);
Serial.printf("[OTA] sha256(image)=%s\n", hex);
if (write_failed) {
esp_ota_abort(handle);
return false;
}
if (written != expected_size) {
Serial.printf("[OTA] Download truncated (%zu/%zu bytes)\n", written, expected_size);
esp_ota_abort(handle);
return false;
}
if (!ota_verify_signature_with_key(hash, sig64, kOtaPublicKey)) {
Serial.println("[OTA] SIGNATURE INVALID — staying on current firmware");
esp_ota_abort(handle);
return false;
}
Serial.println("[OTA] signature OK");
esp_err_t end_err = esp_ota_end(handle);
if (end_err != ESP_OK) {
Serial.printf("[OTA] esp_ota_end failed: %s\n", esp_err_to_name(end_err));
return false;
}
esp_err_t boot_err = esp_ota_set_boot_partition(target);
if (boot_err != ESP_OK) {
Serial.printf("[OTA] esp_ota_set_boot_partition failed: %s\n",
esp_err_to_name(boot_err));
return false;
}
Serial.printf("[OTA] boot partition set to '%s' — rebooting in 500 ms\n",
target->label);
Serial.flush();
delay(500);
esp_restart();
return true; // unreachable
}
bool ota_updater_check_and_apply() {
if (!s_server_base || !s_device_id || !s_hmac_secret) {
Serial.println("[OTA] check skipped: updater not initialized");
return false;
}
if (s_last_check_ms != 0 &&
(uint32_t)(millis() - s_last_check_ms) < s_interval_ms) {
return false;
}
s_last_check_ms = millis();
if (WiFi.status() != WL_CONNECTED) {
Serial.printf("[OTA] check skipped: WiFi not connected (status=%d)\n",
WiFi.status());
return false;
}
char check_path[128];
snprintf(check_path, sizeof(check_path), "/ota/check?version=%s", FW_VERSION);
char check_url[256];
snprintf(check_url, sizeof(check_url), "%s%s", s_server_base, check_path);
Serial.printf("[OTA] check → GET %s (fw=%s)\n", check_url, FW_VERSION);
HTTPClient http;
if (!http.begin(check_url)) {
Serial.println("[OTA] http.begin() failed");
return false;
}
if (!add_hmac_headers(http, "GET", check_path)) {
Serial.println("[OTA] Aborting check: HMAC sign failed");
http.end();
return false;
}
int code = http.GET();
Serial.printf("[OTA] check response: HTTP %d\n", code);
if (code != HTTP_CODE_OK) {
String body = http.getString();
Serial.printf("[OTA] error body: %s\n", body.c_str());
http.end();
return false;
}
JsonDocument doc;
DeserializationError err = deserializeJson(doc, http.getStream());
http.end();
if (err) {
Serial.printf("[OTA] JSON parse error: %s\n", err.c_str());
return false;
}
if (!doc["update"].as<bool>()) {
Serial.printf("[OTA] Firmware up to date (%s)\n", FW_VERSION);
return false;
}
const char* remote_ver = doc["version"] | "";
size_t fw_size = doc["size"] | 0;
const char* sig_b64 = doc["sig_b64"] | "";
if (fw_size == 0 || strlen(sig_b64) == 0) {
Serial.println("[OTA] Invalid update manifest");
return false;
}
Serial.printf("[OTA] Update: %s → %s (%zu bytes)\n", FW_VERSION, remote_ver, fw_size);
uint8_t sig64[64];
size_t sig_len = 0;
if (mbedtls_base64_decode(sig64, sizeof(sig64), &sig_len,
(const uint8_t*)sig_b64, strlen(sig_b64)) != 0 ||
sig_len != 64) {
Serial.printf("[OTA] Bad signature encoding (len=%zu)\n", sig_len);
return false;
}
char fw_url[256];
snprintf(fw_url, sizeof(fw_url), "%s/ota/firmware", s_server_base);
return download_and_flash(fw_url, fw_size, sig64);
}
#endif // NATIVE_TEST
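
The download loop in `download_and_flash` reads at most the remaining byte count per iteration, stops on a short stream, and aborts on the first failed write. A host-side sketch of that control flow, with the HTTP stream and `esp_ota_write` replaced by hypothetical `read` and `write` callbacks. `StreamResult` and `stream_exact` are illustrative names, and hashing is left out:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

enum class StreamResult { Ok, Truncated, WriteFailed };

// read(buf, want) returns the number of bytes read (<= 0 means the stream ended);
// write(buf, n) returns false on failure. Both are stand-ins for device APIs.
static StreamResult stream_exact(size_t expected_size,
                                 const std::function<int(uint8_t*, size_t)>& read,
                                 const std::function<bool(const uint8_t*, size_t)>& write) {
    uint8_t buf[256];
    size_t written = 0;
    while (written < expected_size) {
        // Never ask for more than the manifest says is left.
        size_t want = std::min(sizeof(buf), expected_size - written);
        int got = read(buf, want);
        if (got <= 0) return StreamResult::Truncated;
        if (!write(buf, (size_t)got)) return StreamResult::WriteFailed;
        written += (size_t)got;
    }
    return StreamResult::Ok;
}
```

Keeping the three outcomes distinct mirrors the diff, which logs a truncated download and a failed write separately before calling `esp_ota_abort`.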


@@ -0,0 +1,27 @@
// firmware/lib/ota_updater/ota_updater.h
#pragma once
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
// One-time init. Call from setup() after WiFi is ready.
// server_base: e.g. "http://logs.research.bike:8000"
// check_interval_ms: milliseconds between polls (e.g. 6*3600*1000 = 21600000)
void ota_updater_init(const char* server_base,
const char* device_id,
const char* hmac_secret,
uint32_t check_interval_ms);
// Polls server; downloads, verifies, and flashes if newer version available.
// On a successful update the device reboots inside this call and never returns;
// every path that does return (not due, no update, any error) returns false.
// Safe to call from any task; blocks during download.
bool ota_updater_check_and_apply();
// Exposed for unit testing. ota_verify_signature_with_key takes an arbitrary
// 65-byte uncompressed P-256 public key.
bool ota_version_newer(const char* current, const char* remote);
bool ota_verify_signature_with_key(const uint8_t hash32[32], const uint8_t sig64[64],
const uint8_t pubkey65[65]);
// Production wrapper — uses the compiled-in kOtaPublicKey from ota_pubkey.h.
// Not callable from native tests (requires ota_pubkey.h / device build).
bool ota_verify_signature(const uint8_t hash32[32], const uint8_t sig64[64]);
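
The behavior of `ota_version_newer` on a few inputs can be illustrated with a host-side copy. The body below mirrors the `sscanf`-based comparison in `ota_updater.cpp` under the local name `version_newer_sketch`:

```cpp
#include <cassert>
#include <cstdio>

// Mirrors ota_version_newer: true only if remote is strictly newer than current.
static bool version_newer_sketch(const char* current, const char* remote) {
    int ca = 0, cb = 0, cc = 0;
    int ra = 0, rb = 0, rc = 0;
    if (sscanf(current, "%d.%d.%d", &ca, &cb, &cc) != 3) return false;
    if (sscanf(remote,  "%d.%d.%d", &ra, &rb, &rc) != 3) return false;
    if (ra != ca) return ra > ca;   // major decides first
    if (rb != cb) return rb > cb;   // then minor
    return rc > cc;                 // then patch
}
```

Components compare numerically, so `1.0.10` is newer than `1.0.9`, and a version with fewer than three components is rejected. `sscanf` stops after the third integer, so a suffix such as `-beta` is ignored rather than rejected.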


@@ -21,7 +21,7 @@ upload_flags = --no-stub
lib_deps =
tzapu/WiFiManager@^2.0.17
bblanchon/ArduinoJson@^7.0.0
h2zero/NimBLE-Arduino@^1.4.2
h2zero/NimBLE-Arduino@^2.0.0
espressif/esp32-camera
; Frame-capture build. Strips WiFi/BLE/CV/reporter; streams raw 96x96 frames


@@ -42,8 +42,8 @@ static String sha256_prefix(const String& input) {
return hex;
}
class ScanCallback : public NimBLEAdvertisedDeviceCallbacks {
void onResult(NimBLEAdvertisedDevice* dev) override {
class ScanCallback : public NimBLEScanCallbacks {
void onResult(const NimBLEAdvertisedDevice* dev) override {
String mac = String(dev->getAddress().toString().c_str());
String hash = sha256_prefix(mac);
int rssi = dev->getRSSI();
@@ -51,7 +51,6 @@ class ScanCallback : public NimBLEAdvertisedDeviceCallbacks {
std::lock_guard<std::mutex> lock(s_mutex);
auto it = s_seen.find(hash);
if (it == s_seen.end()) {
Serial.printf("[BLE] new device: %s (rssi %d)\n", hash.c_str(), rssi);
s_seen[hash] = {rssi, 1};
} else {
it->second.rssi_sum += rssi;
@@ -68,16 +67,16 @@ static NimBLEScan* s_scan = nullptr;
void ble_scanner_start() {
NimBLEDevice::init("");
s_scan = NimBLEDevice::getScan();
s_scan->setAdvertisedDeviceCallbacks(&s_callback, true); // true = allow duplicates
s_scan->setScanCallbacks(&s_callback, true); // true = allow duplicates
s_scan->setActiveScan(false); // passive
s_scan->setInterval(100);
s_scan->setWindow(99);
s_scan->setMaxResults(0); // don't store results — callback-only
s_scan->start(0, nullptr, false); // 0 = continuous
s_scan->start(0, false, false); // duration=0 (forever), isContinue=false, restart=false
}
void ble_scanner_pause() { if (s_scan) s_scan->stop(); }
void ble_scanner_resume() { if (s_scan) s_scan->start(0, nullptr, false); }
void ble_scanner_resume() { if (s_scan) s_scan->start(0, false, false); }
void ble_scanner_deinit() {
if (s_scan) s_scan->stop();


@@ -8,6 +8,13 @@
#include "cv.h"
#include "ble_scanner.h"
#include "reporter.h"
#include "event_log.h"
#include "net_guard.h"
#include "version.h"
#include "ota_updater.h"
#include <esp_system.h>
#include <esp_task_wdt.h>
#include <esp_ota_ops.h>
// LED on GPIO2 (TimerCamera-F built-in LED) — verify against board schematic
// Factory reset: hold GPIO37 (BOOT button) for 5 seconds
@@ -15,6 +22,15 @@
#define BUTTON_PIN 37
#define FACTORY_RESET_HOLD_MS 5000
// BLE scanning disabled in production until the NimBLE-Arduino 1.4.2 timer
// race is resolved. Symptom: FreeRTOS timer task dispatches an
// os_callout_timer_cb whose callback fn is NULL, causing PC=0 fetch and
// Historical note: NimBLE-Arduino 1.4.2 had an init/fire race in its FreeRTOS
// callout porting layer that caused a NULL-fn dispatch (PC=0,
// InstrFetchProhibited) within ~1s of boot when the camera task starved the
// timer service. Fixed by upgrading to 2.x (see platformio.ini).
#define BLE_SCANNING_ENABLED 1
#define CAM_FPS 5
#define CAM_INTERVAL_MS (1000 / CAM_FPS)
#define REPORT_INTERVAL_S 3600
@@ -45,10 +61,12 @@ static void check_factory_reset() {
uint32_t held = millis();
while (digitalRead(BUTTON_PIN) == LOW) {
if (millis() - held >= FACTORY_RESET_HOLD_MS) {
event_log_write(EVT_REBOOT, REBOOT_FACTORY_RESET, 0);
config_clear_wifi();
ESP.restart();
}
delay(50);
esp_task_wdt_reset();
}
}
@@ -56,20 +74,12 @@ static void check_factory_reset() {
static void task_camera(void*) {
static uint8_t frame[CV_PIXELS]; // static: avoids 9KB on task stack
int last_logged_track_id = 0; // diagnostic: log each new track once
esp_task_wdt_add(nullptr);
while (true) {
if (camera_capture_96(frame)) {
if (xSemaphoreTake(s_cv_mutex, pdMS_TO_TICKS(100)) == pdTRUE) {
CVResult r = cv_process(g_cv, frame, g_cfg.line_offset);
for (const auto& t : g_cv.tracks) {
if (t.id > last_logged_track_id) {
last_logged_track_id = t.id;
Serial.printf("[CV] spawn id=%d y=%.1f\n", t.id, t.spawn_y);
}
}
if (r.fg_count > 0) {
Serial.printf("[F] n=%d y=%d..%d c=%.1f\n",
r.fg_count, r.fg_min_y, r.fg_max_y, r.fg_centroid_y);
}
(void)last_logged_track_id;
if (r.entries_delta) Serial.printf("[CV] entry +%d (total %d) first=%.1f min=%.1f max=%.1f last=%.1f dur=%d\n",
r.entries_delta, g_cv.entries,
r.fire_first_c, r.fire_min_c, r.fire_max_c, r.fire_last_c, r.fire_duration);
@@ -82,21 +92,41 @@ static void task_camera(void*) {
}
}
vTaskDelay(pdMS_TO_TICKS(CAM_INTERVAL_MS));
esp_task_wdt_reset();
}
}
static void ota_task(void*) {
// Min 10s to avoid pathological fast loops if NVS is corrupted
uint32_t interval_ms = g_cfg.ota_interval_s < 10 ? 10000UL : g_cfg.ota_interval_s * 1000UL;
Serial.printf("[OTA] task started, interval=%u ms\n", (unsigned)interval_ms);
for (;;) {
if (WiFi.isConnected()) {
Serial.println("[OTA] tick: WiFi connected, running check");
ota_updater_check_and_apply();
} else {
Serial.printf("[OTA] tick: WiFi not connected (status=%d), skipping\n",
WiFi.status());
}
vTaskDelay(pdMS_TO_TICKS(interval_ms));
}
}
// Hourly reporter task — runs on core 0
static void task_reporter(void*) {
uint32_t last_report_ts = 0; // 0 = not initialized yet
esp_task_wdt_add(nullptr);
while (true) {
vTaskDelay(pdMS_TO_TICKS(10000)); // check every 10s
esp_task_wdt_reset();
uint32_t now = (uint32_t)(time(nullptr));
if (now < 1700000000UL) continue; // NTP not synced
// First valid timestamp — schedule boot report 60s from now
if (last_report_ts == 0) {
event_log_write(EVT_NTP_SYNC, (uint16_t)(millis() / 1000), 0);
last_report_ts = now - (REPORT_INTERVAL_S - BOOT_REPORT_DELAY_S);
continue;
}
@@ -108,7 +138,9 @@ static void task_reporter(void*) {
last_report_ts = now;
// Deinit BLE to free ~25KB heap for SSL handshakes
#if BLE_SCANNING_ENABLED
ble_scanner_deinit();
#endif
led_set(true); // on = uploading
CameraHourlyRecord cam_rec;
@@ -118,19 +150,41 @@ static void task_reporter(void*) {
xSemaphoreGive(s_cv_mutex);
} else {
// Failed to acquire — skip this cycle, will report next hour
#if BLE_SCANNING_ENABLED
ble_scanner_reinit();
#endif
led_set(false);
continue;
}
#if !BLE_SCANNING_ENABLED
BLEHourlyRecord ble_rec = {period_start, period_end, 0, 0};
#else
BLEHourlyRecord ble_rec = ble_scanner_collect(period_start, period_end);
#endif
reporter_submit_camera(g_cfg, cam_rec);
reporter_submit_ble(g_cfg, ble_rec);
reporter_heartbeat(g_cfg, millis() / 1000, WiFi.RSSI());
bool hb_ok = reporter_heartbeat(g_cfg, millis() / 1000, WiFi.RSSI());
#if BLE_SCANNING_ENABLED
ble_scanner_reinit();
#endif
led_set(false);
static uint8_t consecutive_misses = 0;
if (hb_ok) {
consecutive_misses = 0;
} else {
consecutive_misses++;
event_log_write(EVT_HEARTBEAT_MISS, consecutive_misses, 0);
Serial.printf("[WDG] heartbeat miss %u/6\n", consecutive_misses);
if (consecutive_misses >= 6) {
event_log_write(EVT_REBOOT, REBOOT_HEARTBEAT_MISS, 0);
delay(200); // let NVS commit before reboot
ESP.restart();
}
}
}
}
@@ -140,14 +194,47 @@ void setup() {
pinMode(BUTTON_PIN, INPUT_PULLUP);
led_set(true); // on = booting
// OTA rollback guard: if booted from a freshly-flashed OTA image while the
// bootloader has rollback enabled, the image is PENDING_VERIFY and will be
// rolled back on the next reboot unless we mark it valid. Harmless no-op
// when rollback is disabled. Always log the running partition + state so
// we can see post-OTA boot behavior on serial.
{
const esp_partition_t* running = esp_ota_get_running_partition();
esp_ota_img_states_t state = ESP_OTA_IMG_UNDEFINED;
if (running) {
esp_ota_get_state_partition(running, &state);
Serial.printf("[BOOT] running partition '%s' (off=0x%x) state=%d fw=%s\n",
running->label, (unsigned)running->address,
(int)state, FW_VERSION);
}
if (state == ESP_OTA_IMG_PENDING_VERIFY) {
esp_err_t e = esp_ota_mark_app_valid_cancel_rollback();
Serial.printf("[BOOT] esp_ota_mark_app_valid_cancel_rollback: %s\n",
esp_err_to_name(e));
}
}
event_log_init();
event_log_write(EVT_BOOT, (uint16_t)esp_reset_reason(), 0);
if (!config_load(g_cfg)) {
Serial.println("FATAL: device_id/location_id/hmac_secret not provisioned");
while (true) { delay(500); led_set(!digitalRead(LED_PIN)); } // fast blink
event_log_write(EVT_REBOOT, REBOOT_FATAL_CONFIG, 0);
// Blink fast for 3s so a physically present operator can see it, then
// reboot. The device cannot heartbeat without config, so EVT_BOOT history
// never reaches the server here; the visible signal is the repeating
// fast-blink-then-reboot cycle on the LED.
uint32_t t0 = millis();
while (millis() - t0 < 3000) { led_set(!digitalRead(LED_PIN)); delay(100); }
ESP.restart();
}
// Connect to WiFi
if (!config_has_wifi()) {
provisioning_run();
event_log_write(EVT_REBOOT, REBOOT_WIFI_REPROV, 0);
ESP.restart();
}
@@ -161,9 +248,16 @@ void setup() {
if (WiFi.status() != WL_CONNECTED) {
// Saved creds failed — re-provision
provisioning_run();
event_log_write(EVT_REBOOT, REBOOT_WIFI_REPROV, 0);
ESP.restart();
}
// Boot connect happens before net_guard registers its WiFi event handler,
// so the GOT_IP-driven DNS override there won't fire for this association.
// Pin DNS now; net_guard re-applies it on every subsequent reconnect.
net_guard_pin_dns();
net_guard_start(g_cfg);
led_set(false); // off = connected
// NTP sync (UTC)
@@ -173,38 +267,72 @@ void setup() {
if (!camera_init()) {
Serial.println("FATAL: camera init failed");
while (true) delay(1000);
event_log_write(EVT_REBOOT, REBOOT_FATAL_CAMERA, 0);
uint32_t t0 = millis();
while (millis() - t0 < 3000) { led_set(!digitalRead(LED_PIN)); delay(100); }
ESP.restart();
}
reporter_init();
#if BLE_SCANNING_ENABLED
ble_scanner_start();
#endif
// OTA update support
ArduinoOTA.setHostname(g_cfg.device_id.c_str());
#if !BLE_SCANNING_ENABLED
ArduinoOTA.onStart([]() { });
#else
ArduinoOTA.onStart([]() { ble_scanner_pause(); });
ArduinoOTA.onEnd([]() { ble_scanner_resume(); ESP.restart(); });
#endif
ArduinoOTA.onEnd([]() {
#if BLE_SCANNING_ENABLED
ble_scanner_resume();
#endif
event_log_write(EVT_REBOOT, REBOOT_OTA, 0);
ESP.restart();
});
#if !BLE_SCANNING_ENABLED
ArduinoOTA.onError([](ota_error_t e) { });
#else
ArduinoOTA.onError([](ota_error_t e) { ble_scanner_resume(); });
#endif
ArduinoOTA.begin();
s_cv_mutex = xSemaphoreCreateMutex();
// Task watchdog: 30s timeout, panic on trigger so we reboot and log
// via esp_reset_reason() in EVT_BOOT on the next boot.
esp_task_wdt_init(30, /*panic=*/true);
esp_task_wdt_add(nullptr); // subscribe the Arduino loopTask
xTaskCreatePinnedToCore(task_camera, "cam", 8192, nullptr, 2, nullptr, 1);
xTaskCreatePinnedToCore(task_reporter, "rep", 8192, nullptr, 1, nullptr, 0);
// static: ota_updater stores raw pointer; must outlive setup()
static String s_ota_base = String("http://") + REPORTER_API_HOST_NAME + ":" + REPORTER_API_PORT;
ota_updater_init(
s_ota_base.c_str(),
g_cfg.device_id.c_str(),
g_cfg.hmac_secret.c_str(),
g_cfg.ota_interval_s < 10 ? 10000UL : g_cfg.ota_interval_s * 1000UL
);
xTaskCreate(ota_task, "ota", 8192, nullptr, 1, nullptr);
}
void loop() {
esp_task_wdt_reset();
ArduinoOTA.handle();
check_factory_reset();
net_guard_tick();
if (WiFi.status() != WL_CONNECTED) {
led_set(true); // on = no WiFi
WiFi.reconnect();
delay(5000);
if (WiFi.status() == WL_CONNECTED) {
led_set(false);
reporter_flush(g_cfg);
}
static bool s_was_up = true;
bool up = net_guard_is_up();
if (up != s_was_up) {
led_set(!up); // LED on when NOT up
if (up) reporter_flush(g_cfg);
s_was_up = up;
}
delay(1000);
delay(200);
}


@@ -1,12 +1,18 @@
// firmware/src/reporter.cpp
#include "reporter.h"
#include "hmac.h"
#include "event_log.h"
#include "net_guard.h"
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include <WiFi.h>
#include <algorithm>
#include <vector>
#include <time.h>
#include <freertos/semphr.h>
#include <esp_task_wdt.h>
#include <esp_system.h>
#include <esp_heap_caps.h>
static std::vector<CameraHourlyRecord> s_cam_buf;
static std::vector<BLEHourlyRecord> s_ble_buf;
@@ -21,25 +27,127 @@ static uint32_t now_ts() {
return (uint32_t)time(nullptr);
}
static bool post_json(const DeviceConfig& cfg, const char* path, const String& body) {
// Last successfully resolved IP — used as a warm fallback if a subsequent
// resolution fails. Never takes precedence over a fresh successful resolve.
static IPAddress s_cached_api_ip;
// Resolve the API host. Tries hostByName first; on failure falls back to the
// last good resolution, then to the hardcoded fallback IP. Returns the IP via
// out-param and a label describing where it came from for logging.
static bool resolve_api_ip(IPAddress& out, const char*& source) {
IPAddress ip;
uint32_t r0 = millis();
bool ok = WiFi.hostByName(REPORTER_API_HOST_NAME, ip);
uint32_t elapsed = millis() - r0;
if (ok) {
s_cached_api_ip = ip;
out = ip;
source = "dns";
Serial.printf("[DNS] %s -> %s (%u ms)\n",
REPORTER_API_HOST_NAME, ip.toString().c_str(), (unsigned)elapsed);
return true;
}
Serial.printf("[DNS] %s -> FAIL (%u ms)\n",
REPORTER_API_HOST_NAME, (unsigned)elapsed);
net_guard_dump_dns("on-fail");
net_guard_pin_dns(); // re-assert in case something overwrote the table
if ((uint32_t)s_cached_api_ip != 0) {
out = s_cached_api_ip;
source = "cache";
return true;
}
if (out.fromString(REPORTER_API_FALLBACK_IP)) {
source = "fallback";
return true;
}
return false;
}
// Drains and parses the HTTP response status line. Returns the numeric status
// code, or -1 on read timeout / malformed response.
static int read_http_status(WiFiClient& client, uint32_t timeout_ms) {
uint32_t start = millis();
// Wrap-safe: compare elapsed time rather than an absolute deadline.
while (!client.available() && (uint32_t)(millis() - start) < timeout_ms) vTaskDelay(pdMS_TO_TICKS(10));
if (!client.available()) return -1;
String line = client.readStringUntil('\n');
line.trim();
// Format: "HTTP/1.1 200 OK"
int sp1 = line.indexOf(' ');
if (sp1 < 0) return -1;
int sp2 = line.indexOf(' ', sp1 + 1);
String code_str = (sp2 > 0) ? line.substring(sp1 + 1, sp2) : line.substring(sp1 + 1);
return code_str.toInt();
}
static bool post_json_once(const DeviceConfig& cfg, const char* path, const String& body) {
uint32_t ts = now_ts();
// Reject if NTP hasn't synced yet (timestamp would be near epoch 0)
if (ts < 1700000000UL) return false; // pre-2023 → clock not valid
if (ts < 1700000000UL) return false;
String sig = hmac_sign(cfg.hmac_secret, "POST", path, ts, body);
if (sig.isEmpty()) return false; // HMAC failed
if (sig.isEmpty()) return false;
HTTPClient http;
String url = String(REPORTER_API_HOST) + path;
http.begin(url);
http.addHeader("Content-Type", "application/json");
http.addHeader("X-Device-Id", cfg.device_id);
http.addHeader("X-Timestamp", String(ts));
http.addHeader("X-Signature", sig);
IPAddress ip;
const char* ip_source = "?";
if (!resolve_api_ip(ip, ip_source)) {
Serial.printf("[HTTP] POST %s -> resolve-fail\n", path);
event_log_write(EVT_HTTP_FAIL, event_log_path_hash(path), (uint16_t)-1);
return false;
}
int code = http.POST(body);
http.end();
Serial.printf("[HTTP] POST %s → %d\n", url.c_str(), code);
return (code == 200);
uint32_t t0 = millis();
WiFiClient client;
client.setTimeout(10); // seconds — read timeout
if (!client.connect(ip, REPORTER_API_PORT, 5000 /*ms connect timeout*/)) {
uint32_t elapsed = millis() - t0;
Serial.printf("[HTTP] connect %s:%u (%s) -> failed (%u ms)\n",
ip.toString().c_str(), REPORTER_API_PORT, ip_source, (unsigned)elapsed);
event_log_write(EVT_HTTP_FAIL, event_log_path_hash(path), (uint16_t)-1);
return false;
}
// Manual HTTP/1.1 — gives us full control over the Host header so the
// server's vhost routing works even when we connect by IP.
client.printf("POST %s HTTP/1.1\r\n", path);
client.printf("Host: %s\r\n", REPORTER_API_HOST_NAME);
client.print ("Connection: close\r\n");
client.print ("Content-Type: application/json\r\n");
client.printf("Content-Length: %u\r\n", (unsigned)body.length());
client.printf("X-Device-Id: %s\r\n", cfg.device_id.c_str());
client.printf("X-Timestamp: %u\r\n", (unsigned)ts);
client.printf("X-Signature: %s\r\n", sig.c_str());
client.print ("\r\n");
client.print(body);
int code = read_http_status(client, 10000);
// Drain so the server can close cleanly.
while (client.connected() && client.available()) client.read();
client.stop();
uint32_t elapsed = millis() - t0;
uint16_t phash = event_log_path_hash(path);
Serial.printf("[HTTP] POST %s%s (%s %s) -> %d (%u ms)\n",
REPORTER_API_HOST_NAME, path, ip_source, ip.toString().c_str(),
code, (unsigned)elapsed);
if (code == 200) {
event_log_write(EVT_HTTP_OK, phash, (uint16_t)((elapsed > 65535) ? 65535 : elapsed));
return true;
}
event_log_write(EVT_HTTP_FAIL, phash, (uint16_t)code);
return false;
}
static bool post_json(const DeviceConfig& cfg, const char* path, const String& body) {
// 3 attempts. Worst case per call: 3 × (5s connect + 10s response) + 0 + 2 + 5 = 52s.
// TWDT is fed before the backoff delay and before each attempt so the 30s
// timeout doesn't fire mid-sequence.
static const uint16_t DELAYS_MS[] = { 0, 2000, 5000 };
for (int i = 0; i < 3; i++) {
esp_task_wdt_reset();
if (DELAYS_MS[i]) vTaskDelay(pdMS_TO_TICKS(DELAYS_MS[i]));
esp_task_wdt_reset();
if (post_json_once(cfg, path, body)) return true;
}
return false;
}
static String build_camera_batch(const DeviceConfig& cfg,
@@ -147,16 +255,36 @@ void reporter_submit_ble(const DeviceConfig& cfg, const BLEHourlyRecord& rec) {
}
}
void reporter_heartbeat(const DeviceConfig& cfg, uint32_t uptime_s, int wifi_rssi) {
bool reporter_heartbeat(const DeviceConfig& cfg, uint32_t uptime_s, int wifi_rssi) {
JsonDocument doc;
doc["device_id"] = cfg.device_id;
doc["firmware_version"] = "1.0.0";
doc["firmware_version"] = "1.1.0";
doc["free_storage_pct"] = 100;
doc["wifi_rssi"] = wifi_rssi;
doc["pending_records"] = (int)(s_cam_buf.size() + s_ble_buf.size());
doc["uptime_seconds"] = uptime_s;
// Diagnostics (new in 1.1.0)
doc["reset_reason"] = (int)esp_reset_reason();
doc["heap_free"] = (int)esp_get_free_heap_size();
doc["heap_min_free"] = (int)esp_get_minimum_free_heap_size();
doc["last_disconnect_code"] = (int)net_guard_last_disconnect_reason();
// Last 8 event-log entries, newest first
EventLogEntry recent[8];
size_t n = event_log_read_recent(recent, 8);
JsonArray evs = doc["recent_events"].to<JsonArray>();
for (size_t i = 0; i < n; i++) {
JsonObject e = evs.add<JsonObject>();
e["t"] = recent[i].tag;
e["d0"] = recent[i].data0;
e["d1"] = recent[i].data1;
e["ts"] = recent[i].ts_unix;
e["up"] = recent[i].uptime_s;
}
String body; serializeJson(doc, body);
post_json(cfg, "/api/v1/heartbeat", body);
return post_json(cfg, "/api/v1/heartbeat", body);
}
void reporter_flush(const DeviceConfig& cfg) {
@@ -169,7 +297,10 @@ void reporter_flush(const DeviceConfig& cfg) {
String body = build_camera_batch(cfg, cam_snap);
if (post_json(cfg, "/api/v1/camera/events/batch", body)) {
xSemaphoreTake(s_buf_mutex, portMAX_DELAY);
s_cam_buf.clear();
// Erase only the prefix we snapshotted; FIFO append from
// submit_camera during the in-flight POST stays buffered.
size_t n = std::min(cam_snap.size(), s_cam_buf.size());
s_cam_buf.erase(s_cam_buf.begin(), s_cam_buf.begin() + n);
xSemaphoreGive(s_buf_mutex);
}
}
@@ -177,7 +308,8 @@ void reporter_flush(const DeviceConfig& cfg) {
String body = build_ble_batch(cfg, ble_snap);
if (post_json(cfg, "/api/v1/events/batch", body)) {
xSemaphoreTake(s_buf_mutex, portMAX_DELAY);
s_ble_buf.clear();
size_t n = std::min(ble_snap.size(), s_ble_buf.size());
s_ble_buf.erase(s_ble_buf.begin(), s_ble_buf.begin() + n);
xSemaphoreGive(s_buf_mutex);
}
}
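
Review note: the flush hunks above replace `clear()` with erasing only the snapshotted prefix. The following is a minimal Python model of that idea, not the firmware code; the function name is hypothetical, and it assumes records are only appended at the back between the snapshot and the erase. If the buffer also evicts its oldest record when it reaches `REPORTER_MAX_BUFFER` (not shown in this diff), the prefix would no longer line up with the snapshot.

```python
def erase_snapshot_prefix(buf: list, snapshot_len: int) -> None:
    """Drop the first snapshot_len items of buf in place, clamped to len(buf)."""
    n = min(snapshot_len, len(buf))
    del buf[:n]


buf = ["h0", "h1", "h2"]
snapshot = list(buf)        # copy taken under the mutex before the POST
buf.append("h3")            # record submitted while the POST is in flight
erase_snapshot_prefix(buf, len(snapshot))
# buf now holds only the record that arrived during the POST.
```

With `clear()` the late record `h3` would have been discarded even though it was never sent.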

@@ -11,11 +11,16 @@ struct CameraHourlyRecord {
int exits;
};
static const int REPORTER_MAX_BUFFER = 24;
static const char* REPORTER_API_HOST = "http://logs.research.bike";
static const int REPORTER_MAX_BUFFER = 24;
static const char* REPORTER_API_HOST_NAME = "logs.research.bike";
static const uint16_t REPORTER_API_PORT = 80;
// Hardcoded fallback used when DNS fails (some customer networks intercept
// :53 with a transparent proxy that mangles responses). Update if the
// server's IP changes — but a successful hostByName() always wins over this.
static const char* REPORTER_API_FALLBACK_IP = "5.78.114.131";
void reporter_init();
void reporter_submit_camera(const DeviceConfig& cfg, const CameraHourlyRecord& rec);
void reporter_submit_ble(const DeviceConfig& cfg, const BLEHourlyRecord& rec);
void reporter_heartbeat(const DeviceConfig& cfg, uint32_t uptime_s, int wifi_rssi);
bool reporter_heartbeat(const DeviceConfig& cfg, uint32_t uptime_s, int wifi_rssi);
void reporter_flush(const DeviceConfig& cfg);

@@ -0,0 +1,141 @@
// firmware/test/test_native/test_event_log.cpp
#include <unity.h>
#include <string.h>
#include "event_log.h"
// --- Native NVS stub (declared in event_log.cpp for native builds) ---
extern "C" void event_log_test_reset();
void setUp() { event_log_test_reset(); }
void tearDown() {}
void test_entry_is_32_bytes() {
TEST_ASSERT_EQUAL(32, sizeof(EventLogEntry));
}
void test_path_hash_is_stable_and_differs() {
uint16_t a = event_log_path_hash("/api/v1/heartbeat");
uint16_t b = event_log_path_hash("/api/v1/heartbeat");
uint16_t c = event_log_path_hash("/api/v1/camera/events/batch");
TEST_ASSERT_EQUAL(a, b);
TEST_ASSERT_NOT_EQUAL(a, c);
}
void test_write_then_read_recent_returns_newest_first() {
event_log_init();
event_log_write(EVT_BOOT, 1, 0);
event_log_write(EVT_WIFI_UP, 2, 0);
event_log_write(EVT_HTTP_FAIL, 3, 500);
EventLogEntry buf[8];
size_t n = event_log_read_recent(buf, 8);
TEST_ASSERT_EQUAL(3, n);
TEST_ASSERT_EQUAL(EVT_HTTP_FAIL, buf[0].tag);
TEST_ASSERT_EQUAL(500, buf[0].data1);
TEST_ASSERT_EQUAL(EVT_WIFI_UP, buf[1].tag);
TEST_ASSERT_EQUAL(EVT_BOOT, buf[2].tag);
}
void test_ring_buffer_wraps_after_32_entries() {
event_log_init();
for (int i = 0; i < 40; i++) event_log_write(EVT_HTTP_OK, (uint16_t)i, 0);
EventLogEntry buf[32];
size_t n = event_log_read_recent(buf, 32);
TEST_ASSERT_EQUAL(32, n);
// Newest first: data0 should be 39, 38, 37, ... down to 8
TEST_ASSERT_EQUAL(39, buf[0].data0);
TEST_ASSERT_EQUAL(8, buf[31].data0);
}
void test_empty_log_read_returns_zero() {
event_log_init();
EventLogEntry buf[8];
size_t n = event_log_read_recent(buf, 8);
TEST_ASSERT_EQUAL(0, n);
}
void test_read_recent_truncates_to_max_entries() {
event_log_init();
for (int i = 0; i < 10; i++) event_log_write(EVT_HTTP_OK, (uint16_t)i, 0);
EventLogEntry buf[3];
size_t n = event_log_read_recent(buf, 3);
TEST_ASSERT_EQUAL(3, n);
// Newest 3: data0 == 9, 8, 7
TEST_ASSERT_EQUAL(9, buf[0].data0);
TEST_ASSERT_EQUAL(8, buf[1].data0);
TEST_ASSERT_EQUAL(7, buf[2].data0);
}
void test_path_hash_distinguishes_real_api_paths() {
uint16_t h1 = event_log_path_hash("/api/v1/heartbeat");
uint16_t h2 = event_log_path_hash("/api/v1/camera/events/batch");
uint16_t h3 = event_log_path_hash("/api/v1/events/batch");
TEST_ASSERT_NOT_EQUAL(h1, h2);
TEST_ASSERT_NOT_EQUAL(h1, h3);
TEST_ASSERT_NOT_EQUAL(h2, h3);
}
extern "C" void event_log_test_simulate_reboot();
void test_boot_recovery_after_partial_fill() {
// Phase 1: write 5 entries before "reboot"
event_log_init();
for (uint16_t i = 0; i < 5; i++) event_log_write(EVT_HTTP_OK, i, 0);
// Phase 2: simulate reboot (clear RAM state, keep slots), re-init, verify
event_log_test_simulate_reboot();
event_log_init();
// All 5 original entries should still be readable, newest first
EventLogEntry buf[8];
size_t n = event_log_read_recent(buf, 8);
TEST_ASSERT_EQUAL(5, n);
TEST_ASSERT_EQUAL(4, buf[0].data0); // newest
TEST_ASSERT_EQUAL(0, buf[4].data0); // oldest
// Phase 3: write one more — seq must continue (not restart at 0),
// so the new entry is the newest and slot index 5 holds it
event_log_write(EVT_HTTP_OK, 99, 0);
n = event_log_read_recent(buf, 8);
TEST_ASSERT_EQUAL(6, n);
TEST_ASSERT_EQUAL(99, buf[0].data0);
TEST_ASSERT_EQUAL(4, buf[1].data0);
}
void test_boot_recovery_after_wrap() {
// Phase 1: write 40 entries (wraps the 32-slot ring once; oldest 8 dropped)
event_log_init();
for (uint16_t i = 0; i < 40; i++) event_log_write(EVT_HTTP_OK, i, 0);
// Phase 2: simulate reboot, re-init
event_log_test_simulate_reboot();
event_log_init();
// Still 32 entries visible, newest=39, oldest=8
EventLogEntry buf[32];
size_t n = event_log_read_recent(buf, 32);
TEST_ASSERT_EQUAL(32, n);
TEST_ASSERT_EQUAL(39, buf[0].data0);
TEST_ASSERT_EQUAL(8, buf[31].data0);
// Phase 3: one more write — newest becomes 100, head advances past
// wherever the max-seq slot was, oldest drops to data0=9
event_log_write(EVT_HTTP_OK, 100, 0);
n = event_log_read_recent(buf, 32);
TEST_ASSERT_EQUAL(32, n);
TEST_ASSERT_EQUAL(100, buf[0].data0);
TEST_ASSERT_EQUAL(9, buf[31].data0);
}
int main() {
UNITY_BEGIN();
RUN_TEST(test_entry_is_32_bytes);
RUN_TEST(test_path_hash_is_stable_and_differs);
RUN_TEST(test_write_then_read_recent_returns_newest_first);
RUN_TEST(test_ring_buffer_wraps_after_32_entries);
RUN_TEST(test_empty_log_read_returns_zero);
RUN_TEST(test_read_recent_truncates_to_max_entries);
RUN_TEST(test_path_hash_distinguishes_real_api_paths);
RUN_TEST(test_boot_recovery_after_partial_fill);
RUN_TEST(test_boot_recovery_after_wrap);
return UNITY_END();
}
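
Review note: the tests above pin down the observable behavior of the event log (32 slots, newest first, sequence continues after a reboot). The sketch below is a toy Python model that satisfies the same expectations. It is an assumption about one way to get that behavior, since `event_log.cpp` is not part of this excerpt; the class and method names are hypothetical.

```python
RING_SLOTS = 32  # matches the 32-slot ring the tests assume


class EventRing:
    """Toy model: each slot stores (seq, payload); seq only ever increases."""

    def __init__(self):
        self.slots = [None] * RING_SLOTS  # stands in for persisted storage
        self._recover()

    def _recover(self):
        # Rebuild RAM state by locating the slot with the highest seq.
        used = [(slot[0], idx) for idx, slot in enumerate(self.slots) if slot is not None]
        if used:
            max_seq, idx = max(used)
            self.next_seq = max_seq + 1
            self.head = (idx + 1) % RING_SLOTS
        else:
            self.next_seq = 0
            self.head = 0

    def write(self, payload):
        self.slots[self.head] = (self.next_seq, payload)
        self.next_seq += 1
        self.head = (self.head + 1) % RING_SLOTS

    def simulate_reboot(self):
        # Slots survive; head and next_seq are derived from them again.
        self._recover()

    def read_recent(self, max_entries):
        filled = [slot for slot in self.slots if slot is not None]
        filled.sort(key=lambda slot: slot[0], reverse=True)  # newest first
        return [payload for _, payload in filled[:max_entries]]


ring = EventRing()
for i in range(40):
    ring.write(i)
ring.simulate_reboot()
ring.write(100)
recent = ring.read_recent(32)
```

Deriving the write position from the maximum sequence number is what lets the post-reboot write land after the newest entry instead of overwriting slot 0.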

@@ -0,0 +1,32 @@
// firmware/test/test_net_guard/test_net_guard.cpp
#include <unity.h>
#include "net_guard.h"
void setUp() {}
void tearDown() {}
void test_backoff_starts_at_one_second() {
TEST_ASSERT_EQUAL(1000, net_guard_next_backoff_ms(0));
}
void test_backoff_doubles_each_attempt() {
TEST_ASSERT_EQUAL(2000, net_guard_next_backoff_ms(1));
TEST_ASSERT_EQUAL(4000, net_guard_next_backoff_ms(2));
TEST_ASSERT_EQUAL(8000, net_guard_next_backoff_ms(3));
TEST_ASSERT_EQUAL(16000, net_guard_next_backoff_ms(4));
TEST_ASSERT_EQUAL(32000, net_guard_next_backoff_ms(5));
}
void test_backoff_clamps_at_60s() {
TEST_ASSERT_EQUAL(60000, net_guard_next_backoff_ms(6));
TEST_ASSERT_EQUAL(60000, net_guard_next_backoff_ms(7));
TEST_ASSERT_EQUAL(60000, net_guard_next_backoff_ms(100));
}
int main() {
UNITY_BEGIN();
RUN_TEST(test_backoff_starts_at_one_second);
RUN_TEST(test_backoff_doubles_each_attempt);
RUN_TEST(test_backoff_clamps_at_60s);
return UNITY_END();
}
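
Review note: the three tests above describe a doubling backoff that starts at 1 s and clamps at 60 s. A small Python sketch with the same outputs follows. The function name, the default parameters, and the exponent clamp are assumptions; `net_guard.cpp` is not shown here. The exponent is clamped before multiplying because a C implementation that shifts a 32-bit value by 100 would be undefined behavior, so some equivalent guard is presumably needed on the device.

```python
def next_backoff_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 60000) -> int:
    """Exponential backoff: base_ms * 2**attempt, clamped to cap_ms."""
    # Bound the exponent first so huge attempt counts cannot overflow
    # in a fixed-width port of this function.
    exp = max(0, min(attempt, 16))
    return min(base_ms * (2 ** exp), cap_ms)


schedule = [next_backoff_ms(a) for a in range(8)]
```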

@@ -0,0 +1,44 @@
// firmware/test/test_ota/test_version.cpp
#include <unity.h>
// Pull in the function under test — include .cpp directly for native builds
// so we don't need a separate compilation unit per test.
#define NATIVE_TEST
#include "../../lib/ota_updater/ota_updater.cpp"
void setUp() {}
void tearDown() {}
void test_remote_newer_patch() {
TEST_ASSERT_TRUE(ota_version_newer("1.0.0", "1.0.1"));
}
void test_remote_newer_minor() {
TEST_ASSERT_TRUE(ota_version_newer("1.0.9", "1.1.0"));
}
void test_remote_newer_major() {
TEST_ASSERT_TRUE(ota_version_newer("0.9.9", "1.0.0"));
}
void test_same_version() {
TEST_ASSERT_FALSE(ota_version_newer("1.2.3", "1.2.3"));
}
void test_remote_older() {
TEST_ASSERT_FALSE(ota_version_newer("1.2.3", "1.2.2"));
}
void test_malformed_current() {
TEST_ASSERT_FALSE(ota_version_newer("bad", "1.0.0"));
}
void test_malformed_remote() {
TEST_ASSERT_FALSE(ota_version_newer("1.0.0", "bad"));
}
int main() {
UNITY_BEGIN();
RUN_TEST(test_remote_newer_patch);
RUN_TEST(test_remote_newer_minor);
RUN_TEST(test_remote_newer_major);
RUN_TEST(test_same_version);
RUN_TEST(test_remote_older);
RUN_TEST(test_malformed_current);
RUN_TEST(test_malformed_remote);
return UNITY_END();
}

@@ -0,0 +1,62 @@
// firmware/test/test_ota_sig/test_sig_verify.cpp
#include <unity.h>
#include <string.h>
#define NATIVE_TEST
#include "../../lib/ota_updater/ota_updater.cpp"
// ── Test vectors generated by Python/cryptography (ECDSA P-256) ────────────
static const uint8_t TEST_PUBKEY[65] = {
0x04, 0x96, 0x18, 0x6c, 0x8b, 0xb2, 0xdf, 0xea, 0x3f, 0xe4, 0x75, 0x35, 0x0e, 0x8a, 0x3e, 0x7d,
0x49, 0x7f, 0x56, 0xb5, 0xb4, 0x1a, 0xae, 0x05, 0xa3, 0x10, 0x6f, 0x02, 0x43, 0x84, 0xb3, 0x1c,
0x1f, 0x44, 0xef, 0x08, 0x84, 0x57, 0xca, 0x6e, 0xd8, 0x19, 0x74, 0x10, 0x8d, 0x95, 0xcc, 0x8c,
0x61, 0x89, 0x56, 0xea, 0xbc, 0x0c, 0xa2, 0x54, 0xd7, 0x02, 0xf3, 0x1d, 0x67, 0x7c, 0xa5, 0xba,
0x42
};
static const uint8_t TEST_HASH[32] = {
0x0a, 0x7e, 0x5f, 0x6a, 0x4c, 0x72, 0x11, 0xb7, 0x14, 0x3f, 0x85, 0x59, 0x50, 0x61, 0x8a, 0xa1,
0xab, 0xee, 0x7b, 0x57, 0x08, 0x59, 0x56, 0x09, 0x6d, 0x18, 0xaf, 0x70, 0xe6, 0x6e, 0x6c, 0xa8
};
static const uint8_t TEST_SIG[64] = {
0x4f, 0xff, 0xc3, 0xc6, 0xd5, 0x04, 0x71, 0x37, 0x87, 0x8c, 0xe1, 0xe5, 0x79, 0xef, 0x59, 0x2a,
0x63, 0xde, 0xf6, 0x96, 0x3e, 0x8f, 0x90, 0x2f, 0x46, 0x1f, 0x1b, 0x8a, 0xd5, 0x94, 0xb8, 0x28,
0x80, 0xfa, 0xe4, 0x26, 0x14, 0xbf, 0x91, 0x54, 0xbf, 0xa6, 0x2f, 0x67, 0xf9, 0x97, 0x45, 0x3a,
0x0f, 0xdc, 0x66, 0xcd, 0x21, 0xb8, 0x91, 0xdb, 0xb9, 0xaa, 0x6b, 0x5d, 0x6c, 0xa5, 0xcb, 0x96
};
// ──────────────────────────────────────────────────────────────────────────
void setUp() {}
void tearDown() {}
void test_valid_signature_accepted() {
TEST_ASSERT_TRUE(ota_verify_signature_with_key(TEST_HASH, TEST_SIG, TEST_PUBKEY));
}
void test_corrupted_hash_rejected() {
uint8_t bad_hash[32];
memcpy(bad_hash, TEST_HASH, 32);
bad_hash[0] ^= 0xff;
TEST_ASSERT_FALSE(ota_verify_signature_with_key(bad_hash, TEST_SIG, TEST_PUBKEY));
}
void test_corrupted_signature_rejected() {
uint8_t bad_sig[64];
memcpy(bad_sig, TEST_SIG, 64);
bad_sig[0] ^= 0xff;
TEST_ASSERT_FALSE(ota_verify_signature_with_key(TEST_HASH, bad_sig, TEST_PUBKEY));
}
void test_zero_signature_rejected() {
uint8_t zero_sig[64] = {};
TEST_ASSERT_FALSE(ota_verify_signature_with_key(TEST_HASH, zero_sig, TEST_PUBKEY));
}
int main() {
UNITY_BEGIN();
RUN_TEST(test_valid_signature_accepted);
RUN_TEST(test_corrupted_hash_rejected);
RUN_TEST(test_corrupted_signature_rejected);
RUN_TEST(test_zero_signature_rejected);
return UNITY_END();
}

@@ -11,7 +11,7 @@ import sqlite3
from typing import List
from fastapi import Depends
from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, model_validator
class CameraRecord(BaseModel):
@@ -20,6 +20,12 @@ class CameraRecord(BaseModel):
entries: int = Field(ge=0)
exits: int = Field(ge=0)
@model_validator(mode="after")
def _period_order(self):
if self.period_end <= self.period_start:
raise ValueError("period_end must be strictly greater than period_start")
return self
class CameraEventsRequest(BaseModel):
location_id: str

@@ -0,0 +1,129 @@
# server/heartbeat_diagnostics_stub.py
# Add these models and the persistence helper to the server's main.py alongside
# the existing heartbeat endpoint (POST /api/v1/heartbeat).
# Requires: diagnostic columns on the heartbeats table (see migrations/005_heartbeat_diagnostics.sql)
#
# Firmware v1.1.0 extends the heartbeat payload with five optional diagnostic
# fields. v1.0.0-shape payloads (without these fields) must continue to parse
# cleanly — every new field is Optional and defaults to None.
#
# IMPORTANT: Adjust the table name in store_heartbeat_diagnostics to match the
# real server's schema if it differs from "heartbeats".
import json
import sqlite3
from typing import List, Optional
from pydantic import BaseModel
class RecentEvent(BaseModel):
t: int # EventLogTag (see EVENT_TAG_DECODER)
d0: int # tag-specific datum 0
d1: int # tag-specific datum 1
ts: int # unix timestamp (seconds)
up: int # seconds since boot when event was logged
# Extend the existing HeartbeatRequest model in main.py by adding these five
# optional fields. The rest of the heartbeat model (device_id, uptime, etc.)
# stays as-is. Shown here as a standalone model for reference/testing.
class HeartbeatDiagnosticsFields(BaseModel):
reset_reason: Optional[int] = None
heap_free: Optional[int] = None
heap_min_free: Optional[int] = None
last_disconnect_code: Optional[int] = None
recent_events: Optional[List[RecentEvent]] = None
# Example of the fully-extended heartbeat request model (merge into the
# existing HeartbeatRequest in main.py rather than introducing a second class):
class HeartbeatRequestWithDiagnostics(BaseModel):
device_id: str
uptime: int
# ... existing fields from the v1.0.0 heartbeat model go here ...
# New v1.1.0 diagnostic fields:
reset_reason: Optional[int] = None
heap_free: Optional[int] = None
heap_min_free: Optional[int] = None
last_disconnect_code: Optional[int] = None
recent_events: Optional[List[RecentEvent]] = None
# Call this inside the existing receive_heartbeat handler after the base
# heartbeat row has been inserted/updated. It persists the diagnostic fields
# on the same row keyed by device_id.
def store_heartbeat_diagnostics(
db: sqlite3.Connection,
device_id: str,
hb: HeartbeatRequestWithDiagnostics,
) -> None:
"""Persist the v1.1.0 diagnostic fields onto the heartbeats row for device_id.
recent_events is JSON-serialized into a TEXT column for flexibility;
the other four fields are stored as INTEGERs. All fields are nullable
and left untouched when the payload omits them (v1.0.0 compatibility).
"""
recent_events_json = (
json.dumps([ev.model_dump() for ev in hb.recent_events])
if hb.recent_events is not None
else None
)
cursor = db.cursor()
# COALESCE preserves existing column values when the v1.0.0 payload omits
# diagnostic fields (Pydantic resolves them to None).
cursor.execute(
"""UPDATE heartbeats
SET reset_reason = COALESCE(?, reset_reason),
heap_free = COALESCE(?, heap_free),
heap_min_free = COALESCE(?, heap_min_free),
last_disconnect_code = COALESCE(?, last_disconnect_code),
recent_events = COALESCE(?, recent_events)
WHERE device_id = ?""",
(
hb.reset_reason,
hb.heap_free,
hb.heap_min_free,
hb.last_disconnect_code,
recent_events_json,
device_id,
),
)
db.commit()
# ---------------------------------------------------------------------------
# Decoders — use these in dashboards / alerting to label the integer tags the
# firmware emits. Keep in sync with firmware/include/event_log.h.
# ---------------------------------------------------------------------------
# EventLogTag values (RecentEvent.t) -> human name.
# Per-tag interpretation of d0/d1:
# EVT_BOOT d0=esp_reset_reason()
# EVT_WIFI_UP d0=RSSI (int16 cast to uint16)
# EVT_WIFI_DOWN d0=disconnect reason (0xFF = silent-death)
# EVT_HTTP_OK d0=path_hash, d1=elapsed_ms
# EVT_HTTP_FAIL d0=path_hash, d1=http_status_or_errno
# EVT_HEARTBEAT_MISS d0=consecutive_count
# EVT_NTP_SYNC d0=seconds_since_boot (reserved, not emitted)
# EVT_REBOOT d0=RebootReason (see REBOOT_REASON_DECODER)
EVENT_TAG_DECODER = {
1: "EVT_BOOT",
2: "EVT_WIFI_UP",
3: "EVT_WIFI_DOWN",
4: "EVT_HTTP_OK",
5: "EVT_HTTP_FAIL",
6: "EVT_HEARTBEAT_MISS",
7: "EVT_NTP_SYNC",
8: "EVT_REBOOT",
}
# EVT_REBOOT.d0 values -> human name. Firmware-initiated reboot reasons.
REBOOT_REASON_DECODER = {
1: "HEARTBEAT_MISS",
2: "FACTORY_RESET",
3: "OTA",
4: "WIFI_REPROV",
5: "FATAL_CONFIG",
6: "FATAL_CAMERA",
}
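
Review note: a dashboard would typically combine the decoder tables with the per-tag meaning of `d0`/`d1` listed in the comment block above. The sketch below shows one way to do that. `as_int16` and `describe_event` are hypothetical helpers, and the output strings are illustrative only. The signed reinterpretation follows from the firmware casting an int16 RSSI and an `int` status code (possibly -1) to `uint16_t`.

```python
EVENT_TAG_DECODER = {
    1: "EVT_BOOT", 2: "EVT_WIFI_UP", 3: "EVT_WIFI_DOWN", 4: "EVT_HTTP_OK",
    5: "EVT_HTTP_FAIL", 6: "EVT_HEARTBEAT_MISS", 7: "EVT_NTP_SYNC", 8: "EVT_REBOOT",
}


def as_int16(u16: int) -> int:
    """Reinterpret an unsigned 16-bit value as signed two's complement."""
    return u16 - 0x10000 if u16 >= 0x8000 else u16


def describe_event(ev: dict) -> str:
    """Render one recent_events entry as a short human-readable string."""
    name = EVENT_TAG_DECODER.get(ev["t"], f"UNKNOWN({ev['t']})")
    if name == "EVT_WIFI_UP":
        return f"{name} rssi={as_int16(ev['d0'])} dBm"
    if name == "EVT_WIFI_DOWN":
        reason = "silent-death" if ev["d0"] == 0xFF else str(ev["d0"])
        return f"{name} reason={reason}"
    if name == "EVT_HTTP_FAIL":
        # -1 (timeout or resolve failure) arrives as 65535 after the uint16 cast.
        return f"{name} path_hash={ev['d0']} status={as_int16(ev['d1'])}"
    return f"{name} d0={ev['d0']} d1={ev['d1']}"


wifi_up = describe_event({"t": 2, "d0": 65476, "d1": 0, "ts": 0, "up": 0})
http_fail = describe_event({"t": 5, "d0": 1234, "d1": 65535, "ts": 0, "up": 0})
```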

@@ -0,0 +1,14 @@
-- migrations/005_heartbeat_diagnostics.sql
-- Add v1.1.0 diagnostic columns to the existing heartbeats table.
-- Adjust the table name ("heartbeats") to match the real server's schema.
-- Apply: sqlite3 <db_file> < migrations/005_heartbeat_diagnostics.sql
--
-- sqlite's ALTER TABLE ADD COLUMN only takes one column per statement, so
-- each field is added separately. All columns are nullable, so firmware
-- v1.0.0 payloads (which omit these fields) remain accepted unchanged.
ALTER TABLE heartbeats ADD COLUMN reset_reason INTEGER;
ALTER TABLE heartbeats ADD COLUMN heap_free INTEGER;
ALTER TABLE heartbeats ADD COLUMN heap_min_free INTEGER;
ALTER TABLE heartbeats ADD COLUMN last_disconnect_code INTEGER;
ALTER TABLE heartbeats ADD COLUMN recent_events TEXT; -- JSON-serialized list of {t,d0,d1,ts,up}
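
Review note: the interaction between these nullable columns and the `COALESCE` update in the stub can be checked with an in-memory database. The sketch below uses only the standard `sqlite3` module; the minimal `heartbeats` base table and the helper names are assumptions for illustration, and only one diagnostic column is exercised.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE heartbeats (device_id TEXT PRIMARY KEY, uptime INTEGER)")
# Same shape as the migration: one nullable column per ALTER TABLE statement.
for column, sql_type in [
    ("reset_reason", "INTEGER"),
    ("heap_free", "INTEGER"),
    ("heap_min_free", "INTEGER"),
    ("last_disconnect_code", "INTEGER"),
    ("recent_events", "TEXT"),
]:
    db.execute(f"ALTER TABLE heartbeats ADD COLUMN {column} {sql_type}")
db.execute("INSERT INTO heartbeats (device_id, uptime) VALUES ('dc-test-01', 1)")


def store_heap_free(value):
    """Mimic the COALESCE update for a single diagnostic column."""
    db.execute(
        "UPDATE heartbeats SET heap_free = COALESCE(?, heap_free) WHERE device_id = ?",
        (value, "dc-test-01"),
    )


def current_heap_free():
    row = db.execute(
        "SELECT heap_free FROM heartbeats WHERE device_id = 'dc-test-01'"
    ).fetchone()
    return row[0]


before = current_heap_free()   # new column starts out NULL
store_heap_free(123456)        # v1.1.0-shape heartbeat
after_v11 = current_heap_free()
store_heap_free(None)          # later heartbeat that omits the field
after_v10 = current_heap_free()
```

One consequence worth noting: a payload that omits a field leaves the previous value in place, so diagnostics from an older heartbeat can linger on the row.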

120
server/ota_endpoint.py Normal file
@@ -0,0 +1,120 @@
# server/ota_endpoint.py
"""
OTA firmware update endpoints.
Deployment workflow:
1. Generate signing key (one-time):
python tools/gen_signing_key.py
→ secrets/firmware_signing_key.pem (keep offline)
→ firmware/lib/ota_updater/ota_pubkey.h (commit this)
2. Build and deploy a new firmware version:
pio run -e timercam # build
python tools/deploy_firmware.py \\
firmware/.pio/build/timercam/firmware.bin 1.2.3
→ server/firmware/ updated with current.bin, current.sig, manifest.json
3. Bump FW_VERSION in firmware/include/version.h before each release.
4. Register in server main app:
from server.ota_endpoint import router as ota_router
app.include_router(ota_router)
Also uncomment Depends(verify_device_hmac) on both route handlers
and confirm the HMAC format matches hmac.cpp:
method + "\\n" + path + "\\n" + timestamp + "\\n" + sha256_hex(body)
Note: HMAC auth is currently commented out on route handlers — must be wired
before production use. verify_device_hmac must use the same format as hmac.cpp.
"""
import base64
import json
from pathlib import Path
from fastapi import APIRouter
from fastapi.responses import FileResponse
FIRMWARE_DIR = Path(__file__).parent / "firmware"
router = APIRouter(prefix="/ota", tags=["ota"])
class FirmwareNotFoundError(Exception):
pass
def _parse_version(v: str) -> tuple:
"""Parse semver string to comparable tuple; returns (0,0,0) on malformed input."""
try:
parts = v.strip().split(".")
if len(parts) != 3:
return (0, 0, 0)
return tuple(int(x) for x in parts)
except (ValueError, AttributeError):
return (0, 0, 0)
def ota_check_impl(current_version: str, firmware_dir: Path = FIRMWARE_DIR) -> dict:
"""
Compare device's current_version against staged manifest.
Returns {"update": False} when no update is available or manifest is missing.
Returns full update payload when server version is strictly newer.
"""
manifest_path = firmware_dir / "manifest.json"
if not manifest_path.exists():
return {"update": False}
try:
manifest = json.loads(manifest_path.read_text())
version = manifest["version"]
size = manifest["size"]
sha256 = manifest["sha256"]
except (json.JSONDecodeError, KeyError):
return {"update": False}
if _parse_version(version) <= _parse_version(current_version):
return {"update": False}
sig_path = firmware_dir / "current.sig"
if not sig_path.exists():
return {"update": False}
sig_b64 = base64.b64encode(sig_path.read_bytes()).decode()
return {
"update": True,
"version": version,
"size": size,
"sha256": sha256,
"sig_b64": sig_b64,
}
def ota_firmware_impl(firmware_dir: Path = FIRMWARE_DIR) -> bytes:
"""
Return raw firmware binary bytes.
Raises FirmwareNotFoundError if current.bin is absent.
"""
bin_path = firmware_dir / "current.bin"
if not bin_path.exists():
raise FirmwareNotFoundError("No firmware staged")
return bin_path.read_bytes()
@router.get("/check")
async def ota_check(
version: str,
# device_id: str = Depends(verify_device_hmac), # uncomment when wiring into app
):
"""Check whether a firmware update is available for the given device version."""
return ota_check_impl(current_version=version)
@router.get("/firmware")
async def ota_firmware(
# device_id: str = Depends(verify_device_hmac), # uncomment when wiring into app
):
"""Stream the staged firmware binary to the device."""
from fastapi import HTTPException
bin_path = FIRMWARE_DIR / "current.bin"
if not bin_path.exists():
raise HTTPException(status_code=404, detail="No firmware available")
return FileResponse(bin_path, media_type="application/octet-stream")
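
Review note: the module docstring gives the string that must be signed (method, path, timestamp and the SHA-256 hex digest of the body, joined by newlines) but `verify_device_hmac` itself is not in this diff. The sketch below illustrates that format with the standard library. HMAC-SHA256 with lowercase hex output, the 300-second skew window, and all function names are assumptions that would need to be checked against `hmac.cpp` before use.

```python
import hashlib
import hmac


def canonical_string(method: str, path: str, timestamp: int, body: bytes) -> str:
    """Join method, path, timestamp and sha256_hex(body) with newlines."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method, path, str(timestamp), body_hash])


def sign(secret: bytes, method: str, path: str, timestamp: int, body: bytes) -> str:
    # ASSUMPTION: HMAC-SHA256, hex-encoded; the firmware side is not shown.
    message = canonical_string(method, path, timestamp, body).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str, timestamp: int, body: bytes,
           presented_sig: str, now: int, max_skew_s: int = 300) -> bool:
    """Reject stale timestamps, then compare signatures in constant time."""
    if abs(now - timestamp) > max_skew_s:  # skew window is a hypothetical value
        return False
    expected = sign(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, presented_sig)


sig = sign(b"secret", "POST", "/api/v1/heartbeat", 1712000000, b"{}")
accepted = verify(b"secret", "POST", "/api/v1/heartbeat", 1712000000, b"{}",
                  sig, now=1712000010)
```

For the GET routes in this file the body is empty, so the last component would be the SHA-256 of zero bytes.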

@@ -98,3 +98,15 @@ def test_negative_counts_rejected():
with pytest.raises(ValidationError):
CameraRecord(period_start=1712000000, period_end=1712003600,
entries=-1, exits=0)
def test_inverted_period_rejected():
"""Pydantic should reject period_end <= period_start."""
from pydantic import ValidationError
from server.camera_endpoint import CameraRecord
with pytest.raises(ValidationError):
CameraRecord(period_start=1712003600, period_end=1712003600,
entries=0, exits=0)
with pytest.raises(ValidationError):
CameraRecord(period_start=1712003600, period_end=1712000000,
entries=0, exits=0)

@@ -0,0 +1,156 @@
# server/test_heartbeat_diagnostics_stub.py
# Template tests for the heartbeat diagnostic-fields extension.
# Adapt imports and fixtures to match the actual server's test structure.
#
# To run against the actual server (once integrated):
# pytest server/test_heartbeat_diagnostics_stub.py -v
import json
import sqlite3
def _make_db() -> sqlite3.Connection:
"""In-memory sqlite fixture matching migrations/005_heartbeat_diagnostics.sql
applied on top of a minimal heartbeats table."""
db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE heartbeats (
device_id TEXT PRIMARY KEY,
uptime INTEGER,
reset_reason INTEGER,
heap_free INTEGER,
heap_min_free INTEGER,
last_disconnect_code INTEGER,
recent_events TEXT
)
""")
db.commit()
return db
def _v10_payload() -> dict:
"""Firmware v1.0.0-shape heartbeat: no diagnostic fields."""
return {"device_id": "dc-test-01", "uptime": 12345}
def _v11_payload() -> dict:
"""Firmware v1.1.0-shape heartbeat: includes all five diagnostic fields."""
return {
"device_id": "dc-test-01",
"uptime": 12345,
"reset_reason": 1,
"heap_free": 123456,
"heap_min_free": 100000,
"last_disconnect_code": 201,
"recent_events": [
{"t": 1, "d0": 1, "d1": 0, "ts": 1712000000, "up": 0},
{"t": 3, "d0": 255, "d1": 0, "ts": 1712000050, "up": 50},
],
}
def test_v10_shape_parses_with_new_fields_none():
"""A v1.0.0 heartbeat (no diagnostic fields) must parse cleanly; all new
fields default to None."""
from server.heartbeat_diagnostics_stub import HeartbeatRequestWithDiagnostics
hb = HeartbeatRequestWithDiagnostics(**_v10_payload())
assert hb.device_id == "dc-test-01"
assert hb.uptime == 12345
assert hb.reset_reason is None
assert hb.heap_free is None
assert hb.heap_min_free is None
assert hb.last_disconnect_code is None
assert hb.recent_events is None
def test_v11_shape_populates_new_fields():
"""A v1.1.0 heartbeat populates each diagnostic field and the event list."""
from server.heartbeat_diagnostics_stub import HeartbeatRequestWithDiagnostics
hb = HeartbeatRequestWithDiagnostics(**_v11_payload())
assert hb.reset_reason == 1
assert hb.heap_free == 123456
assert hb.heap_min_free == 100000
assert hb.last_disconnect_code == 201
assert hb.recent_events is not None
assert len(hb.recent_events) == 2
assert hb.recent_events[0].t == 1
assert hb.recent_events[1].t == 3
assert hb.recent_events[1].d0 == 255 # 0xFF silent-death marker
assert hb.recent_events[1].ts == 1712000050
def test_store_heartbeat_diagnostics_writes_fields_and_json():
"""store_heartbeat_diagnostics must JSON-serialize recent_events and write
each integer field as submitted."""
from server.heartbeat_diagnostics_stub import (
HeartbeatRequestWithDiagnostics,
store_heartbeat_diagnostics,
)
db = _make_db()
# Seed the heartbeats row the base handler would have inserted first.
db.execute(
"INSERT INTO heartbeats (device_id, uptime) VALUES (?, ?)",
("dc-test-01", 12345),
)
db.commit()
hb = HeartbeatRequestWithDiagnostics(**_v11_payload())
store_heartbeat_diagnostics(db, "dc-test-01", hb)
row = db.execute(
"""SELECT reset_reason, heap_free, heap_min_free,
last_disconnect_code, recent_events
FROM heartbeats
WHERE device_id = ?""",
("dc-test-01",),
).fetchone()
assert row[0] == 1
assert row[1] == 123456
assert row[2] == 100000
assert row[3] == 201
events = json.loads(row[4])
assert isinstance(events, list)
assert len(events) == 2
assert events[0] == {"t": 1, "d0": 1, "d1": 0, "ts": 1712000000, "up": 0}
assert events[1]["d0"] == 255
def test_store_heartbeat_diagnostics_v10_leaves_fields_null():
"""v1.0.0 payload: all diagnostic columns should remain NULL after store."""
from server.heartbeat_diagnostics_stub import (
HeartbeatRequestWithDiagnostics,
store_heartbeat_diagnostics,
)
db = _make_db()
db.execute(
"INSERT INTO heartbeats (device_id, uptime) VALUES (?, ?)",
("dc-test-01", 12345),
)
db.commit()
hb = HeartbeatRequestWithDiagnostics(**_v10_payload())
store_heartbeat_diagnostics(db, "dc-test-01", hb)
row = db.execute(
"""SELECT reset_reason, heap_free, heap_min_free,
last_disconnect_code, recent_events
FROM heartbeats
WHERE device_id = ?""",
("dc-test-01",),
).fetchone()
assert row == (None, None, None, None, None)
def test_event_tag_decoder_labels():
"""Sanity check: decoder maps firmware tag values to the expected names."""
from server.heartbeat_diagnostics_stub import EVENT_TAG_DECODER, REBOOT_REASON_DECODER
assert EVENT_TAG_DECODER[1] == "EVT_BOOT"
assert EVENT_TAG_DECODER[3] == "EVT_WIFI_DOWN"
assert EVENT_TAG_DECODER[8] == "EVT_REBOOT"
assert REBOOT_REASON_DECODER[1] == "HEARTBEAT_MISS"
assert REBOOT_REASON_DECODER[4] == "WIFI_REPROV"

@@ -0,0 +1,83 @@
# server/test_ota_endpoint.py
import base64
import hashlib
import json
from pathlib import Path
import pytest
from server.ota_endpoint import ota_check_impl, ota_firmware_impl
def write_firmware(firmware_dir: Path, version: str, data: bytes = b"fake_fw") -> None:
sig = bytes(64) # zero sig (not validated server-side)
manifest = {
"version": version,
"size": len(data),
"sha256": hashlib.sha256(data).hexdigest(),
}
(firmware_dir / "current.bin").write_bytes(data)
(firmware_dir / "current.sig").write_bytes(sig)
(firmware_dir / "manifest.json").write_text(json.dumps(manifest))
@pytest.fixture(autouse=True)
def patch_firmware_dir(tmp_path, monkeypatch):
import server.ota_endpoint as mod
monkeypatch.setattr(mod, "FIRMWARE_DIR", tmp_path)
yield tmp_path
def test_check_no_update_same_version(tmp_path):
write_firmware(tmp_path, "1.0.0")
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_check_no_update_newer_local(tmp_path):
write_firmware(tmp_path, "1.0.0")
result = ota_check_impl(current_version="1.1.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_check_update_available(tmp_path):
write_firmware(tmp_path, "1.1.0", data=b"new firmware")
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is True
assert result["version"] == "1.1.0"
assert result["size"] == len(b"new firmware")
assert "sha256" in result
assert "sig_b64" in result
sig_bytes = base64.b64decode(result["sig_b64"])
assert len(sig_bytes) == 64
def test_check_no_manifest(tmp_path):
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_firmware_endpoint_returns_binary(tmp_path):
fw_data = b"firmware binary content"
write_firmware(tmp_path, "1.1.0", data=fw_data)
content = ota_firmware_impl(firmware_dir=tmp_path)
assert content == fw_data
def test_firmware_endpoint_missing_raises(tmp_path):
import server.ota_endpoint as mod
with pytest.raises(mod.FirmwareNotFoundError):
ota_firmware_impl(firmware_dir=tmp_path)
def test_check_malformed_manifest(tmp_path):
(tmp_path / "manifest.json").write_text("not valid json{{{")
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_check_wrong_arity_version_no_update(tmp_path):
write_firmware(tmp_path, "1.2") # wrong arity server version
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
# server "1.2" → (0,0,0) ≤ client (1,0,0) → no update
assert result["update"] is False
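
Review note: the wrong-arity test above relies on a malformed version collapsing to (0,0,0) so that it never outranks a real client version. A minimal sketch of that comparison rule follows. `parse_version` and `update_available` are hypothetical names, not the functions in server/ota_endpoint.py, and the strict three-part rule is an assumption inferred from the test comment.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH'; anything else maps to (0, 0, 0)."""
    parts = text.strip().split(".")
    # Assumption: exactly three ASCII-digit components are required.
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return (0, 0, 0)
    return (int(parts[0]), int(parts[1]), int(parts[2]))


def update_available(server_version: str, client_version: str) -> bool:
    """Offer an update only when the server version is strictly newer."""
    return parse_version(server_version) > parse_version(client_version)
```

Comparing integer tuples rather than strings keeps "1.10.0" ahead of "1.9.0", and a malformed server manifest fails closed because (0,0,0) is never greater than a valid client version.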

0
tools/__init__.py Normal file

tools/deploy_firmware.py Normal file

@@ -0,0 +1,43 @@
#!/usr/bin/env python3
"""Sign firmware and stage it for the server OTA endpoint."""
import argparse, hashlib, json
from pathlib import Path

from sign_firmware import sign_firmware


def deploy(firmware_path: Path, key_path: Path,
           version: str, output_dir: Path) -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    data = firmware_path.read_bytes()
    sig = sign_firmware(firmware_path, key_path)
    (output_dir / "current.bin").write_bytes(data)
    (output_dir / "current.sig").write_bytes(sig)
    (output_dir / "manifest.json").write_text(json.dumps({
        "version": version,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }, indent=2))
    print(f"Deployed {firmware_path.name} v{version} -> {output_dir}/")


def main() -> None:
    p = argparse.ArgumentParser(description=__doc__)
    p.add_argument("firmware", help="Path to .bin")
    p.add_argument("version", help="Version string, e.g. 1.2.3")
    p.add_argument("--key", default="secrets/firmware_signing_key.pem")
    p.add_argument("--out-dir", default="server/firmware")
    args = p.parse_args()
    deploy(
        firmware_path=Path(args.firmware),
        key_path=Path(args.key),
        version=args.version,
        output_dir=Path(args.out_dir),
    )


if __name__ == "__main__":
    main()


@@ -15,16 +15,38 @@ Usage:
"""
import argparse
import os
import re
import secrets
import subprocess
import sys
import tempfile
HMAC_SECRET_RE = re.compile(r"^[0-9a-fA-F]{64}$")
NVS_NAMESPACE = "doorcounter"
NVS_PARTITION_OFFSET = "0x9000"
NVS_PARTITION_SIZE = "0x5000" # matches firmware partition table (20KB)
# Characters that would change the field/row structure of the NVS-CSV format
# (key,type,encoding,value). A value containing any of these would either
# split into more fields or add rows, silently provisioning the wrong keys.
_CSV_FORBIDDEN = (",", '"', "\n", "\r")
def _reject_csv_metacharacters(name, value):
    """Exit with an error if value contains a character that would corrupt
    the NVS CSV. Used for operator-supplied strings (device id, location id,
    WiFi credentials)."""
    for c in _CSV_FORBIDDEN:
        if c in value:
            print(
                f"Error: --{name} contains forbidden character {c!r}; "
                f"this would corrupt the NVS partition CSV.",
                file=sys.stderr,
            )
            sys.exit(1)
def build_nvs_csv(device_id, location_id, hmac_secret,
wifi_ssid=None, wifi_pass=None, line_offset=50):
@@ -63,6 +85,10 @@ def main():
args = parser.parse_args()
hmac_secret = args.hmac_secret or secrets.token_hex(32)
if not HMAC_SECRET_RE.match(hmac_secret):
print("Error: --hmac-secret must be exactly 64 hex characters (32 bytes)",
file=sys.stderr)
sys.exit(1)
if args.hmac_secret is None:
print(f"Generated HMAC secret: {hmac_secret}")
print(" *** SAVE THIS — you need it to register the device on the server ***")
@@ -71,6 +97,13 @@ def main():
print("Error: --line-offset must be 0-100", file=sys.stderr)
sys.exit(1)
_reject_csv_metacharacters("device-id", args.device_id)
_reject_csv_metacharacters("location-id", args.location_id)
if args.wifi_ssid is not None:
_reject_csv_metacharacters("wifi-ssid", args.wifi_ssid)
if args.wifi_password is not None:
_reject_csv_metacharacters("wifi-password", args.wifi_password)
with tempfile.TemporaryDirectory() as tmp:
csv_path = os.path.join(tmp, "nvs.csv")
bin_path = os.path.join(tmp, "nvs.bin")

tools/gen_signing_key.py Normal file

@@ -0,0 +1,57 @@
#!/usr/bin/env python3
"""Generate ECDSA P-256 signing keypair for OTA firmware verification."""
import argparse
import os
from pathlib import Path

from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization


def generate(secrets_dir: Path, header_out: Path) -> None:
    secrets_dir.mkdir(parents=True, exist_ok=True)
    key = ec.generate_private_key(ec.SECP256R1())
    pem = key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )
    key_path = secrets_dir / "firmware_signing_key.pem"
    key_path.write_bytes(pem)
    key_path.chmod(0o600)
    pub_bytes = key.public_key().public_bytes(
        encoding=serialization.Encoding.X962,
        format=serialization.PublicFormat.UncompressedPoint,
    )
    assert len(pub_bytes) == 65 and pub_bytes[0] == 0x04
    hex_values = ", ".join(f"0x{b:02x}" for b in pub_bytes)
    header = (
        "#pragma once\n"
        "// Auto-generated by tools/gen_signing_key.py — DO NOT EDIT\n"
        "// ECDSA P-256 public key, uncompressed X9.62 (04 || X || Y)\n"
        f"static const uint8_t kOtaPublicKey[65] = {{{hex_values}}};\n"
    )
    header_out.parent.mkdir(parents=True, exist_ok=True)
    header_out.write_text(header)
    print(f"Private key → {key_path}")
    print(f"Public key header → {header_out}")


def main() -> None:
    p = argparse.ArgumentParser(description=__doc__)
    p.add_argument("--secrets-dir", default="secrets",
                   help="Directory for private key (default: secrets/)")
    p.add_argument("--header-out",
                   default="firmware/lib/ota_updater/ota_pubkey.h",
                   help="Path to write the C header")
    args = p.parse_args()
    generate(Path(args.secrets_dir), Path(args.header_out))


if __name__ == "__main__":
    main()

tools/sign_firmware.py Normal file

@@ -0,0 +1,52 @@
#!/usr/bin/env python3
"""Sign a firmware binary with ECDSA P-256. Outputs a raw 64-byte r||s .sig file."""
import argparse
import sys
from pathlib import Path

from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.utils import decode_dss_signature


def load_private_key(key_path: Path) -> ec.EllipticCurvePrivateKey:
    key = serialization.load_pem_private_key(key_path.read_bytes(), password=None)
    if not isinstance(key, ec.EllipticCurvePrivateKey):
        raise ValueError("Key must be an EC private key")
    if not isinstance(key.curve, ec.SECP256R1):
        raise ValueError(f"Key must use SECP256R1 curve, got {key.curve.name}")
    return key


def sign_firmware(firmware_path: Path, key_path: Path) -> bytes:
    key = load_private_key(key_path)
    data = firmware_path.read_bytes()
    sig_der = key.sign(data, ec.ECDSA(hashes.SHA256()))
    r, s = decode_dss_signature(sig_der)
    # Returns raw 64-byte r‖s (not DER): mbedtls_ecdsa_verify expects this layout
    return r.to_bytes(32, 'big') + s.to_bytes(32, 'big')


def main() -> None:
    p = argparse.ArgumentParser(description=__doc__)
    p.add_argument("firmware", help="Path to firmware .bin")
    p.add_argument("--key", default="secrets/firmware_signing_key.pem",
                   help="Path to PEM private key")
    p.add_argument("--out", help="Output .sig path (default: firmware.bin.sig)")
    args = p.parse_args()
    firmware = Path(args.firmware)
    key_path = Path(args.key)
    out_path = Path(args.out) if args.out else firmware.with_suffix(".bin.sig")
    try:
        sig = sign_firmware(firmware, key_path)
    except (FileNotFoundError, ValueError) as e:
        print(f"Error: {e}", file=sys.stderr)
        raise SystemExit(1)
    out_path.write_bytes(sig)
    print(f"Signed {firmware.name} -> {out_path} ({len(sig)} bytes)")


if __name__ == "__main__":
    main()


@@ -0,0 +1,63 @@
import json, hashlib, sys
from pathlib import Path
import pytest
REPO_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(REPO_ROOT / "tools"))
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization
from deploy_firmware import deploy
@pytest.fixture()
def key_pem(tmp_path):
key = ec.generate_private_key(ec.SECP256R1())
pem_path = tmp_path / "key.pem"
pem_path.write_bytes(key.private_bytes(
serialization.Encoding.PEM,
serialization.PrivateFormat.PKCS8,
serialization.NoEncryption(),
))
return pem_path
def test_deploy_writes_all_artifacts(tmp_path, key_pem):
firmware = tmp_path / "firmware.bin"
firmware.write_bytes(b"fake firmware" * 200)
out_dir = tmp_path / "server_firmware"
deploy(firmware_path=firmware, key_path=key_pem,
version="1.2.3", output_dir=out_dir)
assert (out_dir / "current.bin").exists()
assert (out_dir / "current.sig").exists()
assert (out_dir / "manifest.json").exists()
def test_manifest_contents(tmp_path, key_pem):
data = b"firmware payload"
firmware = tmp_path / "fw.bin"
firmware.write_bytes(data)
out_dir = tmp_path / "out"
deploy(firmware_path=firmware, key_path=key_pem,
version="2.0.1", output_dir=out_dir)
manifest = json.loads((out_dir / "manifest.json").read_text())
assert manifest["version"] == "2.0.1"
assert manifest["size"] == len(data)
assert manifest["sha256"] == hashlib.sha256(data).hexdigest()
def test_signature_is_64_bytes(tmp_path, key_pem):
firmware = tmp_path / "fw.bin"
firmware.write_bytes(b"fw")
out_dir = tmp_path / "out"
deploy(firmware_path=firmware, key_path=key_pem,
version="1.0.0", output_dir=out_dir)
sig = (out_dir / "current.sig").read_bytes()
assert len(sig) == 64
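
Review note: the three artifacts written by deploy() only make sense as a set, so a pre-flight check on the staged directory may be useful before pointing devices at it. The sketch below is an assumption about how such a check could look, using only the file names and manifest fields shown in this diff. `check_staged` is a hypothetical helper and is not part of the PR.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def check_staged(firmware_dir: Path) -> List[str]:
    """Return a list of problems found in a staged OTA directory (empty if consistent)."""
    problems = []
    manifest = json.loads((firmware_dir / "manifest.json").read_text())
    data = (firmware_dir / "current.bin").read_bytes()
    sig = (firmware_dir / "current.sig").read_bytes()

    # The manifest must describe the binary that is actually on disk.
    if manifest.get("size") != len(data):
        problems.append("size mismatch")
    if manifest.get("sha256") != hashlib.sha256(data).hexdigest():
        problems.append("sha256 mismatch")
    # sign_firmware() emits a raw r||s signature, which is always 64 bytes for P-256.
    if len(sig) != 64:
        problems.append("signature is not 64 bytes")
    return problems
```

This only checks internal consistency. It does not verify the signature against the public key, which remains the device's job.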


@@ -0,0 +1,17 @@
import pytest
from tools.flash_device import _reject_csv_metacharacters
def test_clean_value_accepted():
"""A value with no metacharacters should pass without exiting."""
_reject_csv_metacharacters("device-id", "dc-0042")
_reject_csv_metacharacters("location-id", "retailer-123")
_reject_csv_metacharacters("wifi-ssid", "StoreWiFi-2.4GHz")
_reject_csv_metacharacters("wifi-password", "p@ssw0rd!~#$%^&*()_+-=:;<>?/")
@pytest.mark.parametrize("bad", ["Home,Network", 'pa"ss', "ssid\nfoo", "name\rbar"])
def test_metacharacter_rejected(bad):
with pytest.raises(SystemExit):
_reject_csv_metacharacters("wifi-ssid", bad)
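
Review note: to show why the rejected characters matter, the sketch below builds an unquoted row in the key,type,encoding,value layout and reads it back with Python's csv module. `split_nvs_row` is a hypothetical helper and the key name in the usage lines is only an example. The real NVS partition generator may parse rows differently, so treat this as an illustration of the failure mode rather than a reproduction of it.

```python
import csv
import io
from typing import List


def split_nvs_row(key: str, value: str) -> List[List[str]]:
    """Write one unquoted NVS-style CSV row and return the rows a CSV reader sees."""
    line = f"{key},data,string,{value}\n"
    return list(csv.reader(io.StringIO(line)))


clean = split_nvs_row("wifi_ssid", "StoreWiFi")
comma = split_nvs_row("wifi_ssid", "Home,Network")
newline = split_nvs_row("wifi_ssid", "ssid\nfoo")

print(len(clean), len(clean[0]))    # one row, four fields
print(len(comma), len(comma[0]))    # one row, but five fields
print(len(newline))                 # two rows
```

A comma adds a field and a newline adds a row, which matches the rationale in the `_CSV_FORBIDDEN` comment in tools/flash_device.py.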


@@ -0,0 +1,49 @@
import os, subprocess, sys, tempfile
from pathlib import Path
REPO_ROOT = Path(__file__).parent.parent
def run_gen(secrets_dir, header_path):
env = os.environ.copy()
result = subprocess.run(
[sys.executable, str(REPO_ROOT / "tools/gen_signing_key.py"),
"--secrets-dir", str(secrets_dir),
"--header-out", str(header_path)],
capture_output=True, text=True, env=env
)
assert result.returncode == 0, result.stderr
return result
def test_private_key_created():
with tempfile.TemporaryDirectory() as d:
header = Path(d) / "ota_pubkey.h"
run_gen(d, header)
pem = Path(d) / "firmware_signing_key.pem"
assert pem.exists()
content = pem.read_text()
assert "BEGIN PRIVATE KEY" in content
def test_header_created():
with tempfile.TemporaryDirectory() as d:
header = Path(d) / "ota_pubkey.h"
run_gen(d, header)
assert header.exists()
content = header.read_text()
assert "kOtaPublicKey" in content
assert "0x04" in content # uncompressed point prefix
assert "[65]" in content
def test_public_key_is_valid_p256_point():
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization
with tempfile.TemporaryDirectory() as d:
header = Path(d) / "ota_pubkey.h"
run_gen(d, header)
pem = (Path(d) / "firmware_signing_key.pem").read_bytes()
priv = serialization.load_pem_private_key(pem, password=None)
pub_bytes = priv.public_key().public_bytes(
serialization.Encoding.X962,
serialization.PublicFormat.UncompressedPoint,
)
assert len(pub_bytes) == 65
assert pub_bytes[0] == 0x04


@@ -0,0 +1,64 @@
import sys
from pathlib import Path
import pytest
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.utils import decode_dss_signature
REPO_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(REPO_ROOT / "tools"))
from sign_firmware import sign_firmware, load_private_key
@pytest.fixture()
def keypair(tmp_path):
key = ec.generate_private_key(ec.SECP256R1())
pem_path = tmp_path / "key.pem"
pem_path.write_bytes(key.private_bytes(
serialization.Encoding.PEM,
serialization.PrivateFormat.PKCS8,
serialization.NoEncryption(),
))
return key, pem_path
def test_signature_is_64_bytes(keypair, tmp_path):
key, key_path = keypair
firmware = tmp_path / "fw.bin"
firmware.write_bytes(b"fake firmware data" * 100)
sig = sign_firmware(firmware, key_path)
assert len(sig) == 64
def test_signature_verifies(keypair, tmp_path):
key, key_path = keypair
data = b"test firmware payload"
firmware = tmp_path / "fw.bin"
firmware.write_bytes(data)
sig_raw = sign_firmware(firmware, key_path)
# Convert raw r||s back to DER for cryptography lib verify
r = int.from_bytes(sig_raw[:32], 'big')
s = int.from_bytes(sig_raw[32:], 'big')
from cryptography.hazmat.primitives.asymmetric.utils import encode_dss_signature
sig_der = encode_dss_signature(r, s)
key.public_key().verify(sig_der, data, ec.ECDSA(hashes.SHA256()))
def test_wrong_key_fails_verification(keypair, tmp_path):
key, key_path = keypair
firmware = tmp_path / "fw.bin"
firmware.write_bytes(b"firmware")
sig_raw = sign_firmware(firmware, key_path)
other_key = ec.generate_private_key(ec.SECP256R1())
r = int.from_bytes(sig_raw[:32], 'big')
s = int.from_bytes(sig_raw[32:], 'big')
from cryptography.hazmat.primitives.asymmetric.utils import encode_dss_signature
sig_der = encode_dss_signature(r, s)
with pytest.raises(InvalidSignature):
other_key.public_key().verify(sig_der, b"firmware", ec.ECDSA(hashes.SHA256()))
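
Review note: the tests above unpack the raw signature by slicing at byte 32 in two places. A small pair of helpers could keep that layout in one spot. The names `rs_to_raw` and `raw_to_rs` are hypothetical and not part of tools/sign_firmware.py. The sketch uses only the standard library and assumes the fixed 32-byte big-endian encoding of r and s that sign_firmware() produces.

```python
from typing import Tuple


def rs_to_raw(r: int, s: int, size: int = 32) -> bytes:
    """Pack (r, s) as fixed-width big-endian r||s; small values are left-padded with zeros."""
    return r.to_bytes(size, "big") + s.to_bytes(size, "big")


def raw_to_rs(sig: bytes, size: int = 32) -> Tuple[int, int]:
    """Split a raw r||s signature back into integers, rejecting a wrong length."""
    if len(sig) != 2 * size:
        raise ValueError(f"expected {2 * size} bytes, got {len(sig)}")
    return int.from_bytes(sig[:size], "big"), int.from_bytes(sig[size:], "big")
```

Unlike DER, whose length varies with the magnitude of r and s, this layout is always 64 bytes for P-256, which is why the length assertions in the tests are meaningful.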