Compare commits

26 Commits

Author SHA1 Message Date
d2c2d97fb7 feat(ota): harden OTA apply flow + bump firmware to 1.0.1
End-to-end OTA verified on dc-0002 after resolving server-side schema
mismatch (server now emits update/size/sig_b64 alongside existing fields).

Firmware changes:
- Bump FW_VERSION 1.0.0 -> 1.0.1
- Replace log_i/w/e with Serial.printf in ota_updater so output appears
  regardless of CORE_DEBUG_LEVEL (the prior macros were silent in prod)
- Log partition labels/offsets, per-128KB progress, computed sha256,
  HTTP errors with body, esp_ota_* errors by name, Content-Length vs
  expected size
- Check esp_ota_write return value (previously ignored -- silent
  partition corruption on write failure) and abort cleanly on error
- Reject update if expected_size > target partition size
- Serial.flush() + 500ms delay before esp_restart() so the final log
  line escapes the UART
- Boot-time: log running partition label/offset/state + FW_VERSION,
  and call esp_ota_mark_app_valid_cancel_rollback() on PENDING_VERIFY
  to prevent silent rollback after a successful OTA

Docs:
- Rewrite docs/ota-deployment-status.md to reflect resolved state,
  document the schema fix and the .bin/.sig co-deploy invariant
2026-05-14 12:21:52 -07:00
5ec678dfa3 fix: tighten version parsing, propagate HMAC sign failure, add deployment docs
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:26:44 -07:00
5cf122b922 feat(firmware): wire OTA updater into main loop with 6-hour polling task
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:22:29 -07:00
a21dcfa349 feat(firmware): implement OTA download, ECDSA verify, and flash
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:18:44 -07:00
66e6808e13 feat(firmware): implement ECDSA P-256 signature verification in OTA library
Replaces placeholder ota_verify_signature_with_key with real mbedtls
ECDSA verify; adds 4-case native test suite with generated P-256 vectors.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 11:15:52 -07:00
8b1fd10db7 feat(firmware): add OTA updater library skeleton with version comparison
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:59:02 -07:00
f37e0d6b07 feat(tools): add firmware deploy tool (sign + stage for server) 2026-05-11 06:55:44 -07:00
81bcc12f2f fix(server): add error handling for malformed OTA manifest and missing sig file
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:54:26 -07:00
d9a242a5fa feat(server): add OTA check and firmware download endpoints
Implements /ota/check (version comparison + sig_b64 payload) and
/ota/firmware (binary stream) using the same _impl pattern as
camera_endpoint.py. HMAC auth left commented pending main app wiring.
6/6 tests passing.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:52:46 -07:00
87b30a64b2 fix(tools): add key type validation and tighten test assertions in sign_firmware
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:50:51 -07:00
031426e364 feat(tools): add ECDSA P-256 firmware signing tool 2026-05-11 06:49:15 -07:00
437f73739f feat(tools): add ECDSA P-256 key generation tool and public key header
Generates firmware signing keypair; private key stays in gitignored
secrets/, public key written as 65-byte C array to
firmware/lib/ota_updater/ota_pubkey.h for compile-time OTA verification.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 06:47:10 -07:00
21a3c646aa docs(firmware): document FW_VERSION format constraint for OTA version compare 2026-05-11 06:45:53 -07:00
81dc96b100 feat(firmware): add FW_VERSION constant 2026-05-11 06:44:59 -07:00
56fc58b843 fix(tools): reject CSV metacharacters in flash_device.py inputs
device-id, location-id, wifi-ssid, and wifi-password were interpolated
directly into the NVS partition CSV. A value containing comma, double
quote, CR, or LF would split the field/row and silently provision the
wrong NVS keys — easiest concrete failure: a Wi-Fi password containing
a comma. Validate operator-supplied strings before generating the CSV.
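
A minimal sketch of the kind of check described above, assuming a hypothetical helper name (`validate_csv_field`); the actual validation in `flash_device.py` is not shown here and may differ:

```python
# Hypothetical helper illustrating the check; not the real flash_device.py code.
# Characters that would split a field or a row in the generated NVS CSV.
CSV_METACHARACTERS = (",", '"', "\r", "\n")


def validate_csv_field(name: str, value: str) -> str:
    """Return value unchanged, or raise ValueError if it could break a CSV row."""
    for ch in CSV_METACHARACTERS:
        if ch in value:
            raise ValueError(f"{name} contains forbidden character {ch!r}")
    return value
```

Rejecting the value outright, rather than quoting or escaping it, keeps the generated CSV trivially parseable and surfaces the problem to the operator before anything is flashed.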

Add an empty tools/__init__.py so the regression tests can import the
helper as 'tools.flash_device' (matches the existing 'server.*' test
pattern).

Found via adversarial review (run 2026-05-01-192928, gpt-5.5 reviewer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:44:57 -07:00
641ab29277 fix(server): reject inverted period_start/period_end in CameraRecord
A misbehaving or clock-broken device could submit period_end <=
period_start, polluting the camera_records table with zero-length or
inverted windows that corrupt downstream hourly analytics. Add a
Pydantic model_validator so the request is rejected at the API
boundary instead of silently persisting bad ranges.

Found via adversarial review (run 2026-05-01-191359, both reviewers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:44:57 -07:00
8342904488 fix(firmware/lib): wrap-safe millis() comparison in net_guard reconnect timer
net_guard_tick() compared absolute uint32_t millis() values:
  if (millis() < s_next_retry_ms) return;
This is broken across the ~49.7-day millis() wrap: depending on which
side of the wrap each value lands, retries either tight-loop or stall
indefinitely. The device is designed for multi-month uptime, so this
is a real production case, not a theoretical one.

Replace with the standard wrap-safe pattern using a signed difference.
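
To illustrate why the signed difference works, here is a small Python emulation of 32-bit arithmetic. The function names are hypothetical, and the firmware itself is C++, where the usual form of the idiom is `(int32_t)(millis() - deadline) >= 0`:

```python
UINT32_MASK = 0xFFFFFFFF


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= UINT32_MASK
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def deadline_reached(now_ms: int, deadline_ms: int) -> bool:
    """Wrap-safe check: true once now_ms is at or past deadline_ms."""
    return to_int32(now_ms - deadline_ms) >= 0


def naive_deadline_reached(now_ms: int, deadline_ms: int) -> bool:
    """The broken absolute comparison, kept only for contrast."""
    return not (now_ms < deadline_ms)


# A retry scheduled 0x200 ms ahead, shortly before the counter wraps.
now = 0xFFFFFF00
deadline = (now + 0x200) & UINT32_MASK  # wraps around to 0x100
```

With these values, 0x80 ms later (`0xFFFFFF80`) the naive comparison already reports the deadline as reached, while the signed difference is still negative. The signed form is only valid for intervals shorter than 2^31 ms (about 24.8 days), which is ample for a reconnect timer.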

Found via adversarial review (run 2026-05-01-202910, gpt-5.5 reviewer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:36:06 -07:00
ef00afb14e fix(firmware/lib): validate HMAC secret length and hex format before signing
hmac_sign() previously trusted whatever secret_hex came out of NVS:
- Lengths >128 chars overflowed the fixed 64-byte stack buffer in
  hex_to_bytes (out_len was unbounded).
- Non-hex characters were silently decoded to 0 via strtol with no
  end-pointer check, producing signatures under a corrupted key.
- Empty secrets fell through to mbedtls_md_hmac_starts with len=0.

flash_device.py now rejects malformed --hmac-secret at provision time,
but hmac_sign should also refuse to sign under a malformed key regardless
of how it ended up in NVS (legacy provisioning, partial flash, etc.).

Add length, hex-charset, and even-length validation; make hex_to_bytes
return bool and have hmac_sign return empty HString on any failure
(callers already treat empty as failure via post_json_once).

Found via adversarial review (run 2026-05-01-202910, both reviewers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:36:06 -07:00
96ede7c999 chore: gitignore secrets, pycache, and adversarial-review artifacts
Add patterns for *secret* files (e.g. operator-saved HMAC secrets at
repo root), __pycache__/ directories, and .adversarial-review/ run
artifacts so they don't get accidentally committed via 'git add -A'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:21:15 -07:00
e2dbe6a2d5 fix(server): COALESCE diagnostic columns so v1.0 heartbeats don't clear v1.1 data
store_heartbeat_diagnostics() unconditionally SET each diagnostic column
to its parameter, so a v1.0.0 heartbeat (which omits the five v1.1.0
fields and leaves them as None after Pydantic parsing) erased previously
stored diagnostics for that device. Wrap each parameter in
COALESCE(?, column_name) so omitted fields preserve the existing value.
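
The effect of the `COALESCE` wrapper can be reproduced with an in-memory SQLite table. This is a sketch only: the table name, the column names (`free_heap`, `rssi`), and the function name are assumptions for illustration, not the server's real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with two diagnostic columns.
conn.execute(
    "CREATE TABLE devices (device_id TEXT PRIMARY KEY, free_heap INTEGER, rssi INTEGER)"
)
conn.execute("INSERT INTO devices VALUES ('dc-0002', 120000, -61)")


def store_diagnostics(conn, device_id, free_heap=None, rssi=None):
    """Update only the columns whose parameter is not None."""
    conn.execute(
        "UPDATE devices "
        "SET free_heap = COALESCE(?, free_heap), rssi = COALESCE(?, rssi) "
        "WHERE device_id = ?",
        (free_heap, rssi, device_id),
    )


store_diagnostics(conn, "dc-0002")            # old-style heartbeat: fields omitted
store_diagnostics(conn, "dc-0002", rssi=-70)  # partial update
row = conn.execute(
    "SELECT free_heap, rssi FROM devices WHERE device_id = 'dc-0002'"
).fetchone()
```

After both calls `free_heap` keeps its stored value and only `rssi` changes. One trade-off of this pattern is that a client can no longer explicitly reset a column to NULL.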

Found via adversarial review (gpt-5.5 reviewer, run 2026-05-01-191359).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:19:23 -07:00
2226c1b4ca fix(tools): validate flash_device.py HMAC secret format before flashing
--hmac-secret accepted any string and passed it through to NVS, silently
producing a device that cannot authenticate to the server. Reject anything
that isn't exactly 64 hex characters (32 bytes) before generating the NVS
image. Auto-generated secrets are validated too as a defensive check.
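
A minimal sketch of such a format check, assuming a hypothetical function name and assuming that both upper- and lower-case hex digits are accepted (the commit does not say which the tool allows):

```python
import re

# Exactly 64 hex characters, i.e. a 32-byte secret.
HMAC_SECRET_RE = re.compile(r"[0-9a-fA-F]{64}")


def is_valid_hmac_secret(secret: str) -> bool:
    """True only if the whole string is 64 hex characters."""
    return HMAC_SECRET_RE.fullmatch(secret) is not None
```

`fullmatch` is used instead of `match` with a `$` anchor so that a trailing newline, for example from reading the secret out of a file without stripping it, is rejected instead of silently accepted.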

Found via adversarial review (both reviewers, run 2026-05-01-192928).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:19:16 -07:00
a0eee0e6d4 fix(firmware): preserve buffered records appended during flush POST
reporter_flush() snapshotted the buffers under lock, released the lock
to POST, then unconditionally cleared the entire buffer on success.
Records appended by reporter_submit_*() during the in-flight POST were
silently erased. Replace clear() with erase() of just the snapshotted
prefix so concurrent appends survive.
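
The snapshot-then-erase-prefix pattern can be sketched in Python as follows. The class and method names are hypothetical, and the sketch assumes a single flusher with concurrent writers that only append; the firmware itself is C++:

```python
import threading


class RecordBuffer:
    """Append-only buffer whose flush removes only what was actually sent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = []

    def submit(self, record):
        with self._lock:
            self._records.append(record)

    def flush(self, post):
        # Snapshot under the lock, then release it for the slow network call.
        with self._lock:
            snapshot = list(self._records)
        if not snapshot:
            return True
        if not post(snapshot):
            return False  # keep everything for the next attempt
        # Erase only the snapshotted prefix; records appended during the
        # POST sit after it and survive.
        with self._lock:
            del self._records[:len(snapshot)]
        return True

    def pending(self):
        with self._lock:
            return list(self._records)
```

Because writers only append, the snapshot is always a prefix of the live buffer, so deleting the first `len(snapshot)` items removes exactly the records that were posted.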

Found via adversarial review (gpt-5.5 reviewer, run 2026-05-01-190903).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:19:11 -07:00
a585a56cff fix(firmware): upgrade NimBLE to 2.x + DNS fallback for unreliable resolvers
NimBLE-Arduino 1.4.2 had an init/fire race in its FreeRTOS callout porting
layer where os_callout_timer_cb dispatched a queued TimerHandle expiry
against a not-yet-initialized event (NULL fn pointer), causing PC=0
InstrFetchProhibited within ~1s of boot when the camera task starved the
timer service. Confirmed by ets_printf instrumentation. Upgrading to
^2.0.0 rewrites the porting layer and eliminates the race; verified clean
on the customer network for 1+ hour.

Also rolls in DNS-resilience work that surfaced the BLE crash during
provisioning: pin lwIP/esp-netif resolvers to 1.1.1.1/8.8.8.8 across DHCP
renewals, add three-tier resolver fallback in reporter with a hardcoded
IP of last resort, and switch to raw WiFiClient with manual Host header
to bypass HTTPClient's brittle DNS path.

Migration touches for NimBLE 2.x:
- NimBLEAdvertisedDeviceCallbacks -> NimBLEScanCallbacks
- onResult signature now takes const NimBLEAdvertisedDevice*
- setAdvertisedDeviceCallbacks -> setScanCallbacks
- start(0, nullptr, false) -> start(0, false, false)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:34:17 -07:00
461ed7d888 docs(readme): add HMAC secret generation command to operator setup
Step 2 now shows openssl rand -hex 32 (with python and /dev/urandom
fallbacks) and writes to .agent/dc-<id>-secret with chmod 600, so the
flash_device.py example can read $(cat ...) the same way the known-good
dc-0002 command does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:45:08 -07:00
259256a550 docs: retailer packet — setup guide (.docx) + repo QR code
Adds the printed materials shipped with each device:
- retailer-setup-guide.docx — non-technical 1-2 page setup guide
- retailer-setup-guide.py — generator script for the .docx
- doorcounter-repo-qr.png — QR code linking to the public Gitea repo

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:38:22 -07:00
be44299d3e docs(readme): add quick-start, hardware sources, power draw + latency notes
Adds a sourced parts table (M5 TimerCamera-F, USB cable, 5V adapter), the
~750 mW measured power draw, the 3-5s detection latency caveat, and a
six-step Quick Start aimed at semi-technical operators deploying their
own device.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:26:45 -07:00
37 changed files with 2952 additions and 55 deletions

5
.gitignore vendored

@@ -1,6 +1,11 @@
.worktrees/
.agent/
.claude/
.adversarial-review/
graphify-out/
firmware/.pio/
*.log
*secret*
__pycache__/
secrets/
server/firmware/

142
README.md

@@ -2,13 +2,95 @@
Retail door traffic counter using M5Stack TimerCamera-F (ESP32 + OV3660). Counts walker traversals via overhead camera CV, passively scans BLE foot traffic, and reports hourly to `logs.research.bike`.
> **Known limitation — directional accuracy.** This firmware reports counts as `{entries, exits}` for API compatibility, but **per-walk direction labelling is not reliable at the current mount (7' overhead, straight down).** In bench testing, event detection was 100% (8/8 walks detected) while per-walk direction matched the physical walk only ~50% of the time — the centroid trajectories produced by entries and exits were nearly indistinguishable. **The number to trust is gross traffic: `entries + exits` ≈ total walkers through the doorway.** The directional split is an unreliable best-effort heuristic. See [Directional counting](#directional-counting) for why.
> **Known limitations.**
> - **Directional accuracy.** Counts are reported as `{entries, exits}` for API compatibility, but **per-walk direction labelling is not reliable at the current mount (7' overhead, straight down).** Bench testing: event detection 100% (8/8), per-walk direction ~50% (coin flip). **Trust gross traffic: `entries + exits` ≈ total walkers.** See [Directional counting](#directional-counting).
> - **Detection latency.** A walker takes **3-5 seconds** from entering the FOV to being registered as a count, because the state machine waits for the walker to clear the frame (or a 5s timeout) before finalizing. Counts are not instantaneous; hourly aggregation is the intended consumption mode.
## Hardware
- **Device**: M5Stack TimerCamera-F (ESP32-S, OV3660, PSRAM, WiFi/BLE)
- **Mount**: Overhead, camera pointing straight down, centered above doorway
- **Power**: USB (any phone charger)
| Component | Source | Notes |
|-----------|--------|-------|
| **Camera** | [M5Stack TimerCamera-F (OV3660 fisheye, PSRAM)](https://shop.m5stack.com/products/esp32-psram-timer-camera-fisheye-ov3660) | ESP32 + WiFi/BLE on board |
| **USB cable** | [USB-A → USB-C, right-angle](https://www.amazon.com/dp/B0DWMPVP4F) | Right-angle plug helps with overhead mounts |
| **Power supply** | [5V USB wall adapter](https://www.amazon.com/dp/B0B2WLSY9D) | Any 5V/1A+ USB charger works |
- **Mount**: Overhead, camera pointing straight down, centered above doorway (~7' / 2.1m height)
- **Power draw**: **~750 mW measured at the wall** (camera + WiFi + BLE all active). Runs cool — fanless, can be sealed in a small enclosure. Annual energy cost at US residential rates is well under $1.
## Quick Start (semi-technical)
The fastest path from "box arrived" to "counts in the dashboard." Comfortable with a terminal but not necessarily an embedded developer? Start here.
**You will need**: the camera + cable + power supply listed above, a Linux/macOS computer with USB, and ~20 minutes.
### 1. Install the toolchain (one-time)
```bash
# Python 3.10+ and pip
pip install --user platformio esptool esp-idf-nvs-partition-gen
```
PlatformIO installs the ESP32 compiler on first build — expect a few minutes the first time.
### 2. Clone this repo
```bash
git clone https://github.com/<your-org>/DoorCounter.git
cd DoorCounter
```
### 3. Plug the camera in
Connect the USB-C cable to the TimerCamera and the other end to your computer. On Linux it appears as `/dev/ttyUSB0`; on macOS as `/dev/tty.usbserial-*`. If you don't see it, install [CP210x USB drivers](https://www.silabs.com/developer-tools/usb-to-uart-bridge-vcp-drivers).
### 4. Flash the firmware
```bash
cd firmware
pio run -t upload --upload-port /dev/ttyUSB0
```
### 5. Provision the device with its credentials
Pick a unique device ID (e.g. `dc-0001`), a location ID, and generate a 32-byte HMAC secret. The server admin must record this same secret — counts won't be accepted without it.
```bash
# Generate a fresh secret
openssl rand -hex 32 > my-device-secret.txt
# Provision
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0001 \
--location-id my-store \
--hmac-secret "$(cat my-device-secret.txt)" \
--wifi-ssid "MyStoreWiFi" \
--wifi-password "wifi-password-here"
```
> If you skip `--wifi-ssid`/`--wifi-password`, the device opens a `DoorCounter-Setup` WiFi access point on boot. Connect a phone to it and enter the credentials in the captive portal.
### 6. Mount the device
1. Position above the doorway, camera lens pointing straight down (~7' / 2.1m up).
2. Plug into the wall adapter — that's it. The LED turns red while joining WiFi, then off once it's counting.
3. First heartbeat lands at the server within ~60 seconds; first hourly count batch arrives at the top of the next hour.
### What "working" looks like
- LED behavior: **off** = counting normally · **red** = no WiFi · **yellow** = uploading · **brief flash** when a walker is registered (1 flash = entry, 2 flashes = exit).
- A walker takes 3-5 seconds from entering the FOV to triggering the LED flash; this is normal.
- Hourly uploads to `logs.research.bike` (or your configured server) include the entry/exit counts since the last report.
### If something is off
| Symptom | Try |
|---------|-----|
| Red LED stays on | Wrong WiFi password — re-run step 5, or use the `DoorCounter-Setup` captive portal. |
| LED blinks ~1 Hz forever (or device reboots in a loop) | NVS got wiped — re-run step 5 with the same credentials. |
| No counts appearing on server | Run `python tools/serial_monitor.py --port /dev/ttyUSB0 --reset --timestamp --seconds 30` and watch for `[CV] entry/exit` lines as you walk under it. |
For deeper troubleshooting see [Troubleshooting](#troubleshooting) and [Operator Setup](#operator-setup).
## Firmware
@@ -111,22 +193,58 @@ pio run -t upload --upload-port /dev/ttyUSB0
### 2. Provision device identity
Generate a fresh 32-byte HMAC secret (64 hex chars) and stash it where you
won't lose it — the server must store the same value or counts will be
rejected:
```bash
# Generate and save (one device per file; never commit these)
mkdir -p .agent
openssl rand -hex 32 > .agent/dc-0042-secret
chmod 600 .agent/dc-0042-secret
```
> No `openssl`? Equivalents:
> - `python3 -c 'import secrets; print(secrets.token_hex(32))'`
> - `head -c 32 /dev/urandom | xxd -p -c 64`
Then provision:
```bash
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0042 \
--location-id retailer-123 \
--hmac-secret <32-byte-hex> \
--hmac-secret "$(cat .agent/dc-0042-secret)" \
--wifi-ssid "StoreWiFi" \
--wifi-password "secret"
```
WiFi credentials are optional — if omitted, device starts captive portal on boot.
**Known-good command for dc-0002** (dev device at research.bike):
```bash
python tools/flash_device.py \
--port /dev/ttyUSB0 \
--device-id dc-0002 \
--location-id retailer-123 \
--hmac-secret "$(cat .agent/dc-0002-secret)" \
--wifi-ssid Elly-Fi \
--wifi-password <ask> \
--line-offset 50
```
Secret is stored in `.agent/dc-0002-secret` (gitignored). Server must already
know this secret — do not rotate without updating the server side.
> **Re-provision after firmware uploads.** Flashing firmware via
> `pio run -t upload` may clear the NVS partition on this board. If the device
> boots into a ~1 Hz LED blink (the "not provisioned" fatal state) after a
> firmware update, re-run `flash_device.py` with the same credentials. See
> `pio run -t upload` may clear the NVS partition on this board.
> - **FW 1.0**: device boots into a ~1 Hz LED blink (hang in "not provisioned" fatal).
> - **FW 1.1+**: device reboot-loops with `FATAL: device_id/location_id/hmac_secret not provisioned`
> followed by `rst:0xc (SW_CPU_RESET)` (FATAL paths now reboot instead of hang).
>
> Either way, re-run `flash_device.py` with the same credentials. See
> [Troubleshooting](#troubleshooting).
### 3. OTA updates
@@ -188,7 +306,7 @@ DoorCounter/
| Symptom | Likely cause | Remedy |
|---------|--------------|--------|
| ~1 Hz LED blink after boot, no serial beyond `esp_core_dump_flash: No core dump partition found!` | NVS missing `device_id` / `location_id` / `hmac_secret`. Commonly triggered by a firmware upload wiping NVS. | Re-run `flash_device.py` with the device's known credentials. |
| ~1 Hz LED blink after boot (FW 1.0), OR reboot loop with `FATAL: device_id/location_id/hmac_secret not provisioned` followed by `rst:0xc (SW_CPU_RESET)` (FW 1.1+) | NVS missing `device_id` / `location_id` / `hmac_secret`. Commonly triggered by a firmware upload wiping NVS. FW 1.1+ reboots on FATAL instead of hanging. | Re-run `flash_device.py` with the device's known credentials (see section 2 for dc-0002). |
| Device stays on `DoorCounter-Setup` AP instead of joining customer WiFi | SSID/password in NVS wrong, or network out of range. | Connect phone to `DoorCounter-Setup` → captive portal → re-enter WiFi. Or reflash NVS with correct `--wifi-ssid` / `--wifi-password`. |
| No entries/exits counted for a known-walking doorway | WiFi captive portal still up (camera task starts only after connect); or camera blocked/unfocused. | Check LED: solid on = booting/uploading, off = counting. Run `serial_monitor.py` to see `[CV] entry/exit` log lines. |
@@ -228,6 +346,12 @@ flash any device.
cd firmware && pio run -e timercam -t upload
```
> **If the device reboot-loops after flashing** with `FATAL:
> device_id/location_id/hmac_secret not provisioned`, NVS was wiped. Re-run
> `flash_device.py` (see [section 2](#2-provision-device-identity)). FW 1.1
> turned the old FW 1.0 LED-blink hang into an explicit reboot loop; same
> root cause, same fix.
### Expected first boot
On the serial log (115200 baud), the device prints the boot banner, then

Binary file not shown.



@@ -0,0 +1,88 @@
# OTA Deployment — Status
## Current state (2026-05-14)
**End-to-end OTA verified working on `dc-0002`.** Device polled `engagement-api-1`, received a signed manifest, downloaded and verified firmware 1.0.1, set the alternate boot partition, rebooted, and came up reporting `fw=1.0.1`.
## What's deployed
- **Branch `feat/pull-ota-code-signing`** merged to `main` (13 commits, 17 new files, 936 LOC).
- **Signing toolchain**: `tools/gen_signing_key.py`, `tools/sign_firmware.py`, `tools/deploy_firmware.py`.
- **Firmware OTA library**: `firmware/lib/ota_updater/`.
- **Signing key**: `secrets/firmware_signing_key.pem` (gitignored). Public key committed at `firmware/lib/ota_updater/ota_pubkey.h`.
- **Live OTA handler**: served by `engagement-api-1` Docker service (source not in this repo). The stub at `server/ota_endpoint.py` is unwired and not the one responding to devices.
- **Configurable poll interval** via NVS key `ota_interval`. Provision with `flash_device.py --ota-interval-seconds N`. Min 10 s, default 21600 (6 h).
## Issues resolved
### 1. HMAC format mismatch (resolved 2026-05-13)
Firmware OTA updater was using `X-HMAC-Signature` header + `millis()`-derived timestamp; the reporter component used `X-Signature` + `time(nullptr)`. Server expected the reporter format. Fixed by aligning the OTA updater to the same canonical scheme as the reporter (`firmware/lib/ota_updater/ota_updater.cpp` `add_hmac_headers`).
### 2. `/ota/check` JSON schema mismatch (resolved 2026-05-14)
Server was emitting `{update_available, sha256, url}` but firmware reads `{update, size, sig_b64}`. Device silently decided "up to date" every poll because `doc["update"]` defaulted to `false`. Fixed server-side: the `/ota/check` response now also includes the fields the firmware needs. Firmware schema remains the source of truth.
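
The failure mode is easy to reproduce outside the firmware. The sketch below uses placeholder values and hypothetical function names; it mimics a reader that treats a missing `update` key as `false`, and adds a stricter check that reports missing fields instead:

```python
import json

# Field names come from the description above; the values are placeholders.
old_response = json.loads(
    '{"update_available": true, "sha256": "<hex>", "url": "<url>"}'
)
new_response = json.loads(
    '{"update_available": true, "sha256": "<hex>", "url": "<url>",'
    ' "update": true, "size": 1048576, "sig_b64": "<base64>"}'
)

REQUIRED_FIELDS = ("update", "size", "sig_b64")


def firmware_sees_update(doc: dict) -> bool:
    """Mimic a reader that treats a missing 'update' key as false."""
    return bool(doc.get("update", False))


def missing_fields(doc: dict) -> list:
    """Stricter alternative: report which required fields are absent."""
    return [name for name in REQUIRED_FIELDS if name not in doc]
```

With the old response, `firmware_sees_update` quietly returns `False` even though the server is offering an update, whereas `missing_fields` names all three absent keys. Logging a schema error in that case would have made the mismatch visible on the first poll.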
### 3. Signed firmware artifact pipeline (resolved 2026-05-14)
Deploy flow now bumps `FW_VERSION` → builds → copies `.pio/build/timercam/firmware.bin` to `firmware-<version>.bin` → signs with `tools/sign_firmware.py` → SCPs both `.bin` and `.bin.sig` to `root@nginx:/root/engagement-api/firmware/`. Server team updates `firmware_releases.sha256` to match the new binary.
**Gotcha:** the `.bin` and `.sig` must always be deployed together. The signature is over the bytes; replacing one without the other puts the server in an inconsistent state and devices will reject the update with `SIGNATURE INVALID`.
## Hardening added this session
### Firmware logging (`firmware/lib/ota_updater/ota_updater.cpp`, `firmware/src/main.cpp`)
The previous `log_i/w/e` macros were silenced by the default `CORE_DEBUG_LEVEL`. Replaced with `Serial.printf` so output appears regardless of log level. Now logs at every step:
- `[OTA] task started, interval=N ms`
- Per-tick WiFi status
- Full check URL + HMAC header preview (device id, ts, sig prefix)
- HTTP response code + error body on non-200
- JSON parse errors
- "Up to date" decision
- Partition labels and offsets (running + target)
- Per-128 KB download progress
- Total bytes + elapsed ms
- Computed sha256 of the downloaded image (compare against server `X-SHA256`)
- Signature verify result
- `esp_ota_end` / `esp_ota_set_boot_partition` errors by name
- `Serial.flush()` plus a 500 ms `delay()` before `esp_restart()` so the final log line escapes the UART
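
To cross-check the logged sha256 against the artifact on the build host, a host-side helper along these lines could be used. This is a sketch, not part of the repo's tooling; the function name and the 128 KB chunk size are assumptions:

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 128 * 1024) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large images do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result for the deployed `firmware-<version>.bin` should equal both the `sha256(image)` value in the device log and the server's `X-SHA256`. A mismatch suggests that the `.bin` on the server is not the file that was signed.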
### Boot-time partition state (`firmware/src/main.cpp`)
Logs `running partition '<label>' (off=0x…) state=N fw=…` at every boot. If `state == ESP_OTA_IMG_PENDING_VERIFY` (3), calls `esp_ota_mark_app_valid_cancel_rollback()` to prevent the bootloader from reverting on the next reboot. Harmless no-op when rollback isn't enabled, but eliminates a class of silent OTA failures.
### `esp_ota_write` return value (`firmware/lib/ota_updater/ota_updater.cpp`)
The return value was previously ignored, so a failed write would silently corrupt the new partition and the device would still try to boot from it. The firmware now checks it, aborts the OTA cleanly on error, and logs the failing offset.
### Partition size pre-check
Reject the update before `esp_ota_begin` if `expected_size > target->size`.
## Verifying a deployment
After a server push, watch the device's serial output on the next OTA tick:
```
[OTA] tick: WiFi connected, running check
[OTA] check → GET http://logs.research.bike:80/ota/check?version=X.Y.Z
[OTA] check response: HTTP 200
[OTA] Update: X.Y.Z → A.B.C (N bytes)
[OTA] running='app0' (off=…), target='app1' (off=…)
[OTA] progress: N/N bytes
[OTA] sha256(image)=<hex> ← must match server X-SHA256
[OTA] signature OK
[OTA] boot partition set to 'app1' — rebooting in 500 ms
```
Then on reboot:
```
[BOOT] running partition 'app1' (off=…) state=N fw=A.B.C
```
The `fw=A.B.C` line is the success signal — it reflects the `FW_VERSION` macro baked into the freshly-booted image, not just what the device claims to be running.
## Quick reference
- Plan: `docs/superpowers/plans/2026-05-10-pull-ota-code-signing.md`
- Firmware version: `firmware/include/version.h`
- OTA library: `firmware/lib/ota_updater/`
- HMAC implementation: `firmware/lib/hmac/hmac.cpp`
- Provisioning tool: `tools/flash_device.py`
- Signing tools: `tools/gen_signing_key.py`, `tools/sign_firmware.py`, `tools/deploy_firmware.py`
- Server deploy path: `root@nginx:/root/engagement-api/firmware/` (per server team runbook)

Binary file not shown.


@@ -0,0 +1,133 @@
from docx import Document
from docx.shared import Pt, Inches, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
for section in doc.sections:
    section.top_margin = Inches(0.6)
    section.bottom_margin = Inches(0.6)
    section.left_margin = Inches(0.8)
    section.right_margin = Inches(0.8)
style = doc.styles['Normal']
style.font.name = 'Calibri'
style.font.size = Pt(11)
def heading(text, size=18, color=(0x1F, 0x3A, 0x5F), space_before=6, space_after=4):
    p = doc.add_paragraph()
    p.paragraph_format.space_before = Pt(space_before)
    p.paragraph_format.space_after = Pt(space_after)
    run = p.add_run(text)
    run.bold = True
    run.font.size = Pt(size)
    run.font.color.rgb = RGBColor(*color)
    return p

def subheading(text):
    return heading(text, size=13, color=(0x1F, 0x3A, 0x5F), space_before=8, space_after=2)

def body(text, bold_lead=None):
    p = doc.add_paragraph()
    p.paragraph_format.space_after = Pt(4)
    if bold_lead:
        r = p.add_run(bold_lead)
        r.bold = True
        p.add_run(text)
    else:
        p.add_run(text)
    return p

def bullet(text, bold_lead=None):
    p = doc.add_paragraph(style='List Bullet')
    p.paragraph_format.space_after = Pt(2)
    if bold_lead:
        r = p.add_run(bold_lead)
        r.bold = True
        p.add_run(text)
    else:
        p.add_run(text)
    return p
# ---------- Title ----------
title = doc.add_paragraph()
title.alignment = WD_ALIGN_PARAGRAPH.CENTER
tr = title.add_run('DoorCounter')
tr.bold = True
tr.font.size = Pt(28)
tr.font.color.rgb = RGBColor(0x1F, 0x3A, 0x5F)
sub = doc.add_paragraph()
sub.alignment = WD_ALIGN_PARAGRAPH.CENTER
sr = sub.add_run('A simple, private way to count visitors to your store')
sr.italic = True
sr.font.size = Pt(13)
sr.font.color.rgb = RGBColor(0x55, 0x55, 0x55)
sub.paragraph_format.space_after = Pt(10)
# ---------- What it is ----------
heading('What is in the box?', size=14)
bullet('A small camera (about the size of a matchbox)', bold_lead='Camera — ')
bullet('A USB cable to power it', bold_lead='Cable — ')
bullet('A small wall plug', bold_lead='Power adapter — ')
body('That\'s it. There is nothing to install on your computer or phone, no software to log into, and no monthly fee.')
# ---------- What it does ----------
heading('What does it do?', size=14)
body('The camera mounts above your front door, pointing straight down at the floor. Whenever someone walks underneath, it counts them. Once an hour, it sends the count to us so we can share visitor traffic reports with you.')
p = doc.add_paragraph()
p.paragraph_format.space_after = Pt(4)
r = p.add_run('Your privacy is protected. ')
r.bold = True
p.add_run('The camera looks straight down at the top of people\'s heads — it cannot see faces. No video or photos are ever saved or sent anywhere. Only the count of how many people walked through.')
# ---------- Setup ----------
heading('How do I set it up?', size=14)
body('The whole process takes about 5 minutes. You will need a stepladder and your store\'s WiFi password.')
subheading('Step 1 — Mount the camera above your door')
body('Use the included double-sided tape (or a screw, if you prefer) to stick the camera to the ceiling, directly above where people walk through your front door. The lens should point straight down at the floor. Aim for roughly 7 feet (about 2 meters) above the floor — most ceilings work fine.')
subheading('Step 2 — Plug it in')
body('Connect the USB cable to the camera and to the wall plug. Plug the wall plug into any standard outlet. The camera will turn on automatically — you will see a small red light.')
subheading('Step 3 — Connect it to your WiFi')
body('Take out your phone and open its WiFi settings. You will see a new network called "DoorCounter-Setup". Connect to it. Your phone will automatically open a setup page — enter your store\'s WiFi name and password, then tap Save.')
body('After about 30 seconds, the red light on the camera will turn off. That means it is connected and counting. You are done!', bold_lead='')
# ---------- Day to day ----------
heading('What do I do day-to-day?', size=14)
body('Nothing. The camera works on its own, 24 hours a day. It uses about as much electricity as a nightlight (less than $1 per year), runs cool, and never needs to be touched.')
p = doc.add_paragraph()
p.paragraph_format.space_after = Pt(4)
r = p.add_run('A small light blinks each time someone walks through. ')
r.bold = True
p.add_run('You may notice the count happens 3-5 seconds after the person passes; that is normal.')
# ---------- Troubleshooting ----------
heading('If something seems wrong', size=14)
bullet('your WiFi password is probably wrong, or the WiFi network is out of range. Reconnect your phone to "DoorCounter-Setup" and re-enter the password.', bold_lead='Red light stays on — ')
bullet('unplug it for 10 seconds and plug it back in.', bold_lead='No light at all — ')
bullet('please contact us using the information below.', bold_lead='Anything else — ')
# ---------- Contact ----------
heading('Questions?', size=14)
body('We are happy to help. Reach out anytime:')
bullet('peter@research.bike', bold_lead='Email: ')
bullet('https://git.research.bike/Bicycle_Market_Research/DoorCounter', bold_lead='Project page: ')
footer = doc.add_paragraph()
footer.alignment = WD_ALIGN_PARAGRAPH.CENTER
fr = footer.add_run('Thank you for participating in our retail traffic study.')
fr.italic = True
fr.font.size = Pt(10)
fr.font.color.rgb = RGBColor(0x77, 0x77, 0x77)
footer.paragraph_format.space_before = Pt(12)
import sys
out = sys.argv[1] if len(sys.argv) > 1 else 'retailer-setup-guide.docx'
doc.save(out)
print(f"wrote {out}")


@@ -0,0 +1,189 @@
# BLE / NimBLE Timer-Callout Crash — Handoff
**Date opened:** 2026-05-01
**Status:** Resolved 2026-05-01 by upgrading `h2zero/NimBLE-Arduino` from `^1.4.2` to `^2.0.0` (`firmware/platformio.ini:24`). BLE scanning re-enabled via `BLE_SCANNING_ENABLED 1` (`firmware/src/main.cpp:30`). Verified clean on customer network for 1+ hour with no panics.
**Goal:** Re-enable BLE scanning without the device crashing within ~1s of boot.
**Confirmed root cause:** Instrumented `os_callout_timer_cb` with `ets_printf` and observed the very first callout fire on the direct-call path with both `evq=NULL` and `fn=NULL`, while the same `co` address later (post-init) showed valid `evq` and `fn`. Same callout struct reused — classic NimBLE 1.x callout init/fire race where the FreeRTOS `TimerHandle_t` had a queued expiry against a not-yet-initialized event. NimBLE 2.x rewrote the porting layer; the race is gone.
**Migration touches (NimBLE 1.x → 2.x):**
- `NimBLEAdvertisedDeviceCallbacks` → `NimBLEScanCallbacks`
- `onResult(NimBLEAdvertisedDevice*)` → `onResult(const NimBLEAdvertisedDevice*)`
- `setAdvertisedDeviceCallbacks(cb, true)` → `setScanCallbacks(cb, true)`
- `start(0, nullptr, false)` → `start(0, false, false)` (signature: `duration, isContinue, restart`)
BLE was working before today's customer-site provisioning trip. The crash is reliably reproducible on the current build at the customer location whenever `BLE_SCANNING_ENABLED` is set back to `1`. It may or may not reproduce on a quieter network — the camera task's CPU-starvation pattern is shared, but the crash window's exact trigger is unconfirmed.
---
## Symptom
Within ~1s of boot, after several `cam_hal: EV-VSYNC-OVF` lines from the camera driver:
```
Guru Meditation Error: Core 0 panic'ed (InstrFetchProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x00000000 PS : 0x00060630 A0 : 0x8009a9af A1 : 0x3ffbd6e0
A2 : 0x3fff1ef8 A3 : 0x00000001 ...
A8 : 0x800f2ebc ...
EXCCAUSE: 0x00000014 EXCVADDR: 0x00000000
Backtrace: 0xfffffffd:0x3ffbd6e0 0x4009a9ac:0x3ffbd700
```
Decoded with `~/.platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-addr2line -e .pio/build/timercam/firmware.elf -pfiC 0x4009a9ac 0x400f2ebc`:
```
prvProcessReceivedCommands at freertos/timers.c:852
(inlined by) prvTimerTask at freertos/timers.c:600
os_callout_timer_cb at NimBLE-Arduino/.../npl_os_freertos.c:1742
```
`PC=0` + `EXCCAUSE=0x14` (InstrFetchProhibited) = jump-to-NULL. The FreeRTOS timer-service task is dispatching a NimBLE callout whose callback function pointer is NULL.
The relevant NimBLE source:
```c
// firmware/.pio/libdeps/timercam/NimBLE-Arduino/src/nimble/porting/npl/freertos/src/npl_os_freertos.c:1729-1742
static void
os_callout_timer_cb(TimerHandle_t timer)
{
struct ble_npl_callout *co;
co = pvTimerGetTimerID(timer);
assert(co);
if (co->evq) {
ble_npl_eventq_put(co->evq, &co->ev);
} else {
co->ev.fn(&co->ev); // <-- co->ev.fn is NULL
}
}
```
Either `co->ev.fn` is genuinely NULL on the direct-call path, OR — given the addr2line frame is a few lines off and the callsite is ambiguous — the FreeRTOS timer's own callback pointer (`pxTimer->pxCallbackFunction`) is NULL inside `prvProcessReceivedCommands`. Both indicate a callout/timer being freed or zeroed while the FreeRTOS timer service still has a command queued for it.
---
## Environment
- Board: M5Stack TimerCam-F (ESP32-D0WDQ6-V3, dual-core 240 MHz, 4MB flash).
- BLE library: `h2zero/NimBLE-Arduino@^1.4.2` (`firmware/platformio.ini`). 1.4.2 is end-of-life on the 1.x branch; 2.x exists with breaking API changes.
- Camera: OV3660 via `esp32-camera` driver, 96×96 grayscale @ 5 FPS.
- BLE scan: passive, low-overhead, hash-collected by `firmware/src/ble_scanner.cpp`.
- Tasks: `task_camera` (core 1, prio 2, 8KB stack), `task_reporter` (core 0, prio 1, 8KB stack), Arduino loop (default).
- The camera task triggers `cam_hal: EV-VSYNC-OVF` whenever frame capture overlaps another long operation — this consistently precedes the crash in logs.
---
## What's been ruled out
1. **DNS / network code** — entirely unrelated. DNS path works in production via the new fallback-IP machinery (`firmware/src/reporter.cpp` `resolve_api_ip` and `firmware/src/reporter.h` `REPORTER_API_FALLBACK_IP`). Do not regress this; it shipped with reports working at the customer site.
2. **Our BLE app code** — the backtrace stays inside the FreeRTOS timer service and NimBLE's own porting layer; nothing in `ble_scanner.cpp` is on the call stack. The bug is in vendored NimBLE.
3. **Memory corruption from our side**: `A2 = 0x3fff1ef8` is a normal heap address, no obvious overrun pattern. Heap is healthy at the time (we'd see a different fault otherwise).
4. **Stack overflow** — A1 = 0x3ffbd6e0 is well within the FreeRTOS timer-service task's stack range; no canary smash log.
---
## What changed today
| File | Change | Keep? |
|---|---|---|
| `firmware/src/main.cpp` | Added `BLE_SCANNING_ENABLED 0` gate; all `ble_scanner_*` callsites compile out; `BLEHourlyRecord` zero-stubbed when off | Keep until crash fixed; flip to `1` to reproduce |
| `firmware/src/main.cpp` | Removed verbose `[F]`/`[CV] spawn` per-frame logging; kept entry/exit + heartbeat | Keep |
| `firmware/src/ble_scanner.cpp` | Removed `[BLE] new device:` per-discovery log | Keep |
| `firmware/src/reporter.{h,cpp}` | DNS resolution with fallback IP, raw `WiFiClient` HTTP, manual `Host:` header | Keep — production fix |
| `firmware/lib/net_guard/net_guard.{h,cpp}` | DNS pin to 1.1.1.1/8.8.8.8 at lwIP + esp-netif layers; `net_guard_dump_dns` diagnostic | Keep |
---
## Reproduction
1. `cd firmware && pio run -e timercam`.
2. Edit `firmware/src/main.cpp`, set `#define BLE_SCANNING_ENABLED 1`. Rebuild.
3. Flash a TimerCam: `python tools/flash_device.py --port /dev/ttyUSB0 --device-id dc-XXXX --location-id <loc> --hmac-secret <secret> --wifi-ssid "<ssid>" --wifi-password "<pw>"`.
4. `pio device monitor --port /dev/ttyUSB0 --baud 115200`.
5. Wait ≤30s. Expect the `Guru Meditation Error: Core 0 panic'ed (InstrFetchProhibited)` traceback above.
Crash is **deterministic** on the customer's network (Elly-Fi). Worth retesting on a quiet desk network — if it doesn't repro there, the trigger is camera-task starvation interacting with NimBLE timers, not a pure NimBLE bug.
To decode any future crash backtrace:
```sh
~/.platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-addr2line \
-e firmware/.pio/build/timercam/firmware.elf -pfiC <addr1> <addr2> ...
```
---
## Investigation paths, in order of effort/confidence
### 1. Confirm the failing call site (cheap, do this first)
The addr2line line numbers can be off by ±3 due to inlining. Add a temporary `ets_printf` patch to `os_callout_timer_cb` in `npl_os_freertos.c` (a C file, so `Serial.printf` is not available there) to log `co`, `co->evq`, and `co->ev.fn` on entry. Reproduce. That settles whether `co->ev.fn` is NULL on the direct-call path or whether this is a FreeRTOS-level issue (a queued command for a deleted timer).
If `evq != NULL` and we still crash, the NULL is in the queued event dispatcher (a different code path; pivot the investigation).
### 2. Try upgrading NimBLE-Arduino to 2.x (medium effort, likely-fix)
`platformio.ini` has `h2zero/NimBLE-Arduino@^1.4.2`. 2.x rewrote the porting layer significantly. Breaking API changes — `NimBLEAdvertisedDeviceCallbacks` was renamed/restructured. Touch points: `firmware/src/ble_scanner.cpp` (the only file that uses NimBLE).
Try: pin `^2.0.0`, fix the API breakage in `ble_scanner.cpp` (it's <100 lines). If 2.x crashes too, the issue is independent of NimBLE version → pivot to (3) or (4).
### 3. Reduce camera-task starvation (cheap, may be sufficient)
The `EV-VSYNC-OVF` lines are the canary. The camera task pins core 1 at priority 2 doing CV processing every 200ms. NimBLE host task runs on core 0 by default but the FreeRTOS timer service task is core-agnostic and may be starved during long CV passes that hold a mutex.
Things to try in `firmware/src/main.cpp`:
- Lower `CAM_FPS` from 5 to 3, see if VSYNC-OVF still appears.
- Move CV processing off the capture path (capture into a queue, process at lower priority).
- Raise FreeRTOS timer-service task priority via `configTIMER_TASK_PRIORITY` (sdkconfig).
- Confirm NimBLE host task pinning — `CONFIG_BT_NIMBLE_PINNED_TO_CORE` should be 0 or 1 (not unpinned).
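
The second bullet (capture into a queue, process at lower priority) can be pictured as a small bounded queue that never blocks the producer. The sketch below is host-side C++ with hypothetical names (`Frame`, `FrameQueue`); it is not code from this repository, and on the device the same shape would more likely be built on a FreeRTOS queue. The 96×96 frame size is taken from the Environment section.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical frame type: 96x96 grayscale, matching the capture size above.
struct Frame {
    uint32_t seq = 0;
    std::array<uint8_t, 96 * 96> pixels{};
};

class FrameQueue {
public:
    explicit FrameQueue(size_t capacity) : capacity_(capacity) {}

    // Producer side (capture task): never blocks. When the queue is full the
    // oldest frame is discarded so capture keeps pace with the sensor.
    void push(const Frame& f) {
        std::lock_guard<std::mutex> lock(mu_);
        if (capacity_ == 0) { ++dropped_; return; }
        if (q_.size() == capacity_) {
            q_.pop_front();
            ++dropped_;
        }
        q_.push_back(f);
    }

    // Consumer side (lower-priority CV task): returns false when empty.
    bool pop(Frame& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }

    // Number of frames discarded because the consumer fell behind.
    size_t dropped() const {
        std::lock_guard<std::mutex> lock(mu_);
        return dropped_;
    }

private:
    mutable std::mutex mu_;
    std::deque<Frame> q_;
    size_t capacity_;
    size_t dropped_ = 0;
};
```

With this shape the capture task returns to the sensor immediately after `push`, and `dropped()` gives a direct measure of how far CV processing lags behind capture.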
### 4. Local NULL-guard patch (last resort, masks the bug)
If upgrade is blocked and starvation reduction isn't enough, patch the vendored source:
```c
// npl_os_freertos.c:1740
} else {
if (co->ev.fn) co->ev.fn(&co->ev);
}
```
This silences the crash but silently discards the event that would have been dispatched. Those events are likely scan-result deliveries, so we'd undercount BLE devices but not crash. Acceptable as a stopgap with a `// TODO: remove when NimBLE upgraded` and a note in this doc.
Caveat: vendored library files in `.pio/libdeps/` get blown away by clean builds. Either copy NimBLE into `firmware/lib/` and pin it (vendored), or use `lib_archive` + a post-install script. Don't ship a build that depends on an unpinned hand-edit.
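
If the stopgap were ever applied, counting the skipped dispatches would at least make the undercount visible. A minimal host-side sketch follows. `Event`, `Callout`, `dispatch_guarded`, and `g_dropped_callouts` are simplified, hypothetical stand-ins and are not NimBLE's real types or symbols.

```cpp
#include <cstdint>

// Simplified stand-ins for NimBLE's event and callout structs (assumption:
// only the fields relevant to the crash are modelled here).
struct Event {
    void (*fn)(Event*) = nullptr;
    void* arg = nullptr;
};

struct Callout {
    Event ev;
    void* evq = nullptr;  // non-null means "deliver through an event queue"
};

// Count of callouts skipped because both evq and fn were NULL.
static uint32_t g_dropped_callouts = 0;

enum class Dispatch { Queued, Called, Dropped };

Dispatch dispatch_guarded(Callout* co) {
    if (co->evq) {
        return Dispatch::Queued;      // real code would enqueue &co->ev here
    }
    if (co->ev.fn) {
        co->ev.fn(&co->ev);           // direct-call path with a valid callback
        return Dispatch::Called;
    }
    ++g_dropped_callouts;             // would otherwise have been a jump to NULL
    return Dispatch::Dropped;
}
```

Logging the counter alongside the hourly BLE record would show whether the guard fires once at boot or continuously, which bears on how much undercounting to expect.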
### 5. Replace BLE stack (high effort)
If 2.x also crashes and starvation reduction doesn't help, switch to the IDF-native bluedroid stack via the Arduino-ESP32 `BLEDevice` API. Larger memory footprint (~30KB more heap) but a different lifecycle model — won't share NimBLE's bug.
---
## Constraints / things not to break
- `firmware/src/reporter.cpp` DNS path with `REPORTER_API_FALLBACK_IP` — production fix, must keep working. Do not regress to `HTTPClient`.
- `BLE_SCANNING_ENABLED 0` was the **shipping default** while this crash was open, and devices flashed during that window run with BLE off. Since the 2.x upgrade the default is `1` (`firmware/src/main.cpp:30`).
- `firmware/lib/net_guard/net_guard.cpp` `net_guard_pin_dns()` is called both at boot and on every WiFi reconnect; if reorganizing net_guard, preserve both call sites.
- The `ble_scanner` module supports `ble_scanner_pause`/`resume` for OTA — verify it still works after any NimBLE upgrade (`ArduinoOTA.onStart` hook in `main.cpp:248`).
---
## Open questions
- Does the crash repro on a quiet network with no `EV-VSYNC-OVF`? (Determines whether starvation is necessary vs sufficient.)
- Was BLE working in a previous build, and on which NimBLE version? The earliest BLE-related commit is well before today; a binary search across firmware commits with BLE enabled would identify the regression boundary if the bug is in our code.
- Does the customer site have an unusual RF environment (very dense BLE) that increases the callout-churn rate, making the race more likely? Worth a `nimble_scan_event` count log during a 60s capture window.
---
## Quick verification once you think it's fixed
1. Set `BLE_SCANNING_ENABLED 1`, rebuild, flash.
2. Run for at least 10 minutes on the customer network — the original crash hit within ~1s, so 10 min with no panic is strong evidence.
3. Confirm a successful hourly cycle: `[CV] entry/exit`, then `[HTTP] POST .../events/batch ... -> 200`, BLE record with non-zero `unique_devices`.
4. Run a second device side-by-side; confirm no cross-device interference.
When done, set `BLE_SCANNING_ENABLED 1` as the default and remove the gate (keep the comment block as institutional memory of the bug).


@@ -0,0 +1,3 @@
#pragma once
// Format: MAJOR.MINOR.PATCH (SemVer) — OTA version compare uses sscanf("%d.%d.%d")
#define FW_VERSION "1.0.1"
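
For illustration, here is a host-side sketch of the comparison that the comment refers to. It mirrors `ota_version_newer` from `ota_updater.cpp` in this change set, renamed `version_newer` to make clear it is a standalone copy, and shows why the three fields are compared as integers rather than as strings.

```cpp
#include <cstdio>

// Returns true only when `remote` parses as MAJOR.MINOR.PATCH and is strictly
// newer than `current`. Input that does not yield three integers returns false.
bool version_newer(const char* current, const char* remote) {
    int ca = 0, cb = 0, cc = 0;
    int ra = 0, rb = 0, rc = 0;
    if (std::sscanf(current, "%d.%d.%d", &ca, &cb, &cc) != 3) return false;
    if (std::sscanf(remote,  "%d.%d.%d", &ra, &rb, &rc) != 3) return false;
    if (ra != ca) return ra > ca;   // major decides first
    if (rb != cb) return rb > cb;   // then minor
    return rc > cc;                 // then patch; equal versions are not newer
}
```

A string comparison would rank `1.0.9` above `1.0.10`; the integer comparison does not. Note that `sscanf` stops after the third integer, so a suffix such as `1.0.2-rc1` is accepted and compared as `1.0.2`.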


@@ -14,12 +14,21 @@ static HString bytes_to_hex(const uint8_t* bytes, size_t len) {
return out;
}
-static void hex_to_bytes(const HString& hex, uint8_t* out, size_t out_len) {
-    if (hex.length() % 2 != 0) return; // malformed — odd-length hex
-    for (size_t i = 0; i < out_len && (i * 2 + 1) < hex.length(); i++) {
-        char byte_str[3] = {hex[i*2], hex[i*2+1], 0};
+static bool is_hex_char(char c) {
+    return (c >= '0' && c <= '9') ||
+           (c >= 'a' && c <= 'f') ||
+           (c >= 'A' && c <= 'F');
+}
+static bool hex_to_bytes(const HString& hex, uint8_t* out, size_t out_len) {
+    if (hex.length() != out_len * 2) return false;
+    for (size_t i = 0; i < out_len; i++) {
+        char a = hex[i*2], b = hex[i*2+1];
+        if (!is_hex_char(a) || !is_hex_char(b)) return false;
+        char byte_str[3] = {a, b, 0};
         out[i] = (uint8_t)strtol(byte_str, nullptr, 16);
     }
+    return true;
 }
static bool sha256(const uint8_t* data, size_t len, uint8_t out[32]) {
@@ -52,10 +61,20 @@ HString hmac_sign(const HString& secret_hex,
snprintf(ts_buf, sizeof(ts_buf), "%u", (unsigned)timestamp);
HString message = method + "\n" + path + "\n" + ts_buf + "\n" + body_hash_hex;
-    // 3. Decode secret from hex
+    // 3. Decode secret from hex. Reject empty / odd-length / oversized /
+    // non-hex inputs — flash_device.py validates at provision time, but
+    // hmac_sign refuses to sign under a malformed key regardless of how it
+    // ended up in NVS (legacy provisioning, NVS corruption, etc.).
+    if (secret_hex.length() == 0 ||
+        secret_hex.length() > 128 ||
+        secret_hex.length() % 2 != 0) {
+        return HString{};
+    }
     size_t secret_len = secret_hex.length() / 2;
     uint8_t secret[64] = {};
-    hex_to_bytes(secret_hex, secret, secret_len);
+    if (!hex_to_bytes(secret_hex, secret, secret_len)) {
+        return HString{};
+    }
// 4. HMAC-SHA256(secret, message)
uint8_t hmac_result[32];

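The string that gets signed is assembled from four newline-separated fields, as the context lines of the hunk above show. A standalone sketch of just that step, using `std::string` in place of `HString`. The function name `canonical_message` and the 16-byte timestamp buffer are assumptions made for this example.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Builds METHOD "\n" PATH "\n" TIMESTAMP "\n" BODY_HASH_HEX, the message that
// is then fed to HMAC-SHA256. The server must rebuild the identical string.
std::string canonical_message(const std::string& method,
                              const std::string& path,
                              uint32_t timestamp,
                              const std::string& body_hash_hex) {
    char ts_buf[16];  // a uint32_t needs at most 10 decimal digits plus NUL
    std::snprintf(ts_buf, sizeof(ts_buf), "%u", (unsigned)timestamp);
    return method + "\n" + path + "\n" + ts_buf + "\n" + body_hash_hex;
}
```

Because the OTA check signs the path including its query string (`/ota/check?version=...`), the server side has to include the query string when it rebuilds this message.
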

@@ -9,8 +9,66 @@ uint32_t net_guard_next_backoff_ms(uint32_t attempt) {
#ifdef ARDUINO
#include "config.h"
#include <WiFi.h>
#include <Arduino.h>
#include <lwip/dns.h>
#include <esp_netif.h>
#include "event_log.h"
// Both lwIP's ip_addr_t and esp-netif's esp_ip_addr_t share the same in-memory
// layout for IPv4, but the C++ types differ. Take the raw u32 to sidestep it.
static String fmt_v4(uint32_t addr_be) {
if (addr_be == 0) return String("0.0.0.0");
char b[16];
snprintf(b, sizeof(b), "%u.%u.%u.%u",
(unsigned)((addr_be >> 0) & 0xFF),
(unsigned)((addr_be >> 8) & 0xFF),
(unsigned)((addr_be >> 16) & 0xFF),
(unsigned)((addr_be >> 24) & 0xFF));
return String(b);
}
void net_guard_dump_dns(const char* tag) {
const ip_addr_t* d0 = dns_getserver(0);
const ip_addr_t* d1 = dns_getserver(1);
Serial.printf("[DNS] %s lwip: %s , %s\n", tag,
fmt_v4(d0 ? ip_2_ip4(d0)->addr : 0).c_str(),
fmt_v4(d1 ? ip_2_ip4(d1)->addr : 0).c_str());
esp_netif_t* sta = esp_netif_get_handle_from_ifkey("WIFI_STA_DEF");
if (sta) {
esp_netif_dns_info_t main_dns{}, backup_dns{};
esp_netif_get_dns_info(sta, ESP_NETIF_DNS_MAIN, &main_dns);
esp_netif_get_dns_info(sta, ESP_NETIF_DNS_BACKUP, &backup_dns);
Serial.printf("[DNS] %s netif: %s , %s\n", tag,
fmt_v4(main_dns.ip.u_addr.ip4.addr).c_str(),
fmt_v4(backup_dns.ip.u_addr.ip4.addr).c_str());
} else {
Serial.printf("[DNS] %s netif: <no STA handle>\n", tag);
}
}
void net_guard_pin_dns() {
ip_addr_t d1, d2;
IP_ADDR4(&d1, 1, 1, 1, 1);
IP_ADDR4(&d2, 8, 8, 8, 8);
dns_setserver(0, &d1);
dns_setserver(1, &d2);
// Also push through the esp_netif layer. dns_setserver() writes the
// global lwIP table directly; esp_netif_set_dns_info() is what the
// DHCP client itself calls, so writing here prevents the next DHCP
// event from silently overwriting our pin.
esp_netif_t* sta = esp_netif_get_handle_from_ifkey("WIFI_STA_DEF");
if (sta) {
esp_netif_dns_info_t info{};
IP_ADDR4(&info.ip, 1, 1, 1, 1);
esp_netif_set_dns_info(sta, ESP_NETIF_DNS_MAIN, &info);
IP_ADDR4(&info.ip, 8, 8, 8, 8);
esp_netif_set_dns_info(sta, ESP_NETIF_DNS_BACKUP, &info);
}
net_guard_dump_dns("pinned");
}
// Shared with the WiFi event task. 32-bit aligned loads/stores are atomic on
// Xtensa; volatile suffices. Tick re-evaluates every loop iteration, so stale
// reads self-correct within ~200ms.
@@ -23,6 +81,11 @@ static volatile uint32_t s_next_retry_ms = 0;
static void on_wifi_event(WiFiEvent_t event, WiFiEventInfo_t info) {
switch (event) {
case ARDUINO_EVENT_WIFI_STA_GOT_IP:
// Override DHCP-supplied DNS. Some routers return TC=1 for short
// answers (forcing TCP fallback that lwIP can't follow), or hand
// out an unreachable resolver. Pin to public resolvers so
// hostByName() never depends on the local network's DNS quality.
net_guard_pin_dns();
s_up = true;
s_attempts = 0;
s_next_retry_ms = 0;
@@ -63,7 +126,11 @@ extern "C" void net_guard_tick() {
}
if (s_up || s_cfg == nullptr) return;
-    if (millis() < s_next_retry_ms) return;
+    // Wrap-safe: signed difference handles the ~49.7-day millis() wrap. The
+    // device is meant to run for months between reboots, so absolute compare
+    // (millis() < s_next_retry_ms) would either tight-loop retries across the
+    // wrap or stall them until millis() climbed back past an old high mark.
+    if ((int32_t)(millis() - s_next_retry_ms) < 0) return;
if (s_up) return; // re-check after the timing gate — closes GOT_IP-vs-tick race
s_attempts++;
// WiFi.begin() alone re-associates cleanly; a prior WiFi.disconnect() call

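The wrap-safe comparison in the last hunk is easier to trust with concrete numbers. The sketch below isolates it as a pure function next to the absolute comparison it replaces. Both function names are hypothetical, and the values are passed in rather than read from `millis()`.

```cpp
#include <cstdint>

// True while `deadline_ms` is still in the future relative to `now_ms`.
// Valid as long as the two are less than 2^31 ms (about 24.8 days) apart.
bool deadline_pending(uint32_t now_ms, uint32_t deadline_ms) {
    return static_cast<int32_t>(now_ms - deadline_ms) < 0;
}

// The absolute comparison that the change replaces, kept for contrast.
bool deadline_pending_naive(uint32_t now_ms, uint32_t deadline_ms) {
    return now_ms < deadline_ms;
}
```

Take `now = 0xFFFFFFF0` with a deadline 32 ms later, which wraps to `0x00000010`: the signed difference is -32, so the retry correctly waits, while the naive form sees `now > deadline` and fires at once. In the opposite case (`now = 0x00000010`, deadline `0xFFFFFFF0` already passed) the signed difference is +32 and the retry proceeds, while the naive form would stall until the counter climbed back past the old deadline.
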

@@ -21,4 +21,13 @@ uint8_t net_guard_last_disconnect_reason();
// Non-blocking tick called from loop(); kicks reconnect if due.
extern "C" void net_guard_tick();
// Override DHCP-supplied DNS with public resolvers (1.1.1.1, 8.8.8.8).
// Idempotent; safe to call repeatedly. net_guard re-applies on every GOT_IP,
// but main.cpp must call it once for the boot association (which completes
// before net_guard_start() registers its event handler).
void net_guard_pin_dns();
// Diagnostic: print current DNS table state from both lwIP and esp_netif.
void net_guard_dump_dns(const char* tag);
#endif
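
A note on the output of `net_guard_dump_dns`: `fmt_v4` in `net_guard.cpp` prints the lowest byte first because lwIP keeps IPv4 addresses in network byte order, and on the little-endian ESP32 the first octet therefore sits in the low 8 bits of the `uint32_t`. A host-side sketch of the same formatting, with `std::string` standing in for Arduino `String`:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats an IPv4 address whose first octet is stored in the low 8 bits,
// which is how a network-order address reads on a little-endian CPU.
std::string fmt_v4(uint32_t addr) {
    char b[16];  // "255.255.255.255" plus NUL
    std::snprintf(b, sizeof(b), "%u.%u.%u.%u",
                  (unsigned)((addr >> 0) & 0xFF),
                  (unsigned)((addr >> 8) & 0xFF),
                  (unsigned)((addr >> 16) & 0xFF),
                  (unsigned)((addr >> 24) & 0xFF));
    return std::string(b);
}
```

For example, `127.0.0.1` arrives as the value `0x0100007F`, so a formatter that started from the high byte would print it reversed.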


@@ -0,0 +1,6 @@
{
"name": "ota_updater",
"build": {
"flags": ["-I$PROJECT_INCLUDE_DIR"]
}
}


@@ -0,0 +1,4 @@
#pragma once
// Auto-generated by tools/gen_signing_key.py — DO NOT EDIT
// ECDSA P-256 public key, uncompressed X9.62 (04 || X || Y)
static const uint8_t kOtaPublicKey[65] = {0x04, 0x1c, 0x92, 0x43, 0x23, 0xe9, 0xac, 0xd1, 0xe8, 0x05, 0x32, 0x49, 0x39, 0x12, 0x95, 0xb2, 0x0a, 0x3e, 0xfb, 0x9d, 0xdf, 0xee, 0xd1, 0x98, 0x87, 0x97, 0xa3, 0xb8, 0xcb, 0x2b, 0xa6, 0x06, 0xe0, 0x83, 0x32, 0x71, 0xd2, 0x5f, 0x80, 0x40, 0x68, 0xcd, 0x00, 0xe5, 0x0e, 0xba, 0x13, 0xf6, 0x97, 0x43, 0x6f, 0xe6, 0x4f, 0xd0, 0x95, 0x53, 0x0e, 0xd7, 0x9a, 0x8a, 0x2e, 0x25, 0x52, 0xb4, 0xaf};


@@ -0,0 +1,319 @@
// firmware/lib/ota_updater/ota_updater.cpp
#include "ota_updater.h"
#include <stdio.h>
#include <string.h>
#include <mbedtls/ecdsa.h>
#include <mbedtls/ecp.h>
#include <mbedtls/bignum.h>
// ── version comparison ─────────────────────────────────────────────────────
bool ota_version_newer(const char* current, const char* remote) {
int ca = 0, cb = 0, cc = 0;
int ra = 0, rb = 0, rc = 0;
if (sscanf(current, "%d.%d.%d", &ca, &cb, &cc) != 3) return false;
if (sscanf(remote, "%d.%d.%d", &ra, &rb, &rc) != 3) return false;
if (ra != ca) return ra > ca;
if (rb != cb) return rb > cb;
return rc > cc;
}
// ── signature verification ─────────────────────────────────────────────────
bool ota_verify_signature_with_key(const uint8_t hash32[32], const uint8_t sig64[64],
const uint8_t pubkey65[65]) {
mbedtls_ecp_group grp;
mbedtls_ecp_point Q;
mbedtls_mpi r, s;
mbedtls_ecp_group_init(&grp);
mbedtls_ecp_point_init(&Q);
mbedtls_mpi_init(&r);
mbedtls_mpi_init(&s);
bool ok = false;
if (mbedtls_ecp_group_load(&grp, MBEDTLS_ECP_DP_SECP256R1) == 0 &&
mbedtls_ecp_point_read_binary(&grp, &Q, pubkey65, 65) == 0 &&
mbedtls_mpi_read_binary(&r, sig64, 32) == 0 &&
mbedtls_mpi_read_binary(&s, sig64 + 32, 32) == 0 &&
mbedtls_ecdsa_verify(&grp, hash32, 32, &Q, &r, &s) == 0) {
ok = true;
}
mbedtls_ecp_group_free(&grp);
mbedtls_ecp_point_free(&Q);
mbedtls_mpi_free(&r);
mbedtls_mpi_free(&s);
return ok;
}
// ── device-only code ───────────────────────────────────────────────────────
#ifndef NATIVE_TEST
#include <Arduino.h>
#include <time.h>
#include <HTTPClient.h>
#include <WiFi.h>
#include <ArduinoJson.h>
#include <esp_ota_ops.h>
#include <mbedtls/sha256.h>
#include <mbedtls/base64.h>
#include "hmac.h"
#include "ota_pubkey.h"
#include "version.h"
bool ota_verify_signature(const uint8_t hash32[32], const uint8_t sig64[64]) {
return ota_verify_signature_with_key(hash32, sig64, kOtaPublicKey);
}
static const char* s_server_base = nullptr;
static const char* s_device_id = nullptr;
static const char* s_hmac_secret = nullptr;
static uint32_t s_interval_ms = 21600000UL; // 6 h default
static uint32_t s_last_check_ms = 0;
void ota_updater_init(const char* server_base, const char* device_id,
const char* hmac_secret, uint32_t check_interval_ms) {
s_server_base = server_base;
s_device_id = device_id;
s_hmac_secret = hmac_secret;
s_interval_ms = check_interval_ms;
s_last_check_ms = 0; // force first check on next call
}
static bool add_hmac_headers(HTTPClient& http, const char* method, const char* path) {
uint32_t ts = (uint32_t)time(nullptr);
if (ts < 1700000000UL) {
Serial.printf("[OTA] Clock not synced (ts=%u) — skipping HMAC sign\n", (unsigned)ts);
return false;
}
String sig = hmac_sign(s_hmac_secret, method, path, ts, "");
if (sig.isEmpty()) {
Serial.println("[OTA] HMAC sign failed");
return false;
}
Serial.printf("[OTA] HMAC headers: device=%s ts=%u sig=%s...\n",
s_device_id, (unsigned)ts, sig.substring(0, 12).c_str());
http.addHeader("X-Device-Id", s_device_id);
http.addHeader("X-Timestamp", String(ts));
http.addHeader("X-Signature", sig);
return true;
}
static bool download_and_flash(const char* fw_url, size_t expected_size,
const uint8_t sig64[64]) {
const esp_partition_t* running = esp_ota_get_running_partition();
const esp_partition_t* target = esp_ota_get_next_update_partition(nullptr);
if (!target) {
Serial.println("[OTA] No update partition found");
return false;
}
Serial.printf("[OTA] running='%s' (off=0x%x sz=0x%x), target='%s' (off=0x%x sz=0x%x)\n",
running ? running->label : "?",
running ? (unsigned)running->address : 0,
running ? (unsigned)running->size : 0,
target->label,
(unsigned)target->address, (unsigned)target->size);
if (expected_size > target->size) {
Serial.printf("[OTA] image (%zu) larger than partition (%u)\n",
expected_size, (unsigned)target->size);
return false;
}
esp_ota_handle_t handle;
esp_err_t er = esp_ota_begin(target, OTA_WITH_SEQUENTIAL_WRITES, &handle);
if (er != ESP_OK) {
Serial.printf("[OTA] esp_ota_begin failed: %s\n", esp_err_to_name(er));
return false;
}
mbedtls_sha256_context sha_ctx;
mbedtls_sha256_init(&sha_ctx);
mbedtls_sha256_starts(&sha_ctx, 0);
HTTPClient http;
http.begin(fw_url);
http.setTimeout(30000);
if (!add_hmac_headers(http, "GET", "/ota/firmware")) {
Serial.println("[OTA] Aborting firmware download: HMAC sign failed");
mbedtls_sha256_free(&sha_ctx);
esp_ota_abort(handle);
return false;
}
Serial.printf("[OTA] downloading firmware: %s\n", fw_url);
int code = http.GET();
Serial.printf("[OTA] firmware response: HTTP %d\n", code);
if (code != HTTP_CODE_OK) {
String body = http.getString();
Serial.printf("[OTA] error body: %s\n", body.c_str());
http.end();
mbedtls_sha256_free(&sha_ctx);
esp_ota_abort(handle);
return false;
}
int content_len = http.getSize();
Serial.printf("[OTA] Content-Length: %d (expected %zu)\n",
content_len, expected_size);
WiFiClient* stream = http.getStreamPtr();
uint8_t buf[4096];
size_t written = 0;
size_t last_log_at = 0;
bool write_failed = false;
uint32_t start_ms = millis();
while (written < expected_size) {
size_t want = min((size_t)sizeof(buf), expected_size - written);
int got = stream->readBytes(buf, want);
if (got <= 0) {
Serial.printf("[OTA] stream ended at %zu/%zu bytes (readBytes=%d)\n",
written, expected_size, got);
break;
}
esp_err_t we = esp_ota_write(handle, buf, (size_t)got);
if (we != ESP_OK) {
Serial.printf("[OTA] esp_ota_write failed at offset %zu: %s\n",
written, esp_err_to_name(we));
write_failed = true;
break;
}
mbedtls_sha256_update(&sha_ctx, buf, (size_t)got);
written += (size_t)got;
if (written - last_log_at >= 131072 || written == expected_size) {
Serial.printf("[OTA] progress: %zu/%zu bytes\n", written, expected_size);
last_log_at = written;
}
}
uint32_t elapsed_ms = millis() - start_ms;
http.end();
Serial.printf("[OTA] download done: %zu bytes in %u ms\n",
written, (unsigned)elapsed_ms);
uint8_t hash[32];
mbedtls_sha256_finish(&sha_ctx, hash);
mbedtls_sha256_free(&sha_ctx);
char hex[65];
for (int i = 0; i < 32; i++) snprintf(hex + i*2, 3, "%02x", hash[i]);
Serial.printf("[OTA] sha256(image)=%s\n", hex);
if (write_failed) {
esp_ota_abort(handle);
return false;
}
if (written != expected_size) {
Serial.printf("[OTA] Download truncated (%zu/%zu bytes)\n", written, expected_size);
esp_ota_abort(handle);
return false;
}
if (!ota_verify_signature_with_key(hash, sig64, kOtaPublicKey)) {
Serial.println("[OTA] SIGNATURE INVALID — staying on current firmware");
esp_ota_abort(handle);
return false;
}
Serial.println("[OTA] signature OK");
esp_err_t end_err = esp_ota_end(handle);
if (end_err != ESP_OK) {
Serial.printf("[OTA] esp_ota_end failed: %s\n", esp_err_to_name(end_err));
return false;
}
esp_err_t boot_err = esp_ota_set_boot_partition(target);
if (boot_err != ESP_OK) {
Serial.printf("[OTA] esp_ota_set_boot_partition failed: %s\n",
esp_err_to_name(boot_err));
return false;
}
Serial.printf("[OTA] boot partition set to '%s' — rebooting in 500 ms\n",
target->label);
Serial.flush();
delay(500);
esp_restart();
return true; // unreachable
}
bool ota_updater_check_and_apply() {
if (!s_server_base || !s_device_id || !s_hmac_secret) {
Serial.println("[OTA] check skipped: updater not initialized");
return false;
}
if (s_last_check_ms != 0 &&
(uint32_t)(millis() - s_last_check_ms) < s_interval_ms) {
return false;
}
s_last_check_ms = millis();
if (WiFi.status() != WL_CONNECTED) {
Serial.printf("[OTA] check skipped: WiFi not connected (status=%d)\n",
WiFi.status());
return false;
}
char check_path[128];
snprintf(check_path, sizeof(check_path), "/ota/check?version=%s", FW_VERSION);
char check_url[256];
snprintf(check_url, sizeof(check_url), "%s%s", s_server_base, check_path);
Serial.printf("[OTA] check → GET %s (fw=%s)\n", check_url, FW_VERSION);
HTTPClient http;
if (!http.begin(check_url)) {
Serial.println("[OTA] http.begin() failed");
return false;
}
if (!add_hmac_headers(http, "GET", check_path)) {
Serial.println("[OTA] Aborting check: HMAC sign failed");
http.end();
return false;
}
int code = http.GET();
Serial.printf("[OTA] check response: HTTP %d\n", code);
if (code != HTTP_CODE_OK) {
String body = http.getString();
Serial.printf("[OTA] error body: %s\n", body.c_str());
http.end();
return false;
}
JsonDocument doc;
DeserializationError err = deserializeJson(doc, http.getStream());
http.end();
if (err) {
Serial.printf("[OTA] JSON parse error: %s\n", err.c_str());
return false;
}
if (!doc["update"].as<bool>()) {
Serial.printf("[OTA] Firmware up to date (%s)\n", FW_VERSION);
return false;
}
const char* remote_ver = doc["version"] | "";
size_t fw_size = doc["size"] | 0;
const char* sig_b64 = doc["sig_b64"] | "";
    // Serial output rather than log_e/log_i: the log macros are silent unless
    // CORE_DEBUG_LEVEL is raised, which it is not in production builds.
    if (fw_size == 0 || strlen(sig_b64) == 0) {
        Serial.println("[OTA] Invalid update manifest");
        return false;
    }
    Serial.printf("[OTA] Update: %s → %s (%zu bytes)\n", FW_VERSION, remote_ver, fw_size);
    uint8_t sig64[64];
    size_t sig_len = 0;
    if (mbedtls_base64_decode(sig64, sizeof(sig64), &sig_len,
                              (const uint8_t*)sig_b64, strlen(sig_b64)) != 0 ||
        sig_len != 64) {
        Serial.printf("[OTA] Bad signature encoding (len=%zu)\n", sig_len);
        return false;
    }
char fw_url[256];
snprintf(fw_url, sizeof(fw_url), "%s/ota/firmware", s_server_base);
return download_and_flash(fw_url, fw_size, sig64);
}
#endif // NATIVE_TEST


@@ -0,0 +1,27 @@
// firmware/lib/ota_updater/ota_updater.h
#pragma once
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
// One-time init. Call from setup() after WiFi is ready.
// server_base: e.g. "http://logs.research.bike:8000"
// check_interval_ms: milliseconds between polls (e.g. 6*3600*1000 = 21600000)
void ota_updater_init(const char* server_base,
const char* device_id,
const char* hmac_secret,
uint32_t check_interval_ms);
// Polls server; downloads, verifies, and flashes if newer version available.
// Returns false if no update was applied. A successful update reboots the device
// from inside this call, so the `true` return is never observed by the caller.
// Safe to call from any task; blocks during download.
bool ota_updater_check_and_apply();
// Exposed for unit testing.
// True only if `remote` is a strictly newer MAJOR.MINOR.PATCH than `current`.
bool ota_version_newer(const char* current, const char* remote);
// Pass an arbitrary 65-byte uncompressed P-256 pubkey.
bool ota_verify_signature_with_key(const uint8_t hash32[32], const uint8_t sig64[64],
                                   const uint8_t pubkey65[65]);
// Production wrapper — uses the compiled-in kOtaPublicKey from ota_pubkey.h.
// Not callable from native tests (requires ota_pubkey.h / device build).
bool ota_verify_signature(const uint8_t hash32[32], const uint8_t sig64[64]);


@@ -21,7 +21,7 @@ upload_flags = --no-stub
lib_deps =
tzapu/WiFiManager@^2.0.17
bblanchon/ArduinoJson@^7.0.0
-    h2zero/NimBLE-Arduino@^1.4.2
+    h2zero/NimBLE-Arduino@^2.0.0
espressif/esp32-camera
; Frame-capture build. Strips WiFi/BLE/CV/reporter; streams raw 96x96 frames


@@ -42,8 +42,8 @@ static String sha256_prefix(const String& input) {
return hex;
}
-class ScanCallback : public NimBLEAdvertisedDeviceCallbacks {
-    void onResult(NimBLEAdvertisedDevice* dev) override {
+class ScanCallback : public NimBLEScanCallbacks {
+    void onResult(const NimBLEAdvertisedDevice* dev) override {
String mac = String(dev->getAddress().toString().c_str());
String hash = sha256_prefix(mac);
int rssi = dev->getRSSI();
@@ -51,7 +51,6 @@ class ScanCallback : public NimBLEAdvertisedDeviceCallbacks {
std::lock_guard<std::mutex> lock(s_mutex);
auto it = s_seen.find(hash);
if (it == s_seen.end()) {
-            Serial.printf("[BLE] new device: %s (rssi %d)\n", hash.c_str(), rssi);
s_seen[hash] = {rssi, 1};
} else {
it->second.rssi_sum += rssi;
@@ -68,16 +67,16 @@ static NimBLEScan* s_scan = nullptr;
void ble_scanner_start() {
NimBLEDevice::init("");
s_scan = NimBLEDevice::getScan();
-    s_scan->setAdvertisedDeviceCallbacks(&s_callback, true); // true = allow duplicates
+    s_scan->setScanCallbacks(&s_callback, true); // true = allow duplicates
     s_scan->setActiveScan(false); // passive
     s_scan->setInterval(100);
     s_scan->setWindow(99);
     s_scan->setMaxResults(0); // don't store results — callback-only
-    s_scan->start(0, nullptr, false); // 0 = continuous
+    s_scan->start(0, false, false); // duration=0 (forever), isContinue=false, restart=false
 }
 void ble_scanner_pause() { if (s_scan) s_scan->stop(); }
-void ble_scanner_resume() { if (s_scan) s_scan->start(0, nullptr, false); }
+void ble_scanner_resume() { if (s_scan) s_scan->start(0, false, false); }
void ble_scanner_deinit() {
if (s_scan) s_scan->stop();


@@ -10,8 +10,11 @@
#include "reporter.h"
#include "event_log.h"
#include "net_guard.h"
#include "version.h"
#include "ota_updater.h"
#include <esp_system.h>
#include <esp_task_wdt.h>
#include <esp_ota_ops.h>
// LED on GPIO2 (TimerCamera-F built-in LED) — verify against board schematic
// Factory reset: hold GPIO37 (BOOT button) for 5 seconds
@@ -19,6 +22,15 @@
#define BUTTON_PIN 37
#define FACTORY_RESET_HOLD_MS 5000
// Historical note: NimBLE-Arduino 1.4.2 had an init/fire race in its FreeRTOS
// callout porting layer that caused a NULL-fn dispatch (PC=0,
// InstrFetchProhibited) within ~1s of boot when the camera task starved the
// timer service. Fixed by upgrading to 2.x (see platformio.ini).
#define BLE_SCANNING_ENABLED 1
#define CAM_FPS 5
#define CAM_INTERVAL_MS (1000 / CAM_FPS)
#define REPORT_INTERVAL_S 3600
@@ -67,16 +79,7 @@ static void task_camera(void*) {
if (camera_capture_96(frame)) {
if (xSemaphoreTake(s_cv_mutex, pdMS_TO_TICKS(100)) == pdTRUE) {
CVResult r = cv_process(g_cv, frame, g_cfg.line_offset);
-                for (const auto& t : g_cv.tracks) {
-                    if (t.id > last_logged_track_id) {
-                        last_logged_track_id = t.id;
-                        Serial.printf("[CV] spawn id=%d y=%.1f\n", t.id, t.spawn_y);
-                    }
-                }
-                if (r.fg_count > 0) {
-                    Serial.printf("[F] n=%d y=%d..%d c=%.1f\n",
-                                  r.fg_count, r.fg_min_y, r.fg_max_y, r.fg_centroid_y);
-                }
+                (void)last_logged_track_id;
if (r.entries_delta) Serial.printf("[CV] entry +%d (total %d) first=%.1f min=%.1f max=%.1f last=%.1f dur=%d\n",
r.entries_delta, g_cv.entries,
r.fire_first_c, r.fire_min_c, r.fire_max_c, r.fire_last_c, r.fire_duration);
@@ -93,6 +96,22 @@ static void task_camera(void*) {
}
}
static void ota_task(void*) {
// Min 10s to avoid pathological fast loops if NVS is corrupted
uint32_t interval_ms = g_cfg.ota_interval_s < 10 ? 10000UL : g_cfg.ota_interval_s * 1000UL;
Serial.printf("[OTA] task started, interval=%u ms\n", (unsigned)interval_ms);
for (;;) {
if (WiFi.isConnected()) {
Serial.println("[OTA] tick: WiFi connected, running check");
ota_updater_check_and_apply();
} else {
Serial.printf("[OTA] tick: WiFi not connected (status=%d), skipping\n",
WiFi.status());
}
vTaskDelay(pdMS_TO_TICKS(interval_ms));
}
}
// Hourly reporter task — runs on core 0
static void task_reporter(void*) {
uint32_t last_report_ts = 0; // 0 = not initialized yet
@@ -119,7 +138,9 @@ static void task_reporter(void*) {
last_report_ts = now;
// Deinit BLE to free ~25KB heap for SSL handshakes
#if BLE_SCANNING_ENABLED
ble_scanner_deinit();
#endif
led_set(true); // on = uploading
CameraHourlyRecord cam_rec;
@@ -129,18 +150,26 @@ static void task_reporter(void*) {
xSemaphoreGive(s_cv_mutex);
} else {
// Failed to acquire — skip this cycle, will report next hour
#if BLE_SCANNING_ENABLED
ble_scanner_reinit();
#endif
led_set(false);
continue;
}
#if !BLE_SCANNING_ENABLED
BLEHourlyRecord ble_rec = {period_start, period_end, 0, 0};
#else
BLEHourlyRecord ble_rec = ble_scanner_collect(period_start, period_end);
#endif
reporter_submit_camera(g_cfg, cam_rec);
reporter_submit_ble(g_cfg, ble_rec);
bool hb_ok = reporter_heartbeat(g_cfg, millis() / 1000, WiFi.RSSI());
#if BLE_SCANNING_ENABLED
ble_scanner_reinit();
#endif
led_set(false);
static uint8_t consecutive_misses = 0;
@@ -165,6 +194,27 @@ void setup() {
pinMode(BUTTON_PIN, INPUT_PULLUP);
led_set(true); // on = booting
// OTA rollback guard: if booted from a freshly-flashed OTA image while the
// bootloader has rollback enabled, the image is PENDING_VERIFY and will be
// rolled back on the next reboot unless we mark it valid. Harmless no-op
// when rollback is disabled. Always log the running partition + state so
// we can see post-OTA boot behavior on serial.
{
const esp_partition_t* running = esp_ota_get_running_partition();
esp_ota_img_states_t state = ESP_OTA_IMG_UNDEFINED;
if (running) {
esp_ota_get_state_partition(running, &state);
Serial.printf("[BOOT] running partition '%s' (off=0x%x) state=%d fw=%s\n",
running->label, (unsigned)running->address,
(int)state, FW_VERSION);
}
if (state == ESP_OTA_IMG_PENDING_VERIFY) {
esp_err_t e = esp_ota_mark_app_valid_cancel_rollback();
Serial.printf("[BOOT] esp_ota_mark_app_valid_cancel_rollback: %s\n",
esp_err_to_name(e));
}
}
event_log_init();
event_log_write(EVT_BOOT, (uint16_t)esp_reset_reason(), 0);
@@ -202,6 +252,11 @@ void setup() {
ESP.restart();
}
// Boot connect happens before net_guard registers its WiFi event handler,
// so the GOT_IP-driven DNS override there won't fire for this association.
// Pin DNS now; net_guard re-applies it on every subsequent reconnect.
net_guard_pin_dns();
net_guard_start(g_cfg);
led_set(false); // off = connected
@@ -220,17 +275,29 @@ void setup() {
reporter_init();
#if BLE_SCANNING_ENABLED
ble_scanner_start();
#endif
// OTA update support
ArduinoOTA.setHostname(g_cfg.device_id.c_str());
#if !BLE_SCANNING_ENABLED
ArduinoOTA.onStart([]() { });
#else
ArduinoOTA.onStart([]() { ble_scanner_pause(); });
#endif
ArduinoOTA.onEnd([]() {
#if BLE_SCANNING_ENABLED
ble_scanner_resume();
#endif
event_log_write(EVT_REBOOT, REBOOT_OTA, 0);
ESP.restart();
});
#if !BLE_SCANNING_ENABLED
ArduinoOTA.onError([](ota_error_t e) { });
#else
ArduinoOTA.onError([](ota_error_t e) { ble_scanner_resume(); });
#endif
ArduinoOTA.begin();
s_cv_mutex = xSemaphoreCreateMutex();
@@ -242,6 +309,16 @@ void setup() {
xTaskCreatePinnedToCore(task_camera, "cam", 8192, nullptr, 2, nullptr, 1);
xTaskCreatePinnedToCore(task_reporter, "rep", 8192, nullptr, 1, nullptr, 0);
// static: ota_updater stores raw pointer; must outlive setup()
static String s_ota_base = String("http://") + REPORTER_API_HOST_NAME + ":" + REPORTER_API_PORT;
ota_updater_init(
s_ota_base.c_str(),
g_cfg.device_id.c_str(),
g_cfg.hmac_secret.c_str(),
g_cfg.ota_interval_s < 10 ? 10000UL : g_cfg.ota_interval_s * 1000UL
);
xTaskCreate(ota_task, "ota", 8192, nullptr, 1, nullptr);
}
void loop() {


@@ -6,6 +6,7 @@
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include <WiFi.h>
#include <algorithm>
#include <vector>
#include <time.h>
#include <freertos/semphr.h>
@@ -26,28 +27,107 @@ static uint32_t now_ts() {
return (uint32_t)time(nullptr);
}
// Last successfully resolved IP — used as a warm fallback if a subsequent
// resolution fails. Never takes precedence over a fresh successful resolve.
static IPAddress s_cached_api_ip;
// Resolve the API host. Tries hostByName first; on failure falls back to the
// last good resolution, then to the hardcoded fallback IP. Returns the IP via
// out-param and a label describing where it came from for logging.
static bool resolve_api_ip(IPAddress& out, const char*& source) {
IPAddress ip;
uint32_t r0 = millis();
bool ok = WiFi.hostByName(REPORTER_API_HOST_NAME, ip);
uint32_t elapsed = millis() - r0;
if (ok) {
s_cached_api_ip = ip;
out = ip;
source = "dns";
Serial.printf("[DNS] %s -> %s (%u ms)\n",
REPORTER_API_HOST_NAME, ip.toString().c_str(), (unsigned)elapsed);
return true;
}
Serial.printf("[DNS] %s -> FAIL (%u ms)\n",
REPORTER_API_HOST_NAME, (unsigned)elapsed);
net_guard_dump_dns("on-fail");
net_guard_pin_dns(); // re-assert in case something overwrote the table
if ((uint32_t)s_cached_api_ip != 0) {
out = s_cached_api_ip;
source = "cache";
return true;
}
if (out.fromString(REPORTER_API_FALLBACK_IP)) {
source = "fallback";
return true;
}
return false;
}
// Drains and parses the HTTP response status line. Returns the numeric status
// code, or -1 on read timeout / malformed response.
static int read_http_status(WiFiClient& client, uint32_t timeout_ms) {
uint32_t start = millis();
// Unsigned subtraction stays correct across the ~49.7-day millis() rollover.
while (!client.available() && (uint32_t)(millis() - start) < timeout_ms) vTaskDelay(pdMS_TO_TICKS(10));
if (!client.available()) return -1;
String line = client.readStringUntil('\n');
line.trim();
// Format: "HTTP/1.1 200 OK"
int sp1 = line.indexOf(' ');
if (sp1 < 0) return -1;
int sp2 = line.indexOf(' ', sp1 + 1);
String code_str = (sp2 > 0) ? line.substring(sp1 + 1, sp2) : line.substring(sp1 + 1);
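// toInt() yields 0 for non-numeric text; report that as malformed.
int code = code_str.toInt();
return (code > 0) ? code : -1;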
}
static bool post_json_once(const DeviceConfig& cfg, const char* path, const String& body) {
uint32_t ts = now_ts();
if (ts < 1700000000UL) return false;
String sig = hmac_sign(cfg.hmac_secret, "POST", path, ts, body);
if (sig.isEmpty()) return false;
HTTPClient http;
String url = String(REPORTER_API_HOST) + path;
http.begin(url);
http.setConnectTimeout(5000); // DNS + TCP connect
http.setTimeout(10000); // per-transaction response timeout
http.addHeader("Content-Type", "application/json");
http.addHeader("X-Device-Id", cfg.device_id);
http.addHeader("X-Timestamp", String(ts));
http.addHeader("X-Signature", sig);
IPAddress ip;
const char* ip_source = "?";
if (!resolve_api_ip(ip, ip_source)) {
Serial.printf("[HTTP] POST %s -> resolve-fail\n", path);
event_log_write(EVT_HTTP_FAIL, event_log_path_hash(path), (uint16_t)-1);
return false;
}
uint32_t t0 = millis();
int code = http.POST(body);
WiFiClient client;
client.setTimeout(10); // seconds — read timeout
if (!client.connect(ip, REPORTER_API_PORT, 5000 /*ms connect timeout*/)) {
uint32_t elapsed = millis() - t0;
Serial.printf("[HTTP] connect %s:%u (%s) -> failed (%u ms)\n",
ip.toString().c_str(), REPORTER_API_PORT, ip_source, (unsigned)elapsed);
event_log_write(EVT_HTTP_FAIL, event_log_path_hash(path), (uint16_t)-1);
return false;
}
// Manual HTTP/1.1 — gives us full control over the Host header so the
// server's vhost routing works even when we connect by IP.
client.printf("POST %s HTTP/1.1\r\n", path);
client.printf("Host: %s\r\n", REPORTER_API_HOST_NAME);
client.print ("Connection: close\r\n");
client.print ("Content-Type: application/json\r\n");
client.printf("Content-Length: %u\r\n", (unsigned)body.length());
client.printf("X-Device-Id: %s\r\n", cfg.device_id.c_str());
client.printf("X-Timestamp: %u\r\n", (unsigned)ts);
client.printf("X-Signature: %s\r\n", sig.c_str());
client.print ("\r\n");
client.print(body);
int code = read_http_status(client, 10000);
// Drain so the server can close cleanly.
while (client.connected() && client.available()) client.read();
client.stop();
uint32_t elapsed = millis() - t0;
http.end();
uint16_t phash = event_log_path_hash(path);
Serial.printf("[HTTP] POST %s -> %d (%u ms)\n", url.c_str(), code, (unsigned)elapsed);
Serial.printf("[HTTP] POST %s%s (%s %s) -> %d (%u ms)\n",
REPORTER_API_HOST_NAME, path, ip_source, ip.toString().c_str(),
code, (unsigned)elapsed);
if (code == 200) {
event_log_write(EVT_HTTP_OK, phash, (uint16_t)((elapsed > 65535) ? 65535 : elapsed));
return true;
@@ -217,7 +297,10 @@ void reporter_flush(const DeviceConfig& cfg) {
String body = build_camera_batch(cfg, cam_snap);
if (post_json(cfg, "/api/v1/camera/events/batch", body)) {
xSemaphoreTake(s_buf_mutex, portMAX_DELAY);
s_cam_buf.clear();
// Erase only the prefix we snapshotted; FIFO append from
// submit_camera during the in-flight POST stays buffered.
size_t n = std::min(cam_snap.size(), s_cam_buf.size());
s_cam_buf.erase(s_cam_buf.begin(), s_cam_buf.begin() + n);
xSemaphoreGive(s_buf_mutex);
}
}
@@ -225,7 +308,8 @@ void reporter_flush(const DeviceConfig& cfg) {
String body = build_ble_batch(cfg, ble_snap);
if (post_json(cfg, "/api/v1/events/batch", body)) {
xSemaphoreTake(s_buf_mutex, portMAX_DELAY);
s_ble_buf.clear();
size_t n = std::min(ble_snap.size(), s_ble_buf.size());
s_ble_buf.erase(s_ble_buf.begin(), s_ble_buf.begin() + n);
xSemaphoreGive(s_buf_mutex);
}
}
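
The prefix-erase used in the two `reporter_flush` hunks can be illustrated outside the firmware. This is a minimal Python sketch, assuming a plain list stands in for `s_cam_buf` and that records are only ever appended at the tail; `flush_once`, `post`, and `post_with_concurrent_append` are hypothetical names, not part of the reporter API.

```python
import threading

buf_lock = threading.Lock()
buf = []  # stands in for s_cam_buf; assumed append-only at the tail


def flush_once(post):
    """Snapshot the buffer, send it, then drop only what was sent."""
    with buf_lock:
        snapshot = list(buf)
    if not snapshot:
        return False
    if not post(snapshot):  # the lock is not held during the POST
        return False
    with buf_lock:
        # Erase only the snapshotted prefix; records appended while the
        # POST was in flight stay buffered for the next flush.
        n = min(len(snapshot), len(buf))
        del buf[:n]
    return True


def post_with_concurrent_append(batch):
    # Simulates a submit arriving while the POST is in flight.
    with buf_lock:
        buf.append("rec-3")
    return True


buf.extend(["rec-1", "rec-2"])
flush_once(post_with_concurrent_append)
print(buf)  # ['rec-3']
```

With an unconditional clear after the POST, `rec-3` would have been discarded without ever being sent.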


@@ -11,8 +11,13 @@ struct CameraHourlyRecord {
int exits;
};
static const int REPORTER_MAX_BUFFER = 24;
static const char* REPORTER_API_HOST = "http://logs.research.bike";
static const int REPORTER_MAX_BUFFER = 24;
static const char* REPORTER_API_HOST_NAME = "logs.research.bike";
static const uint16_t REPORTER_API_PORT = 80;
// Hardcoded fallback used when DNS fails (some customer networks intercept
// :53 with a transparent proxy that mangles responses). Update if the
// server's IP changes — but a successful hostByName() always wins over this.
static const char* REPORTER_API_FALLBACK_IP = "5.78.114.131";
void reporter_init();
void reporter_submit_camera(const DeviceConfig& cfg, const CameraHourlyRecord& rec);
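
The resolution order described in the comment above (fresh DNS result first, then the last good result, then the hardcoded fallback) can be sketched as a pure function. This is an illustrative Python sketch only: `lookup` is a hypothetical callable that returns an IP string or `None`, and the addresses are documentation-range placeholders rather than the real server address.

```python
def resolve_api_ip(lookup, cached_ip, fallback_ip):
    """Return (ip, source, new_cached_ip); ip is None if nothing is usable.

    `lookup` is a hypothetical stand-in for a DNS query.
    """
    ip = lookup()
    if ip is not None:
        # A fresh successful resolve always wins and refreshes the cache.
        return ip, "dns", ip
    if cached_ip is not None:
        return cached_ip, "cache", cached_ip
    if fallback_ip:
        # The fallback is never written into the cache.
        return fallback_ip, "fallback", cached_ip
    return None, "none", cached_ip


print(resolve_api_ip(lambda: "192.0.2.10", None, "192.0.2.99"))
print(resolve_api_ip(lambda: None, "192.0.2.10", "192.0.2.99"))
print(resolve_api_ip(lambda: None, None, "192.0.2.99"))
```

Keeping the fallback out of the cache means a stale hardcoded address is only used while both DNS and the cache have nothing better.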


@@ -0,0 +1,44 @@
// firmware/test/test_ota/test_version.cpp
#include <unity.h>
// Pull in the function under test — include .cpp directly for native builds
// so we don't need a separate compilation unit per test.
#define NATIVE_TEST
#include "../../lib/ota_updater/ota_updater.cpp"
void setUp() {}
void tearDown() {}
void test_remote_newer_patch() {
TEST_ASSERT_TRUE(ota_version_newer("1.0.0", "1.0.1"));
}
void test_remote_newer_minor() {
TEST_ASSERT_TRUE(ota_version_newer("1.0.9", "1.1.0"));
}
void test_remote_newer_major() {
TEST_ASSERT_TRUE(ota_version_newer("0.9.9", "1.0.0"));
}
void test_same_version() {
TEST_ASSERT_FALSE(ota_version_newer("1.2.3", "1.2.3"));
}
void test_remote_older() {
TEST_ASSERT_FALSE(ota_version_newer("1.2.3", "1.2.2"));
}
void test_malformed_current() {
TEST_ASSERT_FALSE(ota_version_newer("bad", "1.0.0"));
}
void test_malformed_remote() {
TEST_ASSERT_FALSE(ota_version_newer("1.0.0", "bad"));
}
int main() {
UNITY_BEGIN();
RUN_TEST(test_remote_newer_patch);
RUN_TEST(test_remote_newer_minor);
RUN_TEST(test_remote_newer_major);
RUN_TEST(test_same_version);
RUN_TEST(test_remote_older);
RUN_TEST(test_malformed_current);
RUN_TEST(test_malformed_remote);
return UNITY_END();
}

View File

@@ -0,0 +1,62 @@
// firmware/test/test_ota_sig/test_sig_verify.cpp
#include <unity.h>
#include <string.h>
#define NATIVE_TEST
#include "../../lib/ota_updater/ota_updater.cpp"
// ── Test vectors generated by Python/cryptography (ECDSA P-256) ────────────
static const uint8_t TEST_PUBKEY[65] = {
0x04, 0x96, 0x18, 0x6c, 0x8b, 0xb2, 0xdf, 0xea, 0x3f, 0xe4, 0x75, 0x35, 0x0e, 0x8a, 0x3e, 0x7d,
0x49, 0x7f, 0x56, 0xb5, 0xb4, 0x1a, 0xae, 0x05, 0xa3, 0x10, 0x6f, 0x02, 0x43, 0x84, 0xb3, 0x1c,
0x1f, 0x44, 0xef, 0x08, 0x84, 0x57, 0xca, 0x6e, 0xd8, 0x19, 0x74, 0x10, 0x8d, 0x95, 0xcc, 0x8c,
0x61, 0x89, 0x56, 0xea, 0xbc, 0x0c, 0xa2, 0x54, 0xd7, 0x02, 0xf3, 0x1d, 0x67, 0x7c, 0xa5, 0xba,
0x42
};
static const uint8_t TEST_HASH[32] = {
0x0a, 0x7e, 0x5f, 0x6a, 0x4c, 0x72, 0x11, 0xb7, 0x14, 0x3f, 0x85, 0x59, 0x50, 0x61, 0x8a, 0xa1,
0xab, 0xee, 0x7b, 0x57, 0x08, 0x59, 0x56, 0x09, 0x6d, 0x18, 0xaf, 0x70, 0xe6, 0x6e, 0x6c, 0xa8
};
static const uint8_t TEST_SIG[64] = {
0x4f, 0xff, 0xc3, 0xc6, 0xd5, 0x04, 0x71, 0x37, 0x87, 0x8c, 0xe1, 0xe5, 0x79, 0xef, 0x59, 0x2a,
0x63, 0xde, 0xf6, 0x96, 0x3e, 0x8f, 0x90, 0x2f, 0x46, 0x1f, 0x1b, 0x8a, 0xd5, 0x94, 0xb8, 0x28,
0x80, 0xfa, 0xe4, 0x26, 0x14, 0xbf, 0x91, 0x54, 0xbf, 0xa6, 0x2f, 0x67, 0xf9, 0x97, 0x45, 0x3a,
0x0f, 0xdc, 0x66, 0xcd, 0x21, 0xb8, 0x91, 0xdb, 0xb9, 0xaa, 0x6b, 0x5d, 0x6c, 0xa5, 0xcb, 0x96
};
// ──────────────────────────────────────────────────────────────────────────
void setUp() {}
void tearDown() {}
void test_valid_signature_accepted() {
TEST_ASSERT_TRUE(ota_verify_signature_with_key(TEST_HASH, TEST_SIG, TEST_PUBKEY));
}
void test_corrupted_hash_rejected() {
uint8_t bad_hash[32];
memcpy(bad_hash, TEST_HASH, 32);
bad_hash[0] ^= 0xff;
TEST_ASSERT_FALSE(ota_verify_signature_with_key(bad_hash, TEST_SIG, TEST_PUBKEY));
}
void test_corrupted_signature_rejected() {
uint8_t bad_sig[64];
memcpy(bad_sig, TEST_SIG, 64);
bad_sig[0] ^= 0xff;
TEST_ASSERT_FALSE(ota_verify_signature_with_key(TEST_HASH, bad_sig, TEST_PUBKEY));
}
void test_zero_signature_rejected() {
uint8_t zero_sig[64] = {};
TEST_ASSERT_FALSE(ota_verify_signature_with_key(TEST_HASH, zero_sig, TEST_PUBKEY));
}
int main() {
UNITY_BEGIN();
RUN_TEST(test_valid_signature_accepted);
RUN_TEST(test_corrupted_hash_rejected);
RUN_TEST(test_corrupted_signature_rejected);
RUN_TEST(test_zero_signature_rejected);
return UNITY_END();
}

View File

@@ -11,7 +11,7 @@ import sqlite3
from typing import List
from fastapi import Depends
from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, model_validator
class CameraRecord(BaseModel):
@@ -20,6 +20,12 @@ class CameraRecord(BaseModel):
entries: int = Field(ge=0)
exits: int = Field(ge=0)
@model_validator(mode="after")
def _period_order(self):
if self.period_end <= self.period_start:
raise ValueError("period_end must be strictly greater than period_start")
return self
class CameraEventsRequest(BaseModel):
location_id: str

View File

@@ -70,13 +70,15 @@ def store_heartbeat_diagnostics(
else None
)
cursor = db.cursor()
# COALESCE preserves existing column values when the v1.0.0 payload omits
# diagnostic fields (Pydantic resolves them to None).
cursor.execute(
"""UPDATE heartbeats
SET reset_reason = ?,
heap_free = ?,
heap_min_free = ?,
last_disconnect_code = ?,
recent_events = ?
SET reset_reason = COALESCE(?, reset_reason),
heap_free = COALESCE(?, heap_free),
heap_min_free = COALESCE(?, heap_min_free),
last_disconnect_code = COALESCE(?, last_disconnect_code),
recent_events = COALESCE(?, recent_events)
WHERE device_id = ?""",
(
hb.reset_reason,
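
The effect of the `COALESCE(?, column)` pattern in this hunk can be checked against an in-memory SQLite database. The sketch below assumes a reduced, hypothetical `heartbeats` schema with only two diagnostic columns; the real table has more. The `store` helper is illustrative and is not the server's `store_heartbeat_diagnostics`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Reduced, hypothetical schema: only two of the diagnostic columns.
db.execute(
    "CREATE TABLE heartbeats ("
    "device_id TEXT PRIMARY KEY, reset_reason INTEGER, heap_free INTEGER)"
)
db.execute("INSERT INTO heartbeats VALUES ('dc-0002', 3, 120000)")

UPDATE_SQL = """UPDATE heartbeats
                SET reset_reason = COALESCE(?, reset_reason),
                    heap_free    = COALESCE(?, heap_free)
                WHERE device_id = ?"""


def store(reset_reason, heap_free, device_id="dc-0002"):
    """Apply one heartbeat; None means the payload omitted the field."""
    db.execute(UPDATE_SQL, (reset_reason, heap_free, device_id))
    return db.execute(
        "SELECT reset_reason, heap_free FROM heartbeats WHERE device_id = ?",
        (device_id,),
    ).fetchone()


print(store(None, None))  # (3, 120000): older payload leaves both values alone
print(store(1, None))     # (1, 120000): only the supplied field changes
```

One consequence of this pattern is that a later payload can no longer reset a column back to NULL, since NULL now means "keep the existing value".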

120
server/ota_endpoint.py Normal file

@@ -0,0 +1,120 @@
# server/ota_endpoint.py
"""
OTA firmware update endpoints.
Deployment workflow:
1. Generate signing key (one-time):
python tools/gen_signing_key.py
→ secrets/firmware_signing_key.pem (keep offline)
→ firmware/lib/ota_updater/ota_pubkey.h (commit this)
2. Build and deploy a new firmware version:
pio run -e timercam # build
python tools/deploy_firmware.py \\
firmware/.pio/build/timercam/firmware.bin 1.2.3
→ server/firmware/ updated with current.bin, current.sig, manifest.json
3. Bump FW_VERSION in firmware/include/version.h before each release.
4. Register in server main app:
from server.ota_endpoint import router as ota_router
app.include_router(ota_router)
Also uncomment Depends(verify_device_hmac) on both route handlers
and confirm the HMAC format matches hmac.cpp:
method + "\\n" + path + "\\n" + timestamp + "\\n" + sha256_hex(body)
Note: HMAC auth is currently commented out on route handlers — must be wired
before production use. verify_device_hmac must use the same format as hmac.cpp.
"""
import base64
import json
from pathlib import Path
from fastapi import APIRouter
from fastapi.responses import FileResponse
FIRMWARE_DIR = Path(__file__).parent / "firmware"
router = APIRouter(prefix="/ota", tags=["ota"])
class FirmwareNotFoundError(Exception):
pass
def _parse_version(v: str) -> tuple:
"""Parse semver string to comparable tuple; returns (0,0,0) on malformed input."""
try:
parts = v.strip().split(".")
if len(parts) != 3:
return (0, 0, 0)
return tuple(int(x) for x in parts)
except (ValueError, AttributeError):
return (0, 0, 0)
def ota_check_impl(current_version: str, firmware_dir: Path = FIRMWARE_DIR) -> dict:
"""
Compare device's current_version against staged manifest.
Returns {"update": False} when no update is available or manifest is missing.
Returns full update payload when server version is strictly newer.
"""
manifest_path = firmware_dir / "manifest.json"
if not manifest_path.exists():
return {"update": False}
try:
manifest = json.loads(manifest_path.read_text())
version = manifest["version"]
size = manifest["size"]
sha256 = manifest["sha256"]
except (json.JSONDecodeError, KeyError):
return {"update": False}
if _parse_version(version) <= _parse_version(current_version):
return {"update": False}
sig_path = firmware_dir / "current.sig"
if not sig_path.exists():
return {"update": False}
sig_b64 = base64.b64encode(sig_path.read_bytes()).decode()
return {
"update": True,
"version": version,
"size": size,
"sha256": sha256,
"sig_b64": sig_b64,
}
def ota_firmware_impl(firmware_dir: Path = FIRMWARE_DIR) -> bytes:
"""
Return raw firmware binary bytes.
Raises FirmwareNotFoundError if current.bin is absent.
"""
bin_path = firmware_dir / "current.bin"
if not bin_path.exists():
raise FirmwareNotFoundError("No firmware staged")
return bin_path.read_bytes()
@router.get("/check")
async def ota_check(
version: str,
# device_id: str = Depends(verify_device_hmac), # uncomment when wiring into app
):
"""Check whether a firmware update is available for the given device version."""
return ota_check_impl(current_version=version)
@router.get("/firmware")
async def ota_firmware(
# device_id: str = Depends(verify_device_hmac), # uncomment when wiring into app
):
"""Stream the staged firmware binary to the device."""
from fastapi import HTTPException
bin_path = FIRMWARE_DIR / "current.bin"
if not bin_path.exists():
raise HTTPException(status_code=404, detail="No firmware available")
return FileResponse(bin_path, media_type="application/octet-stream")
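
The module docstring gives the signed message as `method + "\n" + path + "\n" + timestamp + "\n" + sha256_hex(body)`. The sketch below shows one way a server-side check over that string might look. Several things here are assumptions, since the diff does not show `hmac.cpp` or `verify_device_hmac`: the use of HMAC-SHA256 with a hex digest, passing the key as raw bytes, the 300-second skew window, and the function names `canonical_string`, `sign_request`, and `verify_request`.

```python
import hashlib
import hmac


def canonical_string(method: str, path: str, timestamp: int, body: bytes) -> bytes:
    """Build the message in the format quoted in the module docstring."""
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method}\n{path}\n{timestamp}\n{body_hash}".encode()


def sign_request(key: bytes, method: str, path: str, timestamp: int, body: bytes) -> str:
    # Assumption: HMAC-SHA256, hex-encoded. The firmware may differ.
    msg = canonical_string(method, path, timestamp, body)
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def verify_request(key: bytes, method: str, path: str, timestamp: int,
                   body: bytes, signature: str, now: int,
                   max_skew_s: int = 300) -> bool:
    # Assumption: reject timestamps outside a fixed window to limit replay.
    if abs(now - timestamp) > max_skew_s:
        return False
    expected = sign_request(key, method, path, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


key = b"example-key"  # placeholder, not a real device secret
sig = sign_request(key, "GET", "/ota/check", 1700000100, b"")
print(verify_request(key, "GET", "/ota/check", 1700000100, b"", sig, now=1700000130))
```

Whether the query string of `/ota/check` is part of the signed path is not stated in the diff and would need to match the firmware.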

View File

@@ -98,3 +98,15 @@ def test_negative_counts_rejected():
with pytest.raises(ValidationError):
CameraRecord(period_start=1712000000, period_end=1712003600,
entries=-1, exits=0)
def test_inverted_period_rejected():
"""Pydantic should reject period_end <= period_start."""
from pydantic import ValidationError
from server.camera_endpoint import CameraRecord
with pytest.raises(ValidationError):
CameraRecord(period_start=1712003600, period_end=1712003600,
entries=0, exits=0)
with pytest.raises(ValidationError):
CameraRecord(period_start=1712003600, period_end=1712000000,
entries=0, exits=0)

View File

@@ -0,0 +1,83 @@
# server/test_ota_endpoint.py
import base64
import hashlib
import json
from pathlib import Path
import pytest
from server.ota_endpoint import ota_check_impl, ota_firmware_impl
def write_firmware(firmware_dir: Path, version: str, data: bytes = b"fake_fw") -> None:
sig = bytes(64) # zero sig (not validated server-side)
manifest = {
"version": version,
"size": len(data),
"sha256": hashlib.sha256(data).hexdigest(),
}
(firmware_dir / "current.bin").write_bytes(data)
(firmware_dir / "current.sig").write_bytes(sig)
(firmware_dir / "manifest.json").write_text(json.dumps(manifest))
@pytest.fixture(autouse=True)
def patch_firmware_dir(tmp_path, monkeypatch):
import server.ota_endpoint as mod
monkeypatch.setattr(mod, "FIRMWARE_DIR", tmp_path)
yield tmp_path
def test_check_no_update_same_version(tmp_path):
write_firmware(tmp_path, "1.0.0")
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_check_no_update_newer_local(tmp_path):
write_firmware(tmp_path, "1.0.0")
result = ota_check_impl(current_version="1.1.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_check_update_available(tmp_path):
write_firmware(tmp_path, "1.1.0", data=b"new firmware")
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is True
assert result["version"] == "1.1.0"
assert result["size"] == len(b"new firmware")
assert "sha256" in result
assert "sig_b64" in result
sig_bytes = base64.b64decode(result["sig_b64"])
assert len(sig_bytes) == 64
def test_check_no_manifest(tmp_path):
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_firmware_endpoint_returns_binary(tmp_path):
fw_data = b"firmware binary content"
write_firmware(tmp_path, "1.1.0", data=fw_data)
content = ota_firmware_impl(firmware_dir=tmp_path)
assert content == fw_data
def test_firmware_endpoint_missing_raises(tmp_path):
import server.ota_endpoint as mod
with pytest.raises(mod.FirmwareNotFoundError):
ota_firmware_impl(firmware_dir=tmp_path)
def test_check_malformed_manifest(tmp_path):
(tmp_path / "manifest.json").write_text("not valid json{{{")
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
assert result["update"] is False
def test_check_wrong_arity_version_no_update(tmp_path):
write_firmware(tmp_path, "1.2") # wrong arity server version
result = ota_check_impl(current_version="1.0.0", firmware_dir=tmp_path)
# server "1.2" → (0,0,0) ≤ client (1,0,0) → no update
assert result["update"] is False

0
tools/__init__.py Normal file

43
tools/deploy_firmware.py Normal file

@@ -0,0 +1,43 @@
#!/usr/bin/env python3
"""Sign firmware and stage it for the server OTA endpoint."""
import argparse, hashlib, json
from pathlib import Path
from sign_firmware import sign_firmware
def deploy(firmware_path: Path, key_path: Path,
version: str, output_dir: Path) -> None:
output_dir.mkdir(parents=True, exist_ok=True)
data = firmware_path.read_bytes()
sig = sign_firmware(firmware_path, key_path)
(output_dir / "current.bin").write_bytes(data)
(output_dir / "current.sig").write_bytes(sig)
(output_dir / "manifest.json").write_text(json.dumps({
"version": version,
"size": len(data),
"sha256": hashlib.sha256(data).hexdigest(),
}, indent=2))
print(f"Deployed {firmware_path.name} v{version} → {output_dir}/")
def main() -> None:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument("firmware", help="Path to .bin")
p.add_argument("version", help="Version string, e.g. 1.2.3")
p.add_argument("--key", default="secrets/firmware_signing_key.pem")
p.add_argument("--out-dir", default="server/firmware")
args = p.parse_args()
deploy(
firmware_path=Path(args.firmware),
key_path=Path(args.key),
version=args.version,
output_dir=Path(args.out_dir),
)
if __name__ == "__main__":
main()
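
Because `deploy` writes three files that must agree with each other, a post-deploy consistency check may be useful. The sketch below is a hypothetical helper (`check_staged_firmware` is not in the repository) that checks only what the manifest records: size, SHA-256, and the 64-byte signature length. It does not verify the ECDSA signature itself.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def check_staged_firmware(firmware_dir: Path) -> list:
    """Return a list of problems; an empty list means the artifacts agree."""
    bin_path = firmware_dir / "current.bin"
    sig_path = firmware_dir / "current.sig"
    manifest_path = firmware_dir / "manifest.json"
    problems = [f"missing {p.name}"
                for p in (bin_path, sig_path, manifest_path) if not p.exists()]
    if problems:
        return problems
    data = bin_path.read_bytes()
    manifest = json.loads(manifest_path.read_text())
    if manifest.get("size") != len(data):
        problems.append("manifest size does not match current.bin")
    if manifest.get("sha256") != hashlib.sha256(data).hexdigest():
        problems.append("manifest sha256 does not match current.bin")
    if len(sig_path.read_bytes()) != 64:
        problems.append("current.sig is not 64 bytes")
    return problems


with tempfile.TemporaryDirectory() as d:
    out = Path(d)
    data = b"fake firmware"
    (out / "current.bin").write_bytes(data)
    (out / "current.sig").write_bytes(bytes(64))
    (out / "manifest.json").write_text(json.dumps({
        "version": "1.0.1",
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }))
    print(check_staged_firmware(out))  # []
    # Replacing the binary without re-running deploy leaves a stale manifest.
    (out / "current.bin").write_bytes(b"newer build, stale manifest")
    print(check_staged_firmware(out))  # size and sha256 mismatches
```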

View File

@@ -15,16 +15,38 @@ Usage:
"""
import argparse
import os
import re
import secrets
import subprocess
import sys
import tempfile
HMAC_SECRET_RE = re.compile(r"^[0-9a-fA-F]{64}$")
NVS_NAMESPACE = "doorcounter"
NVS_PARTITION_OFFSET = "0x9000"
NVS_PARTITION_SIZE = "0x5000" # matches firmware partition table (20KB)
# Characters that would change the field/row structure of the NVS-CSV format
# (key,type,encoding,value). A value containing any of these would either
# split into more fields or add rows, silently provisioning the wrong keys.
_CSV_FORBIDDEN = (",", '"', "\n", "\r")
def _reject_csv_metacharacters(name, value):
"""Exit with an error if value contains a character that would corrupt
the NVS CSV. Used for operator-supplied strings (device id, location id,
WiFi credentials)."""
for c in _CSV_FORBIDDEN:
if c in value:
print(
f"Error: --{name} contains forbidden character {c!r}; "
f"this would corrupt the NVS partition CSV.",
file=sys.stderr,
)
sys.exit(1)
def build_nvs_csv(device_id, location_id, hmac_secret,
wifi_ssid=None, wifi_pass=None, line_offset=50):
@@ -63,6 +85,10 @@ def main():
args = parser.parse_args()
hmac_secret = args.hmac_secret or secrets.token_hex(32)
if not HMAC_SECRET_RE.match(hmac_secret):
print("Error: --hmac-secret must be exactly 64 hex characters (32 bytes)",
file=sys.stderr)
sys.exit(1)
if args.hmac_secret is None:
print(f"Generated HMAC secret: {hmac_secret}")
print(" *** SAVE THIS — you need it to register the device on the server ***")
@@ -71,6 +97,13 @@ def main():
print("Error: --line-offset must be 0-100", file=sys.stderr)
sys.exit(1)
_reject_csv_metacharacters("device-id", args.device_id)
_reject_csv_metacharacters("location-id", args.location_id)
if args.wifi_ssid is not None:
_reject_csv_metacharacters("wifi-ssid", args.wifi_ssid)
if args.wifi_password is not None:
_reject_csv_metacharacters("wifi-password", args.wifi_password)
with tempfile.TemporaryDirectory() as tmp:
csv_path = os.path.join(tmp, "nvs.csv")
bin_path = os.path.join(tmp, "nvs.bin")

57
tools/gen_signing_key.py Normal file
View File

@@ -0,0 +1,57 @@
#!/usr/bin/env python3
"""Generate ECDSA P-256 signing keypair for OTA firmware verification."""
import argparse
import os
from pathlib import Path
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization
def generate(secrets_dir: Path, header_out: Path) -> None:
secrets_dir.mkdir(parents=True, exist_ok=True)
key = ec.generate_private_key(ec.SECP256R1())
pem = key.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.PKCS8,
encryption_algorithm=serialization.NoEncryption(),
)
key_path = secrets_dir / "firmware_signing_key.pem"
key_path.write_bytes(pem)
key_path.chmod(0o600)
pub_bytes = key.public_key().public_bytes(
encoding=serialization.Encoding.X962,
format=serialization.PublicFormat.UncompressedPoint,
)
assert len(pub_bytes) == 65 and pub_bytes[0] == 0x04
hex_values = ", ".join(f"0x{b:02x}" for b in pub_bytes)
header = (
"#pragma once\n"
"// Auto-generated by tools/gen_signing_key.py — DO NOT EDIT\n"
"// ECDSA P-256 public key, uncompressed X9.62 (04 || X || Y)\n"
f"static const uint8_t kOtaPublicKey[65] = {{{hex_values}}};\n"
)
header_out.parent.mkdir(parents=True, exist_ok=True)
header_out.write_text(header)
print(f"Private key → {key_path}")
print(f"Public key header → {header_out}")
def main() -> None:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument("--secrets-dir", default="secrets",
help="Directory for private key (default: secrets/)")
p.add_argument("--header-out",
default="firmware/lib/ota_updater/ota_pubkey.h",
help="Path to write the C header")
args = p.parse_args()
generate(Path(args.secrets_dir), Path(args.header_out))
if __name__ == "__main__":
main()
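
The generated header can be read back to confirm it holds a 65-byte uncompressed point. The sketch below uses a hypothetical helper, `parse_pubkey_header`, and assumes the single-line array layout that `generate` writes. It checks only the length and the `0x04` prefix, not that the point lies on the P-256 curve.

```python
import re


def parse_pubkey_header(text: str) -> bytes:
    """Extract the kOtaPublicKey byte array from the generated C header."""
    m = re.search(r"kOtaPublicKey\[65\]\s*=\s*\{([^}]*)\}", text)
    if not m:
        raise ValueError("kOtaPublicKey array not found")
    key = bytes(int(tok, 16) for tok in m.group(1).split(",") if tok.strip())
    # Uncompressed X9.62 encoding: 0x04 followed by 32-byte X and 32-byte Y.
    if len(key) != 65 or key[0] != 0x04:
        raise ValueError("not a 65-byte uncompressed point")
    return key


# Placeholder bytes for illustration; not a real public key.
sample_bytes = bytes([0x04]) + bytes(range(64))
sample_header = (
    "#pragma once\n"
    "static const uint8_t kOtaPublicKey[65] = {"
    + ", ".join(f"0x{b:02x}" for b in sample_bytes)
    + "};\n"
)
print(len(parse_pubkey_header(sample_header)))  # 65
```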

52
tools/sign_firmware.py Normal file

@@ -0,0 +1,52 @@
#!/usr/bin/env python3
"""Sign a firmware binary with ECDSA P-256. Outputs a raw 64-byte r||s .sig file."""
import argparse
import sys
from pathlib import Path
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.utils import decode_dss_signature
def load_private_key(key_path: Path) -> ec.EllipticCurvePrivateKey:
key = serialization.load_pem_private_key(key_path.read_bytes(), password=None)
if not isinstance(key, ec.EllipticCurvePrivateKey):
raise ValueError("Key must be an EC private key")
if not isinstance(key.curve, ec.SECP256R1):
raise ValueError(f"Key must use SECP256R1 curve, got {key.curve.name}")
return key
def sign_firmware(firmware_path: Path, key_path: Path) -> bytes:
key = load_private_key(key_path)
data = firmware_path.read_bytes()
sig_der = key.sign(data, ec.ECDSA(hashes.SHA256()))
r, s = decode_dss_signature(sig_der)
# Returns raw 64-byte r‖s (not DER) — mbedtls_ecdsa_verify expects this layout
return r.to_bytes(32, 'big') + s.to_bytes(32, 'big')
def main() -> None:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument("firmware", help="Path to firmware .bin")
p.add_argument("--key", default="secrets/firmware_signing_key.pem",
help="Path to PEM private key")
p.add_argument("--out", help="Output .sig path (default: firmware.bin.sig)")
args = p.parse_args()
firmware = Path(args.firmware)
key_path = Path(args.key)
out_path = Path(args.out) if args.out else firmware.with_suffix(".bin.sig")
try:
sig = sign_firmware(firmware, key_path)
except (FileNotFoundError, ValueError) as e:
print(f"Error: {e}", file=sys.stderr)
raise SystemExit(1)
out_path.write_bytes(sig)
print(f"Signed {firmware.name} → {out_path} ({len(sig)} bytes)")
if __name__ == "__main__":
main()
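
The raw `r||s` layout that `sign_firmware` returns is fixed-width: each integer is left-padded to 32 bytes, so the signature is always 64 bytes, whereas a DER encoding varies in length. The sketch below shows the packing and its inverse using only the standard library. The helper names `pack_raw_signature` and `unpack_raw_signature` are hypothetical, and the integers are arbitrary placeholders rather than a real signature.

```python
def pack_raw_signature(r: int, s: int) -> bytes:
    """Concatenate r and s as two 32-byte big-endian integers."""
    return r.to_bytes(32, "big") + s.to_bytes(32, "big")


def unpack_raw_signature(sig: bytes) -> tuple:
    """Split a 64-byte raw signature back into (r, s)."""
    if len(sig) != 64:
        raise ValueError(f"expected 64 bytes, got {len(sig)}")
    return int.from_bytes(sig[:32], "big"), int.from_bytes(sig[32:], "big")


# A small r still occupies a full 32 bytes thanks to the zero padding.
raw = pack_raw_signature(1, 2**255)
print(len(raw))                                     # 64
print(unpack_raw_signature(raw) == (1, 2**255))     # True
```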

View File

@@ -0,0 +1,63 @@
import json, hashlib, sys
from pathlib import Path
import pytest
REPO_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(REPO_ROOT / "tools"))
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization
from deploy_firmware import deploy
@pytest.fixture()
def key_pem(tmp_path):
key = ec.generate_private_key(ec.SECP256R1())
pem_path = tmp_path / "key.pem"
pem_path.write_bytes(key.private_bytes(
serialization.Encoding.PEM,
serialization.PrivateFormat.PKCS8,
serialization.NoEncryption(),
))
return pem_path
def test_deploy_writes_all_artifacts(tmp_path, key_pem):
firmware = tmp_path / "firmware.bin"
firmware.write_bytes(b"fake firmware" * 200)
out_dir = tmp_path / "server_firmware"
deploy(firmware_path=firmware, key_path=key_pem,
version="1.2.3", output_dir=out_dir)
assert (out_dir / "current.bin").exists()
assert (out_dir / "current.sig").exists()
assert (out_dir / "manifest.json").exists()
def test_manifest_contents(tmp_path, key_pem):
data = b"firmware payload"
firmware = tmp_path / "fw.bin"
firmware.write_bytes(data)
out_dir = tmp_path / "out"
deploy(firmware_path=firmware, key_path=key_pem,
version="2.0.1", output_dir=out_dir)
manifest = json.loads((out_dir / "manifest.json").read_text())
assert manifest["version"] == "2.0.1"
assert manifest["size"] == len(data)
assert manifest["sha256"] == hashlib.sha256(data).hexdigest()
def test_signature_is_64_bytes(tmp_path, key_pem):
firmware = tmp_path / "fw.bin"
firmware.write_bytes(b"fw")
out_dir = tmp_path / "out"
deploy(firmware_path=firmware, key_path=key_pem,
version="1.0.0", output_dir=out_dir)
sig = (out_dir / "current.sig").read_bytes()
assert len(sig) == 64

View File

@@ -0,0 +1,17 @@
import pytest
from tools.flash_device import _reject_csv_metacharacters
def test_clean_value_accepted():
"""A value with no metacharacters should pass without exiting."""
_reject_csv_metacharacters("device-id", "dc-0042")
_reject_csv_metacharacters("location-id", "retailer-123")
_reject_csv_metacharacters("wifi-ssid", "StoreWiFi-2.4GHz")
_reject_csv_metacharacters("wifi-password", "p@ssw0rd!~#$%^&*()_+-=:;<>?/")
@pytest.mark.parametrize("bad", ["Home,Network", 'pa"ss', "ssid\nfoo", "name\rbar"])
def test_metacharacter_rejected(bad):
with pytest.raises(SystemExit):
_reject_csv_metacharacters("wifi-ssid", bad)
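
The reason these characters are rejected can be shown with a small sketch. It assumes the row is built by joining fields without quoting, in the `key,type,encoding,value` layout named in the `flash_device.py` comment. The body of `build_nvs_csv` is not shown in the diff, so `naive_nvs_row` is a hypothetical stand-in.

```python
import csv
import io


def naive_nvs_row(key: str, value: str) -> str:
    # Hypothetical: a plain join with no quoting or escaping.
    return f"{key},data,string,{value}"


def field_count(line: str) -> int:
    """Count the fields a CSV parser sees in the first row of `line`."""
    return len(next(csv.reader(io.StringIO(line))))


print(field_count(naive_nvs_row("wifi_ssid", "StoreWiFi")))     # 4
print(field_count(naive_nvs_row("wifi_ssid", "Home,Network")))  # 5
```

A comma in the value produces a fifth field, so the value actually provisioned would be truncated at the comma.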

View File

@@ -0,0 +1,49 @@
import os, subprocess, sys, tempfile
from pathlib import Path

REPO_ROOT = Path(__file__).parent.parent

def run_gen(secrets_dir, header_path):
    env = os.environ.copy()
    result = subprocess.run(
        [sys.executable, str(REPO_ROOT / "tools/gen_signing_key.py"),
         "--secrets-dir", str(secrets_dir),
         "--header-out", str(header_path)],
        capture_output=True, text=True, env=env
    )
    assert result.returncode == 0, result.stderr
    return result

def test_private_key_created():
    with tempfile.TemporaryDirectory() as d:
        header = Path(d) / "ota_pubkey.h"
        run_gen(d, header)
        pem = Path(d) / "firmware_signing_key.pem"
        assert pem.exists()
        content = pem.read_text()
        assert "BEGIN PRIVATE KEY" in content

def test_header_created():
    with tempfile.TemporaryDirectory() as d:
        header = Path(d) / "ota_pubkey.h"
        run_gen(d, header)
        assert header.exists()
        content = header.read_text()
        assert "kOtaPublicKey" in content
        assert "0x04" in content  # uncompressed point prefix
        assert "[65]" in content

def test_public_key_is_valid_p256_point():
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.hazmat.primitives import serialization
    with tempfile.TemporaryDirectory() as d:
        header = Path(d) / "ota_pubkey.h"
        run_gen(d, header)
        pem = (Path(d) / "firmware_signing_key.pem").read_bytes()
        priv = serialization.load_pem_private_key(pem, password=None)
        pub_bytes = priv.public_key().public_bytes(
            serialization.Encoding.X962,
            serialization.PublicFormat.UncompressedPoint,
        )
        assert len(pub_bytes) == 65
        assert pub_bytes[0] == 0x04
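
test_header_created only checks that the generated header mentions kOtaPublicKey, declares a 65-element array, and contains the 0x04 uncompressed-point prefix. A header with those properties could be rendered as in the sketch below. The exact layout (pragma, include, eight bytes per row) and the function name format_pubkey_header are assumptions; tools/gen_signing_key.py may format its output differently.

```python
def format_pubkey_header(pub_bytes: bytes, name: str = "kOtaPublicKey") -> str:
    """Render a 65-byte uncompressed P-256 point as a C array (assumed layout)."""
    if len(pub_bytes) != 65 or pub_bytes[0] != 0x04:
        raise ValueError("expected a 65-byte uncompressed point starting with 0x04")
    rows = []
    for i in range(0, len(pub_bytes), 8):
        chunk = pub_bytes[i:i + 8]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines = [
        "#pragma once",
        "#include <stdint.h>",
        "",
        f"static const uint8_t {name}[{len(pub_bytes)}] = {{",
    ]
    lines += rows
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder bytes only; a real key would come from the generated PEM.
    print(format_pubkey_header(bytes([0x04]) + bytes(64)))
```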

@@ -0,0 +1,64 @@
import sys
from pathlib import Path

import pytest
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.utils import decode_dss_signature

REPO_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(REPO_ROOT / "tools"))

from sign_firmware import sign_firmware, load_private_key


@pytest.fixture()
def keypair(tmp_path):
    key = ec.generate_private_key(ec.SECP256R1())
    pem_path = tmp_path / "key.pem"
    pem_path.write_bytes(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.PKCS8,
        serialization.NoEncryption(),
    ))
    return key, pem_path


def test_signature_is_64_bytes(keypair, tmp_path):
    key, key_path = keypair
    firmware = tmp_path / "fw.bin"
    firmware.write_bytes(b"fake firmware data" * 100)
    sig = sign_firmware(firmware, key_path)
    assert len(sig) == 64


def test_signature_verifies(keypair, tmp_path):
    key, key_path = keypair
    data = b"test firmware payload"
    firmware = tmp_path / "fw.bin"
    firmware.write_bytes(data)
    sig_raw = sign_firmware(firmware, key_path)
    # Convert raw r||s back to DER for cryptography lib verify
    r = int.from_bytes(sig_raw[:32], 'big')
    s = int.from_bytes(sig_raw[32:], 'big')
    from cryptography.hazmat.primitives.asymmetric.utils import encode_dss_signature
    sig_der = encode_dss_signature(r, s)
    key.public_key().verify(sig_der, data, ec.ECDSA(hashes.SHA256()))


def test_wrong_key_fails_verification(keypair, tmp_path):
    key, key_path = keypair
    firmware = tmp_path / "fw.bin"
    firmware.write_bytes(b"firmware")
    sig_raw = sign_firmware(firmware, key_path)
    other_key = ec.generate_private_key(ec.SECP256R1())
    r = int.from_bytes(sig_raw[:32], 'big')
    s = int.from_bytes(sig_raw[32:], 'big')
    from cryptography.hazmat.primitives.asymmetric.utils import encode_dss_signature
    sig_der = encode_dss_signature(r, s)
    with pytest.raises(InvalidSignature):
        other_key.public_key().verify(sig_der, b"firmware", ec.ECDSA(hashes.SHA256()))
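
Both verification tests split the 64-byte signature into two 32-byte big-endian integers before re-encoding it as DER. That fixed-width r||s packing can be written as a pair of helpers, sketched below with the standard library only. The names ints_to_raw and raw_to_ints are hypothetical and are not part of sign_firmware.py.

```python
from typing import Tuple


def ints_to_raw(r: int, s: int, size: int = 32) -> bytes:
    """Pack (r, s) as fixed-width big-endian r||s; 64 bytes for P-256."""
    return r.to_bytes(size, "big") + s.to_bytes(size, "big")


def raw_to_ints(sig_raw: bytes, size: int = 32) -> Tuple[int, int]:
    """Split fixed-width r||s back into the integer pair (r, s)."""
    if len(sig_raw) != 2 * size:
        raise ValueError(f"expected {2 * size} bytes, got {len(sig_raw)}")
    r = int.from_bytes(sig_raw[:size], "big")
    s = int.from_bytes(sig_raw[size:], "big")
    return r, s


if __name__ == "__main__":
    raw = ints_to_raw(1, 2)
    print(len(raw), raw_to_ints(raw))
```

Fixed-width padding is what makes the signature length constant: a DER-encoded ECDSA signature varies by a few bytes depending on the leading bits of r and s, while the raw form is always 64 bytes for P-256, which is what test_signature_is_64_bytes asserts.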