feat(ota): harden OTA apply flow + bump firmware to 1.0.1

End-to-end OTA verified on dc-0002 after resolving server-side schema
mismatch (server now emits update/size/sig_b64 alongside existing fields).

Firmware changes:
- Bump FW_VERSION 1.0.0 -> 1.0.1
- Replace log_i/w/e with Serial.printf in ota_updater so output appears
  regardless of CORE_DEBUG_LEVEL (the prior macros were silent in prod)
- Log partition labels/offsets, per-128KB progress, computed sha256,
  HTTP errors with body, esp_ota_* errors by name, Content-Length vs
  expected size
- Check esp_ota_write return value (previously ignored -- silent
  partition corruption on write failure) and abort cleanly on error
- Reject update if expected_size > target partition size
- Serial.flush() + 500ms delay before esp_restart() so the final log
  line escapes the UART
- Boot-time: log running partition label/offset/state + FW_VERSION,
  and call esp_ota_mark_app_valid_cancel_rollback() on PENDING_VERIFY
  to prevent silent rollback after a successful OTA

Docs:
- Rewrite docs/ota-deployment-status.md to reflect resolved state,
  document the schema fix and the .bin/.sig co-deploy invariant
2026-05-14 12:21:52 -07:00
parent 5ec678dfa3
commit d2c2d97fb7
4 changed files with 232 additions and 32 deletions


@@ -0,0 +1,88 @@
# OTA Deployment — Status
## Current state (2026-05-14)
**End-to-end OTA verified working on `dc-0002`.** Device polled `engagement-api-1`, received a signed manifest, downloaded and verified firmware 1.0.1, set the alternate boot partition, rebooted, and came up reporting `fw=1.0.1`.
## What's deployed
- **Branch `feat/pull-ota-code-signing`** merged to `main` (13 commits, 17 new files, 936 LOC).
- **Signing toolchain**: `tools/gen_signing_key.py`, `tools/sign_firmware.py`, `tools/deploy_firmware.py`.
- **Firmware OTA library**: `firmware/lib/ota_updater/`.
- **Signing key**: `secrets/firmware_signing_key.pem` (gitignored). Public key committed at `firmware/lib/ota_updater/ota_pubkey.h`.
- **Live OTA handler**: served by `engagement-api-1` Docker service (source not in this repo). The stub at `server/ota_endpoint.py` is unwired and not the one responding to devices.
- **Configurable poll interval** via NVS key `ota_interval`. Provision with `flash_device.py --ota-interval-seconds N`. Minimum 10 s, default 21600 s (6 h).
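
A minimal sketch of the interval clamp described in the last bullet, assuming the NVS value arrives as an unsigned count of seconds. The helper name `ota_interval_ms` and the upper bound `kMaxOtaIntervalS` are hypothetical and not part of the firmware. The upper bound is there only so the seconds-to-milliseconds multiply cannot wrap a `uint32_t`.

```cpp
#include <cstdint>

// Minimum poll interval stated above (10 s).
constexpr uint32_t kMinOtaIntervalS = 10;
// Hypothetical upper bound: the largest number of seconds whose millisecond
// value still fits in uint32_t (about 49.7 days).
constexpr uint32_t kMaxOtaIntervalS = UINT32_MAX / 1000U;

// Convert a provisioned interval in seconds to a task delay in milliseconds,
// clamping values that are too small or would overflow.
uint32_t ota_interval_ms(uint32_t interval_s) {
    if (interval_s < kMinOtaIntervalS) interval_s = kMinOtaIntervalS;
    if (interval_s > kMaxOtaIntervalS) interval_s = kMaxOtaIntervalS;
    return interval_s * 1000U;
}
```

A single helper like this could serve both the OTA task and the `ota_updater_init` call, which currently each repeat the clamp expression.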
## Issues resolved
### 1. HMAC format mismatch (resolved 2026-05-13)
Firmware OTA updater was using `X-HMAC-Signature` header + `millis()`-derived timestamp; the reporter component used `X-Signature` + `time(nullptr)`. Server expected the reporter format. Fixed by aligning the OTA updater to the same canonical scheme as the reporter (`firmware/lib/ota_updater/ota_updater.cpp` `add_hmac_headers`).
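
The aligned `add_hmac_headers` also refuses to sign when the wall clock has not been set. The sketch below isolates that guard so it can be checked off-device. The threshold value comes from this commit, while the function name `timestamp_is_plausible` is an illustrative assumption.

```cpp
#include <cstdint>

// Threshold used by add_hmac_headers in this commit: epoch seconds below this
// value (mid-November 2023) are treated as "clock not synced yet".
constexpr uint32_t kMinPlausibleEpoch = 1700000000UL;

// Returns true when ts looks like real wall-clock time rather than an
// uptime-derived counter or an unset RTC.
bool timestamp_is_plausible(uint32_t ts) {
    return ts >= kMinPlausibleEpoch;
}
```

An uptime-derived value such as seconds since boot fails this check, which is the same class of timestamp the old OTA header scheme was sending.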
### 2. `/ota/check` JSON schema mismatch (resolved 2026-05-14)
Server was emitting `{update_available, sha256, url}` but firmware reads `{update, size, sig_b64}`. Device silently decided "up to date" every poll because `doc["update"]` defaulted to `false`. Fixed server-side: the `/ota/check` response now also includes the fields the firmware needs. Firmware schema remains the source of truth.
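
The failure mode here, where an absent key reads as `false`, can be sketched without the JSON library. In the example below a plain `std::map` stands in for the parsed document, and `read_flag`, `classify`, and `CheckResult` are hypothetical names. It is not the firmware's parser, only an illustration of separating "field missing" from "no update".

```cpp
#include <map>
#include <string>

// Stand-in for a parsed /ota/check response (assumption: boolean fields only).
using Manifest = std::map<std::string, bool>;

// Writes the value to *out and returns true only when the key is present.
bool read_flag(const Manifest& doc, const std::string& key, bool* out) {
    Manifest::const_iterator it = doc.find(key);
    if (it == doc.end()) return false;
    *out = it->second;
    return true;
}

enum class CheckResult { UpToDate, UpdateAvailable, SchemaMismatch };

// Treat a missing "update" field as a schema problem instead of "up to date".
CheckResult classify(const Manifest& doc) {
    bool update = false;
    if (!read_flag(doc, "update", &update)) return CheckResult::SchemaMismatch;
    return update ? CheckResult::UpdateAvailable : CheckResult::UpToDate;
}
```

With this shape, a response carrying only `update_available` would surface as a schema mismatch in the log rather than a quiet "up to date".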
### 3. Signed firmware artifact pipeline (resolved 2026-05-14)
Deploy flow now bumps `FW_VERSION` → builds → copies `.pio/build/timercam/firmware.bin` to `firmware-<version>.bin` → signs with `tools/sign_firmware.py` → SCPs both `.bin` and `.bin.sig` to `root@nginx:/root/engagement-api/firmware/`. Server team updates `firmware_releases.sha256` to match the new binary.
**Gotcha:** the `.bin` and its `.bin.sig` must always be deployed together. The signature is computed over the image bytes; replacing one file without the other leaves the server in an inconsistent state, and devices will reject the update with `SIGNATURE INVALID`.
## Hardening added this session
### Firmware logging (`firmware/lib/ota_updater/ota_updater.cpp`, `firmware/src/main.cpp`)
The previous `log_i/w/e` macros were silenced by the default `CORE_DEBUG_LEVEL`. Replaced with `Serial.printf` so output appears regardless of log level. Now logs at every step:
- `[OTA] task started, interval=N ms`
- Per-tick WiFi status
- Full check URL + HMAC header preview (device id, ts, sig prefix)
- HTTP response code + error body on non-200
- JSON parse errors
- "Up to date" decision
- Partition labels and offsets (running + target)
- Per-128 KB download progress
- Total bytes + elapsed ms
- Computed sha256 of the downloaded image (compare against server `X-SHA256`)
- Signature verify result
- `esp_ota_end` / `esp_ota_set_boot_partition` errors by name
- `Serial.flush()` followed by a 500 ms `delay()` before `esp_restart()`, so the final log line escapes the UART
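
The per-128 KB progress rule in the list above can be sketched as a small predicate. The condition mirrors the one in `download_and_flash`, while `should_log_progress` and `count_progress_logs` are hypothetical helpers added only so the behaviour can be exercised off-device.

```cpp
#include <cstddef>

constexpr size_t kProgressStep = 131072;  // 128 KB, as in download_and_flash

// Log when at least one step has accumulated since the last log line,
// or when the download has just completed.
bool should_log_progress(size_t written, size_t last_log_at, size_t expected_size) {
    return (written - last_log_at) >= kProgressStep || written == expected_size;
}

// Simulate a download in fixed-size chunks and count emitted progress lines.
size_t count_progress_logs(size_t expected_size, size_t chunk) {
    size_t written = 0, last_log_at = 0, logs = 0;
    while (written < expected_size) {
        size_t remaining = expected_size - written;
        written += remaining < chunk ? remaining : chunk;
        if (should_log_progress(written, last_log_at, expected_size)) {
            ++logs;
            last_log_at = written;
        }
    }
    return logs;
}
```

For a 300000-byte image read in 4096-byte chunks this produces three lines: one at 131072 bytes, one at 262144 bytes, and one at completion.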
### Boot-time partition state (`firmware/src/main.cpp`)
Logs `running partition '<label>' (off=0x…) state=N fw=…` at every boot. If `state == ESP_OTA_IMG_PENDING_VERIFY` (1), calls `esp_ota_mark_app_valid_cancel_rollback()` to prevent the bootloader from reverting on the next reboot. Harmless no-op when rollback isn't enabled, but eliminates a class of silent OTA failures.
### `esp_ota_write` return value (`firmware/lib/ota_updater/ota_updater.cpp`)
Previously ignored — a failed write would silently corrupt the new partition and the device would still try to boot from it. Now checked, aborts the OTA cleanly, and logs the failing offset.
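
The checked-write pattern can be sketched with the stream and the flash writer injected as callables, so the loop runs on a host machine. Everything named here (`copy_checked`, `FlashResult`, `simulate`) is hypothetical; the `write` callable stands in for `esp_ota_write(...) == ESP_OK`, and the real function additionally updates the SHA-256 context and aborts the OTA handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

struct FlashResult {
    bool ok;         // true only if every byte was read and written
    size_t written;  // bytes accepted before stopping (the failing offset)
};

// read:  fills buf with up to `want` bytes, returns the count (<= 0 = stream ended).
// write: returns true on success.
FlashResult copy_checked(size_t expected_size,
                         const std::function<int(uint8_t*, size_t)>& read,
                         const std::function<bool(const uint8_t*, size_t)>& write) {
    uint8_t buf[4096];
    size_t written = 0;
    while (written < expected_size) {
        size_t remaining = expected_size - written;
        size_t want = remaining < sizeof(buf) ? remaining : sizeof(buf);
        int got = read(buf, want);
        if (got <= 0) return {false, written};                  // truncated stream
        if (!write(buf, (size_t)got)) return {false, written};  // failed write: stop
        written += (size_t)got;
    }
    return {true, written};
}

// Usage example: a source of zero bytes and a fake flash that rejects any
// write that would take the total past `fail_at` bytes.
FlashResult simulate(size_t expected_size, size_t fail_at) {
    size_t accepted = 0;
    auto read = [](uint8_t* buf, size_t want) -> int {
        std::memset(buf, 0, want);
        return (int)want;
    };
    auto write = [&accepted, fail_at](const uint8_t*, size_t n) -> bool {
        if (accepted + n > fail_at) return false;
        accepted += n;
        return true;
    };
    return copy_checked(expected_size, read, write);
}
```

Returning the offset alongside the status is what lets the caller log where the write failed before aborting.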
### Partition size pre-check
Reject the update before `esp_ota_begin` if `expected_size > target->size`.
## Verifying a deployment
After a server push, watch the device's serial output on the next OTA tick:
```
[OTA] tick: WiFi connected, running check
[OTA] check → GET http://logs.research.bike:80/ota/check?version=X.Y.Z
[OTA] check response: HTTP 200
[OTA] Update: X.Y.Z → A.B.C (N bytes)
[OTA] running='app0' (off=…), target='app1' (off=…)
[OTA] progress: N/N bytes
[OTA] sha256(image)=<hex> ← must match server X-SHA256
[OTA] signature OK
[OTA] boot partition set to 'app1' — rebooting in 500 ms
```
Then on reboot:
```
[BOOT] running partition 'app1' (off=…) state=N fw=A.B.C
```
The `fw=A.B.C` line is the success signal — it reflects the `FW_VERSION` macro baked into the freshly-booted image, not just what the device claims to be running.
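
The `X.Y.Z → A.B.C` transition in the log depends on a version comparison; the comment in `firmware/include/version.h` says it uses `sscanf("%d.%d.%d")`. The sketch below shows one way such a comparison might look. The struct and function names are hypothetical and this is not the firmware's actual implementation.

```cpp
#include <cstdio>

struct Version {
    int maj, min, pat;
};

// Parse "MAJOR.MINOR.PATCH"; returns false unless all three fields are present.
bool parse_version(const char* s, Version* out) {
    return s != nullptr &&
           std::sscanf(s, "%d.%d.%d", &out->maj, &out->min, &out->pat) == 3;
}

// Returns <0 if a is older than b, 0 if equal, >0 if a is newer.
int compare_versions(const Version& a, const Version& b) {
    if (a.maj != b.maj) return a.maj < b.maj ? -1 : 1;
    if (a.min != b.min) return a.min < b.min ? -1 : 1;
    if (a.pat != b.pat) return a.pat < b.pat ? -1 : 1;
    return 0;
}

// True only when both strings parse and candidate is strictly newer.
bool is_newer(const char* candidate, const char* running) {
    Version c, r;
    if (!parse_version(candidate, &c) || !parse_version(running, &r)) return false;
    return compare_versions(c, r) > 0;
}
```

Comparing numerically rather than as strings matters once a field reaches two digits: `1.10.0` is newer than `1.9.9` even though it sorts lower lexically.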
## Quick reference
- Plan: `docs/superpowers/plans/2026-05-10-pull-ota-code-signing.md`
- Firmware version: `firmware/include/version.h`
- OTA library: `firmware/lib/ota_updater/`
- HMAC implementation: `firmware/lib/hmac/hmac.cpp`
- Provisioning tool: `tools/flash_device.py`
- Signing tools: `tools/gen_signing_key.py`, `tools/sign_firmware.py`, `tools/deploy_firmware.py`
- Server deploy path: `root@nginx:/root/engagement-api/firmware/` (per server team runbook)


@@ -1,3 +1,3 @@
#pragma once
// Format: MAJOR.MINOR.PATCH (SemVer) — OTA version compare uses sscanf("%d.%d.%d")
-#define FW_VERSION "1.0.0"
+#define FW_VERSION "1.0.1"


@@ -50,6 +50,7 @@ bool ota_verify_signature_with_key(const uint8_t hash32[32], const uint8_t sig64
#ifndef NATIVE_TEST
#include <Arduino.h>
#include <time.h>
#include <HTTPClient.h>
#include <WiFi.h>
#include <ArduinoJson.h>
@@ -80,117 +81,199 @@ void ota_updater_init(const char* server_base, const char* device_id,
}
static bool add_hmac_headers(HTTPClient& http, const char* method, const char* path) {
uint32_t ts = (uint32_t)(esp_timer_get_time() / 1000000ULL);
String sig = hmac_sign(s_hmac_secret, method, path, ts, "");
if (sig.isEmpty()) {
log_e("[OTA] HMAC sign failed");
uint32_t ts = (uint32_t)time(nullptr);
if (ts < 1700000000UL) {
Serial.printf("[OTA] Clock not synced (ts=%u) — skipping HMAC sign\n", (unsigned)ts);
return false;
}
http.addHeader("X-Device-Id", s_device_id);
http.addHeader("X-Timestamp", String(ts));
http.addHeader("X-HMAC-Signature", sig);
String sig = hmac_sign(s_hmac_secret, method, path, ts, "");
if (sig.isEmpty()) {
Serial.println("[OTA] HMAC sign failed");
return false;
}
Serial.printf("[OTA] HMAC headers: device=%s ts=%u sig=%s...\n",
s_device_id, (unsigned)ts, sig.substring(0, 12).c_str());
http.addHeader("X-Device-Id", s_device_id);
http.addHeader("X-Timestamp", String(ts));
http.addHeader("X-Signature", sig);
return true;
}
static bool download_and_flash(const char* fw_url, size_t expected_size,
const uint8_t sig64[64]) {
const esp_partition_t* target = esp_ota_get_next_update_partition(nullptr);
const esp_partition_t* running = esp_ota_get_running_partition();
const esp_partition_t* target = esp_ota_get_next_update_partition(nullptr);
if (!target) {
log_e("[OTA] No update partition found");
Serial.println("[OTA] No update partition found");
return false;
}
Serial.printf("[OTA] running='%s' (off=0x%x sz=0x%x), target='%s' (off=0x%x sz=0x%x)\n",
running ? running->label : "?",
running ? (unsigned)running->address : 0,
running ? (unsigned)running->size : 0,
target->label,
(unsigned)target->address, (unsigned)target->size);
if (expected_size > target->size) {
Serial.printf("[OTA] image (%zu) larger than partition (%u)\n",
expected_size, (unsigned)target->size);
return false;
}
esp_ota_handle_t handle;
if (esp_ota_begin(target, OTA_WITH_SEQUENTIAL_WRITES, &handle) != ESP_OK) {
log_e("[OTA] esp_ota_begin failed");
esp_err_t er = esp_ota_begin(target, OTA_WITH_SEQUENTIAL_WRITES, &handle);
if (er != ESP_OK) {
Serial.printf("[OTA] esp_ota_begin failed: %s\n", esp_err_to_name(er));
return false;
}
mbedtls_sha256_context sha_ctx;
mbedtls_sha256_init(&sha_ctx);
mbedtls_sha256_starts(&sha_ctx, 0); // 0 = SHA-256
mbedtls_sha256_starts(&sha_ctx, 0);
HTTPClient http;
http.begin(fw_url);
http.setTimeout(30000);
if (!add_hmac_headers(http, "GET", "/ota/firmware")) {
log_e("[OTA] Aborting firmware download: HMAC sign failed");
Serial.println("[OTA] Aborting firmware download: HMAC sign failed");
mbedtls_sha256_free(&sha_ctx);
esp_ota_abort(handle);
return false;
}
Serial.printf("[OTA] downloading firmware: %s\n", fw_url);
int code = http.GET();
Serial.printf("[OTA] firmware response: HTTP %d\n", code);
if (code != HTTP_CODE_OK) {
log_e("[OTA] Firmware fetch failed: HTTP %d", code);
String body = http.getString();
Serial.printf("[OTA] error body: %s\n", body.c_str());
http.end();
mbedtls_sha256_free(&sha_ctx);
esp_ota_abort(handle);
return false;
}
int content_len = http.getSize();
Serial.printf("[OTA] Content-Length: %d (expected %zu)\n",
content_len, expected_size);
WiFiClient* stream = http.getStreamPtr();
uint8_t buf[4096];
size_t written = 0;
size_t written = 0;
size_t last_log_at = 0;
bool write_failed = false;
uint32_t start_ms = millis();
while (written < expected_size) {
size_t want = min((size_t)sizeof(buf), expected_size - written);
int got = stream->readBytes(buf, want);
if (got <= 0) break;
esp_ota_write(handle, buf, (size_t)got);
if (got <= 0) {
Serial.printf("[OTA] stream ended at %zu/%zu bytes (readBytes=%d)\n",
written, expected_size, got);
break;
}
esp_err_t we = esp_ota_write(handle, buf, (size_t)got);
if (we != ESP_OK) {
Serial.printf("[OTA] esp_ota_write failed at offset %zu: %s\n",
written, esp_err_to_name(we));
write_failed = true;
break;
}
mbedtls_sha256_update(&sha_ctx, buf, (size_t)got);
written += (size_t)got;
if (written - last_log_at >= 131072 || written == expected_size) {
Serial.printf("[OTA] progress: %zu/%zu bytes\n", written, expected_size);
last_log_at = written;
}
}
uint32_t elapsed_ms = millis() - start_ms;
http.end();
Serial.printf("[OTA] download done: %zu bytes in %u ms\n",
written, (unsigned)elapsed_ms);
uint8_t hash[32];
mbedtls_sha256_finish(&sha_ctx, hash);
mbedtls_sha256_free(&sha_ctx);
char hex[65];
for (int i = 0; i < 32; i++) snprintf(hex + i*2, 3, "%02x", hash[i]);
Serial.printf("[OTA] sha256(image)=%s\n", hex);
if (write_failed) {
esp_ota_abort(handle);
return false;
}
if (written != expected_size) {
log_e("[OTA] Download truncated (%zu/%zu bytes)", written, expected_size);
Serial.printf("[OTA] Download truncated (%zu/%zu bytes)\n", written, expected_size);
esp_ota_abort(handle);
return false;
}
if (!ota_verify_signature_with_key(hash, sig64, kOtaPublicKey)) {
log_e("[OTA] SIGNATURE INVALID — staying on current firmware");
Serial.println("[OTA] SIGNATURE INVALID — staying on current firmware");
esp_ota_abort(handle);
return false;
}
Serial.println("[OTA] signature OK");
if (esp_ota_end(handle) != ESP_OK ||
esp_ota_set_boot_partition(target) != ESP_OK) {
log_e("[OTA] Commit failed");
esp_err_t end_err = esp_ota_end(handle);
if (end_err != ESP_OK) {
Serial.printf("[OTA] esp_ota_end failed: %s\n", esp_err_to_name(end_err));
return false;
}
esp_err_t boot_err = esp_ota_set_boot_partition(target);
if (boot_err != ESP_OK) {
Serial.printf("[OTA] esp_ota_set_boot_partition failed: %s\n",
esp_err_to_name(boot_err));
return false;
}
log_i("[OTA] Firmware verified and committed — rebooting");
Serial.printf("[OTA] boot partition set to '%s' — rebooting in 500 ms\n",
target->label);
Serial.flush();
delay(500);
esp_restart();
return true; // unreachable
}
bool ota_updater_check_and_apply() {
if (!s_server_base || !s_device_id || !s_hmac_secret) return false;
if (!s_server_base || !s_device_id || !s_hmac_secret) {
Serial.println("[OTA] check skipped: updater not initialized");
return false;
}
if (s_last_check_ms != 0 &&
(uint32_t)(millis() - s_last_check_ms) < s_interval_ms) {
return false;
}
s_last_check_ms = millis();
if (WiFi.status() != WL_CONNECTED) {
Serial.printf("[OTA] check skipped: WiFi not connected (status=%d)\n",
WiFi.status());
return false;
}
char check_path[128];
snprintf(check_path, sizeof(check_path), "/ota/check?version=%s", FW_VERSION);
char check_url[256];
snprintf(check_url, sizeof(check_url), "%s%s", s_server_base, check_path);
Serial.printf("[OTA] check → GET %s (fw=%s)\n", check_url, FW_VERSION);
HTTPClient http;
http.begin(check_url);
if (!http.begin(check_url)) {
Serial.println("[OTA] http.begin() failed");
return false;
}
if (!add_hmac_headers(http, "GET", check_path)) {
log_e("[OTA] Aborting check: HMAC sign failed");
Serial.println("[OTA] Aborting check: HMAC sign failed");
http.end();
return false;
}
int code = http.GET();
Serial.printf("[OTA] check response: HTTP %d\n", code);
if (code != HTTP_CODE_OK) {
log_w("[OTA] Check failed: HTTP %d", code);
String body = http.getString();
Serial.printf("[OTA] error body: %s\n", body.c_str());
http.end();
return false;
}
@@ -199,12 +282,12 @@ bool ota_updater_check_and_apply() {
DeserializationError err = deserializeJson(doc, http.getStream());
http.end();
if (err) {
log_w("[OTA] JSON parse error: %s", err.c_str());
Serial.printf("[OTA] JSON parse error: %s\n", err.c_str());
return false;
}
if (!doc["update"].as<bool>()) {
log_i("[OTA] Firmware up to date (%s)", FW_VERSION);
Serial.printf("[OTA] Firmware up to date (%s)\n", FW_VERSION);
return false;
}


@@ -14,6 +14,7 @@
#include "ota_updater.h"
#include <esp_system.h>
#include <esp_task_wdt.h>
#include <esp_ota_ops.h>
// LED on GPIO2 (TimerCamera-F built-in LED) — verify against board schematic
// Factory reset: hold GPIO37 (BOOT button) for 5 seconds
@@ -96,11 +97,18 @@ static void task_camera(void*) {
}
static void ota_task(void*) {
// Min 10s to avoid pathological fast loops if NVS is corrupted
uint32_t interval_ms = g_cfg.ota_interval_s < 10 ? 10000UL : g_cfg.ota_interval_s * 1000UL;
Serial.printf("[OTA] task started, interval=%u ms\n", (unsigned)interval_ms);
for (;;) {
if (WiFi.isConnected()) {
Serial.println("[OTA] tick: WiFi connected, running check");
ota_updater_check_and_apply();
} else {
Serial.printf("[OTA] tick: WiFi not connected (status=%d), skipping\n",
WiFi.status());
}
vTaskDelay(pdMS_TO_TICKS(21600000UL)); // 6 hours
vTaskDelay(pdMS_TO_TICKS(interval_ms));
}
}
@@ -186,6 +194,27 @@ void setup() {
pinMode(BUTTON_PIN, INPUT_PULLUP);
led_set(true); // on = booting
// OTA rollback guard: if booted from a freshly-flashed OTA image while the
// bootloader has rollback enabled, the image is PENDING_VERIFY and will be
// rolled back on the next reboot unless we mark it valid. Harmless no-op
// when rollback is disabled. Always log the running partition + state so
// we can see post-OTA boot behavior on serial.
{
const esp_partition_t* running = esp_ota_get_running_partition();
esp_ota_img_states_t state = ESP_OTA_IMG_UNDEFINED;
if (running) {
esp_ota_get_state_partition(running, &state);
Serial.printf("[BOOT] running partition '%s' (off=0x%x) state=%d fw=%s\n",
running->label, (unsigned)running->address,
(int)state, FW_VERSION);
}
if (state == ESP_OTA_IMG_PENDING_VERIFY) {
esp_err_t e = esp_ota_mark_app_valid_cancel_rollback();
Serial.printf("[BOOT] esp_ota_mark_app_valid_cancel_rollback: %s\n",
esp_err_to_name(e));
}
}
event_log_init();
event_log_write(EVT_BOOT, (uint16_t)esp_reset_reason(), 0);
@@ -287,7 +316,7 @@ void setup() {
s_ota_base.c_str(),
g_cfg.device_id.c_str(),
g_cfg.hmac_secret.c_str(),
21600000UL
g_cfg.ota_interval_s < 10 ? 10000UL : g_cfg.ota_interval_s * 1000UL
);
xTaskCreate(ota_task, "ota", 8192, nullptr, 1, nullptr);
}