kassam 2026-04-21 10:33:13 +04:00
parent f15bb48935
commit f9ad4aedd5
2 changed files with 505 additions and 3 deletions

README.md

@ -2,7 +2,8 @@
Real-time PPE compliance (helmet, vest, boots, gloves, goggles) using YOLO11n.
Runs on a Unitree G1 humanoid with an Intel RealSense D435i. On UNSAFE the
robot announces the missing gear via the onboard TTS and plays a recorded
arm-motion trajectory from `assets/motions/`.
## Layout
@ -18,6 +19,9 @@ Saqr/
│ ├── saqr-bridge.service # systemd unit (wraps start_saqr.sh)
│ └── deploy.sh # push code dev machine → robot
├── config/ # logging.json, core_config.json, robot_config.json
├── assets/
│ ├── audio/ # pre-recorded WAV clips (16kHz mono int16, per category)
│ └── motions/ # teach-and-replay arm trajectories (*.jsonl)
├── data/ # dataset/, models/ (gitignored)
├── runtime/ # captures/, runs/ (gitignored)
├── logs/ # per-module .log files (gitignored)
@ -46,8 +50,91 @@ Then on the wireless remote:
- **R2 + X** → start detection
- **R2 + Y** → stop detection
See [docs/DEPLOY.md](docs/DEPLOY.md) for first-time deploy,
[docs/start.md](docs/start.md) for the systemd workflow, and
[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the component map.
## Usage
**One-time setup on the robot (already done, documented for rebuilds):**
```bash
# 1. Clone into ~/Saqr
git clone <repo> ~/Saqr && cd ~/Saqr
# 2. Create the conda env and install in editable mode
conda create -n saqr python=3.8 -y
conda activate saqr
pip install -e .
pip install "cyclonedds==0.10.2" # exact version required by unitree_sdk2py
# 3. Drop the model at data/models/saqr_best.pt
# (trained on the dev machine; copy via scp or scripts/deploy.sh)
# 4. Install the systemd unit for auto-start on boot
sudo cp scripts/saqr-bridge.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable saqr-bridge
```
**Normal day-to-day operation on the robot:**
```bash
# Foreground (see logs live, Ctrl-C to stop)
CONDA_ENV=saqr ~/Saqr/scripts/start_saqr.sh
# Background via systemd (survives reboots)
sudo systemctl start saqr-bridge
journalctl -u saqr-bridge -f # follow logs
sudo systemctl stop saqr-bridge
```
**Controlling detection from the wireless remote:**
1. Start the bridge (above). It announces "Saqr is running. Press R2 plus X
to start."
2. Press **R2 + X** → the bridge spawns `apps.saqr_cli` as a subprocess,
opens the RealSense camera, loads the model, and begins publishing
events on stdout.
3. Walk into frame. On UNSAFE (missing helmet or vest) the robot speaks
what's missing and plays the recorded arm gesture.
4. Press **R2 + Y** → the bridge sends SIGINT to the subprocess, the robot
announces "Saqr deactivated.", and the bridge goes back to idle.
5. You can start/stop repeatedly without restarting the bridge.
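The start/stop cycle above can be pictured as a small wrapper around `subprocess.Popen`. This is a minimal sketch, not the code in `robot/bridge.py`: the class name `DetectionProcess`, the 5 s grace period and the `kill()` fallback are assumptions, and a sleeping Python child stands in for `python -m apps.saqr_cli`.

```python
import signal
import subprocess
import sys
from typing import List, Optional


class DetectionProcess:
    """Start on R2+X, stop on R2+Y, restartable without restarting the bridge."""

    def __init__(self, cmd: List[str]):
        self.cmd = cmd
        self.proc: Optional[subprocess.Popen] = None

    def running(self) -> bool:
        return self.proc is not None and self.proc.poll() is None

    def start(self) -> None:
        if self.running():
            return  # ignore a repeated R2+X while detection is already live
        # Text-mode, line-buffered stdout so a reader thread can parse event lines.
        self.proc = subprocess.Popen(self.cmd, stdout=subprocess.PIPE,
                                     text=True, bufsize=1)

    def stop(self, grace_s: float = 5.0) -> Optional[int]:
        if not self.running():
            return None
        self.proc.send_signal(signal.SIGINT)  # let the CLI release camera and stream
        try:
            rc = self.proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            self.proc.kill()  # fallback if SIGINT is ignored
            rc = self.proc.wait()
        if self.proc.stdout is not None:
            self.proc.stdout.close()
        return rc


if __name__ == "__main__":
    # Stand-in child: sleeps instead of running the detector.
    det = DetectionProcess([sys.executable, "-c", "import time; time.sleep(30)"])
    det.start()
    print("running after start:", det.running())
    print("exit code after stop:", det.stop())
```

Because `start()` is a no-op while a child is alive and `stop()` returns `None` when nothing is running, repeated button presses are harmless.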
**Dev-machine workflow (push code + weights to the robot):**
```bash
# From the dev machine, from the Saqr/ dir
scripts/deploy.sh # rsync + pip install -e . in the robot's conda env
# Or a single file:
scp robot/robot_controller.py unitree@192.168.123.164:~/Saqr/robot/
ssh unitree@192.168.123.164 'sudo systemctl restart saqr-bridge'
```
**Streaming view (optional, for debugging from a browser):**
With the subprocess running, open `http://<robot-ip>:8080` — MJPEG stream
of the annotated frames. Port and interface are set in
[config/core_config.json](config/core_config.json) under `stream`.
**Log locations:**
- Bridge stdout/stderr: `journalctl -u saqr-bridge` (systemd) or the terminal.
- Saqr CLI: streams to the bridge and appears inline in the same log.
- Per-module files: `logs/*.log` rotate on size; tail with `tail -f
logs/saqr.log`.
- Event CSVs: `runtime/runs/<timestamp>/events.csv`.
**Troubleshooting quick-refs:**
| Symptom | Likely cause | Fix |
|---|---|---|
| "Camera not connected" announcement | RealSense not on USB | Replug, `lsusb \| grep Intel`, rerun |
| R2+X does nothing | LowState not subscribed | Check `BRIDGE … Subscribed to rt/lowstate` in the log; network down? |
| `rc=3104` warnings | Firmware audio/arm contention | Expected — see [Known limitations](#known-limitations) |
| `motion files missing` | `assets/motions/*.jsonl` absent on robot | `scripts/deploy.sh` or scp the directory |
| `ModuleNotFoundError: cyclonedds` | Wrong version | `pip install "cyclonedds==0.10.2"` in the `saqr` env |
## Deploy
@ -96,3 +183,97 @@ part of the normal run flow:
python -m apps.train_cli --epochs 100 --batch 16
# best weights land at data/models/saqr_best.pt; deploy with scripts/deploy.sh
```
## Audio pipeline
On UNSAFE, [robot/robot_controller.py](robot/robot_controller.py) queues one
announcement to a worker thread which routes it per `tts.mode`:
- `tts_only` → firmware `TtsMaker(text, speaker_id=2)` (current default).
- `recorded_only` → WAV lookup in `assets/audio/<category>/<key>.wav`; no
fallback to TtsMaker.
- `recorded_or_tts` → WAV if available, else TtsMaker.
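A minimal sketch of that routing decision, assuming only the `assets/audio/<category>/<key>.wav` layout described above. The function name `route_announcement` and its `("wav" | "tts" | "skip", payload)` return value are hypothetical; the real controller also has to queue and dispatch the result.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

VALID_MODES = ("tts_only", "recorded_only", "recorded_or_tts")


def route_announcement(mode: str, text: str, category: str, key: str,
                       audio_root: Path) -> Tuple[str, Optional[Union[str, Path]]]:
    """Decide how one announcement is voiced: ("tts", text), ("wav", path) or ("skip", None)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tts.mode: {mode!r}")
    if mode == "tts_only":
        return "tts", text  # firmware TtsMaker, no WAV lookup
    wav = audio_root / category / f"{key}.wav"
    if wav.is_file():
        return "wav", wav
    if mode == "recorded_or_tts":
        return "tts", text  # clip missing: fall back to TtsMaker
    return "skip", None  # recorded_only never falls back
```

Returning `"skip"` instead of raising keeps a missing clip in `recorded_only` mode from crashing the worker; whether silence is acceptable there is a deployment decision.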
Each TtsMaker call is preceded by `AUDIO_STOP_PLAY` + 300 ms settle (the
reset pattern from `G1_Lootah/Audio_Recorder/voice_note.txt`) and retried
once on `rc != 0`. Audio RPC timeout is capped at 3 s so a stuck firmware
call fails fast instead of blocking the worker for the bridge-wide 10 s.
The Python SDK's `TtsMaker` has a broken `tts_index` counter that never
increments — the controller bypasses it and calls the underlying RPC with
a real index.
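The reset, settle, retry-once sequence and the index workaround can be sketched as below. `stop_play` and `tts_rpc` are hypothetical injected callables standing in for the `AUDIO_STOP_PLAY` and `TtsMaker` RPCs (the real SDK call signatures are not reproduced here), and issuing the reset before the retry as well as before the first attempt is an assumption. The 3 s RPC timeout lives on the client and is not modelled.

```python
import itertools
import time
from typing import Callable


class TtsDispatcher:
    """Reset -> settle -> call -> retry once, with a real incrementing index.

    `stop_play()` and `tts_rpc(text, index, speaker_id)` are stand-ins that
    return an int rc, where 0 means success.
    """

    def __init__(self, stop_play: Callable[[], int],
                 tts_rpc: Callable[[str, int, int], int],
                 settle_s: float = 0.3, speaker_id: int = 2):
        self.stop_play = stop_play
        self.tts_rpc = tts_rpc
        self.settle_s = settle_s
        self.speaker_id = speaker_id
        self._index = itertools.count(1)  # increments on every call, unlike the SDK counter

    def say(self, text: str) -> int:
        rc = -1
        for _attempt in range(2):  # first try plus one retry
            self.stop_play()           # clear any stuck playback
            time.sleep(self.settle_s)  # give the firmware time to settle
            rc = self.tts_rpc(text, next(self._index), self.speaker_id)
            if rc == 0:
                break
        return rc
```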
## Known limitations
**G1 firmware serialises the audio channel and `rt/arm_sdk`.** When the
low-level arm replayer is publishing joint commands at 60 Hz, every audio
RPC (`TtsMaker`, `PlayStream`, even `AUDIO_STOP_PLAY`) blocks until the
RPC timeout and returns `rc=3104`. Measured audio dropout during parallel
arm+audio runs: ~50 % of alerts. This is a firmware-level constraint on
the voice service, not anything the client can tune its way out of —
verified by testing retries, `SetVolume` warmups, and `AUDIO_STOP_PLAY`
resets, all of which hang under the same condition.
The reliable path is **serial**: `speak()` → `wait_for_audio_done()` →
`reject()` (reorder the calls in `handle_line` in `bridge.py` to get this
behaviour). Parallel dispatch is kept behind `audio_lead_s` for
latency-sensitive demos where occasional dropped audio is acceptable.
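The serial ordering can be sketched with a counter-backed `threading.Event` playing the role of `_audio_idle`. This is a toy model under stated assumptions: `SerialAlert`, `play_audio` and `play_arm` are hypothetical names, and the real `RobotController` carries more state (freshness policy, bounded queue, cancellation).

```python
import queue
import threading
from typing import Callable


class SerialAlert:
    """speak() -> wait_for_audio_done() -> arm, with stand-in audio/arm callables."""

    def __init__(self, play_audio: Callable[[str], None], play_arm: Callable[[], None]):
        self.play_audio = play_audio
        self.play_arm = play_arm
        self._q: "queue.Queue[str]" = queue.Queue()
        self._lock = threading.Lock()
        self._pending = 0  # queued plus in-flight announcements
        self._audio_idle = threading.Event()
        self._audio_idle.set()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        while True:
            text = self._q.get()
            try:
                self.play_audio(text)  # stands in for the TtsMaker dispatch
            except Exception as exc:  # a failed dispatch must not kill the worker
                print(f"audio dispatch failed: {exc!r}")
            finally:
                with self._lock:
                    self._pending -= 1
                    if self._pending == 0:
                        self._audio_idle.set()  # queue empty AND nothing in flight

    def speak(self, text: str) -> None:
        with self._lock:
            self._pending += 1
            self._audio_idle.clear()
        self._q.put(text)

    def wait_for_audio_done(self, timeout: float) -> bool:
        return self._audio_idle.wait(timeout)

    def alert(self, text: str, audio_timeout: float = 10.0) -> bool:
        self.speak(text)
        # The arm starts only after audio drains (or the wait gives up), so
        # rt/arm_sdk is never publishing while the audio RPC is in flight.
        done = self.wait_for_audio_done(audio_timeout)
        self.play_arm()
        return done
```

The counter, rather than a bare `queue.empty()` check, avoids a race where a new `speak()` lands between the worker's emptiness check and its `set()`.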
## Current status (2026-04-21)
Snapshot of what is and isn't working, so whoever picks this up next
doesn't have to rediscover it.
**Working:**
- End-to-end flow: RealSense → YOLO11n → tracker → compliance →
UNSAFE/SAFE/PARTIAL events → robot voice + recorded arm motion.
- Wireless-remote gated start/stop (R2+X / R2+Y) via `rt/lowstate`.
- Recorded arm replay from [assets/motions/adnoc1.jsonl](assets/motions/)
via [robot/arm_replay.py](robot/arm_replay.py) — custom teach-and-replay
trajectory instead of the canned `reject` action.
- Recorded WAV library under [assets/audio/](assets/audio/) with a
`recorded_or_tts` mode that can be toggled via `tts.mode` in config.
- TtsMaker path hardened: `AUDIO_STOP_PLAY` reset + retry + 3 s RPC
timeout + index-bug workaround (see [Audio pipeline](#audio-pipeline)).
- MJPEG stream on port 8080 for remote visual QA.
- Fully config-driven — no hard-coded IPs, paths, or thresholds in code
beyond `SAQR_ROOT` fallback and firmware API ids.
- systemd unit + `start_saqr.sh` = single entry point, survives reboots.
**Known broken / in tension:**
- **Audio ↔ arm firmware contention** (see [Known limitations](#known-limitations)).
Current default is parallel via `audio_lead_s=0.3`, and ~50 % of UNSAFE
alerts drop their audio while `rt/arm_sdk` is publishing. Mitigations were
tested and **none of them fix the root cause** (it's a firmware limitation).
The deterministic fix is to serialise audio-then-arm; pending decision.
- **Track-ID churn** — each re-acquisition of the same person generates a
new track id, which bypasses the per-`(track_id, status)` cooldown and
triggers a fresh UNSAFE alert. Not yet quantified, but visible in
rapid-fire `ID 0001 → 0002 → 0003` sequences for what appears to be
one worker.
**Configuration right now:**
- `tts.mode = "tts_only"` (firmware TTS, no recorded WAVs used).
- `motion.enabled = true`, `motion.unsafe_file = "adnoc1.jsonl"` — the
custom 7-second trajectory with 60 frames of smooth move-in and 180
frames of smooth return-to-home.
- `bridge.audio_lead_s = 0.3` — parallel mode.
- `bridge.cooldown = 8.0` — per (track_id, status).
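To see why track-ID churn defeats the cooldown, here is a minimal sketch of a cooldown keyed on `(track_id, status)`. The `Cooldown` class and its injectable clock are illustrative, not the bridge's implementation; changing the key to a per-identity value is the fix proposed in the next-session picks.

```python
import time
from typing import Callable, Dict, Hashable


class Cooldown:
    """Suppress repeat alerts for the same key within `seconds`."""

    def __init__(self, seconds: float = 8.0,
                 clock: Callable[[], float] = time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self._last: Dict[Hashable, float] = {}

    def allow(self, key: Hashable) -> bool:
        now = self.clock()
        last = self._last.get(key)
        if last is not None and now - last < self.seconds:
            return False  # same key fired recently: stay quiet
        self._last[key] = now
        return True
```

With key `(42, "UNSAFE")` the second alert within 8 s is suppressed, but if the tracker re-acquires the same worker as ID 43, `(43, "UNSAFE")` is a fresh key and fires immediately.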
**Next-session picks (prioritised):**
1. Decide parallel-vs-serial for audio+arm and wire it in. Serial is
safer for safety-critical alerts.
2. Stabilise track-IDs or make the cooldown per-identity instead of
per-track so one worker doesn't trigger N alerts.
3. Add a boot-time self-check (camera reachable, DDS iface up, model
file present, motion JSONL parseable) so failures surface before the
first event rather than during it.
4. Rotate `logs/` — currently grows unbounded.
5. Add a `logs/events.jsonl` audit trail (one structured line per
UNSAFE/SAFE event) for post-hoc compliance review.
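A boot-time self-check (item 3) could start as small as the function below. It is a hypothetical sketch: it checks the model path and motion files named in this README and reads the Linux `/sys/class/net/<iface>/operstate` file for the DDS interface, but it does not probe the RealSense camera, and the `self_check` name and message strings are invented for illustration.

```python
import json
from pathlib import Path
from typing import List, Sequence


def self_check(root: Path, iface: str = "eth0",
               motions: Sequence[str] = ("adnoc1.jsonl", "arm_home.jsonl")) -> List[str]:
    """Return human-readable problems; an empty list means ready to start."""
    problems: List[str] = []

    model = root / "data" / "models" / "saqr_best.pt"
    if not model.is_file():
        problems.append(f"model missing: {model}")

    for name in motions:
        path = root / "assets" / "motions" / name
        try:
            with path.open() as fh:
                first = json.loads(fh.readline())  # cheap: parse the first frame only
            if "q" not in first:
                raise KeyError("q")
        except (OSError, ValueError, KeyError, TypeError) as exc:
            problems.append(f"motion {name} unreadable: {exc!r}")

    # Linux sysfs; some drivers report "unknown" even when the link is usable.
    state = Path("/sys/class/net") / iface / "operstate"
    if not state.is_file() or state.read_text().strip() not in ("up", "unknown"):
        problems.append(f"network interface {iface} not up")
    return problems
```

Running it before the "Saqr is running" announcement would let the bridge speak a specific failure instead of failing on the first event.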
See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full component
map and data flow.

docs/ARCHITECTURE.md

@ -0,0 +1,321 @@
# Saqr Architecture
This document describes how Saqr is built: the components, how they
communicate, how data flows through the system, the concurrency model,
and the constraints imposed by the G1 firmware. It's meant as a
"start-here" for anyone extending the code or debugging an incident.
## 1. High-level system map
```
┌─────────────────────────── Dev machine ────────────────────────────┐
│ │
│ apps.train_cli ──(saqr_best.pt)──► data/models/ │
│ gui/ (optional: PySide6 desktop QA tool, not used in production) │
│ │
│ scripts/deploy.sh ──(rsync + pip install -e .)──► │
│ │
└────────────────────────────────┬───────────────────────────────────┘
robot_ip (eth0 / 192.168.123.164)
┌────────────────────────────────▼───────────────────────────────────┐
│ Unitree G1 (Jetson Orin NX) │
│ │
│ scripts/start_saqr.sh │
│ │ │
│ ▼ │
│ ┌─────────────── robot/bridge.py (main process) ─────────────┐ │
│ │ │ │
│ │ RobotController ─── G1ArmActionClient ─┐ │ │
│ │ │ ├── DDS (eth0) ─┐ │ │
│ │ ├── AudioClient ────────────────┤ │ │ │
│ │ ├── LowStateHub ← rt/lowstate ─┤ │ │ │
│ │ ├── ArmReplayer ── rt/arm_sdk ─┘ │ │ │
│ │ └── TtsWorker thread (audio queue) │ │ │
│ │ │ │ │
│ │ TriggerLoop thread ── R2+X / R2+Y polling ─────────────┘ │ │
│ │ │ │
│ │ StdoutReader thread ── parses event lines from subprocess ─┤ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ subprocess.Popen │
│ │ │
│ ┌────────────────── apps/saqr_cli.py ──────────────────────┐ │
│ │ │ │
│ │ Camera (RealSense) → YOLO11n → Tracker → Compliance │ │
│ │ │ │ │
│ │ └── emit_event() ──► stdout │
│ │ MJPEG stream on :8080 (optional) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────┘
Wireless remote (R2+X / R2+Y)
G1 speaker (audio out)
G1 arms (motion out)
```
## 2. Components
### 2.1 `core/` — detection & reasoning (shared library)
Pure-Python, no Unitree SDK dependency. Used by `apps/` and (indirectly)
`robot/bridge.py` via subprocess.
- `core/camera.py` — RealSense / webcam / video-file source. Yields
`(frame, depth)` pairs.
- `core/model.py` — YOLO11n wrapper, class filtering, confidence
thresholding, batched inference.
- `core/tracker.py` — ByteTrack-style persistent IDs across frames.
- `core/compliance.py` — binary SAFE / UNSAFE classifier. Reads
`REQUIRED_PPE` from config; `split_wearing_missing()` handles the
`no-X` class convention.
- `core/events.py` — event emission with structured format:
`ID NNNN | EVENT | STATUS | wearing: … | missing: … | unknown: …`
- `core/stationary.py` — "is this person standing still long enough to
warrant an alert?" heuristic (pixel-level centroid stability).
- `core/drawing.py` — overlay boxes + labels on frames for the MJPEG
stream.
- `core/paths.py` — resolves `PROJECT_ROOT` from the
`SAQR_ROOT` env var or by walking up from `__file__`.
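The `no-X` class convention used by `split_wearing_missing()` can be illustrated as follows. The signature here is assumed, as is the rule that a `no-X` box wins when both `X` and `no-X` are detected; the `is_unsafe` helper only checks whether a required item is missing and does not model PARTIAL.

```python
from typing import Dict, Iterable, List, Sequence

PPE_ITEMS = ("helmet", "vest", "boots", "gloves", "goggles")


def split_wearing_missing(labels: Iterable[str],
                          items: Sequence[str] = PPE_ITEMS) -> Dict[str, List[str]]:
    """Bucket detected class names into wearing / missing / unknown.

    "helmet" means worn, "no-helmet" means visibly absent, and an item with
    neither box is unknown (occluded, out of frame, or simply not detected).
    """
    seen = set(labels)
    out: Dict[str, List[str]] = {"wearing": [], "missing": [], "unknown": []}
    for item in items:
        if f"no-{item}" in seen:  # conservative: "no-X" wins over "X"
            out["missing"].append(item)
        elif item in seen:
            out["wearing"].append(item)
        else:
            out["unknown"].append(item)
    return out


def is_unsafe(buckets: Dict[str, List[str]], required: Iterable[str]) -> bool:
    return any(item in buckets["missing"] for item in required)
```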
### 2.2 `apps/` — executable entry points
- `apps/saqr_cli.py` — the detection subprocess launched by the bridge.
Reads config, opens the camera, runs the pipeline, prints events on
stdout, serves MJPEG on `:8080`.
- `apps/detect_cli.py` — stand-alone detector for testing on clips.
- `apps/train_cli.py` — dev-machine training wrapper around
`ultralytics`.
- `apps/manager_cli.py` — dataset tooling (class rebalancing, splits).
- `apps/view_stream.py` — OpenCV viewer attached to the MJPEG stream.
### 2.3 `robot/` — G1 integration (only runs on the robot)
- `robot/bridge.py` — orchestrator. Owns `RobotController`, spawns
`apps.saqr_cli` as a subprocess, parses its stdout, routes UNSAFE/SAFE
events into robot actions. Also the systemd entry point.
- `robot/robot_controller.py` — owns all the G1 clients: arm action,
audio, lowstate. Runs a `TtsWorker` background thread with a freshness
policy (new announcement cancels and replaces the in-flight one).
- `robot/arm_replay.py` — low-level `rt/arm_sdk` publisher that plays a
recorded JSONL trajectory at 60 Hz. Used when `motion.enabled=true`.
- `robot/audio_player.py`: `PlayStream`-based WAV player with chunk
retries for firmware `rc=3104` and a cancel flag. Used when
`tts.mode="recorded_only"` or `"recorded_or_tts"`.
- `robot/controller.py`: `LowStateHub`, which decodes the wireless remote
(R2+X / R2+Y combos) from `rt/lowstate`.
### 2.4 `utils/` — shared helpers
- `utils/config.py`: `load_config(name)` reads `config/<name>_config.json`,
caches it, and applies env-var overrides.
- `utils/logger.py` — rotating file logger + console mirror.
### 2.5 `config/` — runtime tunables
- `core_config.json` — detection thresholds, tracker params, camera
source, stream port, training hyperparams, compliance rules, capture.
- `robot_config.json` — bridge timing, TTS mode + phrases, arm action
names, recorded-motion filenames, deploy target IP, start_saqr
defaults.
- `logging.json` — log level per module.
Precedence: **env var > config JSON > code fallback**.
### 2.6 `assets/` — runtime artefacts (in-repo)
- `assets/audio/fixed/*.wav` — generic phrases (ready, safe,
unsafe_generic, deactivated, no_camera).
- `assets/audio/unsafe_missing/*.wav` — per missing-PPE combo (helmet,
vest, helmet_vest).
- `assets/motions/adnoc1.jsonl` — the UNSAFE arm gesture (7 s recorded
trajectory).
- `assets/motions/arm_home.jsonl` — the home pose used to smoothly
return the arm at the end of a replay.
All WAVs are 16 kHz mono int16 — required by the G1 audio channel.
Motion JSONL is `{"t": seconds, "q": [29 floats]}` per line, 60 Hz.
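A loader that enforces that format might look like this. `load_motion` and `duration_s` are hypothetical names, and the strictly increasing timestamp check is an assumption about what a well-formed recording looks like; at 60 Hz consecutive frames should be about 1/60 s apart.

```python
import json
from typing import Iterable, List, Tuple

N_JOINTS = 29   # length of "q" per frame, per the format above
RATE_HZ = 60.0  # recording / replay rate

Frame = Tuple[float, List[float]]


def load_motion(lines: Iterable[str]) -> List[Frame]:
    """Parse JSONL motion lines into (t, q) frames, rejecting malformed ones."""
    frames: List[Frame] = []
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        rec = json.loads(line)
        t = float(rec["t"])
        q = [float(x) for x in rec["q"]]
        if len(q) != N_JOINTS:
            raise ValueError(f"line {lineno}: expected {N_JOINTS} joints, got {len(q)}")
        if frames and t <= frames[-1][0]:
            raise ValueError(f"line {lineno}: timestamps must strictly increase")
        frames.append((t, q))
    return frames


def duration_s(frames: List[Frame]) -> float:
    return frames[-1][0] - frames[0][0] if frames else 0.0
```

Because it accepts any iterable of strings, it works directly on an open file: `with open("assets/motions/adnoc1.jsonl") as fh: frames = load_motion(fh)`.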
## 3. Data pipeline
```
┌──────────────┐ ┌─────────────┐ ┌────────────┐
│ RealSense │────►│ YOLO11n │────►│ Tracker │
│ 640x480@30 │ │ saqr_best │ │ ByteTrack │
└──────────────┘ └─────────────┘ └─────┬──────┘
┌──────────────────┐
│ Compliance │
│ REQUIRED=[hv] │
└────────┬─────────┘
SAFE / UNSAFE / PARTIAL
┌──────────────────┐
│ Stationary check │
│ (centroid drift) │
└────────┬─────────┘
emit_event() ──► stdout
bridge.handle_line() (reader thread)
┌─────────────────────────┼─────────────────────┐
▼ ▼ ▼
RobotController ArmReplayer (log only)
.speak(text,cat,key) .play(motion, home)
│ │
▼ ▼
TtsWorker thread rt/arm_sdk @ 60 Hz
│ │
▼ │
AudioClient.TtsMaker │
│ │
└─────── G1 firmware ◄────┘
Speaker + arms
```
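The stationary gate in the diagram can be sketched as a per-track sliding window of box centroids. The 15-frame window matches the event lifecycle in this document; the 20 px drift threshold, the comparison against the oldest centroid in the window, and the `StationaryCheck` name are assumptions.

```python
import math
from collections import deque
from typing import Deque, Tuple


class StationaryCheck:
    """One instance per track_id: green-light once the centroid has stayed
    within `max_drift_px` of the oldest centroid in a `min_frames` window."""

    def __init__(self, min_frames: int = 15, max_drift_px: float = 20.0):
        self.min_frames = min_frames
        self.max_drift_px = max_drift_px
        self._window: Deque[Tuple[float, float]] = deque(maxlen=min_frames)

    def update(self, box: Tuple[float, float, float, float]) -> bool:
        x1, y1, x2, y2 = box
        c = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
        if self._window and math.dist(c, self._window[0]) > self.max_drift_px:
            self._window.clear()  # moved too far: restart the count
        self._window.append(c)
        return len(self._window) == self.min_frames
```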
### Event lifecycle (single UNSAFE example)
1. Frame `N`: YOLO detects a person with `no-helmet` and `no-vest` boxes.
2. Tracker assigns (or keeps) `track_id=42`.
3. Compliance → UNSAFE, `wearing=[], missing=[helmet, vest]`.
4. Stationary check: same centroid for ≥15 frames → green-lit.
5. `core.events.emit_event()` prints to stdout:
`[HH:MM:SS.fff] ID 0042 | NEW | UNSAFE | wearing: none | missing: helmet, vest | unknown: gloves, goggles, boots`
6. `bridge._read_stdout` parses, `handle_line` matches `EVENT_RE`.
7. Cooldown check on `(42, UNSAFE)` passes → fire actions.
8. `robot.speak("Please stop. Wear your proper safety equipment. You are missing helmet and vest.", category="unsafe_missing", key="helmet_vest")`.
9. `time.sleep(audio_lead_s)` with `audio_lead_s=0.3`: hand the audio worker a head start.
10. `robot.reject(release_after=0.5)` → `ArmReplayer.play(adnoc1.jsonl, arm_home.jsonl)`: blocks the main thread for ~12 s while publishing joint commands.
11. Audio worker picks up the speak request, resets with
`AUDIO_STOP_PLAY`, calls `TtsMaker`, retries once if `rc!=0`.
12. Bridge returns to idle; next event at `(42, UNSAFE)` is ignored for
8 s per the cooldown.
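Steps 5 and 6 hinge on a fixed line format. Below is a hypothetical reconstruction of an `EVENT_RE`-style parser for the format printed by `core.events.emit_event()`; the actual regex in `bridge.py` may differ, and treating `none` as an empty list is an assumption.

```python
import re
from typing import Dict, List, Optional

EVENT_RE = re.compile(
    r"^\[(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"ID\s+(?P<track_id>\d+)\s*\|\s*(?P<event>\w+)\s*\|\s*"
    r"(?P<status>SAFE|UNSAFE|PARTIAL)\s*\|\s*"
    r"wearing:\s*(?P<wearing>[^|]*?)\s*\|\s*"
    r"missing:\s*(?P<missing>[^|]*?)\s*\|\s*"
    r"unknown:\s*(?P<unknown>.*?)\s*$"
)


def _items(field: str) -> List[str]:
    return [] if field in ("", "none") else [s.strip() for s in field.split(",")]


def parse_event(line: str) -> Optional[Dict[str, object]]:
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None  # not an event line (e.g. a log message): just pass it through
    return {
        "ts": m["ts"],
        "track_id": int(m["track_id"]),
        "event": m["event"],
        "status": m["status"],
        "wearing": _items(m["wearing"]),
        "missing": _items(m["missing"]),
        "unknown": _items(m["unknown"]),
    }
```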
## 4. Concurrency model
**Two processes, six threads:**
| Layer | Process | Thread | Purpose |
|---|---|---|---|
| Bridge | `bridge.py` main | MainThread | orchestrator, arm action calls (blocking) |
| Bridge | `bridge.py` main | TriggerLoop | polls `hub.combo_r2x()` / `combo_r2y()` |
| Bridge | `bridge.py` main | StdoutReader | reads subprocess stdout line-by-line |
| Bridge | `bridge.py` main | TtsWorker | drains audio queue, calls `TtsMaker` |
| Saqr CLI | `apps.saqr_cli` (subprocess) | MainThread | camera + inference + stdout emit |
| Saqr CLI | subprocess | Streaming | MJPEG server thread on `:8080` |
**Synchronisation primitives** (all in `RobotController`):
- `_tts_queue` — bounded deque (`queue_max=4`), drained by TtsWorker.
- `_tts_event` — signalled on every `speak()` to wake the worker.
- `_audio_idle` — set when queue empty AND no dispatch in flight.
Callers can `wait_for_audio_done()` to block until audio drains —
this is the primitive you'd use to serialise audio-before-arm.
- `_tts_worker_stop` — shutdown flag, set in `shutdown_tts()`.
**Freshness policy:** a new `speak()` call clears the queue and
`cancel()`s any in-flight player — newer events always take precedence,
avoiding stale "helmet+vest" audio finishing after the worker already
moved to a "vest only" event.
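One way to express the freshness policy is a latest-wins queue that hands each dispatch a cancel token. `FreshQueue` and `CancelToken` are hypothetical names; the player is assumed to check `token.cancelled` between audio chunks, which is how a mid-playback cancel would take effect.

```python
import threading
from collections import deque
from typing import Deque, Optional, Tuple


class CancelToken:
    """Handed to the player, which is expected to check `cancelled` between chunks."""

    def __init__(self) -> None:
        self.cancelled = threading.Event()

    def cancel(self) -> None:
        self.cancelled.set()


class FreshQueue:
    """Latest-wins announcement queue: each speak() drops stale work."""

    def __init__(self, queue_max: int = 4):
        self._items: Deque[str] = deque(maxlen=queue_max)  # hard bound as a safety net
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._in_flight: Optional[CancelToken] = None

    def speak(self, text: str) -> None:
        with self._lock:
            self._items.clear()  # queued-but-unplayed announcements are stale
            if self._in_flight is not None:
                self._in_flight.cancel()  # so is the one currently playing
            self._items.append(text)
        self._wake.set()

    def next_item(self, timeout: Optional[float] = None) -> Optional[Tuple[str, CancelToken]]:
        """Called by the worker thread; returns None on timeout or spurious wake."""
        if not self._wake.wait(timeout):
            return None
        with self._lock:
            if not self._items:
                self._wake.clear()
                return None
            text = self._items.popleft()
            if not self._items:
                self._wake.clear()
            self._in_flight = CancelToken()
            return text, self._in_flight
```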
## 5. Configuration flow
```
config/*.json ──► utils.config.load_config() ──► cached dict
module top-level constants
(e.g. TTS_VOLUME, MOTION_UNSAFE_FILE)
runtime
```
Env-var overrides are applied at `start_saqr.sh` level (for
`CONDA_ENV`, `SAQR_SOURCE`, etc.) or inside `load_config` for specific
keys. The philosophy is: no repo-edits for the common knobs — flip the
JSON and restart.
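The precedence rule (env var > config JSON > code fallback) can be sketched as below. Both functions are illustrative rather than the real `utils.config` API: the `lru_cache` caching, the dotted-key lookup, and coercing env-var strings to the fallback's type are assumptions.

```python
import json
import os
from functools import lru_cache
from pathlib import Path
from typing import Any, Dict, Optional


@lru_cache(maxsize=None)
def load_config(name: str, config_dir: str = "config") -> Dict[str, Any]:
    """Read <config_dir>/<name>_config.json once and cache the parsed dict."""
    with (Path(config_dir) / f"{name}_config.json").open() as fh:
        return json.load(fh)


def get_setting(cfg: Dict[str, Any], dotted_key: str,
                env_var: Optional[str], fallback: Any) -> Any:
    """Resolve one knob: env var > config JSON > code fallback."""
    if env_var and env_var in os.environ:
        raw = os.environ[env_var]
        if isinstance(fallback, bool):  # bool("false") is True, so parse explicitly
            return raw.strip().lower() in ("1", "true", "yes", "on")
        return type(fallback)(raw) if fallback is not None else raw
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return fallback
        node = node[part]
    return node
```

Usage might look like `lead = get_setting(load_config("robot"), "bridge.audio_lead_s", "SAQR_AUDIO_LEAD_S", 0.3)`, where `SAQR_AUDIO_LEAD_S` is a hypothetical variable name.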
## 6. Firmware constraints (the big lesson)
Unitree G1 firmware routes the audio subsystem (`TtsMaker`,
`PlayStream`, `AUDIO_STOP_PLAY`, `SetVolume`) and the low-level arm SDK
(`rt/arm_sdk`, published at 60 Hz during motion replay) through the
same onboard MCU / bus. While arm SDK is actively publishing, audio
RPCs block until their timeout and return `rc=3104` ("device busy").
Consequences for this codebase:
1. **Parallel audio + custom motion is unreliable.** Testing showed a
~50 % audio dropout rate when `ArmReplayer` runs concurrently with
`TtsMaker`.
2. **The 10-second default DDS timeout let a single hung audio call stall
the worker.** We shortened the `AudioClient` timeout to 3 s so hung calls
fail after 3 s instead of 10 s.
3. **Retries, warm-ups, and STOP_PLAY resets cannot fix a firmware
that's busy on the other channel** — they can only recover after
arm sdk releases the bus. We keep all three as robustness hygiene
but they aren't a substitute for serialisation.
4. **The high-level `ExecuteAction('reject')` is a single RPC**, not
60 Hz publishing, so it contends less with audio. If parallel is
required, `motion.enabled=false` plus `audio_lead_s=0.3` is the
closest thing that works — at the cost of the canned gesture.
The deterministic path is to serialise: `speak()` →
`wait_for_audio_done()` → `reject()`. This trades ~6 s of per-event
latency for guaranteed audio delivery, which is the right trade for a
safety-critical warning.
## 7. Deployment and lifecycle
1. **Install**: `scripts/deploy.sh` rsyncs the repo to
`unitree@<ROBOT_IP>:~/Saqr/` and runs `pip install -e .` inside the
robot's `saqr` conda env.
2. **Enable**: `sudo systemctl enable --now saqr-bridge`.
3. **Runtime**: bridge.py → `TtsMaker("Saqr is running. Press R2 plus
X to start.")` → idle.
4. **Trigger**: operator presses R2+X → subprocess spawns.
5. **Stop**: operator presses R2+Y → subprocess gets SIGINT, bridge
announces "Saqr deactivated.", returns to idle.
6. **Reboot / crash**: the enabled unit starts the bridge at boot, and `Restart=on-failure` restarts it if it crashes.
7. **Logs**: `journalctl -u saqr-bridge` + `logs/*.log` +
`runtime/runs/<timestamp>/events.csv`.
## 8. Plan & open threads
Short list — things that would materially improve reliability if picked
up next:
| Area | What | Why |
|---|---|---|
| Audio + arm | Serialise (audio done → arm) | Eliminates the ~50 % dropout on safety alerts |
| Detection | Per-identity cooldown, not per-track-id | Track-ID churn currently re-triggers the alert for the same person |
| Robustness | Boot-time self-check | Surface camera/DDS/model/motion issues before the first event, not during |
| Observability | Rotate `logs/`, add `logs/events.jsonl` | Audit trail + disk safety |
| UX | Short-phrase fallback when audio drops | The arm still moves even when audio fails; user sees gesture but hears nothing |
| Testing | Mock RobotController for bridge unit tests | Today the bridge is only tested end-to-end on the robot |
## 9. Glossary
- **G1 / Unitree G1** — humanoid platform this runs on.
- **DDS** — Data Distribution Service, Cyclone DDS v0.10.2, the pub/sub
bus used by the Unitree SDK.
- **`rt/lowstate`** — robot state topic; carries wireless remote bits.
- **`rt/arm_sdk`** — 60 Hz arm joint command topic used by the
teach-and-replay path.
- **`AudioClient` / `G1ArmActionClient`** — Unitree SDK service clients
that wrap DDS RPC.
- **`TtsMaker`** — firmware text-to-speech RPC.
- **`PlayStream`** — firmware PCM-chunk playback RPC.
- **`rc=3104`** — firmware-level "device busy" error returned by audio
RPCs when the audio bus is held by another consumer.
- **PPE** — Personal Protective Equipment (helmet, vest, boots, gloves,
goggles in this project's dataset).