update

parent f15bb48935
commit f9ad4aedd5

README.md (187 lines changed)
@@ -2,7 +2,8 @@

Real-time PPE compliance (helmet, vest, boots, gloves, goggles) using YOLO11n.
Runs on a Unitree G1 humanoid with an Intel RealSense D435I. On UNSAFE the
-robot speaks a warning and plays the `reject` arm action.
+robot announces the missing gear via the onboard TTS and plays a recorded
+arm-motion trajectory from `assets/motions/`.

## Layout

@@ -18,6 +19,9 @@ Saqr/
│   ├── saqr-bridge.service   # systemd unit (wraps start_saqr.sh)
│   └── deploy.sh             # push code dev machine → robot
├── config/                   # logging.json, core_config.json, robot_config.json
├── assets/
│   ├── audio/                # pre-recorded WAV clips (16kHz mono int16, per category)
│   └── motions/              # teach-and-replay arm trajectories (*.jsonl)
├── data/                     # dataset/, models/ (gitignored)
├── runtime/                  # captures/, runs/ (gitignored)
├── logs/                     # per-module .log files (gitignored)
@@ -46,8 +50,91 @@ Then on the wireless remote:
- **R2 + X** → start detection
- **R2 + Y** → stop detection

-See [docs/DEPLOY.md](docs/DEPLOY.md) for first-time deploy and
-[docs/start.md](docs/start.md) for the systemd workflow.
+See [docs/DEPLOY.md](docs/DEPLOY.md) for first-time deploy,
+[docs/start.md](docs/start.md) for the systemd workflow, and
+[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the component map.

## Usage

**One-time setup on the robot (already done, documented for rebuilds):**

```bash
# 1. Clone into ~/Saqr
git clone <repo> ~/Saqr && cd ~/Saqr

# 2. Create the conda env and install in editable mode
conda create -n saqr python=3.8 -y
conda activate saqr
pip install -e .
pip install "cyclonedds==0.10.2"   # exact version required by unitree_sdk2py

# 3. Drop the model at data/models/saqr_best.pt
#    (trained on the dev machine; copy via scp or scripts/deploy.sh)

# 4. Install the systemd unit for auto-start on boot
sudo cp scripts/saqr-bridge.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable saqr-bridge
```

**Normal day-to-day operation on the robot:**

```bash
# Foreground (see logs live, Ctrl-C to stop)
CONDA_ENV=saqr ~/Saqr/scripts/start_saqr.sh

# Background via systemd (survives reboots)
sudo systemctl start saqr-bridge
journalctl -u saqr-bridge -f   # follow logs
sudo systemctl stop saqr-bridge
```

**Controlling detection from the wireless remote:**

1. Start the bridge (above). It announces "Saqr is running. Press R2 plus X
   to start."
2. Press **R2 + X** → the bridge spawns `apps.saqr_cli` as a subprocess,
   which opens the RealSense camera, loads the model, and begins publishing
   events on stdout.
3. Walk into frame. On UNSAFE (missing helmet or vest) the robot speaks
   what's missing and plays the recorded arm gesture.
4. Press **R2 + Y** → the subprocess receives SIGINT, the robot announces
   "Saqr deactivated.", and the bridge goes back to idle.
5. You can start/stop repeatedly without restarting the bridge.

**Dev-machine workflow (push code + weights to the robot):**

```bash
# From the dev machine, from the Saqr/ dir
scripts/deploy.sh   # rsync + pip install -e . in the robot's conda env
# Or a single file:
scp robot/robot_controller.py unitree@192.168.123.164:~/Saqr/robot/
ssh unitree@192.168.123.164 'sudo systemctl restart saqr-bridge'
```

**Streaming view (optional, for debugging from a browser):**

With the subprocess running, open `http://<robot-ip>:8080` to view an MJPEG
stream of the annotated frames. Port and interface are set in
[config/core_config.json](config/core_config.json) under `stream`.

**Log locations:**

- Bridge stdout/stderr: `journalctl -u saqr-bridge` (systemd) or the terminal.
- Saqr CLI: streams to the bridge and appears inline in the same log.
- Per-module files: `logs/*.log` rotate on size; tail with
  `tail -f logs/saqr.log`.
- Event CSVs: `runtime/runs/<timestamp>/events.csv`.

**Troubleshooting quick-refs:**

| Symptom | Likely cause | Fix |
|---|---|---|
| "Camera not connected" announcement | RealSense not on USB | Replug, run `lsusb \| grep Intel`, rerun |
| R2+X does nothing | LowState not subscribed | Check for `BRIDGE … Subscribed to rt/lowstate` in the log; if it is absent, the network may be down |
| `rc=3104` warnings | Firmware audio/arm contention | Expected; see [Known limitations](#known-limitations) |
| `motion files missing` | `assets/motions/*.jsonl` absent on robot | Run `scripts/deploy.sh` or scp the directory |
| `ModuleNotFoundError: cyclonedds` | Package missing or wrong version | `pip install "cyclonedds==0.10.2"` in the `saqr` env |

## Deploy

@@ -96,3 +183,97 @@ part of the normal run flow:
python -m apps.train_cli --epochs 100 --batch 16
# best weights land at data/models/saqr_best.pt; deploy with scripts/deploy.sh
```

## Audio pipeline

On UNSAFE, [robot/robot_controller.py](robot/robot_controller.py) queues one
announcement to a worker thread, which routes it according to `tts.mode`:

- `tts_only` → firmware `TtsMaker(text, speaker_id=2)` (current default).
- `recorded_only` → WAV lookup in `assets/audio/<category>/<key>.wav`; no
  fallback to TtsMaker.
- `recorded_or_tts` → WAV if available, else TtsMaker.

Each TtsMaker call is preceded by `AUDIO_STOP_PLAY` + 300 ms settle (the
reset pattern from `G1_Lootah/Audio_Recorder/voice_note.txt`) and retried
once on `rc != 0`. The audio RPC timeout is capped at 3 s so a stuck firmware
call fails fast instead of blocking the worker for the bridge-wide 10 s.
The Python SDK's `TtsMaker` has a broken `tts_index` counter that never
increments, so the controller bypasses it and calls the underlying RPC with
a real index.
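The routing rules above can be sketched roughly as follows. This is a minimal illustration, not the code in `robot/robot_controller.py`: `route_announcement`, `play_wav`, `tts_maker`, and `reset` are hypothetical names, and both backends are assumed to return `0` on success in the same way the firmware `rc` does.

```python
from pathlib import Path
from typing import Callable


def route_announcement(
    mode: str,
    text: str,
    category: str,
    key: str,
    audio_root: Path,
    play_wav: Callable[[Path], int],            # hypothetical WAV backend, 0 == success
    tts_maker: Callable[[str], int],            # hypothetical TTS backend, 0 == success
    reset: Callable[[], None] = lambda: None,   # stands in for AUDIO_STOP_PLAY + settle
) -> str:
    """Return which backend handled the announcement: 'wav', 'tts', or 'none'."""
    wav = audio_root / category / f"{key}.wav"

    def speak_tts() -> str:
        for _attempt in range(2):   # first try plus a single retry on rc != 0
            reset()
            if tts_maker(text) == 0:
                return "tts"
        return "none"

    def speak_wav() -> str:
        return "wav" if play_wav(wav) == 0 else "none"

    if mode == "tts_only":
        return speak_tts()
    if mode == "recorded_only":
        # No fallback: a missing clip means nothing is played.
        return speak_wav() if wav.is_file() else "none"
    if mode == "recorded_or_tts":
        # Fall back to TTS only when the clip does not exist.
        return speak_wav() if wav.is_file() else speak_tts()
    raise ValueError(f"unknown tts.mode: {mode!r}")
```

Passing the backends in as callables keeps the routing testable on a dev machine without the Unitree SDK installed.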

## Known limitations

**G1 firmware serialises the audio channel and `rt/arm_sdk`.** When the
low-level arm replayer is publishing joint commands at 60 Hz, every audio
RPC (`TtsMaker`, `PlayStream`, even `AUDIO_STOP_PLAY`) blocks until the
RPC timeout and returns `rc=3104`. Measured audio dropout during parallel
arm+audio runs: ~50 % of alerts. This is a firmware-level constraint on
the voice service, not something the client can tune its way out of.
That was verified by testing retries, `SetVolume` warmups, and
`AUDIO_STOP_PLAY` resets, all of which hang under the same condition.

The reliable path is **serial**: `speak()` → `wait_for_audio_done()` →
`reject()` (adjust `handle_line` in `bridge.py` if that's the behaviour you
want). Parallel is kept behind `audio_lead_s` for latency-sensitive
demos where occasional dropped audio is acceptable.

## Current status (2026-04-21)

Snapshot of what is and isn't working, so whoever picks this up next
doesn't have to rediscover it.

**Working:**

- End-to-end flow: RealSense → YOLO11n → tracker → compliance →
  UNSAFE/SAFE/PARTIAL events → robot voice + recorded arm motion.
- Wireless-remote gated start/stop (R2+X / R2+Y) via `rt/lowstate`.
- Recorded arm replay from [assets/motions/adnoc1.jsonl](assets/motions/)
  via [robot/arm_replay.py](robot/arm_replay.py): a custom teach-and-replay
  trajectory instead of the canned `reject` action.
- Recorded WAV library under [assets/audio/](assets/audio/) with a
  `recorded_or_tts` mode that can be toggled via `tts.mode` in config.
- TtsMaker path hardened: `AUDIO_STOP_PLAY` reset + retry + 3 s RPC
  timeout + index-bug workaround (see [Audio pipeline](#audio-pipeline)).
- MJPEG stream on port 8080 for remote visual QA.
- Fully config-driven: no hard-coded IPs, paths, or thresholds in code
  beyond the `SAQR_ROOT` fallback and firmware API ids.
- systemd unit + `start_saqr.sh` = single entry point, survives reboots.

**Known broken / in tension:**

- **Audio ↔ arm firmware contention** (see [Known limitations](#known-limitations)).
  The current default is parallel via `audio_lead_s=0.3`, so ~50 % of UNSAFE
  alerts drop their audio while the arm SDK is publishing. Mitigations were
  tested and **none of them fix the root cause** (it's a firmware limitation).
  The deterministic fix is to serialise audio-then-arm; pending decision.
- **Track-ID churn**: each re-acquisition of the same person generates a
  new track id, which bypasses the per-`(track_id, status)` cooldown and
  triggers a fresh UNSAFE alert. Not yet quantified, but visible in
  rapid-fire `ID 0001 → 0002 → 0003` sequences for what appears to be
  one worker.
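To make the churn problem concrete, here is a small sketch of a per-key cooldown. The `Cooldown` class and its injectable `clock` are illustrative assumptions, not the bridge's actual implementation; only the 8-second, per-`(track_id, status)` behaviour comes from this README.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class Cooldown:
    """Suppress repeat alerts for the same (key, status) within `seconds`."""

    def __init__(self, seconds: float, clock: Callable[[], float] = time.monotonic):
        self.seconds = seconds
        self._clock = clock   # injectable so the behaviour can be tested without sleeping
        self._last: Dict[Tuple[Hashable, str], float] = {}

    def allow(self, key: Hashable, status: str) -> bool:
        now = self._clock()
        last = self._last.get((key, status))
        if last is not None and now - last < self.seconds:
            return False      # still cooling down for this key
        self._last[(key, status)] = now
        return True


if __name__ == "__main__":
    cd = Cooldown(8.0)
    # Keyed on track id: every new id counts as a new key, so churn re-triggers.
    print([cd.allow(track_id, "UNSAFE") for track_id in (1, 2, 3)])
    # Keyed on a stable identity (hypothetical), only the first alert passes.
    print([cd.allow("worker-A", "UNSAFE") for _ in range(3)])
```

Keying the cooldown on something more stable than the track id is what the per-identity item in the next-session list refers to.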

**Configuration right now:**

- `tts.mode = "tts_only"` (firmware TTS, no recorded WAVs used).
- `motion.enabled = true`, `motion.unsafe_file = "adnoc1.jsonl"`: the
  custom 7-second trajectory with 60 frames of smooth move-in and 180
  frames of smooth return-to-home.
- `bridge.audio_lead_s = 0.3` (parallel mode).
- `bridge.cooldown = 8.0`, applied per `(track_id, status)`.

**Next-session picks (prioritised):**

1. Decide parallel-vs-serial for audio+arm and wire it in. Serial is
   safer for safety-critical alerts.
2. Stabilise track IDs, or make the cooldown per-identity instead of
   per-track, so one worker doesn't trigger N alerts.
3. Add a boot-time self-check (camera reachable, DDS iface up, model
   file present, motion JSONL parseable) so failures surface before the
   first event rather than during it.
4. Rotate `logs/`, which currently grows unbounded.
5. Add a `logs/events.jsonl` audit trail (one structured line per
   UNSAFE/SAFE event) for post-hoc compliance review.
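One possible shape for the boot-time self-check in item 3 is sketched below. It covers only the two file-based checks (model present, motion JSONL parseable). The camera and DDS checks would need the RealSense and Unitree SDKs and are left out. All function names here are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def check_model(root: Path) -> None:
    path = root / "data" / "models" / "saqr_best.pt"
    if not path.is_file():
        raise FileNotFoundError(f"model file missing: {path}")


def check_motions(root: Path) -> None:
    files = sorted((root / "assets" / "motions").glob("*.jsonl"))
    if not files:
        raise FileNotFoundError("motion files missing")
    for f in files:
        for n, line in enumerate(f.read_text().splitlines(), start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{f.name}:{n}: {exc}") from exc


def self_check(root: Path, checks: Dict[str, Callable[[Path], None]]) -> List[str]:
    """Run every check and collect failures instead of stopping at the first one."""
    failures = []
    for name, check in checks.items():
        try:
            check(root)
        except Exception as exc:   # report any failure by check name
            failures.append(f"{name}: {exc}")
    return failures
```

A bridge could call `self_check(root, {"model": check_model, "motions": check_motions})` at startup and announce or log the returned list before going idle.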

See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full component
map and data flow.

docs/ARCHITECTURE.md (new file, 321 lines added)
@@ -0,0 +1,321 @@
# Saqr Architecture

This document describes how Saqr is built: the components, how they
communicate, how data flows through the system, the concurrency model,
and the constraints imposed by the G1 firmware. It's meant as a
"start-here" for anyone extending the code or debugging an incident.

## 1. High-level system map

```
┌─────────────────────────── Dev machine ────────────────────────────┐
│                                                                    │
│  apps.train_cli ──(saqr_best.pt)──► data/models/                   │
│  gui/ (optional: PySide6 desktop QA tool, not used in production)  │
│                                                                    │
│  scripts/deploy.sh ──(rsync + pip install -e .)──►                 │
│                                                                    │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
                 robot_ip (eth0 / 192.168.123.164)
                                 │
┌────────────────────────────────▼───────────────────────────────────┐
│  Unitree G1 (Jetson Orin NX)                                       │
│                                                                    │
│  scripts/start_saqr.sh                                             │
│        │                                                           │
│        ▼                                                           │
│  ┌─────────────── robot/bridge.py (main process) ───────────────┐  │
│  │                                                              │  │
│  │  RobotController ─── G1ArmActionClient ─┐                    │  │
│  │    │                                    ├── DDS (eth0) ─┐    │  │
│  │    ├── AudioClient ─────────────────────┤               │    │  │
│  │    ├── LowStateHub ← rt/lowstate ───────┤               │    │  │
│  │    ├── ArmReplayer ── rt/arm_sdk ───────┘               │    │  │
│  │    └── TtsWorker thread (audio queue)                   │    │  │
│  │                                                         │    │  │
│  │  TriggerLoop thread ── R2+X / R2+Y polling ─────────────┘    │  │
│  │                                                              │  │
│  │  StdoutReader thread ── parses event lines from subprocess ──┤  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                │                                   │
│                        subprocess.Popen                            │
│                                │                                   │
│  ┌────────────────────── apps/saqr_cli.py ──────────────────────┐  │
│  │                                                              │  │
│  │  Camera (RealSense) → YOLO11n → Tracker → Compliance         │  │
│  │                                               │              │  │
│  │                    ┌──────────────────────────┘              │  │
│  │                    └── emit_event() ──► stdout               │  │
│  │  MJPEG stream on :8080 (optional)                            │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
                                 │
                   Wireless remote (R2+X / R2+Y)
                   G1 speaker (audio out)
                   G1 arms (motion out)
```

## 2. Components

### 2.1 `core/`: detection & reasoning (shared library)

Pure-Python, no Unitree SDK dependency. Used by `apps/` and (indirectly)
`robot/bridge.py` via subprocess.

- `core/camera.py`: RealSense / webcam / video-file source. Yields
  `(frame, depth)` pairs.
- `core/model.py`: YOLO11n wrapper, class filtering, confidence
  thresholding, batched inference.
- `core/tracker.py`: ByteTrack-style persistent IDs across frames.
- `core/compliance.py`: binary SAFE / UNSAFE classifier. Reads
  `REQUIRED_PPE` from config; `split_wearing_missing()` handles the
  `no-X` class convention.
- `core/events.py`: event emission with structured format:
  `ID NNNN | EVENT | STATUS | wearing: … | missing: … | unknown: …`
- `core/stationary.py`: "is this person standing still long enough to
  warrant an alert?" heuristic (pixel-level centroid stability).
- `core/drawing.py`: overlay boxes + labels on frames for the MJPEG
  stream.
- `core/paths.py`: resolves `PROJECT_ROOT` from the
  `SAQR_ROOT` env var or by walking up from `__file__`.
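The centroid-stability idea behind `core/stationary.py` can be illustrated with a short sketch. This is not the module's actual code: `StationaryCheck` is a hypothetical name, the 15-frame window is taken from the event lifecycle described later in this document, and the 10-pixel drift threshold is an assumed value.

```python
from collections import deque
from math import hypot
from typing import Deque, Tuple


class StationaryCheck:
    """True once the centroid has stayed within max_drift_px for `window` frames."""

    def __init__(self, window: int = 15, max_drift_px: float = 10.0):
        self.window = window
        self.max_drift_px = max_drift_px   # assumed threshold, tune per camera setup
        self._points: Deque[Tuple[float, float]] = deque(maxlen=window)

    def update(self, cx: float, cy: float) -> bool:
        self._points.append((cx, cy))
        if len(self._points) < self.window:
            return False                   # not enough history yet
        x0, y0 = self._points[0]           # oldest centroid in the window
        return all(
            hypot(x - x0, y - y0) <= self.max_drift_px for x, y in self._points
        )
```

One instance per track id would be needed, since each person has an independent centroid history.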

### 2.2 `apps/`: executable entry points

- `apps/saqr_cli.py`: the detection subprocess launched by the bridge.
  Reads config, opens the camera, runs the pipeline, prints events on
  stdout, serves MJPEG on `:8080`.
- `apps/detect_cli.py`: stand-alone detector for testing on clips.
- `apps/train_cli.py`: dev-machine training wrapper around
  `ultralytics`.
- `apps/manager_cli.py`: dataset tooling (class rebalancing, splits).
- `apps/view_stream.py`: OpenCV viewer attached to the MJPEG stream.

### 2.3 `robot/`: G1 integration (only runs on the robot)

- `robot/bridge.py`: orchestrator. Owns `RobotController`, spawns
  `apps.saqr_cli` as a subprocess, parses its stdout, routes UNSAFE/SAFE
  events into robot actions. Also the systemd entry point.
- `robot/robot_controller.py`: owns all the G1 clients: arm action,
  audio, lowstate. Runs a `TtsWorker` background thread with a freshness
  policy (new announcement cancels and replaces the in-flight one).
- `robot/arm_replay.py`: low-level `rt/arm_sdk` publisher that plays a
  recorded JSONL trajectory at 60 Hz. Used when `motion.enabled=true`.
- `robot/audio_player.py`: `PlayStream`-based WAV player, with chunk
  retries for firmware 3104 and a cancel flag. Used when
  `tts.mode="recorded_only"` or `"recorded_or_tts"`.
- `robot/controller.py`: `LowStateHub` for decoding the wireless remote
  (R2+X / R2+Y combos) from `rt/lowstate`.

### 2.4 `utils/`: shared helpers

- `utils/config.py`: `load_config(name)` reads `config/<name>_config.json`,
  caches, applies env-var overrides.
- `utils/logger.py`: rotating file logger + console mirror.

### 2.5 `config/`: runtime tunables

- `core_config.json`: detection thresholds, tracker params, camera
  source, stream port, training hyperparams, compliance rules, capture.
- `robot_config.json`: bridge timing, TTS mode + phrases, arm action
  names, recorded-motion filenames, deploy target IP, start_saqr
  defaults.
- `logging.json`: log level per module.

Precedence: **env var > config JSON > code fallback**.
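The precedence rule can be sketched in a few lines. This is an illustration only: `get_setting`, the flat key lookup, and the environment variable name used in the usage note are assumptions, not the behaviour of `utils.config.load_config`.

```python
import json
import os
from pathlib import Path
from typing import Any


def get_setting(config_dir: Path, name: str, key: str, env_var: str, fallback: Any) -> Any:
    """Resolve one setting as: env var > config/<name>_config.json > code fallback."""
    if env_var in os.environ:
        return os.environ[env_var]          # note: env values are always strings
    path = config_dir / f"{name}_config.json"
    if path.is_file():
        data = json.loads(path.read_text())
        if key in data:
            return data[key]
    return fallback
```

For example, `get_setting(Path("config"), "robot", "cooldown", "SAQR_COOLDOWN", 8.0)` would return the environment value if `SAQR_COOLDOWN` (a made-up name) is set, otherwise the JSON value, otherwise `8.0`. A real loader would also need to convert the string from the environment to the expected type.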

### 2.6 `assets/`: runtime artefacts (in-repo)

- `assets/audio/fixed/*.wav`: generic phrases (ready, safe,
  unsafe_generic, deactivated, no_camera).
- `assets/audio/unsafe_missing/*.wav`: per missing-PPE combo (helmet,
  vest, helmet_vest).
- `assets/motions/adnoc1.jsonl`: the UNSAFE arm gesture (7 s recorded
  trajectory).
- `assets/motions/arm_home.jsonl`: the home pose used to smoothly
  return the arm at the end of a replay.

All WAVs are 16 kHz mono int16, as required by the G1 audio channel.
Motion JSONL is `{"t": seconds, "q": [29 floats]}` per line, 60 Hz.
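A loader for that motion format might look like the sketch below. `parse_motion` is a hypothetical helper, and the rule that timestamps must not go backwards is an assumption added for illustration; only the `t` / `q` keys and the 29-joint width come from this document.

```python
import json
from typing import Iterable, List, Tuple

N_JOINTS = 29  # width of the "q" array per the format above


def parse_motion(lines: Iterable[str]) -> List[Tuple[float, List[float]]]:
    """Parse motion JSONL lines into (t, q) frames, validating each record."""
    frames: List[Tuple[float, List[float]]] = []
    prev_t = None
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue                       # tolerate blank lines
        rec = json.loads(line)
        t = float(rec["t"])
        q = [float(v) for v in rec["q"]]
        if len(q) != N_JOINTS:
            raise ValueError(f"line {n}: expected {N_JOINTS} joints, got {len(q)}")
        if prev_t is not None and t < prev_t:
            # Assumed rule: a replay needs non-decreasing timestamps.
            raise ValueError(f"line {n}: timestamps must not go backwards")
        prev_t = t
        frames.append((t, q))
    return frames
```

Typical use would be `parse_motion(open(path, encoding="utf-8"))`, which fails early on a malformed file instead of partway through a replay.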

## 3. Data pipeline

```
┌──────────────┐     ┌─────────────┐     ┌────────────┐
│  RealSense   │────►│   YOLO11n   │────►│  Tracker   │
│  640x480@30  │     │  saqr_best  │     │ ByteTrack  │
└──────────────┘     └─────────────┘     └─────┬──────┘
                                               │
                                               ▼
                                      ┌──────────────────┐
                                      │    Compliance    │
                                      │  REQUIRED=[hv]   │
                                      └────────┬─────────┘
                                               │
                                    SAFE / UNSAFE / PARTIAL
                                               │
                                               ▼
                                      ┌──────────────────┐
                                      │ Stationary check │
                                      │ (centroid drift) │
                                      └────────┬─────────┘
                                               │
                                               ▼
                                    emit_event() ──► stdout
                                               │
                                               ▼
                             bridge.handle_line() (reader thread)
                                               │
                     ┌─────────────────────────┼─────────────────────┐
                     ▼                         ▼                     ▼
              RobotController             ArmReplayer           (log only)
           .speak(text,cat,key)       .play(motion, home)
                     │                         │
                     ▼                         ▼
             TtsWorker thread         rt/arm_sdk @ 60 Hz
                     │                         │
                     ▼                         │
           AudioClient.TtsMaker                │
                     │                         │
                     └─────── G1 firmware ◄────┘
                                   │
                                   ▼
                            Speaker + arms
```

### Event lifecycle (single UNSAFE example)

1. Frame `N`: YOLO detects a person with `no-helmet` and `no-vest` boxes.
2. Tracker assigns (or keeps) `track_id=42`.
3. Compliance → UNSAFE, `wearing=[], missing=[helmet, vest]`.
4. Stationary check: same centroid for ≥15 frames → the alert is green-lit.
5. `core.events.emit_event()` prints to stdout:
   `[HH:MM:SS.fff] ID 0042 | NEW | UNSAFE | wearing: none | missing: helmet, vest | unknown: gloves, goggles, boots`
6. `bridge._read_stdout` parses, `handle_line` matches `EVENT_RE`.
7. Cooldown check on `(42, UNSAFE)` passes → fire actions.
8. `robot.speak("Please stop. Wear your proper safety equipment. You are missing helmet and vest.", category="unsafe_missing", key="helmet_vest")`.
9. `time.sleep(audio_lead_s)` with `audio_lead_s=0.3`: give the audio worker a head start.
10. `robot.reject(release_after=0.5)` → `ArmReplayer.play(adnoc1.jsonl, arm_home.jsonl)`, which blocks the main thread for ~12 s while publishing joint commands.
11. Meanwhile, the audio worker picks up the speak request, resets with
    `AUDIO_STOP_PLAY`, calls `TtsMaker`, and retries once if `rc!=0`.
12. Bridge returns to idle; the next event for `(42, UNSAFE)` is ignored for
    8 s per the cooldown.
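Steps 5 and 6 can be illustrated with a small parser for the event line. The regular expression below is written from the sample line in step 5 and is an assumption: the real `EVENT_RE` in `bridge.py` is not shown in this document and may differ.

```python
import re
from typing import Dict, List, Optional

# Illustrative pattern derived from the sample event line, not the bridge's own.
EVENT_RE = re.compile(
    r"ID (?P<id>\d+) \| (?P<event>\w+) \| (?P<status>\w+)"
    r" \| wearing: (?P<wearing>[^|]*)"
    r" \| missing: (?P<missing>[^|]*)"
    r" \| unknown: (?P<unknown>[^|]*)"
)


def _items(field: str) -> List[str]:
    """Split a comma-separated field; 'none' or an empty field means no items."""
    field = field.strip()
    if field in ("", "none"):
        return []
    return [part.strip() for part in field.split(",")]


def parse_event(line: str) -> Optional[Dict[str, object]]:
    """Return the parsed event, or None for non-event lines such as plain log output."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return {
        "track_id": int(m.group("id")),
        "event": m.group("event"),
        "status": m.group("status"),
        "wearing": _items(m.group("wearing")),
        "missing": _items(m.group("missing")),
        "unknown": _items(m.group("unknown")),
    }
```

Returning `None` for unmatched lines lets the reader thread pass ordinary log output through untouched.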

## 4. Concurrency model

**Process / thread boundaries (two processes, six threads):**

| Layer | Process | Thread | Purpose |
|---|---|---|---|
| Bridge | `bridge.py` main | MainThread | orchestrator, arm action calls (blocking) |
| Bridge | `bridge.py` main | TriggerLoop | polls `hub.combo_r2x()` / `combo_r2y()` |
| Bridge | `bridge.py` main | StdoutReader | reads subprocess stdout line-by-line |
| Bridge | `bridge.py` main | TtsWorker | drains audio queue, calls `TtsMaker` |
| Saqr CLI | `apps.saqr_cli` (subprocess) | MainThread | camera + inference + stdout emit |
| Saqr CLI | `apps.saqr_cli` (subprocess) | Streaming | MJPEG server thread on `:8080` |

**Synchronisation primitives** (all in `RobotController`):

- `_tts_queue`: bounded deque (`queue_max=4`), drained by TtsWorker.
- `_tts_event`: signalled on every `speak()` to wake the worker.
- `_audio_idle`: set when the queue is empty AND no dispatch is in flight.
  Callers can `wait_for_audio_done()` to block until audio drains;
  this is the primitive you'd use to serialise audio-before-arm.
- `_tts_worker_stop`: shutdown flag, set in `shutdown_tts()`.

**Freshness policy:** a new `speak()` call clears the queue and
`cancel()`s any in-flight player, so newer events always take precedence.
This avoids stale "helmet+vest" audio finishing after the worker already
moved to a "vest only" event.
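How these primitives fit together can be sketched with the standard library alone. `TtsWorkerSketch` is a simplified stand-in, not the `RobotController` code: `dispatch` is a hypothetical callable that plays one announcement, and cancelling an in-flight player is not modelled.

```python
import threading
from collections import deque
from typing import Callable, Deque, Optional


class TtsWorkerSketch:
    def __init__(self, dispatch: Callable[[str], None], queue_max: int = 4):
        self._dispatch = dispatch
        self._queue: Deque[str] = deque(maxlen=queue_max)
        self._lock = threading.Lock()
        self._wake = threading.Event()    # signalled on every speak()
        self._idle = threading.Event()    # set when queue is empty and nothing is playing
        self._idle.set()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="TtsWorker", daemon=True)
        self._thread.start()

    def speak(self, text: str) -> None:
        with self._lock:
            self._queue.clear()           # freshness: drop anything not yet spoken
            self._queue.append(text)
            self._idle.clear()
        self._wake.set()

    def wait_for_audio_done(self, timeout: Optional[float] = None) -> bool:
        return self._idle.wait(timeout)

    def shutdown(self) -> None:
        self._stop.set()
        self._wake.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            self._wake.wait()
            self._wake.clear()
            while True:
                with self._lock:
                    if not self._queue:
                        self._idle.set()  # drained: unblock wait_for_audio_done()
                        break
                    text = self._queue.popleft()
                self._dispatch(text)      # runs outside the lock so speak() never blocks
```

With this shape, the serial path is `worker.speak(text)`, then `worker.wait_for_audio_done(timeout)`, then the arm action, so the arm never starts publishing while audio is still in flight.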

## 5. Configuration flow

```
config/*.json ──► utils.config.load_config() ──► cached dict
                                                      │
                                         module top-level constants
                                    (e.g. TTS_VOLUME, MOTION_UNSAFE_FILE)
                                                      │
                                                   runtime
```

Env-var overrides are applied at `start_saqr.sh` level (for
`CONDA_ENV`, `SAQR_SOURCE`, etc.) or inside `load_config` for specific
keys. The philosophy is: no code edits for the common knobs; change the
JSON and restart.

## 6. Firmware constraints (the big lesson)

Unitree G1 firmware routes the audio subsystem (`TtsMaker`,
`PlayStream`, `AUDIO_STOP_PLAY`, `SetVolume`) and the low-level arm SDK
(`rt/arm_sdk`, published at 60 Hz during motion replay) through the
same onboard MCU / bus. While the arm SDK is actively publishing, audio
RPCs block until their timeout and return `rc=3104` ("device busy").

Consequences for this codebase:

1. **Parallel audio + custom motion is unreliable.** Testing showed a
   ~50 % audio dropout rate when `ArmReplayer` runs concurrently with
   `TtsMaker`.
2. **The 10-second default DDS timeout made failures catastrophic.** We
   shortened the `AudioClient` timeout to 3 s so hung calls surface in
   seconds, not tens of seconds.
3. **Retries, warm-ups, and STOP_PLAY resets cannot fix a firmware
   that's busy on the other channel.** They can only recover after the
   arm SDK releases the bus. We keep all three as robustness hygiene,
   but they aren't a substitute for serialisation.
4. **The high-level `ExecuteAction('reject')` is a single RPC**, not
   60 Hz publishing, so it contends less with audio. If parallel is
   required, `motion.enabled=false` plus `audio_lead_s=0.3` is the
   closest thing that works, at the cost of falling back to the canned
   gesture.

The deterministic path is to serialise: `speak()` →
`wait_for_audio_done()` → `reject()`. This trades ~6 s of per-event
latency for guaranteed audio delivery, which is the right trade for a
safety-critical warning.

## 7. Deployment and lifecycle

1. **Install**: `scripts/deploy.sh` rsyncs the repo to
   `unitree@<ROBOT_IP>:~/Saqr/` and runs `pip install -e .` inside the
   robot's `saqr` conda env.
2. **Enable**: `sudo systemctl enable --now saqr-bridge`.
3. **Runtime**: bridge.py → `TtsMaker("Saqr is running. Press R2 plus
   X to start.")` → idle.
4. **Trigger**: operator presses R2+X → subprocess spawns.
5. **Stop**: operator presses R2+Y → subprocess gets SIGINT, bridge
   announces "Saqr deactivated.", returns to idle.
6. **Reboot / crash**: the enabled unit starts the bridge at boot, and
   `Restart=on-failure` restarts it if it exits with an error.
7. **Logs**: `journalctl -u saqr-bridge` + `logs/*.log` +
   `runtime/runs/<timestamp>/events.csv`.

## 8. Plan & open threads

Short list of things that would materially improve reliability if picked
up next:

| Area | What | Why |
|---|---|---|
| Audio + arm | Serialise (audio done → arm) | Eliminates the ~50 % dropout on safety alerts |
| Detection | Per-identity cooldown, not per-track-id | Track-ID churn currently re-triggers the alert for the same person |
| Robustness | Boot-time self-check | Surface camera/DDS/model/motion issues before the first event, not during |
| Observability | Rotate `logs/`, add `logs/events.jsonl` | Audit trail + disk safety |
| UX | Short-phrase fallback when audio drops | The arm still moves even when audio fails; user sees gesture but hears nothing |
| Testing | Mock RobotController for bridge unit tests | Today the bridge is only tested end-to-end on the robot |

## 9. Glossary

- **G1 / Unitree G1**: humanoid platform this runs on.
- **DDS**: Data Distribution Service, Cyclone DDS v0.10.2, the pub/sub
  bus used by the Unitree SDK.
- **`rt/lowstate`**: robot state topic; carries wireless remote bits.
- **`rt/arm_sdk`**: 60 Hz arm joint command topic used by the
  teach-and-replay path.
- **`AudioClient` / `G1ArmActionClient`**: Unitree SDK service clients
  that wrap DDS RPC.
- **`TtsMaker`**: firmware text-to-speech RPC.
- **`PlayStream`**: firmware PCM-chunk playback RPC.
- **`rc=3104`**: firmware-level "device busy" error returned by audio
  RPCs when the audio bus is held by another consumer.
- **PPE**: Personal Protective Equipment (helmet, vest, boots, gloves,
  goggles in this project's dataset).