# Saqr Architecture

This document describes how Saqr is built: the components, how they communicate, how data flows through the system, the concurrency model, and the constraints imposed by the G1 firmware. It's meant as a "start-here" for anyone extending the code or debugging an incident.

## 1. High-level system map

```
┌──────────────────────────────── Dev machine ─────────────────────────────────┐
│                                                                              │
│  apps.train_cli ──(saqr_best.pt)──► data/models/                             │
│  gui/ (optional: PySide6 desktop QA tool, not used in production)            │
│                                                                              │
│  scripts/deploy.sh ──(rsync + pip install -e .)──►                           │
│                                                                              │
└─────────────────────────────────────┬────────────────────────────────────────┘
                                      │  robot_ip (eth0 / 192.168.123.164)
                                      │
┌─────────────────────────────────────▼────────────────────────────────────────┐
│  Unitree G1 (Jetson Orin NX)                                                 │
│                                                                              │
│  scripts/start_saqr.sh                                                       │
│        │                                                                     │
│        ▼                                                                     │
│  ┌──────────────────── robot/bridge.py (main process) ────────────────────┐  │
│  │                                                                        │  │
│  │  RobotController ─── G1ArmActionClient ─┐                              │  │
│  │      │                                  ├── DDS (eth0) ─┐              │  │
│  │      ├── AudioClient ───────────────────┤               │              │  │
│  │      ├── LowStateHub ← rt/lowstate ─────┤               │              │  │
│  │      ├── ArmReplayer ── rt/arm_sdk ─────┘               │              │  │
│  │      └── TtsWorker thread (audio queue)                 │              │  │
│  │                                                         │              │  │
│  │  TriggerLoop thread ── R2+X / R2+Y polling ─────────────┘              │  │
│  │                                                                        │  │
│  │  StdoutReader thread ── parses event lines from subprocess ────────────┤  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│        │                                                                     │
│        │ subprocess.Popen                                                    │
│        │                                                                     │
│  ┌─────────────────────────── apps/saqr_cli.py ───────────────────────────┐  │
│  │                                                                        │  │
│  │  Camera (RealSense) → YOLO11n → Tracker → Compliance                   │  │
│  │                                               │                        │  │
│  │                                               └── emit_event() ──► stdout │
│  │  MJPEG stream on :8080 (optional)                                      │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
                                      │
                        Wireless remote (R2+X / R2+Y)
                        G1 speaker (audio out)
                        G1 arms (motion out)
```

## 2. Components

### 2.1 `core/` — detection & reasoning (shared library)

Pure-Python, no Unitree SDK dependency.
Used by `apps/` and (indirectly) `robot/bridge.py` via subprocess.

- `core/camera.py` — RealSense / webcam / video-file source. Yields `(frame, depth)` pairs.
- `core/model.py` — YOLO11n wrapper, class filtering, confidence thresholding, batched inference.
- `core/tracker.py` — ByteTrack-style persistent IDs across frames.
- `core/compliance.py` — binary SAFE / UNSAFE classifier. Reads `REQUIRED_PPE` from config; `split_wearing_missing()` handles the `no-X` class convention.
- `core/events.py` — event emission with structured format: `ID NNNN | EVENT | STATUS | wearing: … | missing: … | unknown: …`
- `core/stationary.py` — "is this person standing still long enough to warrant an alert?" heuristic (pixel-level centroid stability).
- `core/drawing.py` — overlay boxes + labels on frames for the MJPEG stream.
- `core/paths.py` — resolves `PROJECT_ROOT` from the `SAQR_ROOT` env var or by walking up from `__file__`.

### 2.2 `apps/` — executable entry points

- `apps/saqr_cli.py` — the detection subprocess launched by the bridge. Reads config, opens the camera, runs the pipeline, prints events on stdout, serves MJPEG on `:8080`.
- `apps/detect_cli.py` — stand-alone detector for testing on clips.
- `apps/train_cli.py` — dev-machine training wrapper around `ultralytics`.
- `apps/manager_cli.py` — dataset tooling (class rebalancing, splits).
- `apps/view_stream.py` — OpenCV viewer attached to the MJPEG stream.

### 2.3 `robot/` — G1 integration (only runs on the robot)

- `robot/bridge.py` — orchestrator. Owns `RobotController`, spawns `apps.saqr_cli` as a subprocess, parses its stdout, routes UNSAFE/SAFE events into robot actions. Also the systemd entry point.
- `robot/robot_controller.py` — owns all the G1 clients: arm action, audio, lowstate. Runs a `TtsWorker` background thread with a freshness policy (new announcement cancels and replaces the in-flight one).
- `robot/arm_replay.py` — low-level `rt/arm_sdk` publisher that plays a recorded JSONL trajectory at 60 Hz.
  Used when `motion.enabled=true`.
- `robot/audio_player.py` — `PlayStream`-based WAV player, with chunk retries for firmware 3104 and a cancel flag. Used when `tts.mode="recorded_only"` or `"recorded_or_tts"`.
- `robot/controller.py` — `LowStateHub` for decoding the wireless remote (R2+X / R2+Y combos) from `rt/lowstate`.

### 2.4 `utils/` — shared helpers

- `utils/config.py` — `load_config(name)` — reads `config/_config.json`, caches, applies env-var overrides.
- `utils/logger.py` — rotating file logger + console mirror.

### 2.5 `config/` — runtime tunables

- `core_config.json` — detection thresholds, tracker params, camera source, stream port, training hyperparams, compliance rules, capture.
- `robot_config.json` — bridge timing, TTS mode + phrases, arm action names, recorded-motion filenames, deploy target IP, start_saqr defaults.
- `logging.json` — log level per module.

Precedence: **env var > config JSON > code fallback**.

### 2.6 `assets/` — runtime artefacts (in-repo)

- `assets/audio/fixed/*.wav` — generic phrases (ready, safe, unsafe_generic, deactivated, no_camera).
- `assets/audio/unsafe_missing/*.wav` — per missing-PPE combo (helmet, vest, helmet_vest).
- `assets/motions/adnoc1.jsonl` — the UNSAFE arm gesture (7 s recorded trajectory).
- `assets/motions/arm_home.jsonl` — the home pose used to smoothly return the arm at the end of a replay.

All WAVs are 16 kHz mono int16 — required by the G1 audio channel. Motion JSONL is `{"t": seconds, "q": [29 floats]}` per line, 60 Hz.

## 3. Data pipeline

```
┌──────────────┐     ┌─────────────┐     ┌────────────┐
│  RealSense   │────►│   YOLO11n   │────►│  Tracker   │
│  640x480@30  │     │  saqr_best  │     │ ByteTrack  │
└──────────────┘     └─────────────┘     └─────┬──────┘
                                               │
                                               ▼
                                      ┌──────────────────┐
                                      │    Compliance    │
                                      │  REQUIRED=[hv]   │
                                      └────────┬─────────┘
                                               │ SAFE / UNSAFE / PARTIAL
                                               │
                                               ▼
                                      ┌──────────────────┐
                                      │ Stationary check │
                                      │ (centroid drift) │
                                      └────────┬─────────┘
                                               │
                                               ▼
                                    emit_event() ──► stdout
                                               │
                                               ▼
                                     bridge.handle_line() (reader thread)
                                               │
                     ┌─────────────────────────┼─────────────────────┐
                     ▼                         ▼                     ▼
              RobotController             ArmReplayer           (log only)
           .speak(text,cat,key)       .play(motion, home)
                     │                         │
                     ▼                         ▼
             TtsWorker thread         rt/arm_sdk @ 60 Hz
                     │                         │
                     ▼                         │
           AudioClient.TtsMaker                │
                     │                         │
                     └─────── G1 firmware ◄────┘
                                   │
                                   ▼
                            Speaker + arms
```

### Event lifecycle (single UNSAFE example)

1. Frame `N`: YOLO detects a person with `no-helmet` and `no-vest` boxes.
2. Tracker assigns (or keeps) `track_id=42`.
3. Compliance → UNSAFE, `wearing=[], missing=[helmet, vest]`.
4. Stationary check: same centroid for ≥15 frames → green-lit.
5. `core.events.emit_event()` prints to stdout:
   `[HH:MM:SS.fff] ID 0042 | NEW | UNSAFE | wearing: none | missing: helmet, vest | unknown: gloves, goggles, boots`
6. `bridge._read_stdout` parses, `handle_line` matches `EVENT_RE`.
7. Cooldown check on `(42, UNSAFE)` passes → fire actions.
8. `robot.speak("Please stop. Wear your proper safety equipment. You are missing helmet and vest.", category="unsafe_missing", key="helmet_vest")`.
9. `time.sleep(audio_lead_s=0.3)` — hand the audio worker a head start.
10. `robot.reject(release_after=0.5)` → `ArmReplayer.play(adnoc1.jsonl, arm_home.jsonl)` — blocks the main thread for ~12 s while publishing joint commands.
11. Audio worker picks up the speak request, resets with `AUDIO_STOP_PLAY`, calls `TtsMaker`, retries once if `rc!=0`.
12. Bridge returns to idle; the next event for `(42, UNSAFE)` is ignored for 8 s per the cooldown.

## 4. Concurrency model

**Process and thread boundaries:**

| Layer | Process | Thread | Purpose |
|---|---|---|---|
| Bridge | `bridge.py` main | MainThread | orchestrator, arm action calls (blocking) |
| Bridge | `bridge.py` main | TriggerLoop | polls `hub.combo_r2x()` / `combo_r2y()` |
| Bridge | `bridge.py` main | StdoutReader | reads subprocess stdout line-by-line |
| Bridge | `bridge.py` main | TtsWorker | drains audio queue, calls `TtsMaker` |
| Saqr CLI | `apps.saqr_cli` (subprocess) | MainThread | camera + inference + stdout emit |
| Saqr CLI | subprocess | Streaming | MJPEG server thread on `:8080` |

**Synchronisation primitives** (all in `RobotController`):

- `_tts_queue` — bounded deque (`queue_max=4`), drained by TtsWorker.
- `_tts_event` — signalled on every `speak()` to wake the worker.
- `_audio_idle` — set when queue empty AND no dispatch in flight. Callers can `wait_for_audio_done()` to block until audio drains; this is the primitive you'd use to serialise audio-before-arm.
- `_tts_worker_stop` — shutdown flag, set in `shutdown_tts()`.

**Freshness policy:** a new `speak()` call clears the queue and `cancel()`s any in-flight player. Newer events always take precedence, which avoids stale "helmet+vest" audio finishing after the worker has already moved on to a "vest only" event.

## 5. Configuration flow

```
config/*.json ──► utils.config.load_config() ──► cached dict
                                                      │
                                         module top-level constants
                                    (e.g. TTS_VOLUME, MOTION_UNSAFE_FILE)
                                                      │
                                                   runtime
```

Env-var overrides are applied at `start_saqr.sh` level (for `CONDA_ENV`, `SAQR_SOURCE`, etc.) or inside `load_config` for specific keys. The philosophy is: no repo-edits for the common knobs — flip the JSON and restart.

## 6. Firmware constraints (the big lesson)

Unitree G1 firmware routes the audio subsystem (`TtsMaker`, `PlayStream`, `AUDIO_STOP_PLAY`, `SetVolume`) and the low-level arm SDK (`rt/arm_sdk`, published at 60 Hz during motion replay) through the same onboard MCU / bus.
While the arm SDK is actively publishing, audio RPCs block until their timeout and return `rc=3104` ("device busy").

Consequences for this codebase:

1. **Parallel audio + custom motion is unreliable.** Testing showed a ~50 % audio dropout rate when `ArmReplayer` runs concurrently with `TtsMaker`.
2. **The 10-second default DDS timeout made failures catastrophic.** We shortened the `AudioClient` timeout to 3 s so hung calls surface in seconds, not tens of seconds.
3. **Retries, warm-ups, and STOP_PLAY resets cannot fix a firmware that's busy on the other channel.** They can only recover after the arm SDK releases the bus. We keep all three as robustness hygiene, but they aren't a substitute for serialisation.
4. **The high-level `ExecuteAction('reject')` is a single RPC**, not 60 Hz publishing, so it contends less with audio. If parallel operation is required, `motion.enabled=false` plus `audio_lead_s=0.3` is the closest thing that works, at the cost of the canned gesture.

The deterministic path is to serialise: `speak()` → `wait_for_audio_done()` → `reject()`. This trades ~6 s of per-event latency for guaranteed audio delivery, which is the right trade for a safety-critical warning.

## 7. Deployment and lifecycle

1. **Install**: `scripts/deploy.sh` rsyncs the repo to `unitree@:~/Saqr/` and runs `pip install -e .` inside the robot's `saqr` conda env.
2. **Enable**: `sudo systemctl enable --now saqr-bridge`.
3. **Runtime**: `bridge.py` → `TtsMaker("Saqr is running. Press R2 plus X to start.")` → idle.
4. **Trigger**: operator presses R2+X → subprocess spawns.
5. **Stop**: operator presses R2+Y → subprocess gets SIGINT, bridge announces "Saqr deactivated.", returns to idle.
6. **Reboot**: systemd auto-restarts the bridge with `Restart=on-failure`.
7. **Logs**: `journalctl -u saqr-bridge` + `logs/*.log` + `runtime/runs//events.csv`.

## 8. Plan & open threads

Short list of things that would materially improve reliability if picked up next:

| Area | What | Why |
|---|---|---|
| Audio + arm | Serialise (audio done → arm) | Eliminates the ~50 % dropout on safety alerts |
| Detection | Per-identity cooldown, not per-track-id | Track-ID churn currently re-triggers the alert for the same person |
| Robustness | Boot-time self-check | Surface camera/DDS/model/motion issues before the first event, not during |
| Observability | Rotate `logs/`, add `logs/events.jsonl` | Audit trail + disk safety |
| UX | Short-phrase fallback when audio drops | The arm still moves even when audio fails; user sees gesture but hears nothing |
| Testing | Mock RobotController for bridge unit tests | Today the bridge is only tested end-to-end on the robot |

## 9. Glossary

- **G1 / Unitree G1** — humanoid platform this runs on.
- **DDS** — Data Distribution Service, Cyclone DDS v0.10.2, the pub/sub bus used by the Unitree SDK.
- **`rt/lowstate`** — robot state topic; carries wireless remote bits.
- **`rt/arm_sdk`** — 60 Hz arm joint command topic used by the teach-and-replay path.
- **`AudioClient` / `G1ArmActionClient`** — Unitree SDK service clients that wrap DDS RPC.
- **`TtsMaker`** — firmware text-to-speech RPC.
- **`PlayStream`** — firmware PCM-chunk playback RPC.
- **`rc=3104`** — firmware-level "device busy" error returned by audio RPCs when the audio bus is held by another consumer.
- **PPE** — Personal Protective Equipment (helmet, vest, boots, gloves, goggles in this project's dataset).
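
The serialised path from section 6 (`speak()` → `wait_for_audio_done()` → `reject()`) and the mock controller listed in section 8 can be sketched together. This is a minimal illustration, not the project's implementation: `FakeRobotController`, `alert_serialised`, `playback_s` and `audio_timeout_s` are hypothetical names, the worker only simulates playback with a sleep, and the `cancel()` of an in-flight player is omitted. Only the method and attribute names mentioned in section 4 are taken from this document.

```python
import threading
import time
from collections import deque


class FakeRobotController:
    """Hypothetical stand-in for RobotController: logs calls, touches no hardware."""

    def __init__(self, playback_s=0.05, queue_max=4):
        self.calls = []                          # ordered record of completed actions
        self._playback_s = playback_s            # simulated length of one announcement
        self._tts_queue = deque(maxlen=queue_max)
        self._tts_event = threading.Event()      # wakes the worker on every speak()
        self._audio_idle = threading.Event()     # set when queue is empty and nothing plays
        self._audio_idle.set()
        self._tts_worker_stop = threading.Event()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def speak(self, text):
        with self._lock:
            self._tts_queue.clear()              # freshness policy: newest request wins
            self._tts_queue.append(text)
            self._audio_idle.clear()
        self._tts_event.set()

    def wait_for_audio_done(self, timeout=None):
        """Block until audio has drained; returns False if the timeout expired."""
        return self._audio_idle.wait(timeout)

    def reject(self):
        self.calls.append("arm")                 # the real method would replay the gesture

    def shutdown_tts(self):
        self._tts_worker_stop.set()
        self._tts_event.set()
        self._worker.join(timeout=1.0)

    def _drain(self):
        while not self._tts_worker_stop.is_set():
            self._tts_event.wait()
            self._tts_event.clear()
            while True:
                with self._lock:
                    if not self._tts_queue:
                        self._audio_idle.set()   # only idle once nothing is queued or playing
                        break
                    text = self._tts_queue.popleft()
                time.sleep(self._playback_s)     # stands in for the TtsMaker call
                self.calls.append(f"audio:{text}")


def alert_serialised(robot, text, audio_timeout_s=3.0):
    """Speak, wait for the audio to drain, and only then start the arm."""
    robot.speak(text)
    drained = robot.wait_for_audio_done(timeout=audio_timeout_s)
    robot.reject()   # assumption: still gesture even if the audio wait timed out
    return drained
```

Because the arm call happens only after `wait_for_audio_done()` returns, the recorded order is always audio first, then arm. Whether the gesture should still run after an audio timeout is a policy choice this sketch makes arbitrarily.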