Saqr/docs/ARCHITECTURE.md
2026-04-21 10:33:13 +04:00

Saqr Architecture

This document describes how Saqr is built: the components, how they communicate, how data flows through the system, the concurrency model, and the constraints imposed by the G1 firmware. It's meant as a "start-here" for anyone extending the code or debugging an incident.

1. High-level system map

┌─────────────────────────── Dev machine ────────────────────────────┐
│                                                                    │
│  apps.train_cli  ──(saqr_best.pt)──► data/models/                  │
│  gui/  (optional: PySide6 desktop QA tool, not used in production) │
│                                                                    │
│  scripts/deploy.sh  ──(rsync + pip install -e .)──►                │
│                                                                    │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
                        robot_ip (eth0 / 192.168.123.164)
                                 │
┌────────────────────────────────▼───────────────────────────────────┐
│                      Unitree G1 (Jetson Orin NX)                   │
│                                                                    │
│  scripts/start_saqr.sh                                             │
│        │                                                           │
│        ▼                                                           │
│  ┌───────────────  robot/bridge.py  (main process) ─────────────┐  │
│  │                                                              │  │
│  │  RobotController ─── G1ArmActionClient ─┐                    │  │
│  │          │                               ├── DDS (eth0) ─┐   │  │
│  │          ├── AudioClient ────────────────┤               │   │  │
│  │          ├── LowStateHub  ← rt/lowstate ─┤               │   │  │
│  │          ├── ArmReplayer  ── rt/arm_sdk ─┘               │   │  │
│  │          └── TtsWorker thread (audio queue)              │   │  │
│  │                                                          │   │  │
│  │  TriggerLoop thread  ── R2+X / R2+Y polling ─────────────┘   │  │
│  │                                                              │  │
│  │  StdoutReader thread  ── parses event lines from subprocess ─┤  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                 │                                  │
│                          subprocess.Popen                          │
│                                 │                                  │
│  ┌──────────────────  apps/saqr_cli.py  ──────────────────────┐    │
│  │                                                            │    │
│  │  Camera (RealSense) → YOLO11n → Tracker → Compliance       │    │
│  │                                       │                    │    │
│  │                                       └── emit_event() ──► stdout │
│  │  MJPEG stream on :8080 (optional)                          │    │
│  └────────────────────────────────────────────────────────────┘    │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
                                 │
                            Wireless remote (R2+X / R2+Y)
                            G1 speaker (audio out)
                            G1 arms (motion out)

2. Components

2.1 core/ — detection & reasoning (shared library)

Pure-Python, no Unitree SDK dependency. Used by apps/ and (indirectly) robot/bridge.py via subprocess.

  • core/camera.py — RealSense / webcam / video-file source. Yields (frame, depth) pairs.
  • core/model.py — YOLO11n wrapper, class filtering, confidence thresholding, batched inference.
  • core/tracker.py — ByteTrack-style persistent IDs across frames.
  • core/compliance.py — binary SAFE / UNSAFE classifier. Reads REQUIRED_PPE from config; split_wearing_missing() handles the no-X class convention.
  • core/events.py — event emission with structured format: ID NNNN | EVENT | STATUS | wearing: … | missing: … | unknown: …
  • core/stationary.py — "is this person standing still long enough to warrant an alert?" heuristic (pixel-level centroid stability).
  • core/drawing.py — overlay boxes + labels on frames for the MJPEG stream.
  • core/paths.py — resolves PROJECT_ROOT from the SAQR_ROOT env var or by walking up from __file__.
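
The no-X class convention mentioned for core/compliance.py can be illustrated with a minimal sketch. It assumes detections arrive as plain class-name strings such as "helmet" or "no-helmet". The function body, the ALL_PPE tuple, the REQUIRED_PPE value, and the rule that an undetected required item counts as UNSAFE are all assumptions made for illustration, not the project's actual implementation.

```python
# Hypothetical sketch: the real split_wearing_missing() may differ.
REQUIRED_PPE = ("helmet", "vest")  # assumed value; the real list comes from config
ALL_PPE = ("helmet", "vest", "gloves", "goggles", "boots")  # items named in the glossary


def split_wearing_missing(labels):
    """Split detected class names into (wearing, missing, unknown).

    A label "X" means the item was seen worn; "no-X" means it was seen absent.
    Items with neither label are reported as unknown.
    """
    seen = set(labels)
    wearing = [p for p in ALL_PPE if p in seen]
    missing = [p for p in ALL_PPE if f"no-{p}" in seen and p not in seen]
    unknown = [p for p in ALL_PPE if p not in wearing and p not in missing]
    return wearing, missing, unknown


def classify(labels, required=REQUIRED_PPE):
    """Binary status: SAFE only if every required item was seen worn (assumed rule)."""
    wearing, _, _ = split_wearing_missing(labels)
    return "SAFE" if all(p in wearing for p in required) else "UNSAFE"
```

With the labels from the lifecycle example, split_wearing_missing(["no-helmet", "no-vest"]) yields an empty wearing list, helmet and vest as missing, and the remaining three items as unknown.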

2.2 apps/ — executable entry points

  • apps/saqr_cli.py — the detection subprocess launched by the bridge. Reads config, opens the camera, runs the pipeline, prints events on stdout, serves MJPEG on :8080.
  • apps/detect_cli.py — stand-alone detector for testing on clips.
  • apps/train_cli.py — dev-machine training wrapper around ultralytics.
  • apps/manager_cli.py — dataset tooling (class rebalancing, splits).
  • apps/view_stream.py — OpenCV viewer attached to the MJPEG stream.

2.3 robot/ — G1 integration (only runs on the robot)

  • robot/bridge.py — orchestrator. Owns RobotController, spawns apps.saqr_cli as a subprocess, parses its stdout, routes UNSAFE/SAFE events into robot actions. Also the systemd entry point.
  • robot/robot_controller.py — owns all the G1 clients: arm action, audio, lowstate. Runs a TtsWorker background thread with a freshness policy (new announcement cancels and replaces the in-flight one).
  • robot/arm_replay.py — low-level rt/arm_sdk publisher that plays a recorded JSONL trajectory at 60 Hz. Used when motion.enabled=true.
  • robot/audio_player.py — PlayStream-based WAV player, with chunk retries for firmware 3104 and a cancel flag. Used when tts.mode="recorded_only" or "recorded_or_tts".
  • robot/controller.py — LowStateHub for decoding the wireless remote (R2+X / R2+Y combos) from rt/lowstate.

2.4 utils/ — shared helpers

  • utils/config.py — load_config(name) reads config/<name>_config.json, caches the result, and applies env-var overrides.
  • utils/logger.py — rotating file logger + console mirror.

2.5 config/ — runtime tunables

  • core_config.json — detection thresholds, tracker params, camera source, stream port, training hyperparams, compliance rules, capture.
  • robot_config.json — bridge timing, TTS mode + phrases, arm action names, recorded-motion filenames, deploy target IP, start_saqr defaults.
  • logging.json — log level per module.

Precedence: env var > config JSON > code fallback.
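
A minimal sketch of that precedence rule, for a single key. The resolve() helper and the SAQR_<KEY> env-var naming scheme are hypothetical; the source only states that load_config applies overrides for specific keys, not how they are named.

```python
import json
import os


def resolve(key, config, fallback, env_prefix="SAQR_"):
    """Return the value for `key` using env var > config JSON > code fallback.

    The env-var name (prefix + upper-cased key) is an assumed convention.
    """
    env_name = env_prefix + key.upper()
    if env_name in os.environ:
        raw = os.environ[env_name]
        try:
            return json.loads(raw)   # lets "0.3" or "true" keep their JSON type
        except json.JSONDecodeError:
            return raw               # plain strings are used as-is
    return config.get(key, fallback)
```

For example, resolve("audio_lead_s", cfg, 0.0) returns the JSON value unless SAQR_AUDIO_LEAD_S is set in the environment, and falls back to 0.0 only when neither source defines the key.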

2.6 assets/ — runtime artefacts (in-repo)

  • assets/audio/fixed/*.wav — generic phrases (ready, safe, unsafe_generic, deactivated, no_camera).
  • assets/audio/unsafe_missing/*.wav — per missing-PPE combo (helmet, vest, helmet_vest).
  • assets/motions/adnoc1.jsonl — the UNSAFE arm gesture (7 s recorded trajectory).
  • assets/motions/arm_home.jsonl — the home pose used to smoothly return the arm at the end of a replay.

All WAVs are 16 kHz mono int16 — required by the G1 audio channel. Motion JSONL is {"t": seconds, "q": [29 floats]} per line, 60 Hz.
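
The motion format lends itself to a quick sanity check before replay. The sketch below parses JSONL lines and verifies the joint count and the 60 Hz spacing. The load_motion() name and the 25 % timing tolerance are assumptions for illustration, not part of the codebase.

```python
import json


def load_motion(lines, n_joints=29, rate_hz=60.0, tol=0.25):
    """Parse and sanity-check a trajectory given as an iterable of JSONL lines.

    Returns a list of (t, q) tuples. `tol` is the allowed relative deviation
    of each time step from 1 / rate_hz (an assumed tolerance).
    """
    frames = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        t, q = float(rec["t"]), [float(x) for x in rec["q"]]
        if len(q) != n_joints:
            raise ValueError(f"line {lineno}: expected {n_joints} joints, got {len(q)}")
        if frames:
            dt = t - frames[-1][0]
            # Reject non-monotonic timestamps and steps far from 1 / rate_hz.
            if dt <= 0 or abs(dt * rate_hz - 1.0) > tol:
                raise ValueError(f"line {lineno}: bad time step {dt:.4f} s")
        frames.append((t, q))
    return frames
```

A file can be checked with load_motion(open(path)), since a text file object iterates line by line.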

3. Data pipeline

┌──────────────┐     ┌─────────────┐     ┌────────────┐
│ RealSense    │────►│   YOLO11n   │────►│  Tracker   │
│ 640x480@30   │     │  saqr_best  │     │  ByteTrack │
└──────────────┘     └─────────────┘     └─────┬──────┘
                                               │
                                               ▼
                                      ┌──────────────────┐
                                      │   Compliance     │
                                      │  REQUIRED=[hv]   │
                                      └────────┬─────────┘
                                               │
                                  SAFE / UNSAFE / PARTIAL
                                               │
                                               ▼
                                      ┌──────────────────┐
                                      │ Stationary check │
                                      │ (centroid drift) │
                                      └────────┬─────────┘
                                               │
                                               ▼
                                     emit_event() ──► stdout
                                               │
                                               ▼
                                 bridge.handle_line() (reader thread)
                                               │
                     ┌─────────────────────────┼─────────────────────┐
                     ▼                         ▼                     ▼
              RobotController          ArmReplayer            (log only)
              .speak(text,cat,key)     .play(motion, home)
                     │                         │
                     ▼                         ▼
              TtsWorker thread          rt/arm_sdk @ 60 Hz
                     │                         │
                     ▼                         │
              AudioClient.TtsMaker             │
                     │                         │
                     └─────── G1 firmware ◄────┘
                                  │
                                  ▼
                            Speaker + arms
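
The stationary check in the pipeline above can be sketched as a per-track window of recent centroids. The StationaryCheck class and the 10-pixel drift threshold are assumptions; only the idea of centroid stability and the 15-frame window (from the lifecycle example) come from this document.

```python
import math
from collections import defaultdict, deque


class StationaryCheck:
    """Hypothetical centroid-drift heuristic, one sliding window per track."""

    def __init__(self, min_frames=15, max_drift_px=10.0):
        self.min_frames = min_frames
        self.max_drift_px = max_drift_px  # assumed threshold
        self._history = defaultdict(lambda: deque(maxlen=min_frames))

    def update(self, track_id, cx, cy):
        """Record a centroid; return True once the track has held still."""
        hist = self._history[track_id]
        hist.append((cx, cy))
        if len(hist) < self.min_frames:
            return False  # not enough observations yet
        x0, y0 = hist[0]
        # Every centroid in the window must stay close to the oldest one.
        return all(math.hypot(x - x0, y - y0) <= self.max_drift_px for x, y in hist)
```

Because the deque is bounded, a person who starts walking invalidates the window immediately and has to stand still for a full window again before an alert can fire.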

Event lifecycle (single UNSAFE example)

  1. Frame N: YOLO detects a person with no-helmet and no-vest boxes.
  2. Tracker assigns (or keeps) track_id=42.
  3. Compliance → UNSAFE, wearing=[], missing=[helmet, vest].
  4. Stationary check: same centroid for ≥15 frames → green-lit.
  5. core.events.emit_event() prints to stdout: [HH:MM:SS.fff] ID 0042 | NEW | UNSAFE | wearing: none | missing: helmet, vest | unknown: gloves, goggles, boots
  6. bridge._read_stdout parses, handle_line matches EVENT_RE.
  7. Cooldown check on (42, UNSAFE) passes → fire actions.
  8. robot.speak("Please stop. Wear your proper safety equipment. You are missing helmet and vest.", category="unsafe_missing", key="helmet_vest").
  9. time.sleep(audio_lead_s=0.3) — hand the audio worker a head start.
  10. robot.reject(release_after=0.5) → ArmReplayer.play(adnoc1.jsonl, arm_home.jsonl), which blocks the main thread for ~12 s while publishing joint commands.
  11. Audio worker picks up the speak request, resets with AUDIO_STOP_PLAY, calls TtsMaker, retries once if rc!=0.
  12. Bridge returns to idle; the next event for (42, UNSAFE) is ignored for 8 s per the cooldown.
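
Steps 5 to 7 and step 12 can be sketched as a regex parse followed by a per-key cooldown. The pattern below is reconstructed from the sample line in step 5 and is an assumption; the bridge's real EVENT_RE may differ. parse_event() and the Cooldown class are hypothetical names.

```python
import re
import time

# Assumed pattern for the event line shown in step 5.
EVENT_RE = re.compile(
    r"^\[(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3})\]\s+ID\s+(?P<id>\d+)\s*\|\s*"
    r"(?P<event>\w+)\s*\|\s*(?P<status>\w+)\s*\|\s*"
    r"wearing:\s*(?P<wearing>[^|]*)\|\s*"
    r"missing:\s*(?P<missing>[^|]*)\|\s*"
    r"unknown:\s*(?P<unknown>.*)$"
)


def _items(field):
    """Turn 'helmet, vest' into ['helmet', 'vest']; 'none' or empty into []."""
    field = field.strip()
    return [] if field in ("", "none") else [s.strip() for s in field.split(",")]


def parse_event(line):
    """Return a dict for an event line, or None for any other output."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    return {
        "track_id": int(m["id"]),
        "event": m["event"],
        "status": m["status"],
        "wearing": _items(m["wearing"]),
        "missing": _items(m["missing"]),
        "unknown": _items(m["unknown"]),
    }


class Cooldown:
    """Suppress repeats of the same (track_id, status) within `seconds`."""

    def __init__(self, seconds=8.0, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self._last = {}

    def ready(self, track_id, status):
        key, now = (track_id, status), self.clock()
        last = self._last.get(key)
        if last is not None and now - last < self.seconds:
            return False  # still cooling down; the timestamp is not refreshed
        self._last[key] = now
        return True
```

Keying the cooldown on (track_id, status) means a track that flips from UNSAFE to SAFE is not suppressed by its earlier UNSAFE alert. Injecting the clock keeps the class testable without sleeping.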

4. Concurrency model

Two processes, six threads:

Layer    | Process                    | Thread       | Purpose
---------+----------------------------+--------------+-------------------------------------------
Bridge   | bridge.py main             | MainThread   | orchestrator, arm action calls (blocking)
Bridge   | bridge.py main             | TriggerLoop  | polls hub.combo_r2x() / combo_r2y()
Bridge   | bridge.py main             | StdoutReader | reads subprocess stdout line-by-line
Bridge   | bridge.py main             | TtsWorker    | drains audio queue, calls TtsMaker
Saqr CLI | apps.saqr_cli (subprocess) | MainThread   | camera + inference + stdout emit
Saqr CLI | subprocess                 | Streaming    | MJPEG server thread on :8080

Synchronisation primitives (all in RobotController):

  • _tts_queue — bounded deque (queue_max=4), drained by TtsWorker.
  • _tts_event — signalled on every speak() to wake the worker.
  • _audio_idle — set when the queue is empty AND no dispatch is in flight. Callers can call wait_for_audio_done() to block until audio drains; this is the primitive you'd use to serialise audio-before-arm.
  • _tts_worker_stop — shutdown flag, set in shutdown_tts().

Freshness policy: a new speak() call clears the queue and cancel()s any in-flight player — newer events always take precedence, avoiding stale "helmet+vest" audio finishing after the worker already moved to a "vest only" event.
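
A minimal sketch of how the freshness policy and the idle signal might fit together, using only the standard library. FreshSpeaker is a hypothetical stand-in for the TTS side of RobotController; the play callable, its (text, cancel_event) signature, and the internal attribute names are assumptions.

```python
import threading
from collections import deque


class FreshSpeaker:
    """Hypothetical worker where a new speak() replaces anything older."""

    def __init__(self, play, queue_max=4):
        self._play = play                    # assumed callable(text, cancel_event)
        self._queue = deque(maxlen=queue_max)
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._idle = threading.Event()
        self._idle.set()                     # nothing queued at start
        self._stop = threading.Event()
        self._cancel = threading.Event()     # cancel flag of the in-flight playback
        self._worker = threading.Thread(target=self._run, name="TtsWorker", daemon=True)
        self._worker.start()

    def speak(self, text):
        with self._lock:
            self._queue.clear()              # drop stale announcements
            self._cancel.set()               # ask the in-flight playback to stop
            self._queue.append(text)
            self._idle.clear()
        self._wake.set()

    def wait_for_audio_done(self, timeout=None):
        return self._idle.wait(timeout)

    def shutdown(self):
        self._stop.set()
        self._wake.set()
        self._worker.join(timeout=2.0)

    def _run(self):
        while not self._stop.is_set():
            self._wake.wait()
            self._wake.clear()
            while True:
                with self._lock:
                    if not self._queue:
                        self._idle.set()     # queue empty and nothing in flight
                        break
                    text = self._queue.popleft()
                    self._cancel = threading.Event()  # fresh flag for this playback
                    cancel = self._cancel
                self._play(text, cancel)
```

Creating a fresh cancel flag under the lock for each playback keeps a cancellation aimed at the previous announcement from leaking into the next one. The idle event is only set under the same lock that speak() uses to clear it, so wait_for_audio_done() cannot return while a request is pending.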

5. Configuration flow

config/*.json ──► utils.config.load_config() ──► cached dict
                                                     │
                                      module top-level constants
                                 (e.g. TTS_VOLUME, MOTION_UNSAFE_FILE)
                                                     │
                                                 runtime

Env-var overrides are applied at start_saqr.sh level (for CONDA_ENV, SAQR_SOURCE, etc.) or inside load_config for specific keys. The philosophy is: no repo-edits for the common knobs — flip the JSON and restart.

6. Firmware constraints (the big lesson)

Unitree G1 firmware routes the audio subsystem (TtsMaker, PlayStream, AUDIO_STOP_PLAY, SetVolume) and the low-level arm SDK (rt/arm_sdk, published at 60 Hz during motion replay) through the same onboard MCU / bus. While arm SDK is actively publishing, audio RPCs block until their timeout and return rc=3104 ("device busy").

Consequences for this codebase:

  1. Parallel audio + custom motion is unreliable. Testing showed a ~50 % audio dropout rate when ArmReplayer runs concurrently with TtsMaker.
  2. The 10-second default DDS timeout made failures catastrophic. We shortened the AudioClient timeout to 3 s so hung calls surface in seconds, not tens of seconds.
  3. Retries, warm-ups, and STOP_PLAY resets cannot fix a firmware that's busy on the other channel — they can only recover after arm sdk releases the bus. We keep all three as robustness hygiene but they aren't a substitute for serialisation.
  4. The high-level ExecuteAction('reject') is a single RPC, not 60 Hz publishing, so it contends less with audio. If parallel audio and motion is required, motion.enabled=false plus audio_lead_s=0.3 is the closest thing that works, at the cost of the canned gesture.

The deterministic path is to serialise: speak() → wait_for_audio_done() → reject(). This trades ~6 s of per-event latency for guaranteed audio delivery, which is the right trade for a safety-critical warning.
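
As a sketch, the serialised path is three calls in order. The alert_serialised() helper, the timeout argument on wait_for_audio_done(), and the choice to run the gesture even when the wait times out are assumptions. RecordingRobot is a stub that exists only to make the example runnable off-robot.

```python
def alert_serialised(robot, text, category, key, audio_timeout_s=10.0):
    """Speak first, wait for the audio to drain, then run the arm gesture.

    Returns True if the audio drained before the timeout.
    """
    robot.speak(text, category=category, key=key)
    drained = robot.wait_for_audio_done(timeout=audio_timeout_s)
    # Assumed policy: still run the gesture on timeout so the visual
    # warning is never skipped.
    robot.reject(release_after=0.5)
    return drained


class RecordingRobot:
    """Stub standing in for RobotController; records the call order."""

    def __init__(self):
        self.calls = []

    def speak(self, text, category=None, key=None):
        self.calls.append("speak")

    def wait_for_audio_done(self, timeout=None):
        self.calls.append("wait_for_audio_done")
        return True

    def reject(self, release_after=0.5):
        self.calls.append("reject")
```

The stub also hints at how the bridge could be unit-tested without hardware: assert on the recorded call order rather than on robot behaviour.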

7. Deployment and lifecycle

  1. Install: scripts/deploy.sh rsyncs the repo to unitree@<ROBOT_IP>:~/Saqr/ and runs pip install -e . inside the robot's saqr conda env.
  2. Enable: sudo systemctl enable --now saqr-bridge.
  3. Runtime: bridge.py → TtsMaker("Saqr is running. Press R2 plus X to start.") → idle.
  4. Trigger: operator presses R2+X → subprocess spawns.
  5. Stop: operator presses R2+Y → subprocess gets SIGINT, bridge announces "Saqr deactivated.", returns to idle.
  6. Reboot: the unit is enabled, so systemd starts the bridge at boot; Restart=on-failure restarts it if it exits with an error.
  7. Logs: journalctl -u saqr-bridge + logs/*.log + runtime/runs/<timestamp>/events.csv.
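
The trigger and stop steps above amount to a two-state machine driven by button edges. The sketch below is one way to turn polled combo states into start and stop actions. The TriggerState class is hypothetical, and rising-edge detection is an assumed design choice so that a held combo fires only once.

```python
class TriggerState:
    """Rising-edge detector for the R2+X (start) and R2+Y (stop) combos."""

    def __init__(self):
        self.running = False
        self._prev = (False, False)

    def step(self, r2x, r2y):
        """Feed one poll sample; return "start", "stop", or None."""
        prev_x, prev_y = self._prev
        self._prev = (r2x, r2y)
        if r2x and not prev_x and not self.running:
            self.running = True      # idle -> running: spawn the subprocess
            return "start"
        if r2y and not prev_y and self.running:
            self.running = False     # running -> idle: stop the subprocess
            return "stop"
        return None
```

In a polling loop this would be fed with the current values of hub.combo_r2x() and hub.combo_r2y() on each iteration, with the returned action deciding whether to spawn or stop the detection subprocess.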

8. Plan & open threads

Short list — things that would materially improve reliability if picked up next:

Area          | What                                      | Why
--------------+-------------------------------------------+--------------------------------------------------------------------------------
Audio + arm   | Serialise (audio done → arm)              | Eliminates the ~50 % dropout on safety alerts
Detection     | Per-identity cooldown, not per-track-id   | Track-ID churn currently re-triggers the alert for the same person
Robustness    | Boot-time self-check                      | Surface camera/DDS/model/motion issues before the first event, not during
Observability | Rotate logs/, add logs/events.jsonl       | Audit trail + disk safety
UX            | Short-phrase fallback when audio drops    | The arm still moves even when audio fails; user sees gesture but hears nothing
Testing       | Mock RobotController for bridge unit tests | Today the bridge is only tested end-to-end on the robot

9. Glossary

  • G1 / Unitree G1 — humanoid platform this runs on.
  • DDS — Data Distribution Service, Cyclone DDS v0.10.2, the pub/sub bus used by the Unitree SDK.
  • rt/lowstate — robot state topic; carries wireless remote bits.
  • rt/arm_sdk — 60 Hz arm joint command topic used by the teach-and-replay path.
  • AudioClient / G1ArmActionClient — Unitree SDK service clients that wrap DDS RPC.
  • TtsMaker — firmware text-to-speech RPC.
  • PlayStream — firmware PCM-chunk playback RPC.
  • rc=3104 — firmware-level "device busy" error returned by audio RPCs when the audio bus is held by another consumer.
  • PPE — Personal Protective Equipment (helmet, vest, boots, gloves, goggles in this project's dataset).