Saqr Architecture
This document describes how Saqr is built: the components, how they communicate, how data flows through the system, the concurrency model, and the constraints imposed by the G1 firmware. It's meant as a "start-here" for anyone extending the code or debugging an incident.
1. High-level system map
┌─────────────────────────── Dev machine ────────────────────────────┐
│ │
│ apps.train_cli ──(saqr_best.pt)──► data/models/ │
│ gui/ (optional: PySide6 desktop QA tool, not used in production) │
│ │
│ scripts/deploy.sh ──(rsync + pip install -e .)──► │
│ │
└────────────────────────────────┬───────────────────────────────────┘
│
robot_ip (eth0 / 192.168.123.164)
│
┌────────────────────────────────▼───────────────────────────────────┐
│ Unitree G1 (Jetson Orin NX) │
│ │
│ scripts/start_saqr.sh │
│ │ │
│ ▼ │
│ ┌─────────────── robot/bridge.py (main process) ─────────────┐ │
│ │ │ │
│ │ RobotController ─── G1ArmActionClient ─┐ │ │
│ │ │ ├── DDS (eth0) ─┐ │ │
│ │ ├── AudioClient ────────────────┤ │ │ │
│ │ ├── LowStateHub ← rt/lowstate ─┤ │ │ │
│ │ ├── ArmReplayer ── rt/arm_sdk ─┘ │ │ │
│ │ └── TtsWorker thread (audio queue) │ │ │
│ │ │ │ │
│ │ TriggerLoop thread ── R2+X / R2+Y polling ─────────────┘ │ │
│ │ │ │
│ │ StdoutReader thread ── parses event lines from subprocess ─┤ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ subprocess.Popen │
│ │ │
│ ┌────────────────── apps/saqr_cli.py ──────────────────────┐ │
│ │ │ │
│ │ Camera (RealSense) → YOLO11n → Tracker → Compliance │ │
│ │ │ │ │
│ │ └── emit_event() ──► stdout │
│ │ MJPEG stream on :8080 (optional) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────┘
│
Wireless remote (R2+X / R2+Y)
G1 speaker (audio out)
G1 arms (motion out)
2. Components
2.1 core/ — detection & reasoning (shared library)
Pure-Python, no Unitree SDK dependency. Used by apps/ and (indirectly)
robot/bridge.py via subprocess.
- core/camera.py: RealSense / webcam / video-file source. Yields (frame, depth) pairs.
- core/model.py: YOLO11n wrapper, class filtering, confidence thresholding, batched inference.
- core/tracker.py: ByteTrack-style persistent IDs across frames.
- core/compliance.py: binary SAFE / UNSAFE classifier. Reads REQUIRED_PPE from config; split_wearing_missing() handles the no-X class convention.
- core/events.py: emits events in the structured format ID NNNN | EVENT | STATUS | wearing: … | missing: … | unknown: …
- core/stationary.py: "is this person standing still long enough to warrant an alert?" heuristic (pixel-level centroid stability).
- core/drawing.py: overlay boxes + labels on frames for the MJPEG stream.
- core/paths.py: resolves PROJECT_ROOT from the SAQR_ROOT env var or by walking up from __file__.
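The no-X class convention can be illustrated with a minimal sketch. The function name comes from the list above, but its signature, the ALL_PPE tuple, and the rule that a positive detection wins over a no-X detection are assumptions made for illustration, not the project's actual code.

```python
from typing import Iterable

# Assumed values; the real list is read from config as REQUIRED_PPE.
REQUIRED_PPE = ("helmet", "vest")
ALL_PPE = ("helmet", "vest", "gloves", "goggles", "boots")


def split_wearing_missing(labels: Iterable[str]):
    """Split detected class labels into (wearing, missing, unknown) lists."""
    seen = set(labels)
    wearing = [p for p in ALL_PPE if p in seen]
    # A "no-X" detection counts as evidence that X is absent.
    missing = [p for p in ALL_PPE if f"no-{p}" in seen and p not in seen]
    # Neither X nor no-X was detected, so nothing can be said about X.
    unknown = [p for p in ALL_PPE if p not in seen and f"no-{p}" not in seen]
    return wearing, missing, unknown


def classify(labels: Iterable[str]) -> str:
    """Return SAFE only when every required item is positively detected."""
    wearing, _, _ = split_wearing_missing(labels)
    return "SAFE" if all(p in wearing for p in REQUIRED_PPE) else "UNSAFE"
```

In this sketch a required item that is merely unknown still yields UNSAFE; whether the real classifier is that strict is not stated here.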
2.2 apps/ — executable entry points
- apps/saqr_cli.py: the detection subprocess launched by the bridge. Reads config, opens the camera, runs the pipeline, prints events on stdout, serves MJPEG on :8080.
- apps/detect_cli.py: stand-alone detector for testing on clips.
- apps/train_cli.py: dev-machine training wrapper around ultralytics.
- apps/manager_cli.py: dataset tooling (class rebalancing, splits).
- apps/view_stream.py: OpenCV viewer attached to the MJPEG stream.
2.3 robot/ — G1 integration (only runs on the robot)
- robot/bridge.py: orchestrator. Owns RobotController, spawns apps.saqr_cli as a subprocess, parses its stdout, routes UNSAFE/SAFE events into robot actions. Also the systemd entry point.
- robot/robot_controller.py: owns all the G1 clients: arm action, audio, lowstate. Runs a TtsWorker background thread with a freshness policy (a new announcement cancels and replaces the in-flight one).
- robot/arm_replay.py: low-level rt/arm_sdk publisher that plays a recorded JSONL trajectory at 60 Hz. Used when motion.enabled=true.
- robot/audio_player.py: PlayStream-based WAV player, with chunk retries for firmware 3104 and a cancel flag. Used when tts.mode="recorded_only" or "recorded_or_tts".
- robot/controller.py: LowStateHub for decoding the wireless remote (R2+X / R2+Y combos) from rt/lowstate.
2.4 utils/ — shared helpers
- utils/config.py: load_config(name) reads config/<name>_config.json, caches the result, and applies env-var overrides.
- utils/logger.py: rotating file logger + console mirror.
2.5 config/ — runtime tunables
- core_config.json: detection thresholds, tracker params, camera source, stream port, training hyperparams, compliance rules, capture.
- robot_config.json: bridge timing, TTS mode + phrases, arm action names, recorded-motion filenames, deploy target IP, start_saqr defaults.
- logging.json: log level per module.
Precedence: env var > config JSON > code fallback.
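The precedence rule can be sketched as a single lookup. This is an illustration only: the resolve() helper and the SAQR_ prefix for env-var names are hypothetical, and the real load_config may map keys to env vars differently.

```python
import json
import os


def resolve(key, cfg, fallback, env=None):
    """Return the value for `key`: env var > config JSON > code fallback."""
    env = os.environ if env is None else env
    env_key = "SAQR_" + key.upper()  # hypothetical naming scheme
    if env_key in env:
        raw = env[env_key]
        try:
            return json.loads(raw)  # "0.7" -> 0.7, "false" -> False
        except ValueError:
            return raw  # plain strings such as a device path stay as-is
    return cfg.get(key, fallback)
```

Parsing the env value as JSON keeps numbers and booleans typed the same way as in the config file, while anything that is not valid JSON is passed through as a string.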
2.6 assets/ — runtime artefacts (in-repo)
- assets/audio/fixed/*.wav: generic phrases (ready, safe, unsafe_generic, deactivated, no_camera).
- assets/audio/unsafe_missing/*.wav: one per missing-PPE combo (helmet, vest, helmet_vest).
- assets/motions/adnoc1.jsonl: the UNSAFE arm gesture (7 s recorded trajectory).
- assets/motions/arm_home.jsonl: the home pose used to smoothly return the arm at the end of a replay.
All WAVs are 16 kHz mono int16 — required by the G1 audio channel.
Motion JSONL is {"t": seconds, "q": [29 floats]} per line, 60 Hz.
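Both asset formats are easy to check before deployment. The sketch below uses only the standard library; check_wav() and load_motion() are hypothetical helper names, and the monotonic-timestamp check is an extra assumption rather than a documented requirement.

```python
import json
import wave


def check_wav(path):
    """Return True if the WAV is 16 kHz, mono, 16-bit PCM."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # 2 bytes per sample = int16
        )


def load_motion(path, n_joints=29):
    """Parse a motion JSONL file into a list of (t, q) tuples."""
    frames = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            if len(rec["q"]) != n_joints:
                raise ValueError(f"line {line_no}: expected {n_joints} joints")
            if frames and rec["t"] < frames[-1][0]:
                raise ValueError(f"line {line_no}: timestamps go backwards")
            frames.append((float(rec["t"]), [float(x) for x in rec["q"]]))
    return frames
```

Running checks like these on the dev machine would surface a mis-encoded WAV or a truncated trajectory before it reaches the robot.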
3. Data pipeline
┌──────────────┐ ┌─────────────┐ ┌────────────┐
│ RealSense │────►│ YOLO11n │────►│ Tracker │
│ 640x480@30 │ │ saqr_best │ │ ByteTrack │
└──────────────┘ └─────────────┘ └─────┬──────┘
│
▼
┌──────────────────┐
│ Compliance │
│ REQUIRED=[hv] │
└────────┬─────────┘
│
SAFE / UNSAFE / PARTIAL
│
▼
┌──────────────────┐
│ Stationary check │
│ (centroid drift) │
└────────┬─────────┘
│
▼
emit_event() ──► stdout
│
▼
bridge.handle_line() (reader thread)
│
┌─────────────────────────┼─────────────────────┐
▼ ▼ ▼
RobotController ArmReplayer (log only)
.speak(text,cat,key) .play(motion, home)
│ │
▼ ▼
TtsWorker thread rt/arm_sdk @ 60 Hz
│ │
▼ │
AudioClient.TtsMaker │
│ │
└─────── G1 firmware ◄────┘
│
▼
Speaker + arms
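The stationary check in the pipeline above can be sketched as a per-track sliding window over centroids. The class name, the 10-pixel tolerance, and the choice to measure drift against the oldest centroid in the window are assumptions; only the idea of centroid stability over a number of frames comes from this document.

```python
from collections import deque
from math import hypot


class StationaryCheck:
    """Flags a track as stationary once its centroid stays put long enough."""

    def __init__(self, min_frames=15, max_drift_px=10.0):
        self.min_frames = min_frames
        self.max_drift_px = max_drift_px  # hypothetical tolerance
        self._history = {}  # track_id -> deque of recent centroids

    def update(self, track_id, cx, cy):
        """Record one centroid; return True when the track is stationary."""
        hist = self._history.setdefault(track_id, deque(maxlen=self.min_frames))
        hist.append((cx, cy))
        if len(hist) < self.min_frames:
            return False
        x0, y0 = hist[0]
        # Every centroid in the window must stay within tolerance of the oldest.
        return all(hypot(x - x0, y - y0) <= self.max_drift_px for x, y in hist)
```

A bounded deque keeps memory constant per track, and a person who starts walking drops out of the stationary state as soon as one centroid leaves the tolerance radius.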
Event lifecycle (single UNSAFE example)
- Frame N: YOLO detects a person with no-helmet and no-vest boxes.
- Tracker assigns (or keeps) track_id=42.
- Compliance → UNSAFE, wearing=[], missing=[helmet, vest].
- Stationary check: same centroid for ≥15 frames → green-lit.
- core.events.emit_event() prints to stdout: [HH:MM:SS.fff] ID 0042 | NEW | UNSAFE | wearing: none | missing: helmet, vest | unknown: gloves, goggles, boots
- bridge._read_stdout parses, handle_line matches EVENT_RE.
- Cooldown check on (42, UNSAFE) passes → fire actions.
- robot.speak("Please stop. Wear your proper safety equipment. You are missing helmet and vest.", category="unsafe_missing", key="helmet_vest").
- time.sleep(audio_lead_s=0.3): hand the audio worker a head start.
- robot.reject(release_after=0.5) → ArmReplayer.play(adnoc1.jsonl, arm_home.jsonl), which blocks the main thread for ~12 s while publishing joint commands.
- Audio worker picks up the speak request, resets with AUDIO_STOP_PLAY, calls TtsMaker, retries once if rc!=0.
- Bridge returns to idle; the next event for (42, UNSAFE) is ignored for 8 s per the cooldown.
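The parse and cooldown steps above can be sketched as follows. The regular expression is a hypothetical stand-in written against the example line; the bridge's actual EVENT_RE is not shown in this document. The Cooldown class and its injectable clock are likewise illustrative.

```python
import re
import time

# Hypothetical pattern; the bridge's real EVENT_RE may differ.
EVENT_RE = re.compile(
    r"ID (?P<id>\d+) \| (?P<event>\w+) \| (?P<status>\w+)"
    r" \| wearing: (?P<wearing>[^|]*)"
    r" \| missing: (?P<missing>[^|]*)"
    r" \| unknown: (?P<unknown>.*)$"
)


def parse_event(line):
    """Return a dict for an event line, or None for any other output."""
    m = EVENT_RE.search(line.strip())
    if m is None:
        return None

    def items(text):
        text = text.strip()
        return [] if text in ("", "none") else [s.strip() for s in text.split(",")]

    return {
        "track_id": int(m["id"]),
        "event": m["event"],
        "status": m["status"],
        "wearing": items(m["wearing"]),
        "missing": items(m["missing"]),
        "unknown": items(m["unknown"]),
    }


class Cooldown:
    """Suppresses repeats of the same (track_id, status) within a window."""

    def __init__(self, seconds=8.0, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self._last = {}  # (track_id, status) -> time of last allowed event

    def allow(self, track_id, status):
        now = self.clock()
        key = (track_id, status)
        last = self._last.get(key)
        if last is not None and now - last < self.seconds:
            return False
        self._last[key] = now
        return True
```

Returning None for non-matching lines lets the reader thread pass ordinary log output through untouched, and the injectable clock makes the 8 s window testable without sleeping.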
4. Concurrency model
Two processes and six threads:
| Layer | Process | Thread | Purpose |
|---|---|---|---|
| Bridge | bridge.py main | MainThread | orchestrator, arm action calls (blocking) |
| Bridge | bridge.py main | TriggerLoop | polls hub.combo_r2x() / combo_r2y() |
| Bridge | bridge.py main | StdoutReader | reads subprocess stdout line-by-line |
| Bridge | bridge.py main | TtsWorker | drains audio queue, calls TtsMaker |
| Saqr CLI | apps.saqr_cli (subprocess) | MainThread | camera + inference + stdout emit |
| Saqr CLI | subprocess | Streaming | MJPEG server thread on :8080 |
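The process boundary between the bridge and the detection CLI can be sketched with a reader thread over a pipe. start_reader() is a hypothetical helper, and the child command here is a stand-in one-liner rather than apps.saqr_cli.

```python
import queue
import subprocess
import sys
import threading


def start_reader(cmd, on_line):
    """Spawn `cmd` and feed each stdout line to `on_line` on a daemon thread."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )

    def pump():
        for line in proc.stdout:  # loop ends when the child closes stdout
            on_line(line.rstrip("\n"))
        proc.stdout.close()

    thread = threading.Thread(target=pump, name="StdoutReader", daemon=True)
    thread.start()
    return proc, thread


if __name__ == "__main__":
    lines = queue.Queue()
    # Stand-in child process; the bridge launches apps.saqr_cli instead.
    child = [sys.executable, "-u", "-c", "print('ID 0042 | NEW | UNSAFE')"]
    proc, thread = start_reader(child, lines.put)
    thread.join(timeout=10)
    proc.wait()
    print(lines.get_nowait())
```

The -u flag matters in this sketch: without unbuffered output, a Python child may hold event lines in its own stdout buffer and the reader would see them late.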
Synchronisation primitives (all in RobotController):
- _tts_queue: bounded deque (queue_max=4), drained by TtsWorker.
- _tts_event: signalled on every speak() to wake the worker.
- _audio_idle: set when the queue is empty AND no dispatch is in flight. Callers can wait_for_audio_done() to block until audio drains; this is the primitive you'd use to serialise audio-before-arm.
- _tts_worker_stop: shutdown flag, set in shutdown_tts().
Freshness policy: a new speak() call clears the queue and
cancel()s any in-flight player. Newer events always take precedence,
which avoids stale "helmet+vest" audio finishing after the worker has
already moved on to a "vest only" event.
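A minimal sketch of the freshness policy, assuming a worker that polls a cancel flag while it plays. FreshAudioQueue and its method names (other than speak() and wait_for_audio_done()) are hypothetical; the real RobotController keeps these primitives as private attributes.

```python
import threading
from collections import deque


class FreshAudioQueue:
    """Bounded queue where each new request supersedes everything older."""

    def __init__(self, queue_max=4):
        self._lock = threading.Lock()
        self._queue = deque(maxlen=queue_max)
        self._idle = threading.Event()  # plays the role of _audio_idle
        self._idle.set()
        self.cancel_inflight = threading.Event()  # polled by the player

    def speak(self, text):
        with self._lock:
            self._queue.clear()          # drop stale requests
            self.cancel_inflight.set()   # ask the in-flight player to stop
            self._queue.append(text)
            self._idle.clear()

    def next_request(self):
        """Called by the worker; returns the next text or None when empty."""
        with self._lock:
            if not self._queue:
                return None
            self.cancel_inflight.clear()  # the new request starts uncancelled
            return self._queue.popleft()

    def done(self):
        """Called by the worker after a dispatch finishes or is cancelled."""
        with self._lock:
            if not self._queue:
                self._idle.set()

    def wait_for_audio_done(self, timeout=None):
        return self._idle.wait(timeout)
```

Because done() only sets the idle event when nothing newer is queued, a caller blocked in wait_for_audio_done() is released after the latest announcement, not after a cancelled one.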
5. Configuration flow
config/*.json ──► utils.config.load_config() ──► cached dict
│
module top-level constants
(e.g. TTS_VOLUME, MOTION_UNSAFE_FILE)
│
runtime
Env-var overrides are applied at the start_saqr.sh level (for
CONDA_ENV, SAQR_SOURCE, etc.) or inside load_config for specific
keys. The philosophy is: no code edits for the common knobs; flip the
JSON and restart.
6. Firmware constraints (the big lesson)
Unitree G1 firmware routes the audio subsystem (TtsMaker,
PlayStream, AUDIO_STOP_PLAY, SetVolume) and the low-level arm SDK
(rt/arm_sdk, published at 60 Hz during motion replay) through the
same onboard MCU / bus. While arm SDK is actively publishing, audio
RPCs block until their timeout and return rc=3104 ("device busy").
Consequences for this codebase:
- Parallel audio + custom motion is unreliable. Testing showed a ~50 % audio dropout rate when ArmReplayer runs concurrently with TtsMaker.
- The 10-second default DDS timeout made failures catastrophic. We shortened the AudioClient timeout to 3 s so hung calls surface in seconds, not tens of seconds.
- Retries, warm-ups, and STOP_PLAY resets cannot fix a firmware that's busy on the other channel; they can only recover after the arm SDK releases the bus. We keep all three as robustness hygiene, but they aren't a substitute for serialisation.
- The high-level ExecuteAction('reject') is a single RPC, not 60 Hz publishing, so it contends less with audio. If parallel operation is required, motion.enabled=false plus audio_lead_s=0.3 is the closest thing that works, at the cost of the canned gesture.
The deterministic path is to serialise: speak() →
wait_for_audio_done() → reject(). This trades ~6 s of per-event
latency for guaranteed audio delivery — the right trade for a
safety-critical warning.
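The serialised path can be sketched against a stand-in controller. FakeRobot is purely illustrative: it mimics speak(), wait_for_audio_done(), and reject() with a short sleep instead of firmware calls, and the 10 s audio timeout in alert_serialised() is an assumed value.

```python
import threading
import time


class FakeRobot:
    """Stand-in for RobotController; records the order of side effects."""

    def __init__(self, audio_seconds=0.05):
        self.log = []
        self._audio_idle = threading.Event()
        self._audio_idle.set()
        self._audio_seconds = audio_seconds

    def speak(self, text):
        self._audio_idle.clear()
        threading.Thread(target=self._play, args=(text,), daemon=True).start()

    def _play(self, text):
        time.sleep(self._audio_seconds)  # pretend the firmware is speaking
        self.log.append(("audio_done", text))
        self._audio_idle.set()

    def wait_for_audio_done(self, timeout=None):
        return self._audio_idle.wait(timeout)

    def reject(self):
        self.log.append(("arm_start", None))


def alert_serialised(robot, text, audio_timeout=10.0):
    """Speak first; start the arm only after audio drains or times out."""
    robot.speak(text)
    drained = robot.wait_for_audio_done(timeout=audio_timeout)
    robot.reject()
    return drained
```

In this sketch the arm still moves if the audio wait times out, and the return value tells the caller whether the announcement actually finished. Skipping the gesture on timeout would be an equally valid policy.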
7. Deployment and lifecycle
- Install: scripts/deploy.sh rsyncs the repo to unitree@<ROBOT_IP>:~/Saqr/ and runs pip install -e . inside the robot's saqr conda env.
- Enable: sudo systemctl enable --now saqr-bridge.
- Runtime: bridge.py → TtsMaker("Saqr is running. Press R2 plus X to start.") → idle.
- Trigger: operator presses R2+X → subprocess spawns.
- Stop: operator presses R2+Y → subprocess gets SIGINT, bridge announces "Saqr deactivated.", returns to idle.
- Reboot: systemd auto-restarts the bridge with Restart=on-failure.
- Logs: journalctl -u saqr-bridge + logs/*.log + runtime/runs/<timestamp>/events.csv.
8. Plan & open threads
Short list — things that would materially improve reliability if picked up next:
| Area | What | Why |
|---|---|---|
| Audio + arm | Serialise (audio done → arm) | Eliminates the ~50 % dropout on safety alerts |
| Detection | Per-identity cooldown, not per-track-id | Track-ID churn currently re-triggers the alert for the same person |
| Robustness | Boot-time self-check | Surface camera/DDS/model/motion issues before the first event, not during |
| Observability | Rotate logs/, add logs/events.jsonl | Audit trail + disk safety |
| UX | Short-phrase fallback when audio drops | The arm still moves even when audio fails; user sees gesture but hears nothing |
| Testing | Mock RobotController for bridge unit tests | Today the bridge is only tested end-to-end on the robot |
9. Glossary
- G1 / Unitree G1: humanoid platform this runs on.
- DDS: Data Distribution Service, Cyclone DDS v0.10.2, the pub/sub bus used by the Unitree SDK.
- rt/lowstate: robot state topic; carries wireless remote bits.
- rt/arm_sdk: 60 Hz arm joint command topic used by the teach-and-replay path.
- AudioClient / G1ArmActionClient: Unitree SDK service clients that wrap DDS RPC.
- TtsMaker: firmware text-to-speech RPC.
- PlayStream: firmware PCM-chunk playback RPC.
- rc=3104: firmware-level "device busy" error returned by audio RPCs when the audio bus is held by another consumer.
- PPE: Personal Protective Equipment (helmet, vest, boots, gloves, goggles in this project's dataset).