# Sanad
Voice + motion assistant for the Unitree G1 humanoid. Gemini Live handles conversation; the arm controller plays built-in SDK poses and recorded JSONL macros; everything is orchestrated by a FastAPI dashboard.
```
┌────────────────────────────────────────────────────────────────┐
│ Dashboard (FastAPI) ── http://<robot>:8000                     │
│ ├─ Voice & Audio     Live Gemini, Typed Replay, Wake Phrases   │
│ ├─ Motion & Replay   SDK actions, JSONL replays, teaching mode │
│ ├─ Camera & Vision   (deprecated, UI kept for compat)          │
│ ├─ Recordings        Skills registry, saved Gemini turns       │
│ └─ Settings & Logs   System info, tail live log                │
└────────────────────────────────────────────────────────────────┘
        │
        ├─ voice/sanad_voice.py      (subprocess — Gemini Live audio loop)
        ├─ gemini/client.py          (short-session client for Typed Replay)
        ├─ gemini/subprocess.py      (spawns + supervises sanad_voice.py)
        ├─ motion/arm_controller.py  (G1 arm DDS publisher)
        ├─ voice/audio_io.py         (mic + speaker abstraction — 3 profiles)
        └─ core/brain.py             (skill dispatcher, event bus)
```
## Quick start (on the robot)

```shell
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
```
Then open http://<robot-ip>:8000 in a browser.
## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — boots all subsystems + dashboard. |
| `config.py` | Runtime constants derived from `config/*_config.json`. |
| `config/` | Per-subsystem JSON config: core, voice, gemini, motion, dashboard, local. |
| `core/` | Brain, skill registry, event bus, config loader, logger. |
| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain), `subprocess.py` (supervisor). |
| `voice/` | `sanad_voice.py` (subprocess entry), `audio_io.py` (mic/speaker), `audio_manager.py`, `local_tts.py`, `live_voice_loop.py`, `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py`, `model_script.py` (brain template). |
| `local/` | Offline pipeline skeleton — Silero VAD, Whisper, Qwen (via Ollama), CosyVoice2. Opt-in via `SANAD_VOICE_BRAIN=local`. |
| `motion/` | `arm_controller.py` (main), `sanad_arm_controller.py`, `macro_player.py`, `macro_recorder.py`, `teaching.py`. |
| `dashboard/` | FastAPI routes (`dashboard/routes/*.py`) + static UI (`dashboard/static/index.html`). |
| `scripts/` | Persona files — `sanad_script.txt` (voice persona), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
| `data/` | Runtime state — `audio/` (typed-replay WAVs), `motions/` (arm JSONL files), `recordings/` (live-captured turns), `motions/config.json` (dashboard-editable settings). |
| `model/` | Place for local SpeechT5 / CosyVoice2 weights when using the offline pipeline. |
| `logs/` | Per-module rotating logs. |
## Runtime selection (env vars)

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Which mic + speaker pair `audio_io.py` mounts. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms. |
| `SANAD_GEMINI_API_KEY` | string | reads config | Override the API key in `data/motions/config.json`. |
| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
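For reference, the table above can be sketched as a small resolver. This is illustrative only: the real parsing lives in `config.py` and the voice subprocess, and `resolve_runtime_flags` is a hypothetical helper, not a project function.

```python
import os

def resolve_runtime_flags(env=os.environ):
    """Resolve the runtime-selection env vars with the documented defaults."""
    return {
        "audio_profile": env.get("SANAD_AUDIO_PROFILE", "builtin"),
        "voice_brain":   env.get("SANAD_VOICE_BRAIN", "gemini"),
        "dds_interface": env.get("SANAD_DDS_INTERFACE", "eth0"),
        "record":        env.get("SANAD_RECORD", "1") == "1",   # flags are "0"/"1" strings
        "aec_enable":    env.get("SANAD_AEC_ENABLE", "1") == "1",
    }

# An empty environment yields every default from the table.
print(resolve_runtime_flags({}))
```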
## Dashboard features

### Operations
Quick-fire SDK + JSONL arm actions (chip buttons), gestural speaking toggle.

### Voice & Audio
- Live Voice Commands — arm trigger from user transcripts (wake phrase → arm action). Master gate + deferred-trigger toggle.
- Live Gemini Process — start/stop the voice conversation subprocess, tail its log.
- Typed Replay — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt).
- Gemini API Key — hot-swap the key without a restart.
- Wake Phrase Manager — add/remove phrase → action bindings.

### Motion & Replay
- Motion Control — list SDK (built-in) + JSONL (recorded) actions, select + play. Cancel smoothly returns to `arm_home.jsonl`.
- Replay Manager — upload `.jsonl` files, test-play at a chosen speed, Teaching Mode (kinesthetic record).
- Macro Recorder — record a new audio+motion pair, or pick any WAV + any motion (SDK or JSONL) and play them in parallel.

### Recordings
Skill Registry (predefined audio+motion skills from `skills.json`) + Saved Records (Gemini turn recordings).
## Architecture notes

- Subprocess isolation: `voice/sanad_voice.py` runs as a child of `main.py` via `gemini/subprocess.py`. If the voice loop crashes, the dashboard + arm stay up.
- Brain contract: see `voice/model_script.py` — any new model (OpenAI Realtime, Claude Voice, local offline) implements `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`. Drop a file in `voice/` or a new `<brand>/` folder, then add a branch to `voice/sanad_voice.py:_build_brain()`.
- Supervisor contract: each brain ships a sibling supervisor (e.g., `gemini/subprocess.py`) that spawns `sanad_voice.py` with its `SANAD_VOICE_BRAIN` env var and parses the brain's log markers. Template: `voice/model_subprocess.py`.
- Audio routing: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay audio, record playback) routes through `DDSAudioClient.PlayStream` via `audio_manager._play_pcm_via_g1`. The PyAudio path is kept as a desktop/dev fallback only.
- Arm replay: `motion/arm_controller.py:_replay_file_inner()` is a verbatim port of `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle hold → playback → smooth return → disable SDK. Cancel breaks the play loop; `_return_home()` runs unconditionally afterwards for a jerk-free return.
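A minimal skeleton of the brain contract described above. This is a sketch, not the real `voice/model_script.py`: the loop body is a placeholder, and the constructor arguments are simply stored as the contract requires.

```python
import asyncio

class EchoBrain:
    """Toy brain honoring the documented contract:
    __init__(audio_io, recorder, voice, system_prompt), async run(), stop()."""

    def __init__(self, audio_io, recorder, voice, system_prompt):
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self._running = False

    async def run(self):
        # A real brain would stream mic audio to the model here and
        # play replies back through audio_io; this one just idles.
        self._running = True
        while self._running:
            await asyncio.sleep(0.05)

    def stop(self):
        # Called by the supervisor; run() exits on the next loop check.
        self._running = False
```

A matching branch in `_build_brain()` would then construct this class when its `SANAD_VOICE_BRAIN` value is selected.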
## Dynamic paths

Every path is derived at runtime — no hard-coded `/home/zedx/…` anywhere.

Resolution order for `BASE_DIR` in `config.py`:

1. `SANAD_PROJECT_ROOT` env var (if set).
2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
3. `Path(__file__).resolve().parent` — auto-detected.

The project runs unchanged from either layout:

- dev: `<anywhere>/Project/Sanad/`
- deployed: `/home/unitree/Sanad/`
## Deployment (workstation → robot)

```shell
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
  /path/to/Sanad/ \
  unitree@192.168.123.164:/home/unitree/Sanad/
```

Then on the robot: `Ctrl+C` the running `main.py` and re-run.
## Troubleshooting

| Symptom | Fix |
|---|---|
| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. The fix lives in the `sys.modules` alias at `main.py:~50`. Restart. |
| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
| `No module named 'Project'` in subprocess | The bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json` > `arm_controller`) is too low — try 0.7 or 1.0. |
| Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | A deferred trigger was queued while the `trigger_enabled` toggle was off. Toggle it on — the pending-trigger poll fires the queued trigger automatically once enabled. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
| Dashboard 404s for `/api/vision/*` | The vision module was deleted; the HTML still has stale fetches for a few endpoints. Cosmetic — the `dashboard/static/index.html` init block already skips most of them. |
## License / attribution

Internal project for YS Lootah Technology. Reuses/ports patterns from:

- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
- Unitree `unitree_sdk2py` (G1 low-level SDK, LocoClient, G1ArmActionClient)