# Sanad

Voice + motion assistant for the Unitree G1 humanoid. Gemini Live handles conversation; the arm controller plays built-in SDK poses and recorded JSONL macros; everything is orchestrated by a FastAPI dashboard.

```
┌────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://:8000                               │
│  ├─ Voice & Audio       Live Gemini, Typed Replay, Wake Phrases    │
│  ├─ Motion & Replay     SDK actions, JSONL replays, teaching mode  │
│  ├─ Camera & Vision     (deprecated, UI kept for compat)           │
│  ├─ Recordings          Skills registry, saved Gemini turns        │
│  └─ Settings & Logs     System info, tail live log                 │
└────────────────────────────────────────────────────────────────────┘
         │
         ├─ voice/sanad_voice.py        (subprocess — Gemini Live audio loop)
         ├─ gemini/client.py            (short-session client for Typed Replay)
         ├─ gemini/subprocess.py        (spawns+supervises sanad_voice.py)
         ├─ motion/arm_controller.py    (G1 arm DDS publisher)
         ├─ voice/audio_io.py           (mic + speaker abstraction — 3 profiles)
         └─ core/brain.py               (skill dispatcher, event bus)
```

## Quick start (on the robot)

```bash
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
```

Then open `http://:8000` in a browser.

## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — boots all subsystems + dashboard. |
| `config.py` | Runtime constants derived from `config/*_config.json`. |
| `config/` | Per-subsystem JSON config: `core`, `voice`, `gemini`, `motion`, `dashboard`, `local`. |
| `core/` | Brain, skill registry, event bus, config loader, logger. |
| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain), `subprocess.py` (supervisor). |
| `voice/` | `sanad_voice.py` (subprocess entry), `audio_io.py` (mic/speaker), `audio_manager.py`, `local_tts.py`, `live_voice_loop.py`, `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py`, `model_script.py` (brain template). |
| `local/` | Offline pipeline skeleton — Silero VAD, Whisper, Qwen (via Ollama), CosyVoice2. Opt-in via `SANAD_VOICE_BRAIN=local`. |
| `motion/` | `arm_controller.py` (main), `sanad_arm_controller.py`, `macro_player.py`, `macro_recorder.py`, `teaching.py`. |
| `dashboard/` | FastAPI routes (`dashboard/routes/*.py`) + static UI (`dashboard/static/index.html`). |
| `scripts/` | Persona files — `sanad_script.txt` (voice persona), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
| `data/` | Runtime state — `audio/` (typed-replay WAVs), `motions/` (arm JSONL files), `recordings/` (live-captured turns), `motions/config.json` (dashboard-editable settings). |
| `model/` | Place for local SpeechT5 / CosyVoice2 weights when using offline pipeline. |
| `logs/` | Per-module rotating logs. |

## Runtime selection (env vars)

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Which mic + speaker pair `audio_io.py` mounts. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms. |
| `SANAD_GEMINI_API_KEY` | string | reads config | Override the API key in `data/motions/config.json`. |
| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |

## Dashboard features

### Operations

Quick-fire SDK + JSONL arm actions (chip buttons), gestural speaking toggle.

### Voice & Audio

- **Live Voice Commands** — arm trigger from user transcripts (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
- **Live Gemini Process** — start/stop the voice conversation subprocess, tail its log.
- **Typed Replay** — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt).
- **Gemini API Key** — hot-swap the key without restart.
- **Wake Phrase Manager** — add/remove phrase → action bindings.

### Motion & Replay

- **Motion Control** — list SDK (built-in) + JSONL (recorded) actions, select + play. Cancel smoothly returns to `arm_home.jsonl`.
- **Replay Manager** — upload `.jsonl` files, test-play with speed, Teaching Mode (kinesthetic record).
- **Macro Recorder** — Record new audio+motion pair, OR pick any WAV + any motion (SDK or JSONL) and Play them in parallel.

### Recordings

Skill Registry (predefined audio+motion skills from `skills.json`) + Saved Records (Gemini turn recordings).

## Architecture notes

- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py` via `gemini/subprocess.py`. If the voice loop crashes, the dashboard + arm stay up.
- **Brain contract**: see `voice/model_script.py` — any new model (OpenAI Realtime, Claude Voice, local offline) implements `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`. Drop a file in `voice/` or a new `/` folder, add a branch to `voice/sanad_voice.py:_build_brain()`.
- **Supervisor contract**: each brain ships a sibling supervisor (e.g., `gemini/subprocess.py`) that spawns `sanad_voice.py` with its `SANAD_VOICE_BRAIN` env var and parses the brain's log markers. Template: `voice/model_subprocess.py`.
- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay audio, record playback) routes through DDS `AudioClient.PlayStream` via `audio_manager._play_pcm_via_g1`. The PyAudio path is kept as a desktop/dev fallback only.
- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a verbatim port of `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle hold → playback → smooth return → disable SDK. Cancel breaks the play loop; `_return_home()` runs unconditionally afterwards for a jerk-free return.

## Dynamic paths

Every path is derived at runtime — no hard-coded `/home/zedx/…` anywhere. Resolution order for `BASE_DIR` in `config.py`:

1. `SANAD_PROJECT_ROOT` env var (if set).
2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
3. `Path(__file__).resolve().parent` — auto-detected.

The project runs unchanged from either layout:

- dev: `/Project/Sanad/`
- deployed: `/home/unitree/Sanad/`

## Deployment (workstation → robot)

```bash
rsync -av --delete \
    --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
    /path/to/Sanad/ \
    unitree@192.168.123.164:/home/unitree/Sanad/
```

Then on the robot: `Ctrl+C` the running `main.py` and re-run.

## Troubleshooting

| Symptom | Fix |
|---|---|
| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias at `main.py:~50`. Restart. |
| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
| `No module named 'Project'` in subprocess | Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`. |
| Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | Deferred trigger was queued but `trigger_enabled` toggle was off. Toggle on — or the pending-trigger poll now fires it automatically once enabled. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
| Dashboard `Not Found` 404s for `/api/vision/*` | Vision module was deleted; HTML still has stale fetches for a few endpoints. Cosmetic — `dashboard/static/index.html` init block already skips most. |

## License / attribution

Internal project for YS Lootah Technology. Reuses/ports patterns from:

- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
- Unitree `unitree_sdk2py` (G1 low-level SDK, LocoClient, G1ArmActionClient)
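
## Appendix: illustrative sketches

The `BASE_DIR` resolution order listed under "Dynamic paths" could look roughly like the sketch below. This is not the code in `config.py`. The helper names `parse_env_file` and `resolve_base_dir` are hypothetical, the `.env` parsing is a simplified `KEY=value` reader, and joining `PROJECT_BASE` and `PROJECT_NAME` as path segments is an assumption about what "`PROJECT_BASE + PROJECT_NAME`" means.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Read simple KEY=value lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_base_dir(module_file, environ=None):
    """Return the project root using the three-step order from the README."""
    environ = os.environ if environ is None else environ
    here = Path(module_file).resolve().parent

    # 1. An explicit override wins.
    root = environ.get("SANAD_PROJECT_ROOT")
    if root:
        return Path(root)

    # 2. A .env file next to the module, or in its parent directory.
    for candidate in (here / ".env", here.parent / ".env"):
        if candidate.is_file():
            values = parse_env_file(candidate)
            base = values.get("PROJECT_BASE")
            name = values.get("PROJECT_NAME")
            if base and name:
                # Assumption: the two values are joined as path segments.
                return Path(base) / name

    # 3. Fall back to the directory that contains the module.
    return here
```

Inside `config.py` this would be called as `resolve_base_dir(__file__)`. Passing `environ` explicitly keeps the function easy to test without touching the real process environment.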
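
The brain contract from "Architecture notes" (`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`) can be illustrated with a minimal placeholder. Everything here is hypothetical: `IdleBrain`, the `BRAINS` table, and `build_brain` are stand-ins rather than the contents of `voice/model_script.py` or `_build_brain()`, and the sketch assumes `stop()` is a plain synchronous method that asks `run()` to exit.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Brain(Protocol):
    """Methods every brain is expected to expose, per the README."""

    async def run(self) -> None: ...

    def stop(self) -> None: ...


class IdleBrain:
    """Hypothetical brain that only idles until stop() is called."""

    def __init__(self, audio_io: Any, recorder: Any, voice: str, system_prompt: str):
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self._running = False
        self.ticks = 0

    async def run(self) -> None:
        self._running = True
        while self._running:
            # A real brain would read from audio_io and stream to a model here.
            self.ticks += 1
            await asyncio.sleep(0.01)

    def stop(self) -> None:
        self._running = False


# Hypothetical lookup keyed by the SANAD_VOICE_BRAIN value.
BRAINS = {"idle": IdleBrain}


def build_brain(name: str, **kwargs: Any) -> Brain:
    """Pick a brain class by name and construct it with the contract's arguments."""
    if name not in BRAINS:
        raise ValueError(f"unknown brain {name!r}")
    return BRAINS[name](**kwargs)


async def demo() -> int:
    brain = build_brain(
        "idle", audio_io=None, recorder=None,
        voice="default", system_prompt="You are Sanad.",
    )
    task = asyncio.create_task(brain.run())
    await asyncio.sleep(0.05)
    brain.stop()
    await task
    return brain.ticks


if __name__ == "__main__":
    print("ticks:", asyncio.run(demo()))
```

Adding a new brain under this sketch means writing one class with the same three methods and adding one entry to the lookup, which mirrors the "add a branch" step described for `_build_brain()`.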