# Sanad
Voice + motion assistant for the Unitree G1 humanoid. Gemini Live handles conversation; the arm controller plays built-in SDK poses and recorded JSONL macros; everything is orchestrated by a FastAPI dashboard.
```
┌────────────────────────────────────────────────────────────────┐
│ Dashboard (FastAPI) ── http://<robot>:8000                     │
│ ├─ Voice & Audio     Live Gemini, Typed Replay, Wake Phrases   │
│ ├─ Motion & Replay   SDK actions, JSONL replays, teaching mode │
│ ├─ Camera & Vision   (deprecated, UI kept for compat)          │
│ ├─ Recordings        Skills registry, saved Gemini turns       │
│ └─ Settings & Logs   System info, tail live log                │
└────────────────────────────────────────────────────────────────┘
        │
        ├─ voice/sanad_voice.py      (subprocess — Gemini Live audio loop)
        ├─ gemini/client.py          (short-session client for Typed Replay)
        ├─ gemini/subprocess.py      (spawns + supervises sanad_voice.py)
        ├─ motion/arm_controller.py  (G1 arm DDS publisher)
        ├─ voice/audio_io.py         (mic + speaker abstraction — 3 profiles)
        └─ core/brain.py             (skill dispatcher, event bus)
```
## Quick start (on the robot)

```shell
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
```
Then open http://<robot-ip>:8000 in a browser.
## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — boots all subsystems + dashboard. |
| `config.py` | Runtime constants derived from `config/*_config.json`. |
| `config/` | Per-subsystem JSON config: core, voice, gemini, motion, dashboard, local. |
| `core/` | Brain, skill registry, event bus, config loader, logger. |
| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain), `subprocess.py` (supervisor). |
| `voice/` | `sanad_voice.py` (subprocess entry), `audio_io.py` (mic/speaker), `audio_manager.py`, `local_tts.py`, `live_voice_loop.py`, `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py`, `model_script.py` (brain template). |
| `local/` | Offline pipeline skeleton — Silero VAD, Whisper, Qwen (via Ollama), CosyVoice2. Opt-in via `SANAD_VOICE_BRAIN=local`. |
| `motion/` | `arm_controller.py` (main), `sanad_arm_controller.py`, `macro_player.py`, `macro_recorder.py`, `teaching.py`. |
| `dashboard/` | FastAPI routes (`dashboard/routes/*.py`) + static UI (`dashboard/static/index.html`). |
| `scripts/` | Persona files — `sanad_script.txt` (voice persona), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
| `data/` | Runtime state — `audio/` (typed-replay WAVs), `motions/` (arm JSONL files), `recordings/` (live-captured turns), `motions/config.json` (dashboard-editable settings). |
| `model/` | Place for local SpeechT5 / CosyVoice2 weights when using the offline pipeline. |
| `logs/` | Per-module rotating logs. |
## Runtime selection (env vars)

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Which mic + speaker pair `audio_io.py` mounts. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms. |
| `SANAD_GEMINI_API_KEY` | string | reads config | Override the API key in `data/motions/config.json`. |
| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
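For reference, the table above can be sketched as a small resolver. This is illustrative only: the real parsing lives in `config.py` and the voice subprocess, and `resolve_runtime_flags` is a hypothetical helper, not a project function.

```python
import os

def resolve_runtime_flags(env=os.environ):
    """Resolve the runtime-selection env vars with the documented defaults."""
    return {
        "audio_profile": env.get("SANAD_AUDIO_PROFILE", "builtin"),
        "voice_brain":   env.get("SANAD_VOICE_BRAIN", "gemini"),
        "dds_interface": env.get("SANAD_DDS_INTERFACE", "eth0"),
        "record":        env.get("SANAD_RECORD", "1") == "1",   # flags are "0"/"1" strings
        "aec_enable":    env.get("SANAD_AEC_ENABLE", "1") == "1",
    }

# An empty environment yields every default from the table.
print(resolve_runtime_flags({}))
```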
## Dashboard features

### Operations
Quick-fire SDK + JSONL arm actions (chip buttons), gestural speaking toggle.

### Voice & Audio
- Live Voice Commands — arm trigger from user transcripts (wake phrase → arm action). Master gate + deferred-trigger toggle.
- Live Gemini Process — start/stop the voice conversation subprocess, tail its log.
- Typed Replay — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt).
- Gemini API Key — hot-swap the key without a restart.
- Wake Phrase Manager — add/remove phrase → action bindings.

### Motion & Replay
- Motion Control — list SDK (built-in) + JSONL (recorded) actions, select + play. Cancel smoothly returns to `arm_home.jsonl`.
- Replay Manager — upload `.jsonl` files, test-play at a chosen speed, Teaching Mode (kinesthetic record).
- Macro Recorder — record a new audio+motion pair, or pick any WAV + any motion (SDK or JSONL) and play them in parallel.

### Recordings
Skill Registry (predefined audio+motion skills from `skills.json`) + Saved Records (Gemini turn recordings).
## Architecture notes

- Subprocess isolation: `voice/sanad_voice.py` runs as a child of `main.py` via `gemini/subprocess.py`. If the voice loop crashes, the dashboard + arm stay up.
- Brain contract: see `voice/model_script.py` — any new model (OpenAI Realtime, Claude Voice, local offline) implements `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`. Drop a file in `voice/` or a new `<brand>/` folder, then add a branch to `voice/sanad_voice.py:_build_brain()`.
- Supervisor contract: each brain ships a sibling supervisor (e.g., `gemini/subprocess.py`) that spawns `sanad_voice.py` with its `SANAD_VOICE_BRAIN` env var and parses the brain's log markers. Template: `voice/model_subprocess.py`.
- Audio routing: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay audio, record playback) routes through `DDSAudioClient.PlayStream` via `audio_manager._play_pcm_via_g1`. The PyAudio path is kept as a desktop/dev fallback only.
- Arm replay: `motion/arm_controller.py:_replay_file_inner()` is a verbatim port of `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle hold → playback → smooth return → disable SDK. Cancel breaks the play loop; `_return_home()` runs unconditionally afterwards for a jerk-free return.
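A minimal skeleton of the brain contract described above. This is a sketch, not the real `voice/model_script.py`: the loop body is a placeholder, and the constructor arguments are simply stored as the contract requires.

```python
import asyncio

class EchoBrain:
    """Toy brain honoring the documented contract:
    __init__(audio_io, recorder, voice, system_prompt), async run(), stop()."""

    def __init__(self, audio_io, recorder, voice, system_prompt):
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self._running = False

    async def run(self):
        # A real brain would stream mic audio to the model here and
        # play replies back through audio_io; this one just idles.
        self._running = True
        while self._running:
            await asyncio.sleep(0.05)

    def stop(self):
        # Called by the supervisor; run() exits on the next loop check.
        self._running = False
```

A matching branch in `_build_brain()` would then construct this class when its `SANAD_VOICE_BRAIN` value is selected.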
## Dynamic paths

Every path is derived at runtime — no hard-coded `/home/zedx/…` anywhere.

Resolution order for `BASE_DIR` in `config.py`:

1. `SANAD_PROJECT_ROOT` env var (if set).
2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
3. `Path(__file__).resolve().parent` — auto-detected.

The project runs unchanged from either layout:

- dev: `<anywhere>/Project/Sanad/`
- deployed: `/home/unitree/Sanad/`
## Deployment (workstation → robot)

```shell
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
  /path/to/Sanad/ \
  unitree@192.168.123.164:/home/unitree/Sanad/
```

Then on the robot: `Ctrl+C` the running `main.py` and re-run.
## Troubleshooting

| Symptom | Fix |
|---|---|
| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. The fix lives in the `sys.modules` alias at `main.py:~50`. Restart. |
| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
| `No module named 'Project'` in subprocess | The bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json` > `arm_controller`) is too low — try 0.7 or 1.0. |
| Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | A deferred trigger was queued while the `trigger_enabled` toggle was off. Toggle it on — the pending-trigger poll fires the queued trigger automatically once enabled. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
| Dashboard 404s for `/api/vision/*` | The vision module was deleted; the HTML still has stale fetches for a few endpoints. Cosmetic — the `dashboard/static/index.html` init block already skips most of them. |
## License / attribution

Internal project for YS Lootah Technology. Reuses/ports patterns from:

- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
- Unitree `unitree_sdk2py` (G1 low-level SDK, LocoClient, G1ArmActionClient)