Update 2026-04-21 12:10:35

2026-04-21 12:10:37 +04:00 · 2026-04-21 12:10:37 +04:00 · f7da15da1b
commit f7da15da1b
parent 1693776f3f
4 changed files with 208 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,145 @@
+# Sanad
+
+Voice + motion assistant for the Unitree G1 humanoid. Gemini Live handles
+conversation; the arm controller plays built-in SDK poses and recorded
+JSONL macros; everything is orchestrated by a FastAPI dashboard.
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│  Dashboard (FastAPI) ── http://<robot>:8000                        │
+│  ├─ Voice & Audio      Live Gemini, Typed Replay, Wake Phrases     │
+│  ├─ Motion & Replay    SDK actions, JSONL replays, teaching mode   │
+│  ├─ Camera & Vision    (deprecated, UI kept for compat)            │
+│  ├─ Recordings         Skills registry, saved Gemini turns         │
+│  └─ Settings & Logs    System info, tail live log                  │
+└────────────────────────────────────────────────────────────────────┘
+        │
+        ├─ voice/sanad_voice.py  (subprocess — Gemini Live audio loop)
+        ├─ gemini/client.py      (short-session client for Typed Replay)
+        ├─ gemini/subprocess.py  (spawns+supervises sanad_voice.py)
+        ├─ motion/arm_controller.py  (G1 arm DDS publisher)
+        ├─ voice/audio_io.py     (mic + speaker abstraction — 3 profiles)
+        └─ core/brain.py         (skill dispatcher, event bus)
+```
+
+
+## Quick start (on the robot)
+
+```bash
+conda activate gemini_sdk
+cd ~/Sanad
+python3 main.py
+```
+
+Then open `http://<robot-ip>:8000` in a browser.
+
+
+## Directory layout
+
+| Path | Contents |
+|---|---|
+| `main.py` | Entry point — boots all subsystems + dashboard. |
+| `config.py` | Runtime constants derived from `config/*_config.json`. |
+| `config/` | Per-subsystem JSON config: `core`, `voice`, `gemini`, `motion`, `dashboard`, `local`. |
+| `core/` | Brain, skill registry, event bus, config loader, logger. |
+| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain), `subprocess.py` (supervisor). |
+| `voice/` | `sanad_voice.py` (subprocess entry), `audio_io.py` (mic/speaker), `audio_manager.py`, `local_tts.py`, `live_voice_loop.py`, `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py`, `model_script.py` (brain template). |
+| `local/` | Offline pipeline skeleton — Silero VAD, Whisper, Qwen (via Ollama), CosyVoice2. Opt-in via `SANAD_VOICE_BRAIN=local`. |
+| `motion/` | `arm_controller.py` (main), `sanad_arm_controller.py`, `macro_player.py`, `macro_recorder.py`, `teaching.py`. |
+| `dashboard/` | FastAPI routes (`dashboard/routes/*.py`) + static UI (`dashboard/static/index.html`). |
+| `scripts/` | Persona files — `sanad_script.txt` (voice persona), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
+| `data/` | Runtime state — `audio/` (typed-replay WAVs), `motions/` (arm JSONL files), `recordings/` (live-captured turns), `motions/config.json` (dashboard-editable settings). |
+| `model/` | Location for local SpeechT5 / CosyVoice2 weights used by the offline pipeline. |
+| `logs/` | Per-module rotating logs. |
+
+
+## Runtime selection (env vars)
+
+| Var | Values | Default | Effect |
+|---|---|---|---|
+| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Which mic + speaker pair `audio_io.py` mounts. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
+| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
+| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms. |
+| `SANAD_GEMINI_API_KEY` | string | reads config | Override the API key in `data/motions/config.json`. |
+| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
+| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
+| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
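
A minimal sketch of reading these variables with the defaults from the table. `RuntimeSelection` and `load_runtime_selection()` are hypothetical names, not the project's actual loader. An unset `SANAD_GEMINI_API_KEY` or `SANAD_LIVE_SCRIPT` is modeled as `None`, meaning "read the key from config" and "auto-detect the script" respectively. Rejecting unknown values up front is a choice made by this sketch, not documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Allowed values and defaults copied from the env-var table.
AUDIO_PROFILES = ("builtin", "anker", "hollyland_builtin")
VOICE_BRAINS = ("gemini", "local", "model")


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    """Interpret a 0/1 switch; any other value is rejected."""
    value = env.get(name, default)
    if value not in ("0", "1"):
        raise ValueError(f"{name} must be 0 or 1, got {value!r}")
    return value == "1"


@dataclass(frozen=True)
class RuntimeSelection:  # hypothetical container, not part of Sanad
    audio_profile: str
    voice_brain: str
    dds_interface: str
    gemini_api_key: Optional[str]  # None -> read the key from config
    live_script: Optional[str]  # None -> auto-detect the entry script
    record: bool
    aec_enable: bool


def load_runtime_selection(env: Mapping[str, str] = os.environ) -> RuntimeSelection:
    profile = env.get("SANAD_AUDIO_PROFILE", "builtin")
    if profile not in AUDIO_PROFILES:
        raise ValueError(f"unknown SANAD_AUDIO_PROFILE {profile!r}")
    brain = env.get("SANAD_VOICE_BRAIN", "gemini")
    if brain not in VOICE_BRAINS:
        raise ValueError(f"unknown SANAD_VOICE_BRAIN {brain!r}")
    return RuntimeSelection(
        audio_profile=profile,
        voice_brain=brain,
        dds_interface=env.get("SANAD_DDS_INTERFACE", "eth0"),
        gemini_api_key=env.get("SANAD_GEMINI_API_KEY"),
        live_script=env.get("SANAD_LIVE_SCRIPT"),
        record=_flag(env, "SANAD_RECORD", "1"),
        aec_enable=_flag(env, "SANAD_AEC_ENABLE", "1"),
    )
```

For example, `load_runtime_selection({"SANAD_VOICE_BRAIN": "local", "SANAD_RECORD": "0"})` selects the offline brain and disables turn recording. Every other setting keeps its default.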
+
+
+## Dashboard features
+
+### Operations
+Quick-fire SDK + JSONL arm actions (chip buttons), gestural speaking toggle.
+
+### Voice & Audio
+- **Live Voice Commands** — fire arm actions from user transcripts (wake phrase → arm action), with a master gate and a deferred-trigger toggle.
+- **Live Gemini Process** — start/stop the voice conversation subprocess, tail its log.
+- **Typed Replay** — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt).
+- **Gemini API Key** — hot-swap the key without restart.
+- **Wake Phrase Manager** — add/remove phrase → action bindings.
+
+### Motion & Replay
+- **Motion Control** — list SDK (built-in) and JSONL (recorded) actions, then select one and play it. Cancel smoothly returns the arm to `arm_home.jsonl`.
+- **Replay Manager** — upload `.jsonl` files, test-play them at an adjustable speed, and record new motions in Teaching Mode (kinesthetic recording).
+- **Macro Recorder** — record a new audio+motion pair, or pick any WAV and any motion (SDK or JSONL) and play them in parallel. **Stop** halts the audio and returns the arm to home.
+
+### Recordings
+Skill Registry (predefined audio+motion skills from `skills.json`) + Saved Records (Gemini turn recordings).
+
+
+## Architecture notes
+
+- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py` via `gemini/subprocess.py`. If the voice loop crashes, the dashboard + arm stay up.
+- **Brain contract**: see `voice/model_script.py`. Any new model (OpenAI Realtime, Claude Voice, local offline) must implement `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, and `stop()`. Put the file in `voice/` or in a new `<brand>/` folder, then add a branch to `voice/sanad_voice.py:_build_brain()`.
+- **Supervisor contract**: each brain ships a sibling supervisor (e.g., `gemini/subprocess.py`) that spawns `sanad_voice.py` with its `SANAD_VOICE_BRAIN` env var and parses the brain's log markers. Template: `voice/model_subprocess.py`.
+- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay audio, record playback) routes through DDS `AudioClient.PlayStream` via `audio_manager._play_pcm_via_g1`. The PyAudio path is kept as a desktop/dev fallback only.
+- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a verbatim port of `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle hold → playback → smooth return → disable SDK. Cancel breaks the play loop; `_return_home()` runs unconditionally afterwards for a jerk-free return.
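
The brain contract can be sketched as a `typing.Protocol` plus a small dispatcher. Only the method names and the four constructor arguments come from this README. `EchoBrain`, `BRAINS`, and `build_brain()` are hypothetical stand-ins. The real `_build_brain()` adds one branch per brain; this sketch uses a lookup table instead.

```python
import asyncio
from typing import Any, Callable, Dict, Protocol


class Brain(Protocol):
    """Runtime surface of a brain. Constructor: (audio_io, recorder, voice, system_prompt)."""

    async def run(self) -> None: ...

    def stop(self) -> None: ...


# Factory type matching the contract's __init__(audio_io, recorder, voice, system_prompt).
BrainFactory = Callable[[Any, Any, Any, str], Brain]


class EchoBrain:
    """Hypothetical toy brain: ticks a counter until stop() is called."""

    def __init__(self, audio_io: Any, recorder: Any, voice: Any, system_prompt: str) -> None:
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self.ticks = 0
        self._stop_requested = False

    async def run(self) -> None:
        # A real brain would stream mic audio to its model here.
        while not self._stop_requested:
            self.ticks += 1
            await asyncio.sleep(0.01)

    def stop(self) -> None:
        self._stop_requested = True


# Hypothetical registry keyed on SANAD_VOICE_BRAIN values.
BRAINS: Dict[str, BrainFactory] = {"model": EchoBrain}


def build_brain(name: str, audio_io: Any, recorder: Any, voice: Any, system_prompt: str) -> Brain:
    try:
        factory = BRAINS[name]
    except KeyError:
        raise ValueError(f"unknown SANAD_VOICE_BRAIN {name!r}") from None
    return factory(audio_io, recorder, voice, system_prompt)


async def demo() -> Brain:
    brain = build_brain("model", None, None, "default", "Be brief.")
    task = asyncio.create_task(brain.run())
    await asyncio.sleep(0.05)
    brain.stop()  # run() polls the flag and returns shortly after
    await task
    return brain
```

In this sketch, `stop()` only sets a flag, so `run()` has to poll it. A brain that blocks on a network read would also need to cancel that read.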
+
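A minimal sketch of the supervisor contract: spawn the voice entry point with `SANAD_VOICE_BRAIN` set, then scan its output for markers. Several details are assumptions:

- The marker strings `READY` and `TURN_DONE` are invented for illustration; each real brain defines its own.
- Reading stdout, rather than a log file, is assumed.
- The child process is a stand-in one-liner, so the sketch runs without the robot.

Restart-on-crash is omitted.

```python
import os
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple

# Hypothetical log markers; real brains emit their own.
MARKERS = ("READY", "TURN_DONE")


def supervise(cmd: Sequence[str], brain: str) -> Tuple[Dict[str, List[str]], int]:
    """Run cmd with SANAD_VOICE_BRAIN=brain; return marker lines and the exit code."""
    env = dict(os.environ, SANAD_VOICE_BRAIN=brain)
    seen: Dict[str, List[str]] = {marker: [] for marker in MARKERS}
    with subprocess.Popen(
        list(cmd), env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        assert proc.stdout is not None  # guaranteed by stdout=PIPE
        for line in proc.stdout:  # iterates until the child closes stdout
            for marker in MARKERS:
                if line.startswith(marker):
                    seen[marker].append(line.rstrip("\n"))
    # Leaving the with-block waits for the child, so returncode is set.
    return seen, proc.returncode


# Stand-in child that logs its brain selection, as a real entry script might.
FAKE_CHILD = [
    sys.executable,
    "-c",
    "import os; print('READY brain=' + os.environ['SANAD_VOICE_BRAIN']); print('TURN_DONE 1')",
]
```

`supervise(FAKE_CHILD, "gemini")` returns `({"READY": ["READY brain=gemini"], "TURN_DONE": ["TURN_DONE 1"]}, 0)`.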
+
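The arm-replay sequence can be sketched without DDS, including its cancel semantics: cancel only breaks the playback loop, and the return home always runs. Many details are assumptions, including the `publish` and `disable_sdk` callbacks, the pose format (one float per joint), the linear ramp, the step count, and the 20 ms tick. The 0.7 s settle hold is the value this README suggests for `SETTLE_HOLD_SEC`. None of this is the actual `_replay_file_inner()` code.

```python
import threading
import time
from typing import Callable, List, Sequence

Pose = List[float]  # assumption: one angle per arm joint


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Pose:
    """Linear interpolation between two poses; t runs from 0 to 1."""
    return [x + (y - x) * t for x, y in zip(a, b)]


def ramp(publish: Callable[[Pose], None], start: Pose, end: Pose, steps: int, dt: float) -> None:
    """Publish `steps` intermediate poses, ending exactly at `end`."""
    for i in range(1, steps + 1):
        publish(lerp(start, end, i / steps))
        time.sleep(dt)


def replay(
    frames: List[Pose],
    current: Pose,
    home: Pose,
    publish: Callable[[Pose], None],
    disable_sdk: Callable[[], None],
    cancel: threading.Event,
    settle_hold_sec: float = 0.7,
    ramp_steps: int = 50,
    dt: float = 0.02,
) -> Pose:
    """Ramp in, settle, play, return home, disable; return the last pose before the return ramp."""
    last = current
    try:
        ramp(publish, current, frames[0], ramp_steps, dt)  # 1. ramp-in to the first frame
        last = frames[0]
        deadline = time.monotonic() + settle_hold_sec  # 2. settle hold
        while time.monotonic() < deadline and not cancel.is_set():
            publish(last)
            time.sleep(dt)
        for frame in frames:  # 3. playback
            if cancel.is_set():
                break  # cancel only exits the loop
            publish(frame)
            last = frame
            time.sleep(dt)
    finally:
        ramp(publish, last, home, ramp_steps, dt)  # 4. smooth return, always runs
        disable_sdk()  # 5. hand arm control back
    return last
```

Setting `cancel` from another thread (a dashboard Stop handler, for instance) ends playback at the next frame boundary. The `finally` block still ramps the arm home and disables the SDK.
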
+## Dynamic paths
+
+Every path is derived at runtime — no hard-coded `/home/zedx/…` anywhere.
+Resolution order for `BASE_DIR` in `config.py`:
+
+1. `SANAD_PROJECT_ROOT` env var (if set).
+2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
+3. `Path(__file__).resolve().parent` — auto-detected.
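
The same resolution order can be written as a standalone function. Two details are assumptions: `PROJECT_BASE` and `PROJECT_NAME` are joined as path components, and the `.env` parser is a bare-bones `KEY=VALUE` reader (the real `config.py` may parse `.env` differently). `resolve_base_dir()` is hypothetical. Its `here` argument stands in for `Path(__file__).resolve().parent` so the function can be tested.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def _read_dotenv(path: Path) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and # comments, strips quotes."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_dir(here: Path, env: Mapping[str, str] = os.environ) -> Path:
    """`here` is the already-resolved directory that contains config.py."""
    # 1. Explicit override wins.
    override = env.get("SANAD_PROJECT_ROOT")
    if override:
        return Path(override).expanduser().resolve()
    # 2. PROJECT_BASE + PROJECT_NAME from a .env in Sanad/ or its parent.
    for candidate in (here / ".env", here.parent / ".env"):
        if candidate.is_file():
            values = _read_dotenv(candidate)
            base, name = values.get("PROJECT_BASE"), values.get("PROJECT_NAME")
            if base and name:
                return (Path(base).expanduser() / name).resolve()
    # 3. Fall back to the directory config.py lives in.
    return here
```

On the robot, with no override and no `.env`, this falls through to step 3 and returns the directory that holds `config.py`.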
+
+The project runs unchanged from either layout:
+- dev: `<anywhere>/Project/Sanad/`
+- deployed: `/home/unitree/Sanad/`
+
+
+## Deployment (workstation → robot)
+
+```bash
+rsync -av --delete \
+  --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
+  /path/to/Sanad/ \
+  unitree@192.168.123.164:/home/unitree/Sanad/
+```
+
+Then on the robot: `Ctrl+C` the running `main.py` and re-run.
+
+
+## Troubleshooting
+
+| Symptom | Fix |
+|---|---|
+| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. The `sys.modules` alias at `main.py:~50` prevents this; make sure it is present, then restart. |
+| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
+| `No module named 'Project'` in subprocess | Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
+| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`. |
+| Record playback silent | `audio_mgr.play_wav` routes to the G1 DDS speaker only if the Unitree SDK is importable. Otherwise it falls back to the PulseAudio sink, which is not wired to a physical speaker on the G1. Check that the SDK imports in the environment running `main.py`. |
+| Live Voice Commands transcript stuck | A deferred trigger was queued while the `trigger_enabled` toggle was off. Turn the toggle on; the pending-trigger poll then fires the queued trigger automatically. |
+| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
+| Dashboard `Not Found` 404s for `/api/vision/*` | The vision module was deleted, but the HTML still fetches a few of its endpoints. Cosmetic only: the init block in `dashboard/static/index.html` already skips most of them. |
+
+
+## License / attribution
+
+Internal project for YS Lootah Technology. Reuses/ports patterns from:
+- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
+- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
+- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
+- Unitree `unitree_sdk2py` (G1 low-level SDK, LocoClient, G1ArmActionClient)
--- a/dashboard/routes/macros.py
+++ b/dashboard/routes/macros.py
@@ -112,6 +112,36 @@ async def list_motion_files():
    return {"files": files, "dir": str(MOTIONS_DIR)}


+@router.post("/stop-combined")
+async def stop_combined():
+    """Immediately stop any in-flight combined playback.
+
+    - `arm.cancel()` — breaks the replay loop and triggers the smooth
+      return-to-home ramp (see `_return_home` in arm_controller.py).
+    - `audio_mgr.stop_playback()` — sends AUDIO_STOP_PLAY to the G1
+      chest speaker via DDS.
+    Both run unconditionally so Stop works even if only one side was
+    actually playing.
+    """
+    from Project.Sanad.main import audio_mgr, arm
+    result = {"motion_stopped": False, "audio_stopped": False}
+    if arm is not None:
+        try:
+            arm.cancel()
+            result["motion_stopped"] = True
+        except Exception as exc:
+            log.warning("stop-combined: arm.cancel failed: %s", exc)
+            result["motion_error"] = str(exc)
+    if audio_mgr is not None:
+        try:
+            audio_mgr.stop_playback()
+            result["audio_stopped"] = True
+        except Exception as exc:
+            log.warning("stop-combined: audio stop failed: %s", exc)
+            result["audio_error"] = str(exc)
+    return {"ok": True, **result}
+
+
@router.post("/play-combined")
 async def play_combined(payload: ComboPlayPayload):
    """Fire a user-picked audio clip and arm action in parallel.
--- a/dashboard/static/index.html
+++ b/dashboard/static/index.html
@@ -459,6 +459,7 @@
      <div style="align-self:flex-end;display:flex;gap:.3rem">
        <button class="btn btn-ghost btn-sm" onclick="refreshCombo()" title="Reload file lists">↻</button>
        <button class="btn btn-success btn-sm" onclick="playCombo(this)">Play</button>
+        <button class="btn btn-danger btn-sm" onclick="stopCombo(this)" title="Stop audio + return arm to home">Stop</button>
      </div>
    </div>

@@ -1193,6 +1194,19 @@ async function playCombo(b){
  }catch(e){st.textContent='Failed';}
  btnDone(b);
 }
+async function stopCombo(b){
+  const st=document.getElementById('combo-status');
+  btnLoad(b);
+  try{
+    const r=await api('POST','/api/macros/stop-combined');
+    const parts=[];
+    if(r.motion_stopped)parts.push('motion stopped');
+    if(r.audio_stopped)parts.push('audio stopped');
+    st.textContent='Stopped: '+(parts.join(', ')||'nothing was playing');
+    toast('Stopped','info');
+  }catch(e){st.textContent='Stop failed';}
+  btnDone(b);
+}

 // Replay
 async function refreshReplayFiles(){try{const r=await api('GET','/api/replay/files');const el=document.getElementById('replay-files');if(!(r.files||[]).length){el.innerHTML='<div class="empty">No motion files</div>';return;}el.innerHTML='<table><tr><th>File</th><th>Frames</th><th>Duration</th><th>Size</th><th></th></tr>'+(r.files||[]).map(f=>`<tr><td>${esc(f.name)}</td><td>${f.frames}</td><td>${f.duration_sec}s</td><td>${f.size_kb}KB</td><td><button class="btn btn-primary btn-sm" onclick="document.getElementById('replay-name').value='${esc(f.name)}';testReplay()">Play</button> <button class="btn btn-danger btn-sm" onclick="deleteMotionFile('${esc(f.name)}')">Del</button></td></tr>`).join('')+'</table>';}catch(e){}}
--- a/voice/audio_manager.py
+++ b/voice/audio_manager.py
@@ -218,6 +218,25 @@ class AudioManager:
    _G1_STREAM_APP = "sanad_playback"
    _G1_HW_RATE = 16_000

+    def stop_playback(self) -> None:
+        """Stop any in-flight G1 DDS audio stream.
+
+        Used by the dashboard's Stop button to halt `play_wav` /
+        `_play_pcm_via_g1` mid-stream. Safe to call even when nothing
+        is playing — the DDS call is idempotent.
+        """
+        client = self._get_g1_audio_client()
+        if client is None:
+            return
+        try:
+            client._Call(
+                ROBOT_API_ID_AUDIO_STOP_PLAY,
+                json.dumps({"app_name": self._G1_STREAM_APP}),
+            )
+            log.info("G1 audio stream stopped (app=%s)", self._G1_STREAM_APP)
+        except Exception as exc:
+            log.warning("stop_playback failed: %s", exc)
+
    def _play_pcm_via_g1(self, pcm_bytes: bytes, channels: int, source_rate: int) -> None:
        """Stream int16 PCM to the G1 chest speaker via AudioClient.PlayStream.