Update 2026-06-08 12:59:00

2026-06-08 12:59:01 +04:00 · 2026-06-08 12:59:01 +04:00 · 4210c4cc61
commit 4210c4cc61
parent ca0de44401
3 changed files with 261 additions and 95 deletions
--- a/README.md
+++ b/README.md
@@ -1,34 +1,44 @@
 # Sanad

-Voice + motion assistant for the Unitree G1 humanoid. Gemini Live handles
-conversation; the arm controller plays built-in SDK poses and recorded
-JSONL macros; everything is orchestrated by a FastAPI dashboard.
+Voice + motion assistant for the Unitree G1 humanoid. **Gemini Live** (or a
+fully-offline pipeline) handles bilingual Arabic/English conversation; an arm
+controller plays built-in SDK poses and recorded JSONL macros; a locomotion
+controller walks/turns the robot; an optional camera feeds **Gemini-side face &
+place recognition**; everything is orchestrated through a fault-isolated
+**FastAPI dashboard** on `http://<robot>:8000`.

 ```
-┌────────────────────────────────────────────────────────────────────┐
-│  Dashboard (FastAPI) ── http://<robot>:8000                        │
-│  ├─ Operations         Quick-fire arm actions                      │
-│  ├─ Voice & Audio      Live Gemini, Typed Replay, Wake Phrases     │
-│  ├─ Motion & Replay    SDK actions, JSONL replays, teaching mode   │
-│  ├─ Recognition        Camera vision + face gallery (Gemini-side)  │
-│  ├─ Recordings         Skills registry, saved Gemini turns         │
-│  └─ Settings & Logs    System info, tail live log                  │
-└────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────┐
+│  Dashboard (FastAPI) ── http://<robot>:8000                            │
+│  ├─ Operations         Quick-fire arm actions + gestural-speaking      │
+│  ├─ Voice & Audio      Live Gemini, Typed Replay, Wake Phrases, Audio  │
+│  ├─ Motion & Replay    SDK actions, JSONL replays, macros, teaching    │
+│  ├─ Controller         Locomotion teleop, postures, FSM modes, E-STOP  │
+│  ├─ Recognition        Camera vision + face gallery + zones/places     │
+│  ├─ Recordings         Skill registry, saved Gemini turns              │
+│  ├─ Temperature        Live 3D motor-temperature heatmap (three.js)    │
+│  ├─ Terminal           In-browser shell (PTY) to the robot             │
+│  └─ Settings & Logs    System info, tail/stream live logs              │
+└──────────────────────────────────────────────────────────────────────┘
        │
-        ├─ voice/sanad_voice.py  (subprocess — Gemini Live audio loop)
-        ├─ gemini/script.py      (Gemini Live brain — audio + video + state)
-        ├─ gemini/client.py      (short-session client for Typed Replay)
-        ├─ gemini/subprocess.py  (spawns+supervises sanad_voice.py;
-        │                         pushes camera frames + motion state
-        │                         to the child over its stdin)
-        ├─ vision/camera.py      (RealSense/USB capture daemon)
-        ├─ vision/face_gallery.py (data/faces/ CRUD for the primer turn)
-        ├─ motion/arm_controller.py  (G1 arm DDS publisher)
-        ├─ voice/audio_io.py     (mic + speaker abstraction — 3 profiles)
-        └─ core/brain.py         (skill dispatcher, event bus)
+        ├─ voice/sanad_voice.py      (subprocess — model-agnostic voice loop)
+        │    ├─ gemini/script.py     (Gemini Live brain — audio+video+state)
+        │    └─ local/script.py      (offline brain — VAD→STT→LLM→TTS)
+        ├─ gemini/client.py          (short-session client for Typed Replay)
+        ├─ gemini/subprocess.py      (spawns+supervises sanad_voice.py;
+        │                             pushes camera frames + motion state
+        │                             to the child over its stdin)
+        ├─ voice/movement_dispatch.py(Gemini spoken phrase → locomotion)
+        ├─ vision/camera.py          (RealSense/USB capture daemon)
+        ├─ vision/face_gallery.py    (data/faces/ CRUD for the primer turn)
+        ├─ vision/zone_gallery.py    (data/zones/ places + "go here" targets)
+        ├─ motion/arm_controller.py  (G1 arm DDS publisher — owns DDS init)
+        ├─ G1_Controller/loco_controller.py (G1 locomotion via LocoClient)
+        ├─ voice/audio_io.py         (mic + speaker abstraction — 3 profiles)
+        └─ core/brain.py             (skill dispatcher, event bus)
 ```

-### Camera + face recognition data flow
+### Camera + face/place recognition data flow

 ```
 CameraDaemon (parent, in-memory JPEG+b64 cache)
@@ -43,6 +53,10 @@ ArmController ─emit→ event bus ─→ main.py ─→ live_sub.send_state()
                            │             session.send_realtime_input(video=Blob)
                            └─ state: → _STATE_PENDING → _send_state_loop →
                                          session.send_realtime_input(text=…)
+
+Recognition toggles (vision / face-rec / zone-rec / movement) are written by the
+dashboard to data/.recognition_state.json and POLLED by the Gemini child at 1 Hz
+— so flipping a toggle takes effect mid-session with NO restart.
 ```
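The toggle hand-off described in the diagram above can be sketched as a writer that replaces the state file atomically and a reader that tolerates a missing or invalid file. This is a minimal sketch, not the contents of `vision/recognition_state.py`: the function names `write_state` and `read_state` and the toggle keys used below are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(path: Path, state: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it over the target."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=".tmp_state_")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic on POSIX: a reader sees the old file or the new one.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_state(path: Path, default: dict) -> dict:
    """Return the toggles merged over defaults; fall back to defaults on any read error."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return dict(default)
    if not isinstance(data, dict):
        return dict(default)
    return {**default, **data}
```

Under these assumptions the child would call `read_state` about once per second in its poll loop and apply any toggle that differs from the previous read.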


@@ -54,37 +68,155 @@ cd ~/Sanad
 python3 main.py
 ```

-Then open `http://<robot-ip>:8000` in a browser.
+Then open `http://<robot-ip>:8000` in a browser. (The dashboard binds to the
+`wlan0` IP by default — see *Runtime selection* to override.)
+
+To run the fully offline brain (no cloud): `SANAD_VOICE_BRAIN=local python3 main.py`
+(requires `ollama serve` and the local model environment; see *Voice brains*).
+
+> **Gemini API key — required, none ships with the repo.** The `api_key`
+> fields in `config/core_config.json` (`gemini_defaults`) and
+> `data/motions/config.json` (`gemini`) are intentionally empty (`""`).
+> The voice loop cannot connect until you supply one, by any of:
+> - **Dashboard** → *Voice & Audio → Gemini API Key*: paste and save. The key hot-swaps into the live session (no restart) and persists to `data/motions/config.json`.
+> - **Env var** — `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`.
+> - **Config file** — set `gemini_defaults.api_key` in `config/core_config.json`.
+>
+> Precedence (highest first): `data/motions/config.json` → `SANAD_GEMINI_API_KEY` → `config/core_config.json`. Get a key at <https://aistudio.google.com/apikey>.
+
+
+## Dashboard features
+
+### Operations
+Quick-fire SDK + JSONL arm actions (chip buttons), gestural-speaking toggle.
+
+### Voice & Audio
+- **Live Voice Commands** — fire arm gestures from the *user's* transcript
+  (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
+- **Live Gemini Process** — start/stop the voice conversation subprocess, tail
+  its log. Choose the Gemini cloud brain or the offline brain via
+  `SANAD_VOICE_BRAIN`.
+- **Typed Replay** — Gemini reads typed text aloud (wrapped with a
+  "repeat verbatim" prompt); optionally records the clip.
+- **Gemini API Key** — hot-swap the key without restart.
+- **Wake Phrase Manager** — add/remove phrase → action bindings.
+- **Audio Controls** — mic/speaker mute, G1 chest-speaker volume (DDS), device
+  profile selection, PulseAudio soft-reset and Anker USB hard-reset.
+
+### Motion & Replay
+- **Motion Control** — list SDK (built-in) + JSONL (recorded) actions, select +
+  play. Cancel smoothly returns to `arm_home.jsonl`.
+- **Replay Manager** — upload `.jsonl` files, test-play with speed, Teaching
+  Mode (kinesthetic record — limp the arm and hand-guide it).
+- **Macro Recorder** — record a new audio+motion pair, OR pick any WAV + any
+  motion (SDK or JSONL) and play them in parallel.
+
+### Controller  *(locomotion)*
+Manual teleoperation of the G1's **legs** via the Unitree `LocoClient`.
+**Disarmed on every boot**; every motion command is refused until you press Arm.
+- **Move / Step** — continuous teleop (vx/vy/vyaw) or discrete one-shot steps.
+- **Postures & FSM modes** — zero-torque, damp, squat, sit, stand, balance,
+  stand-height; prep/ready sequences; MotionSwitcher select-AI/release.
+- **Gemini Movement** — toggle voice-driven walking: the `MovementDispatcher`
+  parses Gemini's *own spoken confirmation phrases* ("Turning right." /
+  "أستدير يميناً.") and drives the legs (gated on this toggle + an E-STOP latch).
+- **E-STOP** — always available; `StopMove` + disarm + latch the dispatcher.
+
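The gating described for Gemini Movement can be sketched as a phrase lookup that only fires when the toggle is on and the E-STOP latch is clear. Everything here is an assumption for illustration: the phrase table, the angle values and their sign, and the `send` callback are hypothetical, and Arabic normalization is left out. The real map lives in `data/motions/instruction.json`.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical phrase table; values are (command, degrees) and are not calibrated.
PHRASES: Dict[str, Tuple[str, float]] = {
    "turning right": ("turn", -90.0),
    "turning left": ("turn", 90.0),
}


def _normalize(text: str) -> str:
    """Drop punctuation and case so "Turning right." matches "turning right"."""
    return re.sub(r"[^\w\s]", "", text).casefold().strip()


class MovementDispatcher:
    def __init__(self, send: Callable[[str, float], None]) -> None:
        self._send = send
        self.enabled = False        # "Gemini Movement" toggle, off by default
        self.estop_latched = False  # once set, nothing is dispatched

    def estop(self) -> None:
        self.estop_latched = True

    def on_bot_phrase(self, text: str) -> Optional[Tuple[str, float]]:
        """Dispatch the command for a recognised phrase, or return None."""
        if not self.enabled or self.estop_latched:
            return None
        command = PHRASES.get(_normalize(text))
        if command is not None:
            self._send(*command)
        return command
```

Checking the gate before the lookup keeps a latched E-STOP authoritative regardless of what the model says.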
+> **Safety:** the arm and locomotion are **mutually exclusive** —
+> `arm.set_motion_block(loco.movement_active)` makes every arm
+> replay/gesture refuse while the robot is (or just was, within ~1.5 s) walking.
+
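The arm/locomotion interlock in the safety note can be sketched as a predicate that stays true while walking and for a short grace period afterwards. This is a sketch under assumptions: it treats `movement_active` as a callable, and `MovementGate`, `Arm`, and the injectable clock are hypothetical stand-ins for the real controllers.

```python
import time
from typing import Callable, Optional


class MovementGate:
    """Reports movement while walking and for grace_s seconds after stopping."""

    def __init__(self, grace_s: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._grace_s = grace_s
        self._clock = clock
        self._walking = False
        self._stopped_at: Optional[float] = None

    def set_walking(self, walking: bool) -> None:
        if self._walking and not walking:
            self._stopped_at = self._clock()  # remember when walking ended
        self._walking = walking

    def movement_active(self) -> bool:
        if self._walking:
            return True
        if self._stopped_at is None:
            return False
        return self._clock() - self._stopped_at < self._grace_s


class Arm:
    def __init__(self) -> None:
        self._blocked: Callable[[], bool] = lambda: False

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        self._blocked = predicate

    def play(self, action: str) -> bool:
        """Return True if the action is accepted, False if it is refused."""
        return not self._blocked()
```

Passing the predicate rather than a snapshot value means the arm re-checks the locomotion state at the moment each action is requested.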
+### Recognition
+Camera vision + Gemini-side **face** and **zone/place** recognition. All are
+**off by default**; each is a **hot toggle** (≈1 s to take effect, no restart).
+- **Camera Vision** — `CameraDaemon` captures from a RealSense (preferred) or
+  USB camera; the supervisor streams JPEG frames to Gemini Live so it can answer
+  "what do you see?". Live preview panel. Auto-reconnects on USB unplug/stall
+  and warns if a RealSense negotiated USB 2.0 (Marcus-ported resilience).
+- **Face Recognition** — manage `data/faces/face_{id}/` galleries: enroll from
+  the live camera or upload photos, rename, describe, download (per-photo or
+  ZIP), delete. On session start (and on any gallery change) the child sends a
+  **primer turn** carrying every enrolled face + a Khaleeji greeting
+  instruction — **Gemini matches in-context, so there is no local
+  face-recognition model**. Recognition needs vision on.
+- **Zones & Places** — `data/zones/zone_{zid}/place_{pid}/` two-level gallery:
+  reference photos per place, optional linked face_ids, and a **"go here"** nav
+  target (`nav_target_zone/place_id` in the recognition-state file) for
+  place-aware navigation.
+- **Sync Gallery** — force-resend the face/zone primer to the live session.
+
+### Recordings
+Skill Registry (predefined audio+motion+callback skills from `skills.json`) +
+Saved Records (captured Gemini turn recordings; play/pause/stop/rename/delete).
+
+### Temperature
+Live **3D motor-temperature heatmap** — a standalone three.js viewer
+(`dashboard/static/temp3d/`) loads the G1 29-DoF URDF + STL meshes and colors
+each joint blue→red from the arm controller's throttled `rt/lowstate` snapshot,
+streamed over `/ws/motor-temps` at ~8 fps. No second DDS subscriber.
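A blue-to-red mapping of the kind described above can be sketched as a clamped linear interpolation. The 30 °C and 80 °C bounds and the function name `temp_to_rgb` are assumptions for illustration; the actual viewer does its coloring in three.js and may use a different scale.

```python
def temp_to_rgb(temp_c: float, cold_c: float = 30.0, hot_c: float = 80.0) -> tuple:
    """Map a temperature to (r, g, b): pure blue at cold_c, pure red at hot_c."""
    t = (temp_c - cold_c) / (hot_c - cold_c)
    t = min(1.0, max(0.0, t))  # clamp readings outside the range
    return (round(255 * t), 0, round(255 * (1.0 - t)))
```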
+
+### Terminal
+In-browser **PTY shell** to the robot (`/ws/terminal`, xterm.js): it runs `bash -i`
+as the dashboard's user, supports resize and backpressure, and is capped at 4
+sessions. (See *Security*: this gives full shell access to anyone who can reach the URL.)
+
+### Settings & Logs
+System info (host, network interfaces, DDS interface, bound dashboard host/port,
+per-subsystem status, audio devices), live log stream (`/ws/logs`), per-file
+tail, snapshot, and a one-blob "Copy All Logs" bundle.


 ## Directory layout

 | Path | Contents |
 |---|---|
-| `main.py` | Entry point — boots all subsystems + dashboard. |
-| `config.py` | Runtime constants derived from `config/*_config.json`. |
-| `config/` | Per-subsystem JSON config: `core`, `voice`, `gemini`, `motion`, `dashboard`, `local`. |
-| `core/` | Brain, skill registry, event bus, config loader, logger. |
+| `main.py` | Entry point — fault-isolated boot of all subsystems + the dashboard. Doubles as the service container (route handlers `import` its module globals). |
+| `config.py` | Runtime constants + layout-agnostic path resolution; layers `data/motions/config.json` over the JSON config at import. |
+| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `local`, `motion`, `dashboard`. |
+| `core/` | `brain.py` (skill dispatcher), `event_bus.py`, `skill_registry.py`, `config_loader.py`, `logger.py` (rotating + WS push), `asyncio_compat.py` (3.8 `to_thread` shim). |
 | `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain: audio + video + motion-state), `subprocess.py` (supervisor + stdin frame/state push). |
-| `voice/` | `sanad_voice.py` (subprocess entry), `audio_io.py` (mic/speaker), `audio_manager.py`, `local_tts.py`, `live_voice_loop.py`, `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py`, `model_script.py` (brain template). |
-| `vision/` | `camera.py` (RealSense/USB capture daemon, auto-reconnect), `face_gallery.py` (`data/faces/` CRUD), `recognition_state.py` (toggle state file I/O). |
-| `local/` | Offline pipeline skeleton — Silero VAD, Whisper, Qwen (via Ollama), CosyVoice2. Opt-in via `SANAD_VOICE_BRAIN=local`. |
-| `motion/` | `arm_controller.py` (main), `sanad_arm_controller.py`, `macro_player.py`, `macro_recorder.py`, `teaching.py`. |
-| `dashboard/` | FastAPI routes (`dashboard/routes/*.py`) + static UI (`dashboard/static/index.html`). |
-| `scripts/` | Persona files — `sanad_v2` (voice persona), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
-| `data/` | Runtime state — `audio/` (typed-replay WAVs), `motions/` (arm JSONL files), `recordings/` (live-captured turns), `faces/face_{id}/` (enrolled face galleries), `.recognition_state.json` (vision/face-rec toggle state), `motions/config.json` (dashboard-editable settings). |
-| `model/` | Place for local SpeechT5 / CosyVoice2 weights when using offline pipeline. |
+| `local/` | Fully-offline brain — `vad.py` (Silero), `stt.py` (faster-whisper), `llm.py` (Qwen via Ollama/llama.cpp), `tts.py` (CosyVoice2), `script.py` (the brain), `subprocess.py` (supervisor). Opt-in via `SANAD_VOICE_BRAIN=local`. |
+| `voice/` | `sanad_voice.py` (subprocess entry, model-agnostic), `audio_io.py` / `audio_manager.py` / `audio_devices.py` (mic/speaker), `local_tts.py` (SpeechT5 Arabic TTS), `live_voice_loop.py` (user-transcript → arm gesture), `movement_dispatch.py` (Gemini-phrase → locomotion), `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py` (Arabic normalization + phrase matching), `model_script.py` / `model_subprocess.py` (brain templates). |
+| `motion/` | `arm_controller.py` (production 5-phase JSONL replay engine, owns the single DDS init), `macro_player.py`, `macro_recorder.py`, `teaching.py`. (`sanad_arm_controller.py` is a legacy alternate — not wired by `main.py`.) |
+| `G1_Controller/` | `loco_controller.py` — locomotion via Unitree `LocoClient` (move/step/postures/FSM/E-STOP); reuses the arm's DDS participant. |
+| `vision/` | `camera.py` (RealSense/USB daemon, auto-reconnect), `face_gallery.py`, `zone_gallery.py`, `recognition_state.py` (atomic-JSON toggle IPC). |
+| `dashboard/` | `app.py` (FastAPI factory + fault-isolated router registration), `routes/*.py` (20 REST routers), `websockets/*.py` (logs, motor-temps, terminal), `static/index.html` (single-page UI), `static/temp3d/` (3D viewer). |
+| `scripts/` | Persona files — `sanad_script.txt` (voice persona "Bousandah"), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
+| `data/` | Runtime state — `motions/*.jsonl` (arm trajectories) + `instruction.json` (locomotion phrase map) + `skills.json` + `config.json` (dashboard-editable), `recordings/` (captured turns + macros), `faces/face_{id}/` + `zones/zone_{zid}/place_{pid}/` (galleries), `audio/` (typed-replay WAVs + records index), `.recognition_state.json` (toggle IPC). |
+| `model/` | Local SpeechT5 / Whisper / CosyVoice2 weights when using the offline pipeline. |
 | `logs/` | Per-module rotating logs. |


+## Voice brains
+
+The child `voice/sanad_voice.py` is model-agnostic and selects a brain via
+`SANAD_VOICE_BRAIN`. Every brain implements the same contract
+(`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`)
+and ships a sibling supervisor that spawns the child and parses its
+`USER:` / `BOT:` / state log markers.
+
+| Value | Brain | Pipeline |
+|---|---|---|
+| `gemini` *(default)* | `gemini/script.py` | Gemini Live native-audio (full-duplex speech-to-speech, server-side VAD, vision frames, face/zone primers, voice→movement). Cloud. |
+| `local` | `local/script.py` | Silero VAD → faster-whisper (large-v3-turbo, CUDA int8) → Qwen2.5 (Ollama/llama.cpp) → CosyVoice2 streaming TTS. Fully on-device. |
+| `model` | `voice/model_script.py` | Template/stub for adding a new provider (OpenAI Realtime, Claude Voice, …). |
+
+To add a brain: drop a file in `voice/` or a new `<brand>/` folder and add a
+branch to `voice/sanad_voice.py:_build_brain()`; ship a supervisor modeled on
+`voice/model_subprocess.py`.
+
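The brain contract above (`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`) can be illustrated with a toy brain that talks to no model. `EchoBrain` and `build_brain` are hypothetical; `build_brain` only stands in for the branch you would add to `voice/sanad_voice.py:_build_brain()`, whose real signature is not shown in this diff.

```python
import asyncio


class EchoBrain:
    """Toy brain that satisfies the documented contract and does nothing else."""

    def __init__(self, audio_io, recorder, voice, system_prompt):
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self._running = True
        self.ticks = 0

    async def run(self):
        # A real brain would stream mic audio to its model here.
        while self._running:
            self.ticks += 1
            await asyncio.sleep(0.01)

    def stop(self):
        self._running = False


def build_brain(name, **kwargs):
    """Hypothetical selector keyed on the SANAD_VOICE_BRAIN value."""
    brains = {"echo": EchoBrain}
    if name not in brains:
        raise ValueError(f"unknown SANAD_VOICE_BRAIN: {name!r}")
    return brains[name](**kwargs)
```

Keeping `stop()` synchronous lets a signal handler or supervisor thread call it without needing access to the event loop.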
+
 ## Runtime selection (env vars)

 | Var | Values | Default | Effect |
 |---|---|---|---|
-| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Which mic + speaker pair `audio_io.py` mounts. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
 | `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
-| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms. |
-| `SANAD_GEMINI_API_KEY` | string | reads config | Override the API key in `data/motions/config.json`. |
+| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Mic + speaker pair. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
+| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms (arm + locomotion + speaker). |
+| `SANAD_DASHBOARD_HOST` / `_INTERFACE` | IP / iface | `wlan0` IP | Dashboard bind address. |
+| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. See [Quick start](#quick-start-on-the-robot). |
+| `SANAD_GEMINI_MODEL` / `_VOICE` | string | reads config | Override the Gemini model id / prebuilt voice. |
+| `SANAD_G1_VOLUME` | `0`–`100` | `100` | G1 chest-speaker volume; also scales the barge-in threshold. |
 | `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
 | `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
 | `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
@@ -92,64 +224,82 @@ Then open `http://<robot-ip>:8000` in a browser.
 | `SANAD_FACE_RECOGNITION_ENABLE` | `0` or `1` | `0` | Boot default for Gemini-side face recognition. Also a hot toggle. |
 | `SANAD_VISION_SEND_HZ` | float | `2` | Frames/sec the Gemini child relays to Live. |
 | `SANAD_CAMERA_WIDTH` / `_HEIGHT` / `_FPS` | int | `424` / `240` / `15` | Capture profile. Also settable per-deploy in `config/core_config.json > camera`. |
+| `SANAD_CAMERA_USB_INDEX` | int | auto | Pin a `/dev/videoN` node (avoids picking a RealSense IR stream). |
 | `SANAD_FACES_MAX_SAMPLES` | int | `3` | Max photos per person fed into the gallery primer turn (token budget). |
+| `SANAD_PROJECT_ROOT` | path | auto | Override the project root (see *Dynamic paths*). |

 > All `SANAD_VISION_*` / `SANAD_CAMERA_*` / `SANAD_FACE_*` vars are **boot
-> defaults** forwarded to the Gemini child via `LIVE_TUNE`. Once running,
-> the Recognition tab's toggles are the live source of truth — they write
-> `data/.recognition_state.json`, which the child polls at 1 Hz.
+> defaults** forwarded to the Gemini child via `LIVE_TUNE`. Once running, the
+> Recognition tab's toggles (vision / face-rec / zone-rec / movement) are the
+> live source of truth in `data/.recognition_state.json`, polled at 1 Hz.
+
+CLI flags: `python3 main.py --host <ip> --port 8000 --network <dds_iface>`;
+`--check-env` prints a subsystem/environment diagnostic and exits.


-## Dashboard features
+## API surface

-### Operations
-Quick-fire SDK + JSONL arm actions (chip buttons), gestural speaking toggle.
+All routes are registered defensively — a router whose import fails is recorded
+(`GET /api/_dashboard_status`) and the server still boots without it.

-### Voice & Audio
-- **Live Voice Commands** — arm trigger from user transcripts (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
-- **Live Gemini Process** — start/stop the voice conversation subprocess, tail its log.
-- **Typed Replay** — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt).
-- **Gemini API Key** — hot-swap the key without restart.
-- **Wake Phrase Manager** — add/remove phrase → action bindings.
+**REST** (prefix → controls): `/api` health · `/api/system` info ·
+`/api/voice` Gemini/local generate+connect+key · `/api/motion` arm actions ·
+`/api/skills` skill registry · `/api/macros` record/play · `/api/replay` JSONL
+CRUD + teaching · `/api/audio` mute/volume/devices/reset · `/api/scripts`
+persona files · `/api/records` saved WAVs · `/api/prompt` system prompt ·
+`/api/wake-phrases` bindings · `/api/live-voice` arm-phrase dispatcher ·
+`/api/live-subprocess` Gemini child · `/api/typed-replay` TTS · `/api/recognition`
+vision + face gallery · `/api/zones` zones/places + nav target · `/api/temp`
+motor map + snapshot · `/api/controller` locomotion (move/step/postures/modes/
+E-STOP).

-### Motion & Replay
-- **Motion Control** — list SDK (built-in) + JSONL (recorded) actions, select + play. Cancel smoothly returns to `arm_home.jsonl`.
-- **Replay Manager** — upload `.jsonl` files, test-play with speed, Teaching Mode (kinesthetic record).
-- **Macro Recorder** — Record new audio+motion pair, OR pick any WAV + any motion (SDK or JSONL) and Play them in parallel.
-
-### Recognition
-Camera vision + Gemini-side face recognition. Both are **off by default**;
-each is a **hot toggle** — flipping it takes effect on the running Gemini
-session within ~1 s, no restart.
-
-- **Camera Vision** — when on, the `CameraDaemon` captures from a RealSense
-  (preferred) or USB camera and the supervisor streams JPEG frames to
-  Gemini Live so it can answer "what do you see?". Live preview panel.
-- **Face Recognition** — manage `data/faces/face_{id}/` galleries: enroll
-  from the live camera or upload photos, rename, download (per-photo or
-  ZIP), delete. On a session start (and on any gallery change) the child
-  sends a **primer turn** carrying every enrolled face + a Khaleeji
-  greeting instruction — Gemini itself does the matching in-context, so
-  there's **no local face-recognition model**. Recognition needs vision on.
-- **Sync Gallery** — force-resend the primer to the live session.
-
-The camera daemon auto-reconnects on USB unplug / stalled frames and warns
-if a RealSense negotiated USB 2.0 (Marcus-ported resilience).
-
-### Recordings
-Skill Registry (predefined audio+motion skills from `skills.json`) + Saved Records (Gemini turn recordings).
+**WebSockets**: `/ws/logs` (live log stream + 500-line replay) ·
+`/ws/motor-temps` (3D heatmap data, ~8 fps) · `/ws/terminal` (PTY shell).


 ## Architecture notes

-- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py` via `gemini/subprocess.py`. If the voice loop crashes, the dashboard + arm stay up.
-- **Brain contract**: see `voice/model_script.py` — any new model (OpenAI Realtime, Claude Voice, local offline) implements `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`. Drop a file in `voice/` or a new `<brand>/` folder, add a branch to `voice/sanad_voice.py:_build_brain()`.
-- **Supervisor contract**: each brain ships a sibling supervisor (e.g., `gemini/subprocess.py`) that spawns `sanad_voice.py` with its `SANAD_VOICE_BRAIN` env var and parses the brain's log markers. Template: `voice/model_subprocess.py`.
-- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay audio, record playback) routes through DDS `AudioClient.PlayStream` via `audio_manager._play_pcm_via_g1`. The PyAudio path is kept as a desktop/dev fallback only.
-- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a verbatim port of `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle hold → playback → smooth return → disable SDK. Cancel breaks the play loop; `_return_home()` runs unconditionally afterwards for a jerk-free return.
-- **Camera frame transport (stdin push)**: the `CameraDaemon` lives in the parent and caches frames in memory. `GeminiSubprocess` runs a `_frame_forwarder` thread that base64-encodes the latest frame and writes `frame:<b64>\n` to the child's stdin (~2 fps). The child's `_stdin_watcher` thread decodes into `_LATEST_FRAME`; `_send_frame_loop` relays it to Gemini Live with a staleness guard. This is the Marcus pattern — chosen over a file drop so the parent owns the camera once and the dashboard preview reads the same in-memory cache.
-- **Motion-state channel**: `arm_controller._execute()` emits `motion.action_started` / `_done` / `_error` on the event bus. `main.py` forwards each to `live_sub.send_state()`, which writes `state:<json>\n` to the child's stdin. The child injects `[STATE-START] wave_hand`, `[STATE-DONE] wave_hand (2.3s)`, etc. into Gemini Live as silent text context (`send_realtime_input(text=…)`) so it can honestly answer "what are you doing?".
-- **Face recognition is Gemini-side**: no dlib/insightface/onnxruntime. `vision/face_gallery.py` is pure file IO over `data/faces/face_{id}/` (`face_N.jpg|png` samples + optional `meta.json` with a `name`). At session start (and on any gallery change) `gemini/script.py:_send_gallery_primer()` builds one multimodal `send_client_content` turn — every enrolled face's photos + a greeting instruction — and Gemini matches incoming frames against it in-context.
+- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py`
+  via the supervisor. If the voice loop crashes, the dashboard + arm + legs stay
+  up.
+- **Single DDS init**: `motion/arm_controller.py` owns the one
+  `ChannelFactoryInitialize`; `LocoController` and the audio routes reuse that
+  participant rather than re-initializing.
+- **Brain contract**: see `voice/model_script.py` — any new model implements
+  `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`.
+- **Supervisor contract**: each brain ships a sibling supervisor (e.g.
+  `gemini/subprocess.py`) that spawns `sanad_voice.py` with its
+  `SANAD_VOICE_BRAIN` and parses the brain's log markers. Template:
+  `voice/model_subprocess.py`.
+- **Locomotion safety**: `LocoController` is disarmed every boot, has velocity
+  caps + a `StopMove` watchdog, and is mutually exclusive with the arm.
+  Voice-driven movement is **off by default** and gated by the Controller
+  toggle. Distances/degrees in `data/motions/instruction.json` are
+  **approximate and must be calibrated on the real robot**; there is no
+  obstacle-handling or abort stack.
+- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a
+  physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay
+  audio, record playback) routes through DDS `AudioClient.PlayStream` via
+  `audio_manager._play_pcm_via_g1`. The PyAudio path is a desktop/dev fallback.
+- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a port of
+  `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle
+  hold → playback → smooth return → disable SDK. Body motors (0–14) lock to a
+  live snapshot while arm motors (15–28) follow the file at 60 Hz. `_return_home()`
+  runs unconditionally after a cancel for a jerk-free return.
+- **Camera frame transport (stdin push)**: the `CameraDaemon` lives in the
+  parent and caches frames in memory. `GeminiSubprocess` base64-encodes the
+  latest frame to the child's stdin (~2 fps); the child's `_stdin_watcher`
+  relays it to Gemini Live with a staleness guard. Chosen over a file drop so
+  the parent owns the camera once and the dashboard preview reads the same cache.
+- **Motion-state channel**: `arm_controller._execute()` emits
+  `motion.action_started` / `_done` / `_error` on the event bus. `main.py`
+  forwards each to the child as `state:<json>\n`, injected to Gemini Live as
+  silent `[STATE-START] wave_hand` / `[STATE-DONE] wave_hand (2.3s)` text so it
+  can honestly answer "what are you doing?".
+- **Recognition is Gemini-side**: no dlib/insightface/onnxruntime. Galleries are
+  pure file IO; `gemini/script.py:_send_gallery_primer()` builds one multimodal
+  `send_client_content` turn — every enrolled face/place's photos + a greeting
+  instruction — and Gemini matches incoming frames against it in-context.
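The stdin push used for camera frames and motion state is a newline-delimited protocol with `frame:<b64>` and `state:<json>` lines. A minimal encoder and parser might look like the sketch below; the function names and the event keys in the usage are hypothetical, and the real supervisor and child may frame things differently.

```python
import base64
import json
from typing import Optional, Tuple


def encode_frame(jpeg: bytes) -> str:
    """Build one `frame:` line from raw JPEG bytes."""
    return "frame:" + base64.b64encode(jpeg).decode("ascii") + "\n"


def encode_state(event: dict) -> str:
    """Build one `state:` line; json.dumps escapes any embedded newlines."""
    return "state:" + json.dumps(event, ensure_ascii=False) + "\n"


def parse_line(line: str) -> Optional[Tuple[str, object]]:
    """Return ("frame", bytes) or ("state", dict); None for anything unrecognised."""
    kind, sep, payload = line.rstrip("\n").partition(":")
    try:
        if sep and kind == "frame":
            return "frame", base64.b64decode(payload, validate=True)
        if sep and kind == "state":
            return "state", json.loads(payload)
    except ValueError:  # covers binascii.Error and json.JSONDecodeError
        return None
    return None
```

Returning `None` for malformed input lets the child's reader thread skip a bad line instead of crashing the voice loop.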


 ## Camera vision on Jetson
@@ -188,12 +338,13 @@ Match the `--branch` tag to the installed runtime (`dpkg -l | grep librealsense2
 If the build isn't worth it, `CameraDaemon` falls back to `cv2.VideoCapture(0)`
 automatically — fine for a plain USB webcam, but note a RealSense exposes its
 *depth* stream at `/dev/video0`, not RGB, so a real USB cam is the cleaner
-fallback. On x86_64 / Ubuntu 22.04+ desktops, `pip install pyrealsense2` just works.
+fallback (or pin `SANAD_CAMERA_USB_INDEX`). On x86_64 / Ubuntu 22.04+ desktops,
+`pip install pyrealsense2` just works.


 ## Dynamic paths

-Every path is derived at runtime — no hard-coded `/home/zedx/…` anywhere.
+Every path is derived at runtime — no hard-coded `/home/...` anywhere.
 Resolution order for `BASE_DIR` in `config.py`:

 1. `SANAD_PROJECT_ROOT` env var (if set).
@@ -217,17 +368,30 @@ rsync -av --delete \
 Then on the robot: `Ctrl+C` the running `main.py` and re-run.


+## Security
+
+The dashboard has **no authentication**. Anyone who can reach
+`http://<robot>:8000` gets full robot control — locomotion, arm, audio, file
+upload/delete — and, via the **Terminal tab**, an interactive shell as the
+dashboard's user. Bind it to a **trusted LAN only**; add auth before any wider
+exposure.
+
+
 ## Troubleshooting

 | Symptom | Fix |
 |---|---|
-| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias at `main.py:~50`. Restart. |
+| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias near the top of `main.py`. Restart. |
 | `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
 | `No module named 'Project'` in subprocess | Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
+| Controller moves rejected (409) | The Controller is **disarmed by default** — hit Arm first. Reads + E-STOP are always allowed. |
+| Arm action refused while "movement armed" | Arm ↔ locomotion are mutually exclusive. Disarm/stop locomotion, then trigger the arm. |
+| Voice-driven walking does nothing | The "Gemini Movement" toggle is off, or the E-STOP is latched. Turn the toggle on and clear the E-STOP. Distances are uncalibrated, so expect approximate motion. |
 | Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`. |
 | Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
-| Live Voice Commands transcript stuck | Deferred trigger was queued but `trigger_enabled` toggle was off. Toggle on — or the pending-trigger poll now fires it automatically once enabled. |
+| Live Voice Commands transcript stuck | Deferred trigger was queued but `trigger_enabled` toggle was off. Toggle on — or the pending-trigger poll fires it automatically once enabled. |
 | Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
+| Local brain exits immediately | `ollama serve` is not running, the model has not been pulled, or weights are missing under `model/`. Check `logs/local_subprocess.log`. The Gemini brain is the safe default. |
 | Recognition tab: "Camera could not start (no backend)" | No camera backend acquired. Check `rs-enumerate-devices` (RealSense at OS level) and `python3 -c 'import pyrealsense2'` in the `gemini_sdk` env. The glibc `ImportError` means the pip wheel is incompatible — see "Camera vision on Jetson" above. |
 | Camera badge stuck on "reconnecting…" | `CameraDaemon` lost the device and is retrying with exponential backoff. Re-seat the USB 3 cable; check `logs/camera.log` for the USB-2.0 warning. |
 | Gemini doesn't greet an enrolled face | Face Recognition toggle on? Vision on? (Face rec needs frames.) Check `logs/gemini_brain.log` for `face gallery primed: N person(s)`. Hit "Sync Gallery" to force a re-prime. |
@@ -241,6 +405,8 @@ Internal project for YS Lootah Technology. Reuses/ports patterns from:
 - `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
 - `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
 - `Project/Marcus` — camera→Gemini stdin-push transport, motion-state
-  injection, camera daemon resilience (auto-reconnect, USB-2.0 warning),
-  and the `API/camera_api.py` cache shape (`get_frame_b64` / `get_fresh_frame`).
-- Unitree `unitree_sdk2py` (G1 low-level SDK, LocoClient, G1ArmActionClient)
+  injection, camera daemon resilience (auto-reconnect, USB-2.0 warning), the
+  `API/camera_api.py` cache shape (`get_frame_b64` / `get_fresh_frame`), and the
+  confirmation-phrase → locomotion pattern (`movement_dispatch`).
+- Unitree `unitree_sdk2py` (G1 low-level SDK, `LocoClient`, `G1ArmActionClient`,
+  `AudioClient.PlayStream`).
--- a/config/core_config.json
+++ b/config/core_config.json
@@ -36,7 +36,7 @@

  "gemini_defaults": {
    "_comment": "Baseline Gemini API config — SINGLE SOURCE OF TRUTH. All voice modules read from here.",
-    "api_key": "AIzaSyDt9Xi83MDZuuPpfwfHyMD92X7ZKdGkqf8",
+    "api_key": "",
    "model_live": "gemini-2.5-flash-native-audio-preview-12-2025",
    "model_ws_uri": "wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent",
    "voice_name": "Charon",
--- a/data/motions/config.json
+++ b/data/motions/config.json
@@ -1,6 +1,6 @@
 {
  "gemini": {
-    "api_key": "AIzaSyDt9Xi83MDZuuPpfwfHyMD92X7ZKdGkqf8",
+    "api_key": "",
    "model": "models/gemini-2.5-flash-native-audio-preview-12-2025",
    "voice_name": "Charon"
  },