# Sanadv3

Voice + motion assistant for the Unitree G1 humanoid. **Gemini Live** (or a
fully-offline pipeline) handles bilingual Arabic/English conversation; an arm
controller plays built-in SDK poses and recorded JSONL macros; a locomotion
controller walks/turns the robot; an optional camera feeds **Gemini-side face &
place recognition**; everything is orchestrated through a fault-isolated
**FastAPI dashboard** on `http://<robot>:8000`.

```
┌──────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://<robot>:8000                            │
│  ├─ Operations         Quick-fire arm actions + gestural-speaking      │
│  ├─ Voice & Audio      Live Gemini, Typed Replay, Wake Phrases, Audio  │
│  ├─ Motion & Replay    SDK actions, JSONL replays, macros, teaching    │
│  ├─ Controller         Locomotion teleop, postures, FSM modes, E-STOP  │
│  ├─ Recognition        Camera vision + face gallery + zones/places     │
│  ├─ Recordings         Skill registry, saved Gemini turns              │
│  ├─ Temperature        Live 3D motor-temperature heatmap (three.js)    │
│  ├─ Terminal           In-browser shell (PTY) to the robot             │
│  └─ Settings & Logs    System info, tail/stream live logs              │
└──────────────────────────────────────────────────────────────────────┘
        │
        ├─ voice/sanad_voice.py      (subprocess — model-agnostic voice loop)
        │    ├─ gemini/script.py     (Gemini Live brain — audio+video+state)
        │    └─ local/script.py      (offline brain — VAD→STT→LLM→TTS)
        ├─ gemini/client.py          (short-session client for Typed Replay)
        ├─ gemini/subprocess.py      (spawns+supervises sanad_voice.py;
        │                             pushes camera frames + motion state
        │                             to the child over its stdin)
        ├─ voice/movement_dispatch.py(Gemini spoken phrase → locomotion)
        ├─ vision/camera.py          (RealSense/USB capture daemon)
        ├─ vision/face_gallery.py    (data/faces/ CRUD for the primer turn)
        ├─ vision/zone_gallery.py    (data/zones/ places + "go here" targets)
        ├─ motion/arm_controller.py  (G1 arm DDS publisher — owns DDS init)
        ├─ G1_Controller/loco_controller.py (G1 locomotion via LocoClient)
        ├─ voice/audio_io.py         (mic + speaker abstraction — 3 profiles)
        └─ core/brain.py             (skill dispatcher, event bus)
```

## Camera + face/place recognition data flow

```
CameraDaemon (parent, in-memory JPEG+b64 cache)
  ├─→ dashboard /api/recognition/frame.jpg   ── snapshot_jpeg()
  └─→ GeminiSubprocess._frame_forwarder      ── get_frame_b64()
                                                 │ "frame:<b64>\n" over stdin
ArmController ─emit→ event bus ─→ main.py ─→ live_sub.send_state()
                                                 │ "state:<json>\n" over stdin
                                                 ▼
                          gemini/script.py  _stdin_watcher thread
                            ├─ frame: → _LATEST_FRAME → _send_frame_loop →
                            │             session.send_realtime_input(video=Blob)
                            └─ state: → _STATE_PENDING → _send_state_loop →
                                          session.send_realtime_input(text=…)

Recognition toggles (vision / face-rec / zone-rec / movement) are written by the
dashboard to data/.recognition_state.json and POLLED by the Gemini child at 1 Hz
— so flipping a toggle takes effect mid-session with NO restart.
```
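
The two stdin message types in the diagram above can be told apart with a small line parser. The following is a minimal sketch, not the code in `gemini/script.py`: `parse_stdin_line` and its return shape are hypothetical, and it assumes only what the diagram states, namely that each message is one newline-terminated line prefixed with `frame:` or `state:`.

```python
import base64
import json
from typing import Optional, Tuple, Union

Message = Tuple[str, Union[bytes, dict]]


def parse_stdin_line(line: str) -> Optional[Message]:
    """Classify one line of the parent-to-child stdin protocol.

    Returns ("frame", jpeg_bytes), ("state", payload_dict), or None for
    anything malformed or unrecognised (hypothetical behaviour).
    """
    kind, sep, body = line.rstrip("\n").partition(":")
    if not sep:
        return None
    try:
        if kind == "frame":
            # validate=True rejects non-base64 characters instead of skipping them
            return "frame", base64.b64decode(body, validate=True)
        if kind == "state":
            payload = json.loads(body)
            return ("state", payload) if isinstance(payload, dict) else None
    except ValueError:
        # binascii.Error and json.JSONDecodeError both subclass ValueError
        return None
    return None
```

Splitting on the first colon only keeps the JSON body intact even though it contains colons of its own. Returning `None` for bad input lets a watcher thread drop a corrupt line and keep reading.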


## Quick start (on the robot)

```bash
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
```

Then open `http://<robot-ip>:8000` in a browser. (The dashboard binds to the
`wlan0` IP by default — see *Runtime selection* to override.)

Fully-offline brain (no cloud): `SANAD_VOICE_BRAIN=local python3 main.py`
(requires `ollama serve` + the local model env — see *Voice brains*).

> **Gemini API key — required, none ships with the repo.** The `api_key`
> fields in `config/core_config.json` (`gemini_defaults`) and
> `data/motions/config.json` (`gemini`) are intentionally empty (`""`).
> The voice loop cannot connect until you supply one, by any of:
> - **Dashboard** → *Voice & Audio → Gemini API Key* — paste + save, hot-swaps live (no restart). Persists to `data/motions/config.json`.
> - **Env var** — `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`.
> - **Config file** — set `gemini_defaults.api_key` in `config/core_config.json`.
>
> Precedence (highest first): `data/motions/config.json` → `SANAD_GEMINI_API_KEY` → `config/core_config.json`. Get a key at <https://aistudio.google.com/apikey>.
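
The precedence rule above amounts to "first non-empty source wins". A minimal sketch of that lookup follows; `resolve_gemini_api_key` and `_read_key` are hypothetical helpers written for illustration, not functions from `config.py`, and the JSON section names are the ones quoted in the note above.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def _read_key(path: Path, section: str) -> str:
    """Return `<section>.api_key` from a JSON file, or "" if unavailable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return ""
    section_data = data.get(section) if isinstance(data, dict) else None
    if not isinstance(section_data, dict):
        return ""
    return str(section_data.get("api_key") or "").strip()


def resolve_gemini_api_key(root: Path, env: Optional[Mapping[str, str]] = None) -> str:
    """First non-empty key wins, in the documented precedence order."""
    env = os.environ if env is None else env
    candidates = (
        _read_key(root / "data" / "motions" / "config.json", "gemini"),
        env.get("SANAD_GEMINI_API_KEY", "").strip(),
        _read_key(root / "config" / "core_config.json", "gemini_defaults"),
    )
    return next((key for key in candidates if key), "")
```

An empty string in a higher-priority source is treated as "not set", which is why the shipped empty `api_key` fields do not mask the environment variable.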


## Dashboard features

### Operations
Quick-fire SDK + JSONL arm actions (chip buttons), gestural-speaking toggle.

### Voice & Audio
- **Live Voice Commands** — fire arm gestures from the *user's* transcript
  (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
- **Live Gemini Process** — start/stop the voice conversation subprocess, tail
  its log. Choose the Gemini cloud brain or the offline brain via
  `SANAD_VOICE_BRAIN`.
- **Typed Replay** — Gemini reads typed text aloud (wrapped with a
  "repeat verbatim" prompt); optionally records the clip.
- **Gemini API Key** — hot-swap the key without restart.
- **Wake Phrase Manager** — add/remove phrase → action bindings.
- **Audio Controls** — mic/speaker mute, G1 chest-speaker volume (DDS), device
  profile selection, PulseAudio soft-reset and Anker USB hard-reset.

### Motion & Replay
- **Motion Control** — list SDK (built-in) + JSONL (recorded) actions, select +
  play. Cancelling a motion returns the arm smoothly to home (`arm_home.jsonl`).
- **Replay Manager** — upload `.jsonl` files, test-play with speed, Teaching
  Mode (kinesthetic record — limp the arm and hand-guide it).
- **Macro Recorder** — record a new audio+motion pair, OR pick any WAV + any
  motion (SDK or JSONL) and play them in parallel.

### Controller  *(locomotion)*
Manual teleoperation of the G1's **legs** via the Unitree `LocoClient`.
**Disarmed on every boot**; motion commands are rejected until you press **Arm**.
- **Move / Step** — continuous teleop (vx/vy/vyaw) or discrete one-shot steps.
- **Postures & FSM modes** — zero-torque, damp, squat, sit, stand, balance,
  stand-height; prep/ready sequences; MotionSwitcher select-AI/release.
- **Gemini Movement** — toggle voice-driven walking: the `MovementDispatcher`
  parses Gemini's *own spoken confirmation phrases* ("Turning right." /
  "أستدير يميناً.") and drives the legs (gated on this toggle + an E-STOP latch).
- **E-STOP** — always available; `StopMove` + disarm + latch the dispatcher.
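
The Gemini Movement gating described above can be sketched as a normalized phrase lookup guarded by two flags. This is an illustration under stated assumptions, not the `MovementDispatcher` itself: the `PHRASES` table, the `(command, degrees)` tuple shape, the sign convention, and the names `normalize`, `MovementGate`, and `dispatch` are all hypothetical. The real phrase map lives in `data/motions/instruction.json`, whose schema is not shown here.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical phrase map; values are (command, degrees) with an assumed sign.
PHRASES: Dict[str, Tuple[str, float]] = {
    "Turning right": ("turn", -90.0),
    "Turning left": ("turn", 90.0),
    "أستدير يميناً": ("turn", -90.0),
}

_DIACRITICS = re.compile("[\u064B-\u065F\u0640]")  # Arabic tashkeel + tatweel
_ALEF_FORMS = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا"})


def normalize(text: str) -> str:
    """Case-fold, strip Arabic diacritics, unify alef forms, drop punctuation."""
    text = _DIACRITICS.sub("", text.casefold()).translate(_ALEF_FORMS)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


_INDEX = {normalize(phrase): command for phrase, command in PHRASES.items()}


@dataclass
class MovementGate:
    enabled: bool = False        # the "Gemini Movement" toggle, off by default
    estop_latched: bool = False  # set by E-STOP until the operator clears it


def dispatch(bot_text: str, gate: MovementGate) -> Optional[Tuple[str, float]]:
    """Return the command to execute, or None when gated or unmatched."""
    if not gate.enabled or gate.estop_latched:
        return None
    return _INDEX.get(normalize(bot_text))
```

Checking the gate before matching means a latched E-STOP suppresses movement regardless of what the model says. Stripping diacritics before the punctuation pass matters because Arabic combining marks are not word characters and would otherwise split words.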

> **Safety:** the arm and locomotion are **mutually exclusive** —
> `arm.set_motion_block(loco.movement_active)` makes every arm
> replay/gesture refuse while the robot is (or just was, within ~1.5 s) walking.
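
One way to picture the arm/locomotion interlock is a predicate that stays true while walking and for a short cooldown afterwards. The sketch below is hypothetical: `MovementTracker` and `ArmGuard` are stand-ins invented for illustration, and only `set_motion_block`, `movement_active`, and the ~1.5 s window come from the note above.

```python
import time
from typing import Callable, Optional


class MovementTracker:
    """Reports 'active' while walking and for `cooldown_s` afterwards."""

    def __init__(self, cooldown_s: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._moving = False
        self._last_motion: Optional[float] = None

    def set_moving(self, moving: bool) -> None:
        # Stamp the time whenever motion starts, continues, or has just ended.
        if self._moving or moving:
            self._last_motion = self._clock()
        self._moving = moving

    def movement_active(self) -> bool:
        if self._moving:
            return True
        if self._last_motion is None:
            return False
        return (self._clock() - self._last_motion) < self._cooldown_s


class ArmGuard:
    """Refuses arm actions while the injected predicate returns True."""

    def __init__(self) -> None:
        self._is_blocked: Callable[[], bool] = lambda: False

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        self._is_blocked = predicate

    def play(self, action: str) -> bool:
        if self._is_blocked():
            return False  # refused: the robot is walking or just stopped
        return True       # a real controller would start the replay here
```

Passing the predicate rather than a boolean means the arm re-evaluates the locomotion state at the moment each action is requested.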

### Recognition
Camera vision + Gemini-side **face** and **zone/place** recognition. All are
**off by default**; each is a **hot toggle** (≈1 s to take effect, no restart).
- **Camera Vision** — `CameraDaemon` captures from a RealSense (preferred) or
  USB camera; the supervisor streams JPEG frames to Gemini Live so it can answer
  "what do you see?". Live preview panel. Auto-reconnects on USB unplug/stall
  and warns if a RealSense negotiated USB 2.0 (resilience logic ported from
  `Project/Marcus`).
- **Face Recognition** — manage `data/faces/face_{id}/` galleries: enroll from
  the live camera or upload photos, rename, describe, download (per-photo or
  ZIP), delete. On session start (and on any gallery change) the child sends a
  **primer turn** carrying every enrolled face + a Khaleeji greeting
  instruction — **Gemini matches in-context, so there is no local
  face-recognition model**. Face recognition requires Camera Vision to be on.
- **Zones & Places** — `data/zones/zone_{zid}/place_{pid}/` two-level gallery:
  reference photos per place, optional linked face_ids, and a **"go here"** nav
  target (`nav_target_zone/place_id` in the recognition-state file) for
  place-aware navigation.
- **Sync Gallery** — force-resend the face/zone primer to the live session.

### Recordings
Skill Registry (predefined audio+motion+callback skills from `skills.json`) +
Saved Records (captured Gemini turn recordings; play/pause/stop/rename/delete).

### Temperature
Live **3D motor-temperature heatmap** — a standalone three.js viewer
(`dashboard/static/temp3d/`) loads the G1 29-DoF URDF + STL meshes and colors
each joint blue→red from the arm controller's throttled `rt/lowstate` snapshot,
streamed over `/ws/motor-temps` at ~8 fps. No second DDS subscriber.

### Terminal
In-browser **PTY shell** to the robot (`/ws/terminal`, xterm.js) — a `bash -i`
as the dashboard's user, with resize + backpressure, bounded to 4 sessions.
(See *Security* — this is full shell access to whoever reaches the URL.)

### Settings & Logs
System info (host, network interfaces, DDS interface, bound dashboard host/port,
per-subsystem status, audio devices), live log stream (`/ws/logs`), per-file
tail, snapshot, and a one-blob "Copy All Logs" bundle.


## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — fault-isolated boot of all subsystems + the dashboard. Doubles as the service container (route handlers `import` its module globals). |
| `config.py` | Runtime constants + layout-agnostic path resolution; layers `data/motions/config.json` over the JSON config at import. |
| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `local`, `motion`, `dashboard`. |
| `core/` | `brain.py` (skill dispatcher), `event_bus.py`, `skill_registry.py`, `config_loader.py`, `logger.py` (rotating + WS push), `asyncio_compat.py` (`asyncio.to_thread` shim for Python 3.8). |
| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain: audio + video + motion-state), `subprocess.py` (supervisor + stdin frame/state push). |
| `local/` | Fully-offline brain — `vad.py` (Silero), `stt.py` (faster-whisper), `llm.py` (Qwen via Ollama/llama.cpp), `tts.py` (CosyVoice2), `script.py` (the brain), `subprocess.py` (supervisor). Opt-in via `SANAD_VOICE_BRAIN=local`. |
| `voice/` | `sanad_voice.py` (subprocess entry, model-agnostic), `audio_io.py` / `audio_manager.py` / `audio_devices.py` (mic/speaker), `local_tts.py` (SpeechT5 Arabic TTS), `live_voice_loop.py` (user-transcript → arm gesture), `movement_dispatch.py` (Gemini-phrase → locomotion), `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py` (Arabic normalization + phrase matching), `model_script.py` / `model_subprocess.py` (brain templates). |
| `motion/` | `arm_controller.py` (production 5-phase JSONL replay engine, owns the single DDS init), `macro_player.py`, `macro_recorder.py`, `teaching.py`. (`sanad_arm_controller.py` is a legacy alternate — not wired by `main.py`.) |
| `G1_Controller/` | `loco_controller.py` — locomotion via Unitree `LocoClient` (move/step/postures/FSM/E-STOP); reuses the arm's DDS participant. |
| `vision/` | `camera.py` (RealSense/USB daemon, auto-reconnect), `face_gallery.py`, `zone_gallery.py`, `recognition_state.py` (atomic-JSON toggle IPC). |
| `dashboard/` | `app.py` (FastAPI factory + fault-isolated router registration), `routes/*.py` (20 REST routers), `websockets/*.py` (logs, motor-temps, terminal), `static/index.html` (single-page UI), `static/temp3d/` (3D viewer). |
| `scripts/` | Persona files — `sanad_script.txt` (voice persona "Bousandah"), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
| `data/` | Runtime state — `motions/*.jsonl` (arm trajectories) + `instruction.json` (locomotion phrase map) + `skills.json` + `config.json` (dashboard-editable), `recordings/` (captured turns + macros), `faces/face_{id}/` + `zones/zone_{zid}/place_{pid}/` (galleries), `audio/` (typed-replay WAVs + records index), `.recognition_state.json` (toggle IPC). |
| `model/` | Local SpeechT5 / Whisper / CosyVoice2 weights when using the offline pipeline. |
| `logs/` | Per-module rotating logs. |


## Voice brains

The child `voice/sanad_voice.py` is model-agnostic and selects a brain via
`SANAD_VOICE_BRAIN`. Every brain implements the same contract
(`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`)
and ships a sibling supervisor that spawns the child and parses its
`USER:` / `BOT:` / state log markers.

| Value | Brain | Pipeline |
|---|---|---|
| `gemini` *(default)* | `gemini/script.py` | Gemini Live native-audio (full-duplex speech-to-speech, server-side VAD, vision frames, face/zone primers, voice→movement). Cloud. |
| `local` | `local/script.py` | Silero VAD → faster-whisper (large-v3-turbo, CUDA int8) → Qwen2.5 (Ollama/llama.cpp) → CosyVoice2 streaming TTS. Fully on-device. |
| `model` | `voice/model_script.py` | Template/stub for adding a new provider (OpenAI Realtime, Claude Voice, …). |

To add a brain: drop a file in `voice/` or a new `<brand>/` folder and add a
branch to `voice/sanad_voice.py:_build_brain()`; ship a supervisor modeled on
`voice/model_subprocess.py`.
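
As a rough illustration of the contract, here is a toy brain that satisfies the three documented members. `EchoBrain`, its `ticks` counter, and `build_brain` are hypothetical; the real template is `voice/model_script.py` and the real selector is `_build_brain()`, neither of which is reproduced here.

```python
import asyncio
from typing import Any


class EchoBrain:
    """Toy brain with the documented constructor, run() and stop()."""

    def __init__(self, audio_io: Any, recorder: Any, voice: str,
                 system_prompt: str) -> None:
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self.ticks = 0
        self._running = False

    async def run(self) -> None:
        """Loop until stop() is called; a real brain would stream audio here."""
        self._running = True
        while self._running:
            self.ticks += 1
            await asyncio.sleep(0.01)

    def stop(self) -> None:
        """Ask run() to return at its next iteration."""
        self._running = False


def build_brain(name: str, **kwargs: Any) -> EchoBrain:
    """Hypothetical stand-in for the branch a new brain adds to the selector."""
    if name == "echo":
        return EchoBrain(**kwargs)
    raise ValueError(f"unknown brain: {name!r}")
```

A plain flag is used instead of `asyncio.Event` so the object can be constructed before any event loop exists, which is relevant on the older Python versions this project supports.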


## Runtime selection (env vars)

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Mic + speaker pair. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms (arm + locomotion + speaker). |
| `SANAD_DASHBOARD_HOST` / `_INTERFACE` | IP / iface | `wlan0` IP | Dashboard bind address. |
| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. See [Quick start](#quick-start-on-the-robot). |
| `SANAD_GEMINI_MODEL` / `_VOICE` | string | reads config | Override the Gemini model id / prebuilt voice. |
| `SANAD_G1_VOLUME` | `0`–`100` | `100` | G1 chest-speaker volume; also scales the barge-in threshold. |
| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
| `SANAD_VISION_ENABLE` | `0` or `1` | `0` | Boot default for camera vision. **Runtime truth is the Recognition-tab toggle** → `data/.recognition_state.json`, hot-applied without a restart. |
| `SANAD_FACE_RECOGNITION_ENABLE` | `0` or `1` | `0` | Boot default for Gemini-side face recognition. Also a hot toggle. |
| `SANAD_VISION_SEND_HZ` | float | `2` | Frames/sec the Gemini child relays to Live. |
| `SANAD_CAMERA_WIDTH` / `_HEIGHT` / `_FPS` | int | `424` / `240` / `15` | Capture profile. Also settable per-deploy in `config/core_config.json > camera`. |
| `SANAD_CAMERA_USB_INDEX` | int | auto | Pin a `/dev/videoN` node (avoids picking a RealSense IR stream). |
| `SANAD_FACES_MAX_SAMPLES` | int | `3` | Max photos per person fed into the gallery primer turn (token budget). |
| `SANAD_PROJECT_ROOT` | path | auto | Override the project root (see *Dynamic paths*). |

> All `SANAD_VISION_*` / `SANAD_CAMERA_*` / `SANAD_FACE_*` vars are **boot
> defaults** forwarded to the Gemini child via `LIVE_TUNE`. Once running, the
> Recognition tab's toggles (vision / face-rec / zone-rec / movement) are the
> live source of truth in `data/.recognition_state.json`, polled at 1 Hz.
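
The toggle file works as a tiny IPC channel: one side writes it atomically, the other re-reads it about once per second. The sketch below shows that pattern in general terms; it is not `vision/recognition_state.py`. The key names in `DEFAULTS` and the helper names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, Set

# Hypothetical keys; the real file's schema is not shown in this README.
DEFAULTS: Dict[str, bool] = {"vision": False, "face_recognition": False}


def write_state(path: Path, state: dict) -> None:
    """Write JSON to a temp file beside `path`, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent),
                                    prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, str(path))  # atomic within one POSIX filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_state(path: Path) -> dict:
    """Return defaults overlaid with the file's contents; never raises."""
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        loaded = {}
    merged = dict(DEFAULTS)
    if isinstance(loaded, dict):
        merged.update(loaded)
    return merged


def changed_keys(previous: dict, current: dict) -> Set[str]:
    """Keys whose values differ between two polls."""
    return {key for key in previous.keys() | current.keys()
            if previous.get(key) != current.get(key)}
```

Because the rename is atomic, a reader polling at 1 Hz sees either the old file or the new one, never a half-written document. A poller would call `read_state` each second and act only on `changed_keys`.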

CLI flags: `python3 main.py --host <ip> --port 8000 --network <dds_iface>`;
`--check-env` prints a subsystem/environment diagnostic and exits.


## API surface

All routes are registered defensively — a router whose import fails is recorded
(`GET /api/_dashboard_status`) and the server still boots without it.
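
Defensive registration of this kind usually reduces to importing each router module inside its own `try` block and recording the outcome. A minimal, framework-free sketch follows; `load_routers` and the status dictionary shape are hypothetical, and it assumes each route module exposes a module-level `router` attribute.

```python
import importlib
from typing import Callable, Dict, Iterable, List


def load_routers(module_names: Iterable[str],
                 register: Callable[[object], None]) -> Dict[str, List[str]]:
    """Import each module and register its `router`, isolating failures."""
    status: Dict[str, List[str]] = {"loaded": [], "failed": []}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            register(module.router)
        except Exception as exc:  # one broken router must not stop the boot
            status["failed"].append(f"{name}: {type(exc).__name__}: {exc}")
        else:
            status["loaded"].append(name)
    return status
```

With FastAPI, `register` would typically be `app.include_router`, and the returned dictionary is the kind of record a status endpoint could serve.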

**REST** (prefix → controls): `/api` health · `/api/system` info ·
`/api/voice` Gemini/local generate+connect+key · `/api/motion` arm actions ·
`/api/skills` skill registry · `/api/macros` record/play · `/api/replay` JSONL
CRUD + teaching · `/api/audio` mute/volume/devices/reset · `/api/scripts`
persona files · `/api/records` saved WAVs · `/api/prompt` system prompt ·
`/api/wake-phrases` bindings · `/api/live-voice` arm-phrase dispatcher ·
`/api/live-subprocess` Gemini child · `/api/typed-replay` TTS · `/api/recognition`
vision + face gallery · `/api/zones` zones/places + nav target · `/api/temp`
motor map + snapshot · `/api/controller` locomotion (move/step/postures/modes/
E-STOP).

**WebSockets**: `/ws/logs` (live log stream + 500-line replay) ·
`/ws/motor-temps` (3D heatmap data, ~8 fps) · `/ws/terminal` (PTY shell).


## Architecture notes

- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py`
  via the supervisor. If the voice loop crashes, the dashboard + arm + legs stay
  up.
- **Single DDS init**: `motion/arm_controller.py` owns the one
  `ChannelFactoryInitialize`; `LocoController` and the audio routes reuse that
  participant rather than re-initializing.
- **Brain contract**: see `voice/model_script.py` — any new model implements
  `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`.
- **Supervisor contract**: each brain ships a sibling supervisor (e.g.
  `gemini/subprocess.py`) that spawns `sanad_voice.py` with its
  `SANAD_VOICE_BRAIN` and parses the brain's log markers. Template:
  `voice/model_subprocess.py`.
- **Locomotion safety**: `LocoController` is disarmed every boot, has velocity
  caps + a `StopMove` watchdog, and is mutually exclusive with the arm.
  Voice-driven movement is **off by default** and gated by the Controller
  toggle. Distances/degrees in `data/motions/instruction.json` are
  **approximate and must be calibrated on the real robot**; the robot has no
  obstacle-detection or automatic-abort stack.
- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a
  physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay
  audio, record playback) routes through DDS `AudioClient.PlayStream` via
  `audio_manager._play_pcm_via_g1`. The PyAudio path is a desktop/dev fallback.
- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a port of
  `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle
  hold → playback → smooth return → disable SDK. Body motors (0–14) lock to a
  live snapshot while arm motors (15–28) follow the file at 60 Hz. `_return_home()`
  runs unconditionally after a cancel for a jerk-free return.
- **Camera frame transport (stdin push)**: the `CameraDaemon` lives in the
  parent and caches frames in memory. `GeminiSubprocess` writes the latest
  frame, base64-encoded, to the child's stdin (~2 fps); the child's `_stdin_watcher`
  relays it to Gemini Live with a staleness guard. Chosen over a file drop so
  the parent owns the camera once and the dashboard preview reads the same cache.
- **Motion-state channel**: `arm_controller._execute()` emits
  `motion.action_started` / `_done` / `_error` on the event bus. `main.py`
  forwards each to the child as `state:<json>\n`, which the child injects into
  Gemini Live as silent `[STATE-START] wave_hand` / `[STATE-DONE] wave_hand (2.3s)`
  text so the model can truthfully answer "what are you doing?".
- **Recognition is Gemini-side**: no dlib/insightface/onnxruntime. Galleries are
  pure file IO; `gemini/script.py:_send_gallery_primer()` builds one multimodal
  `send_client_content` turn — every enrolled face/place's photos + a greeting
  instruction — and Gemini matches incoming frames against it in-context.
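
The motion-state channel can be pictured as two small functions: one that serializes an event into a `state:` line, and one that renders that line as the text marker. This is a sketch under assumptions. The JSON field names (`event`, `action`, `duration_s`), the `STATE-ERROR` label, and both function names are hypothetical; only the event names and the `[STATE-START]` / `[STATE-DONE]` formats come from the notes above.

```python
import json
from typing import Optional

_MARKERS = {
    "motion.action_started": "STATE-START",
    "motion.action_done": "STATE-DONE",
    "motion.action_error": "STATE-ERROR",  # assumed label, not confirmed here
}


def encode_state_line(event: str, action: str,
                      duration_s: Optional[float] = None) -> str:
    """Parent side: build one newline-terminated `state:<json>` line."""
    payload = {"event": event, "action": action}
    if duration_s is not None:
        payload["duration_s"] = round(duration_s, 1)
    return "state:" + json.dumps(payload, ensure_ascii=False) + "\n"


def state_text(line: str) -> Optional[str]:
    """Child side: render a state line as marker text, or None if invalid."""
    if not line.startswith("state:"):
        return None
    try:
        payload = json.loads(line[len("state:"):])
        text = f"[{_MARKERS[payload['event']]}] {payload['action']}"
        if "duration_s" in payload:
            text += f" ({payload['duration_s']:.1f}s)"
    except (ValueError, KeyError, TypeError):
        return None
    return text
```

Keeping the wire format as JSON and the rendering on the child side lets the marker wording change without touching the parent.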


## Camera vision on Jetson

The Recognition tab needs `pyrealsense2` to talk to the Intel RealSense.
**Do not `pip install pyrealsense2` on JetPack 5** — the PyPI wheel is built
against glibc 2.32+ (Ubuntu 22.04) and fails to load on JetPack 5's glibc
2.31 with `ImportError: ... version 'GLIBC_2.32' not found`.

The native runtime is already there (`apt`-installed `librealsense2`). Build
just the Python binding from source against it, into the `gemini_sdk` env:

```bash
rs-enumerate-devices            # confirm the D435i shows up at OS level first

source ~/miniconda3/etc/profile.d/conda.sh && conda activate gemini_sdk
pip uninstall -y pyrealsense2   # remove the broken wheel if present
sudo apt install -y cmake build-essential git python3-dev libusb-1.0-0-dev pkg-config libssl-dev

cd /tmp && rm -rf librealsense
git clone --depth=1 --branch v2.56.5 https://github.com/IntelRealSense/librealsense.git
cd librealsense && mkdir -p build && cd build
cmake .. -DBUILD_PYTHON_BINDINGS=ON -DPYTHON_EXECUTABLE=$(which python3) \
         -DBUILD_EXAMPLES=OFF -DBUILD_GRAPHICAL_EXAMPLES=OFF \
         -DBUILD_UNIT_TESTS=OFF -DCHECK_FOR_UPDATES=OFF -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) pyrealsense2
SITE=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
mkdir -p "$SITE/pyrealsense2"
cp wrappers/python/pyrealsense2*.so "$SITE/pyrealsense2/"
cp ../wrappers/python/pyrealsense2/__init__.py "$SITE/pyrealsense2/" 2>/dev/null || true

python3 -c 'import pyrealsense2 as rs; print([d.get_info(rs.camera_info.name) for d in rs.context().query_devices()])'
```

Match the `--branch` tag to the installed runtime (`dpkg -l | grep librealsense2`).
If the build isn't worth it, `CameraDaemon` falls back to `cv2.VideoCapture(0)`
automatically — fine for a plain USB webcam, but note a RealSense exposes its
*depth* stream at `/dev/video0`, not RGB, so a real USB cam is the cleaner
fallback (or pin `SANAD_CAMERA_USB_INDEX`). On x86_64 / Ubuntu 22.04+ desktops,
`pip install pyrealsense2` just works.


## Dynamic paths

Every path is derived at runtime — no hard-coded `/home/...` anywhere.
Resolution order for `BASE_DIR` in `config.py`:

1. `SANAD_PROJECT_ROOT` env var (if set).
2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
3. `Path(__file__).resolve().parent` — auto-detected.

The project runs unchanged from either layout:
- dev: `<anywhere>/Project/Sanad/`
- deployed: `/home/unitree/Sanad/`
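
The three-step resolution order can be sketched as follows. This is an illustration rather than the code in `config.py`: `resolve_base_dir` and `parse_dotenv` are hypothetical names, and reading "`PROJECT_BASE + PROJECT_NAME`" as a path join is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_dotenv(path: Path) -> Dict[str, str]:
    """Minimal KEY=VALUE parser; skips blanks, comments and malformed lines."""
    values: Dict[str, str] = {}
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return values
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_dir(here: Path,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Apply the documented order: env var, then .env, then auto-detect."""
    env = os.environ if env is None else env
    # 1. Explicit override.
    root = env.get("SANAD_PROJECT_ROOT", "").strip()
    if root:
        return Path(root)
    # 2. A .env file in the project directory or its parent.
    for candidate in (here / ".env", here.parent / ".env"):
        values = parse_dotenv(candidate)
        if values.get("PROJECT_BASE") and values.get("PROJECT_NAME"):
            return Path(values["PROJECT_BASE"]) / values["PROJECT_NAME"]
    # 3. Fall back to the directory that contains the config module.
    return here
```

In a config module, `here` would be `Path(__file__).resolve().parent`, which is what makes the same tree work from both layouts listed above.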


## Deployment (workstation → robot)

```bash
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
  /path/to/Sanad/ \
  unitree@192.168.123.164:/home/unitree/Sanad/
```

Then, on the robot, stop the running `main.py` with `Ctrl+C` and start it again.


## Security

The dashboard has **no authentication**. Anyone who can reach
`http://<robot>:8000` gets full robot control — locomotion, arm, audio, file
upload/delete — and, via the **Terminal tab**, an interactive shell as the
dashboard's user. Bind it to a **trusted LAN only**; add auth before any wider
exposure.


## Troubleshooting

| Symptom | Fix |
|---|---|
| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias near the top of `main.py`. Restart. |
| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
| `No module named 'Project'` in subprocess | Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
| Controller moves rejected (409) | The Controller is **disarmed by default** — hit Arm first. Reads + E-STOP are always allowed. |
| Arm action refused while "movement armed" | Arm ↔ locomotion are mutually exclusive. Disarm/stop locomotion, then trigger the arm. |
| Voice-driven walking does nothing | "Gemini Movement" toggle off, or E-STOP latched. Toggle on; clear E-STOP. Distances are uncalibrated. |
| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`. |
| Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | A deferred trigger was queued while the `trigger_enabled` toggle was off. Turn the toggle on; the pending-trigger poll then fires the queued trigger automatically. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
| Local brain exits immediately | `ollama serve` not running / model not pulled, or weights missing under `model/`. Check `logs/local_subprocess.log`. The Gemini brain is the safe default. |
| Recognition tab: "Camera could not start (no backend)" | No camera backend acquired. Check `rs-enumerate-devices` (RealSense at OS level) and `python3 -c 'import pyrealsense2'` in the `gemini_sdk` env. The glibc `ImportError` means the pip wheel is incompatible — see "Camera vision on Jetson" above. |
| Camera badge stuck on "reconnecting…" | `CameraDaemon` lost the device and is retrying with exponential backoff. Re-seat the USB 3 cable; check `logs/camera.log` for the USB-2.0 warning. |
| Gemini doesn't greet an enrolled face | Face Recognition toggle on? Vision on? (Face rec needs frames.) Check `logs/gemini_brain.log` for `face gallery primed: N person(s)`. Hit "Sync Gallery" to force a re-prime. |
| Gemini unaware of motion state | The `motion.action_*` → `send_state` chain only runs when Live Gemini is up. Check `logs/gemini_subprocess.log` and `logs/gemini_brain.log` for `STATE injected:` lines. |


## License / attribution

Internal project for YS Lootah Technology. Reuses/ports patterns from:
- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
- `Project/Marcus` — camera→Gemini stdin-push transport, motion-state
  injection, camera daemon resilience (auto-reconnect, USB-2.0 warning), the
  `API/camera_api.py` cache shape (`get_frame_b64` / `get_fresh_frame`), and the
  confirmation-phrase → locomotion pattern (`movement_dispatch`).
- Unitree `unitree_sdk2py` (G1 low-level SDK, `LocoClient`, `G1ArmActionClient`,
  `AudioClient.PlayStream`).