Sanadv3/README.md
2026-07-04 19:37:27 +00:00

# Sanadv3
Voice + motion assistant for the Unitree G1 humanoid. **Gemini Live** (or a
fully-offline pipeline) handles bilingual Arabic/English conversation; an arm
controller plays built-in SDK poses and recorded JSONL macros; a locomotion
controller walks/turns the robot; an optional camera feeds **Gemini-side face &
place recognition**; everything is orchestrated through a fault-isolated
**FastAPI dashboard** on `http://<robot>:8000`.
```
┌──────────────────────────────────────────────────────────────────────┐
│ Dashboard (FastAPI) ── http://<robot>:8000 │
│ ├─ Operations Quick-fire arm actions + gestural-speaking │
│ ├─ Voice & Audio Live Gemini, Typed Replay, Wake Phrases, Audio │
│ ├─ Motion & Replay SDK actions, JSONL replays, macros, teaching │
│ ├─ Controller Locomotion teleop, postures, FSM modes, E-STOP │
│ ├─ Recognition Camera vision + face gallery + zones/places │
│ ├─ Recordings Skill registry, saved Gemini turns │
│ ├─ Temperature Live 3D motor-temperature heatmap (three.js) │
│ ├─ Terminal In-browser shell (PTY) to the robot │
│ └─ Settings & Logs System info, tail/stream live logs │
└──────────────────────────────────────────────────────────────────────┘
├─ voice/sanad_voice.py (subprocess — model-agnostic voice loop)
│ ├─ gemini/script.py (Gemini Live brain — audio+video+state)
│ └─ local/script.py (offline brain — VAD→STT→LLM→TTS)
├─ gemini/client.py (short-session client for Typed Replay)
├─ gemini/subprocess.py (spawns+supervises sanad_voice.py;
│ pushes camera frames + motion state
│ to the child over its stdin)
├─ voice/movement_dispatch.py (Gemini spoken phrase → locomotion)
├─ vision/camera.py (RealSense/USB capture daemon)
├─ vision/face_gallery.py (data/faces/ CRUD for the primer turn)
├─ vision/zone_gallery.py (data/zones/ places + "go here" targets)
├─ motion/arm_controller.py (G1 arm DDS publisher — owns DDS init)
├─ G1_Controller/loco_controller.py (G1 locomotion via LocoClient)
├─ voice/audio_io.py (mic + speaker abstraction — 3 profiles)
└─ core/brain.py (skill dispatcher, event bus)
```
### Camera + face/place recognition data flow
```
CameraDaemon (parent, in-memory JPEG+b64 cache)
├─→ dashboard /api/recognition/frame.jpg ── snapshot_jpeg()
└─→ GeminiSubprocess._frame_forwarder ── get_frame_b64()
│ "frame:<b64>\n" over stdin
ArmController ─emit→ event bus ─→ main.py ─→ live_sub.send_state()
│ "state:<json>\n" over stdin
gemini/script.py _stdin_watcher thread
├─ frame: → _LATEST_FRAME → _send_frame_loop →
│ session.send_realtime_input(video=Blob)
└─ state: → _STATE_PENDING → _send_state_loop →
session.send_realtime_input(text=…)
Recognition toggles (vision / face-rec / zone-rec / movement) are written by the
dashboard to data/.recognition_state.json and POLLED by the Gemini child at 1 Hz
— so flipping a toggle takes effect mid-session with NO restart.
```
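
The stdin channel in the diagram above carries two line-oriented message kinds, `frame:<b64>\n` and `state:<json>\n`. A minimal sketch of how such lines could be encoded and parsed is shown below. The function names and the JSON payload shape are hypothetical and are not taken from `gemini/subprocess.py` or `gemini/script.py`.

```python
import base64
import json
from typing import Any, Optional, Tuple


def encode_frame(jpeg: bytes) -> str:
    """Parent side: one camera frame as a single stdin line."""
    return "frame:" + base64.b64encode(jpeg).decode("ascii") + "\n"


def encode_state(event: dict) -> str:
    """Parent side: one motion-state event as a single stdin line."""
    return "state:" + json.dumps(event, ensure_ascii=False) + "\n"


def parse_stdin_line(line: str) -> Optional[Tuple[str, Any]]:
    """Child side: return (kind, payload), or None for anything unrecognized."""
    line = line.rstrip("\n")
    try:
        if line.startswith("frame:"):
            # validate=True rejects non-base64 characters instead of skipping them.
            return "frame", base64.b64decode(line[len("frame:"):], validate=True)
        if line.startswith("state:"):
            return "state", json.loads(line[len("state:"):])
    except ValueError:
        # binascii.Error and json.JSONDecodeError both subclass ValueError.
        return None
    return None
```

Returning `None` for malformed lines keeps a single corrupt frame from taking down the watcher thread; whether the real child logs or drops such lines is not documented here.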
## Quick start (on the robot)
```bash
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
```
Then open `http://<robot-ip>:8000` in a browser. (The dashboard binds to the
`wlan0` IP by default — see *Runtime selection* to override.)

Fully-offline brain (no cloud): `SANAD_VOICE_BRAIN=local python3 main.py`
(requires `ollama serve` + the local model env; see *Voice brains*).
> **Gemini API key — required, none ships with the repo.** The `api_key`
> fields in `config/core_config.json` (`gemini_defaults`) and
> `data/motions/config.json` (`gemini`) are intentionally empty (`""`).
> The voice loop cannot connect until you supply one, by any of:
> - **Dashboard** → *Voice & Audio → Gemini API Key* — paste + save, hot-swaps live (no restart). Persists to `data/motions/config.json`.
> - **Env var** — `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`.
> - **Config file** — set `gemini_defaults.api_key` in `config/core_config.json`.
>
> Precedence (highest first): `data/motions/config.json` → `SANAD_GEMINI_API_KEY` → `config/core_config.json`. Get a key at <https://aistudio.google.com/apikey>.
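
The precedence rule above can be sketched as a small resolver. This is an illustration only: `resolve_gemini_api_key` and `_read_key` are hypothetical names, and the real lookup in `config.py` may be structured differently. The file paths, JSON sections, and env var name are the ones listed above.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def _read_key(path: Path, section: str) -> str:
    """Return <section>.api_key from a JSON file, or "" if missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return ""
    block = data.get(section) if isinstance(data, dict) else None
    if not isinstance(block, dict):
        return ""
    return str(block.get("api_key") or "").strip()


def resolve_gemini_api_key(root: Path, env: Optional[Mapping[str, str]] = None) -> str:
    """First non-empty source wins: dashboard file, then env var, then core config."""
    env = os.environ if env is None else env
    candidates = (
        _read_key(root / "data" / "motions" / "config.json", "gemini"),
        (env.get("SANAD_GEMINI_API_KEY") or "").strip(),
        _read_key(root / "config" / "core_config.json", "gemini_defaults"),
    )
    return next((key for key in candidates if key), "")
```

An empty string from every source means no key is configured, which matches the state of a fresh checkout.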
## Dashboard features
### Operations
Quick-fire SDK + JSONL arm actions (chip buttons), gestural-speaking toggle.
### Voice & Audio
- **Live Voice Commands** — fire arm gestures from the *user's* transcript
(wake-phrase → arm action). Master gate + Deferred-trigger toggle.
- **Live Gemini Process** — start/stop the voice conversation subprocess, tail
its log. Choose the Gemini cloud brain or the offline brain via
`SANAD_VOICE_BRAIN`.
- **Typed Replay** — Gemini reads typed text aloud (wrapped with a
"repeat verbatim" prompt); optionally records the clip.
- **Gemini API Key** — hot-swap the key without restart.
- **Wake Phrase Manager** — add/remove phrase → action bindings.
- **Audio Controls** — mic/speaker mute, G1 chest-speaker volume (DDS), device
profile selection, PulseAudio soft-reset and Anker USB hard-reset.
### Motion & Replay
- **Motion Control** — list SDK (built-in) + JSONL (recorded) actions, select +
play. Cancel smoothly returns to `arm_home.jsonl`.
- **Replay Manager** — upload `.jsonl` files, test-play with speed, Teaching
Mode (kinesthetic record — limp the arm and hand-guide it).
- **Macro Recorder** — record a new audio+motion pair, OR pick any WAV + any
motion (SDK or JSONL) and play them in parallel.
### Controller *(locomotion)*
Manual teleoperation of the G1's **legs** via the Unitree `LocoClient`.
**Disarmed every boot**; all motion writes require Arm first.
- **Move / Step** — continuous teleop (vx/vy/vyaw) or discrete one-shot steps.
- **Postures & FSM modes** — zero-torque, damp, squat, sit, stand, balance,
stand-height; prep/ready sequences; MotionSwitcher select-AI/release.
- **Gemini Movement** — toggle voice-driven walking: the `MovementDispatcher`
parses Gemini's *own spoken confirmation phrases* ("Turning right." /
"أستدير يميناً.") and drives the legs (gated on this toggle + an E-STOP latch).
- **E-STOP** — always available; `StopMove` + disarm + latch the dispatcher.
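
The gating described for Gemini Movement and E-STOP can be sketched as follows. Everything here is hypothetical: the class name, the phrase table, and the velocity values are placeholders, not the contents of `voice/movement_dispatch.py` or `data/motions/instruction.json`.

```python
from typing import Callable, Dict, Optional, Tuple

Velocity = Tuple[float, float, float]  # (vx, vy, vyaw)

# Placeholder phrase map; values are illustrative and uncalibrated.
PHRASES: Dict[str, Velocity] = {
    "turning right": (0.0, 0.0, -0.5),
    "turning left": (0.0, 0.0, 0.5),
    "moving forward": (0.3, 0.0, 0.0),
}


class MovementGate:
    """Forward a recognized confirmation phrase only when enabled and not latched."""

    def __init__(self, send_move: Callable[[float, float, float], None]) -> None:
        self._send_move = send_move
        self.enabled = False        # the "Gemini Movement" toggle, off by default
        self.estop_latched = False  # set by E-STOP, cleared only explicitly

    def estop(self) -> None:
        self.estop_latched = True

    def clear_estop(self) -> None:
        self.estop_latched = False

    def on_bot_text(self, text: str) -> Optional[str]:
        """Return the matched phrase if a move was sent, else None."""
        if not self.enabled or self.estop_latched:
            return None
        phrase = text.strip().lower().rstrip(".!")
        velocity = PHRASES.get(phrase)
        if velocity is None:
            return None
        self._send_move(*velocity)
        return phrase
```

In this sketch `send_move` stands in for whatever the locomotion controller exposes; checking the toggle and the latch before any phrase matching means a latched E-STOP blocks every phrase, known or not.
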
> **Safety:** the arm and locomotion are **mutually exclusive** —
> `arm.set_motion_block(loco.movement_active)` makes every arm
> replay/gesture refuse while the robot is (or just was, within ~1.5 s) walking.
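
One way to picture the "is, or just was, walking" rule is a small guard with a cooldown. This is a sketch under assumptions: `MotionBlock` and its methods are hypothetical, and the real `set_motion_block` hook may take a callable rather than an object like this. Only the ~1.5 s window comes from the note above.

```python
import time
from typing import Callable, Optional


class MotionBlock:
    """Refuse arm motion while walking and for a short cooldown afterwards."""

    def __init__(self, cooldown_s: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._walking = False
        self._stopped_at: Optional[float] = None

    def set_walking(self, walking: bool) -> None:
        # Remember the moment walking ends so the cooldown can be measured.
        if self._walking and not walking:
            self._stopped_at = self._clock()
        self._walking = walking

    def arm_allowed(self) -> bool:
        if self._walking:
            return False
        if self._stopped_at is None:
            return True
        return self._clock() - self._stopped_at >= self._cooldown_s
```

A monotonic clock is used so that wall-clock adjustments on the robot cannot shorten or extend the cooldown.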
### Recognition
Camera vision + Gemini-side **face** and **zone/place** recognition. All are
**off by default**; each is a **hot toggle** (≈1 s to take effect, no restart).
- **Camera Vision** — `CameraDaemon` captures from a RealSense (preferred) or
USB camera; the supervisor streams JPEG frames to Gemini Live so it can answer
"what do you see?". Live preview panel. Auto-reconnects on USB unplug/stall
and warns if a RealSense negotiated only USB 2.0 (resilience logic ported from `Project/Marcus`).
- **Face Recognition** — manage `data/faces/face_{id}/` galleries: enroll from
the live camera or upload photos, rename, describe, download (per-photo or
ZIP), delete. On session start (and on any gallery change) the child sends a
**primer turn** carrying every enrolled face + a Khaleeji greeting
instruction — **Gemini matches in-context, so there is no local
face-recognition model**. Recognition needs vision on.
- **Zones & Places** — `data/zones/zone_{zid}/place_{pid}/` two-level gallery:
reference photos per place, optional linked face_ids, and a **"go here"** nav
target (`nav_target_zone/place_id` in the recognition-state file) for
place-aware navigation.
- **Sync Gallery** — force-resend the face/zone primer to the live session.
### Recordings
Skill Registry (predefined audio+motion+callback skills from `skills.json`) +
Saved Records (captured Gemini turn recordings; play/pause/stop/rename/delete).
### Temperature
Live **3D motor-temperature heatmap** — a standalone three.js viewer
(`dashboard/static/temp3d/`) loads the G1 29-DoF URDF + STL meshes and colors
each joint blue→red from the arm controller's throttled `rt/lowstate` snapshot,
streamed over `/ws/motor-temps` at ~8 fps. No second DDS subscriber.
### Terminal
In-browser **PTY shell** to the robot (`/ws/terminal`, xterm.js) — a `bash -i`
as the dashboard's user, with resize + backpressure, bounded to 4 sessions.
(See *Security* — this is full shell access to whoever reaches the URL.)
### Settings & Logs
System info (host, network interfaces, DDS interface, bound dashboard host/port,
per-subsystem status, audio devices), live log stream (`/ws/logs`), per-file
tail, snapshot, and a one-blob "Copy All Logs" bundle.
## Directory layout
| Path | Contents |
|---|---|
| `main.py` | Entry point — fault-isolated boot of all subsystems + the dashboard. Doubles as the service container (route handlers `import` its module globals). |
| `config.py` | Runtime constants + layout-agnostic path resolution; layers `data/motions/config.json` over the JSON config at import. |
| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `local`, `motion`, `dashboard`. |
| `core/` | `brain.py` (skill dispatcher), `event_bus.py`, `skill_registry.py`, `config_loader.py`, `logger.py` (rotating + WS push), `asyncio_compat.py` (`asyncio.to_thread` shim for Python 3.8). |
| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain: audio + video + motion-state), `subprocess.py` (supervisor + stdin frame/state push). |
| `local/` | Fully-offline brain — `vad.py` (Silero), `stt.py` (faster-whisper), `llm.py` (Qwen via Ollama/llama.cpp), `tts.py` (CosyVoice2), `script.py` (the brain), `subprocess.py` (supervisor). Opt-in via `SANAD_VOICE_BRAIN=local`. |
| `voice/` | `sanad_voice.py` (subprocess entry, model-agnostic), `audio_io.py` / `audio_manager.py` / `audio_devices.py` (mic/speaker), `local_tts.py` (SpeechT5 Arabic TTS), `live_voice_loop.py` (user-transcript → arm gesture), `movement_dispatch.py` (Gemini-phrase → locomotion), `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py` (Arabic normalization + phrase matching), `model_script.py` / `model_subprocess.py` (brain templates). |
| `motion/` | `arm_controller.py` (production 5-phase JSONL replay engine, owns the single DDS init), `macro_player.py`, `macro_recorder.py`, `teaching.py`. (`sanad_arm_controller.py` is a legacy alternate — not wired by `main.py`.) |
| `G1_Controller/` | `loco_controller.py` — locomotion via Unitree `LocoClient` (move/step/postures/FSM/E-STOP); reuses the arm's DDS participant. |
| `vision/` | `camera.py` (RealSense/USB daemon, auto-reconnect), `face_gallery.py`, `zone_gallery.py`, `recognition_state.py` (atomic-JSON toggle IPC). |
| `dashboard/` | `app.py` (FastAPI factory + fault-isolated router registration), `routes/*.py` (20 REST routers), `websockets/*.py` (logs, motor-temps, terminal), `static/index.html` (single-page UI), `static/temp3d/` (3D viewer). |
| `scripts/` | Persona files — `sanad_script.txt` (voice persona "Bousandah"), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
| `data/` | Runtime state — `motions/*.jsonl` (arm trajectories) + `instruction.json` (locomotion phrase map) + `skills.json` + `config.json` (dashboard-editable), `recordings/` (captured turns + macros), `faces/face_{id}/` + `zones/zone_{zid}/place_{pid}/` (galleries), `audio/` (typed-replay WAVs + records index), `.recognition_state.json` (toggle IPC). |
| `model/` | Local SpeechT5 / Whisper / CosyVoice2 weights when using the offline pipeline. |
| `logs/` | Per-module rotating logs. |
## Voice brains
The child `voice/sanad_voice.py` is model-agnostic and selects a brain via
`SANAD_VOICE_BRAIN`. Every brain implements the same contract
(`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`)
and ships a sibling supervisor that spawns the child and parses its
`USER:` / `BOT:` / state log markers.
| Value | Brain | Pipeline |
|---|---|---|
| `gemini` *(default)* | `gemini/script.py` | Gemini Live native-audio (full-duplex speech-to-speech, server-side VAD, vision frames, face/zone primers, voice→movement). Cloud. |
| `local` | `local/script.py` | Silero VAD → faster-whisper (large-v3-turbo, CUDA int8) → Qwen2.5 (Ollama/llama.cpp) → CosyVoice2 streaming TTS. Fully on-device. |
| `model` | `voice/model_script.py` | Template/stub for adding a new provider (OpenAI Realtime, Claude Voice, …). |
To add a brain: drop a file in `voice/` or a new `<brand>/` folder and add a
branch to `voice/sanad_voice.py:_build_brain()`; ship a supervisor modeled on
`voice/model_subprocess.py`.
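
As an illustration of the brain contract, here is a toy brain that satisfies the documented signature and a name-based selector. `EchoBrain` and `build_brain` are hypothetical; the real selector is `voice/sanad_voice.py:_build_brain()` and the real template is `voice/model_script.py`, neither of which is reproduced here.

```python
import asyncio
from typing import Any, Dict, Type


class EchoBrain:
    """Toy brain: implements __init__/run/stop and idles until stopped."""

    def __init__(self, audio_io: Any, recorder: Any, voice: str, system_prompt: str) -> None:
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self._stopped = False

    async def run(self) -> None:
        # A supervisor would parse markers like this from the child's log.
        print("BOT: ready", flush=True)
        while not self._stopped:
            await asyncio.sleep(0.05)

    def stop(self) -> None:
        # A plain flag keeps stop() safe to call from outside the event loop.
        self._stopped = True


BRAINS: Dict[str, Type[EchoBrain]] = {"echo": EchoBrain}


def build_brain(name: str, **kwargs: Any) -> EchoBrain:
    """Pick a brain class by name; unknown names fail loudly."""
    try:
        return BRAINS[name](**kwargs)
    except KeyError:
        raise ValueError(f"unknown voice brain: {name!r}") from None
```

A plain boolean is used instead of `asyncio.Event` so the object can be constructed before the event loop exists, which matters on older Python versions.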
## Runtime selection (env vars)
| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Mic + speaker pair. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms (arm + locomotion + speaker). |
| `SANAD_DASHBOARD_HOST` / `_INTERFACE` | IP / iface | `wlan0` IP | Dashboard bind address. |
| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. See [Quick start](#quick-start-on-the-robot). |
| `SANAD_GEMINI_MODEL` / `_VOICE` | string | reads config | Override the Gemini model id / prebuilt voice. |
| `SANAD_G1_VOLUME` | `0`–`100` | `100` | G1 chest-speaker volume; also scales the barge-in threshold. |
| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
| `SANAD_VISION_ENABLE` | `0` or `1` | `0` | Boot default for camera vision. **Runtime truth is the Recognition-tab toggle** → `data/.recognition_state.json`, hot-applied without a restart. |
| `SANAD_FACE_RECOGNITION_ENABLE` | `0` or `1` | `0` | Boot default for Gemini-side face recognition. Also a hot toggle. |
| `SANAD_VISION_SEND_HZ` | float | `2` | Frames/sec the Gemini child relays to Live. |
| `SANAD_CAMERA_WIDTH` / `_HEIGHT` / `_FPS` | int | `424` / `240` / `15` | Capture profile. Also settable per-deploy in `config/core_config.json > camera`. |
| `SANAD_CAMERA_USB_INDEX` | int | auto | Pin a `/dev/videoN` node (avoids picking a RealSense IR stream). |
| `SANAD_FACES_MAX_SAMPLES` | int | `3` | Max photos per person fed into the gallery primer turn (token budget). |
| `SANAD_PROJECT_ROOT` | path | auto | Override the project root (see *Dynamic paths*). |
> All `SANAD_VISION_*` / `SANAD_CAMERA_*` / `SANAD_FACE_*` vars are **boot
> defaults** forwarded to the Gemini child via `LIVE_TUNE`. Once running, the
> Recognition tab's toggles (vision / face-rec / zone-rec / movement) are the
> live source of truth in `data/.recognition_state.json`, polled at 1 Hz.
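
The toggle file works as a tiny IPC channel: the dashboard writes it, the child polls it. A minimal sketch of an atomic write plus a tolerant read is shown below. The key names in `DEFAULTS` and all function names are assumptions; the real schema lives in `vision/recognition_state.py`.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical key names; every toggle is off by default.
DEFAULTS: Dict[str, bool] = {
    "vision": False, "face_rec": False, "zone_rec": False, "movement": False,
}


def write_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
        os.replace(tmp, path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_state(path: Path) -> dict:
    """Return defaults overlaid with the file; a missing or bad file means defaults."""
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return dict(DEFAULTS)
    return {**DEFAULTS, **loaded} if isinstance(loaded, dict) else dict(DEFAULTS)


def changed_keys(old: dict, new: dict) -> List[str]:
    """Keys whose value differs between two polls."""
    return sorted(k for k in new if old.get(k) != new[k])
```

A poller would call `read_state` once per second and react only to `changed_keys`. Because the rename is atomic, the reader never sees a half-written file.
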
CLI flags: `python3 main.py --host <ip> --port 8000 --network <dds_iface>`;
`--check-env` prints a subsystem/environment diagnostic and exits.
## API surface
All routes are registered defensively — a router whose import fails is recorded
(`GET /api/_dashboard_status`) and the server still boots without it.
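
Defensive registration of this kind can be sketched with the standard library alone. `load_routers` is a hypothetical helper, and the assumption that each route module exposes a `router` attribute is illustrative; the actual logic is in `dashboard/app.py`.

```python
import importlib
from typing import Dict, Iterable, List, Tuple


def load_routers(module_names: Iterable[str]) -> Tuple[List[object], Dict[str, str]]:
    """Import each route module; record failures instead of raising."""
    routers: List[object] = []
    status: Dict[str, str] = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            routers.append(getattr(module, "router"))
            status[name] = "ok"
        except Exception as exc:  # one broken router must not stop the boot
            status[name] = f"failed: {type(exc).__name__}: {exc}"
    return routers, status
```

The returned `status` mapping is the kind of data a status endpoint could expose, and the surviving `routers` would then be attached to the app one by one.
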
**REST** (prefix → controls): `/api` health · `/api/system` info ·
`/api/voice` Gemini/local generate+connect+key · `/api/motion` arm actions ·
`/api/skills` skill registry · `/api/macros` record/play · `/api/replay` JSONL
CRUD + teaching · `/api/audio` mute/volume/devices/reset · `/api/scripts`
persona files · `/api/records` saved WAVs · `/api/prompt` system prompt ·
`/api/wake-phrases` bindings · `/api/live-voice` arm-phrase dispatcher ·
`/api/live-subprocess` Gemini child · `/api/typed-replay` TTS · `/api/recognition`
vision + face gallery · `/api/zones` zones/places + nav target · `/api/temp`
motor map + snapshot · `/api/controller` locomotion (move/step/postures/modes/
E-STOP).
**WebSockets**: `/ws/logs` (live log stream + 500-line replay) ·
`/ws/motor-temps` (3D heatmap data, ~8 fps) · `/ws/terminal` (PTY shell).
## Architecture notes
- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py`
via the supervisor. If the voice loop crashes, the dashboard + arm + legs stay
up.
- **Single DDS init**: `motion/arm_controller.py` owns the one
`ChannelFactoryInitialize`; `LocoController` and the audio routes reuse that
participant rather than re-initializing.
- **Brain contract**: see `voice/model_script.py` — any new model implements
`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`.
- **Supervisor contract**: each brain ships a sibling supervisor (e.g.
`gemini/subprocess.py`) that spawns `sanad_voice.py` with its
`SANAD_VOICE_BRAIN` and parses the brain's log markers. Template:
`voice/model_subprocess.py`.
- **Locomotion safety**: `LocoController` is disarmed every boot, has velocity
caps + a `StopMove` watchdog, and is mutually exclusive with the arm.
Voice-driven movement is **off by default** and gated by the Controller
toggle. Distances/degrees in `data/motions/instruction.json` are
**approximate and must be calibrated on the real robot** — there is no
obstacle/abort stack.
- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a
physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay
audio, record playback) routes through DDS `AudioClient.PlayStream` via
`audio_manager._play_pcm_via_g1`. The PyAudio path is a desktop/dev fallback.
- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a port of
`G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle
hold → playback → smooth return → disable SDK. Body motors (0–14) lock to a
live snapshot while arm motors (15–28) follow the file at 60 Hz. `_return_home()`
runs unconditionally after a cancel for a jerk-free return.
- **Camera frame transport (stdin push)**: the `CameraDaemon` lives in the
parent and caches frames in memory. `GeminiSubprocess` base64-encodes the
latest frame to the child's stdin (~2 fps); the child's `_stdin_watcher`
relays it to Gemini Live with a staleness guard. Chosen over a file drop so
the parent owns the camera once and the dashboard preview reads the same cache.
- **Motion-state channel**: `arm_controller._execute()` emits
`motion.action_started` / `_done` / `_error` on the event bus. `main.py`
forwards each to the child as `state:<json>\n`, injected to Gemini Live as
silent `[STATE-START] wave_hand` / `[STATE-DONE] wave_hand (2.3s)` text so it
can honestly answer "what are you doing?".
- **Recognition is Gemini-side**: no dlib/insightface/onnxruntime. Galleries are
pure file IO; `gemini/script.py:_send_gallery_primer()` builds one multimodal
`send_client_content` turn — every enrolled face/place's photos + a greeting
instruction — and Gemini matches incoming frames against it in-context.
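
To make the arm-replay note above concrete, the sketch below builds one 29-joint target: body joints stay at their snapshot value and arm joints blend from the snapshot toward the recorded frame during ramp-in. It assumes a recorded frame holds 14 arm values in motor order, which may not match the actual JSONL layout, and `compose_target` is a hypothetical name rather than a function in `motion/arm_controller.py`.

```python
from typing import List, Sequence

NUM_MOTORS = 29
ARM_START = 15  # motors 0-14 are body, 15-28 are arms


def compose_target(snapshot: Sequence[float], arm_frame: Sequence[float],
                   alpha: float) -> List[float]:
    """Return 29 joint targets for one control tick.

    snapshot  -- live joint positions captured when the replay starts
    arm_frame -- 14 recorded arm positions for this tick (assumed layout)
    alpha     -- ramp-in blend factor, 0.0 = hold snapshot, 1.0 = follow file
    """
    if len(snapshot) != NUM_MOTORS or len(arm_frame) != NUM_MOTORS - ARM_START:
        raise ValueError("unexpected joint count")
    alpha = min(1.0, max(0.0, alpha))
    target = list(snapshot)  # body joints stay locked to the snapshot
    for i in range(ARM_START, NUM_MOTORS):
        start = snapshot[i]
        target[i] = start + alpha * (arm_frame[i - ARM_START] - start)
    return target
```

Raising `alpha` from 0 to 1 over the ramp-in phase moves the arms from wherever they are to the first recorded pose without a step change; at 60 Hz one such target would be produced every 1/60 s.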
## Camera vision on Jetson
The Recognition tab needs `pyrealsense2` to talk to the Intel RealSense.
**Do not `pip install pyrealsense2` on JetPack 5** — the PyPI wheel is built
against glibc 2.32+ (Ubuntu 22.04) and fails to load on JetPack 5's glibc
2.31 with `ImportError: ... version 'GLIBC_2.32' not found`.
The native runtime is already there (`apt`-installed `librealsense2`). Build
just the Python binding from source against it, into the `gemini_sdk` env:
```bash
rs-enumerate-devices # confirm the D435I shows up at OS level first
source ~/miniconda3/etc/profile.d/conda.sh && conda activate gemini_sdk
pip uninstall -y pyrealsense2 # remove the broken wheel if present
sudo apt install -y cmake build-essential git python3-dev libusb-1.0-0-dev pkg-config libssl-dev
cd /tmp && rm -rf librealsense
git clone --depth=1 --branch v2.56.5 https://github.com/IntelRealSense/librealsense.git
cd librealsense && mkdir -p build && cd build
cmake .. -DBUILD_PYTHON_BINDINGS=ON -DPYTHON_EXECUTABLE=$(which python3) \
-DBUILD_EXAMPLES=OFF -DBUILD_GRAPHICAL_EXAMPLES=OFF \
-DBUILD_UNIT_TESTS=OFF -DCHECK_FOR_UPDATES=OFF -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) pyrealsense2
SITE=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
mkdir -p "$SITE/pyrealsense2"
cp wrappers/python/pyrealsense2*.so "$SITE/pyrealsense2/"
cp ../wrappers/python/pyrealsense2/__init__.py "$SITE/pyrealsense2/" 2>/dev/null || true
python3 -c 'import pyrealsense2 as rs; print([d.get_info(rs.camera_info.name) for d in rs.context().query_devices()])'
```
Match the `--branch` tag to the installed runtime (`dpkg -l | grep librealsense2`).
If the build isn't worth it, `CameraDaemon` falls back to `cv2.VideoCapture(0)`
automatically — fine for a plain USB webcam, but note a RealSense exposes its
*depth* stream at `/dev/video0`, not RGB, so a real USB cam is the cleaner
fallback (or pin `SANAD_CAMERA_USB_INDEX`). On x86_64 / Ubuntu 22.04+ desktops,
`pip install pyrealsense2` just works.
## Dynamic paths
Every path is derived at runtime — no hard-coded `/home/...` anywhere.
Resolution order for `BASE_DIR` in `config.py`:
1. `SANAD_PROJECT_ROOT` env var (if set).
2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
3. `Path(__file__).resolve().parent` — auto-detected.
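
The resolution order above could look roughly like this. It is a sketch, not the code in `config.py`: `resolve_base_dir` and `parse_dotenv` are hypothetical names, the `.env` parser is deliberately minimal, and joining `PROJECT_BASE` with `PROJECT_NAME` as a path is an assumption about what "+" means.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_dotenv(path: Path) -> Dict[str, str]:
    """Read KEY=VALUE lines; ignore blanks, comments, and unreadable files."""
    values: Dict[str, str] = {}
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return values
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_dir(module_file: Path,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Env var first, then .env next to the module or in its parent, then auto-detect."""
    env = os.environ if env is None else env
    override = env.get("SANAD_PROJECT_ROOT")
    if override:
        return Path(override)
    here = module_file.resolve().parent
    for folder in (here, here.parent):
        dotenv = parse_dotenv(folder / ".env")
        if dotenv.get("PROJECT_BASE") and dotenv.get("PROJECT_NAME"):
            return Path(dotenv["PROJECT_BASE"]) / dotenv["PROJECT_NAME"]
    return here
```

Inside `config.py` the call would be `resolve_base_dir(Path(__file__))`.
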
The project runs unchanged from either layout:
- dev: `<anywhere>/Project/Sanad/`
- deployed: `/home/unitree/Sanad/`
## Deployment (workstation → robot)
```bash
rsync -av --delete \
--exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
/path/to/Sanad/ \
unitree@192.168.123.164:/home/unitree/Sanad/
```
Then, on the robot, stop the running `main.py` with `Ctrl+C` and start it again.
## Security
The dashboard has **no authentication**. Anyone who can reach
`http://<robot>:8000` gets full robot control — locomotion, arm, audio, file
upload/delete — and, via the **Terminal tab**, an interactive shell as the
dashboard's user. Bind it to a **trusted LAN only**; add auth before any wider
exposure.
## Troubleshooting
| Symptom | Fix |
|---|---|
| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias near the top of `main.py`. Restart. |
| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
| `No module named 'Project'` in subprocess | Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
| Controller moves rejected (409) | The Controller is **disarmed by default** — hit Arm first. Reads + E-STOP are always allowed. |
| Arm action refused while "movement armed" | Arm ↔ locomotion are mutually exclusive. Disarm/stop locomotion, then trigger the arm. |
| Voice-driven walking does nothing | "Gemini Movement" toggle off, or E-STOP latched. Toggle on; clear E-STOP. Distances are uncalibrated. |
| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`. |
| Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | Deferred trigger was queued but `trigger_enabled` toggle was off. Toggle on — or the pending-trigger poll fires it automatically once enabled. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
| Local brain exits immediately | `ollama serve` not running / model not pulled, or weights missing under `model/`. Check `logs/local_subprocess.log`. The Gemini brain is the safe default. |
| Recognition tab: "Camera could not start (no backend)" | No camera backend acquired. Check `rs-enumerate-devices` (RealSense at OS level) and `python3 -c 'import pyrealsense2'` in the `gemini_sdk` env. The glibc `ImportError` means the pip wheel is incompatible — see "Camera vision on Jetson" above. |
| Camera badge stuck on "reconnecting…" | `CameraDaemon` lost the device and is retrying with exponential backoff. Re-seat the USB 3 cable; check `logs/camera.log` for the USB-2.0 warning. |
| Gemini doesn't greet an enrolled face | Face Recognition toggle on? Vision on? (Face rec needs frames.) Check `logs/gemini_brain.log` for `face gallery primed: N person(s)`. Hit "Sync Gallery" to force a re-prime. |
| Gemini unaware of motion state | The `motion.action_*` → `send_state` chain only runs when Live Gemini is up. Check `logs/gemini_subprocess.log` and `logs/gemini_brain.log` for `STATE injected:` lines. |
## License / attribution
Internal project for YS Lootah Technology. Reuses/ports patterns from:
- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
- `Project/Marcus` — camera→Gemini stdin-push transport, motion-state
injection, camera daemon resilience (auto-reconnect, USB-2.0 warning), the
`API/camera_api.py` cache shape (`get_frame_b64` / `get_fresh_frame`), and the
confirmation-phrase → locomotion pattern (`movement_dispatch`).
- Unitree `unitree_sdk2py` (G1 low-level SDK, `LocoClient`, `G1ArmActionClient`,
`AudioClient.PlayStream`).