# Sanadv3

Voice + motion assistant for the Unitree G1 humanoid. **Gemini Live** (or a fully-offline pipeline) handles bilingual Arabic/English conversation; an arm controller plays built-in SDK poses and recorded JSONL macros; a locomotion controller walks/turns the robot; an optional camera feeds **Gemini-side face & place recognition**; everything is orchestrated through a fault-isolated **FastAPI dashboard** on `http://:8000`.

```
┌──────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI)  ──  http://:8000                               │
│  ├─ Operations       Quick-fire arm actions + gestural-speaking      │
│  ├─ Voice & Audio    Live Gemini, Typed Replay, Wake Phrases, Audio  │
│  ├─ Motion & Replay  SDK actions, JSONL replays, macros, teaching    │
│  ├─ Controller       Locomotion teleop, postures, FSM modes, E-STOP  │
│  ├─ Recognition      Camera vision + face gallery + zones/places     │
│  ├─ Recordings       Skill registry, saved Gemini turns              │
│  ├─ Temperature      Live 3D motor-temperature heatmap (three.js)    │
│  ├─ Terminal         In-browser shell (PTY) to the robot             │
│  └─ Settings & Logs  System info, tail/stream live logs              │
└──────────────────────────────────────────────────────────────────────┘
        │
        ├─ voice/sanad_voice.py        (subprocess — model-agnostic voice loop)
        │    ├─ gemini/script.py       (Gemini Live brain — audio+video+state)
        │    └─ local/script.py        (offline brain — VAD→STT→LLM→TTS)
        ├─ gemini/client.py            (short-session client for Typed Replay)
        ├─ gemini/subprocess.py        (spawns+supervises sanad_voice.py;
        │                               pushes camera frames + motion state
        │                               to the child over its stdin)
        ├─ voice/movement_dispatch.py  (Gemini spoken phrase → locomotion)
        ├─ vision/camera.py            (RealSense/USB capture daemon)
        ├─ vision/face_gallery.py      (data/faces/ CRUD for the primer turn)
        ├─ vision/zone_gallery.py      (data/zones/ places + "go here" targets)
        ├─ motion/arm_controller.py    (G1 arm DDS publisher — owns DDS init)
        ├─ G1_Controller/loco_controller.py  (G1 locomotion via LocoClient)
        ├─ voice/audio_io.py           (mic + speaker abstraction — 3 profiles)
        └─ core/brain.py               (skill dispatcher, event bus)
```

### Camera + face/place recognition data flow

```
CameraDaemon (parent, in-memory JPEG+b64 cache)
   ├─→ dashboard /api/recognition/frame.jpg  ── snapshot_jpeg()
   └─→ GeminiSubprocess._frame_forwarder     ── get_frame_b64()
                          │  "frame:\n" over stdin
ArmController ─emit→ event bus ─→ main.py ─→ live_sub.send_state()
                          │  "state:\n" over stdin
                          ▼
gemini/script.py  _stdin_watcher thread
   ├─ frame: → _LATEST_FRAME → _send_frame_loop →
   │              session.send_realtime_input(video=Blob)
   └─ state: → _STATE_PENDING → _send_state_loop →
                  session.send_realtime_input(text=…)

Recognition toggles (vision / face-rec / zone-rec / movement) are written by
the dashboard to data/.recognition_state.json and POLLED by the Gemini child
at 1 Hz — so flipping a toggle takes effect mid-session with NO restart.
```

## Quick start (on the robot)

```bash
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
```

Then open `http://:8000` in a browser. (The dashboard binds to the `wlan0` IP by default — see *Runtime selection* to override.)

Fully-offline brain (no cloud): `SANAD_VOICE_BRAIN=local python3 main.py` (requires `ollama serve` + the local model env — see *Voice brains*).

> **Gemini API key — required, none ships with the repo.** The `api_key`
> fields in `config/core_config.json` (`gemini_defaults`) and
> `data/motions/config.json` (`gemini`) are intentionally empty (`""`).
> The voice loop cannot connect until you supply one, by any of:
> - **Dashboard** → *Voice & Audio → Gemini API Key* — paste + save, hot-swaps live (no restart). Persists to `data/motions/config.json`.
> - **Env var** — `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`.
> - **Config file** — set `gemini_defaults.api_key` in `config/core_config.json`.
>
> Precedence (highest first): `data/motions/config.json` → `SANAD_GEMINI_API_KEY` → `config/core_config.json`. Get a key at .
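
The key precedence described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual loader: the helper names `_read_key` and `resolve_gemini_api_key` are hypothetical, and the sketch assumes each JSON file keeps its key under the section named in the note (`gemini` and `gemini_defaults`).

```python
import json
import os
from pathlib import Path


def _read_key(path, section):
    """Return the api_key stored under `section` in a JSON file, or "" if unavailable."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return ""
    if not isinstance(data, dict):
        return ""
    block = data.get(section)
    if not isinstance(block, dict):
        return ""
    value = block.get("api_key", "")
    return value.strip() if isinstance(value, str) else ""


def resolve_gemini_api_key(root=".", env=None):
    """Hypothetical resolver: the first non-empty source wins, highest precedence first."""
    root = Path(root)
    env = os.environ if env is None else env
    candidates = [
        _read_key(root / "data" / "motions" / "config.json", "gemini"),      # dashboard-saved
        env.get("SANAD_GEMINI_API_KEY", "").strip(),                         # environment
        _read_key(root / "config" / "core_config.json", "gemini_defaults"),  # config file
    ]
    for key in candidates:
        if key:
            return key
    return ""  # nothing configured: the voice loop cannot connect
```

Treating an empty string as "not set" is what lets the shipped empty `api_key` fields fall through to the next source.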
## Dashboard features

### Operations

Quick-fire SDK + JSONL arm actions (chip buttons), gestural-speaking toggle.

### Voice & Audio

- **Live Voice Commands** — fire arm gestures from the *user's* transcript (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
- **Live Gemini Process** — start/stop the voice conversation subprocess, tail its log. Choose the Gemini cloud brain or the offline brain via `SANAD_VOICE_BRAIN`.
- **Typed Replay** — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt); optionally records the clip.
- **Gemini API Key** — hot-swap the key without restart.
- **Wake Phrase Manager** — add/remove phrase → action bindings.
- **Audio Controls** — mic/speaker mute, G1 chest-speaker volume (DDS), device profile selection, PulseAudio soft-reset and Anker USB hard-reset.

### Motion & Replay

- **Motion Control** — list SDK (built-in) + JSONL (recorded) actions, select + play. Cancel smoothly returns to `arm_home.jsonl`.
- **Replay Manager** — upload `.jsonl` files, test-play with speed, Teaching Mode (kinesthetic record — limp the arm and hand-guide it).
- **Macro Recorder** — record a new audio+motion pair, OR pick any WAV + any motion (SDK or JSONL) and play them in parallel.

### Controller *(locomotion)*

Manual teleoperation of the G1's **legs** via the Unitree `LocoClient`. **Disarmed every boot**; all motion writes require Arm first.

- **Move / Step** — continuous teleop (vx/vy/vyaw) or discrete one-shot steps.
- **Postures & FSM modes** — zero-torque, damp, squat, sit, stand, balance, stand-height; prep/ready sequences; MotionSwitcher select-AI/release.
- **Gemini Movement** — toggle voice-driven walking: the `MovementDispatcher` parses Gemini's *own spoken confirmation phrases* ("Turning right." / "أستدير يميناً.") and drives the legs (gated on this toggle + an E-STOP latch).
- **E-STOP** — always available; `StopMove` + disarm + latch the dispatcher.
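
The gating described for **Gemini Movement** (a toggle plus an E-STOP latch in front of a phrase lookup) might look roughly like the sketch below. It is not the real `MovementDispatcher`: the class name `PhraseDispatcher`, the `normalize` helper, and the `(vx, vy, vyaw)` values are all hypothetical, and the `move` callback stands in for whatever actually drives the legs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Velocity = Tuple[float, float, float]  # (vx, vy, vyaw), illustrative units


def normalize(phrase: str) -> str:
    """Trim whitespace and sentence punctuation (Latin and Arabic), then case-fold."""
    return phrase.strip(" .!?،؟\n\t").casefold()


@dataclass
class PhraseDispatcher:
    """Hypothetical stand-in: maps a spoken confirmation phrase to one move command."""
    move: Callable[[Velocity], None]
    phrases: Dict[str, Velocity]
    enabled: bool = False         # off by default, like the dashboard toggle
    estop_latched: bool = False

    def __post_init__(self) -> None:
        # Store normalized keys so lookups ignore case and trailing punctuation.
        self.phrases = {normalize(k): v for k, v in self.phrases.items()}

    def estop(self) -> None:
        self.estop_latched = True   # stays latched until explicitly cleared

    def clear_estop(self) -> None:
        self.estop_latched = False

    def handle(self, spoken: str) -> bool:
        """Dispatch only if enabled, not latched, and the phrase is known."""
        if not self.enabled or self.estop_latched:
            return False
        command = self.phrases.get(normalize(spoken))
        if command is None:
            return False
        self.move(command)
        return True
```

Checking both gates before the lookup means a latched E-STOP silently drops every phrase until it is cleared, including phrases that would otherwise match.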
> **Safety:** the arm and locomotion are **mutually exclusive** —
> `arm.set_motion_block(loco.movement_active)` makes every arm
> replay/gesture refuse while the robot is (or just was, within ~1.5 s) walking.

### Recognition

Camera vision + Gemini-side **face** and **zone/place** recognition. All are **off by default**; each is a **hot toggle** (≈1 s to take effect, no restart).

- **Camera Vision** — `CameraDaemon` captures from a RealSense (preferred) or USB camera; the supervisor streams JPEG frames to Gemini Live so it can answer "what do you see?". Live preview panel. Auto-reconnects on USB unplug/stall and warns if a RealSense negotiated USB 2.0 (Marcus-ported resilience).
- **Face Recognition** — manage `data/faces/face_{id}/` galleries: enroll from the live camera or upload photos, rename, describe, download (per-photo or ZIP), delete. On session start (and on any gallery change) the child sends a **primer turn** carrying every enrolled face + a Khaleeji greeting instruction — **Gemini matches in-context, so there is no local face-recognition model**. Recognition needs vision on.
- **Zones & Places** — `data/zones/zone_{zid}/place_{pid}/` two-level gallery: reference photos per place, optional linked face_ids, and a **"go here"** nav target (`nav_target_zone/place_id` in the recognition-state file) for place-aware navigation.
- **Sync Gallery** — force-resend the face/zone primer to the live session.

### Recordings

Skill Registry (predefined audio+motion+callback skills from `skills.json`) + Saved Records (captured Gemini turn recordings; play/pause/stop/rename/delete).

### Temperature

Live **3D motor-temperature heatmap** — a standalone three.js viewer (`dashboard/static/temp3d/`) loads the G1 29-DoF URDF + STL meshes and colors each joint blue→red from the arm controller's throttled `rt/lowstate` snapshot, streamed over `/ws/motor-temps` at ~8 fps. No second DDS subscriber.
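
As a rough illustration of the blue→red coloring, a linear temperature-to-color mapping could be written as below. The real viewer is three.js and its color ramp is not documented here; the function names and the 30 °C / 80 °C endpoints are assumptions chosen only for the example.

```python
def temp_to_rgb(temp_c, cold_c=30.0, hot_c=80.0):
    """Map a motor temperature to an (r, g, b) tuple: blue at cold_c, red at hot_c.

    The 30/80 degree endpoints are illustrative, not the viewer's real thresholds.
    """
    if hot_c <= cold_c:
        raise ValueError("hot_c must be greater than cold_c")
    # Clamp the normalized temperature into [0, 1] so out-of-range readings saturate.
    t = min(1.0, max(0.0, (temp_c - cold_c) / (hot_c - cold_c)))
    return (round(255 * t), 0, round(255 * (1.0 - t)))


def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple as a CSS-style hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

For example, `rgb_to_hex(temp_to_rgb(30.0))` gives `#0000ff` and any reading at or above `hot_c` saturates to `#ff0000`.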
### Terminal

In-browser **PTY shell** to the robot (`/ws/terminal`, xterm.js) — a `bash -i` as the dashboard's user, with resize + backpressure, bounded to 4 sessions. (See *Security* — this is full shell access to whoever reaches the URL.)

### Settings & Logs

System info (host, network interfaces, DDS interface, bound dashboard host/port, per-subsystem status, audio devices), live log stream (`/ws/logs`), per-file tail, snapshot, and a one-blob "Copy All Logs" bundle.

## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — fault-isolated boot of all subsystems + the dashboard. Doubles as the service container (route handlers `import` its module globals). |
| `config.py` | Runtime constants + layout-agnostic path resolution; layers `data/motions/config.json` over the JSON config at import. |
| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `local`, `motion`, `dashboard`. |
| `core/` | `brain.py` (skill dispatcher), `event_bus.py`, `skill_registry.py`, `config_loader.py`, `logger.py` (rotating + WS push), `asyncio_compat.py` (3.8 `to_thread` shim). |
| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain: audio + video + motion-state), `subprocess.py` (supervisor + stdin frame/state push). |
| `local/` | Fully-offline brain — `vad.py` (Silero), `stt.py` (faster-whisper), `llm.py` (Qwen via Ollama/llama.cpp), `tts.py` (CosyVoice2), `script.py` (the brain), `subprocess.py` (supervisor). Opt-in via `SANAD_VOICE_BRAIN=local`. |
| `voice/` | `sanad_voice.py` (subprocess entry, model-agnostic), `audio_io.py` / `audio_manager.py` / `audio_devices.py` (mic/speaker), `local_tts.py` (SpeechT5 Arabic TTS), `live_voice_loop.py` (user-transcript → arm gesture), `movement_dispatch.py` (Gemini-phrase → locomotion), `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py` (Arabic normalization + phrase matching), `model_script.py` / `model_subprocess.py` (brain templates). |
| `motion/` | `arm_controller.py` (production 5-phase JSONL replay engine, owns the single DDS init), `macro_player.py`, `macro_recorder.py`, `teaching.py`. (`sanad_arm_controller.py` is a legacy alternate — not wired by `main.py`.) |
| `G1_Controller/` | `loco_controller.py` — locomotion via Unitree `LocoClient` (move/step/postures/FSM/E-STOP); reuses the arm's DDS participant. |
| `vision/` | `camera.py` (RealSense/USB daemon, auto-reconnect), `face_gallery.py`, `zone_gallery.py`, `recognition_state.py` (atomic-JSON toggle IPC). |
| `dashboard/` | `app.py` (FastAPI factory + fault-isolated router registration), `routes/*.py` (20 REST routers), `websockets/*.py` (logs, motor-temps, terminal), `static/index.html` (single-page UI), `static/temp3d/` (3D viewer). |
| `scripts/` | Persona files — `sanad_script.txt` (voice persona "Bousandah"), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
| `data/` | Runtime state — `motions/*.jsonl` (arm trajectories) + `instruction.json` (locomotion phrase map) + `skills.json` + `config.json` (dashboard-editable), `recordings/` (captured turns + macros), `faces/face_{id}/` + `zones/zone_{zid}/place_{pid}/` (galleries), `audio/` (typed-replay WAVs + records index), `.recognition_state.json` (toggle IPC). |
| `model/` | Local SpeechT5 / Whisper / CosyVoice2 weights when using the offline pipeline. |
| `logs/` | Per-module rotating logs. |

## Voice brains

The child `voice/sanad_voice.py` is model-agnostic and selects a brain via `SANAD_VOICE_BRAIN`. Every brain implements the same contract (`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`) and ships a sibling supervisor that spawns the child and parses its `USER:` / `BOT:` / state log markers.

| Value | Brain | Pipeline |
|---|---|---|
| `gemini` *(default)* | `gemini/script.py` | Gemini Live native-audio (full-duplex speech-to-speech, server-side VAD, vision frames, face/zone primers, voice→movement). Cloud. |
| `local` | `local/script.py` | Silero VAD → faster-whisper (large-v3-turbo, CUDA int8) → Qwen2.5 (Ollama/llama.cpp) → CosyVoice2 streaming TTS. Fully on-device. |
| `model` | `voice/model_script.py` | Template/stub for adding a new provider (OpenAI Realtime, Claude Voice, …). |

To add a brain: drop a file in `voice/` or a new `/` folder and add a branch to `voice/sanad_voice.py:_build_brain()`; ship a supervisor modeled on `voice/model_subprocess.py`.

## Runtime selection (env vars)

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Mic + speaker pair. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms (arm + locomotion + speaker). |
| `SANAD_DASHBOARD_HOST` / `_INTERFACE` | IP / iface | `wlan0` IP | Dashboard bind address. |
| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. See [Quick start](#quick-start-on-the-robot). |
| `SANAD_GEMINI_MODEL` / `_VOICE` | string | reads config | Override the Gemini model id / prebuilt voice. |
| `SANAD_G1_VOLUME` | `0`–`100` | `100` | G1 chest-speaker volume; also scales the barge-in threshold. |
| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
| `SANAD_VISION_ENABLE` | `0` or `1` | `0` | Boot default for camera vision. **Runtime truth is the Recognition-tab toggle** → `data/.recognition_state.json`, hot-applied without a restart. |
| `SANAD_FACE_RECOGNITION_ENABLE` | `0` or `1` | `0` | Boot default for Gemini-side face recognition. Also a hot toggle. |
| `SANAD_VISION_SEND_HZ` | float | `2` | Frames/sec the Gemini child relays to Live. |
| `SANAD_CAMERA_WIDTH` / `_HEIGHT` / `_FPS` | int | `424` / `240` / `15` | Capture profile. Also settable per-deploy in `config/core_config.json > camera`. |
| `SANAD_CAMERA_USB_INDEX` | int | auto | Pin a `/dev/videoN` node (avoids picking a RealSense IR stream). |
| `SANAD_FACES_MAX_SAMPLES` | int | `3` | Max photos per person fed into the gallery primer turn (token budget). |
| `SANAD_PROJECT_ROOT` | path | auto | Override the project root (see *Dynamic paths*). |

> All `SANAD_VISION_*` / `SANAD_CAMERA_*` / `SANAD_FACE_*` vars are **boot
> defaults** forwarded to the Gemini child via `LIVE_TUNE`. Once running, the
> Recognition tab's toggles (vision / face-rec / zone-rec / movement) are the
> live source of truth in `data/.recognition_state.json`, polled at 1 Hz.

CLI flags: `python3 main.py --host --port 8000 --network `; `--check-env` prints a subsystem/environment diagnostic and exits.

## API surface

All routes are registered defensively — a router whose import fails is recorded (`GET /api/_dashboard_status`) and the server still boots without it.
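
A minimal sketch of that defensive registration pattern, assuming each route module exposes a `router` attribute and that the caller passes in something like FastAPI's `app.include_router` as the `include` callback. The function name `register_routers` and the status strings are hypothetical, not the project's actual code.

```python
import importlib
from typing import Callable, Dict, Iterable


def register_routers(
    include: Callable[[object], None],
    module_names: Iterable[str],
    attr: str = "router",
) -> Dict[str, str]:
    """Import each module and register its router; record failures instead of raising."""
    status: Dict[str, str] = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            include(getattr(module, attr))
            status[name] = "ok"
        except Exception as exc:  # broad on purpose: one bad router must not stop the boot
            status[name] = f"failed: {type(exc).__name__}: {exc}"
    return status
```

The returned dictionary is the kind of record a status endpoint could serve, so a missing optional dependency shows up as a failed entry rather than a crashed server.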
**REST** (prefix → controls): `/api` health · `/api/system` info · `/api/voice` Gemini/local generate+connect+key · `/api/motion` arm actions · `/api/skills` skill registry · `/api/macros` record/play · `/api/replay` JSONL CRUD + teaching · `/api/audio` mute/volume/devices/reset · `/api/scripts` persona files · `/api/records` saved WAVs · `/api/prompt` system prompt · `/api/wake-phrases` bindings · `/api/live-voice` arm-phrase dispatcher · `/api/live-subprocess` Gemini child · `/api/typed-replay` TTS · `/api/recognition` vision + face gallery · `/api/zones` zones/places + nav target · `/api/temp` motor map + snapshot · `/api/controller` locomotion (move/step/postures/modes/E-STOP).

**WebSockets**: `/ws/logs` (live log stream + 500-line replay) · `/ws/motor-temps` (3D heatmap data, ~8 fps) · `/ws/terminal` (PTY shell).

## Architecture notes

- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py` via the supervisor. If the voice loop crashes, the dashboard + arm + legs stay up.
- **Single DDS init**: `motion/arm_controller.py` owns the one `ChannelFactoryInitialize`; `LocoController` and the audio routes reuse that participant rather than re-initializing.
- **Brain contract**: see `voice/model_script.py` — any new model implements `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`.
- **Supervisor contract**: each brain ships a sibling supervisor (e.g. `gemini/subprocess.py`) that spawns `sanad_voice.py` with its `SANAD_VOICE_BRAIN` and parses the brain's log markers. Template: `voice/model_subprocess.py`.
- **Locomotion safety**: `LocoController` is disarmed every boot, has velocity caps + a `StopMove` watchdog, and is mutually exclusive with the arm. Voice-driven movement is **off by default** and gated by the Controller toggle. Distances/degrees in `data/motions/instruction.json` are **approximate and must be calibrated on the real robot** — there is no obstacle/abort stack.
- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay audio, record playback) routes through DDS `AudioClient.PlayStream` via `audio_manager._play_pcm_via_g1`. The PyAudio path is a desktop/dev fallback.
- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a port of `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle hold → playback → smooth return → disable SDK. Body motors (0–14) lock to a live snapshot while arm motors (15–28) follow the file at 60 Hz. `_return_home()` runs unconditionally after a cancel for a jerk-free return.
- **Camera frame transport (stdin push)**: the `CameraDaemon` lives in the parent and caches frames in memory. `GeminiSubprocess` base64-encodes the latest frame to the child's stdin (~2 fps); the child's `_stdin_watcher` relays it to Gemini Live with a staleness guard. Chosen over a file drop so the parent owns the camera once and the dashboard preview reads the same cache.
- **Motion-state channel**: `arm_controller._execute()` emits `motion.action_started` / `_done` / `_error` on the event bus. `main.py` forwards each to the child as `state:\n`, injected to Gemini Live as silent `[STATE-START] wave_hand` / `[STATE-DONE] wave_hand (2.3s)` text so it can honestly answer "what are you doing?".
- **Recognition is Gemini-side**: no dlib/insightface/onnxruntime. Galleries are pure file IO; `gemini/script.py:_send_gallery_primer()` builds one multimodal `send_client_content` turn — every enrolled face/place's photos + a greeting instruction — and Gemini matches incoming frames against it in-context.

## Camera vision on Jetson

The Recognition tab needs `pyrealsense2` to talk to the Intel RealSense. **Do not `pip install pyrealsense2` on JetPack 5** — the PyPI wheel is built against glibc 2.32+ (Ubuntu 22.04) and fails to load on JetPack 5's glibc 2.31 with `ImportError: ... version 'GLIBC_2.32' not found`.

The native runtime is already there (`apt`-installed `librealsense2`). Build just the Python binding from source against it, into the `gemini_sdk` env:

```bash
rs-enumerate-devices   # confirm the D435I shows up at OS level first

source ~/miniconda3/etc/profile.d/conda.sh && conda activate gemini_sdk
pip uninstall -y pyrealsense2   # remove the broken wheel if present

sudo apt install -y cmake build-essential git python3-dev libusb-1.0-0-dev pkg-config libssl-dev

cd /tmp && rm -rf librealsense
git clone --depth=1 --branch v2.56.5 https://github.com/IntelRealSense/librealsense.git
cd librealsense && mkdir -p build && cd build
cmake .. -DBUILD_PYTHON_BINDINGS=ON -DPYTHON_EXECUTABLE=$(which python3) \
  -DBUILD_EXAMPLES=OFF -DBUILD_GRAPHICAL_EXAMPLES=OFF \
  -DBUILD_UNIT_TESTS=OFF -DCHECK_FOR_UPDATES=OFF -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) pyrealsense2

SITE=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
mkdir -p "$SITE/pyrealsense2"
cp wrappers/python/pyrealsense2*.so "$SITE/pyrealsense2/"
cp ../wrappers/python/pyrealsense2/__init__.py "$SITE/pyrealsense2/" 2>/dev/null || true

python3 -c 'import pyrealsense2 as rs; print([d.get_info(rs.camera_info.name) for d in rs.context().query_devices()])'
```

Match the `--branch` tag to the installed runtime (`dpkg -l | grep librealsense2`). If the build isn't worth it, `CameraDaemon` falls back to `cv2.VideoCapture(0)` automatically — fine for a plain USB webcam, but note a RealSense exposes its *depth* stream at `/dev/video0`, not RGB, so a real USB cam is the cleaner fallback (or pin `SANAD_CAMERA_USB_INDEX`). On x86_64 / Ubuntu 22.04+ desktops, `pip install pyrealsense2` just works.

## Dynamic paths

Every path is derived at runtime — no hard-coded `/home/...` anywhere. Resolution order for `BASE_DIR` in `config.py`:

1. `SANAD_PROJECT_ROOT` env var (if set).
2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
3. `Path(__file__).resolve().parent` — auto-detected.

The project runs unchanged from either layout:

- dev: `/Project/Sanad/`
- deployed: `/home/unitree/Sanad/`

## Deployment (workstation → robot)

```bash
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
  /path/to/Sanad/ \
  unitree@192.168.123.164:/home/unitree/Sanad/
```

Then on the robot: `Ctrl+C` the running `main.py` and re-run.

## Security

The dashboard has **no authentication**. Anyone who can reach `http://:8000` gets full robot control — locomotion, arm, audio, file upload/delete — and, via the **Terminal tab**, an interactive shell as the dashboard's user. Bind it to a **trusted LAN only**; add auth before any wider exposure.

## Troubleshooting

| Symptom | Fix |
|---|---|
| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias near the top of `main.py`. Restart. |
| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
| `No module named 'Project'` in subprocess | Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
| Controller moves rejected (409) | The Controller is **disarmed by default** — hit Arm first. Reads + E-STOP are always allowed. |
| Arm action refused while "movement armed" | Arm ↔ locomotion are mutually exclusive. Disarm/stop locomotion, then trigger the arm. |
| Voice-driven walking does nothing | "Gemini Movement" toggle off, or E-STOP latched. Toggle on; clear E-STOP. Distances are uncalibrated. |
| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`. |
| Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | Deferred trigger was queued but `trigger_enabled` toggle was off. Toggle on — or the pending-trigger poll fires it automatically once enabled. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
| Local brain exits immediately | `ollama serve` not running / model not pulled, or weights missing under `model/`. Check `logs/local_subprocess.log`. The Gemini brain is the safe default. |
| Recognition tab: "Camera could not start (no backend)" | No camera backend acquired. Check `rs-enumerate-devices` (RealSense at OS level) and `python3 -c 'import pyrealsense2'` in the `gemini_sdk` env. The glibc `ImportError` means the pip wheel is incompatible — see "Camera vision on Jetson" above. |
| Camera badge stuck on "reconnecting…" | `CameraDaemon` lost the device and is retrying with exponential backoff. Re-seat the USB 3 cable; check `logs/camera.log` for the USB-2.0 warning. |
| Gemini doesn't greet an enrolled face | Face Recognition toggle on? Vision on? (Face rec needs frames.) Check `logs/gemini_brain.log` for `face gallery primed: N person(s)`. Hit "Sync Gallery" to force a re-prime. |
| Gemini unaware of motion state | The `motion.action_*` → `send_state` chain only runs when Live Gemini is up. Check `logs/gemini_subprocess.log` and `logs/gemini_brain.log` for `STATE injected:` lines. |

## License / attribution

Internal project for YS Lootah Technology.
Reuses/ports patterns from:

- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
- `Project/Marcus` — camera→Gemini stdin-push transport, motion-state injection, camera daemon resilience (auto-reconnect, USB-2.0 warning), the `API/camera_api.py` cache shape (`get_frame_b64` / `get_fresh_frame`), and the confirmation-phrase → locomotion pattern (`movement_dispatch`).
- Unitree `unitree_sdk2py` (G1 low-level SDK, `LocoClient`, `G1ArmActionClient`, `AudioClient.PlayStream`).