2026-07-04 19:37:27 +00:00

26 KiB


Sanadv3

Voice + motion assistant for the Unitree G1 humanoid. Gemini Live (or a fully-offline pipeline) handles bilingual Arabic/English conversation; an arm controller plays built-in SDK poses and recorded JSONL macros; a locomotion controller walks/turns the robot; an optional camera feeds Gemini-side face & place recognition; everything is orchestrated through a fault-isolated FastAPI dashboard on http://<robot>:8000.

┌──────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://<robot>:8000                            │
│  ├─ Operations         Quick-fire arm actions + gestural-speaking      │
│  ├─ Voice & Audio      Live Gemini, Typed Replay, Wake Phrases, Audio  │
│  ├─ Motion & Replay    SDK actions, JSONL replays, macros, teaching    │
│  ├─ Controller         Locomotion teleop, postures, FSM modes, E-STOP  │
│  ├─ Recognition        Camera vision + face gallery + zones/places     │
│  ├─ Recordings         Skill registry, saved Gemini turns              │
│  ├─ Temperature        Live 3D motor-temperature heatmap (three.js)    │
│  ├─ Terminal           In-browser shell (PTY) to the robot             │
│  └─ Settings & Logs    System info, tail/stream live logs              │
└──────────────────────────────────────────────────────────────────────┘
        │
        ├─ voice/sanad_voice.py      (subprocess — model-agnostic voice loop)
        │    ├─ gemini/script.py     (Gemini Live brain — audio+video+state)
        │    └─ local/script.py      (offline brain — VAD→STT→LLM→TTS)
        ├─ gemini/client.py          (short-session client for Typed Replay)
        ├─ gemini/subprocess.py      (spawns+supervises sanad_voice.py;
        │                             pushes camera frames + motion state
        │                             to the child over its stdin)
        ├─ voice/movement_dispatch.py(Gemini spoken phrase → locomotion)
        ├─ vision/camera.py          (RealSense/USB capture daemon)
        ├─ vision/face_gallery.py    (data/faces/ CRUD for the primer turn)
        ├─ vision/zone_gallery.py    (data/zones/ places + "go here" targets)
        ├─ motion/arm_controller.py  (G1 arm DDS publisher — owns DDS init)
        ├─ G1_Controller/loco_controller.py (G1 locomotion via LocoClient)
        ├─ voice/audio_io.py         (mic + speaker abstraction — 3 profiles)
        └─ core/brain.py             (skill dispatcher, event bus)

Camera + face/place recognition data flow

CameraDaemon (parent, in-memory JPEG+b64 cache)
  ├─→ dashboard /api/recognition/frame.jpg   ── snapshot_jpeg()
  └─→ GeminiSubprocess._frame_forwarder      ── get_frame_b64()
                                                 │ "frame:<b64>\n" over stdin
ArmController ─emit→ event bus ─→ main.py ─→ live_sub.send_state()
                                                 │ "state:<json>\n" over stdin
                                                 ▼
                          gemini/script.py  _stdin_watcher thread
                            ├─ frame: → _LATEST_FRAME → _send_frame_loop →
                            │             session.send_realtime_input(video=Blob)
                            └─ state: → _STATE_PENDING → _send_state_loop →
                                          session.send_realtime_input(text=…)

Recognition toggles (vision / face-rec / zone-rec / movement) are written by the
dashboard to data/.recognition_state.json and POLLED by the Gemini child at 1 Hz
— so flipping a toggle takes effect mid-session with NO restart.
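A minimal sketch of the toggle hand-off described above, assuming the state file is a flat JSON object. The key names (`vision`, `face_rec`, `zone_rec`, `movement`) and the helper names are hypothetical, not taken from vision/recognition_state.py. The writer swaps the file in with os.replace() so a 1 Hz reader never sees a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical toggle names; the real keys in data/.recognition_state.json may differ.
DEFAULT_STATE = {"vision": False, "face_rec": False, "zone_rec": False, "movement": False}


def write_state(path: Path, state: dict) -> None:
    """Write JSON atomically: temp file in the same directory, then os.replace()."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # readers see the old file or the new one, never a partial
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_state(path: Path) -> dict:
    """Return the toggles, falling back to all-off when the file is missing or unreadable."""
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return dict(DEFAULT_STATE)
    if not isinstance(loaded, dict):
        return dict(DEFAULT_STATE)
    return {**DEFAULT_STATE, **loaded}


def changed_toggles(previous: dict, current: dict) -> dict:
    """Keys whose value differs between two consecutive polls."""
    return {key: value for key, value in current.items() if previous.get(key) != value}
```

A poller would call read_state() once per second and act only on the keys returned by changed_toggles(), which is one way to apply a flipped toggle mid-session without restarting the child.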

Quick start (on the robot)

conda activate gemini_sdk
cd ~/Sanad
python3 main.py

Then open http://<robot-ip>:8000 in a browser. (The dashboard binds to the wlan0 IP by default — see Runtime selection to override.)

Fully-offline brain (no cloud): SANAD_VOICE_BRAIN=local python3 main.py (requires ollama serve + the local model env — see Voice brains).

Gemini API key — required, none ships with the repo. The api_key fields in config/core_config.json (gemini_defaults) and data/motions/config.json (gemini) are intentionally empty (""). The voice loop cannot connect until you supply one, by any of:

Dashboard → Voice & Audio → Gemini API Key — paste + save, hot-swaps live (no restart). Persists to data/motions/config.json.

Env var — export SANAD_GEMINI_API_KEY=AIza... before python3 main.py.

Config file — set gemini_defaults.api_key in config/core_config.json.

Precedence (highest first): data/motions/config.json → SANAD_GEMINI_API_KEY → config/core_config.json. Get a key at https://aistudio.google.com/apikey.
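The precedence rule can be sketched as a first-non-empty lookup. This is an illustration only: resolve_gemini_key and _read_key are hypothetical names, and the real lookup in config.py may be structured differently. The file paths and JSON sections are the ones named above.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def _read_key(path: Path, section: str) -> str:
    """Return <section>.api_key from a JSON file, or "" if anything is missing."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return ""
    block = data.get(section) if isinstance(data, dict) else None
    key = block.get("api_key", "") if isinstance(block, dict) else ""
    return key.strip() if isinstance(key, str) else ""


def resolve_gemini_key(project_root: Path, env: Optional[Mapping[str, str]] = None) -> str:
    """First non-empty source wins: dashboard file, then env var, then core config."""
    env = os.environ if env is None else env
    candidates = (
        _read_key(project_root / "data" / "motions" / "config.json", "gemini"),
        env.get("SANAD_GEMINI_API_KEY", "").strip(),
        _read_key(project_root / "config" / "core_config.json", "gemini_defaults"),
    )
    return next((candidate for candidate in candidates if candidate), "")
```

An empty string from every source corresponds to the "voice loop cannot connect" case: the repo ships with both api_key fields empty.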

Dashboard features

Operations

Quick-fire SDK + JSONL arm actions (chip buttons), gestural-speaking toggle.

Voice & Audio

Live Voice Commands — fire arm gestures from the user's transcript (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
Live Gemini Process — start/stop the voice conversation subprocess, tail its log. Choose the Gemini cloud brain or the offline brain via SANAD_VOICE_BRAIN.
Typed Replay — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt); optionally records the clip.
Gemini API Key — hot-swap the key without restart.
Wake Phrase Manager — add/remove phrase → action bindings.
Audio Controls — mic/speaker mute, G1 chest-speaker volume (DDS), device profile selection, PulseAudio soft-reset and Anker USB hard-reset.

Motion & Replay

Motion Control — list SDK (built-in) + JSONL (recorded) actions, select + play. Cancel smoothly returns to arm_home.jsonl.
Replay Manager — upload .jsonl files, test-play with speed, Teaching Mode (kinesthetic record — limp the arm and hand-guide it).
Macro Recorder — record a new audio+motion pair, OR pick any WAV + any motion (SDK or JSONL) and play them in parallel.

Controller (locomotion)

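Manual teleoperation of the G1's legs via the Unitree LocoClient. The controller boots disarmed on every start, and every motion write is rejected until you press Arm (the controller's enable button, unrelated to the robot's arms).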

Move / Step — continuous teleop (vx/vy/vyaw) or discrete one-shot steps.
Postures & FSM modes — zero-torque, damp, squat, sit, stand, balance, stand-height; prep/ready sequences; MotionSwitcher select-AI/release.
Gemini Movement — toggle voice-driven walking: the MovementDispatcher parses Gemini's own spoken confirmation phrases ("Turning right." / "أستدير يميناً.") and drives the legs (gated on this toggle + an E-STOP latch).
E-STOP — always available; StopMove + disarm + latch the dispatcher.
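To make the Gemini Movement gating concrete, here is a small sketch of a phrase matcher that only returns a command when the toggle is on and no E-STOP is latched. Everything in it is assumed for illustration: the PHRASES table, the ("turn_right", 90.0) command tuple, and the MovementGate class are hypothetical, and the real phrase map in data/motions/instruction.json has its own schema.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical phrase map: (command name, placeholder amount). Not the real schema.
PHRASES: Dict[str, Tuple[str, float]] = {
    "Turning right.": ("turn_right", 90.0),
    "أستدير يميناً.": ("turn_right", 90.0),
}


def normalize(text: str) -> str:
    """Lowercase, drop combining marks (Arabic diacritics), turn punctuation into spaces."""
    kept = []
    for ch in unicodedata.normalize("NFKD", text):
        category = unicodedata.category(ch)
        if category.startswith("M"):  # combining marks such as tanween or hamza-above
            continue
        kept.append(" " if category.startswith("P") else ch.lower())
    return " ".join("".join(kept).split())


NORMALIZED = {normalize(phrase): command for phrase, command in PHRASES.items()}


@dataclass
class MovementGate:
    enabled: bool = False        # the "Gemini Movement" toggle, off by default
    estop_latched: bool = False  # set by E-STOP, cleared only by an explicit reset

    def estop(self) -> None:
        self.estop_latched = True

    def clear_estop(self) -> None:
        self.estop_latched = False

    def dispatch(self, bot_text: str) -> Optional[Tuple[str, float]]:
        """Return the mapped command, or None when gated or unmatched."""
        if not self.enabled or self.estop_latched:
            return None
        return NORMALIZED.get(normalize(bot_text))
```

Both the stored phrases and the incoming text pass through the same normalize() call, so trailing punctuation and diacritics do not affect matching.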

Safety: the arm and locomotion are mutually exclusive — arm.set_motion_block(loco.movement_active) makes every arm replay/gesture refuse while the robot is (or just was, within ~1.5 s) walking.
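The interlock can be pictured as the arm holding a predicate that it checks before every replay. The sketch below assumes set_motion_block takes a callable and that the grace window is a fixed 1.5 s; LocoActivity and ArmGate are hypothetical stand-ins, not the real LocoController or ArmController.

```python
import time
from typing import Callable

GRACE_SEC = 1.5  # "just was walking" window from the text; the exact constant is assumed


class LocoActivity:
    """Tracks whether the robot is walking now or stopped less than GRACE_SEC ago."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._moving = False
        self._last_motion = float("-inf")

    def on_move(self) -> None:
        self._moving = True
        self._last_motion = self._clock()

    def on_stop(self) -> None:
        self._moving = False
        self._last_motion = self._clock()

    def movement_active(self) -> bool:
        return self._moving or (self._clock() - self._last_motion) < GRACE_SEC


class ArmGate:
    """Stand-in for the arm controller's refusal check."""

    def __init__(self) -> None:
        self._blocked: Callable[[], bool] = lambda: False

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        self._blocked = predicate

    def play(self, action: str) -> bool:
        """Return False (refuse) while the predicate reports movement."""
        if self._blocked():
            return False
        # A real controller would start the replay for `action` here.
        return True
```

Passing the bound method (arm.set_motion_block(loco.movement_active)) rather than a boolean means the arm evaluates the walking state at the moment of each request instead of at wiring time.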

Recognition

Camera vision + Gemini-side face and zone/place recognition. All are off by default; each is a hot toggle (≈1 s to take effect, no restart).

Camera Vision — CameraDaemon captures from a RealSense (preferred) or USB camera; the supervisor streams JPEG frames to Gemini Live so it can answer "what do you see?". Live preview panel. Auto-reconnects on USB unplug/stall and warns if a RealSense negotiated USB 2.0 (Marcus-ported resilience).
Face Recognition — manage data/faces/face_{id}/ galleries: enroll from the live camera or upload photos, rename, describe, download (per-photo or ZIP), delete. On session start (and on any gallery change) the child sends a primer turn carrying every enrolled face + a Khaleeji greeting instruction — Gemini matches in-context, so there is no local face-recognition model. Recognition needs vision on.
Zones & Places — data/zones/zone_{zid}/place_{pid}/ two-level gallery: reference photos per place, optional linked face_ids, and a "go here" nav target (nav_target_zone/place_id in the recognition-state file) for place-aware navigation.
Sync Gallery — force-resend the face/zone primer to the live session.

Recordings

Skill Registry (predefined audio+motion+callback skills from skills.json) + Saved Records (captured Gemini turn recordings; play/pause/stop/rename/delete).

Temperature

Live 3D motor-temperature heatmap — a standalone three.js viewer (dashboard/static/temp3d/) loads the G1 29-DoF URDF + STL meshes and colors each joint blue→red from the arm controller's throttled rt/lowstate snapshot, streamed over /ws/motor-temps at ~8 fps. No second DDS subscriber.

Terminal

In-browser PTY shell to the robot (/ws/terminal, xterm.js): a bash -i running as the dashboard's user, with resize + backpressure, bounded to 4 sessions. (See Security: this gives full shell access to anyone who can reach the URL.)

Settings & Logs

System info (host, network interfaces, DDS interface, bound dashboard host/port, per-subsystem status, audio devices), live log stream (/ws/logs), per-file tail, snapshot, and a one-blob "Copy All Logs" bundle.

Directory layout

Path	Contents
`main.py`	Entry point — fault-isolated boot of all subsystems + the dashboard. Doubles as the service container (route handlers `import` its module globals).
`config.py`	Runtime constants + layout-agnostic path resolution; layers `data/motions/config.json` over the JSON config at import.
`config/`	Per-subsystem JSON: `core`, `voice`, `gemini`, `local`, `motion`, `dashboard`.
`core/`	`brain.py` (skill dispatcher), `event_bus.py`, `skill_registry.py`, `config_loader.py`, `logger.py` (rotating + WS push), `asyncio_compat.py` (3.8 `to_thread` shim).
`gemini/`	Gemini Live — `client.py` (one-shot), `script.py` (live brain: audio + video + motion-state), `subprocess.py` (supervisor + stdin frame/state push).
`local/`	Fully-offline brain — `vad.py` (Silero), `stt.py` (faster-whisper), `llm.py` (Qwen via Ollama/llama.cpp), `tts.py` (CosyVoice2), `script.py` (the brain), `subprocess.py` (supervisor). Opt-in via `SANAD_VOICE_BRAIN=local`.
`voice/`	`sanad_voice.py` (subprocess entry, model-agnostic), `audio_io.py` / `audio_manager.py` / `audio_devices.py` (mic/speaker), `local_tts.py` (SpeechT5 Arabic TTS), `live_voice_loop.py` (user-transcript → arm gesture), `movement_dispatch.py` (Gemini-phrase → locomotion), `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py` (Arabic normalization + phrase matching), `model_script.py` / `model_subprocess.py` (brain templates).
`motion/`	`arm_controller.py` (production 5-phase JSONL replay engine, owns the single DDS init), `macro_player.py`, `macro_recorder.py`, `teaching.py`. (`sanad_arm_controller.py` is a legacy alternate — not wired by `main.py`.)
`G1_Controller/`	`loco_controller.py` — locomotion via Unitree `LocoClient` (move/step/postures/FSM/E-STOP); reuses the arm's DDS participant.
`vision/`	`camera.py` (RealSense/USB daemon, auto-reconnect), `face_gallery.py`, `zone_gallery.py`, `recognition_state.py` (atomic-JSON toggle IPC).
`dashboard/`	`app.py` (FastAPI factory + fault-isolated router registration), `routes/*.py` (20 REST routers), `websockets/*.py` (logs, motor-temps, terminal), `static/index.html` (single-page UI), `static/temp3d/` (3D viewer).
`scripts/`	Persona files — `sanad_script.txt` (voice persona "Bousandah"), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases).
`data/`	Runtime state — `motions/*.jsonl` (arm trajectories) + `instruction.json` (locomotion phrase map) + `skills.json` + `config.json` (dashboard-editable), `recordings/` (captured turns + macros), `faces/face_{id}/` + `zones/zone_{zid}/place_{pid}/` (galleries), `audio/` (typed-replay WAVs + records index), `.recognition_state.json` (toggle IPC).
`model/`	Local SpeechT5 / Whisper / CosyVoice2 weights when using the offline pipeline.
`logs/`	Per-module rotating logs.

Voice brains

The child voice/sanad_voice.py is model-agnostic and selects a brain via SANAD_VOICE_BRAIN. Every brain implements the same contract (__init__(audio_io, recorder, voice, system_prompt), async run(), stop()) and ships a sibling supervisor that spawns the child and parses its USER: / BOT: / state log markers.

Value	Brain	Pipeline
`gemini` (default)	`gemini/script.py`	Gemini Live native-audio (full-duplex speech-to-speech, server-side VAD, vision frames, face/zone primers, voice→movement). Cloud.
`local`	`local/script.py`	Silero VAD → faster-whisper (large-v3-turbo, CUDA int8) → Qwen2.5 (Ollama/llama.cpp) → CosyVoice2 streaming TTS. Fully on-device.
`model`	`voice/model_script.py`	Template/stub for adding a new provider (OpenAI Realtime, Claude Voice, …).

To add a brain: drop a file in voice/ or a new <brand>/ folder and add a branch to voice/sanad_voice.py:_build_brain(); ship a supervisor modeled on voice/model_subprocess.py.
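As a rough illustration of the brain contract, the stub below implements the three required members and nothing else. EchoBrain and build_brain are hypothetical: the real template is voice/model_script.py, and the real selection logic lives in voice/sanad_voice.py:_build_brain().

```python
import asyncio
from typing import Any


class EchoBrain:
    """Hypothetical do-nothing brain that satisfies the constructor/run/stop contract."""

    def __init__(self, audio_io: Any, recorder: Any, voice: str, system_prompt: str) -> None:
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self.turns = 0
        self._stop_requested = False

    async def run(self) -> None:
        """Main loop; returns once stop() has been called."""
        while not self._stop_requested:
            # A real brain would read mic audio from audio_io and stream replies here.
            self.turns += 1
            await asyncio.sleep(0.01)

    def stop(self) -> None:
        self._stop_requested = True


def build_brain(name: str, audio_io: Any, recorder: Any, voice: str, system_prompt: str) -> Any:
    """Illustrative stand-in for the per-brain branch in _build_brain()."""
    registry = {"echo": EchoBrain}
    try:
        brain_cls = registry[name]
    except KeyError:
        raise ValueError(f"unknown SANAD_VOICE_BRAIN value: {name!r}") from None
    return brain_cls(audio_io, recorder, voice, system_prompt)
```

A plain boolean flag is used for stop() so the object can be constructed before any event loop exists, which matters on older Python versions where asyncio primitives bind to a loop at construction time.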

Runtime selection (env vars)

Var	Values	Default	Effect
`SANAD_VOICE_BRAIN`	`gemini`, `local`, `model`	`gemini`	Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`).
`SANAD_AUDIO_PROFILE`	`builtin`, `anker`, `hollyland_builtin`	`builtin`	Mic + speaker pair. `builtin` = G1 UDP mic + G1 chest speaker via DDS.
`SANAD_DDS_INTERFACE`	network iface	`eth0`	DDS network for G1 low-level comms (arm + locomotion + speaker).
`SANAD_DASHBOARD_HOST` / `_INTERFACE`	IP / iface	`wlan0` IP	Dashboard bind address.
`SANAD_GEMINI_API_KEY`	string	`""` (empty)	Gemini API key. No key ships in the repo — set this, paste one in the dashboard (Voice & Audio → Gemini API Key), or fill `gemini_defaults.api_key` in `config/core_config.json`. See Quick start.
`SANAD_GEMINI_MODEL` / `_VOICE`	string	reads config	Override the Gemini model id / prebuilt voice.
`SANAD_G1_VOLUME`	`0`–`100`	`100`	G1 chest-speaker volume; also scales the barge-in threshold.
`SANAD_LIVE_SCRIPT`	path	auto	Override the subprocess entry script path.
`SANAD_RECORD`	`0` or `1`	`1`	Record every Gemini turn to `data/recordings/`.
`SANAD_AEC_ENABLE`	`0` or `1`	`1`	Enable WebRTC AEC3 (if the Python binding is installed).
`SANAD_VISION_ENABLE`	`0` or `1`	`0`	Boot default for camera vision. Runtime truth is the Recognition-tab toggle → `data/.recognition_state.json`, hot-applied without a restart.
`SANAD_FACE_RECOGNITION_ENABLE`	`0` or `1`	`0`	Boot default for Gemini-side face recognition. Also a hot toggle.
`SANAD_VISION_SEND_HZ`	float	`2`	Frames/sec the Gemini child relays to Live.
`SANAD_CAMERA_WIDTH` / `_HEIGHT` / `_FPS`	int	`424` / `240` / `15`	Capture profile. Also settable per-deploy in `config/core_config.json > camera`.
`SANAD_CAMERA_USB_INDEX`	int	auto	Pin a `/dev/videoN` node (avoids picking a RealSense IR stream).
`SANAD_FACES_MAX_SAMPLES`	int	`3`	Max photos per person fed into the gallery primer turn (token budget).
`SANAD_PROJECT_ROOT`	path	auto	Override the project root (see Dynamic paths).

All SANAD_VISION_* / SANAD_CAMERA_* / SANAD_FACE_* / SANAD_FACES_* vars are boot defaults forwarded to the Gemini child via LIVE_TUNE. Once running, the Recognition tab's toggles (vision / face-rec / zone-rec / movement) are the live source of truth in data/.recognition_state.json, polled at 1 Hz.
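For reference, the boot defaults in the table can be read with a few lines of standard-library code. This is a sketch under assumptions: VisionBootDefaults and load_vision_defaults are hypothetical names, the parsing rules (only "1" counts as on) are guessed from the `0` or `1` value column, and the LIVE_TUNE forwarding step is not modelled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VisionBootDefaults:
    vision_enable: bool
    face_recognition_enable: bool
    send_hz: float
    width: int
    height: int
    fps: int
    faces_max_samples: int


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    """Treat "1" as on and anything else as off (assumed convention)."""
    return env.get(name, default).strip() == "1"


def load_vision_defaults(env: Optional[Mapping[str, str]] = None) -> VisionBootDefaults:
    """Read the vision/camera env vars, using the defaults listed in the table."""
    env = os.environ if env is None else env
    return VisionBootDefaults(
        vision_enable=_flag(env, "SANAD_VISION_ENABLE", "0"),
        face_recognition_enable=_flag(env, "SANAD_FACE_RECOGNITION_ENABLE", "0"),
        send_hz=float(env.get("SANAD_VISION_SEND_HZ", "2")),
        width=int(env.get("SANAD_CAMERA_WIDTH", "424")),
        height=int(env.get("SANAD_CAMERA_HEIGHT", "240")),
        fps=int(env.get("SANAD_CAMERA_FPS", "15")),
        faces_max_samples=int(env.get("SANAD_FACES_MAX_SAMPLES", "3")),
    )
```

These values only seed the session; after boot the recognition-state file takes over as the source of truth for the toggles.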

CLI flags: python3 main.py --host <ip> --port 8000 --network <dds_iface>; --check-env prints a subsystem/environment diagnostic and exits.

API surface

All routes are registered defensively — a router whose import fails is recorded (GET /api/_dashboard_status) and the server still boots without it.
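Defensive registration of this kind usually comes down to wrapping each import in its own try/except and keeping a status record. The sketch below is not the code in dashboard/app.py; register_routers is a hypothetical helper, and the convention that each route module exposes a `router` attribute is an assumption.

```python
import importlib
from typing import Callable, Dict, Iterable, List


def register_routers(
    module_names: Iterable[str],
    include: Callable[[object], None],
) -> Dict[str, List[str]]:
    """Import each router module in isolation; record failures instead of raising."""
    status: Dict[str, List[str]] = {"loaded": [], "failed": []}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            include(module.router)  # assumed convention: module-level `router`
        except Exception as exc:  # broad on purpose: one bad router must not stop boot
            status["failed"].append(f"{name}: {type(exc).__name__}: {exc}")
        else:
            status["loaded"].append(name)
    return status
```

With FastAPI, `include` would typically be `app.include_router`, and the returned dictionary is the kind of data a status endpoint such as GET /api/_dashboard_status could serve.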

REST (prefix → controls): /api health · /api/system info · /api/voice Gemini/local generate+connect+key · /api/motion arm actions · /api/skills skill registry · /api/macros record/play · /api/replay JSONL CRUD + teaching · /api/audio mute/volume/devices/reset · /api/scripts persona files · /api/records saved WAVs · /api/prompt system prompt · /api/wake-phrases bindings · /api/live-voice arm-phrase dispatcher · /api/live-subprocess Gemini child · /api/typed-replay TTS · /api/recognition vision + face gallery · /api/zones zones/places + nav target · /api/temp motor map + snapshot · /api/controller locomotion (move/step/postures/modes/E-STOP).

WebSockets: /ws/logs (live log stream + 500-line replay) · /ws/motor-temps (3D heatmap data, ~8 fps) · /ws/terminal (PTY shell).

Architecture notes

Subprocess isolation: voice/sanad_voice.py runs as a child of main.py via the supervisor. If the voice loop crashes, the dashboard + arm + legs stay up.
Single DDS init: motion/arm_controller.py owns the one ChannelFactoryInitialize; LocoController and the audio routes reuse that participant rather than re-initializing.
Brain contract: see voice/model_script.py — any new model implements __init__(audio_io, recorder, voice, system_prompt), async run(), stop().
Supervisor contract: each brain ships a sibling supervisor (e.g. gemini/subprocess.py) that spawns sanad_voice.py with its SANAD_VOICE_BRAIN and parses the brain's log markers. Template: voice/model_subprocess.py.
Locomotion safety: LocoController is disarmed every boot, has velocity caps + a StopMove watchdog, and is mutually exclusive with the arm. Voice-driven movement is off by default and gated by the Controller toggle. Distances/degrees in data/motions/instruction.json are approximate and must be calibrated on the real robot — there is no obstacle/abort stack.
Audio routing: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (play_wav, typed-replay audio, record playback) routes through DDS AudioClient.PlayStream via audio_manager._play_pcm_via_g1. The PyAudio path is a desktop/dev fallback.
Arm replay: motion/arm_controller.py:_replay_file_inner() is a port of G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run() — ramp-in → settle hold → playback → smooth return → disable SDK. Body motors (0–14) lock to a live snapshot while arm motors (15–28) follow the file at 60 Hz. _return_home() runs unconditionally after a cancel for a jerk-free return.
Camera frame transport (stdin push): the CameraDaemon lives in the parent and caches frames in memory. GeminiSubprocess base64-encodes the latest frame to the child's stdin (~2 fps); the child's _stdin_watcher relays it to Gemini Live with a staleness guard. Chosen over a file drop so the parent owns the camera once and the dashboard preview reads the same cache.
Motion-state channel: arm_controller._execute() emits motion.action_started / _done / _error on the event bus. main.py forwards each to the child as state:<json>\n, injected to Gemini Live as silent [STATE-START] wave_hand / [STATE-DONE] wave_hand (2.3s) text so it can honestly answer "what are you doing?".
Recognition is Gemini-side: no dlib/insightface/onnxruntime. Galleries are pure file IO; gemini/script.py:_send_gallery_primer() builds one multimodal send_client_content turn — every enrolled face/place's photos + a greeting instruction — and Gemini matches incoming frames against it in-context.
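The stdin transport in the camera and motion-state notes is a line protocol with two prefixes, frame: and state:. A minimal parser for it might look like the following. The staleness limit, the event dictionary keys (`event`, `action`, `duration`), and all function names are assumptions made for the example; only the line prefixes and the [STATE-START] / [STATE-DONE] text shapes come from the notes above.

```python
import base64
import json
import time
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_FRAME_AGE_SEC = 1.0  # hypothetical staleness limit; the real value is not documented here


@dataclass
class LatestFrame:
    jpeg: bytes
    received_at: float

    def fresh(self, now: float) -> bool:
        """True while the frame is recent enough to be worth relaying."""
        return (now - self.received_at) <= MAX_FRAME_AGE_SEC


def parse_line(line: str, now: Optional[float] = None) -> Tuple[str, object]:
    """Split one stdin line into ("frame", LatestFrame), ("state", dict) or ("ignored", raw)."""
    now = time.monotonic() if now is None else now
    kind, sep, payload = line.rstrip("\n").partition(":")
    try:
        if sep and kind == "frame":
            return "frame", LatestFrame(base64.b64decode(payload, validate=True), now)
        if sep and kind == "state":
            data = json.loads(payload)
            if isinstance(data, dict):
                return "state", data
    except ValueError:
        pass  # malformed base64 or JSON: drop the line rather than crash the watcher
    return "ignored", line


def state_text(event: dict) -> Optional[str]:
    """Render a motion event as the silent text marker; the event keys are assumed."""
    name = event.get("action", "?")
    if event.get("event") == "action_started":
        return f"[STATE-START] {name}"
    if event.get("event") == "action_done":
        return f"[STATE-DONE] {name} ({float(event.get('duration', 0.0)):.1f}s)"
    return None  # the marker used for errors is not shown in the notes above
```

Keeping only the latest frame and checking fresh() before each send is one way to implement a staleness guard: if the parent stalls, the child stops relaying an old image instead of presenting it as current.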

Camera vision on Jetson

The Recognition tab needs pyrealsense2 to talk to the Intel RealSense. Do not pip install pyrealsense2 on JetPack 5 — the PyPI wheel is built against glibc 2.32+ (Ubuntu 22.04) and fails to load on JetPack 5's glibc 2.31 with ImportError: ... version 'GLIBC_2.32' not found.

The native runtime is already there (apt-installed librealsense2). Build just the Python binding from source against it, into the gemini_sdk env:

rs-enumerate-devices            # confirm the D435I shows up at OS level first

source ~/miniconda3/etc/profile.d/conda.sh && conda activate gemini_sdk
pip uninstall -y pyrealsense2   # remove the broken wheel if present
sudo apt install -y cmake build-essential git python3-dev libusb-1.0-0-dev pkg-config libssl-dev

cd /tmp && rm -rf librealsense
git clone --depth=1 --branch v2.56.5 https://github.com/IntelRealSense/librealsense.git
cd librealsense && mkdir -p build && cd build
cmake .. -DBUILD_PYTHON_BINDINGS=ON -DPYTHON_EXECUTABLE=$(which python3) \
         -DBUILD_EXAMPLES=OFF -DBUILD_GRAPHICAL_EXAMPLES=OFF \
         -DBUILD_UNIT_TESTS=OFF -DCHECK_FOR_UPDATES=OFF -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) pyrealsense2
SITE=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
mkdir -p "$SITE/pyrealsense2"
cp wrappers/python/pyrealsense2*.so "$SITE/pyrealsense2/"
cp ../wrappers/python/pyrealsense2/__init__.py "$SITE/pyrealsense2/" 2>/dev/null || true

python3 -c 'import pyrealsense2 as rs; print([d.get_info(rs.camera_info.name) for d in rs.context().query_devices()])'

Match the --branch tag to the installed runtime (dpkg -l | grep librealsense2). If the build isn't worth it, CameraDaemon falls back to cv2.VideoCapture(0) automatically — fine for a plain USB webcam, but note a RealSense exposes its depth stream at /dev/video0, not RGB, so a real USB cam is the cleaner fallback (or pin SANAD_CAMERA_USB_INDEX). On x86_64 / Ubuntu 22.04+ desktops, pip install pyrealsense2 just works.

Dynamic paths

Every path is derived at runtime — no hard-coded /home/... anywhere. Resolution order for BASE_DIR in config.py:

SANAD_PROJECT_ROOT env var (if set).
PROJECT_BASE + PROJECT_NAME from a .env file in Sanad/ or its parent.
Path(__file__).resolve().parent — auto-detected.
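The resolution order above can be sketched as follows. This is not the code in config.py: resolve_base_dir and read_dotenv are hypothetical names, the .env reader is deliberately tiny, and joining PROJECT_BASE and PROJECT_NAME as a path is an assumption about how the two values combine.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Tiny KEY=VALUE reader; ignores blanks, comments, and lines without '='."""
    values: Dict[str, str] = {}
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return values
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_dir(here: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Env override first, then a .env in the project dir or its parent, then `here`."""
    env = os.environ if env is None else env
    override = env.get("SANAD_PROJECT_ROOT", "").strip()
    if override:
        return Path(override).expanduser().resolve()
    for candidate in (here / ".env", here.parent / ".env"):
        dotenv = read_dotenv(candidate)
        base, name = dotenv.get("PROJECT_BASE"), dotenv.get("PROJECT_NAME")
        if base and name:
            return (Path(base).expanduser() / name).resolve()
    return here.resolve()
```

In a config module, `here` would be Path(__file__).resolve().parent, which is what makes the same tree work from both layouts listed next.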

The project runs unchanged from either layout:

dev: <anywhere>/Project/Sanad/
deployed: /home/unitree/Sanad/

Deployment (workstation → robot)

rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
  /path/to/Sanad/ \
  unitree@192.168.123.164:/home/unitree/Sanad/

Then on the robot: Ctrl+C the running main.py and re-run.

Security

The dashboard has no authentication. Anyone who can reach http://<robot>:8000 gets full robot control — locomotion, arm, audio, file upload/delete — and, via the Terminal tab, an interactive shell as the dashboard's user. Bind it to a trusted LAN only; add auth before any wider exposure.

Troubleshooting

Symptom	Fix
`No LowState received in 2s — refusing to replay`	`main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias near the top of `main.py`. Restart.
`G1ArmActionClient not available — skipping` for SDK actions	Same duplicate-init issue as above.
`No module named 'Project'` in subprocess	Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`.
Controller moves rejected (409)	The Controller is disarmed by default — hit Arm first. Reads + E-STOP are always allowed.
Arm action refused while "movement armed"	Arm ↔ locomotion are mutually exclusive. Disarm/stop locomotion, then trigger the arm.
Voice-driven walking does nothing	"Gemini Movement" toggle off, or E-STOP latched. Toggle on; clear E-STOP. Distances are uncalibrated.
Arm jumps at start of JSONL replay	`SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`.
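Record playback silent	`audio_mgr.play_wav` routes to G1 DDS only if the Unitree SDK is importable; otherwise it falls back to the desktop PyAudio/PulseAudio path, and the G1's PulseAudio sink is not wired to a physical speaker (see Audio routing under Architecture notes). Confirm the SDK imports in the environment that runs `main.py`.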
Live Voice Commands transcript stuck	A deferred trigger was queued while the `trigger_enabled` toggle was off. Turn the toggle on; the pending-trigger poll then fires the queued trigger automatically.
Gemini "no audio" on Typed Replay	Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path.
Local brain exits immediately	`ollama serve` not running / model not pulled, or weights missing under `model/`. Check `logs/local_subprocess.log`. The Gemini brain is the safe default.
Recognition tab: "Camera could not start (no backend)"	No camera backend acquired. Check `rs-enumerate-devices` (RealSense at OS level) and `python3 -c 'import pyrealsense2'` in the `gemini_sdk` env. The glibc `ImportError` means the pip wheel is incompatible — see "Camera vision on Jetson" above.
Camera badge stuck on "reconnecting…"	`CameraDaemon` lost the device and is retrying with exponential backoff. Re-seat the USB 3 cable; check `logs/camera.log` for the USB-2.0 warning.
Gemini doesn't greet an enrolled face	Face Recognition toggle on? Vision on? (Face rec needs frames.) Check `logs/gemini_brain.log` for `face gallery primed: N person(s)`. Hit "Sync Gallery" to force a re-prime.
Gemini unaware of motion state	The `motion.action_*` → `send_state` chain only runs when Live Gemini is up. Check `logs/gemini_subprocess.log` and `logs/gemini_brain.log` for `STATE injected:` lines.

License / attribution

Internal project for YS Lootah Technology. Reuses/ports patterns from:

G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py (arm replay math)
SanadVoice/gemini_interact (arm-phrase dispatch, skill registry)
SanadVoice/gemini_voice_v2 (local SpeechT5 TTS)
Project/Marcus — camera→Gemini stdin-push transport, motion-state injection, camera daemon resilience (auto-reconnect, USB-2.0 warning), the API/camera_api.py cache shape (get_frame_b64 / get_fresh_frame), and the confirmation-phrase → locomotion pattern (movement_dispatch).
Unitree unitree_sdk2py (G1 low-level SDK, LocoClient, G1ArmActionClient, AudioClient.PlayStream).
