Sanadv3
Voice + motion assistant for the Unitree G1 humanoid. Gemini Live (or a
fully-offline pipeline) handles bilingual Arabic/English conversation; an arm
controller plays built-in SDK poses and recorded JSONL macros; a locomotion
controller walks/turns the robot; an optional camera feeds Gemini-side face &
place recognition; everything is orchestrated through a fault-isolated
FastAPI dashboard on http://<robot>:8000.
┌──────────────────────────────────────────────────────────────────────┐
│ Dashboard (FastAPI) ── http://<robot>:8000 │
│ ├─ Operations Quick-fire arm actions + gestural-speaking │
│ ├─ Voice & Audio Live Gemini, Typed Replay, Wake Phrases, Audio │
│ ├─ Motion & Replay SDK actions, JSONL replays, macros, teaching │
│ ├─ Controller Locomotion teleop, postures, FSM modes, E-STOP │
│ ├─ Recognition Camera vision + face gallery + zones/places │
│ ├─ Recordings Skill registry, saved Gemini turns │
│ ├─ Temperature Live 3D motor-temperature heatmap (three.js) │
│ ├─ Terminal In-browser shell (PTY) to the robot │
│ └─ Settings & Logs System info, tail/stream live logs │
└──────────────────────────────────────────────────────────────────────┘
│
├─ voice/sanad_voice.py (subprocess — model-agnostic voice loop)
│ ├─ gemini/script.py (Gemini Live brain — audio+video+state)
│ └─ local/script.py (offline brain — VAD→STT→LLM→TTS)
├─ gemini/client.py (short-session client for Typed Replay)
├─ gemini/subprocess.py (spawns+supervises sanad_voice.py;
│ pushes camera frames + motion state
│ to the child over its stdin)
├─ voice/movement_dispatch.py(Gemini spoken phrase → locomotion)
├─ vision/camera.py (RealSense/USB capture daemon)
├─ vision/face_gallery.py (data/faces/ CRUD for the primer turn)
├─ vision/zone_gallery.py (data/zones/ places + "go here" targets)
├─ motion/arm_controller.py (G1 arm DDS publisher — owns DDS init)
├─ G1_Controller/loco_controller.py (G1 locomotion via LocoClient)
├─ voice/audio_io.py (mic + speaker abstraction — 3 profiles)
└─ core/brain.py (skill dispatcher, event bus)
Camera + face/place recognition data flow
CameraDaemon (parent, in-memory JPEG+b64 cache)
├─→ dashboard /api/recognition/frame.jpg ── snapshot_jpeg()
└─→ GeminiSubprocess._frame_forwarder ── get_frame_b64()
│ "frame:<b64>\n" over stdin
ArmController ─emit→ event bus ─→ main.py ─→ live_sub.send_state()
│ "state:<json>\n" over stdin
▼
gemini/script.py _stdin_watcher thread
├─ frame: → _LATEST_FRAME → _send_frame_loop →
│ session.send_realtime_input(video=Blob)
└─ state: → _STATE_PENDING → _send_state_loop →
session.send_realtime_input(text=…)
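The two prefixes in the diagram above (frame: and state:) suggest a simple newline-delimited protocol on the child's stdin. The following is a minimal sketch of how such lines could be split, assuming one message per line; parse_stdin_line and its return shape are hypothetical names for illustration, not the actual _stdin_watcher code.

```python
import base64
import json
from typing import Optional, Tuple, Union

Message = Tuple[str, Union[bytes, dict]]


def parse_stdin_line(line: str) -> Optional[Message]:
    """Split one 'kind:payload' line into ('frame', jpeg_bytes) or ('state', dict).

    Returns None for blank, unknown, or malformed lines so that one bad
    message never kills the reader loop.
    """
    kind, sep, payload = line.rstrip("\n").partition(":")
    if not sep:
        return None
    try:
        if kind == "frame":
            # validate=True rejects characters outside the base64 alphabet
            return "frame", base64.b64decode(payload, validate=True)
        if kind == "state":
            obj = json.loads(payload)
            return ("state", obj) if isinstance(obj, dict) else None
    except ValueError:  # parent class of binascii.Error and json.JSONDecodeError
        return None
    return None
```

Splitting on the first colon only keeps JSON payloads (which contain colons themselves) intact.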
Recognition toggles (vision / face-rec / zone-rec / movement) are written by the
dashboard to data/.recognition_state.json and POLLED by the Gemini child at 1 Hz
— so flipping a toggle takes effect mid-session with NO restart.
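A file-based toggle channel like this is usually safe only if writes are atomic, so the reader never sees a half-written file. The sketch below shows one way to do that (write to a temp file, then rename) together with a poller that keeps the last good state. It is an illustration under assumptions: write_state_atomic, StatePoller and the "vision" key are hypothetical and not taken from vision/recognition_state.py.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomic(path: Path, state: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # readers see the old file or the new one, never a partial write
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


class StatePoller:
    """Re-read the toggle file on each poll and keep the last good state."""

    def __init__(self, path: Path, defaults: dict):
        self.path = path
        self.state = dict(defaults)

    def poll(self) -> bool:
        """Return True if the merged state changed since the previous poll."""
        try:
            loaded = json.loads(self.path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return False  # missing or unreadable file: keep the last good state
        if not isinstance(loaded, dict):
            return False
        merged = {**self.state, **loaded}
        changed = merged != self.state
        self.state = merged
        return changed
```

A 1 Hz consumer would call poll() in a loop with a one-second sleep and react only when it returns True.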
Quick start (on the robot)
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
Then open http://<robot-ip>:8000 in a browser. (The dashboard binds to the
wlan0 IP by default — see Runtime selection to override.)
Fully-offline brain (no cloud): SANAD_VOICE_BRAIN=local python3 main.py
(requires ollama serve + the local model env — see Voice brains).
Gemini API key — required, none ships with the repo. The api_key fields in
config/core_config.json (gemini_defaults) and data/motions/config.json (gemini)
are intentionally empty (""). The voice loop cannot connect until you supply
one, by any of:
- Dashboard → Voice & Audio → Gemini API Key — paste + save, hot-swaps live (no restart). Persists to data/motions/config.json.
- Env var — export SANAD_GEMINI_API_KEY=AIza... before python3 main.py.
- Config file — set gemini_defaults.api_key in config/core_config.json.

Precedence (highest first): data/motions/config.json → SANAD_GEMINI_API_KEY → config/core_config.json. Get a key at https://aistudio.google.com/apikey.
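The precedence above can be read as "first non-empty value wins". A minimal sketch of that lookup follows, assuming each JSON file holds the key at section.api_key (gemini and gemini_defaults, as named above); resolve_gemini_api_key and _read_key are hypothetical helpers, not functions from config.py.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def _read_key(path: Path, section: str) -> str:
    """Return <section>.api_key from a JSON file, or "" if missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return ""
    sec = data.get(section) if isinstance(data, dict) else None
    key = sec.get("api_key", "") if isinstance(sec, dict) else ""
    return key.strip() if isinstance(key, str) else ""


def resolve_gemini_api_key(root: Path, env: Optional[Mapping[str, str]] = None) -> str:
    """Highest precedence first: dashboard file, then env var, then core config."""
    env = os.environ if env is None else env
    candidates = (
        _read_key(root / "data" / "motions" / "config.json", "gemini"),
        env.get("SANAD_GEMINI_API_KEY", "").strip(),
        _read_key(root / "config" / "core_config.json", "gemini_defaults"),
    )
    return next((c for c in candidates if c), "")
```

An empty return value corresponds to the "voice loop cannot connect" case: no source supplied a key.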
Dashboard features
Operations
Quick-fire SDK + JSONL arm actions (chip buttons), gestural-speaking toggle.
Voice & Audio
- Live Voice Commands — fire arm gestures from the user's transcript (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
- Live Gemini Process — start/stop the voice conversation subprocess, tail its log. Choose the Gemini cloud brain or the offline brain via SANAD_VOICE_BRAIN.
- Typed Replay — Gemini reads typed text aloud (wrapped with a "repeat verbatim" prompt); optionally records the clip.
- Gemini API Key — hot-swap the key without restart.
- Wake Phrase Manager — add/remove phrase → action bindings.
- Audio Controls — mic/speaker mute, G1 chest-speaker volume (DDS), device profile selection, PulseAudio soft-reset and Anker USB hard-reset.
Motion & Replay
- Motion Control — list SDK (built-in) + JSONL (recorded) actions, select + play. Cancel smoothly returns to arm_home.jsonl.
- Replay Manager — upload .jsonl files, test-play with speed, Teaching Mode (kinesthetic record — limp the arm and hand-guide it).
- Macro Recorder — record a new audio+motion pair, OR pick any WAV + any motion (SDK or JSONL) and play them in parallel.
Controller (locomotion)
Manual teleoperation of the G1's legs via the Unitree LocoClient.
Disarmed every boot; all motion writes require Arm first.
- Move / Step — continuous teleop (vx/vy/vyaw) or discrete one-shot steps.
- Postures & FSM modes — zero-torque, damp, squat, sit, stand, balance, stand-height; prep/ready sequences; MotionSwitcher select-AI/release.
- Gemini Movement — toggle voice-driven walking: the MovementDispatcher parses Gemini's own spoken confirmation phrases ("Turning right." / "أستدير يميناً.") and drives the legs (gated on this toggle + an E-STOP latch).
- E-STOP — always available; StopMove + disarm + latch the dispatcher.
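To illustrate the gating described for Gemini Movement, here is a rough sketch of a phrase-to-command matcher that only fires when the toggle is on and no E-STOP is latched. Everything in it is an assumption for illustration: MovementGate, normalize, and the turn_right command name are hypothetical, and the real phrase map lives in data/motions/instruction.json.

```python
import re
import unicodedata
from typing import Callable, Dict, Optional


def normalize(text: str) -> str:
    """Lowercase, strip combining marks (including Arabic diacritics) and punctuation."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


class MovementGate:
    """Maps a spoken confirmation phrase to a command, behind two safety gates."""

    def __init__(self, phrase_map: Dict[str, str], send: Callable[[str], None]):
        # Keys are normalized the same way as incoming text so both sides match.
        self._map = {normalize(k): v for k, v in phrase_map.items()}
        self._send = send
        self.enabled = False        # voice-driven movement is off by default
        self.estop_latched = False  # set by E-STOP; must be cleared explicitly

    def handle_bot_text(self, text: str) -> Optional[str]:
        """Send and return the matched command, or None if gated or unmatched."""
        if not self.enabled or self.estop_latched:
            return None
        command = self._map.get(normalize(text))
        if command is not None:
            self._send(command)
        return command
```

Checking both gates before the lookup means a latched E-STOP blocks movement even when the phrase matches.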
Safety: the arm and locomotion are mutually exclusive — arm.set_motion_block(loco.movement_active) makes every arm replay/gesture refuse while the robot is (or just was, within ~1.5 s) walking.
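The mutual exclusion can be pictured as the arm holding a predicate that it checks before every action. The sketch below assumes a 1.5 s grace window after a stop, matching the approximate figure above; SketchLoco and SketchArm are hypothetical stand-ins, not the real LocoController or ArmController.

```python
import time
from typing import Callable, Optional


class SketchLoco:
    """Reports 'active' while walking and for grace_sec after the last stop."""

    def __init__(self, grace_sec: float = 1.5, clock: Callable[[], float] = time.monotonic):
        self._grace = grace_sec
        self._clock = clock
        self._moving = False
        self._stopped_at: Optional[float] = None

    def start(self) -> None:
        self._moving = True

    def stop(self) -> None:
        self._moving = False
        self._stopped_at = self._clock()

    def movement_active(self) -> bool:
        if self._moving:
            return True
        if self._stopped_at is None:
            return False
        return (self._clock() - self._stopped_at) < self._grace


class SketchArm:
    """Refuses any action while the injected predicate returns True."""

    def __init__(self) -> None:
        self._blocked: Callable[[], bool] = lambda: False

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        self._blocked = predicate

    def play(self, action: str) -> bool:
        if self._blocked():
            return False  # refused: the robot is walking or has just stopped
        return True       # a real controller would start the replay here
```

Passing the bound method (not its result) lets the arm re-evaluate the locomotion state on every call.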
Recognition
Camera vision + Gemini-side face and zone/place recognition. All are off by default; each is a hot toggle (≈1 s to take effect, no restart).
- Camera Vision — CameraDaemon captures from a RealSense (preferred) or USB camera; the supervisor streams JPEG frames to Gemini Live so it can answer "what do you see?". Live preview panel. Auto-reconnects on USB unplug/stall and warns if a RealSense negotiated USB 2.0 (Marcus-ported resilience).
- Face Recognition — manage data/faces/face_{id}/ galleries: enroll from the live camera or upload photos, rename, describe, download (per-photo or ZIP), delete. On session start (and on any gallery change) the child sends a primer turn carrying every enrolled face + a Khaleeji greeting instruction — Gemini matches in-context, so there is no local face-recognition model. Recognition needs vision on.
- Zones & Places — data/zones/zone_{zid}/place_{pid}/ two-level gallery: reference photos per place, optional linked face_ids, and a "go here" nav target (nav_target_zone/place_id in the recognition-state file) for place-aware navigation.
- Sync Gallery — force-resend the face/zone primer to the live session.
Recordings
Skill Registry (predefined audio+motion+callback skills from skills.json) +
Saved Records (captured Gemini turn recordings; play/pause/stop/rename/delete).
Temperature
Live 3D motor-temperature heatmap — a standalone three.js viewer
(dashboard/static/temp3d/) loads the G1 29-DoF URDF + STL meshes and colors
each joint blue→red from the arm controller's throttled rt/lowstate snapshot,
streamed over /ws/motor-temps at ~8 fps. No second DDS subscriber.
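As an illustration of a blue→red ramp, the sketch below maps a temperature to an RGB triple by linear interpolation with clamping. The 30 °C and 80 °C bounds and the temp_to_rgb name are assumptions chosen for the example; the viewer's actual color scale is not documented here.

```python
from typing import Tuple


def temp_to_rgb(temp_c: float, cold_c: float = 30.0, hot_c: float = 80.0) -> Tuple[int, int, int]:
    """Linear blue-to-red ramp: cold_c and below is pure blue, hot_c and above pure red."""
    t = (temp_c - cold_c) / (hot_c - cold_c)
    t = min(1.0, max(0.0, t))  # clamp out-of-range readings
    return (round(255 * t), 0, round(255 * (1.0 - t)))
```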
Terminal
In-browser PTY shell to the robot (/ws/terminal, xterm.js) — a bash -i
as the dashboard's user, with resize + backpressure, bounded to 4 sessions.
(See Security — this is full shell access to whoever reaches the URL.)
Settings & Logs
System info (host, network interfaces, DDS interface, bound dashboard host/port,
per-subsystem status, audio devices), live log stream (/ws/logs), per-file
tail, snapshot, and a one-blob "Copy All Logs" bundle.
Directory layout
| Path | Contents |
|---|---|
| main.py | Entry point — fault-isolated boot of all subsystems + the dashboard. Doubles as the service container (route handlers import its module globals). |
| config.py | Runtime constants + layout-agnostic path resolution; layers data/motions/config.json over the JSON config at import. |
| config/ | Per-subsystem JSON: core, voice, gemini, local, motion, dashboard. |
| core/ | brain.py (skill dispatcher), event_bus.py, skill_registry.py, config_loader.py, logger.py (rotating + WS push), asyncio_compat.py (3.8 to_thread shim). |
| gemini/ | Gemini Live — client.py (one-shot), script.py (live brain: audio + video + motion-state), subprocess.py (supervisor + stdin frame/state push). |
| local/ | Fully-offline brain — vad.py (Silero), stt.py (faster-whisper), llm.py (Qwen via Ollama/llama.cpp), tts.py (CosyVoice2), script.py (the brain), subprocess.py (supervisor). Opt-in via SANAD_VOICE_BRAIN=local. |
| voice/ | sanad_voice.py (subprocess entry, model-agnostic), audio_io.py / audio_manager.py / audio_devices.py (mic/speaker), local_tts.py (SpeechT5 Arabic TTS), live_voice_loop.py (user-transcript → arm gesture), movement_dispatch.py (Gemini-phrase → locomotion), typed_replay.py, wake_phrase_manager.py, text_utils.py (Arabic normalization + phrase matching), model_script.py / model_subprocess.py (brain templates). |
| motion/ | arm_controller.py (production 5-phase JSONL replay engine, owns the single DDS init), macro_player.py, macro_recorder.py, teaching.py. (sanad_arm_controller.py is a legacy alternate — not wired by main.py.) |
| G1_Controller/ | loco_controller.py — locomotion via Unitree LocoClient (move/step/postures/FSM/E-STOP); reuses the arm's DDS participant. |
| vision/ | camera.py (RealSense/USB daemon, auto-reconnect), face_gallery.py, zone_gallery.py, recognition_state.py (atomic-JSON toggle IPC). |
| dashboard/ | app.py (FastAPI factory + fault-isolated router registration), routes/*.py (20 REST routers), websockets/*.py (logs, motor-temps, terminal), static/index.html (single-page UI), static/temp3d/ (3D viewer). |
| scripts/ | Persona files — sanad_script.txt (voice persona "Bousandah"), sanad_rule.txt, sanad_arm.txt (voice→arm phrases). |
| data/ | Runtime state — motions/*.jsonl (arm trajectories) + instruction.json (locomotion phrase map) + skills.json + config.json (dashboard-editable), recordings/ (captured turns + macros), faces/face_{id}/ + zones/zone_{zid}/place_{pid}/ (galleries), audio/ (typed-replay WAVs + records index), .recognition_state.json (toggle IPC). |
| model/ | Local SpeechT5 / Whisper / CosyVoice2 weights when using the offline pipeline. |
| logs/ | Per-module rotating logs. |
Voice brains
The child voice/sanad_voice.py is model-agnostic and selects a brain via
SANAD_VOICE_BRAIN. Every brain implements the same contract
(__init__(audio_io, recorder, voice, system_prompt), async run(), stop())
and ships a sibling supervisor that spawns the child and parses its
USER: / BOT: / state log markers.
| Value | Brain | Pipeline |
|---|---|---|
| gemini (default) | gemini/script.py | Gemini Live native-audio (full-duplex speech-to-speech, server-side VAD, vision frames, face/zone primers, voice→movement). Cloud. |
| local | local/script.py | Silero VAD → faster-whisper (large-v3-turbo, CUDA int8) → Qwen2.5 (Ollama/llama.cpp) → CosyVoice2 streaming TTS. Fully on-device. |
| model | voice/model_script.py | Template/stub for adding a new provider (OpenAI Realtime, Claude Voice, …). |
To add a brain: drop a file in voice/ or a new <brand>/ folder and add a
branch to voice/sanad_voice.py:_build_brain(); ship a supervisor modeled on
voice/model_subprocess.py.
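As a rough picture of the brain contract, the sketch below shows a do-nothing brain with the documented constructor and the run()/stop() pair, plus a small selector. EchoBrain, the "echo" name and build_brain are hypothetical; the real branch logic lives in voice/sanad_voice.py:_build_brain() and may look quite different.

```python
import asyncio


class EchoBrain:
    """Hypothetical placeholder brain that satisfies the documented contract."""

    def __init__(self, audio_io, recorder, voice, system_prompt):
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self.ticks = 0
        self._stopped = False  # plain flag: safe to create outside an event loop

    async def run(self) -> None:
        # A real brain would read the mic, call its model and play audio here.
        while not self._stopped:
            self.ticks += 1
            await asyncio.sleep(0.005)

    def stop(self) -> None:
        self._stopped = True


def build_brain(name: str, **deps):
    """Pick a brain class by name; unknown names fail loudly."""
    brains = {"echo": EchoBrain}  # a real table would map gemini / local / model
    try:
        return brains[name](**deps)
    except KeyError:
        raise ValueError(f"unknown brain: {name!r}") from None
```

A plain boolean is used instead of asyncio.Event so the object can be constructed before the event loop exists, which matters on older Python versions.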
Runtime selection (env vars)
| Var | Values | Default | Effect |
|---|---|---|---|
| SANAD_VOICE_BRAIN | gemini, local, model | gemini | Which brain the subprocess loads (see voice/sanad_voice.py:_build_brain). |
| SANAD_AUDIO_PROFILE | builtin, anker, hollyland_builtin | builtin | Mic + speaker pair. builtin = G1 UDP mic + G1 chest speaker via DDS. |
| SANAD_DDS_INTERFACE | network iface | eth0 | DDS network for G1 low-level comms (arm + locomotion + speaker). |
| SANAD_DASHBOARD_HOST / _INTERFACE | IP / iface | wlan0 IP | Dashboard bind address. |
| SANAD_GEMINI_API_KEY | string | "" (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (Voice & Audio → Gemini API Key), or fill gemini_defaults.api_key in config/core_config.json. See Quick start. |
| SANAD_GEMINI_MODEL / _VOICE | string | reads config | Override the Gemini model id / prebuilt voice. |
| SANAD_G1_VOLUME | 0–100 | 100 | G1 chest-speaker volume; also scales the barge-in threshold. |
| SANAD_LIVE_SCRIPT | path | auto | Override the subprocess entry script path. |
| SANAD_RECORD | 0 or 1 | 1 | Record every Gemini turn to data/recordings/. |
| SANAD_AEC_ENABLE | 0 or 1 | 1 | Enable WebRTC AEC3 (if the Python binding is installed). |
| SANAD_VISION_ENABLE | 0 or 1 | 0 | Boot default for camera vision. Runtime truth is the Recognition-tab toggle → data/.recognition_state.json, hot-applied without a restart. |
| SANAD_FACE_RECOGNITION_ENABLE | 0 or 1 | 0 | Boot default for Gemini-side face recognition. Also a hot toggle. |
| SANAD_VISION_SEND_HZ | float | 2 | Frames/sec the Gemini child relays to Live. |
| SANAD_CAMERA_WIDTH / _HEIGHT / _FPS | int | 424 / 240 / 15 | Capture profile. Also settable per-deploy in config/core_config.json > camera. |
| SANAD_CAMERA_USB_INDEX | int | auto | Pin a /dev/videoN node (avoids picking a RealSense IR stream). |
| SANAD_FACES_MAX_SAMPLES | int | 3 | Max photos per person fed into the gallery primer turn (token budget). |
| SANAD_PROJECT_ROOT | path | auto | Override the project root (see Dynamic paths). |
All SANAD_VISION_* / SANAD_CAMERA_* / SANAD_FACE_* vars are boot defaults forwarded to the Gemini child via LIVE_TUNE. Once running, the Recognition tab's toggles (vision / face-rec / zone-rec / movement) are the live source of truth in data/.recognition_state.json, polled at 1 Hz.
CLI flags: python3 main.py --host <ip> --port 8000 --network <dds_iface>;
--check-env prints a subsystem/environment diagnostic and exits.
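For reference, the flags listed above could be declared with argparse roughly as follows. This is a sketch, not the parser in main.py: the None defaults (meaning "fall back to the env var or config") and the help strings are assumptions; only the flag names and the 8000 port come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented flags; defaults here are illustrative."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--host", default=None, help="dashboard bind address")
    parser.add_argument("--port", type=int, default=8000, help="dashboard port")
    parser.add_argument("--network", default=None, metavar="DDS_IFACE",
                        help="network interface used for DDS")
    parser.add_argument("--check-env", action="store_true",
                        help="print an environment diagnostic and exit")
    return parser
```

Note that argparse exposes --check-env as the attribute check_env on the parsed namespace.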
API surface
All routes are registered defensively — a router whose import fails is recorded
(GET /api/_dashboard_status) and the server still boots without it.
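Defensive registration of this kind generally means wrapping each import in its own try/except and recording the outcome. The sketch below shows the idea with the standard library only; register_routers, the router attribute name and the status strings are assumptions, not the code in dashboard/app.py. With FastAPI, the mount callable would typically be app.include_router.

```python
import importlib
from typing import Callable, Dict, Iterable


def register_routers(module_names: Iterable[str], mount: Callable[[object], None]) -> Dict[str, str]:
    """Import each module and mount its `router`; one failure never blocks the rest."""
    status: Dict[str, str] = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            mount(module.router)
            status[name] = "ok"
        except Exception as exc:  # import error, missing attribute, mount failure
            status[name] = f"failed: {type(exc).__name__}: {exc}"
    return status
```

The returned mapping is the kind of data a status endpoint could serve, so a missing router is visible instead of silently absent.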
REST (prefix → controls): /api health · /api/system info ·
/api/voice Gemini/local generate+connect+key · /api/motion arm actions ·
/api/skills skill registry · /api/macros record/play · /api/replay JSONL
CRUD + teaching · /api/audio mute/volume/devices/reset · /api/scripts
persona files · /api/records saved WAVs · /api/prompt system prompt ·
/api/wake-phrases bindings · /api/live-voice arm-phrase dispatcher ·
/api/live-subprocess Gemini child · /api/typed-replay TTS · /api/recognition
vision + face gallery · /api/zones zones/places + nav target · /api/temp
motor map + snapshot · /api/controller locomotion (move/step/postures/modes/
E-STOP).
WebSockets: /ws/logs (live log stream + 500-line replay) ·
/ws/motor-temps (3D heatmap data, ~8 fps) · /ws/terminal (PTY shell).
Architecture notes
- Subprocess isolation: voice/sanad_voice.py runs as a child of main.py via the supervisor. If the voice loop crashes, the dashboard + arm + legs stay up.
- Single DDS init: motion/arm_controller.py owns the one ChannelFactoryInitialize; LocoController and the audio routes reuse that participant rather than re-initializing.
- Brain contract: see voice/model_script.py — any new model implements __init__(audio_io, recorder, voice, system_prompt), async run(), stop().
- Supervisor contract: each brain ships a sibling supervisor (e.g. gemini/subprocess.py) that spawns sanad_voice.py with its SANAD_VOICE_BRAIN and parses the brain's log markers. Template: voice/model_subprocess.py.
- Locomotion safety: LocoController is disarmed every boot, has velocity caps + a StopMove watchdog, and is mutually exclusive with the arm. Voice-driven movement is off by default and gated by the Controller toggle. Distances/degrees in data/motions/instruction.json are approximate and must be calibrated on the real robot — there is no obstacle/abort stack.
- Audio routing: the G1's platform-sound PulseAudio sink is NOT wired to a physical speaker. All dashboard-triggered playback (play_wav, typed-replay audio, record playback) routes through DDS AudioClient.PlayStream via audio_manager._play_pcm_via_g1. The PyAudio path is a desktop/dev fallback.
- Arm replay: motion/arm_controller.py:_replay_file_inner() is a port of G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run() — ramp-in → settle hold → playback → smooth return → disable SDK. Body motors (0–14) lock to a live snapshot while arm motors (15–28) follow the file at 60 Hz. _return_home() runs unconditionally after a cancel for a jerk-free return.
- Camera frame transport (stdin push): the CameraDaemon lives in the parent and caches frames in memory. GeminiSubprocess base64-encodes the latest frame to the child's stdin (~2 fps); the child's _stdin_watcher relays it to Gemini Live with a staleness guard. Chosen over a file drop so the parent owns the camera once and the dashboard preview reads the same cache.
- Motion-state channel: arm_controller._execute() emits motion.action_started / _done / _error on the event bus. main.py forwards each to the child as state:<json>\n, injected to Gemini Live as silent [STATE-START] wave_hand / [STATE-DONE] wave_hand (2.3s) text so it can honestly answer "what are you doing?".
- Recognition is Gemini-side: no dlib/insightface/onnxruntime. Galleries are pure file IO; gemini/script.py:_send_gallery_primer() builds one multimodal send_client_content turn — every enrolled face/place's photos + a greeting instruction — and Gemini matches incoming frames against it in-context.
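The motion-state text described above ([STATE-START] wave_hand and [STATE-DONE] wave_hand (2.3s)) can be produced by a small formatter like the one sketched here. The function name format_state_text is hypothetical, and the [STATE-ERROR] tag is an assumption: this document only shows the start and done forms.

```python
from typing import Optional

# Event names follow the motion.action_started / _done / _error pattern.
_TAGS = {
    "motion.action_started": "STATE-START",
    "motion.action_done": "STATE-DONE",
    "motion.action_error": "STATE-ERROR",  # assumed tag, not shown in the docs
}


def format_state_text(event: str, action: str, duration_s: Optional[float] = None) -> str:
    """Render one bus event as the short bracketed text injected into the session."""
    text = f"[{_TAGS[event]}] {action}"  # raises KeyError for unknown events
    if duration_s is not None:
        text += f" ({duration_s:.1f}s)"
    return text
```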
Camera vision on Jetson
The Recognition tab needs pyrealsense2 to talk to the Intel RealSense.
Do not pip install pyrealsense2 on JetPack 5 — the PyPI wheel is built
against glibc 2.32+ (Ubuntu 22.04) and fails to load on JetPack 5's glibc
2.31 with ImportError: ... version 'GLIBC_2.32' not found.
The native runtime is already there (apt-installed librealsense2). Build
just the Python binding from source against it, into the gemini_sdk env:
rs-enumerate-devices # confirm the D435I shows up at OS level first
source ~/miniconda3/etc/profile.d/conda.sh && conda activate gemini_sdk
pip uninstall -y pyrealsense2 # remove the broken wheel if present
sudo apt install -y cmake build-essential git python3-dev libusb-1.0-0-dev pkg-config libssl-dev
cd /tmp && rm -rf librealsense
git clone --depth=1 --branch v2.56.5 https://github.com/IntelRealSense/librealsense.git
cd librealsense && mkdir -p build && cd build
cmake .. -DBUILD_PYTHON_BINDINGS=ON -DPYTHON_EXECUTABLE=$(which python3) \
-DBUILD_EXAMPLES=OFF -DBUILD_GRAPHICAL_EXAMPLES=OFF \
-DBUILD_UNIT_TESTS=OFF -DCHECK_FOR_UPDATES=OFF -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) pyrealsense2
SITE=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
mkdir -p "$SITE/pyrealsense2"
cp wrappers/python/pyrealsense2*.so "$SITE/pyrealsense2/"
cp ../wrappers/python/pyrealsense2/__init__.py "$SITE/pyrealsense2/" 2>/dev/null || true
python3 -c 'import pyrealsense2 as rs; print([d.get_info(rs.camera_info.name) for d in rs.context().query_devices()])'
Match the --branch tag to the installed runtime (dpkg -l | grep librealsense2).
If the build isn't worth it, CameraDaemon falls back to cv2.VideoCapture(0)
automatically — fine for a plain USB webcam, but note a RealSense exposes its
depth stream at /dev/video0, not RGB, so a real USB cam is the cleaner
fallback (or pin SANAD_CAMERA_USB_INDEX). On x86_64 / Ubuntu 22.04+ desktops,
pip install pyrealsense2 just works.
Dynamic paths
Every path is derived at runtime — no hard-coded /home/... anywhere.
Resolution order for BASE_DIR in config.py:
1. SANAD_PROJECT_ROOT env var (if set).
2. PROJECT_BASE + PROJECT_NAME from a .env file in Sanad/ or its parent.
3. Path(__file__).resolve().parent — auto-detected.
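The resolution order above could be implemented along these lines. This is a sketch under assumptions: resolve_base_dir and read_dotenv are hypothetical helpers, the .env parser is deliberately tiny, and "PROJECT_BASE + PROJECT_NAME" is interpreted as a path join, which may not match config.py exactly.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; returns {} if the file is missing."""
    values: Dict[str, str] = {}
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return values
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_dir(module_file: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Env var first, then a .env next to the module or in its parent, then the module dir."""
    env = os.environ if env is None else env
    here = Path(module_file).resolve().parent
    root = env.get("SANAD_PROJECT_ROOT")
    if root:
        return Path(root).expanduser().resolve()
    for candidate in (here / ".env", here.parent / ".env"):
        values = read_dotenv(candidate)
        if values.get("PROJECT_BASE") and values.get("PROJECT_NAME"):
            return (Path(values["PROJECT_BASE"]) / values["PROJECT_NAME"]).resolve()
    return here
```

In config.py the call would be something like resolve_base_dir(__file__), so the fallback is the directory containing the config module.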
The project runs unchanged from either layout:
- dev: <anywhere>/Project/Sanad/
- deployed: /home/unitree/Sanad/
Deployment (workstation → robot)
rsync -av --delete \
--exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
/path/to/Sanad/ \
unitree@192.168.123.164:/home/unitree/Sanad/
Then on the robot: Ctrl+C the running main.py and re-run.
Security
The dashboard has no authentication. Anyone who can reach
http://<robot>:8000 gets full robot control — locomotion, arm, audio, file
upload/delete — and, via the Terminal tab, an interactive shell as the
dashboard's user. Bind it to a trusted LAN only; add auth before any wider
exposure.
Troubleshooting
| Symptom | Fix |
|---|---|
| No LowState received in 2s — refusing to replay | main.py was re-executed as both __main__ and Project.Sanad.main, creating two arm instances. Fix lives in the sys.modules alias near the top of main.py. Restart. |
| G1ArmActionClient not available — skipping for SDK actions | Same duplicate-init issue as above. |
| No module named 'Project' in subprocess | Bootstrap preamble in voice/sanad_voice.py:~30 synthesises the Project.Sanad namespace when run as __main__. |
| Controller moves rejected (409) | The Controller is disarmed by default — hit Arm first. Reads + E-STOP are always allowed. |
| Arm action refused while "movement armed" | Arm ↔ locomotion are mutually exclusive. Disarm/stop locomotion, then trigger the arm. |
| Voice-driven walking does nothing | "Gemini Movement" toggle off, or E-STOP latched. Toggle on; clear E-STOP. Distances are uncalibrated. |
| Arm jumps at start of JSONL replay | SETTLE_HOLD_SEC (in config/motion_config.json > arm_controller) too low — try 0.7 or 1.0. |
| Record playback silent | audio_mgr.play_wav only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | Deferred trigger was queued but trigger_enabled toggle was off. Toggle on — or the pending-trigger poll fires it automatically once enabled. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in voice/typed_replay.py:generate_audio tries three prompt variants. For reliable TTS, use the offline local_tts SpeechT5 path. |
| Local brain exits immediately | ollama serve not running / model not pulled, or weights missing under model/. Check logs/local_subprocess.log. The Gemini brain is the safe default. |
| Recognition tab: "Camera could not start (no backend)" | No camera backend acquired. Check rs-enumerate-devices (RealSense at OS level) and python3 -c 'import pyrealsense2' in the gemini_sdk env. The glibc ImportError means the pip wheel is incompatible — see "Camera vision on Jetson" above. |
| Camera badge stuck on "reconnecting…" | CameraDaemon lost the device and is retrying with exponential backoff. Re-seat the USB 3 cable; check logs/camera.log for the USB-2.0 warning. |
| Gemini doesn't greet an enrolled face | Face Recognition toggle on? Vision on? (Face rec needs frames.) Check logs/gemini_brain.log for face gallery primed: N person(s). Hit "Sync Gallery" to force a re-prime. |
| Gemini unaware of motion state | The motion.action_* → send_state chain only runs when Live Gemini is up. Check logs/gemini_subprocess.log and logs/gemini_brain.log for STATE injected: lines. |
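For readers unfamiliar with the "retrying with exponential backoff" behaviour mentioned in the camera row above, the sketch below shows the general technique. The base delay, factor, cap and attempt limit are made-up values, and backoff_delays / reconnect are hypothetical names; CameraDaemon's real parameters are not documented here.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield delays that grow geometrically and never exceed cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def reconnect(open_camera: Callable[[], bool], max_attempts: int = 8,
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to reopen the device, waiting longer after each failure."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        if open_camera():
            return True
        sleep(next(delays))
    return False
```

Injecting the sleep function keeps the loop testable without real waiting.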
License / attribution
Internal project for YS Lootah Technology. Reuses/ports patterns from:
- G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py (arm replay math)
- SanadVoice/gemini_interact (arm-phrase dispatch, skill registry)
- SanadVoice/gemini_voice_v2 (local SpeechT5 TTS)
- Project/Marcus — camera→Gemini stdin-push transport, motion-state injection, camera daemon resilience (auto-reconnect, USB-2.0 warning), the API/camera_api.py cache shape (get_frame_b64 / get_fresh_frame), and the confirmation-phrase → locomotion pattern (movement_dispatch).
- Unitree unitree_sdk2py (G1 low-level SDK, LocoClient, G1ArmActionClient, AudioClient.PlayStream).