# Sanadv3

Voice + motion assistant for the Unitree G1 humanoid. **Gemini Live** (or a
fully-offline pipeline) handles bilingual Arabic/English conversation; an arm
controller plays built-in SDK poses and recorded JSONL macros; a locomotion
controller walks/turns the robot; an optional camera feeds **Gemini-side face &
place recognition**; everything is orchestrated through a fault-isolated
**FastAPI dashboard** on `http://<robot>:8000`.

```
┌──────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://<robot>:8000                          │
│  ├─ Operations       Quick-fire arm actions + gestural-speaking      │
│  ├─ Voice & Audio    Live Gemini, Typed Replay, Wake Phrases, Audio  │
│  ├─ Motion & Replay  SDK actions, JSONL replays, macros, teaching    │
│  ├─ Controller       Locomotion teleop, postures, FSM modes, E-STOP  │
│  ├─ Recognition      Camera vision + face gallery + zones/places     │
│  ├─ Recordings       Skill registry, saved Gemini turns              │
│  ├─ Temperature      Live 3D motor-temperature heatmap (three.js)    │
│  ├─ Terminal         In-browser shell (PTY) to the robot             │
│  └─ Settings & Logs  System info, tail/stream live logs              │
└──────────────────────────────────────────────────────────────────────┘
    │
    ├─ voice/sanad_voice.py        (subprocess — model-agnostic voice loop)
    │   ├─ gemini/script.py        (Gemini Live brain — audio+video+state)
    │   └─ local/script.py         (offline brain — VAD→STT→LLM→TTS)
    ├─ gemini/client.py            (short-session client for Typed Replay)
    ├─ gemini/subprocess.py        (spawns+supervises sanad_voice.py;
    │                               pushes camera frames + motion state
    │                               to the child over its stdin)
    ├─ voice/movement_dispatch.py  (Gemini spoken phrase → locomotion)
    ├─ vision/camera.py            (RealSense/USB capture daemon)
    ├─ vision/face_gallery.py      (data/faces/ CRUD for the primer turn)
    ├─ vision/zone_gallery.py      (data/zones/ places + "go here" targets)
    ├─ motion/arm_controller.py    (G1 arm DDS publisher — owns DDS init)
    ├─ G1_Controller/loco_controller.py (G1 locomotion via LocoClient)
    ├─ voice/audio_io.py           (mic + speaker abstraction — 3 profiles)
    └─ core/brain.py               (skill dispatcher, event bus)
```

### Camera + face/place recognition data flow

```
CameraDaemon (parent, in-memory JPEG+b64 cache)
   ├─→ dashboard /api/recognition/frame.jpg  ── snapshot_jpeg()
   └─→ GeminiSubprocess._frame_forwarder     ── get_frame_b64()
                 │  "frame:<b64>\n" over stdin
ArmController ─emit→ event bus ─→ main.py ─→ live_sub.send_state()
                 │  "state:<json>\n" over stdin
                 ▼
      gemini/script.py  _stdin_watcher thread
         ├─ frame: → _LATEST_FRAME  → _send_frame_loop →
         │             session.send_realtime_input(video=Blob)
         └─ state: → _STATE_PENDING → _send_state_loop →
                       session.send_realtime_input(text=…)

Recognition toggles (vision / face-rec / zone-rec / movement) are written by the
dashboard to data/.recognition_state.json and POLLED by the Gemini child at 1 Hz
— so flipping a toggle takes effect mid-session with NO restart.
```
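
The two stdin message shapes in the diagram, `frame:<b64>\n` and `state:<json>\n`, could be parsed on the child side roughly as follows. This is a minimal sketch, not the code in `gemini/script.py`: the helper name `parse_stdin_line` and the `(kind, payload)` return shape are assumptions made for illustration; only the two prefixes come from the diagram.

```python
import base64
import json
from typing import Any, Optional, Tuple


def parse_stdin_line(line: str) -> Optional[Tuple[str, Any]]:
    """Split one supervisor line into (kind, payload), or None if unusable.

    Hypothetical helper: only the "frame:" / "state:" prefixes are taken
    from the README; the rest is illustrative.
    """
    kind, sep, body = line.rstrip("\n").partition(":")
    if not sep:
        return None  # no prefix at all, so not a protocol line
    if kind == "frame":
        try:
            # validate=True rejects characters outside the base64 alphabet
            return "frame", base64.b64decode(body, validate=True)
        except ValueError:
            return None  # corrupt frame: drop it rather than crash the watcher
    if kind == "state":
        try:
            return "state", json.loads(body)
        except ValueError:
            return None  # malformed JSON: ignore this state update
    return None  # unknown prefix
```

Returning `None` for anything malformed keeps a bad line from taking down the reader thread, which matches the fault-isolation theme of the rest of the design.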


## Quick start (on the robot)

```bash
conda activate gemini_sdk
cd ~/Sanad
python3 main.py
```

Then open `http://<robot-ip>:8000` in a browser. (The dashboard binds to the
`wlan0` IP by default — see *Runtime selection* to override.)

Fully-offline brain (no cloud): `SANAD_VOICE_BRAIN=local python3 main.py`
(requires `ollama serve` + the local model env — see *Voice brains*).

> **Gemini API key — required, none ships with the repo.** The `api_key`
> fields in `config/core_config.json` (`gemini_defaults`) and
> `data/motions/config.json` (`gemini`) are intentionally empty (`""`).
> The voice loop cannot connect until you supply one, by any of:
> - **Dashboard** → *Voice & Audio → Gemini API Key* — paste + save, hot-swaps live (no restart). Persists to `data/motions/config.json`.
> - **Env var** — `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`.
> - **Config file** — set `gemini_defaults.api_key` in `config/core_config.json`.
>
> Precedence (highest first): `data/motions/config.json` → `SANAD_GEMINI_API_KEY` → `config/core_config.json`. Get a key at <https://aistudio.google.com/apikey>.
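
The precedence rule above can be illustrated with a short resolver. This is a sketch under stated assumptions, not the project's `config.py`: the function names are hypothetical, and the JSON layout (`gemini.api_key` and `gemini_defaults.api_key`) is inferred from the field names quoted in the note.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def _read_key(path: Path, section: str) -> str:
    """Return <section>.api_key from a JSON file, or "" if it is unavailable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return ""  # missing or unparsable file counts as "no key here"
    section_data = data.get(section) if isinstance(data, dict) else None
    if not isinstance(section_data, dict):
        return ""
    return str(section_data.get("api_key") or "").strip()


def resolve_gemini_api_key(root: Path, env: Optional[Mapping[str, str]] = None) -> str:
    """First non-empty key wins, in the documented order (highest first)."""
    env = os.environ if env is None else env
    candidates = (
        _read_key(root / "data" / "motions" / "config.json", "gemini"),
        env.get("SANAD_GEMINI_API_KEY", "").strip(),
        _read_key(root / "config" / "core_config.json", "gemini_defaults"),
    )
    return next((key for key in candidates if key), "")
```

An empty string from every source means the voice loop has nothing to connect with, which is the state a fresh clone starts in.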


## Dashboard features

### Operations
Quick-fire SDK + JSONL arm actions (chip buttons), gestural-speaking toggle.

### Voice & Audio
- **Live Voice Commands** — fire arm gestures from the *user's* transcript
  (wake-phrase → arm action). Master gate + Deferred-trigger toggle.
- **Live Gemini Process** — start/stop the voice conversation subprocess, tail
  its log. Choose the Gemini cloud brain or the offline brain via
  `SANAD_VOICE_BRAIN`.
- **Typed Replay** — Gemini reads typed text aloud (wrapped with a
  "repeat verbatim" prompt); optionally records the clip.
- **Gemini API Key** — hot-swap the key without restart.
- **Wake Phrase Manager** — add/remove phrase → action bindings.
- **Audio Controls** — mic/speaker mute, G1 chest-speaker volume (DDS), device
  profile selection, PulseAudio soft-reset and Anker USB hard-reset.
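
Matching a bilingual transcript against phrase → action bindings usually needs some text normalization first. The following is a minimal sketch of that idea, not the contents of `voice/text_utils.py` or `voice/wake_phrase_manager.py`: the function names, the specific normalization steps (diacritic stripping, alef unification, punctuation removal) and the whole-word matching rule are all assumptions chosen for illustration.

```python
import re
import unicodedata
from typing import Dict, Optional

# Arabic diacritics U+064B..U+0652, superscript alef U+0670, tatweel U+0640.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")
# Fold hamza/madda alef variants onto bare alef (U+0627).
_ALEF_FORMS = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})


def normalize(text: str) -> str:
    """Lower-case, strip Arabic diacritics, unify alef, drop punctuation."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = _DIACRITICS.sub("", text).translate(_ALEF_FORMS)
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(text.split())


def match_wake_phrase(transcript: str, bindings: Dict[str, str]) -> Optional[str]:
    """Return the action bound to the longest phrase found as whole words."""
    haystack = f" {normalize(transcript)} "
    for phrase in sorted(bindings, key=len, reverse=True):
        needle = normalize(phrase)
        if needle and f" {needle} " in haystack:
            return bindings[phrase]
    return None


# Hypothetical bindings; real ones are managed from the dashboard.
bindings = {"wave hand": "wave_hand", "مرحبا": "wave_hand", "shake": "shake_hand"}
```

Padding both strings with spaces is a cheap way to require whole-word matches, so a binding such as `shake` does not fire on `handshake`.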

### Motion & Replay
- **Motion Control** — list SDK (built-in) + JSONL (recorded) actions, select +
  play. Cancel smoothly returns to `arm_home.jsonl`.
- **Replay Manager** — upload `.jsonl` files, test-play with speed, Teaching
  Mode (kinesthetic record — limp the arm and hand-guide it).
- **Macro Recorder** — record a new audio+motion pair, OR pick any WAV + any
  motion (SDK or JSONL) and play them in parallel.

### Controller *(locomotion)*
Manual teleoperation of the G1's **legs** via the Unitree `LocoClient`.
**Disarmed every boot**; all motion writes require Arm first.
- **Move / Step** — continuous teleop (vx/vy/vyaw) or discrete one-shot steps.
- **Postures & FSM modes** — zero-torque, damp, squat, sit, stand, balance,
  stand-height; prep/ready sequences; MotionSwitcher select-AI/release.
- **Gemini Movement** — toggle voice-driven walking: the `MovementDispatcher`
  parses Gemini's *own spoken confirmation phrases* ("Turning right." /
  "أستدير يميناً.") and drives the legs (gated on this toggle + an E-STOP latch).
- **E-STOP** — always available; `StopMove` + disarm + latch the dispatcher.

> **Safety:** the arm and locomotion are **mutually exclusive** —
> `arm.set_motion_block(loco.movement_active)` makes every arm
> replay/gesture refuse while the robot is (or just was, within ~1.5 s) walking.
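
The interlock in the safety note can be pictured as a predicate that the arm consults before every action. The sketch below is illustrative only: `LocoGate` and `ArmGate` are hypothetical stand-ins, and it assumes that `set_motion_block` receives a callable and that the "just was walking" window is a fixed 1.5 s cooldown; the real controllers may implement either detail differently.

```python
import time
from typing import Callable, Optional


class LocoGate:
    """Hypothetical stand-in for the locomotion side of the interlock."""

    def __init__(self, cooldown_s: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._moving = False
        self._stopped_at: Optional[float] = None

    def start(self) -> None:
        self._moving = True

    def stop(self) -> None:
        self._moving = False
        self._stopped_at = self._clock()

    def movement_active(self) -> bool:
        """True while walking and for cooldown_s seconds after stopping."""
        if self._moving:
            return True
        if self._stopped_at is None:
            return False
        return self._clock() - self._stopped_at < self._cooldown_s


class ArmGate:
    """Hypothetical stand-in for the arm side: refuses while blocked."""

    def __init__(self) -> None:
        self._blocked: Callable[[], bool] = lambda: False

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        self._blocked = predicate

    def play(self, action: str) -> bool:
        # Evaluate the predicate at call time so the answer is never stale.
        return not self._blocked()
```

Injecting the clock makes the cooldown testable without sleeping, and using a monotonic clock avoids surprises if the wall clock is adjusted while the robot is running.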

### Recognition
Camera vision + Gemini-side **face** and **zone/place** recognition. All are
**off by default**; each is a **hot toggle** (≈1 s to take effect, no restart).
- **Camera Vision** — `CameraDaemon` captures from a RealSense (preferred) or
  USB camera; the supervisor streams JPEG frames to Gemini Live so it can answer
  "what do you see?". Live preview panel. Auto-reconnects on USB unplug/stall
  and warns if a RealSense negotiated USB 2.0 (Marcus-ported resilience).
- **Face Recognition** — manage `data/faces/face_{id}/` galleries: enroll from
  the live camera or upload photos, rename, describe, download (per-photo or
  ZIP), delete. On session start (and on any gallery change) the child sends a
  **primer turn** carrying every enrolled face + a Khaleeji greeting
  instruction — **Gemini matches in-context, so there is no local
  face-recognition model**. Recognition needs vision on.
- **Zones & Places** — `data/zones/zone_{zid}/place_{pid}/` two-level gallery:
  reference photos per place, optional linked face_ids, and a **"go here"** nav
  target (`nav_target_zone/place_id` in the recognition-state file) for
  place-aware navigation.
- **Sync Gallery** — force-resend the face/zone primer to the live session.

### Recordings
Skill Registry (predefined audio+motion+callback skills from `skills.json`) +
Saved Records (captured Gemini turn recordings; play/pause/stop/rename/delete).

### Temperature
Live **3D motor-temperature heatmap** — a standalone three.js viewer
(`dashboard/static/temp3d/`) loads the G1 29-DoF URDF + STL meshes and colors
each joint blue→red from the arm controller's throttled `rt/lowstate` snapshot,
streamed over `/ws/motor-temps` at ~8 fps. No second DDS subscriber.

### Terminal
In-browser **PTY shell** to the robot (`/ws/terminal`, xterm.js) — a `bash -i`
as the dashboard's user, with resize + backpressure, bounded to 4 sessions.
(See *Security* — this is full shell access to whoever reaches the URL.)

### Settings & Logs
System info (host, network interfaces, DDS interface, bound dashboard host/port,
per-subsystem status, audio devices), live log stream (`/ws/logs`), per-file
tail, snapshot, and a one-blob "Copy All Logs" bundle.


## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — fault-isolated boot of all subsystems + the dashboard. Doubles as the service container (route handlers `import` its module globals). |
| `config.py` | Runtime constants + layout-agnostic path resolution; layers `data/motions/config.json` over the JSON config at import. |
| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `local`, `motion`, `dashboard`. |
| `core/` | `brain.py` (skill dispatcher), `event_bus.py`, `skill_registry.py`, `config_loader.py`, `logger.py` (rotating + WS push), `asyncio_compat.py` (3.8 `to_thread` shim). |
| `gemini/` | Gemini Live — `client.py` (one-shot), `script.py` (live brain: audio + video + motion-state), `subprocess.py` (supervisor + stdin frame/state push). |
| `local/` | Fully-offline brain — `vad.py` (Silero), `stt.py` (faster-whisper), `llm.py` (Qwen via Ollama/llama.cpp), `tts.py` (CosyVoice2), `script.py` (the brain), `subprocess.py` (supervisor). Opt-in via `SANAD_VOICE_BRAIN=local`. |
| `voice/` | `sanad_voice.py` (subprocess entry, model-agnostic), `audio_io.py` / `audio_manager.py` / `audio_devices.py` (mic/speaker), `local_tts.py` (SpeechT5 Arabic TTS), `live_voice_loop.py` (user-transcript → arm gesture), `movement_dispatch.py` (Gemini-phrase → locomotion), `typed_replay.py`, `wake_phrase_manager.py`, `text_utils.py` (Arabic normalization + phrase matching), `model_script.py` / `model_subprocess.py` (brain templates). |
| `motion/` | `arm_controller.py` (production 5-phase JSONL replay engine, owns the single DDS init), `macro_player.py`, `macro_recorder.py`, `teaching.py`. (`sanad_arm_controller.py` is a legacy alternate — not wired by `main.py`.) |
| `G1_Controller/` | `loco_controller.py` — locomotion via Unitree `LocoClient` (move/step/postures/FSM/E-STOP); reuses the arm's DDS participant. |
| `vision/` | `camera.py` (RealSense/USB daemon, auto-reconnect), `face_gallery.py`, `zone_gallery.py`, `recognition_state.py` (atomic-JSON toggle IPC). |
| `dashboard/` | `app.py` (FastAPI factory + fault-isolated router registration), `routes/*.py` (20 REST routers), `websockets/*.py` (logs, motor-temps, terminal), `static/index.html` (single-page UI), `static/temp3d/` (3D viewer). |
| `scripts/` | Persona files — `sanad_script.txt` (voice persona "Bousandah"), `sanad_rule.txt`, `sanad_arm.txt` (voice→arm phrases). |
| `data/` | Runtime state — `motions/*.jsonl` (arm trajectories) + `instruction.json` (locomotion phrase map) + `skills.json` + `config.json` (dashboard-editable), `recordings/` (captured turns + macros), `faces/face_{id}/` + `zones/zone_{zid}/place_{pid}/` (galleries), `audio/` (typed-replay WAVs + records index), `.recognition_state.json` (toggle IPC). |
| `model/` | Local SpeechT5 / Whisper / CosyVoice2 weights when using the offline pipeline. |
| `logs/` | Per-module rotating logs. |


## Voice brains

The child `voice/sanad_voice.py` is model-agnostic and selects a brain via
`SANAD_VOICE_BRAIN`. Every brain implements the same contract
(`__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`)
and ships a sibling supervisor that spawns the child and parses its
`USER:` / `BOT:` / state log markers.

| Value | Brain | Pipeline |
|---|---|---|
| `gemini` *(default)* | `gemini/script.py` | Gemini Live native-audio (full-duplex speech-to-speech, server-side VAD, vision frames, face/zone primers, voice→movement). Cloud. |
| `local` | `local/script.py` | Silero VAD → faster-whisper (large-v3-turbo, CUDA int8) → Qwen2.5 (Ollama/llama.cpp) → CosyVoice2 streaming TTS. Fully on-device. |
| `model` | `voice/model_script.py` | Template/stub for adding a new provider (OpenAI Realtime, Claude Voice, …). |

To add a brain: drop a file in `voice/` or a new `<brand>/` folder and add a
branch to `voice/sanad_voice.py:_build_brain()`; ship a supervisor modeled on
`voice/model_subprocess.py`.
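
As a rough picture of the brain contract, here is a toy class with the three required members plus a small selector. It is a sketch only: `EchoBrain` and `build_brain` are hypothetical names, the brain talks to no real provider, and the actual `_build_brain()` in `voice/sanad_voice.py` may be structured differently.

```python
import asyncio
from typing import Any, Callable, Dict


class EchoBrain:
    """Toy brain that satisfies the contract; it calls no real provider."""

    def __init__(self, audio_io: Any, recorder: Any, voice: str, system_prompt: str) -> None:
        self.audio_io = audio_io
        self.recorder = recorder
        self.voice = voice
        self.system_prompt = system_prompt
        self._running = True
        self.ticks = 0  # counts loop iterations, purely for demonstration

    async def run(self) -> None:
        # A real brain would read the mic and stream audio here.
        while self._running:
            self.ticks += 1
            await asyncio.sleep(0.01)

    def stop(self) -> None:
        # Synchronous and idempotent: safe to call from a signal handler path.
        self._running = False


# Hypothetical registry keyed by the SANAD_VOICE_BRAIN value.
_BRAINS: Dict[str, Callable[..., Any]] = {"echo": EchoBrain}


def build_brain(name: str, **kwargs: Any) -> Any:
    try:
        factory = _BRAINS[name]
    except KeyError:
        raise ValueError(f"unknown brain: {name!r}") from None
    return factory(**kwargs)
```

Keeping `stop()` synchronous means the supervisor side can request shutdown without needing access to the child's event loop.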


## Runtime selection (env vars)

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_VOICE_BRAIN` | `gemini`, `local`, `model` | `gemini` | Which brain the subprocess loads (see `voice/sanad_voice.py:_build_brain`). |
| `SANAD_AUDIO_PROFILE` | `builtin`, `anker`, `hollyland_builtin` | `builtin` | Mic + speaker pair. `builtin` = G1 UDP mic + G1 chest speaker via DDS. |
| `SANAD_DDS_INTERFACE` | network iface | `eth0` | DDS network for G1 low-level comms (arm + locomotion + speaker). |
| `SANAD_DASHBOARD_HOST` / `_INTERFACE` | IP / iface | `wlan0` IP | Dashboard bind address. |
| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. See [Quick start](#quick-start-on-the-robot). |
| `SANAD_GEMINI_MODEL` / `_VOICE` | string | reads config | Override the Gemini model id / prebuilt voice. |
| `SANAD_G1_VOLUME` | `0`–`100` | `100` | G1 chest-speaker volume; also scales the barge-in threshold. |
| `SANAD_LIVE_SCRIPT` | path | auto | Override the subprocess entry script path. |
| `SANAD_RECORD` | `0` or `1` | `1` | Record every Gemini turn to `data/recordings/`. |
| `SANAD_AEC_ENABLE` | `0` or `1` | `1` | Enable WebRTC AEC3 (if the Python binding is installed). |
| `SANAD_VISION_ENABLE` | `0` or `1` | `0` | Boot default for camera vision. **Runtime truth is the Recognition-tab toggle** → `data/.recognition_state.json`, hot-applied without a restart. |
| `SANAD_FACE_RECOGNITION_ENABLE` | `0` or `1` | `0` | Boot default for Gemini-side face recognition. Also a hot toggle. |
| `SANAD_VISION_SEND_HZ` | float | `2` | Frames/sec the Gemini child relays to Live. |
| `SANAD_CAMERA_WIDTH` / `_HEIGHT` / `_FPS` | int | `424` / `240` / `15` | Capture profile. Also settable per-deploy in `config/core_config.json > camera`. |
| `SANAD_CAMERA_USB_INDEX` | int | auto | Pin a `/dev/videoN` node (avoids picking a RealSense IR stream). |
| `SANAD_FACES_MAX_SAMPLES` | int | `3` | Max photos per person fed into the gallery primer turn (token budget). |
| `SANAD_PROJECT_ROOT` | path | auto | Override the project root (see *Dynamic paths*). |

> All `SANAD_VISION_*` / `SANAD_CAMERA_*` / `SANAD_FACE_*` vars are **boot
> defaults** forwarded to the Gemini child via `LIVE_TUNE`. Once running, the
> Recognition tab's toggles (vision / face-rec / zone-rec / movement) are the
> live source of truth in `data/.recognition_state.json`, polled at 1 Hz.
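
A file-based toggle channel like this generally relies on atomic writes, so a 1 Hz reader never sees a half-written file. The sketch below shows one common way to do that (temp file in the same directory, then `os.replace`). It is not the code in `vision/recognition_state.py`: the function names and the toggle keys (`vision`, `face_rec`) are hypothetical. A polling reader would call `read_state` about once per second and act only on what `changed_toggles` reports.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_state_atomic(path: Path, state: Dict[str, Any]) -> None:
    """Write JSON to a temp file beside the target, then rename over it."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave stray temp files behind
        raise


def read_state(path: Path, defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults overlaid with the file; fall back to defaults on error."""
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return dict(defaults)
    if not isinstance(loaded, dict):
        return dict(defaults)
    return {**defaults, **loaded}


def changed_toggles(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Keys whose value differs between two polls."""
    return {key: value for key, value in new.items() if old.get(key) != value}
```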

CLI flags: `python3 main.py --host <ip> --port 8000 --network <dds_iface>`;
`--check-env` prints a subsystem/environment diagnostic and exits.


## API surface

All routes are registered defensively — a router whose import fails is recorded
(`GET /api/_dashboard_status`) and the server still boots without it.
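
Defensive registration of this kind can be sketched with plain `importlib`: import each router module inside a `try`, record the outcome, and carry on. The helper below is an illustration, not the code in `dashboard/app.py`; `register_routers` and its `attr` parameter are hypothetical names. With FastAPI, the `include` callback would typically be `app.include_router`, and the returned status dict is the sort of data a status endpoint could expose.

```python
import importlib
from typing import Callable, Dict, Iterable


def register_routers(
    module_names: Iterable[str],
    include: Callable[[object], None],
    attr: str = "router",
) -> Dict[str, str]:
    """Import each module and pass its `attr` to `include`; never raise.

    Returns {module_name: "ok"} or {module_name: "failed: <error>"} so the
    caller can report which routers are missing.
    """
    status: Dict[str, str] = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            include(getattr(module, attr))
            status[name] = "ok"
        except Exception as exc:  # one broken router must not stop the boot
            status[name] = f"failed: {type(exc).__name__}: {exc}"
    return status
```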

**REST** (prefix → controls): `/api` health · `/api/system` info ·
`/api/voice` Gemini/local generate+connect+key · `/api/motion` arm actions ·
`/api/skills` skill registry · `/api/macros` record/play · `/api/replay` JSONL
CRUD + teaching · `/api/audio` mute/volume/devices/reset · `/api/scripts`
persona files · `/api/records` saved WAVs · `/api/prompt` system prompt ·
`/api/wake-phrases` bindings · `/api/live-voice` arm-phrase dispatcher ·
`/api/live-subprocess` Gemini child · `/api/typed-replay` TTS · `/api/recognition`
vision + face gallery · `/api/zones` zones/places + nav target · `/api/temp`
motor map + snapshot · `/api/controller` locomotion (move/step/postures/modes/
E-STOP).

**WebSockets**: `/ws/logs` (live log stream + 500-line replay) ·
`/ws/motor-temps` (3D heatmap data, ~8 fps) · `/ws/terminal` (PTY shell).


## Architecture notes

- **Subprocess isolation**: `voice/sanad_voice.py` runs as a child of `main.py`
  via the supervisor. If the voice loop crashes, the dashboard + arm + legs stay
  up.
- **Single DDS init**: `motion/arm_controller.py` owns the one
  `ChannelFactoryInitialize`; `LocoController` and the audio routes reuse that
  participant rather than re-initializing.
- **Brain contract**: see `voice/model_script.py` — any new model implements
  `__init__(audio_io, recorder, voice, system_prompt)`, `async run()`, `stop()`.
- **Supervisor contract**: each brain ships a sibling supervisor (e.g.
  `gemini/subprocess.py`) that spawns `sanad_voice.py` with its
  `SANAD_VOICE_BRAIN` and parses the brain's log markers. Template:
  `voice/model_subprocess.py`.
- **Locomotion safety**: `LocoController` is disarmed every boot, has velocity
  caps + a `StopMove` watchdog, and is mutually exclusive with the arm.
  Voice-driven movement is **off by default** and gated by the Controller
  toggle. Distances/degrees in `data/motions/instruction.json` are
  **approximate and must be calibrated on the real robot** — there is no
  obstacle/abort stack.
- **Audio routing**: the G1's platform-sound PulseAudio sink is NOT wired to a
  physical speaker. All dashboard-triggered playback (`play_wav`, typed-replay
  audio, record playback) routes through DDS `AudioClient.PlayStream` via
  `audio_manager._play_pcm_via_g1`. The PyAudio path is a desktop/dev fallback.
- **Arm replay**: `motion/arm_controller.py:_replay_file_inner()` is a port of
  `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py:Run()` — ramp-in → settle
  hold → playback → smooth return → disable SDK. Body motors (0–14) lock to a
  live snapshot while arm motors (15–28) follow the file at 60 Hz. `_return_home()`
  runs unconditionally after a cancel for a jerk-free return.
- **Camera frame transport (stdin push)**: the `CameraDaemon` lives in the
  parent and caches frames in memory. `GeminiSubprocess` base64-encodes the
  latest frame to the child's stdin (~2 fps); the child's `_stdin_watcher`
  relays it to Gemini Live with a staleness guard. Chosen over a file drop so
  the parent owns the camera once and the dashboard preview reads the same cache.
- **Motion-state channel**: `arm_controller._execute()` emits
  `motion.action_started` / `_done` / `_error` on the event bus. `main.py`
  forwards each to the child as `state:<json>\n`, injected to Gemini Live as
  silent `[STATE-START] wave_hand` / `[STATE-DONE] wave_hand (2.3s)` text so it
  can honestly answer "what are you doing?".
- **Recognition is Gemini-side**: no dlib/insightface/onnxruntime. Galleries are
  pure file IO; `gemini/script.py:_send_gallery_primer()` builds one multimodal
  `send_client_content` turn — every enrolled face/place's photos + a greeting
  instruction — and Gemini matches incoming frames against it in-context.
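
The arm-replay note above describes holding body motors 0–14 at a live snapshot while arm motors 15–28 follow the recorded file. One way to picture that composition, together with a ramp-in phase, is sketched below. This is illustrative only and not a port of `_replay_file_inner()`: the function names are hypothetical, and the linear blend used for the ramp is an assumption, since the README does not say which interpolation the real engine uses.

```python
from typing import List, Sequence

NUM_MOTORS = 29              # 29-DoF G1, as in the Temperature section
ARM_MOTORS = range(15, 29)   # arm motors 15-28, per the arm-replay note


def compose_target(snapshot: Sequence[float], frame: Sequence[float]) -> List[float]:
    """Body joints hold the snapshot; arm joints take the recorded frame."""
    if len(snapshot) != NUM_MOTORS or len(frame) != NUM_MOTORS:
        raise ValueError("expected 29 joint positions")
    target = list(snapshot)
    for i in ARM_MOTORS:
        target[i] = frame[i]
    return target


def ramp_in(snapshot: Sequence[float], first_frame: Sequence[float],
            steps: int) -> List[List[float]]:
    """Blend linearly from the snapshot to the first frame over `steps` ticks.

    Assumption: a linear blend; the real ramp profile is not documented here.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    goal = compose_target(snapshot, first_frame)
    ramp = []
    for k in range(1, steps + 1):
        alpha = k / steps
        ramp.append([s + alpha * (g - s) for s, g in zip(snapshot, goal)])
    return ramp
```

At 60 Hz, a ramp of `steps = 60` would take about one second; because the body indices are equal in the snapshot and the goal, they stay fixed throughout the blend.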


## Camera vision on Jetson

The Recognition tab needs `pyrealsense2` to talk to the Intel RealSense.
**Do not `pip install pyrealsense2` on JetPack 5** — the PyPI wheel is built
against glibc 2.32+ (Ubuntu 22.04) and fails to load on JetPack 5's glibc
2.31 with `ImportError: ... version 'GLIBC_2.32' not found`.

The native runtime is already there (`apt`-installed `librealsense2`). Build
just the Python binding from source against it, into the `gemini_sdk` env:

```bash
rs-enumerate-devices            # confirm the D435I shows up at OS level first

source ~/miniconda3/etc/profile.d/conda.sh && conda activate gemini_sdk
pip uninstall -y pyrealsense2   # remove the broken wheel if present
sudo apt install -y cmake build-essential git python3-dev libusb-1.0-0-dev pkg-config libssl-dev

cd /tmp && rm -rf librealsense
git clone --depth=1 --branch v2.56.5 https://github.com/IntelRealSense/librealsense.git
cd librealsense && mkdir -p build && cd build
cmake .. -DBUILD_PYTHON_BINDINGS=ON -DPYTHON_EXECUTABLE=$(which python3) \
  -DBUILD_EXAMPLES=OFF -DBUILD_GRAPHICAL_EXAMPLES=OFF \
  -DBUILD_UNIT_TESTS=OFF -DCHECK_FOR_UPDATES=OFF -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) pyrealsense2
SITE=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
mkdir -p "$SITE/pyrealsense2"
cp wrappers/python/pyrealsense2*.so "$SITE/pyrealsense2/"
cp ../wrappers/python/pyrealsense2/__init__.py "$SITE/pyrealsense2/" 2>/dev/null || true

python3 -c 'import pyrealsense2 as rs; print([d.get_info(rs.camera_info.name) for d in rs.context().query_devices()])'
```

Match the `--branch` tag to the installed runtime (`dpkg -l | grep librealsense2`).
If the build isn't worth it, `CameraDaemon` falls back to `cv2.VideoCapture(0)`
automatically — fine for a plain USB webcam, but note a RealSense exposes its
*depth* stream at `/dev/video0`, not RGB, so a real USB cam is the cleaner
fallback (or pin `SANAD_CAMERA_USB_INDEX`). On x86_64 / Ubuntu 22.04+ desktops,
`pip install pyrealsense2` just works.


## Dynamic paths

Every path is derived at runtime — no hard-coded `/home/...` anywhere.
Resolution order for `BASE_DIR` in `config.py`:

1. `SANAD_PROJECT_ROOT` env var (if set).
2. `PROJECT_BASE + PROJECT_NAME` from a `.env` file in `Sanad/` or its parent.
3. `Path(__file__).resolve().parent` — auto-detected.

The project runs unchanged from either layout:
- dev: `<anywhere>/Project/Sanad/`
- deployed: `/home/unitree/Sanad/`
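
The three-step resolution order for `BASE_DIR` can be sketched as follows. This is an approximation of the idea rather than the contents of `config.py`: the helper names are hypothetical, the `.env` parser is deliberately minimal, and treating `PROJECT_BASE + PROJECT_NAME` as a path join is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; a small stand-in for a real .env parser."""
    values: Dict[str, str] = {}
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return values  # no .env here
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_dir(module_file: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Apply the documented order: env var, then .env, then the file's folder."""
    env = os.environ if env is None else env
    here = module_file.resolve().parent
    root = env.get("SANAD_PROJECT_ROOT")
    if root:
        return Path(root)
    for folder in (here, here.parent):  # Sanad/ first, then its parent
        dotenv = read_dotenv(folder / ".env")
        if dotenv.get("PROJECT_BASE") and dotenv.get("PROJECT_NAME"):
            return Path(dotenv["PROJECT_BASE"]) / dotenv["PROJECT_NAME"]
    return here
```

Passing `env` explicitly keeps the function testable; in normal use it would be called as `resolve_base_dir(Path(__file__))`.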


## Deployment (workstation → robot)

```bash
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=model --exclude=.git \
  /path/to/Sanad/ \
  unitree@192.168.123.164:/home/unitree/Sanad/
```

Then on the robot: `Ctrl+C` the running `main.py` and re-run.


## Security

The dashboard has **no authentication**. Anyone who can reach
`http://<robot>:8000` gets full robot control — locomotion, arm, audio, file
upload/delete — and, via the **Terminal tab**, an interactive shell as the
dashboard's user. Bind it to a **trusted LAN only**; add auth before any wider
exposure.


## Troubleshooting

| Symptom | Fix |
|---|---|
| `No LowState received in 2s — refusing to replay` | `main.py` was re-executed as both `__main__` and `Project.Sanad.main`, creating two arm instances. Fix lives in the `sys.modules` alias near the top of `main.py`. Restart. |
| `G1ArmActionClient not available — skipping` for SDK actions | Same duplicate-init issue as above. |
| `No module named 'Project'` in subprocess | Bootstrap preamble in `voice/sanad_voice.py:~30` synthesises the `Project.Sanad` namespace when run as `__main__`. |
| Controller moves rejected (409) | The Controller is **disarmed by default** — hit Arm first. Reads + E-STOP are always allowed. |
| Arm action refused while "movement armed" | Arm ↔ locomotion are mutually exclusive. Disarm/stop locomotion, then trigger the arm. |
| Voice-driven walking does nothing | "Gemini Movement" toggle off, or E-STOP latched. Toggle on; clear E-STOP. Distances are uncalibrated. |
| Arm jumps at start of JSONL replay | `SETTLE_HOLD_SEC` (in `config/motion_config.json > arm_controller`) too low — try `0.7` or `1.0`. |
| Record playback silent | `audio_mgr.play_wav` only routes to G1 DDS if the Unitree SDK is importable; on desktop it falls back to the PulseAudio sink. |
| Live Voice Commands transcript stuck | Deferred trigger was queued but `trigger_enabled` toggle was off. Toggle on — or the pending-trigger poll fires it automatically once enabled. |
| Gemini "no audio" on Typed Replay | Non-deterministic; the retry chain in `voice/typed_replay.py:generate_audio` tries three prompt variants. For reliable TTS, use the offline `local_tts` SpeechT5 path. |
| Local brain exits immediately | `ollama serve` not running / model not pulled, or weights missing under `model/`. Check `logs/local_subprocess.log`. The Gemini brain is the safe default. |
| Recognition tab: "Camera could not start (no backend)" | No camera backend acquired. Check `rs-enumerate-devices` (RealSense at OS level) and `python3 -c 'import pyrealsense2'` in the `gemini_sdk` env. The glibc `ImportError` means the pip wheel is incompatible — see "Camera vision on Jetson" above. |
| Camera badge stuck on "reconnecting…" | `CameraDaemon` lost the device and is retrying with exponential backoff. Re-seat the USB 3 cable; check `logs/camera.log` for the USB-2.0 warning. |
| Gemini doesn't greet an enrolled face | Face Recognition toggle on? Vision on? (Face rec needs frames.) Check `logs/gemini_brain.log` for `face gallery primed: N person(s)`. Hit "Sync Gallery" to force a re-prime. |
| Gemini unaware of motion state | The `motion.action_*` → `send_state` chain only runs when Live Gemini is up. Check `logs/gemini_subprocess.log` and `logs/gemini_brain.log` for `STATE injected:` lines. |


## License / attribution

Internal project for YS Lootah Technology. Reuses/ports patterns from:
- `G1_Lootah/Manual_Recorder/g1_replay_v4_stable.py` (arm replay math)
- `SanadVoice/gemini_interact` (arm-phrase dispatch, skill registry)
- `SanadVoice/gemini_voice_v2` (local SpeechT5 TTS)
- `Project/Marcus` — camera→Gemini stdin-push transport, motion-state
  injection, camera daemon resilience (auto-reconnect, USB-2.0 warning), the
  `API/camera_api.py` cache shape (`get_frame_b64` / `get_fresh_frame`), and the
  confirmation-phrase → locomotion pattern (`movement_dispatch`).
- Unitree `unitree_sdk2py` (G1 low-level SDK, `LocoClient`, `G1ArmActionClient`,
  `AudioClient.PlayStream`).