Sanad Package 1 — Basic Communication
Hands-free conversation in one operator-selected language (Gemini Live), audio via the G1 chest or any plugged USB mic/speaker (Anker). No voice-command motion, vision, recognition, or navigation. Dashboard on :8011.
What it ships
- app_p1.py: launcher. Bootstraps the Project.Sanad namespace, constructs ONLY the comms subsystems (brain, audio_mgr, voice_client, local_tts, typed_replay, live_sub), injects a P1-scoped Project.Sanad.main shim, and mounts ONLY the P1 dashboard routers (voice, audio, prompt, typed-replay, records, logs, live-subprocess, health, system) plus the logs websocket. Serves the real Sanad SPA with non-P1 tabs hidden.
- entrypoint.sh: license gate (license_check P1; clean exit if unlicensed); resolves language/audio/port (env > license feature > config/p1_config.json).
- Dockerfile / requirements-p1.txt: FROM sanad-base, adds PortAudio + google-genai.
- config/p1_config.json: defaults (language, audio profile, port, tab set).
- docker-compose.p1.yml: standalone run; the top-level compose wires --profile p1.
It does not fork Sanad — it reuses the canonical source baked into
sanad-base.
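The resolution order that entrypoint.sh applies (env > license feature > config/p1_config.json) can be sketched in Python as below. This is an illustration of the precedence only, not the entrypoint's actual code: SANAD_LANGUAGE and SANAD_AUDIO_PROFILE come from this README, while SANAD_PORT, the JSON key names, and the shape of the license-feature dictionary are assumptions.

```python
import json
import os
from pathlib import Path

# SANAD_LANGUAGE and SANAD_AUDIO_PROFILE appear in this README.
# SANAD_PORT and the setting keys below are assumptions for illustration.
ENV_NAMES = {
    "language": "SANAD_LANGUAGE",
    "audio_profile": "SANAD_AUDIO_PROFILE",
    "port": "SANAD_PORT",
}


def resolve_setting(key, env, license_features, config_defaults):
    """Return the first value found: env > license feature > config default."""
    env_value = env.get(ENV_NAMES[key])
    if env_value:  # an empty string counts as "not set"
        return env_value
    if license_features.get(key) is not None:
        return license_features[key]
    return config_defaults.get(key)


def load_config_defaults(path="config/p1_config.json"):
    """Read the defaults file; a missing file simply yields no defaults."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}


if __name__ == "__main__":
    defaults = load_config_defaults()
    for name in ENV_NAMES:
        # No license features are passed here; a real caller would supply them.
        print(name, "=", resolve_setting(name, os.environ, {}, defaults))
```

Passing the environment and license features in as plain dictionaries keeps the precedence rule easy to check without a robot or a license file.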
Run & stop P1
A) Docker (the productized way) — from Project/Packages on the robot:
docker compose --profile base build # build sanad-base once
docker compose --profile p1 up -d --build # run -> http://<robot>:8011
docker compose --profile p1 logs -f p1 # view logs
docker compose --profile p1 down # stop
# audio: SANAD_AUDIO_PROFILE=builtin (chest) | plugged (USB/Anker)
# language: license `language` feature, or SANAD_LANGUAGE=en docker compose --profile p1 up -d
B) Dev mode (no Docker) — run P1 in the robot's gemini_sdk conda env via the
control script (deployed to ~/sanad_deploy/Packages/Sanad_Package_1/p1ctl.sh):
cd ~/sanad_deploy/Packages/Sanad_Package_1
./p1ctl.sh start # launch on :8011 (coexists with Sanad on :8000)
./p1ctl.sh status # process + /api/health
./p1ctl.sh logs 80 # tail the P1 log
./p1ctl.sh restart
./p1ctl.sh stop
Deploy/update from the workstation first:
rsync -az --exclude __pycache__ Project/Packages Project/Sanad unitree@<robot>:~/sanad_deploy/
Logs: the dashboard's Logs card streams live (/ws/logs) and the ⬇ Download
button saves the full bundle (/api/logs/bundle) as sanad_p1_logs_<ts>.txt.
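After a start or restart it can be handy to wait until the dashboard answers before opening it. The following is a minimal sketch that polls /api/health; it assumes the endpoint returns HTTP 200 once P1 is up, which this README does not state explicitly. The function names and the injectable probe/sleep parameters are illustrative.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(base_url, attempts=10, delay=1.0,
                       probe=http_ok, sleep=time.sleep):
    """Poll <base_url>/api/health up to `attempts` times."""
    url = base_url.rstrip("/") + "/api/health"
    for i in range(attempts):
        if probe(url):
            return True
        if i < attempts - 1:  # no need to sleep after the last attempt
            sleep(delay)
    return False
```

For example, `wait_until_healthy("http://<robot>:8011")` (with the robot's address filled in) returns True as soon as one probe succeeds and False after ten failed probes.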
Endpoints (P1 subset)
/ (filtered SPA) · /api/package (manifest + license + api-key status) ·
/api/voice/* · /api/audio/* · /api/prompt/* · /api/typed-replay/* ·
/api/records/* · /api/logs/* · /api/live-subprocess/* · /api/health ·
/api/system/info · /ws/logs · /api/p1/* (P1 settings, see below).
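A client script can use the list above as an allow-list before calling the robot. The sketch below only restates that list; it is not how app_p1.py mounts its routers, and the helper name is hypothetical.

```python
# Prefixes and exact paths copied from the endpoint list above.
P1_PREFIXES = (
    "/api/voice", "/api/audio", "/api/prompt", "/api/typed-replay",
    "/api/records", "/api/logs", "/api/live-subprocess", "/api/p1",
)
P1_EXACT = {"/", "/api/package", "/api/health", "/api/system/info", "/ws/logs"}


def is_p1_path(path: str) -> bool:
    """True if `path` falls inside the P1 endpoint subset."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path in P1_EXACT:
        return True
    # Match the prefix itself or anything below it, but not look-alikes
    # such as /api/voicemail.
    return any(path == p or path.startswith(p + "/") for p in P1_PREFIXES)
```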
The P1 dashboard (http://<robot>:8011)
- /: a clean P1 control page with cards: Conversation (start/stop), Say-a-line, Persona (Save & Apply), Gemini API key, Audio (speaker profile + volume + mute + rescan), and a live Logs view. This is the everyday UI; no API knowledge needed.
- /full: the complete Sanad SPA (advanced), with non-P1 tabs hidden (motion/recognition/nav/temperature/terminal belong to other packages).
What you can do (cards on /, also the matching endpoints):
| You want to… | Where / endpoint |
|---|---|
| Talk to the robot (start/stop the live conversation) | Voice tab · POST /api/live-subprocess/start |
| Make it say a specific line | Voice/Typed-replay · POST /api/voice/generate, POST /api/typed-replay/say |
| Change the robot persona (who it is, tone, language/dialect) | Settings · GET/POST /api/p1/persona (or /api/prompt) |
| Set / update the Gemini API key | Settings · GET/POST /api/p1/api-key |
| Pick speaker/mic (chest vs Anker/USB), volume, mute | Audio · /api/audio/devices |
| Manage saved recordings (save/replay/rename/delete) | Recordings · /api/records/*, /api/typed-replay/* |
| See logs / system / health | Settings · /api/logs, /ws/logs, /api/system/info, /api/health |
Change the robot persona
The persona is the system prompt at scripts/sanad_script.txt (who Sanad is,
tone, and the language/dialect it speaks). Edit it from the Settings tab or:
curl http://<robot>:8011/api/p1/persona # current persona + rules
curl -X POST http://<robot>:8011/api/p1/persona \
-H 'Content-Type: application/json' \
-d '{"content":"You are Sanad, a friendly Emirati guide. Speak Khaleeji Arabic..."}'
POST /api/p1/persona writes the persona and restarts the live session so it
takes effect immediately (the base /api/prompt/update writes the file but a
running session keeps the old persona until restarted). This is also how you
steer the conversation language (put the language directive in the persona).
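The same persona update can be scripted from Python using only the standard library. This sketch mirrors the curl call above: the path and the {"content": ...} body come from this README, while the function names are illustrative and the response format is not assumed.

```python
import json
import urllib.request


def build_persona_request(base_url: str, content: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/p1/persona request."""
    body = json.dumps({"content": content}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/p1/persona",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_persona(base_url: str, content: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_persona_request(base_url, content)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Setting `ensure_ascii=False` and encoding as UTF-8 sends Arabic persona text as-is rather than as escape sequences; either form is valid JSON.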
Set / update the Gemini API key
Two ways, both available in P1:
- Base (Sanad): GET/POST /api/voice/api-key. The SPA Voice/Settings tab uses this. POST persists to data/motions/config.json, hot-swaps the in-memory key, and disconnects the short-session client. The live Gemini subprocess must be restarted (Stop→Start) to pick it up.
- P1 convenience: GET/POST /api/p1/api-key. Same persist + hot-swap, and it also auto-restarts the live Gemini subprocess so the new key applies immediately.

GET /api/p1/settings returns api-key status + language + audio profile + whether a live session is running.
# set or update the key (works for first-time set AND replacing an existing key)
curl -X POST http://<robot>:8011/api/p1/api-key \
-H 'Content-Type: application/json' -d '{"api_key":"AIza...."}'
# check status (masked; never returns the full key)
curl http://<robot>:8011/api/p1/api-key
Keys are validated (must start with AIza, plus a length check), masked in every
response, and persisted to data/motions/config.json (highest precedence, ahead
of the SANAD_GEMINI_API_KEY env variable and core_config.json).
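The validation, masking, and precedence rules can be sketched as follows. Only the AIza prefix and the source order are taken from this README; the minimum length, the mask format, and all function names are assumptions for illustration, not the shipped implementation.

```python
MIN_KEY_LENGTH = 30  # assumption: the real length check is not documented here


def is_plausible_key(key: str) -> bool:
    """Prefix check from the README plus an assumed minimum length."""
    return key.startswith("AIza") and len(key) >= MIN_KEY_LENGTH


def mask_key(key: str) -> str:
    """Keep the first and last 4 characters (illustrative mask format)."""
    if len(key) <= 8:
        return "*" * len(key)
    return key[:4] + "*" * (len(key) - 8) + key[-4:]


def pick_key(config_json_key, env_key, core_config_key):
    """data/motions/config.json > SANAD_GEMINI_API_KEY > core_config.json."""
    for candidate in (config_json_key, env_key, core_config_key):
        if candidate:  # skip None and empty strings
            return candidate
    return None
```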
Plug-and-play status
- Base: python:3.10-slim (multi-arch) → google-genai installs cleanly, no CUDA needed. Build on the Jetson (or x86) with docker compose --profile base build.
- Works out of the box with a plugged USB speaker/mic. The entrypoint runs a preflight (python / google-genai / pyaudio / Unitree-SDK / audio profile) and prints clear guidance if something's missing.
- Language is set via the Persona card (put the dialect/language directive in the system prompt; saving applies it to the live session immediately).
- Pending for true "pull-and-run": prebuilt linux/arm64 image in a registry; bundling unitree_sdk2_python for turnkey chest (builtin) audio (today: use plugged, or mount the SDK). In a multi-package deployment, audio output later routes through the Sanad_Core hwbroker audio-lock (P1 standalone speaks directly).
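A dependency preflight like the one the entrypoint runs can be approximated in Python. This is a sketch rather than the entrypoint's actual check: it only tests whether modules are importable, and the import name unitree_sdk2py for the unitree_sdk2_python package is an assumption.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package (e.g. "google") is missing.
        return False


def preflight(modules):
    """Map each module name to whether it is available."""
    return {name: module_available(name) for name in modules}


if __name__ == "__main__":
    # "unitree_sdk2py" is an assumed import name for unitree_sdk2_python.
    checks = preflight(["google.genai", "pyaudio", "unitree_sdk2py"])
    print("python", sys.version.split()[0])
    for name, ok in checks.items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

With the plugged audio profile a missing Unitree SDK would only be a warning, since the list above says the SDK is needed for chest (builtin) audio.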