
Sanad Package 1 — Basic Communication

Hands-free conversation in one operator-selected language (Gemini Live), with audio through the G1's chest speaker/mic or any plugged-in USB mic/speaker (e.g. Anker). No voice-command motion, vision, recognition, or navigation. Dashboard on port 8011.

What it ships

  • app_p1.py — launcher: bootstraps the Project.Sanad namespace, constructs ONLY the comms subsystems (brain, audio_mgr, voice_client, local_tts, typed_replay, live_sub), injects a P1-scoped Project.Sanad.main shim, and mounts ONLY the P1 dashboard routers (voice, audio, prompt, typed-replay, records, logs, live-subprocess, health, system) + the logs websocket. Serves the real Sanad SPA with non-P1 tabs hidden.
  • entrypoint.sh — license gate (license_check P1; exits cleanly if unlicensed); resolves language, audio profile, and port with precedence env > license feature > config/p1_config.json.
  • Dockerfile / requirements-p1.txt: built FROM sanad-base; adds PortAudio + google-genai.
  • config/p1_config.json — defaults (language, audio profile, port, tab set).
  • docker-compose.p1.yml — standalone run; the top-level compose file wires P1 in under --profile p1.

It does not fork Sanad — it reuses the canonical source baked into sanad-base.
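The precedence the entrypoint applies (env > license feature > config/p1_config.json) reduces to "the first non-empty value wins". Below is a minimal shell sketch of that rule, not the actual entrypoint.sh code; resolve_setting and the license_language / config_language variables are hypothetical names, and it assumes the license and config values have already been read into shell variables.

```bash
# Hypothetical helper (not taken from entrypoint.sh): print the first
# non-empty argument. Callers pass candidates in precedence order:
#   env var, then license feature, then config/p1_config.json default.
resolve_setting() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing set anywhere
}

# Example (license_language and config_language are assumed to be set elsewhere):
#   language=$(resolve_setting "${SANAD_LANGUAGE:-}" "$license_language" "$config_language")
```

Keeping the rule as an ordered argument list means adding another source later is just a matter of inserting one more argument at the right position.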

Run & stop P1

A) Docker (the productized way) — from Project/Packages on the robot:

docker compose --profile base build           # build sanad-base once
docker compose --profile p1 up -d --build      # run  -> http://<robot>:8011
docker compose --profile p1 logs -f p1         # view logs
docker compose --profile p1 down               # stop
# audio: SANAD_AUDIO_PROFILE=builtin (chest) | plugged (USB/Anker)
# language: license `language` feature, or SANAD_LANGUAGE=en docker compose --profile p1 up -d

B) Dev mode (no Docker) — run P1 in the robot's gemini_sdk conda env via the control script (deployed to ~/sanad_deploy/Packages/Sanad_Package_1/p1ctl.sh):

cd ~/sanad_deploy/Packages/Sanad_Package_1
./p1ctl.sh start      # launch on :8011 (coexists with Sanad on :8000)
./p1ctl.sh status     # process + /api/health
./p1ctl.sh logs 80    # tail the P1 log
./p1ctl.sh restart
./p1ctl.sh stop

Deploy or update the code from the workstation before starting: rsync -az --exclude __pycache__ Project/Packages Project/Sanad unitree@<robot>:~/sanad_deploy/
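p1ctl.sh status checks the process and /api/health once. When scripting a start, it can help to wait until the health endpoint actually answers. A minimal sketch, assuming curl is available and that /api/health returns a 2xx status once P1 is ready; wait_for_health is a hypothetical helper, not part of p1ctl.sh.

```bash
# Hypothetical helper (not part of p1ctl.sh): poll a health URL until it
# returns HTTP 2xx or the timeout (seconds) runs out. Assumes curl is installed.
wait_for_health() {
  local url=${1:-http://localhost:8011/api/health}
  local timeout=${2:-30}
  local deadline
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # -f: HTTP >= 400 counts as failure; --max-time bounds each attempt
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      echo "P1 is healthy: $url"
      return 0
    fi
    sleep 1
  done
  echo "P1 did not become healthy within ${timeout}s: $url" >&2
  return 1
}

# Usage after a dev-mode start:
#   ./p1ctl.sh start && wait_for_health http://localhost:8011/api/health 60
```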

Logs: the dashboard's Logs card streams live (/ws/logs) and the ⬇ Download button saves the full bundle (/api/logs/bundle) as sanad_p1_logs_<ts>.txt.
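On a headless setup the same bundle can be fetched without the browser. A sketch assuming a plain GET on /api/logs/bundle returns the bundle as the response body; fetch_p1_logs and its local timestamp format are illustrative choices, not the dashboard's exact filename logic.

```bash
# Hypothetical helper: save the P1 log bundle to a timestamped local file.
# Assumes GET /api/logs/bundle returns the bundle as the response body.
fetch_p1_logs() {
  [ -n "$1" ] || { echo "usage: fetch_p1_logs http://<robot>:8011" >&2; return 2; }
  local out
  out="sanad_p1_logs_$(date +%Y%m%d-%H%M%S).txt"
  if curl -fsS --max-time 60 -o "$out" "$1/api/logs/bundle"; then
    echo "saved $out"
  else
    rm -f "$out"   # don't leave an empty or partial file behind
    return 1
  fi
}

# Usage: fetch_p1_logs http://<robot>:8011
```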

Endpoints (P1 subset)

/ (P1 control page) · /full (filtered SPA) · /api/package (manifest + license + api-key status) · /api/voice/* · /api/audio/* · /api/prompt/* · /api/typed-replay/* · /api/records/* · /api/logs/* · /api/live-subprocess/* · /api/health · /api/system/info · /ws/logs · /api/p1/* (P1 settings, see below).
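A quick way to confirm that a P1 instance is up is to probe a few read-only endpoints from the list above and print their HTTP status codes. A sketch assuming these endpoints answer a plain GET with 2xx when healthy; p1_smoke is a hypothetical helper.

```bash
# Hypothetical smoke test: GET a few read-only P1 endpoints and print
# each HTTP status; the return value is the number of non-2xx answers.
p1_smoke() {
  [ -n "$1" ] || { echo "usage: p1_smoke http://<robot>:8011" >&2; return 2; }
  local base=$1 failures=0 path code
  for path in /api/health /api/system/info /api/package /api/p1/settings; do
    # curl reports 000 as the status when it cannot connect at all
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base$path")
    printf '%-18s %s\n' "$path" "${code:-000}"
    case $code in
      2??) ;;
      *) failures=$((failures + 1)) ;;
    esac
  done
  return "$failures"
}

# Usage: p1_smoke http://<robot>:8011 && echo "P1 looks healthy"
```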

The P1 dashboard (http://<robot>:8011)

  • / — a clean P1 control page with cards: Conversation (start/stop), Say-a-line, Persona (Save & Apply), Gemini API key, Audio (speaker profile + volume + mute + rescan), and a live Logs view. This is the everyday UI — no API knowledge needed.
  • /full — the complete Sanad SPA (advanced), with non-P1 tabs hidden (motion/recognition/nav/temperature/terminal belong to other packages).

What you can do (cards on /, with the matching SPA tab and endpoints):

| You want to… | Where | Endpoint(s) |
|---|---|---|
| Talk to the robot (start/stop the live conversation) | Voice tab | `POST /api/live-subprocess/start` |
| Make it say a specific line | Voice / Typed-replay | `POST /api/voice/generate`, `POST /api/typed-replay/say` |
| Change the robot persona (who it is, tone, language/dialect) | Settings | `GET/POST /api/p1/persona` (or `/api/prompt`) |
| Set / update the Gemini API key | Settings | `GET/POST /api/p1/api-key` |
| Pick speaker/mic (chest vs Anker/USB), volume, mute | Audio | `/api/audio/devices` |
| Manage saved recordings (save/replay/rename/delete) | Recordings | `/api/records/*`, `/api/typed-replay/*` |
| See logs / system / health | Settings | `/api/logs`, `/ws/logs`, `/api/system/info`, `/api/health` |

Change the robot persona

The persona is the system prompt at scripts/sanad_script.txt (who Sanad is, its tone, and the language/dialect it speaks). Edit it from the Persona card (or the Settings tab), or over the API:

curl http://<robot>:8011/api/p1/persona                       # current persona + rules
curl -X POST http://<robot>:8011/api/p1/persona \
     -H 'Content-Type: application/json' \
     -d '{"content":"You are Sanad, a friendly Emirati guide. Speak Khaleeji Arabic..."}'

POST /api/p1/persona writes the persona and restarts the live session, so the change takes effect immediately. The base /api/prompt/update also writes the file, but a running session keeps the old persona until it is restarted. This is also how you steer the conversation language: put the language directive in the persona.
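Hand-typing a long, multi-line persona (often in Arabic, with quotes and newlines) into -d '{...}' is error-prone because the text must be JSON-escaped. The sketch below lets Python's json module do the escaping and streams the body to curl. persona_body and set_persona_from_file are hypothetical helpers, and the sketch assumes python3 and curl on the machine you run it from.

```bash
# Hypothetical helpers; assume python3 and curl on the machine running them.
# Read persona text on stdin and print {"content": "..."} with proper
# JSON escaping of quotes, newlines and non-ASCII (e.g. Arabic) text.
persona_body() {
  python3 -c 'import json, sys; print(json.dumps({"content": sys.stdin.buffer.read().decode("utf-8")}))'
}

# POST a persona file to P1; the endpoint restarts the live session itself.
set_persona_from_file() {
  [ -n "$1" ] && [ -f "$2" ] || { echo "usage: set_persona_from_file http://<robot>:8011 FILE" >&2; return 2; }
  persona_body < "$2" |
    curl -fsS -X POST "$1/api/p1/persona" \
      -H 'Content-Type: application/json' --data-binary @-
}

# Usage (my_persona.txt is any local UTF-8 text file):
#   set_persona_from_file http://<robot>:8011 my_persona.txt
```

--data-binary @- sends stdin unchanged, whereas -d @- would strip newlines from what it reads.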

Set / update the Gemini API key

Two ways, both available in P1:

  • Base (Sanad): GET/POST /api/voice/api-key — the SPA Voice/Settings tab uses this. POST persists to data/motions/config.json, hot-swaps the in-memory key, and disconnects the short-session client. The live Gemini subprocess must be restarted (Stop→Start) to pick it up.
  • P1 convenience: GET/POST /api/p1/api-key — same persist + hot-swap, and also auto-restarts the live Gemini subprocess so the new key applies immediately. GET /api/p1/settings returns api-key status + language + audio profile + whether a live session is running.

# set or update the key (works for first-time set AND replacing an existing key)
curl -X POST http://<robot>:8011/api/p1/api-key \
     -H 'Content-Type: application/json' -d '{"api_key":"AIza...."}'
# check status (masked; never returns the full key)
curl http://<robot>:8011/api/p1/api-key

Keys are validated (they must start with AIza and pass a length check), are never returned unmasked in any response, and are persisted to data/motions/config.json, which takes precedence over the SANAD_GEMINI_API_KEY env var and core_config.json.
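When setting the key from a terminal, it helps to catch an obviously malformed key before it reaches the robot and to keep it out of shell history and the process list. This sketch makes one assumption: the 39-character length and the [A-Za-z0-9_-] alphabet reflect standard Google API keys, not the server's documented rule. looks_like_gemini_key and set_gemini_key are hypothetical helpers.

```bash
# Hypothetical client-side pre-check. ASSUMPTION: 39 characters from
# [A-Za-z0-9_-] starting with "AIza", as standard Google API keys are;
# the server's own length rule may differ.
looks_like_gemini_key() {
  local key=$1
  case $key in AIza*) ;; *) return 1 ;; esac
  [ "${#key}" -eq 39 ] || return 1
  case $key in *[!A-Za-z0-9_-]*) return 1 ;; esac
  return 0
}

# Prompt for the key without echoing it, so it never appears in shell
# history or on curl's command line, then POST it to the P1 endpoint.
set_gemini_key() {
  [ -n "$1" ] || { echo "usage: set_gemini_key http://<robot>:8011" >&2; return 2; }
  local key
  printf 'Gemini API key: ' >&2
  stty -echo; read -r key; stty echo; echo >&2
  looks_like_gemini_key "$key" || { echo "that does not look like a Gemini API key" >&2; return 1; }
  # Safe to splice into JSON: the check above only allows [A-Za-z0-9_-].
  printf '{"api_key":"%s"}' "$key" |
    curl -fsS -X POST "$1/api/p1/api-key" \
      -H 'Content-Type: application/json' --data-binary @-
}

# Usage: set_gemini_key http://<robot>:8011
```

Afterwards, the GET on /api/p1/api-key confirms the new (masked) status.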

Plug-and-play status

  • Base: python:3.10-slim (multi-arch) → google-genai installs cleanly, no CUDA needed. Build on the Jetson (or x86) with docker compose --profile base build.
  • Works out of the box with a plugged USB speaker/mic. The entrypoint runs a preflight (python / google-genai / pyaudio / Unitree-SDK / audio profile) and prints clear guidance if something's missing.
  • Language is set via the Persona card (put the dialect/language directive in the system prompt — saving applies it to the live session immediately).
  • Still pending for true "pull-and-run": a prebuilt linux/arm64 image in a registry, and bundling unitree_sdk2_python for turnkey chest (builtin) audio (for now, use plugged, or mount the SDK). In a multi-package deployment, audio output will later route through the Sanad_Core hwbroker audio lock; standalone P1 speaks directly.
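The entrypoint preflight is described here only by what it checks (python, google-genai, pyaudio, Unitree SDK, audio profile). The following rough shell sketch runs equivalent checks and can help when debugging the dev-mode conda env. It is not the real entrypoint.sh logic. Two assumptions: the unitree_sdk2_python package imports as unitree_sdk2py, and the profile defaults to plugged.

```bash
# Rough re-creation of the checks the entrypoint preflight is described as
# running; NOT the real entrypoint.sh. ASSUMPTIONS: unitree_sdk2_python is
# importable as unitree_sdk2py, and the audio profile defaults to plugged.
py_has() { python3 -c "import $1" >/dev/null 2>&1; }

check_audio_profile() {
  case $1 in
    builtin|plugged) return 0 ;;
    *) echo "unknown SANAD_AUDIO_PROFILE '$1' (expected builtin or plugged)" >&2; return 1 ;;
  esac
}

p1_preflight() {
  local profile=${SANAD_AUDIO_PROFILE:-plugged} status=0
  command -v python3 >/dev/null 2>&1 || { echo "MISSING python3" >&2; return 1; }
  py_has google.genai || { echo "MISSING google-genai (pip install google-genai)" >&2; status=1; }
  py_has pyaudio || { echo "MISSING pyaudio (needs the PortAudio library)" >&2; status=1; }
  check_audio_profile "$profile" || status=1
  if [ "$profile" = builtin ] && ! py_has unitree_sdk2py; then
    echo "builtin (chest) audio needs unitree_sdk2_python: use plugged or mount the SDK" >&2
    status=1
  fi
  [ "$status" -eq 0 ] && echo "preflight OK (audio profile: $profile)"
  return "$status"
}

# Usage: SANAD_AUDIO_PROFILE=builtin p1_preflight
```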