# Sanad Package 2 — Premium Communication — PLAN (locked scope)
> **Status: IMPLEMENTED (self-contained), built 2026-06-22, pending Docker build +
> on-robot test.** Vendored from SanadV3, mirrors P1. Dashboard port **:8012**.
> Structural validation passed (compile, import resolution, shim-symbol coverage,
> namespace bootstrap, `license_check P2` entitled, YAML).
>
> **Refinement (2026-06):** the engine is now vendored from **SanadV3**, not
> plain Sanad. SanadV3 already implements the mask/face subsystem and the
> evolved voice/audio/arm engine P2 needs — so P2 *vendors and wires* it rather
> than building the mask from scratch. See "Why vendor SanadV3".

P2 is a **superset of P1**: everything P1 does, plus multilingual auto-detect,
voice-commanded **arm gestures**, gestures while speaking, and a lip-syncing LED
face on the BLE "Shining Mask".
## Locked scope (decisions taken)
- **Motion = arm gestures only (first pass).** Voice-command **locomotion**
(robot walking/turning) is **DEFERRED** to a later pass: `voice/movement_dispatch.py` +
`G1_Controller/loco_controller.py` and SanadV3's `dashboard/routes/_arbiter.py`
(loco/nav leg arbitration) are intentionally **out of scope here**.
- **Mask = included, BLE driven from INSIDE the P2 container** (vendor the Mask
controller; `bleak` + host BlueZ/D-Bus). No separate `sanad-mask` side-car.
- **Single self-contained container** (no `Sanad_Core`, no `sanad-base`) — like P1.
- **Keyless** ship; customer supplies their own Gemini key.
## Why vendor SanadV3 (not plain Sanad)
SanadV3 is **not a fork** of Sanad — it is plain Sanad **plus exactly the
subsystems P2 needs**, reachable through the identical `Project.Sanad.*` import
surface the dashboard routes already use. Verified in the tree:
- `face/mask_face.py` → `FaceController` (BLE asyncio-loop thread + reconnect
supervisor + `set_speaking`/`set_mouth` lip-sync inputs). Imports only
`Project.Sanad.config.BASE_DIR`, `core.config_loader`, `core.logger`, then the
flat Mask lib via `sys.path.insert(mask_dir)`.
- `face/face_motion.py` → `LifelikeFace` (saccades, varied blinks,
idle/listening/thinking/speaking states, timed reactions, smooth lip-sync),
with automatic fallback to the flat lib's `FaceAnimator`.
- `config/mask_config.json` → already env-driven
(`SANAD_MASK_DIR`/`ADDRESS`/`NAME_PREFIX`/`ADAPTER`), `brightness`/`fps`/
`lifelike`/`autostart`, persisted face colors.
- `dashboard/routes/mask.py` → a **failure-safe** `/api/mask` router; every
handler is `asyncio.to_thread`-wrapped and maps errors to 503/409/500 so a
missing mask never crashes the dashboard.
- Lip-sync chain → `gemini/script.py` emits `[[MOUTH:n]]` (0-3) RMS markers;
`gemini/subprocess.py` exposes `register_mouth_callback`; `core/brain.py`
`set_gestural_speaking` emits `brain.gestural_speaking_changed`.
- Evolved voice/audio → `voice/live_voice_loop.py` (position-based dedup),
`voice/text_utils.py` (Arabic normalize + `maybe_trigger_arm`),
`voice/audio_manager.py` (per-instance + TTL throttle) — supersets of plain
Sanad's race-prone equivalents.

Building the mask from plain Sanad (the prior plan) would mean **re-deriving all
of the above by hand** and inheriting plain Sanad's known bugs (content-based
dedup, no `[[MOUTH:n]]`, non-atomic `index.json` writes). The one thing SanadV3
does **not** solve is BLE-inside-a-container (it ran the mask on the host
`g1_env`) — but that is a Docker capability/mount problem P2 owns regardless of
base engine.

**Correction to the prior plan:** the lip-sync source of truth is **Gemini's
`[[MOUTH:n]]` markers over the event bus**, *not* a raw audio-amplitude tap, and
the driver is **`FaceController` + `LifelikeFace`**, *not* `Mask/talking.py
TalkingFace`. P2 adopts the marker path.
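
A minimal sketch of what position-based dedup buys over content-based dedup, assuming the live loop sees a cumulative transcript snapshot on every update. `PositionDedup`, `feed`, and the phrase table are hypothetical names for illustration, not SanadV3's code; only `_last_snapshot` and `_trigger_lock` are borrowed from the engine's vocabulary.

```python
import threading
from typing import Dict, List, Tuple


class PositionDedup:
    """Fire each phrase occurrence once, keyed by where it appears, not by its text."""

    def __init__(self, phrases: Dict[str, int]) -> None:
        self._phrases = phrases          # phrase -> action id (hypothetical table)
        self._last_snapshot = ""         # last cumulative transcript seen
        self._trigger_lock = threading.Lock()

    def feed(self, snapshot: str) -> List[int]:
        """Return action ids for phrase matches that end in newly appended text."""
        with self._trigger_lock:
            if snapshot.startswith(self._last_snapshot):
                seen = len(self._last_snapshot)
            else:
                seen = 0                 # transcript was reset: scan from the top
            self._last_snapshot = snapshot

            hits: List[Tuple[int, int]] = []
            for phrase, action_id in self._phrases.items():
                if not phrase:
                    continue             # an empty phrase would match everywhere
                # Starting here guarantees every match ends past `seen`,
                # including phrases that straddle the old/new boundary.
                pos = snapshot.find(phrase, max(0, seen - len(phrase) + 1))
                while pos != -1:
                    hits.append((pos, action_id))
                    pos = snapshot.find(phrase, pos + len(phrase))
            return [action_id for _, action_id in sorted(hits)]
```

Feeding `"please wave"` twice fires once. Extending the transcript to `"please wave and wave again"` fires again for the second `wave`, which a dedup keyed on phrase content would have suppressed.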
## Architecture
One self-contained container that **owns all the hardware it needs directly**:
the G1 DDS link, `rt/arm_sdk` (arm), chest **or** USB/Anker audio, and the BLE
mask. It runs the premium slice of the vendored **SanadV3** engine.

P2 is a **containerization wrapper** (like P1), not a fork: it
1. bootstraps the `Project.Sanad` namespace (deployed layout, P1's mechanism),
2. constructs ONLY the P2 superset subsystems,
3. injects a P2-scoped `Project.Sanad.main` shim exposing the singletons the
routers import lazily (`mask_face`, `brain`, `live_voice`, …),
4. mounts the P1 + premium routers + the logs websocket,
5. serves the real SanadV3 SPA with non-P2 tabs hidden,
6. runs uvicorn on **:8012**.

Why no `Sanad_Core`/`hwbroker`/`sanad-mask` split: that split exists only to stop
**multiple** package containers fighting over the one device set. A customer who
buys **just P2** has a single owner — no contention — so everything folds into one
container (exactly like P1 standalone, and like the original Sanad monolith which
did comms + arm + mask in one process). The ZMQ bus seam stays available for a
future P1+P2+P3 fleet SKU but is **not** part of standalone P2.
## Namespace bootstrap (reuse P1's, not SanadV3's self-alias)
P2 reuses **P1's exact bootstrap** (`app_p1.py` lines 30-46): synthesize a
`Project` namespace package and alias `Project.Sanad` → the vendored `Sanad`
module, then inject a `Project.Sanad.main` shim holding the P2 singletons. The
mask + voice routes resolve their singletons via lazy `from Project.Sanad.main
import mask_face` inside handlers — so the shim must define them. (SanadV3's own
self-alias-by-folder-name is skipped automatically — the vendored tree is named
`Sanad`, so `main.py`'s `if _THIS_DIR.name != 'Sanad'` branch never fires — and
the wrapper-with-shim is what P1 ships and what the routes expect.)
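
The bootstrap described above can be sketched as follows. This is an illustration of the mechanism, not a copy of `app_p1.py`: the function name `bootstrap_namespace` and the way singletons are passed in are assumptions.

```python
import importlib
import sys
import types
from typing import Any, Dict


def bootstrap_namespace(vendored_sanad: types.ModuleType,
                        singletons: Dict[str, Any]) -> types.ModuleType:
    """Alias `Project.Sanad` to the vendored module and inject a `main` shim."""
    # Synthesize the top-level `Project` package; an (empty) __path__ marks it
    # as a package so dotted imports are allowed to pass through it.
    project = types.ModuleType("Project")
    project.__path__ = []
    sys.modules["Project"] = project

    # Alias Project.Sanad -> the vendored `Sanad` module object.
    sys.modules["Project.Sanad"] = vendored_sanad
    project.Sanad = vendored_sanad

    # Build the shim that lazy `from Project.Sanad.main import ...` resolves to.
    shim = types.ModuleType("Project.Sanad.main")
    for name, obj in singletons.items():
        setattr(shim, name, obj)
    sys.modules["Project.Sanad.main"] = shim
    vendored_sanad.main = shim
    return shim


if __name__ == "__main__":
    fake_engine = types.ModuleType("Sanad")   # stand-in for the vendored tree
    fake_engine.__path__ = []
    bootstrap_namespace(fake_engine, {"mask_face": object(), "brain": object()})
    print(importlib.import_module("Project.Sanad.main").mask_face)
```

Because the shim is registered in `sys.modules` under its full dotted name, a handler-level `from Project.Sanad.main import mask_face` resolves to whatever the wrapper injected, and a missing singleton surfaces as an `ImportError` at call time rather than at boot.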
## Features / capabilities
### Inherited from P1 (superset — same code, premium flags on)
Hands-free Gemini conversation · persona editor (who/tone/language) · **keyless**
Gemini key (customer adds own) · chest **or** USB/Anker audio (selectable,
hot-swap) · typed-replay / "say a line" · live logs + download · offline license
gate. **Same audio (mic+speaker) mechanism as P1**, via the evolved per-instance
`voice/audio_manager.py` + `voice/audio_devices.py`.
### New in P2 (this pass)
1. **Multilingual auto-detect** — Gemini natively detects the visitor's language
(Gulf Arabic/English) and replies in kind, via the bilingual system prompt in
`gemini/script.py` / `voice/sanad_voice.py`. No per-user flag.
2. **Voice-command arm gestures**: `voice/live_voice_loop.py`: USER speech →
arm actions via `scripts/sanad_arm.txt` (**23 actions**, non-contiguous ids
`{0-15, 23-28, 30}`; the file's nominal range is 0-28; hundreds of Arabic/EN
phrase variants) → `sanad_arm_controller.ARM.trigger_action_by_id()`.
**Instant** or **deferred** mode (0.65 s fallback so a silent user still
fires). Master **trigger-enabled** gate (default **OFF**). Position-based
dedup (`_last_snapshot` + `_trigger_lock`).
3. **Gestures while speaking**: `core/brain.set_gestural_speaking` →
`brain.gestural_speaking_changed` → `mask_face.set_speaking(True)` (mouth +
any gestural motion animate together while Gemini talks).
4. **Wake-phrase management** — phrase→action CRUD
(`voice/wake_phrase_manager.py`, persisted to `data/wake_phrases.json`),
folded into the live loop at runtime via `_merge_wake_phrases`.
5. **Skills registry** — skill CRUD, execute, upload-audio
(`dashboard/routes/skills.py`).
6. **Lip-sync on the LED "Shining Mask"** — vendored `face/mask_face.py`
`FaceController` + `face/face_motion.py` `LifelikeFace`:
   - mouth driven by Gemini `[[MOUTH:n]]` (0-3) markers →
     `gemini/subprocess.register_mouth_callback` → `mask_face.set_mouth(level)`;
   - `set_speaking(on)` from the gestural-speaking event for auto-talk;
   - state-aware idle/listening/thinking + timed reactions (smile/surprised/sad);
   - falls back to the flat lib's `FaceAnimator` if `LifelikeFace`/Pillow/bleak
     is unavailable (`lifelike=false` in `mask_config.json`).
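
To make the marker path in item 6 concrete, here is a small sketch of extracting `[[MOUTH:n]]` levels from a text line and fanning them out to registered callbacks. It assumes the markers reach the parent as plain text lines; the class `MouthDispatcher` and its `feed_line` method are hypothetical, and the real parsing in `gemini/subprocess.py` may differ.

```python
import re
from typing import Callable, List

MOUTH_RE = re.compile(r"\[\[MOUTH:(\d+)\]\]")


class MouthDispatcher:
    """Hypothetical stand-in for the parent side of the lip-sync chain."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[int], None]] = []

    def register_mouth_callback(self, callback: Callable[[int], None]) -> None:
        self._callbacks.append(callback)

    def feed_line(self, line: str) -> str:
        """Dispatch every marker in `line`; return the line with markers removed."""
        for match in MOUTH_RE.finditer(line):
            level = max(0, min(3, int(match.group(1))))   # clamp to the 0-3 range
            for callback in self._callbacks:
                try:
                    callback(level)
                except Exception:
                    pass   # a failing face must not break the voice path
        return MOUTH_RE.sub("", line)


if __name__ == "__main__":
    levels: List[int] = []
    dispatcher = MouthDispatcher()
    dispatcher.register_mouth_callback(levels.append)   # e.g. mask_face.set_mouth
    print(dispatcher.feed_line("hello [[MOUTH:2]]world"), levels)
```

Swallowing callback exceptions is a deliberate choice in this sketch: the mask is optional hardware, so a BLE hiccup should cost a mouth frame, not the conversation.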
### Deferred (NOT in this pass)
- **Voice-command locomotion** (`voice/movement_dispatch.py` + `loco_controller`):
Gemini's spoken confirmation → discrete bounded steps, `movement_enabled` gate,
"stop" = E-STOP. Adds a walking-on-voice safety surface + on-robot calibration —
staged separately. SanadV3's `dashboard/routes/_arbiter.py` (Nav2 ↔ LocoController
leg arbitration) belongs to that deferred locomotion pass, **not** P2's arm-only pass.
- Multi-package fleet via `Sanad_Core` (hwbroker/busd/shared `sanad-mask`).
## Dashboard (:8012) = all P1 tabs **+** (all routes VENDORED from SanadV3)
- **Voice** — adds the multilingual auto-detect toggle + per-language voice config.
- **Live-voice (commands)** — `dashboard/routes/live_voice.py`: start/stop,
deferred-mode toggle, **trigger-enabled** master gate, status, trigger history.
- **Wake-phrases** — phrase→action CRUD (AR dialects + EN).
- **Motion / Gestural** — `dashboard/routes/motion.py`: gestural-speaking toggle,
trigger / cancel arm actions. *(Arm only — loco controls present in the route
but unwired this pass.)*
- **Skills** — `dashboard/routes/skills.py`.
- **Mask / Lip-sync** — the existing SanadV3 SPA **Mask Face** tab + the vendored
`dashboard/routes/mask.py` (`/api/mask/*`): connect/disconnect, brightness,
face start/stop/return/color, speaking toggle, mouth slider, expressions,
text/image/animation overrides, status. **Mounted as-is — not authored.**
- **Logs**.

Non-P2 tabs (recognition, temp/3D, controller, navigation, terminal) are hidden
the same way P1 hides its non-P1 set.
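
The failure-safe pattern the mask router relies on (blocking work moved off the event loop, errors mapped to status codes) can be sketched without any web framework. The exception classes and the exact exception-to-status mapping below are assumptions for illustration; the vendored `dashboard/routes/mask.py` is the authority on which error maps to 503, 409, or 500.

```python
import asyncio
from typing import Any, Callable, Dict, Tuple


class MaskUnavailable(RuntimeError):
    """Hypothetical: subsystem missing or not connected."""


class MaskBusy(RuntimeError):
    """Hypothetical: a conflicting operation is already in progress."""


async def safe_call(fn: Callable[..., Any], *args: Any) -> Tuple[int, Dict[str, Any]]:
    """Run a blocking mask call in a worker thread and never let it raise."""
    try:
        result = await asyncio.to_thread(fn, *args)
    except MaskUnavailable as exc:
        return 503, {"error": str(exc)}
    except MaskBusy as exc:
        return 409, {"error": str(exc)}
    except Exception as exc:          # anything unexpected becomes a 500
        return 500, {"error": str(exc)}
    return 200, {"result": result}


if __name__ == "__main__":
    def connect() -> str:
        raise MaskUnavailable("mask not in range")

    print(asyncio.run(safe_call(connect)))
```

The point of the shape is that a handler always returns a `(status, payload)` pair, so a missing mask degrades one tab instead of taking down the dashboard process.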
## What it vendors / reuses (self-contained, like P1)
- `vendor/Sanad` — the **SanadV3** engine tree (rsync-excluding
`data/`, `Logs/`, `__pycache__/`, `tests/`, `static/temp3d/`). Includes
`face/`, evolved `voice/`, evolved `gemini/`, `motion/`, the mask + live-voice +
motion + skills routes, and the SPA. *(Locomotion modules vendored but left
unwired this pass.)*
- `vendor/sanad_pkg` — IPC bus shim + offline license verification lib (P1's set).
- `vendor/mask` — the flat `shiningmask` library copied from `Project/Mask`
(`mask.py`, `faceanim.py`, `colorface.py`, `constants.py`, `protocol.py`,
`transport.py`, `bitmap.py`, `NotoSans-Regular.ttf`, …). `Project/Mask` uses
**flat imports** (`import faces`, `import mask`), so it goes on its **own**
path (`SANAD_MASK_DIR=/app/mask`, also on `PYTHONPATH`) — **NOT** under
`Sanad/`, to avoid collisions. `face/mask_face.py` and `face/face_motion.py`
both `sys.path.insert(mask_dir)` and `import mask / faceanim / colorface`.
- A `sync_vendor.sh` that refreshes **both** `vendor/Sanad` (from SanadV3) and
`vendor/mask` (from `Project/Mask`), and blanks any baked Gemini key.
## Wiring the lip-sync + gestures (in `app_p2.py` / the `Project.Sanad.main` shim)
1. Construct P1 comms singletons (brain, audio_mgr, voice_client,
`GeminiSubprocess`) exactly as P1 does.
2. Construct premium singletons: `FaceController()` (mask_face),
`LiveVoiceLoop(...)`, `WakePhraseManager()`, arm controllers, skills.
3. `brain.attach_live_voice(live_voice)`; **wire the arm⇄locomotion motion-block
predicate exactly as SanadV3's `main.py` does** (`arm_controller.set_motion_block(...)`).
This is load-bearing safety, not optional: **two `rt/arm_sdk` publishers
coexist in-process** — `motion/arm_controller.py` (publisher ~line 237) and
`motion/sanad_arm_controller.py` (~line 176). They stay collision-free ONLY via
(a) that `set_motion_block` interlock on `ArmController` and (b)
`sanad_arm_controller`'s `_is_busy`/`_busy_lock` atomic guard. A boot-time
'sole writer' assertion only covers cross-*container* contention — it does
**not** replace this intra-process interlock, which `app_p2.py` must reproduce.
4. **Lip-sync:** `gemini_subprocess.register_mouth_callback(mask_face.set_mouth)`.
5. **Gestures-while-speaking:** subscribe `brain.gestural_speaking_changed` →
`mask_face.set_speaking(on)`. The event bus is **synchronous**
(`core/event_bus` `.on`/`.emit_sync`), so `set_speaking` runs on the caller's
thread — keep it non-blocking (it only flips a flag the BLE loop reads).
6. **Lifelike state:** wire voice events (connected→`set_listening`,
user_said→`set_thinking`, disconnected→`set_idle`,
`voice.error`/`motion.action_error` → `react('sad')`,
`skill.finished` → `react('smile')`). **SanadV3's `main.py` lines ~360-427 are
the concrete reference implementation**: copy that wiring (incl.
`register_mouth_callback` ~383-391 and `mask_face.shutdown()` ~587-591) rather
than re-deriving it.
7. Expose `mask_face`, `brain`, `live_voice` on the `Project.Sanad.main` shim so
the lazy route accessors resolve them.
8. On shutdown: `mask_face.shutdown()` (BLE disconnect + stop loop) — handle
SIGTERM so the container exits cleanly.
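
Step 5's constraint (a synchronous bus means the handler runs on the emitter's thread, so it must only flip a flag) is easy to get wrong, so here is a minimal sketch. `EventBus` and `FaceStub` are hypothetical stand-ins for `core/event_bus` and `FaceController`; only the `.on`/`.emit_sync` method names and the event name come from this plan.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Hypothetical synchronous bus: handlers run inline on the emitter's thread."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, name: str, handler: Callable[..., None]) -> None:
        self._handlers[name].append(handler)

    def emit_sync(self, name: str, *args: Any) -> None:
        for handler in list(self._handlers[name]):
            handler(*args)


class FaceStub:
    """Hypothetical face: set_speaking only flips a flag a BLE loop would poll."""

    def __init__(self) -> None:
        self._speaking = threading.Event()

    def set_speaking(self, on: bool) -> None:
        if on:
            self._speaking.set()
        else:
            self._speaking.clear()

    @property
    def speaking(self) -> bool:
        return self._speaking.is_set()


def wire(bus: EventBus, face: FaceStub) -> None:
    bus.on("brain.gestural_speaking_changed", face.set_speaking)


if __name__ == "__main__":
    bus, face = EventBus(), FaceStub()
    wire(bus, face)
    bus.emit_sync("brain.gestural_speaking_changed", True)
    print(face.speaking)
```

Any BLE I/O inside `set_speaking` would stall whoever emitted the event, which in this design is the voice path itself; keeping the handler to a flag flip avoids that coupling.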
## Container & hardware
- `FROM python:3.10-slim`, `WITH_UNITREE_SDK=1` (builds — the cyclonedds `idlc`
fix gives arm + chest audio).
- **System deps (added over P1):** `bluez`, `libdbus-1-3`/`libdbus-1-dev`,
`libglib2.0-0` for BlueZ/D-Bus; Pillow needs no extra apt on slim.
- **Python deps:** P1's set **+** `bleak==0.22.3` (**pinned** — bleak 3.x throws
`KeyError 'Roles'` on the Jetson's BlueZ 5.53 and every connect fails) **+**
`Pillow` (LifelikeFace frame rendering).
- **BLE for the mask (in-container):** mount `/var/run/dbus`,
`--cap-add NET_ADMIN`, `/dev/bus/usb`; `network_mode: host`. Free the mask
from the phone app before connecting. Set `SANAD_MASK_DIR=/app/mask`.
- `/dev/snd` (audio), license mount, **writable** `./data` + `./config` mounts
(mask color persistence), `restart: unless-stopped`.
- Port **:8012**. Ships **keyless** (`strip_key.py` blanks any baked key).
- **License features:** `multilingual`, `voice_command_motion` (arm gestures),
`lipsync`, `mask`. Entrypoint checks entitlement **P2**. *(A future
`voice_command_locomotion`/`navigation` feature gates the deferred walking.)*
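
A boot-time preflight can catch the most common container misconfigurations from the list above before the mask subsystem starts. The sketch below is an assumption about how such a check might look, not part of the shipped entrypoint; `preflight` and its injectable parameters are hypothetical, and the socket path is the conventional D-Bus system bus location under the mounted `/var/run/dbus`.

```python
import os
from importlib import metadata
from typing import Callable, List, Mapping, Optional

REQUIRED_BLEAK = "0.22.3"                              # pin taken from this plan
DBUS_SOCKET = "/var/run/dbus/system_bus_socket"        # conventional system bus path


def installed_bleak_version() -> Optional[str]:
    """Return the installed bleak version, or None if bleak is absent."""
    try:
        return metadata.version("bleak")
    except metadata.PackageNotFoundError:
        return None


def preflight(env: Mapping[str, str] = os.environ,
              exists: Callable[[str], bool] = os.path.exists,
              bleak_version: Callable[[], Optional[str]] = installed_bleak_version,
              ) -> List[str]:
    """Return a list of human-readable problems; an empty list means go."""
    problems: List[str] = []
    mask_dir = env.get("SANAD_MASK_DIR", "")
    if not mask_dir or not exists(mask_dir):
        problems.append(f"SANAD_MASK_DIR missing or not a path: {mask_dir!r}")
    if not exists(DBUS_SOCKET):
        problems.append(f"D-Bus socket not mounted: {DBUS_SOCKET}")
    version = bleak_version()
    if version != REQUIRED_BLEAK:
        problems.append(f"bleak {version} installed, expected {REQUIRED_BLEAK}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Reporting problems instead of raising keeps the check compatible with the plan's rule that a missing mask must never block boot.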
## Package layout (to build later — mirrors P1)
```
Sanad_Package_2/
app_p2.py routes_p2.py entrypoint.sh strip_key.py p2ctl.sh
config/p2_config.json static/ Dockerfile docker-compose.yml requirements.txt
vendor/Sanad (from SanadV3)
vendor/sanad_pkg
vendor/mask (from Project/Mask — own PYTHONPATH)
license/(pubkey + example) data/(seed incl. wake_phrases.json) sync_vendor.sh
README.md PLAN.md NEW_ROBOT_SETUP.md
```
## Build sequence (when implemented)
1. Vendor the **SanadV3** engine + `sanad_pkg` + the flat `Project/Mask` →
`vendor/mask`; merge requirements (P1 deps + `bleak==0.22.3` + `Pillow`).
2. Self-contained Dockerfile (P1's + BlueZ/D-Bus system deps; `COPY vendor/mask
/app/mask`; `ENV SANAD_MASK_DIR=/app/mask`).
3. `app_p2.py` — P1's namespace bootstrap + `Project.Sanad.main` shim; construct
P1 comms **+** `FaceController` + `LiveVoiceLoop` + wake-phrase mgr + skills;
wire `register_mouth_callback`→`set_mouth` and
`gestural_speaking_changed`→`set_speaking` + lifelike state hooks; mount P1 +
premium routers (incl. the vendored `mask.py`); serve :8012.
4. Mask: `FaceController` already runs its own BLE asyncio loop +
reconnect supervisor; with `autostart=true` it connects + uploads frames
(~30-90 s one-time, persists on flash) in the background and never blocks boot.
5. Multilingual: ship SanadV3's bilingual system prompt (rename persona as
decided); enable per-language voice config in the Voice tab.
6. License gate `P2`; keyless; smoke test (P1 endpoints + live-voice + wake +
skills + `/api/mask/status` → connect → face/start → mouth slider); validate
on the robot.
## Risks / mitigations
- **BLE-from-container is fiddly** — prototype the mask connect early on the
target Jetson (BlueZ 5.53) with `bleak==0.22.3`; if in-container BLE
misbehaves, fall back to the mask controller on the **host** with a tiny socket
shim (contingency, not the plan). `mask.py` route already degrades to 503 if
the subsystem is down, so the dashboard never crashes.
- **Mask flat imports** — vendor onto `/app/mask` via `SANAD_MASK_DIR`; never
place under `Sanad/`.
- **LifelikeFace deps** — needs Pillow + bleak; if either is missing the
subsystem auto-falls-back to `FaceAnimator` (`lifelike=false`) and reports the
reason in the Mask tab. The rest of P2 is unaffected.
- **Lip-sync chain spans the voice subprocess** — `gemini/subprocess.py` spawns
`voice/sanad_voice.py` (the child, `SANAD_VOICE_BRAIN=gemini`), which is an
**orchestrator** that in turn runs `gemini/script.py` — the actual `GeminiBrain`.
The `[[MOUTH:n]]` markers are emitted in **`gemini/script.py` (~lines 563/578)**,
not `sanad_voice.py`. Vendor the full **`script.py → sanad_voice.py →
subprocess.py`** chain intact and register the parent callback
(`GeminiSubprocess.register_mouth_callback`) in the shim. P1 already vendors
`gemini/subprocess.py`, so this is additive wiring, not a new vendor.
- **Arm safety (two publishers + interlock)** — `trigger_enabled` defaults **OFF**;
arm actions are bounded. The single container is the sole *container* writing
`rt/arm_sdk`, but **two in-process publishers** (`arm_controller.py` +
`sanad_arm_controller.py`) coexist — collision-free ONLY via the
`set_motion_block` interlock + `_is_busy` busy-lock that `app_p2.py` must wire
(Wiring step 3). Assert sole-container-writer at boot **and** reproduce the interlock.
- **Arm⇄locomotion interlock is a no-op in this arm-only build** —
`sanad_arm_controller.trigger_action_by_id()` calls `_blocked()` (refuses arm
motion while the robot may be walking). With locomotion **deferred**, nothing
publishes the locomotion-state signals `_blocked()` reads, so it always permits
arm motion — safe by omission **now**, but it **must be re-armed before the
deferred voice-command-locomotion pass ships**, else voice-walking would enable
with a silently-disabled arm interlock.
- **Mask hardware presence** — lip-sync needs the physical BLE mask paired/in
range and freed from the phone app.
- **Locomotion creep** — `movement_dispatch.py` + `_arbiter.py` are present in
the vendored tree; keep them **unwired** this pass to avoid accidentally
shipping voice-command walking without the safety surface + calibration.
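
The two arm-safety bullets above describe two cooperating guards: a motion-block predicate and an atomic busy flag. The sketch below folds both into one hypothetical `ArmGate` class purely to show how they compose; in the vendored engine they live on different controllers, and the method names `try_begin`/`end` are invented here.

```python
import threading
from typing import Callable, Optional


class ArmGate:
    """Hypothetical combination of the motion-block interlock and busy-lock."""

    def __init__(self) -> None:
        self._busy_lock = threading.Lock()
        self._is_busy = False
        self._motion_block: Optional[Callable[[], bool]] = None

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        """Install the predicate that reports 'motion must be refused right now'."""
        self._motion_block = predicate

    def _blocked(self) -> bool:
        # With no predicate wired this is always False: the interlock is a no-op.
        return bool(self._motion_block and self._motion_block())

    def try_begin(self) -> bool:
        """Atomically claim the arm; False means refuse the action."""
        if self._blocked():
            return False
        with self._busy_lock:
            if self._is_busy:
                return False
            self._is_busy = True
            return True

    def end(self) -> None:
        with self._busy_lock:
            self._is_busy = False


if __name__ == "__main__":
    gate = ArmGate()
    print(gate.try_begin(), gate.try_begin())   # second claim is refused
    gate.end()
```

The comment in `_blocked` is the risk called out above in miniature: an unwired predicate fails open, so the locomotion pass has to install a real one before walking is enabled.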
## Open decisions (resolve before/while building)
1. **In-container BLE vs host side-car** — confirm `bleak`+BlueZ/D-Bus actually
connects from inside `python:3.10-slim` with `/var/run/dbus` + `NET_ADMIN` +
`network_mode: host` on the target Jetson (BlueZ 5.53). Pin **`bleak==0.22.3`**
(3.x throws `KeyError 'Roles'` on BlueZ 5.53 — every connect fails). If flaky,
fall back to a host-side mask controller + tiny socket shim (contingency).
2. **LifelikeFace default** — SanadV3 defaults `lifelike=true` (needs
`face/face_motion.py` + Pillow + bleak). Confirm Pillow+bleak install cleanly
in the slim image; otherwise `mask_config.json lifelike=false` auto-falls back
to `FaceAnimator`.
3. **Gemini lip-sync chain vendoring** — vendor the evolved `gemini/script.py`
(emits `[[MOUTH:n]]`) **and** `gemini/subprocess.py` (`register_mouth_callback`),
and wire `register_mouth_callback → mask_face.set_mouth` in the shim. P1 already
vendors `subprocess.py`, so this is additive.
4. **Persona / robot name** — SanadV3's fallback prompt identifies the robot as
**"Marcus"** (bilingual Gulf-Arabic/English auto-detect — this prompt *is* the
multilingual engine). Decide P2's shipped persona name/dialect in
`scripts/sanad_script.txt` and whether to keep the bilingual prompt verbatim.
5. **First-boot frame-upload latency**: the one-time DIY frame upload (~30-90 s,
persists on mask flash). Decide `autostart=true` (background, non-blocking) vs
gating `face/start` behind an explicit dashboard action with a progress indicator.
6. **Vendor `movement_dispatch.py` now (unwired) or omit** — keep for forward-compat
with the deferred locomotion pass, or drop it to keep the arm-only image lean.
7. **Writable mounts for persisted face colors** — `/face/color` persists eye/
mouth/sclera colors to `mask_config.json`; ensure `config/` (or the `data/`
seed) is on a writable volume so colors survive container restarts.
> **Carried-forward safety gate:** before the deferred *voice-command-locomotion*
> pass ships, re-arm the `_blocked()` arm⇄locomotion interlock (a no-op today) and
> revisit `_arbiter.py` (Nav2 ↔ LocoController leg arbitration).
## Container-runtime audit (2026-06-23)
A 3-reader audit of the vendored runtime paths (voice/audio, mask/face, arm+dashboard)
for in-container failure modes. **Verdict: container-safe on the G1 after the
`iproute2` fix** — no remaining crash-level landmine on the builtin+gemini+host-net path.
- **FIXED (crash, P1+P2):** chest-mic `voice/audio_io.py:_find_g1_local_ip()` shells
out to `ip` — added **`iproute2`** to both Dockerfiles. (This was the live-voice
crash we hit.)
- **Mitigated (crash, off-robot only):** `BuiltinMic.start()` calls
`_find_g1_local_ip()` **unguarded**; off a G1 (no `192.168.123.x` net) it raises
and kills the voice subprocess. On the G1 (network_mode host) the interface exists →
fine. Package-level safety valve: **`SANAD_AUDIO_PROFILE=plugged` on any non-G1 host**
(documented in compose + `.env.example`). Not fixed in the engine (shared with SanadV3).
- **FIXED (degraded, persistence):** mask face colors are written to
`config/mask_config.json` (a baked layer, lost on recreate) → added a pre-seeded
single-file mount `./config/mask_config.json:/app/Sanad/config/mask_config.json`.
- **Deploy-side (degraded, P2):** mask BLE cold-start can stall ~45 s (3×15 s scan) if
host `bluetoothd` is down / adapter missing — retry-bounded, supervised, background
thread; does **not** crash. Ensure `bluetoothd` is up; leave `SANAD_MASK_ADDRESS`
empty for auto-detect.

Verified-safe (NO action, to avoid over-fixing): `~/logs` FileHandler (root-writable,
no crash); `NotoSans-Regular.ttf` present + COPYed; `local_tts` torch never imported
(gemini brain); `pactl` with no server (guarded, and not on the builtin voice path);
`parec` capture (gated); arm/DDS init (degrades, catches); config path resolution
(robust fallback chain); `teaching.py` tempfile (writes under the `./data` mount).
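
For the off-robot mitigation above, a package-level guard could decide the audio profile before the engine ever calls `_find_g1_local_ip()`. This is a sketch under assumptions: `find_g1_ip` and `choose_audio_profile` are hypothetical helpers, the input is assumed to be the text of `ip -4 -o addr show`, and `"builtin"` as a profile value is a guess (only `plugged` is named in this plan).

```python
import ipaddress
import re
from typing import Mapping, Optional

G1_NET = ipaddress.ip_network("192.168.123.0/24")   # the G1-internal subnet
INET_RE = re.compile(r"\binet (\d+\.\d+\.\d+\.\d+)/")


def find_g1_ip(ip_output: str) -> Optional[str]:
    """Return the first local IPv4 address on the G1 subnet, or None."""
    for match in INET_RE.finditer(ip_output):
        try:
            address = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue                    # skip anything that is not a valid address
        if address in G1_NET:
            return match.group(1)
    return None


def choose_audio_profile(env: Mapping[str, str], ip_output: str) -> str:
    """Honor an explicit SANAD_AUDIO_PROFILE, else fall back when off-robot."""
    explicit = env.get("SANAD_AUDIO_PROFILE")
    if explicit:
        return explicit
    # "builtin" is an assumed profile name for the chest-mic path.
    return "builtin" if find_g1_ip(ip_output) else "plugged"


if __name__ == "__main__":
    sample = "2: eth0    inet 192.168.123.164/24 brd 192.168.123.255 scope global eth0"
    print(find_g1_ip(sample), choose_audio_profile({}, sample))
```

Parsing captured text rather than shelling out inside the helper keeps it testable on any host and leaves the `ip` dependency (and the `iproute2` package) at the call site.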