# Sanad Package 2 — Premium Communication — PLAN (locked scope)

> **Status: IMPLEMENTED (self-contained) — built 2026-06-22, pending Docker build +
> on-robot test.** Vendored from SanadV3, mirrors P1. Dashboard port **:8012**.
> Structural validation passed (compile, import resolution, shim-symbol coverage,
> namespace bootstrap, `license_check P2` entitled, YAML).
>
> **Refinement (2026-06):** the engine is now vendored from **SanadV3**, not
> plain Sanad. SanadV3 already implements the mask/face subsystem and the
> evolved voice/audio/arm engine P2 needs — so P2 *vendors and wires* it rather
> than building the mask from scratch. See "Why vendor SanadV3".

P2 is a **superset of P1**: everything P1 does, plus multilingual auto-detect,
voice-commanded **arm gestures**, gestures while speaking, and a lip-syncing LED
face on the BLE "Shining Mask".

## Locked scope (decisions taken)

- **Motion = arm gestures only (first pass).** Voice-command **locomotion** (robot
  walking/turning) is **DEFERRED** to a later pass — `voice/movement_dispatch.py` +
  `G1_Controller/loco_controller.py` and SanadV3's `dashboard/routes/_arbiter.py`
  (loco/nav leg arbitration) are intentionally **out of scope here**.
- **Mask = included, BLE driven from INSIDE the P2 container** (vendor the Mask
  controller; `bleak` + host BlueZ/D-Bus). No separate `sanad-mask` side-car.
- **Single self-contained container** (no `Sanad_Core`, no `sanad-base`) — like P1.
- **Keyless** ship; customer supplies their own Gemini key.

## Why vendor SanadV3 (not plain Sanad)

SanadV3 is **not a fork** of Sanad — it is plain Sanad **plus exactly the
subsystems P2 needs**, reachable through the identical `Project.Sanad.*` import
surface the dashboard routes already use. Verified in the tree:

- `face/mask_face.py` → `FaceController` (BLE asyncio-loop thread + reconnect
  supervisor + `set_speaking`/`set_mouth` lip-sync inputs). Imports only
  `Project.Sanad.config.BASE_DIR`, `core.config_loader`, `core.logger`, then the
  flat Mask lib via `sys.path.insert(mask_dir)`.
- `face/face_motion.py` → `LifelikeFace` (saccades, varied blinks,
  idle/listening/thinking/speaking states, timed reactions, smooth lip-sync), with
  automatic fallback to the flat lib's `FaceAnimator`.
- `config/mask_config.json` → already env-driven
  (`SANAD_MASK_DIR`/`ADDRESS`/`NAME_PREFIX`/`ADAPTER`),
  `brightness`/`fps`/`lifelike`/`autostart`, persisted face colors.
- `dashboard/routes/mask.py` → a **failure-safe** `/api/mask` router; every handler
  is `asyncio.to_thread`-wrapped and maps errors to 503/409/500 so a missing mask
  never crashes the dashboard.
- Lip-sync chain → `gemini/script.py` emits `[[MOUTH:n]]` (0–3) RMS markers;
  `gemini/subprocess.py` exposes `register_mouth_callback`; `core/brain.py`
  `set_gestural_speaking` emits `brain.gestural_speaking_changed`.
- Evolved voice/audio → `voice/live_voice_loop.py` (position-based dedup),
  `voice/text_utils.py` (Arabic normalize + `maybe_trigger_arm`),
  `voice/audio_manager.py` (per-instance + TTL throttle) — supersets of plain
  Sanad's race-prone equivalents.

Building the mask from plain Sanad (the prior plan) would mean **re-deriving all
of the above by hand** and inheriting plain Sanad's known bugs (content-based
dedup, no `[[MOUTH:n]]`, non-atomic `index.json` writes). The one thing SanadV3
does **not** solve is BLE-inside-a-container (it ran the mask on the host
`g1_env`) — but that is a Docker capability/mount problem P2 owns regardless of
base engine.

**Correction to the prior plan:** the lip-sync source of truth is **Gemini's
`[[MOUTH:n]]` markers over the event bus**, *not* a raw audio-amplitude tap, and
the driver is **`FaceController` + `LifelikeFace`**, *not* `Mask/talking.py
TalkingFace`. P2 adopts the marker path.
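
To make the marker path concrete, here is a minimal sketch of extracting `[[MOUTH:n]]` levels from text and fanning them out to registered callbacks. It is not the vendored `gemini/subprocess.py`: the plan does not describe how markers travel between child and parent, so the line-oriented input is an assumption, and `MouthDispatcher` and `feed_line` are hypothetical names. Only the marker shape (levels 0–3) and the `register_mouth_callback` name come from the plan.

```python
import re

# Marker shape taken from the plan: [[MOUTH:n]] with n in 0..3.
MOUTH_MARKER = re.compile(r"\[\[MOUTH:([0-3])\]\]")


class MouthDispatcher:
    """Hypothetical stand-in for the parent-side marker handling."""

    def __init__(self):
        self._callbacks = []

    def register_mouth_callback(self, callback):
        # In P2 the registered callback would be mask_face.set_mouth.
        self._callbacks.append(callback)

    def feed_line(self, line):
        """Dispatch every marker found in `line`; return the line without markers."""
        for match in MOUTH_MARKER.finditer(line):
            level = int(match.group(1))
            for callback in self._callbacks:
                try:
                    callback(level)
                except Exception:
                    # Assumption: a failing face callback must never break voice.
                    pass
        return MOUTH_MARKER.sub("", line)


if __name__ == "__main__":
    dispatcher = MouthDispatcher()
    dispatcher.register_mouth_callback(lambda level: print("mouth level", level))
    print(dispatcher.feed_line("hello [[MOUTH:2]]world[[MOUTH:0]]"))
```

Swallowing callback exceptions is a design choice in this sketch, in line with the plan's requirement that a missing mask must not take down the rest of the system.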
## Architecture

One self-contained container that **owns all the hardware it needs directly**:
the G1 DDS link, `rt/arm_sdk` (arm), chest **or** USB/Anker audio, and the BLE
mask. It runs the premium slice of the vendored **SanadV3** engine. P2 is a
**containerization wrapper** (like P1), not a fork: it

1. bootstraps the `Project.Sanad` namespace (deployed layout, P1's mechanism),
2. constructs ONLY the P2 superset subsystems,
3. injects a P2-scoped `Project.Sanad.main` shim exposing the singletons the
   routers import lazily (`mask_face`, `brain`, `live_voice`, …),
4. mounts the P1 + premium routers + the logs websocket,
5. serves the real SanadV3 SPA with non-P2 tabs hidden,
6. runs uvicorn on **:8012**.

Why no `Sanad_Core`/`hwbroker`/`sanad-mask` split: that split exists only to stop
**multiple** package containers fighting over the one device set. A customer who
buys **just P2** has a single owner — no contention — so everything folds into one
container (exactly like P1 standalone, and like the original Sanad monolith which
did comms + arm + mask in one process). The ZMQ bus seam stays available for a
future P1+P2+P3 fleet SKU but is **not** part of standalone P2.

## Namespace bootstrap (reuse P1's, not SanadV3's self-alias)

P2 reuses **P1's exact bootstrap** (`app_p1.py` lines 30–46): synthesize a
`Project` namespace package and alias `Project.Sanad` → the vendored `Sanad`
module, then inject a `Project.Sanad.main` shim holding the P2 singletons. The
mask + voice routes resolve their singletons via lazy
`from Project.Sanad.main import mask_face` inside handlers — so the shim must
define them. (SanadV3's own self-alias-by-folder-name is skipped automatically —
the vendored tree is named `Sanad`, so `main.py`'s `if _THIS_DIR.name != 'Sanad'`
branch never fires — and the wrapper-with-shim is what P1 ships and what the
routes expect.)
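
The bootstrap described above can be sketched with `sys.modules` aliasing. This is an illustration of the general technique, not a copy of `app_p1.py` (whose lines 30–46 are not reproduced in this plan); `bootstrap_namespace` and its parameters are hypothetical names, and the example uses an empty stand-in module where the real code would pass the imported vendored `Sanad` package.

```python
import sys
import types


def bootstrap_namespace(vendored_sanad, singletons):
    """Alias Project.Sanad to the vendored module and inject a main shim.

    `vendored_sanad` is the already-imported vendored package; `singletons`
    maps attribute names (e.g. "mask_face") to the constructed objects.
    """
    project = sys.modules.get("Project")
    if project is None:
        project = types.ModuleType("Project")
        project.__path__ = []  # an empty __path__ marks the module as a package
        sys.modules["Project"] = project

    # Make both `import Project.Sanad` and attribute access resolve.
    project.Sanad = vendored_sanad
    sys.modules["Project.Sanad"] = vendored_sanad

    # The shim is what lazy `from Project.Sanad.main import ...` lookups hit.
    shim = types.ModuleType("Project.Sanad.main")
    for name, obj in singletons.items():
        setattr(shim, name, obj)
    sys.modules["Project.Sanad.main"] = shim
    vendored_sanad.main = shim
    return shim


if __name__ == "__main__":
    fake_sanad = types.ModuleType("Sanad")  # stand-in for the vendored tree
    fake_sanad.__path__ = []
    bootstrap_namespace(fake_sanad, {"mask_face": object()})
    from Project.Sanad.main import mask_face
    print("resolved:", mask_face)
```

Because the import system consults `sys.modules` before searching the filesystem, a handler-level `from Project.Sanad.main import mask_face` resolves to the shim as long as the bootstrap ran before the first request.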
## Features / capabilities

### Inherited from P1 (superset — same code, premium flags on)

Hands-free Gemini conversation · persona editor (who/tone/language) · **keyless**
Gemini key (customer adds own) · chest **or** USB/Anker audio (selectable,
hot-swap) · typed-replay / "say a line" · live logs + download · offline license
gate. **Same audio (mic+speaker) mechanism as P1**, via the evolved per-instance
`voice/audio_manager.py` + `voice/audio_devices.py`.

### New in P2 (this pass)

1. **Multilingual auto-detect** — Gemini natively detects the visitor's language
   (Arabic Gulf/English) and replies in kind, via the bilingual system prompt in
   `gemini/script.py` / `voice/sanad_voice.py`. No per-user flag.
2. **Voice-command arm gestures** — `voice/live_voice_loop.py`: USER speech → arm
   actions via `scripts/sanad_arm.txt` (**23 actions**, non-contiguous ids
   `{0–15, 23–28, 30}`; the file's nominal range is 0–28; hundreds of Arabic/EN
   phrase variants) → `sanad_arm_controller.ARM.trigger_action_by_id()`.
   **Instant** or **deferred** mode (0.65 s fallback so a silent user still
   fires). Master **trigger-enabled** gate (default **OFF**). Position-based dedup
   (`_last_snapshot` + `_trigger_lock`).
3. **Gestures while speaking** — `core/brain.set_gestural_speaking` →
   `brain.gestural_speaking_changed` → `mask_face.set_speaking(True)` (mouth + any
   gestural motion animate together while Gemini talks).
4. **Wake-phrase management** — phrase→action CRUD
   (`voice/wake_phrase_manager.py`, persisted to `data/wake_phrases.json`), folded
   into the live loop at runtime via `_merge_wake_phrases`.
5. **Skills registry** — skill CRUD, execute, upload-audio
   (`dashboard/routes/skills.py`).
6. **Lip-sync on the LED "Shining Mask"** — vendored `face/mask_face.py`
   `FaceController` + `face/face_motion.py` `LifelikeFace`:
   - mouth driven by Gemini `[[MOUTH:n]]` (0–3) markers →
     `gemini/subprocess.register_mouth_callback` → `mask_face.set_mouth(level)`;
   - `set_speaking(on)` from the gestural-speaking event for auto-talk;
   - state-aware idle/listening/thinking + timed reactions (smile/surprised/sad);
   - falls back to the flat lib's `FaceAnimator` if `LifelikeFace`/Pillow/bleak
     unavailable (`lifelike=false` in `mask_config.json`).

### Deferred (NOT in this pass)

- **Voice-command locomotion** (`voice/movement_dispatch.py` + `loco_controller`):
  Gemini's spoken confirmation → discrete bounded steps, `movement_enabled` gate,
  "stop" = E-STOP. Adds a walking-on-voice safety surface + on-robot calibration —
  staged separately. SanadV3's `dashboard/routes/_arbiter.py` (Nav2 ↔
  LocoController leg arbitration) belongs to this pass, **not** P2's arm-only
  pass.
- Multi-package fleet via `Sanad_Core` (hwbroker/busd/shared `sanad-mask`).

## Dashboard (:8012) = all P1 tabs **+** (all routes VENDORED from SanadV3)

- **Voice** — adds the multilingual auto-detect toggle + per-language voice config.
- **Live-voice (commands)** — `dashboard/routes/live_voice.py`: start/stop,
  deferred-mode toggle, **trigger-enabled** master gate, status, trigger history.
- **Wake-phrases** — phrase→action CRUD (AR dialects + EN).
- **Motion / Gestural** — `dashboard/routes/motion.py`: gestural-speaking toggle,
  trigger / cancel arm actions. *(Arm only — loco controls present in the route
  but unwired this pass.)*
- **Skills** — `dashboard/routes/skills.py`.
- **Mask / Lip-sync** — the existing SanadV3 SPA **Mask Face** tab + the vendored
  `dashboard/routes/mask.py` (`/api/mask/*`): connect/disconnect, brightness, face
  start/stop/return/color, speaking toggle, mouth slider, expressions,
  text/image/animation overrides, status. **Mounted as-is — not authored.**
- **Logs**.
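
The Mask tab relies on the vendored `/api/mask` router being failure-safe: blocking calls run through `asyncio.to_thread`, and errors become 503/409/500 responses instead of crashing the dashboard. A standard-library sketch of that pattern follows. The exception classes and `call_safely` are hypothetical, and the 409 mapping is an assumption; the plan confirms only the three status codes and that a down subsystem yields 503.

```python
import asyncio


class MaskUnavailable(Exception):
    """Hypothetical: the mask subsystem is down or not connected."""


class MaskConflict(Exception):
    """Hypothetical: the request conflicts with the current mask state."""


async def call_safely(fn, *args):
    """Run a blocking mask call off the event loop; return (status, payload)."""
    try:
        result = await asyncio.to_thread(fn, *args)
        return 200, {"ok": True, "result": result}
    except MaskUnavailable as exc:
        return 503, {"ok": False, "error": str(exc)}
    except MaskConflict as exc:  # assumed meaning of the 409 case
        return 409, {"ok": False, "error": str(exc)}
    except Exception as exc:  # anything unexpected becomes a 500, not a crash
        return 500, {"ok": False, "error": str(exc)}


def connect_without_mask():
    # Stand-in for a blocking BLE call that fails because no mask is present.
    raise MaskUnavailable("mask not connected")


if __name__ == "__main__":
    print(asyncio.run(call_safely(connect_without_mask)))
    print(asyncio.run(call_safely(lambda: "connected")))
```

Returning a `(status, payload)` tuple keeps the sketch framework-free; in the real router the same mapping would be expressed with the web framework's own error responses.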
Non-P2 tabs (recognition, temp/3D, controller, navigation, terminal) are hidden
the same way P1 hides its non-P1 set.

## What it vendors / reuses (self-contained, like P1)

- `vendor/Sanad` — the **SanadV3** engine tree (rsync-excluding `data/`, `Logs/`,
  `__pycache__/`, `tests/`, `static/temp3d/`). Includes `face/`, evolved `voice/`,
  evolved `gemini/`, `motion/`, the mask + live-voice + motion + skills routes, and
  the SPA. *(Locomotion modules vendored but left unwired this pass.)*
- `vendor/sanad_pkg` — IPC bus shim + offline license verification lib (P1's set).
- `vendor/mask` — the flat `shiningmask` library copied from `Project/Mask`
  (`mask.py`, `faceanim.py`, `colorface.py`, `constants.py`, `protocol.py`,
  `transport.py`, `bitmap.py`, `NotoSans-Regular.ttf`, …). `Project/Mask` uses
  **flat imports** (`import faces`, `import mask`), so it goes on its **own** path
  (`SANAD_MASK_DIR=/app/mask`, also on `PYTHONPATH`) — **NOT** under `Sanad/`, to
  avoid collisions. `face/mask_face.py` and `face/face_motion.py` both
  `sys.path.insert(mask_dir)` and `import mask / faceanim / colorface`.
- A `sync_vendor.sh` that refreshes **both** `vendor/Sanad` (from SanadV3) and
  `vendor/mask` (from `Project/Mask`), and blanks any baked Gemini key.

## Wiring the lip-sync + gestures (in `app_p2.py` / the `Project.Sanad.main` shim)

1. Construct P1 comms singletons (brain, audio_mgr, voice_client,
   `GeminiSubprocess`) exactly as P1 does.
2. Construct premium singletons: `FaceController()` (mask_face),
   `LiveVoiceLoop(...)`, `WakePhraseManager()`, arm controllers, skills.
3. `brain.attach_live_voice(live_voice)`; **wire the arm⇄locomotion motion-block
   predicate exactly as SanadV3's `main.py` does**
   (`arm_controller.set_motion_block(...)`). This is load-bearing safety, not
   optional: **two `rt/arm_sdk` publishers coexist in-process** —
   `motion/arm_controller.py` (publisher ~line 237) and
   `motion/sanad_arm_controller.py` (~line 176). They stay collision-free ONLY via
   (a) that `set_motion_block` interlock on `ArmController` and (b)
   `sanad_arm_controller`'s `_is_busy`/`_busy_lock` atomic guard. A boot-time
   'sole writer' assertion only covers cross-*container* contention — it does
   **not** replace this intra-process interlock, which `app_p2.py` must reproduce.
4. **Lip-sync:** `gemini_subprocess.register_mouth_callback(mask_face.set_mouth)`.
5. **Gestures-while-speaking:** subscribe `brain.gestural_speaking_changed` →
   `mask_face.set_speaking(on)`. The event bus is **synchronous**
   (`core/event_bus` `.on`/`.emit_sync`), so `set_speaking` runs on the caller's
   thread — keep it non-blocking (it only flips a flag the BLE loop reads).
6. **Lifelike state:** wire voice events (connected→`set_listening`,
   user_said→`set_thinking`, disconnected→`set_idle`,
   `voice.error`/`motion.action_error`→`react('sad')`,
   `skill.finished`→`react('smile')`). **SanadV3's `main.py` lines ~360–427 are
   the concrete reference implementation** — copy that wiring (incl.
   `register_mouth_callback` ~383–391 and `mask_face.shutdown()` ~587–591) rather
   than re-deriving it.
7. Expose `mask_face`, `brain`, `live_voice` on the `Project.Sanad.main` shim so
   the lazy route accessors resolve them.
8. On shutdown: `mask_face.shutdown()` (BLE disconnect + stop loop) — handle
   SIGTERM so the container exits cleanly.

## Container & hardware

- `FROM python:3.10-slim`, `WITH_UNITREE_SDK=1` (builds — the cyclonedds `idlc`
  fix gives arm + chest audio).
- **System deps (added over P1):** `bluez`, `libdbus-1-3`/`libdbus-1-dev`,
  `libglib2.0-0` for BlueZ/D-Bus; Pillow needs no extra apt on slim.
- **Python deps:** P1's set **+** `bleak==0.22.3` (**pinned** — bleak 3.x throws
  `KeyError 'Roles'` on the Jetson's BlueZ 5.53 and every connect fails) **+**
  `Pillow` (LifelikeFace frame rendering).
- **BLE for the mask (in-container):** mount `/var/run/dbus`,
  `--cap-add NET_ADMIN`, `/dev/bus/usb`; `network_mode: host`.
  Free the mask from the phone app before connecting. Set
  `SANAD_MASK_DIR=/app/mask`.
- `/dev/snd` (audio), license mount, **writable** `./data` + `./config` mounts
  (mask color persistence), `restart: unless-stopped`.
- Port **:8012**. Ships **keyless** (`strip_key.py` blanks any baked key).
- **License features:** `multilingual`, `voice_command_motion` (arm gestures),
  `lipsync`, `mask`. Entrypoint checks entitlement **P2**. *(A future
  `voice_command_locomotion`/`navigation` feature gates the deferred walking.)*

## Package layout (to build later — mirrors P1)

```
Sanad_Package_2/
  app_p2.py  routes_p2.py  entrypoint.sh  strip_key.py  p2ctl.sh
  config/p2_config.json  static/
  Dockerfile  docker-compose.yml  requirements.txt
  vendor/Sanad (from SanadV3)  vendor/sanad_pkg
  vendor/mask (from Project/Mask — own PYTHONPATH)
  license/(pubkey + example)  data/(seed incl. wake_phrases.json)
  sync_vendor.sh  README.md  PLAN.md  NEW_ROBOT_SETUP.md
```

## Build sequence (when implemented)

1. Vendor the **SanadV3** engine + `sanad_pkg` + the flat `Project/Mask` →
   `vendor/mask`; merge requirements (P1 deps + `bleak==0.22.3` + `Pillow`).
2. Self-contained Dockerfile (P1's + BlueZ/D-Bus system deps;
   `COPY vendor/mask /app/mask`; `ENV SANAD_MASK_DIR=/app/mask`).
3. `app_p2.py` — P1's namespace bootstrap + `Project.Sanad.main` shim; construct
   P1 comms **+** `FaceController` + `LiveVoiceLoop` + wake-phrase mgr + skills;
   wire `register_mouth_callback`→`set_mouth` and
   `gestural_speaking_changed`→`set_speaking` + lifelike state hooks; mount P1 +
   premium routers (incl. the vendored `mask.py`); serve :8012.
4. Mask: `FaceController` already runs its own BLE asyncio loop + reconnect
   supervisor; with `autostart=true` it connects + uploads frames (~30–90 s
   one-time, persists on flash) in the background — never blocks boot.
5. Multilingual: ship SanadV3's bilingual system prompt (rename persona as
   decided); enable per-language voice config in the Voice tab.
6. License gate `P2`; keyless; smoke test (P1 endpoints + live-voice + wake +
   skills + `/api/mask/status` → connect → face/start → mouth slider); validate on
   the robot.

## Risks / mitigations

- **BLE-from-container is fiddly** — prototype the mask connect early on the
  target Jetson (BlueZ 5.53) with `bleak==0.22.3`; if in-container BLE misbehaves,
  fall back to the mask controller on the **host** with a tiny socket shim
  (contingency, not the plan). `mask.py` route already degrades to 503 if the
  subsystem is down, so the dashboard never crashes.
- **Mask flat imports** — vendor onto `/app/mask` via `SANAD_MASK_DIR`; never
  place under `Sanad/`.
- **LifelikeFace deps** — needs Pillow + bleak; if either is missing the subsystem
  auto-falls-back to `FaceAnimator` (`lifelike=false`) and reports the reason in
  the Mask tab. The rest of P2 is unaffected.
- **Lip-sync chain spans the voice subprocess** — `gemini/subprocess.py` spawns
  `voice/sanad_voice.py` (the child, `SANAD_VOICE_BRAIN=gemini`), which is an
  **orchestrator** that in turn runs `gemini/script.py` — the actual
  `GeminiBrain`. The `[[MOUTH:n]]` markers are emitted in **`gemini/script.py`
  (~lines 563/578)**, not `sanad_voice.py`. Vendor the full
  **`script.py → sanad_voice.py → subprocess.py`** chain intact and register the
  parent callback (`GeminiSubprocess.register_mouth_callback`) in the shim. P1
  already vendors `gemini/subprocess.py`, so this is additive wiring, not a new
  vendor.
- **Arm safety (two publishers + interlock)** — `trigger_enabled` defaults
  **OFF**; arm actions are bounded. The single container is the sole *container*
  writing `rt/arm_sdk`, but **two in-process publishers** (`arm_controller.py` +
  `sanad_arm_controller.py`) coexist — collision-free ONLY via the
  `set_motion_block` interlock + `_is_busy` busy-lock that `app_p2.py` must wire
  (Wiring step 3). Assert sole-container-writer at boot **and** reproduce the
  interlock.
- **Arm⇄locomotion interlock is a no-op in this arm-only build** —
  `sanad_arm_controller.trigger_action_by_id()` calls `_blocked()` (refuses arm
  motion while the robot may be walking). With locomotion **deferred**, nothing
  publishes the locomotion-state signals `_blocked()` reads, so it always permits
  arm motion — safe by omission **now**, but it **must be re-armed before the
  deferred voice-command-locomotion pass ships**, else voice-walking would enable
  with a silently-disabled arm interlock.
- **Mask hardware presence** — lip-sync needs the physical BLE mask paired/in
  range and freed from the phone app.
- **Locomotion creep** — `movement_dispatch.py` + `_arbiter.py` are present in the
  vendored tree; keep them **unwired** this pass to avoid accidentally shipping
  voice-command walking without the safety surface + calibration.

## Open decisions (resolve before/while building)

1. **In-container BLE vs host side-car** — confirm `bleak`+BlueZ/D-Bus actually
   connects from inside `python:3.10-slim` with `/var/run/dbus` + `NET_ADMIN` +
   `network_mode: host` on the target Jetson (BlueZ 5.53). Pin
   **`bleak==0.22.3`** (3.x throws `KeyError 'Roles'` on BlueZ 5.53 — every
   connect fails). If flaky, fall back to a host-side mask controller + tiny
   socket shim (contingency).
2. **LifelikeFace default** — SanadV3 defaults `lifelike=true` (needs
   `face/face_motion.py` + Pillow + bleak). Confirm Pillow+bleak install cleanly
   in the slim image; otherwise `mask_config.json lifelike=false` auto-falls back
   to `FaceAnimator`.
3. **Gemini lip-sync chain vendoring** — vendor the evolved `gemini/script.py`
   (emits `[[MOUTH:n]]`) **and** `gemini/subprocess.py`
   (`register_mouth_callback`), and wire
   `register_mouth_callback → mask_face.set_mouth` in the shim. P1 already vendors
   `subprocess.py`, so this is additive.
4. **Persona / robot name** — SanadV3's fallback prompt identifies the robot as
   **"Marcus"** (bilingual Gulf-Arabic/English auto-detect — this prompt *is* the
   multilingual engine). Decide P2's shipped persona name/dialect in
   `scripts/sanad_script.txt` and whether to keep the bilingual prompt verbatim.
5. **First-boot frame-upload latency** — the one-time DIY frame upload (~30–90 s,
   persists on mask flash). Decide `autostart=true` (background, non-blocking) vs
   gating `face/start` behind an explicit dashboard action with a progress
   indicator.
6. **Vendor `movement_dispatch.py` now (unwired) or omit** — keep for
   forward-compat with the deferred locomotion pass, or drop it to keep the
   arm-only image lean.
7. **Writable mounts for persisted face colors** — `/face/color` persists
   eye/mouth/sclera colors to `mask_config.json`; ensure `config/` (or the `data/`
   seed) is on a writable volume so colors survive container restarts.

> **Carried-forward safety gate:** before the deferred *voice-command-locomotion*
> pass ships, re-arm the `_blocked()` arm⇄locomotion interlock (a no-op today) and
> revisit `_arbiter.py` (Nav2 ↔ LocoController leg arbitration).

## Container-runtime audit (2026-06-23)

A 3-reader audit of the vendored runtime paths (voice/audio, mask/face,
arm+dashboard) for in-container failure modes. **Verdict: container-safe on the
G1 after the `iproute2` fix** — no remaining crash-level landmine on the
builtin+gemini+host-net path.

- **FIXED (crash, P1+P2):** chest-mic `voice/audio_io.py:_find_g1_local_ip()`
  shells out to `ip` — added **`iproute2`** to both Dockerfiles. (This was the
  live-voice crash we hit.)
- **Mitigated (crash, off-robot only):** `BuiltinMic.start()` calls
  `_find_g1_local_ip()` **unguarded**; off a G1 (no `192.168.123.x` net) it raises
  and kills the voice subprocess. On the G1 (network_mode host) the interface
  exists → fine. Package-level safety valve: **`SANAD_AUDIO_PROFILE=plugged` on
  any non-G1 host** (documented in compose + `.env.example`). Not fixed in the
  engine (shared with SanadV3).
- **FIXED (degraded, persistence):** mask face colors are written to
  `config/mask_config.json` (a baked layer, lost on recreate) → added a pre-seeded
  single-file mount
  `./config/mask_config.json:/app/Sanad/config/mask_config.json`.
- **Deploy-side (degraded, P2):** mask BLE cold-start can stall ~45 s (3×15 s
  scan) if host `bluetoothd` is down / adapter missing — retry-bounded,
  supervised, background thread; does **not** crash. Ensure `bluetoothd` is up;
  leave `SANAD_MASK_ADDRESS` empty for auto-detect.

Verified-safe (NO action, to avoid over-fixing): `~/logs` FileHandler
(root-writable, no crash); `NotoSans-Regular.ttf` present + COPYed; `local_tts`
torch never imported (gemini brain); `pactl` with no server (guarded, and not on
the builtin voice path); `parec` capture (gated); arm/DDS init (degrades,
catches); config path resolution (robust fallback chain); `teaching.py` tempfile
(writes under the `./data` mount).
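
For the off-robot mitigation, a wrapper-level preflight could choose the audio profile before the voice subprocess starts. The sketch below is not the engine's `_find_g1_local_ip()` and does not change it. `find_g1_local_ip` and `choose_audio_profile` are hypothetical helpers, the `/24` mask is inferred from `192.168.123.x`, and the profile name `builtin` is an assumption; only `SANAD_AUDIO_PROFILE=plugged` is taken from the audit notes.

```python
import ipaddress
import os

# Assumption: "192.168.123.x" in the audit notes means this /24 network.
G1_SUBNET = ipaddress.ip_network("192.168.123.0/24")


def find_g1_local_ip(addresses):
    """Return the first address inside the G1 subnet, or None if there is none."""
    for addr in addresses:
        try:
            if ipaddress.ip_address(addr) in G1_SUBNET:
                return addr
        except ValueError:
            continue  # skip anything that is not a valid IP literal
    return None


def choose_audio_profile(addresses, env=None):
    """Honour an explicit SANAD_AUDIO_PROFILE, else fall back by network."""
    env = os.environ if env is None else env
    explicit = env.get("SANAD_AUDIO_PROFILE")
    if explicit:
        return explicit
    # "builtin" is an assumed name for the chest-audio profile.
    return "builtin" if find_g1_local_ip(addresses) else "plugged"


if __name__ == "__main__":
    print(choose_audio_profile(["10.0.0.5"], env={}))
    print(choose_audio_profile(["192.168.123.164"], env={}))
```

Taking the address list as a parameter keeps the sketch testable without shelling out to `ip`; a caller would gather local addresses however the host allows and pass them in.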