
Sanad Package 2 — Premium Communication — PLAN (locked scope)

Status: IMPLEMENTED (self-contained). Built 2026-06-22, pending Docker build + on-robot test. Vendored from SanadV3, mirrors P1. Dashboard port :8012. Structural validation passed (compile, import resolution, shim-symbol coverage, namespace bootstrap, license_check P2 entitled, YAML).

Refinement (2026-06): the engine is now vendored from SanadV3, not plain Sanad. SanadV3 already implements the mask/face subsystem and the evolved voice/audio/arm engine P2 needs — so P2 vendors and wires it rather than building the mask from scratch. See "Why vendor SanadV3".

P2 is a superset of P1: everything P1 does, plus multilingual auto-detect, voice-commanded arm gestures, gestures while speaking, and a lip-syncing LED face on the BLE "Shining Mask".

Locked scope (decisions taken)

  • Motion = arm gestures only (first pass). Voice-command locomotion (robot walking/turning) is DEFERRED to a later pass: voice/movement_dispatch.py + G1_Controller/loco_controller.py and SanadV3's dashboard/routes/_arbiter.py (loco/nav leg arbitration) are intentionally out of scope here.
  • Mask = included, BLE driven from INSIDE the P2 container (vendor the Mask controller; bleak + host BlueZ/D-Bus). No separate sanad-mask side-car.
  • Single self-contained container (no Sanad_Core, no sanad-base) — like P1.
  • Keyless ship; customer supplies their own Gemini key.

Why vendor SanadV3 (not plain Sanad)

SanadV3 is not a fork of Sanad — it is plain Sanad plus exactly the subsystems P2 needs, reachable through the identical Project.Sanad.* import surface the dashboard routes already use. Verified in the tree:

  • face/mask_face.py → FaceController (BLE asyncio-loop thread + reconnect supervisor + set_speaking/set_mouth lip-sync inputs). Imports only Project.Sanad.config.BASE_DIR, core.config_loader, core.logger, then the flat Mask lib via sys.path.insert(mask_dir).
  • face/face_motion.py → LifelikeFace (saccades, varied blinks, idle/listening/thinking/speaking states, timed reactions, smooth lip-sync), with automatic fallback to the flat lib's FaceAnimator.
  • config/mask_config.json → already env-driven (SANAD_MASK_DIR/ADDRESS/NAME_PREFIX/ADAPTER), brightness/fps/lifelike/autostart, persisted face colors.
  • dashboard/routes/mask.py → a failure-safe /api/mask router; every handler is asyncio.to_thread-wrapped and maps errors to 503/409/500 so a missing mask never crashes the dashboard.
  • Lip-sync chain → gemini/script.py emits [[MOUTH:n]] (0-3) RMS markers; gemini/subprocess.py exposes register_mouth_callback; core/brain.py set_gestural_speaking emits brain.gestural_speaking_changed.
  • Evolved voice/audio → voice/live_voice_loop.py (position-based dedup), voice/text_utils.py (Arabic normalize + maybe_trigger_arm), voice/audio_manager.py (per-instance + TTL throttle) — supersets of plain Sanad's race-prone equivalents.
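The failure-safe pattern described for dashboard/routes/mask.py can be sketched without any web framework. This is a minimal illustration, not the vendored router: the exception classes, the helper name safe_mask_call, and the exact error-to-status mapping are assumptions made for the example; only asyncio.to_thread and the 503/409/500 codes come from the description above.

```python
import asyncio
from typing import Any, Callable, Tuple


class MaskUnavailable(RuntimeError):
    """Hypothetical: the mask subsystem is missing or not connected."""


class MaskBusy(RuntimeError):
    """Hypothetical: the request conflicts with the mask's current state."""


async def safe_mask_call(fn: Callable[..., Any], *args: Any) -> Tuple[int, dict]:
    """Run a blocking mask call off the event loop and map errors to HTTP codes."""
    try:
        result = await asyncio.to_thread(fn, *args)
        return 200, {"ok": True, "result": result}
    except MaskUnavailable as exc:
        return 503, {"ok": False, "error": str(exc)}
    except MaskBusy as exc:
        return 409, {"ok": False, "error": str(exc)}
    except Exception as exc:  # anything unexpected becomes a 500, never a crash
        return 500, {"ok": False, "error": repr(exc)}


def set_brightness(level: int) -> int:
    # Stand-in for a blocking FaceController call.
    if level < 0:
        raise MaskBusy("rejected in the current state")
    return level


def no_mask() -> None:
    # Stand-in for a handler that finds no mask singleton on the shim.
    raise MaskUnavailable("mask_face is not constructed")
```

A real handler would return these pairs as an HTTP response; the point of the sketch is only that every blocking call is moved to a worker thread and every exception ends as a status code.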

Building the mask from plain Sanad (the prior plan) would mean re-deriving all of the above by hand and inheriting plain Sanad's known bugs (content-based dedup, no [[MOUTH:n]], non-atomic index.json writes). The one thing SanadV3 does not solve is BLE-inside-a-container (it ran the mask on the host g1_env) — but that is a Docker capability/mount problem P2 owns regardless of base engine.

Correction to the prior plan: the lip-sync source of truth is Gemini's [[MOUTH:n]] markers over the event bus, not a raw audio-amplitude tap, and the driver is FaceController + LifelikeFace, not Mask/talking.py TalkingFace. P2 adopts the marker path.
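As a rough illustration of the marker path, the sketch below extracts [[MOUTH:n]] markers from text lines and forwards each level to registered callbacks. It assumes the markers arrive embedded in lines of child output, which this plan does not state; the class name MouthMarkerRelay and the feed_line method are hypothetical, and only the marker format, the 0-3 range, and the name register_mouth_callback come from the text above.

```python
import re
from typing import Callable, List

# Only levels 0-3 are valid mouth openings.
MOUTH_RE = re.compile(r"\[\[MOUTH:([0-3])\]\]")


class MouthMarkerRelay:
    """Hypothetical parent-side relay: strips markers from child output lines."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[int], None]] = []

    def register_mouth_callback(self, cb: Callable[[int], None]) -> None:
        self._callbacks.append(cb)

    def feed_line(self, line: str) -> str:
        """Dispatch every marker on the line; return the line without markers."""
        for match in MOUTH_RE.finditer(line):
            level = int(match.group(1))
            for cb in self._callbacks:
                try:
                    cb(level)
                except Exception:
                    pass  # a failing face callback must not break the voice path
        return MOUTH_RE.sub("", line)
```

In the real wiring the callback would be mask_face.set_mouth; out-of-range markers are left in the text here so they stay visible in logs rather than being silently dropped.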

Architecture

One self-contained container that owns all the hardware it needs directly: the G1 DDS link, rt/arm_sdk (arm), chest or USB/Anker audio, and the BLE mask. It runs the premium slice of the vendored SanadV3 engine.

P2 is a containerization wrapper (like P1), not a fork: it

  1. bootstraps the Project.Sanad namespace (deployed layout, P1's mechanism),
  2. constructs ONLY the P2 superset subsystems,
  3. injects a P2-scoped Project.Sanad.main shim exposing the singletons the routers import lazily (mask_face, brain, live_voice, …),
  4. mounts the P1 + premium routers + the logs websocket,
  5. serves the real SanadV3 SPA with non-P2 tabs hidden,
  6. runs uvicorn on :8012.

Why no Sanad_Core/hwbroker/sanad-mask split: that split exists only to stop multiple package containers fighting over the one device set. A customer who buys just P2 has a single owner — no contention — so everything folds into one container (exactly like P1 standalone, and like the original Sanad monolith which did comms + arm + mask in one process). The ZMQ bus seam stays available for a future P1+P2+P3 fleet SKU but is not part of standalone P2.

Namespace bootstrap (reuse P1's, not SanadV3's self-alias)

P2 reuses P1's exact bootstrap (app_p1.py lines 30-46): synthesize a Project namespace package and alias Project.Sanad → the vendored Sanad module, then inject a Project.Sanad.main shim holding the P2 singletons. The mask + voice routes resolve their singletons via a lazy from Project.Sanad.main import mask_face inside handlers, so the shim must define them. (SanadV3's own self-alias-by-folder-name is skipped automatically: the vendored tree is named Sanad, so main.py's if _THIS_DIR.name != 'Sanad' branch never fires. The wrapper-with-shim is what P1 ships and what the routes expect.)
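The general technique can be sketched with sys.modules alone. This is not the code in app_p1.py, which is not reproduced in this plan: the function name bootstrap_namespace and the fake_sanad stand-in are hypothetical, and the real bootstrap would pass the imported vendored package instead of an empty module.

```python
import sys
import types


def bootstrap_namespace(vendored_sanad: types.ModuleType, singletons: dict) -> types.ModuleType:
    """Alias the vendored module as Project.Sanad and inject a Project.Sanad.main shim."""
    project = types.ModuleType("Project")
    project.__path__ = []  # mark as a package so dotted names resolve under it
    sys.modules["Project"] = project

    sys.modules["Project.Sanad"] = vendored_sanad
    project.Sanad = vendored_sanad

    shim = types.ModuleType("Project.Sanad.main")
    for name, obj in singletons.items():
        setattr(shim, name, obj)  # each singleton becomes a module attribute
    sys.modules["Project.Sanad.main"] = shim
    vendored_sanad.main = shim
    return shim


# Stand-in for the real vendored package (assumption: normally imported from vendor/).
fake_sanad = types.ModuleType("Sanad")
fake_sanad.__path__ = []
bootstrap_namespace(fake_sanad, {"mask_face": object(), "brain": object()})

# This mirrors the lazy import the route handlers perform.
from Project.Sanad.main import mask_face
```

Because the import system consults sys.modules before searching the filesystem, the lazy import resolves to the shim. A singleton that is missing from the shim would surface as an ImportError inside the handler, which is why shim-symbol coverage is worth validating.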

Features / capabilities

Inherited from P1 (superset — same code, premium flags on)

Hands-free Gemini conversation · persona editor (who/tone/language) · keyless Gemini key (customer adds own) · chest or USB/Anker audio (selectable, hot-swap) · typed-replay / "say a line" · live logs + download · offline license gate. Same audio (mic+speaker) mechanism as P1, via the evolved per-instance voice/audio_manager.py + voice/audio_devices.py.

New in P2 (this pass)

  1. Multilingual auto-detect — Gemini natively detects the visitor's language (Arabic Gulf/English) and replies in kind, via the bilingual system prompt in gemini/script.py / voice/sanad_voice.py. No per-user flag.
  2. Voice-command arm gestures: voice/live_voice_loop.py maps USER speech → arm actions via scripts/sanad_arm.txt (23 actions, non-contiguous ids {0-15, 23-28, 30}; the file's nominal range is 0-28; hundreds of Arabic/EN phrase variants) → sanad_arm_controller.ARM.trigger_action_by_id(). Instant or deferred mode (0.65 s fallback so a silent user still fires). Master trigger-enabled gate (default OFF). Position-based dedup (_last_snapshot + _trigger_lock).
  3. Gestures while speaking: core/brain.set_gestural_speaking → brain.gestural_speaking_changed → mask_face.set_speaking(True) (mouth + any gestural motion animate together while Gemini talks).
  4. Wake-phrase management — phrase→action CRUD (voice/wake_phrase_manager.py, persisted to data/wake_phrases.json), folded into the live loop at runtime via _merge_wake_phrases.
  5. Skills registry — skill CRUD, execute, upload-audio (dashboard/routes/skills.py).
  6. Lip-sync on the LED "Shining Mask" — vendored face/mask_face.py FaceController + face/face_motion.py LifelikeFace:
    • mouth driven by Gemini [[MOUTH:n]] (0-3) markers → gemini/subprocess.register_mouth_callback → mask_face.set_mouth(level);
    • set_speaking(on) from the gestural-speaking event for auto-talk;
    • state-aware idle/listening/thinking + timed reactions (smile/surprised/sad);
    • falls back to the flat lib's FaceAnimator if LifelikeFace/Pillow/bleak unavailable (lifelike=false in mask_config.json).
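To make the position-based dedup in item 2 concrete, here is a minimal sketch under stated assumptions: the transcript is delivered as growing snapshots of the same utterance, and a phrase fires once per occurrence by tracking how far the text has already been consumed. The class PositionDedupTrigger and its on_transcript method are hypothetical and are not the vendored LiveVoiceLoop; the names _last_snapshot and _trigger_lock, the default-OFF gate, and the id set are taken from the plan.

```python
import threading
from typing import Dict, List

# Ids from the plan: 0-15, 23-28 and 30 (23 actions in total).
VALID_ACTION_IDS = set(range(0, 16)) | set(range(23, 29)) | {30}


class PositionDedupTrigger:
    """Hypothetical sketch: fire each phrase once per occurrence, by position."""

    def __init__(self, phrases: Dict[str, int]) -> None:
        self._phrases = phrases            # phrase -> action id
        self._last_snapshot = ""           # transcript seen on the previous call
        self._consumed = 0                 # offset up to which text was handled
        self._trigger_lock = threading.Lock()
        self.trigger_enabled = False       # master gate, OFF by default

    def on_transcript(self, snapshot: str) -> List[int]:
        """Return action ids for phrase occurrences not handled before."""
        with self._trigger_lock:
            if not snapshot.startswith(self._last_snapshot[: self._consumed]):
                self._consumed = 0         # a new utterance replaced the old text
            self._last_snapshot = snapshot
            if not self.trigger_enabled:
                self._consumed = len(snapshot)  # discard text heard while gated
                return []
            fired: List[int] = []
            while True:
                best = None                # earliest unhandled phrase occurrence
                for phrase, action in self._phrases.items():
                    if not phrase:
                        continue
                    pos = snapshot.find(phrase, self._consumed)
                    if pos != -1 and (best is None or pos < best[0]):
                        best = (pos, phrase, action)
                if best is None:
                    return fired
                pos, phrase, action = best
                self._consumed = pos + len(phrase)
                if action in VALID_ACTION_IDS:
                    fired.append(action)
```

Unlike content-based dedup, a phrase that is genuinely spoken twice fires twice, while a snapshot that is delivered again fires nothing.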

Deferred (NOT in this pass)

  • Voice-command locomotion (voice/movement_dispatch.py + loco_controller): Gemini's spoken confirmation → discrete bounded steps, movement_enabled gate, "stop" = E-STOP. Adds a walking-on-voice safety surface + on-robot calibration — staged separately. SanadV3's dashboard/routes/_arbiter.py (Nav2 ↔ LocoController leg arbitration) belongs to this pass, not P2's arm-only pass.
  • Multi-package fleet via Sanad_Core (hwbroker/busd/shared sanad-mask).

Dashboard (:8012) = all P1 tabs + (all routes VENDORED from SanadV3)

  • Voice: adds the multilingual auto-detect toggle + per-language voice config.
  • Live-voice (commands): dashboard/routes/live_voice.py provides start/stop, deferred-mode toggle, trigger-enabled master gate, status, trigger history.
  • Wake-phrases: phrase→action CRUD (AR dialects + EN).
  • Motion / Gestural: dashboard/routes/motion.py provides the gestural-speaking toggle and trigger / cancel for arm actions. (Arm only; loco controls are present in the route but unwired this pass.)
  • Skills: dashboard/routes/skills.py.
  • Mask / Lip-sync: the existing SanadV3 SPA Mask Face tab + the vendored dashboard/routes/mask.py (/api/mask/*), covering connect/disconnect, brightness, face start/stop/return/color, speaking toggle, mouth slider, expressions, text/image/animation overrides, status. Mounted as-is, not authored.
  • Logs.

Non-P2 tabs (recognition, temp/3D, controller, navigation, terminal) are hidden the same way P1 hides its non-P1 set.

What it vendors / reuses (self-contained, like P1)

  • vendor/Sanad: the SanadV3 engine tree (rsync-excluding data/, Logs/, __pycache__/, tests/, static/temp3d/). Includes face/, evolved voice/, evolved gemini/, motion/, the mask + live-voice + motion + skills routes, and the SPA. (Locomotion modules vendored but left unwired this pass.)
  • vendor/sanad_pkg — IPC bus shim + offline license verification lib (P1's set).
  • vendor/mask — the flat shiningmask library copied from Project/Mask (mask.py, faceanim.py, colorface.py, constants.py, protocol.py, transport.py, bitmap.py, NotoSans-Regular.ttf, …). Project/Mask uses flat imports (import faces, import mask), so it goes on its own path (SANAD_MASK_DIR=/app/mask, also on PYTHONPATH) — NOT under Sanad/, to avoid collisions. face/mask_face.py and face/face_motion.py both sys.path.insert(mask_dir) and import mask / faceanim / colorface.
  • A sync_vendor.sh that refreshes both vendor/Sanad (from SanadV3) and vendor/mask (from Project/Mask), and blanks any baked Gemini key.

Wiring the lip-sync + gestures (in app_p2.py / the Project.Sanad.main shim)

  1. Construct P1 comms singletons (brain, audio_mgr, voice_client, GeminiSubprocess) exactly as P1 does.
  2. Construct premium singletons: FaceController() (mask_face), LiveVoiceLoop(...), WakePhraseManager(), arm controllers, skills.
  3. brain.attach_live_voice(live_voice); wire the arm⇄locomotion motion-block predicate exactly as SanadV3's main.py does (arm_controller.set_motion_block(...)). This is load-bearing safety, not optional: two rt/arm_sdk publishers coexist in-process, motion/arm_controller.py (publisher ~line 237) and motion/sanad_arm_controller.py (~line 176). They stay collision-free ONLY via (a) that set_motion_block interlock on ArmController and (b) sanad_arm_controller's _is_busy/_busy_lock atomic guard. A boot-time 'sole writer' assertion only covers cross-container contention; it does not replace this intra-process interlock, which app_p2.py must reproduce.
  4. Lip-sync: gemini_subprocess.register_mouth_callback(mask_face.set_mouth).
  5. Gestures-while-speaking: subscribe brain.gestural_speaking_changed → mask_face.set_speaking(on). The event bus is synchronous (core/event_bus .on/.emit_sync), so set_speaking runs on the caller's thread; keep it non-blocking (it only flips a flag the BLE loop reads).
  6. Lifelike state: wire voice events (connected→set_listening, user_said→set_thinking, disconnected→set_idle, voice.error/motion.action_error→react('sad'), skill.finished→react('smile')). SanadV3's main.py lines ~360-427 are the concrete reference implementation; copy that wiring (incl. register_mouth_callback ~383-391 and mask_face.shutdown() ~587-591) rather than re-deriving it.
  7. Expose mask_face, brain, live_voice on the Project.Sanad.main shim so the lazy route accessors resolve them.
  8. On shutdown: mask_face.shutdown() (BLE disconnect + stop loop) — handle SIGTERM so the container exits cleanly.
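Steps 5 and 6 above can be sketched against stand-ins. Everything below is illustrative: EventBus and FakeFace are hypothetical substitutes for core/event_bus and FaceController, and the dotted voice event names (voice.connected, voice.user_said, voice.disconnected) are assumed spellings, since the plan only names the events informally. The reference wiring remains SanadV3's main.py.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Stand-in for a synchronous bus: handlers run on the emitter's thread."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def emit_sync(self, event: str, *args: Any) -> None:
        for handler in list(self._handlers[event]):
            handler(*args)


class FakeFace:
    """Stand-in for FaceController: every setter only records state."""

    def __init__(self) -> None:
        self.state = "idle"
        self.speaking = False
        self.reactions: List[str] = []

    def set_speaking(self, on: bool) -> None:
        self.speaking = bool(on)  # cheap flag flip, safe on the caller's thread

    def set_listening(self) -> None:
        self.state = "listening"

    def set_thinking(self) -> None:
        self.state = "thinking"

    def set_idle(self) -> None:
        self.state = "idle"

    def react(self, name: str) -> None:
        self.reactions.append(name)


def wire_face(bus: EventBus, face: FakeFace) -> None:
    """Subscribe the face to speaking, voice-state and reaction events."""
    bus.on("brain.gestural_speaking_changed", face.set_speaking)
    # Assumed event names; payloads are ignored because the face only needs the state.
    bus.on("voice.connected", lambda *_: face.set_listening())
    bus.on("voice.user_said", lambda *_: face.set_thinking())
    bus.on("voice.disconnected", lambda *_: face.set_idle())
    for event in ("voice.error", "motion.action_error"):
        bus.on(event, lambda *_: face.react("sad"))
    bus.on("skill.finished", lambda *_: face.react("smile"))
```

Since the bus is synchronous, a slow handler would stall whichever thread emitted the event, so each handler here does nothing beyond updating a field.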

Container & hardware

  • FROM python:3.10-slim, WITH_UNITREE_SDK=1 (builds — the cyclonedds idlc fix gives arm + chest audio).
  • System deps (added over P1): bluez, libdbus-1-3/libdbus-1-dev, libglib2.0-0 for BlueZ/D-Bus; Pillow needs no extra apt on slim.
  • Python deps: P1's set + bleak==0.22.3 (pinned — bleak 3.x throws KeyError 'Roles' on the Jetson's BlueZ 5.53 and every connect fails) + Pillow (LifelikeFace frame rendering).
  • BLE for the mask (in-container): mount /var/run/dbus, --cap-add NET_ADMIN, /dev/bus/usb; network_mode: host. Free the mask from the phone app before connecting. Set SANAD_MASK_DIR=/app/mask.
  • /dev/snd (audio), license mount, writable ./data + ./config mounts (mask color persistence), restart: unless-stopped.
  • Port :8012. Ships keyless (strip_key.py blanks any baked key).
  • License features: multilingual, voice_command_motion (arm gestures), lipsync, mask. Entrypoint checks entitlement P2. (A future voice_command_locomotion/navigation feature gates the deferred walking.)

Package layout (to build later — mirrors P1)

Sanad_Package_2/
  app_p2.py  routes_p2.py  entrypoint.sh  strip_key.py  p2ctl.sh
  config/p2_config.json  static/  Dockerfile  docker-compose.yml  requirements.txt
  vendor/Sanad   (from SanadV3)
  vendor/sanad_pkg
  vendor/mask    (from Project/Mask — own PYTHONPATH)
  license/(pubkey + example)  data/(seed incl. wake_phrases.json)  sync_vendor.sh
  README.md  PLAN.md  NEW_ROBOT_SETUP.md

Build sequence (when implemented)

  1. Vendor the SanadV3 engine + sanad_pkg + the flat Project/Mask → vendor/mask; merge requirements (P1 deps + bleak==0.22.3 + Pillow).
  2. Self-contained Dockerfile (P1's + BlueZ/D-Bus system deps; COPY vendor/mask /app/mask; ENV SANAD_MASK_DIR=/app/mask).
  3. app_p2.py: P1's namespace bootstrap + Project.Sanad.main shim; construct P1 comms + FaceController + LiveVoiceLoop + wake-phrase mgr + skills; wire register_mouth_callback → set_mouth and gestural_speaking_changed → set_speaking + lifelike state hooks; mount P1 + premium routers (incl. the vendored mask.py); serve :8012.
  4. Mask: FaceController already runs its own BLE asyncio loop + reconnect supervisor; with autostart=true it connects + uploads frames (~30-90 s one-time, persists on flash) in the background and never blocks boot.
  5. Multilingual: ship SanadV3's bilingual system prompt (rename persona as decided); enable per-language voice config in the Voice tab.
  6. License gate P2; keyless; smoke test (P1 endpoints + live-voice + wake + skills + /api/mask/status → connect → face/start → mouth slider); validate on the robot.

Risks / mitigations

  • BLE-from-container is fiddly — prototype the mask connect early on the target Jetson (BlueZ 5.53) with bleak==0.22.3; if in-container BLE misbehaves, fall back to the mask controller on the host with a tiny socket shim (contingency, not the plan). mask.py route already degrades to 503 if the subsystem is down, so the dashboard never crashes.
  • Mask flat imports — vendor onto /app/mask via SANAD_MASK_DIR; never place under Sanad/.
  • LifelikeFace deps — needs Pillow + bleak; if either is missing the subsystem auto-falls-back to FaceAnimator (lifelike=false) and reports the reason in the Mask tab. The rest of P2 is unaffected.
  • Lip-sync chain spans the voice subprocess: gemini/subprocess.py spawns voice/sanad_voice.py (the child, SANAD_VOICE_BRAIN=gemini), which is an orchestrator that in turn runs gemini/script.py, the actual GeminiBrain. The [[MOUTH:n]] markers are emitted in gemini/script.py (~lines 563/578), not sanad_voice.py. Vendor the full script.py → sanad_voice.py → subprocess.py chain intact and register the parent callback (GeminiSubprocess.register_mouth_callback) in the shim. P1 already vendors gemini/subprocess.py, so this is additive wiring, not a new vendor.
  • Arm safety (two publishers + interlock): trigger_enabled defaults OFF; arm actions are bounded. The single container is the sole container writing rt/arm_sdk, but two in-process publishers (arm_controller.py + sanad_arm_controller.py) coexist, collision-free ONLY via the set_motion_block interlock + _is_busy busy-lock that app_p2.py must wire (Wiring step 3). Assert sole-container-writer at boot and reproduce the interlock.
  • Arm⇄locomotion interlock is a no-op in this arm-only build: sanad_arm_controller.trigger_action_by_id() calls _blocked() (refuses arm motion while the robot may be walking). With locomotion deferred, nothing publishes the locomotion-state signals _blocked() reads, so it always permits arm motion. That is safe by omission now, but it must be re-armed before the deferred voice-command-locomotion pass ships; otherwise voice-walking would be enabled with a silently disabled arm interlock.
  • Mask hardware presence — lip-sync needs the physical BLE mask paired/in range and freed from the phone app.
  • Locomotion creep: movement_dispatch.py + _arbiter.py are present in the vendored tree; keep them unwired this pass to avoid accidentally shipping voice-command walking without the safety surface + calibration.
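The two-publisher interlock named in the arm-safety risk can be illustrated with stand-ins. This is a sketch of the general pattern only: both classes are simplified substitutes for the vendored controllers, and the predicate passed to set_motion_block below (the other controller's busy flag) is a guess made for the example. The real predicate is whatever SanadV3's main.py passes, and that is what app_p2.py should copy.

```python
import threading
from typing import Callable, List, Optional


class ArmController:
    """Stand-in for the first publisher: refuses while the block predicate is true."""

    def __init__(self) -> None:
        self._motion_block: Optional[Callable[[], bool]] = None
        self.published: List[str] = []

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        self._motion_block = predicate

    def publish(self, pose: str) -> bool:
        if self._motion_block is not None and self._motion_block():
            return False               # another writer owns the arm right now
        self.published.append(pose)
        return True


class SanadArmController:
    """Stand-in for the second publisher: at most one action at a time."""

    def __init__(self) -> None:
        self._busy_lock = threading.Lock()
        self._is_busy = False

    def is_busy(self) -> bool:
        with self._busy_lock:
            return self._is_busy

    def trigger_action_by_id(self, action_id: int, run: Callable[[int], None]) -> bool:
        with self._busy_lock:          # test-and-set is atomic under the lock
            if self._is_busy:
                return False
            self._is_busy = True
        try:
            run(action_id)             # the lock is NOT held while the action runs
            return True
        finally:
            with self._busy_lock:
                self._is_busy = False


arm = ArmController()
sanad_arm = SanadArmController()
arm.set_motion_block(sanad_arm.is_busy)  # assumed predicate, see lead-in
```

The lock guards only the flag, not the action itself, so a second trigger is rejected immediately instead of queueing behind a running gesture.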

Open decisions (resolve before/while building)

  1. In-container BLE vs host side-car — confirm bleak+BlueZ/D-Bus actually connects from inside python:3.10-slim with /var/run/dbus + NET_ADMIN + network_mode: host on the target Jetson (BlueZ 5.53). Pin bleak==0.22.3 (3.x throws KeyError 'Roles' on BlueZ 5.53 — every connect fails). If flaky, fall back to a host-side mask controller + tiny socket shim (contingency).
  2. LifelikeFace default — SanadV3 defaults lifelike=true (needs face/face_motion.py + Pillow + bleak). Confirm Pillow+bleak install cleanly in the slim image; otherwise mask_config.json lifelike=false auto-falls back to FaceAnimator.
  3. Gemini lip-sync chain vendoring — vendor the evolved gemini/script.py (emits [[MOUTH:n]]) and gemini/subprocess.py (register_mouth_callback), and wire register_mouth_callback → mask_face.set_mouth in the shim. P1 already vendors subprocess.py, so this is additive.
  4. Persona / robot name — SanadV3's fallback prompt identifies the robot as "Marcus" (bilingual Gulf-Arabic/English auto-detect — this prompt is the multilingual engine). Decide P2's shipped persona name/dialect in scripts/sanad_script.txt and whether to keep the bilingual prompt verbatim.
  5. First-boot frame-upload latency: the one-time DIY frame upload takes ~30-90 s and persists on mask flash. Decide autostart=true (background, non-blocking) vs gating face/start behind an explicit dashboard action with a progress indicator.
  6. Vendor movement_dispatch.py now (unwired) or omit — keep for forward-compat with the deferred locomotion pass, or drop it to keep the arm-only image lean.
  7. Writable mounts for persisted face colors: /face/color persists eye/mouth/sclera colors to mask_config.json; ensure config/ (or the data/ seed) is on a writable volume so colors survive container restarts.

Carried-forward safety gate: before the deferred voice-command-locomotion pass ships, re-arm the _blocked() arm⇄locomotion interlock (a no-op today) and revisit _arbiter.py (Nav2 ↔ LocoController leg arbitration).

Container-runtime audit (2026-06-23)

A 3-reader audit of the vendored runtime paths (voice/audio, mask/face, arm+dashboard) for in-container failure modes. Verdict: container-safe on the G1 after the iproute2 fix — no remaining crash-level landmine on the builtin+gemini+host-net path.

  • FIXED (crash, P1+P2): chest-mic voice/audio_io.py:_find_g1_local_ip() shells out to ip — added iproute2 to both Dockerfiles. (This was the live-voice crash we hit.)
  • Mitigated (crash, off-robot only): BuiltinMic.start() calls _find_g1_local_ip() unguarded; off a G1 (no 192.168.123.x net) it raises and kills the voice subprocess. On the G1 (network_mode host) the interface exists → fine. Package-level safety valve: SANAD_AUDIO_PROFILE=plugged on any non-G1 host (documented in compose + .env.example). Not fixed in the engine (shared with SanadV3).
  • FIXED (degraded, persistence): mask face colors are written to config/mask_config.json (a baked layer, lost on recreate) → added a pre-seeded single-file mount ./config/mask_config.json:/app/Sanad/config/mask_config.json.
  • Deploy-side (degraded, P2): mask BLE cold-start can stall ~45 s (3×15 s scan) if host bluetoothd is down / adapter missing — retry-bounded, supervised, background thread; does not crash. Ensure bluetoothd is up; leave SANAD_MASK_ADDRESS empty for auto-detect.
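For the off-robot case, a guarded lookup could look like the sketch below. It is not the engine's _find_g1_local_ip(), which stays unmodified; the function names, the "builtin" profile name, and the choice to fall back to "plugged" automatically are assumptions for illustration. The 192.168.123.x network, the ip dependency, and SANAD_AUDIO_PROFILE=plugged come from the audit notes.

```python
import os
import re
import subprocess
from typing import Mapping, Optional

# Matches an IPv4 address on the G1's internal 192.168.123.x network.
G1_NET_RE = re.compile(r"\binet (192\.168\.123\.\d{1,3})/")


def parse_g1_ip(ip_output: str) -> Optional[str]:
    """Return the first 192.168.123.x address in `ip -o -4 addr show` output."""
    match = G1_NET_RE.search(ip_output)
    return match.group(1) if match else None


def find_g1_local_ip_or_none() -> Optional[str]:
    """Like the unguarded lookup, but returns None instead of raising."""
    try:
        out = subprocess.run(
            ["ip", "-o", "-4", "addr", "show"],
            capture_output=True, text=True, timeout=2, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None  # `ip` is missing (no iproute2) or the call failed
    return parse_g1_ip(out)


def choose_audio_profile(env: Mapping[str, str] = os.environ) -> str:
    """An explicit SANAD_AUDIO_PROFILE wins; otherwise probe for the G1 network."""
    explicit = env.get("SANAD_AUDIO_PROFILE")
    if explicit:
        return explicit
    return "builtin" if find_g1_local_ip_or_none() else "plugged"
```

Splitting the parser from the subprocess call keeps the address matching testable on any host, including one without iproute2 installed.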

Verified-safe (NO action, to avoid over-fixing): ~/logs FileHandler (root-writable, no crash); NotoSans-Regular.ttf present + COPYed; local_tts torch never imported (gemini brain); pactl with no server (guarded, and not on the builtin voice path); parec capture (gated); arm/DDS init (degrades, catches); config path resolution (robust fallback chain); teaching.py tempfile (writes under the ./data mount).