Sanad Package 2 — Premium Communication — PLAN (locked scope)
**Status: IMPLEMENTED (self-contained), built 2026-06-22, pending Docker build + on-robot test.** Vendored from SanadV3, mirrors P1. Dashboard port :8012. Structural validation passed (compile, import resolution, shim-symbol coverage, namespace bootstrap, license_check P2 entitled, YAML).

Refinement (2026-06): the engine is now vendored from SanadV3, not plain Sanad. SanadV3 already implements the mask/face subsystem and the evolved voice/audio/arm engine P2 needs, so P2 vendors and wires it rather than building the mask from scratch. See "Why vendor SanadV3".
P2 is a superset of P1: everything P1 does, plus multilingual auto-detect, voice-commanded arm gestures, gestures while speaking, and a lip-syncing LED face on the BLE "Shining Mask".
Locked scope (decisions taken)
- Motion = arm gestures only (first pass). Voice-command locomotion (robot walking/turning) is DEFERRED to a later pass: voice/movement_dispatch.py, G1_Controller/loco_controller.py and SanadV3's dashboard/routes/_arbiter.py (loco/nav leg arbitration) are intentionally out of scope here.
- Mask = included, BLE driven from INSIDE the P2 container (vendor the Mask controller; bleak + host BlueZ/D-Bus). No separate sanad-mask side-car.
- Single self-contained container (no Sanad_Core, no sanad-base), like P1.
- Keyless ship; customer supplies their own Gemini key.
Why vendor SanadV3 (not plain Sanad)
SanadV3 is not a fork of Sanad — it is plain Sanad plus exactly the
subsystems P2 needs, reachable through the identical Project.Sanad.* import
surface the dashboard routes already use. Verified in the tree:
- face/mask_face.py → FaceController (BLE asyncio-loop thread + reconnect supervisor + set_speaking/set_mouth lip-sync inputs). Imports only Project.Sanad.config.BASE_DIR, core.config_loader, core.logger, then the flat Mask lib via sys.path.insert(mask_dir).
- face/face_motion.py → LifelikeFace (saccades, varied blinks, idle/listening/thinking/speaking states, timed reactions, smooth lip-sync), with automatic fallback to the flat lib's FaceAnimator.
- config/mask_config.json → already env-driven (SANAD_MASK_DIR/ADDRESS/NAME_PREFIX/ADAPTER), brightness/fps/lifelike/autostart, persisted face colors.
- dashboard/routes/mask.py → a failure-safe /api/mask router; every handler is asyncio.to_thread-wrapped and maps errors to 503/409/500 so a missing mask never crashes the dashboard.
- Lip-sync chain → gemini/script.py emits [[MOUTH:n]] (0–3) RMS markers; gemini/subprocess.py exposes register_mouth_callback; core/brain.py set_gestural_speaking emits brain.gestural_speaking_changed.
- Evolved voice/audio → voice/live_voice_loop.py (position-based dedup), voice/text_utils.py (Arabic normalize + maybe_trigger_arm), voice/audio_manager.py (per-instance + TTL throttle): supersets of plain Sanad's race-prone equivalents.
Building the mask from plain Sanad (the prior plan) would mean re-deriving all
of the above by hand and inheriting plain Sanad's known bugs (content-based
dedup, no [[MOUTH:n]], non-atomic index.json writes). The one thing SanadV3
does not solve is BLE-inside-a-container (it ran the mask on the host
g1_env) — but that is a Docker capability/mount problem P2 owns regardless of
base engine.
Correction to the prior plan: the lip-sync source of truth is Gemini's
[[MOUTH:n]] markers over the event bus, not a raw audio-amplitude tap, and
the driver is FaceController + LifelikeFace, not Mask/talking.py TalkingFace. P2 adopts the marker path.
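The marker path can be illustrated with a minimal sketch. It assumes the child process writes [[MOUTH:n]] markers inline in its text output and that the parent strips them and forwards the level to registered callbacks. MouthMarkerParser and feed are hypothetical names used only for illustration; only the marker format and the register_mouth_callback / set_mouth names come from this plan.

```python
import re
from typing import Callable, List

# Marker format from this plan: [[MOUTH:n]] with n in 0..3.
MOUTH_RE = re.compile(r"\[\[MOUTH:([0-3])\]\]")


class MouthMarkerParser:
    """Hypothetical helper: strips mouth markers and fans out the level."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[int], None]] = []

    def register_mouth_callback(self, cb: Callable[[int], None]) -> None:
        self._callbacks.append(cb)

    def feed(self, line: str) -> str:
        """Dispatch every marker found in `line`; return the line without markers."""
        for match in MOUTH_RE.finditer(line):
            level = int(match.group(1))
            for cb in self._callbacks:
                try:
                    cb(level)
                except Exception:
                    # A failing face callback must not break the voice path.
                    pass
        return MOUTH_RE.sub("", line)


levels: List[int] = []
parser = MouthMarkerParser()
parser.register_mouth_callback(levels.append)  # in P2 this would be mask_face.set_mouth
clean = parser.feed("hel[[MOUTH:2]]lo[[MOUTH:0]]")
```

Swallowing callback errors is a design choice of this sketch: lip-sync is cosmetic, so a mask failure should degrade the face and leave the conversation running.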
Architecture
One self-contained container that owns all the hardware it needs directly:
the G1 DDS link, rt/arm_sdk (arm), chest or USB/Anker audio, and the BLE
mask. It runs the premium slice of the vendored SanadV3 engine.
P2 is a containerization wrapper (like P1), not a fork: it
- bootstraps the Project.Sanad namespace (deployed layout, P1's mechanism),
- constructs ONLY the P2 superset subsystems,
- injects a P2-scoped Project.Sanad.main shim exposing the singletons the routers import lazily (mask_face, brain, live_voice, …),
- mounts the P1 + premium routers + the logs websocket,
- serves the real SanadV3 SPA with non-P2 tabs hidden,
- runs uvicorn on :8012.
Why no Sanad_Core/hwbroker/sanad-mask split: that split exists only to stop
multiple package containers fighting over the one device set. A customer who
buys just P2 has a single owner — no contention — so everything folds into one
container (exactly like P1 standalone, and like the original Sanad monolith which
did comms + arm + mask in one process). The ZMQ bus seam stays available for a
future P1+P2+P3 fleet SKU but is not part of standalone P2.
Namespace bootstrap (reuse P1's, not SanadV3's self-alias)
P2 reuses P1's exact bootstrap (app_p1.py lines 30–46): synthesize a
Project namespace package and alias Project.Sanad → the vendored Sanad
module, then inject a Project.Sanad.main shim holding the P2 singletons. The
mask + voice routes resolve their singletons via lazy from Project.Sanad.main import mask_face inside handlers — so the shim must define them. (SanadV3's own
self-alias-by-folder-name is skipped automatically — the vendored tree is named
Sanad, so main.py's if _THIS_DIR.name != 'Sanad' branch never fires — and
the wrapper-with-shim is what P1 ships and what the routes expect.)
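The bootstrap described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of app_p1.py: bootstrap_namespace is a hypothetical helper, and the vendored engine is replaced by an empty stand-in module so the example runs on its own.

```python
import sys
import types
from typing import Any, Dict


def bootstrap_namespace(vendored_sanad: types.ModuleType,
                        singletons: Dict[str, Any]) -> types.ModuleType:
    """Alias Project.Sanad to the vendored module and inject a main shim."""
    # Synthesize an empty "Project" package (a __path__ marks it as a package).
    project = types.ModuleType("Project")
    project.__path__ = []
    sys.modules["Project"] = project

    # Alias Project.Sanad -> the vendored Sanad module.
    sys.modules["Project.Sanad"] = vendored_sanad
    project.Sanad = vendored_sanad

    # Shim module holding the singletons the routes import lazily.
    shim = types.ModuleType("Project.Sanad.main")
    for name, obj in singletons.items():
        setattr(shim, name, obj)
    sys.modules["Project.Sanad.main"] = shim
    vendored_sanad.main = shim
    return shim


# Stand-ins so the sketch is self-contained (the real tree is vendor/Sanad).
vendored = types.ModuleType("Sanad")
vendored.__path__ = []
FAKE_FACE = object()
bootstrap_namespace(vendored, {"mask_face": FAKE_FACE})

# This is the lazy import a route handler would perform.
from Project.Sanad.main import mask_face
```

Because the import system checks sys.modules first, the lazy import resolves to the shim without touching the filesystem, which is why every singleton a route needs must be set on the shim before the first request.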
Features / capabilities
Inherited from P1 (superset — same code, premium flags on)
Hands-free Gemini conversation · persona editor (who/tone/language) · keyless
Gemini key (customer adds own) · chest or USB/Anker audio (selectable,
hot-swap) · typed-replay / "say a line" · live logs + download · offline license
gate. Same audio (mic+speaker) mechanism as P1, via the evolved per-instance
voice/audio_manager.py + voice/audio_devices.py.
New in P2 (this pass)
- Multilingual auto-detect: Gemini natively detects the visitor's language (Gulf Arabic/English) and replies in kind, via the bilingual system prompt in gemini/script.py / voice/sanad_voice.py. No per-user flag.
- Voice-command arm gestures (voice/live_voice_loop.py): USER speech → arm actions via scripts/sanad_arm.txt (23 actions, non-contiguous ids {0–15, 23–28, 30}; the file's nominal range is 0–28; hundreds of Arabic/EN phrase variants) → sanad_arm_controller.ARM.trigger_action_by_id(). Instant or deferred mode (0.65 s fallback so a silent user still fires). Master trigger-enabled gate (default OFF). Position-based dedup (_last_snapshot + _trigger_lock).
- Gestures while speaking: core/brain.set_gestural_speaking → brain.gestural_speaking_changed → mask_face.set_speaking(True) (mouth + any gestural motion animate together while Gemini talks).
- Wake-phrase management: phrase→action CRUD (voice/wake_phrase_manager.py, persisted to data/wake_phrases.json), folded into the live loop at runtime via _merge_wake_phrases.
- Skills registry: skill CRUD, execute, upload-audio (dashboard/routes/skills.py).
- Lip-sync on the LED "Shining Mask": vendored face/mask_face.py FaceController + face/face_motion.py LifelikeFace:
  - mouth driven by Gemini [[MOUTH:n]] (0–3) markers → gemini/subprocess.register_mouth_callback → mask_face.set_mouth(level);
  - set_speaking(on) from the gestural-speaking event for auto-talk;
  - state-aware idle/listening/thinking + timed reactions (smile/surprised/sad);
  - falls back to the flat lib's FaceAnimator if LifelikeFace/Pillow/bleak unavailable (lifelike=false in mask_config.json).
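To make "position-based dedup" concrete, here is a minimal sketch of one way it can work, assuming the loop receives a growing transcript snapshot and remembers how far into it phrases have already fired. PositionDedupTrigger, on_transcript, the phrase "wave" and action id 3 are all hypothetical; only the _last_snapshot and _trigger_lock names and the default-OFF gate come from this plan, and the vendored loop may differ.

```python
import threading
from typing import Callable, Dict, List


class PositionDedupTrigger:
    """Hypothetical sketch: each phrase occurrence fires once, keyed by position."""

    def __init__(self, phrases: Dict[str, int], fire: Callable[[int], None]) -> None:
        self._phrases = phrases            # phrase -> action id
        self._fire = fire
        self._last_snapshot = ""           # transcript seen on the previous call
        self._consumed = 0                 # offset up to which matches have fired
        self._trigger_lock = threading.Lock()
        self.enabled = False               # master gate, default OFF

    def on_transcript(self, snapshot: str) -> None:
        with self._trigger_lock:
            if not snapshot.startswith(self._last_snapshot):
                self._consumed = 0         # transcript was reset: new utterance
            self._last_snapshot = snapshot
            if not self.enabled:
                # Skip speech heard while disabled so it never fires later.
                self._consumed = len(snapshot)
                return
            tail = snapshot[self._consumed:]
            for phrase, action_id in self._phrases.items():
                idx = tail.find(phrase)
                if idx != -1:
                    self._consumed += idx + len(phrase)
                    self._fire(action_id)
                    return


fired: List[int] = []
trigger = PositionDedupTrigger({"wave": 3}, fired.append)
trigger.enabled = True
trigger.on_transcript("please wave")           # fires once
trigger.on_transcript("please wave")           # same snapshot again: no re-fire
trigger.on_transcript("please wave and wave")  # a new occurrence fires again
```

A content-based scheme would suppress the second "wave" because the phrase text was already seen; keying on position lets a repeated command fire while still ignoring re-delivery of the same snapshot.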
Deferred (NOT in this pass)
- Voice-command locomotion (voice/movement_dispatch.py + loco_controller): Gemini's spoken confirmation → discrete bounded steps, movement_enabled gate, "stop" = E-STOP. Adds a walking-on-voice safety surface + on-robot calibration, so it is staged separately. SanadV3's dashboard/routes/_arbiter.py (Nav2 ↔ LocoController leg arbitration) belongs to that deferred pass, not P2's arm-only pass.
- Multi-package fleet via Sanad_Core (hwbroker/busd/shared sanad-mask).
Dashboard (:8012) = all P1 tabs + (all routes VENDORED from SanadV3)
- Voice: adds the multilingual auto-detect toggle + per-language voice config.
- Live-voice (commands), dashboard/routes/live_voice.py: start/stop, deferred-mode toggle, trigger-enabled master gate, status, trigger history.
- Wake-phrases: phrase→action CRUD (AR dialects + EN).
- Motion / Gestural, dashboard/routes/motion.py: gestural-speaking toggle, trigger / cancel arm actions. (Arm only; loco controls are present in the route but unwired this pass.)
- Skills: dashboard/routes/skills.py.
- Mask / Lip-sync: the existing SanadV3 SPA Mask Face tab + the vendored dashboard/routes/mask.py (/api/mask/*): connect/disconnect, brightness, face start/stop/return/color, speaking toggle, mouth slider, expressions, text/image/animation overrides, status. Mounted as-is, not authored.
- Logs.
Non-P2 tabs (recognition, temp/3D, controller, navigation, terminal) are hidden the same way P1 hides its non-P1 set.
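The failure-safe behaviour of the vendored mask router (handlers wrapped in asyncio.to_thread, errors mapped to 503/409/500) can be sketched without a web framework. The exception classes, the safe_call helper and the specific pairing of error type to status code are assumptions made for illustration; the plan only states which status codes are used.

```python
import asyncio
from typing import Any, Callable, Dict, Tuple


class MaskUnavailable(Exception):
    """Hypothetical: the mask subsystem is missing or not constructed."""


class MaskBusy(Exception):
    """Hypothetical: the request conflicts with the current mask state."""


async def safe_call(fn: Callable[..., Any], *args: Any) -> Tuple[int, Dict[str, Any]]:
    """Run a blocking mask call off the event loop and map errors to a status."""
    try:
        result = await asyncio.to_thread(fn, *args)
        return 200, {"ok": True, "result": result}
    except MaskUnavailable as exc:
        return 503, {"ok": False, "error": str(exc)}
    except MaskBusy as exc:
        return 409, {"ok": False, "error": str(exc)}
    except Exception as exc:  # anything unexpected becomes a 500, never a crash
        return 500, {"ok": False, "error": str(exc)}


def missing_mask() -> None:
    raise MaskUnavailable("mask subsystem not available")


status, body = asyncio.run(safe_call(missing_mask))
```

Running the blocking call in a worker thread keeps a slow BLE operation from stalling other dashboard requests, and catching everything at the boundary is what lets the dashboard stay up with no mask attached.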
What it vendors / reuses (self-contained, like P1)
- vendor/Sanad: the SanadV3 engine tree (rsync-excluding data/, Logs/, __pycache__/, tests/, static/temp3d/). Includes face/, evolved voice/, evolved gemini/, motion/, the mask + live-voice + motion + skills routes, and the SPA. (Locomotion modules vendored but left unwired this pass.)
- vendor/sanad_pkg: IPC bus shim + offline license verification lib (P1's set).
- vendor/mask: the flat shiningmask library copied from Project/Mask (mask.py, faceanim.py, colorface.py, constants.py, protocol.py, transport.py, bitmap.py, NotoSans-Regular.ttf, …). Project/Mask uses flat imports (import faces, import mask), so it goes on its own path (SANAD_MASK_DIR=/app/mask, also on PYTHONPATH), NOT under Sanad/, to avoid collisions. face/mask_face.py and face/face_motion.py both sys.path.insert(mask_dir) and import mask / faceanim / colorface.
- A sync_vendor.sh that refreshes both vendor/Sanad (from SanadV3) and vendor/mask (from Project/Mask), and blanks any baked Gemini key.
Wiring the lip-sync + gestures (in app_p2.py / the Project.Sanad.main shim)
1. Construct P1 comms singletons (brain, audio_mgr, voice_client, GeminiSubprocess) exactly as P1 does.
2. Construct premium singletons: FaceController() (mask_face), LiveVoiceLoop(...), WakePhraseManager(), arm controllers, skills.
3. brain.attach_live_voice(live_voice); wire the arm⇄locomotion motion-block predicate exactly as SanadV3's main.py does (arm_controller.set_motion_block(...)). This is load-bearing safety, not optional: two rt/arm_sdk publishers coexist in-process, motion/arm_controller.py (publisher ~line 237) and motion/sanad_arm_controller.py (~line 176). They stay collision-free ONLY via (a) that set_motion_block interlock on ArmController and (b) sanad_arm_controller's _is_busy/_busy_lock atomic guard. A boot-time 'sole writer' assertion only covers cross-container contention; it does not replace this intra-process interlock, which app_p2.py must reproduce.
4. Lip-sync: gemini_subprocess.register_mouth_callback(mask_face.set_mouth).
5. Gestures-while-speaking: subscribe brain.gestural_speaking_changed → mask_face.set_speaking(on). The event bus is synchronous (core/event_bus.on/.emit_sync), so set_speaking runs on the caller's thread; keep it non-blocking (it only flips a flag the BLE loop reads).
6. Lifelike state: wire voice events (connected→set_listening, user_said→set_thinking, disconnected→set_idle, voice.error/motion.action_error→react('sad'), skill.finished→react('smile')). SanadV3's main.py lines ~360–427 are the concrete reference implementation: copy that wiring (incl. register_mouth_callback ~383–391 and mask_face.shutdown() ~587–591) rather than re-deriving it.
7. Expose mask_face, brain, live_voice on the Project.Sanad.main shim so the lazy route accessors resolve them.
8. On shutdown: mask_face.shutdown() (BLE disconnect + stop loop); handle SIGTERM so the container exits cleanly.
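The gestures-while-speaking wiring can be sketched with stand-ins. EventBus and FakeFace below are hypothetical minimal versions written for this example (the real classes live in core/event_bus and face/mask_face.py and are not reproduced here); the point is that a synchronous bus runs the handler on the emitter's thread, so the handler should only flip state.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Hypothetical synchronous bus: handlers run on the emitter's thread."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def emit_sync(self, event: str, *args: Any) -> None:
        for handler in list(self._handlers[event]):
            handler(*args)


class FakeFace:
    """Stand-in for FaceController: setters only flip state a BLE loop would read."""

    def __init__(self) -> None:
        self._speaking = threading.Event()
        self.mouth = 0

    def set_speaking(self, on: bool) -> None:
        # No I/O here: the caller is the voice thread and must not block.
        if on:
            self._speaking.set()
        else:
            self._speaking.clear()

    def set_mouth(self, level: int) -> None:
        self.mouth = max(0, min(3, int(level)))  # clamp to the 0..3 marker range

    @property
    def speaking(self) -> bool:
        return self._speaking.is_set()


bus = EventBus()
face = FakeFace()
bus.on("brain.gestural_speaking_changed", face.set_speaking)
bus.emit_sync("brain.gestural_speaking_changed", True)
```

If a handler did BLE I/O directly, every emit would stall the voice path for the duration of the write, which is why the real setters are expected to hand off to the mask's own loop.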
Container & hardware
- FROM python:3.10-slim, WITH_UNITREE_SDK=1 (builds; the cyclonedds idlc fix gives arm + chest audio).
- System deps (added over P1): bluez, libdbus-1-3/libdbus-1-dev, libglib2.0-0 for BlueZ/D-Bus; Pillow needs no extra apt on slim.
- Python deps: P1's set + bleak==0.22.3 (pinned: bleak 3.x throws KeyError 'Roles' on the Jetson's BlueZ 5.53 and every connect fails) + Pillow (LifelikeFace frame rendering).
- BLE for the mask (in-container): mount /var/run/dbus, --cap-add NET_ADMIN, /dev/bus/usb; network_mode: host. Free the mask from the phone app before connecting. Set SANAD_MASK_DIR=/app/mask.
- /dev/snd (audio), license mount, writable ./data + ./config mounts (mask color persistence), restart: unless-stopped.
- Port :8012. Ships keyless (strip_key.py blanks any baked key).
- License features: multilingual, voice_command_motion (arm gestures), lipsync, mask. Entrypoint checks entitlement P2. (A future voice_command_locomotion/navigation feature gates the deferred walking.)
Package layout (to build later — mirrors P1)
Sanad_Package_2/
app_p2.py routes_p2.py entrypoint.sh strip_key.py p2ctl.sh
config/p2_config.json static/ Dockerfile docker-compose.yml requirements.txt
vendor/Sanad (from SanadV3)
vendor/sanad_pkg
vendor/mask (from Project/Mask — own PYTHONPATH)
license/(pubkey + example) data/(seed incl. wake_phrases.json) sync_vendor.sh
README.md PLAN.md NEW_ROBOT_SETUP.md
Build sequence (when implemented)
- Vendor the SanadV3 engine + sanad_pkg + the flat Project/Mask → vendor/mask; merge requirements (P1 deps + bleak==0.22.3 + Pillow).
- Self-contained Dockerfile (P1's + BlueZ/D-Bus system deps; COPY vendor/mask /app/mask; ENV SANAD_MASK_DIR=/app/mask).
- app_p2.py: P1's namespace bootstrap + Project.Sanad.main shim; construct P1 comms + FaceController + LiveVoiceLoop + wake-phrase mgr + skills; wire register_mouth_callback→set_mouth and gestural_speaking_changed→set_speaking + lifelike state hooks; mount P1 + premium routers (incl. the vendored mask.py); serve :8012.
- Mask: FaceController already runs its own BLE asyncio loop + reconnect supervisor; with autostart=true it connects + uploads frames (~30–90 s one-time, persists on flash) in the background and never blocks boot.
- Multilingual: ship SanadV3's bilingual system prompt (rename persona as decided); enable per-language voice config in the Voice tab.
- License gate P2; keyless; smoke test (P1 endpoints + live-voice + wake + skills + /api/mask/status → connect → face/start → mouth slider); validate on the robot.
Risks / mitigations
- BLE-from-container is fiddly: prototype the mask connect early on the target Jetson (BlueZ 5.53) with bleak==0.22.3; if in-container BLE misbehaves, fall back to the mask controller on the host with a tiny socket shim (contingency, not the plan). The mask.py route already degrades to 503 if the subsystem is down, so the dashboard never crashes.
- Mask flat imports: vendor onto /app/mask via SANAD_MASK_DIR; never place under Sanad/.
- LifelikeFace deps: needs Pillow + bleak; if either is missing the subsystem auto-falls-back to FaceAnimator (lifelike=false) and reports the reason in the Mask tab. The rest of P2 is unaffected.
- Lip-sync chain spans the voice subprocess: gemini/subprocess.py spawns voice/sanad_voice.py (the child, SANAD_VOICE_BRAIN=gemini), which is an orchestrator that in turn runs gemini/script.py, the actual GeminiBrain. The [[MOUTH:n]] markers are emitted in gemini/script.py (~lines 563/578), not sanad_voice.py. Vendor the full script.py → sanad_voice.py → subprocess.py chain intact and register the parent callback (GeminiSubprocess.register_mouth_callback) in the shim. P1 already vendors gemini/subprocess.py, so this is additive wiring, not a new vendor.
- Arm safety (two publishers + interlock): trigger_enabled defaults OFF; arm actions are bounded. The single container is the sole container writing rt/arm_sdk, but two in-process publishers (arm_controller.py + sanad_arm_controller.py) coexist, collision-free ONLY via the set_motion_block interlock + _is_busy busy-lock that app_p2.py must wire (Wiring step 3). Assert sole-container-writer at boot and reproduce the interlock.
- Arm⇄locomotion interlock is a no-op in this arm-only build: sanad_arm_controller.trigger_action_by_id() calls _blocked() (refuses arm motion while the robot may be walking). With locomotion deferred, nothing publishes the locomotion-state signals _blocked() reads, so it always permits arm motion. That is safe by omission now, but it must be re-armed before the deferred voice-command-locomotion pass ships, else voice-walking would enable with a silently-disabled arm interlock.
- Mask hardware presence: lip-sync needs the physical BLE mask paired/in range and freed from the phone app.
- Locomotion creep: movement_dispatch.py + _arbiter.py are present in the vendored tree; keep them unwired this pass to avoid accidentally shipping voice-command walking without the safety surface + calibration.
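The two arm-safety mechanisms named above (a motion-block predicate and a busy lock) can be illustrated together in one small class. ArmGate and try_run are hypothetical and merge the two guards for brevity; in the vendored engine they live on separate controllers, so treat this as a sketch of the idea and not of the actual code.

```python
import threading
from typing import Callable, List, Optional


class ArmGate:
    """Hypothetical guard combining a motion-block predicate and a busy lock."""

    def __init__(self) -> None:
        self._motion_block: Optional[Callable[[], bool]] = None
        self._busy_lock = threading.Lock()

    def set_motion_block(self, predicate: Callable[[], bool]) -> None:
        """Install the predicate that returns True while arm motion must be refused."""
        self._motion_block = predicate

    def try_run(self, action: Callable[[], None]) -> bool:
        """Run `action` only if nothing blocks it and no other action is running."""
        if self._motion_block is not None and self._motion_block():
            return False
        # Non-blocking acquire: a second trigger is refused, not queued.
        if not self._busy_lock.acquire(blocking=False):
            return False
        try:
            action()
            return True
        finally:
            self._busy_lock.release()


ran: List[str] = []
gate = ArmGate()
first = gate.try_run(lambda: ran.append("wave"))   # no predicate wired: permitted
gate.set_motion_block(lambda: True)                # e.g. "robot may be walking"
second = gate.try_run(lambda: ran.append("wave"))  # refused by the interlock
```

The first call shows the "safe by omission" case described above: with no predicate wired, nothing can ever block the arm, which is acceptable only while locomotion stays deferred.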
Open decisions (resolve before/while building)
- In-container BLE vs host side-car: confirm bleak + BlueZ/D-Bus actually connects from inside python:3.10-slim with /var/run/dbus + NET_ADMIN + network_mode: host on the target Jetson (BlueZ 5.53). Pin bleak==0.22.3 (3.x throws KeyError 'Roles' on BlueZ 5.53, so every connect fails). If flaky, fall back to a host-side mask controller + tiny socket shim (contingency).
- LifelikeFace default: SanadV3 defaults lifelike=true (needs face/face_motion.py + Pillow + bleak). Confirm Pillow + bleak install cleanly in the slim image; otherwise mask_config.json lifelike=false auto-falls back to FaceAnimator.
- Gemini lip-sync chain vendoring: vendor the evolved gemini/script.py (emits [[MOUTH:n]]) and gemini/subprocess.py (register_mouth_callback), and wire register_mouth_callback → mask_face.set_mouth in the shim. P1 already vendors subprocess.py, so this is additive.
- Persona / robot name: SanadV3's fallback prompt identifies the robot as "Marcus" (bilingual Gulf-Arabic/English auto-detect; this prompt is the multilingual engine). Decide P2's shipped persona name/dialect in scripts/sanad_script.txt and whether to keep the bilingual prompt verbatim.
- First-boot frame-upload latency: the one-time DIY frame upload (~30–90 s, persists on mask flash). Decide autostart=true (background, non-blocking) vs gating face/start behind an explicit dashboard action with a progress indicator.
- Vendor movement_dispatch.py now (unwired) or omit: keep for forward-compat with the deferred locomotion pass, or drop it to keep the arm-only image lean.
- Writable mounts for persisted face colors: /face/color persists eye/mouth/sclera colors to mask_config.json; ensure config/ (or the data/ seed) is on a writable volume so colors survive container restarts.
Carried-forward safety gate: before the deferred voice-command-locomotion pass ships, re-arm the _blocked() arm⇄locomotion interlock (a no-op today) and revisit _arbiter.py (Nav2 ↔ LocoController leg arbitration).
Container-runtime audit (2026-06-23)
A 3-reader audit of the vendored runtime paths (voice/audio, mask/face, arm+dashboard)
for in-container failure modes. Verdict: container-safe on the G1 after the
iproute2 fix — no remaining crash-level landmine on the builtin+gemini+host-net path.
- FIXED (crash, P1+P2): chest-mic voice/audio_io.py:_find_g1_local_ip() shells out to ip; added iproute2 to both Dockerfiles. (This was the live-voice crash we hit.)
- Mitigated (crash, off-robot only): BuiltinMic.start() calls _find_g1_local_ip() unguarded; off a G1 (no 192.168.123.x net) it raises and kills the voice subprocess. On the G1 (network_mode host) the interface exists → fine. Package-level safety valve: SANAD_AUDIO_PROFILE=plugged on any non-G1 host (documented in compose + .env.example). Not fixed in the engine (shared with SanadV3).
- FIXED (degraded, persistence): mask face colors are written to config/mask_config.json (a baked layer, lost on recreate) → added a pre-seeded single-file mount ./config/mask_config.json:/app/Sanad/config/mask_config.json.
- Deploy-side (degraded, P2): mask BLE cold-start can stall ~45 s (3×15 s scan) if host bluetoothd is down / adapter missing; it is retry-bounded, supervised, and runs on a background thread, so it does not crash. Ensure bluetoothd is up; leave SANAD_MASK_ADDRESS empty for auto-detect.
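For reference, a guarded variant of the G1 address lookup could look like the sketch below. It is not the engine's _find_g1_local_ip() (which the audit deliberately leaves unchanged); parse_g1_ip and find_g1_local_ip_guarded are hypothetical, and the regular expression assumes the one-line-per-address output of `ip -4 -o addr show`.

```python
import re
import shutil
import subprocess
from typing import Optional

# Assumed output shape: "2: eth0    inet 192.168.123.164/24 brd ... scope global eth0"
G1_ADDR_RE = re.compile(r"\binet (192\.168\.123\.\d{1,3})/")


def parse_g1_ip(ip_output: str) -> Optional[str]:
    """Return the first 192.168.123.x address in `ip` output, or None."""
    match = G1_ADDR_RE.search(ip_output)
    return match.group(1) if match else None


def find_g1_local_ip_guarded() -> Optional[str]:
    """Look up the G1-side address; return None instead of raising."""
    if shutil.which("ip") is None:
        return None  # iproute2 missing from the image
    try:
        result = subprocess.run(
            ["ip", "-4", "-o", "addr", "show"],
            capture_output=True, text=True, timeout=2, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_g1_ip(result.stdout)


# The sample address is illustrative only.
sample = "2: eth0    inet 192.168.123.164/24 brd 192.168.123.255 scope global eth0"
found = parse_g1_ip(sample)
```

A caller that receives None could switch to the plugged audio profile instead of letting the voice subprocess die, which is the same outcome SANAD_AUDIO_PROFILE=plugged achieves by configuration.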
Verified-safe (NO action, to avoid over-fixing): ~/logs FileHandler (root-writable,
no crash); NotoSans-Regular.ttf present + COPYed; local_tts torch never imported
(gemini brain); pactl with no server (guarded, and not on the builtin voice path);
parec capture (gated); arm/DDS init (degrades, catches); config path resolution
(robust fallback chain); teaching.py tempfile (writes under the ./data mount).