SanadV3 — Feature Catalog
Sanad is a bilingual (Arabic/English) humanoid receptionist/assistant on a Unitree G1 (Jetson Orin NX, ROS 2 Foxy, Livox MID-360). This catalogs what's built today (Part A) and what's on the roadmap (Part B).
Part A — Current features (built & running)
Verified from the live subsystem registry (19 subsystems), dashboard tabs (12), and API routers (22).
1. Voice & Conversation
- Gemini live voice — real-time bilingual AR/EN spoken conversation (native-audio model)
- Offline brain — local pipeline via ollama (SANAD_VOICE_BRAIN=local), no cloud
- Wake phrases — configurable wake-word manager
- Typed replay — type text, robot speaks it (with speaker-monitor capture)
- Local TTS — on-device text-to-speech engine
- Prompt management — edit the system prompt from the dashboard
- Lip-sync — mask mouth driven by TTS MOUTH markers
- Barge-in — interrupt speech (volume-scaled threshold)
2. Motion & Arm
- 35 arm actions — 16 SDK built-ins + 19 custom JSONL motions
- Macro record / playback — capture and replay motion sequences
- Teaching mode — kinesthetic teach-and-repeat
- Skills — composed higher-level behaviors (skills.json)
- Movement dispatch — voice → motion (53 fixed + 10 parametric phrases, cooldown-gated)
- Arm motion-block — auto-inhibits arm moves while locomotion is active (safety interlock)
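The two gates on voice-triggered motion (the dispatch cooldown and the arm motion-block) can be sketched as follows. This is a minimal illustration, not the SanadV3 implementation: the class name, the `locomotion_active` flag, the 3-second cooldown, and the phrase table are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional


class MovementDispatcher:
    """Hypothetical phrase -> arm-action dispatcher with two gates."""

    def __init__(
        self,
        phrases: Dict[str, str],
        cooldown_s: float = 3.0,  # assumed value; the real cooldown is not stated here
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.phrases = phrases          # spoken phrase -> action name (illustrative)
        self.cooldown_s = cooldown_s
        self.clock = clock              # injectable so the gate can be tested
        self.locomotion_active = False  # would be set by the locomotion side
        self._last_fired: Optional[float] = None

    def dispatch(self, utterance: str) -> Optional[str]:
        """Return the action to run, or None if nothing matches or a gate blocks it."""
        action = self.phrases.get(utterance.strip().lower())
        if action is None:
            return None
        if self.locomotion_active:
            return None  # arm motion-block: no arm moves while walking
        now = self.clock()
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return None  # cooldown gate: ignore repeats that arrive too soon
        self._last_fired = now
        return action
```

Checking the interlock before the cooldown means a phrase blocked by locomotion does not consume the cooldown window, so the same phrase can fire as soon as the robot stops walking.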
3. Locomotion
- LocoClient + MotionSwitcher — walk / pose control via Unitree SDK (eth0)
- E-STOP — dashboard kill button
- Single Ctrl+C teardown — one signal cleanly stops every subsystem (~2s)
4. LED Face Mask
- Animated expressions — neutral, smile, blink, look L/R, talk1–3, surprised, sad
- Gestural-speaking events — face reacts while speaking
- Lip-sync — mouth animates to speech
5. Vision & Recognition
- Face recognition — identify people via camera
- Face gallery — enroll/manage known faces
- Zone gallery / zones — visual zone recognition
- Camera feed — attached to the live voice subprocess (vision-in-the-loop)
6. Navigation (web_nav3 integration)
- Live Map tab — full embedded web_nav3 dashboard (set-pose, goals, bringup)
- Navigation tab — native canvas viewer (saved/live map, places, missions)
- map_relay — re-publishes the latched /map @ 1 Hz so the map renders even when stationary
- Saved maps — load & view a pre-built .db (localize mode)
- Places — save named poses, one-click "Go"
- Missions — multi-waypoint routes (defined in web_nav3)
- Cancel goal — stop an active goal without tearing down bringup
- SLAM — RTABMap LiDAR-ICP, drift-corrected mapping/localization
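The idea behind map_relay (cache the last latched /map message and re-emit it on a timer) can be sketched without any ROS dependency. The class and method names below are hypothetical. In a real ROS 2 node, `on_message` would presumably be a subscription callback on a transient-local topic and `on_tick` a 1 Hz timer callback, but that wiring is an assumption and is not shown.

```python
from typing import Any, Callable, Optional


class LatchedRelay:
    """Cache the most recent message and re-emit it on every timer tick."""

    def __init__(self, publish: Callable[[Any], None]) -> None:
        self._publish = publish          # e.g. a publisher's publish method
        self._latest: Optional[Any] = None

    def on_message(self, msg: Any) -> None:
        # Called whenever a new map arrives (rare while the robot is stationary).
        self._latest = msg

    def on_tick(self) -> bool:
        # Called periodically (1 Hz in the bullet above).
        # Returns False until the first map has been received.
        if self._latest is None:
            return False
        self._publish(self._latest)
        return True
```

A late-joining viewer then receives a map within one tick even if the SLAM node has not published anything new.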
7. Audio
- Device manager — sink/source selection, live refresh
- Audio profiles — builtin / anker / hollyland_builtin (auto-switch on plug/unplug)
8. Operations, System & Diagnostics
- System control — start/stop subsystems, status
- Temperature monitor — motor temps (live websocket stream)
- Controller — gamepad/teleop input
- Web terminal — shell in the browser (websocket)
- Logs — live log stream
- Recordings & replay — record/playback sessions
- Scripts — run saved scripts
Dashboard infrastructure
- 12 tabs, fault-isolated routers (one broken module never breaks the dashboard)
- WebSocket streams: log_stream, motor_temps, terminal
- No-store HTML (no stale-cache 404s after deploy)
- Lazy subsystem imports (missing dep → that subsystem unavailable, rest runs)
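The lazy-import behavior in the last bullet can be illustrated with a small loader. This is a sketch under assumptions: the function name and the name-to-module mapping are hypothetical, and the real registry may catch a broader set of errors than `ImportError`.

```python
import importlib
from typing import Any, Dict, Optional


def load_subsystems(module_names: Dict[str, str]) -> Dict[str, Optional[Any]]:
    """Import each subsystem module independently; a failure disables only that one."""
    loaded: Dict[str, Optional[Any]] = {}
    for name, module_path in module_names.items():
        try:
            loaded[name] = importlib.import_module(module_path)
        except ImportError as exc:  # also covers ModuleNotFoundError
            print(f"[{name}] unavailable: {exc}")
            loaded[name] = None     # marked unavailable; the rest keeps loading
    return loaded
```

For example, `load_subsystems({"codec": "json", "vision": "no_such_module"})` returns a module object for `codec` and `None` for `vision`, so the caller can report the missing subsystem instead of crashing at startup.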
Part B — Roadmap (to add)
Tiers = priority. 🏗️ = load-bearing · ⚠️ = Foxy constraint.
Tier 1 — Autonomous behaviors (the product)
- Voice-driven navigation — "Sanad, go to the lobby" → nav goal
- Greeter mission — recognized face → navigate → greet → express
- Named-person greeting — identity → personalized line
- Patrol / guided tours — ordered places, speech at each stop
- Return-to-base / dock-on-idle — auto-home on idle/low battery
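Since voice-driven navigation is still on the roadmap, the following is only one possible shape for it: match a "go to ..." phrase, look the place up among the saved places, and hand the pose to the navigation side. The regex, the `Pose` tuple, and the function name are assumptions for illustration. Only English is handled here, so Arabic phrasing would need its own patterns.

```python
import re
from typing import Dict, Optional, Tuple

Pose = Tuple[float, float, float]  # x, y, yaw (illustrative layout)

# Matches "... go to [the] <place>" with optional trailing punctuation.
GO_TO = re.compile(
    r"\bgo to (?:the )?(?P<place>[a-z0-9 _-]+?)[.!?]*$", re.IGNORECASE
)


def resolve_nav_goal(
    utterance: str, places: Dict[str, Pose]
) -> Optional[Tuple[str, Pose]]:
    """Return (place_name, pose) for a recognized command, else None."""
    match = GO_TO.search(utterance.strip())
    if match is None:
        return None
    name = match.group("place").strip().lower()
    pose = places.get(name)
    return (name, pose) if pose is not None else None
```

Returning `None` for an unknown place lets the voice side answer with a clarification instead of sending a goal the navigation stack cannot resolve.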
Tier 2 — Navigation & map (harden + edit)
- 🏗️ Map republish relay — ✅ DONE (map_relay)
- Click-to-goal on Nav tab canvas
- Live nav telemetry — distance/ETA/waypoint, "arrived" toast
- Battery + nav-state status bar
- Geofence zones on the map
- Cancel-goal button — ✅ DONE
Map editing & annotation (all build on #6)
- Erase tool — paint cells free; wipe ghost obstacles + the SLAM "spokes"
- Obstacle paint ("black points" / virtual walls) — ⚠️ Foxy-safe KeepoutFilter substitute
- Shape tools + brush size — line/rectangle/polygon
- Non-destructive overlay + undo/redo
- Persist & auto-reload edits per map
- Crop / trim map bounds
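The erase, obstacle-paint, and non-destructive overlay items fit together naturally: edits live in a separate layer and are composited onto the SLAM map on demand. The sketch below assumes the usual occupancy-grid cell values (0 free, 100 occupied, -1 unknown). The class and its methods are hypothetical, not an existing SanadV3 or web_nav3 API.

```python
from typing import Dict, List, Optional, Tuple

FREE, OCCUPIED = 0, 100  # occupancy-grid conventions; -1 means unknown

Cell = Tuple[int, int]   # (row, col)


class MapOverlay:
    """Edits stored apart from the base map, with single-step undo."""

    def __init__(self) -> None:
        self.edits: Dict[Cell, int] = {}
        self._undo: List[Tuple[Cell, Optional[int]]] = []

    def paint(self, cell: Cell, value: int) -> None:
        # FREE acts as the erase tool, OCCUPIED as a virtual wall.
        self._undo.append((cell, self.edits.get(cell)))
        self.edits[cell] = value

    def undo(self) -> bool:
        if not self._undo:
            return False
        cell, previous = self._undo.pop()
        if previous is None:
            self.edits.pop(cell, None)   # cell had no edit before
        else:
            self.edits[cell] = previous  # restore the earlier edit
        return True

    def apply(self, grid: List[List[int]]) -> List[List[int]]:
        # Composite onto a copy so the base map is never modified.
        out = [row[:] for row in grid]
        for (r, c), value in self.edits.items():
            if 0 <= r < len(out) and 0 <= c < len(out[r]):
                out[r][c] = value
        return out
```

Because the base grid is never touched, persisting edits per map reduces to saving the `edits` dictionary alongside the map identifier.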
Tier 3 — Voice & interaction
- Barge-in from dashboard
- Quick-phrase soundboard
- Conversation memory / visitor log
- Per-speaker AR/EN auto-detect
- Scheduled announcements
- Bake edited map → PGM/YAML (static map_server deploy)
Tier 4 — Face & presence
- Gaze / head-track recognized face
- Emotion-from-context (sentiment → expression)
- Idle breathing / look-around
- Lip-sync to TTS amplitude (enhance existing markers)
Tier 5 — Operator, fleet & reliability
- 🏗️ Global E-STOP button — ✅ exists; surface consistently
- Health watchdog — auto-restart dead subsystem + alert
- Per-subsystem enable/disable toggles
- Behavior recorder → replay (nav+voice timelines)
- Mission editor UI (visual sequence builder)
- Remote access / tunnel
- Reverse-proxy web_nav3 through :8001 — one origin, no iframe cross-port issues
Tier 6 — Future / blocked
- Speed / caution zones — needs Galactic SpeedFilter or custom layer
- Multi-robot fleet (SanadV3 ↔ BotBrain) — needs LocoClient arbitration + coordinator
Recommended next build order
#1 voice→nav → #2 greeter mission (the product), then #12/#13 map editing (clean the spokes + virtual walls). #6 republish relay and #11 cancel are already done.