SanadV3 — Feature Catalog

Sanad is a bilingual (Arabic/English) humanoid receptionist/assistant on a Unitree G1 (Jetson Orin NX, ROS 2 Foxy, Livox MID-360). This catalogs what's built today (Part A) and what's on the roadmap (Part B).


Part A — Current features (built & running)

Verified from the live subsystem registry (19 subsystems), dashboard tabs (12), and API routers (22).

1. Voice & Conversation

  • Gemini live voice — real-time bilingual AR/EN spoken conversation (native-audio model)
  • Offline brain — local pipeline via ollama (SANAD_VOICE_BRAIN=local), no cloud
  • Wake phrases — configurable wake-word manager
  • Typed replay — type text, robot speaks it (with speaker-monitor capture)
  • Local TTS — on-device text-to-speech engine
  • Prompt management — edit the system prompt from the dashboard
  • Lip-sync — mask mouth driven by TTS MOUTH markers
  • Barge-in — interrupt speech (volume-scaled threshold)
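
One way to read "volume-scaled threshold" is that the microphone level needed to interrupt rises with the robot's own playback volume, so its own voice does not trigger barge-in. The sketch below illustrates that idea only; the function names, the linear scaling, and the `gain` default are assumptions, not SanadV3's actual code.

```python
def barge_in_threshold(base_rms: float, playback_volume: float, gain: float = 2.0) -> float:
    """Return the mic RMS level required to interrupt speech.

    Hypothetical model: the threshold grows linearly with playback volume.
    """
    volume = min(max(playback_volume, 0.0), 1.0)  # clamp to [0, 1]
    return base_rms * (1.0 + gain * volume)


def should_barge_in(mic_rms: float, base_rms: float, playback_volume: float) -> bool:
    """True when the mic level exceeds the volume-scaled threshold."""
    return mic_rms > barge_in_threshold(base_rms, playback_volume)
```

With these assumed defaults, a mic level that interrupts a muted robot is ignored once playback is at full volume, because the threshold has tripled.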

2. Motion & Arm

  • 35 arm actions — 16 SDK built-ins + 19 custom JSONL motions
  • Macro record / playback — capture and replay motion sequences
  • Teaching mode — kinesthetic teach-and-repeat
  • Skills — composed higher-level behaviors (skills.json)
  • Movement dispatch — voice → motion (53 fixed + 10 parametric phrases, cooldown-gated)
  • Arm motion-block — auto-inhibits arm moves while locomotion is active (safety interlock)
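
The cooldown gate and the arm motion-block described above can be pictured as two checks in front of every voice-triggered motion. This is a minimal sketch under stated assumptions: `MovementDispatcher`, its fields, and the 3-second default are hypothetical and only illustrate the gating order.

```python
import time
from typing import Callable, Dict, Optional


class MovementDispatcher:
    """Maps spoken phrases to arm actions, with two gates (hypothetical)."""

    def __init__(self, cooldown_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock                 # injectable for testing
        self.locomotion_active = False     # set by the locomotion layer
        self._last_fired: Optional[float] = None
        self._phrases: Dict[str, str] = {}

    def register(self, phrase: str, action: str) -> None:
        self._phrases[phrase.lower()] = action

    def dispatch(self, utterance: str) -> Optional[str]:
        """Return the action to run, or None if unknown or gated."""
        action = self._phrases.get(utterance.strip().lower())
        if action is None:
            return None
        if self.locomotion_active:         # safety interlock: no arm moves while walking
            return None
        now = self.clock()
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return None                    # still inside the cooldown window
        self._last_fired = now
        return action
```

Checking the interlock before the cooldown means a blocked request does not reset the cooldown timer.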

3. Locomotion

  • LocoClient + MotionSwitcher — walk / pose control via Unitree SDK (eth0)
  • E-STOP — dashboard kill button
  • Single Ctrl+C teardown — one signal cleanly stops every subsystem (~2s)
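
A single-signal teardown is commonly built as one SIGINT handler that walks a list of registered stop callbacks. The sketch below shows that pattern; the `Teardown` class and its method names are assumptions for illustration, not the project's real implementation.

```python
import signal
import threading
from typing import Callable, List, Tuple


class Teardown:
    """Stops every registered subsystem once, in reverse start order."""

    def __init__(self) -> None:
        self._stoppers: List[Tuple[str, Callable[[], None]]] = []
        self.stop_event = threading.Event()
        self.stopped: List[str] = []

    def register(self, name: str, stop: Callable[[], None]) -> None:
        self._stoppers.append((name, stop))

    def handle_sigint(self, signum=None, frame=None) -> None:
        if self.stop_event.is_set():
            return                          # ignore repeated Ctrl+C
        self.stop_event.set()
        for name, stop in reversed(self._stoppers):
            try:
                stop()
            except Exception as exc:        # one failure must not block the rest
                print(f"[teardown] {name} failed to stop: {exc}")
            else:
                self.stopped.append(name)

    def install(self) -> None:
        """Route Ctrl+C to the handler (call from the main thread)."""
        signal.signal(signal.SIGINT, self.handle_sigint)
```

Stopping in reverse order lets later subsystems (for example, voice) shut down before the ones they depend on.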

4. LED Face Mask

  • Animated expressions — neutral, smile, blink, look L/R, talk13, surprised, sad
  • Gestural-speaking events — face reacts while speaking
  • Lip-sync — mouth animates to speech

5. Vision & Recognition

  • Face recognition — identify people via camera
  • Face gallery — enroll/manage known faces
  • Zone gallery / zones — visual zone recognition
  • Camera feed — attached to the live voice subprocess (vision-in-the-loop)

6. Navigation (web_nav3 integration)

  • Live Map tab — full embedded web_nav3 dashboard (set-pose, goals, bringup)
  • Navigation tab — native canvas viewer (saved/live map, places, missions)
  • map_relay — re-publishes the latched /map @1Hz so the map renders even when stationary
  • Saved maps — load & view a pre-built .db (localize mode)
  • Places — save named poses, one-click "Go"
  • Missions — multi-waypoint routes (defined in web_nav3)
  • Cancel goal — stop an active goal without tearing down bringup
  • SLAM — RTABMap LiDAR-ICP, drift-corrected mapping/localization

7. Audio

  • Device manager — sink/source selection, live refresh
  • Audio profiles — builtin / anker / hollyland_builtin (auto-switch on plug/unplug)

8. Operations, System & Diagnostics

  • System control — start/stop subsystems, status
  • Temperature monitor — motor temps (live websocket stream)
  • Controller — gamepad/teleop input
  • Web terminal — shell in the browser (websocket)
  • Logs — live log stream
  • Recordings & replay — record/playback sessions
  • Scripts — run saved scripts

Dashboard infrastructure

  • 12 tabs, fault-isolated routers (one broken module never breaks the dashboard)
  • WebSocket streams: log_stream, motor_temps, terminal
  • No-store HTML (no stale-cache 404s after deploy)
  • Lazy subsystem imports (missing dep → that subsystem unavailable, rest runs)
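
The lazy-import behavior above (a missing dependency disables one subsystem and the rest keep running) can be sketched with `importlib`. The function name and the registry shape are assumptions; only `importlib.import_module` is a real standard-library call.

```python
import importlib
from types import ModuleType
from typing import Dict, Optional


def load_subsystems(module_paths: Dict[str, str]) -> Dict[str, Optional[ModuleType]]:
    """Import each subsystem module; map failures to None instead of raising."""
    loaded: Dict[str, Optional[ModuleType]] = {}
    for name, module_path in module_paths.items():
        try:
            loaded[name] = importlib.import_module(module_path)
        except ImportError as exc:          # also covers ModuleNotFoundError
            print(f"[registry] {name} unavailable: {exc}")
            loaded[name] = None
    return loaded
```

A caller can then skip any subsystem whose entry is `None`, for example when registering dashboard routers.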

Part B — Roadmap (to add)

Tiers = priority. 🏗️ = load-bearing · ⚠️ = Foxy constraint.

Tier 1 — Autonomous behaviors (the product)

  1. Voice-driven navigation — "Sanad, go to the lobby" → nav goal
  2. Greeter mission — recognized face → navigate → greet → express
  3. Named-person greeting — identity → personalized line
  4. Patrol / guided tours — ordered places, speech at each stop
  5. Return-to-base / dock-on-idle — auto-home on idle/low battery
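
For the voice-driven navigation item, the core step is turning an utterance into a saved place. The sketch below is one possible shape, assuming English-only phrasing, lowercase place names, and a hypothetical `(x, y, yaw)` pose tuple; Arabic commands would need their own patterns.

```python
import re
from typing import Dict, Optional, Tuple

Pose = Tuple[float, float, float]  # x, y, yaw (assumed layout)

# Everything after "go to" is searched for a known place name.
GO_TO = re.compile(r"\bgo to\b(?P<rest>.*)", re.IGNORECASE)


def resolve_goal(utterance: str, places: Dict[str, Pose]) -> Optional[Tuple[str, Pose]]:
    """Return (place_name, pose) for a 'go to ...' command, else None."""
    match = GO_TO.search(utterance)
    if match is None:
        return None
    rest = match.group("rest").lower()
    # Prefer the longest name so "lobby desk" wins over "lobby".
    for name in sorted(places, key=len, reverse=True):
        if name in rest:
            return name, places[name]
    return None
```

The returned pose would then be handed to whatever sends the nav goal; that step is outside this sketch.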

Tier 2 — Navigation & map (harden + edit)

  1. 🏗️ Map republish relay DONE (map_relay)
  2. Click-to-goal on Nav tab canvas
  3. Live nav telemetry — distance/ETA/waypoint, "arrived" toast
  4. Battery + nav-state status bar
  5. Geofence zones on the map
  6. Cancel-goal button DONE

Map editing & annotation (all build on the map republish relay, map_relay)

  1. Erase tool — paint cells free; wipe ghost obstacles + the SLAM "spokes"
  2. Obstacle paint ("black points" / virtual walls) ⚠️ Foxy-safe KeepoutFilter substitute
  3. Shape tools + brush size — line/rectangle/polygon
  4. Non-destructive overlay + undo/redo
  5. Persist & auto-reload edits per map
  6. Crop / trim map bounds

Tier 3 — Voice & interaction

  1. Barge-in from dashboard
  2. Quick-phrase soundboard
  3. Conversation memory / visitor log
  4. Per-speaker AR/EN auto-detect
  5. Scheduled announcements
  6. Bake edited map → PGM/YAML (static map_server deploy)

Tier 4 — Face & presence

  1. Gaze / head-track recognized face
  2. Emotion-from-context (sentiment → expression)
  3. Idle breathing / look-around
  4. Lip-sync to TTS amplitude (enhance existing markers)

Tier 5 — Operator, fleet & reliability

  1. 🏗️ Global E-STOP button exists; surface consistently
  2. Health watchdog — auto-restart dead subsystem + alert
  3. Per-subsystem enable/disable toggles
  4. Behavior recorder → replay (nav+voice timelines)
  5. Mission editor UI (visual sequence builder)
  6. Remote access / tunnel
  7. Reverse-proxy web_nav3 through :8001 — one origin, no iframe cross-port issues
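
The health watchdog item could take the form of a periodic poll that restarts any subsystem reporting itself dead and raises an alert. This is a sketch of that roadmap idea only; the `Watchdog` class, its callbacks, and the alert text are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class Watchdog:
    """Polls liveness checks and restarts dead subsystems (hypothetical)."""

    def __init__(self, alert: Callable[[str], None]) -> None:
        self._alert = alert
        self._checks: Dict[str, Tuple[Callable[[], bool], Callable[[], None]]] = {}

    def watch(self, name: str, is_alive: Callable[[], bool],
              restart: Callable[[], None]) -> None:
        self._checks[name] = (is_alive, restart)

    def poll(self) -> List[str]:
        """Run one pass; return the names that were restarted."""
        restarted: List[str] = []
        for name, (is_alive, restart) in self._checks.items():
            if is_alive():
                continue
            self._alert(f"{name} is down, restarting")
            restart()
            restarted.append(name)
        return restarted
```

In practice `poll()` would run on a timer, and a real version would likely cap restart attempts to avoid restart loops.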

Tier 6 — Future / blocked

  1. Speed / caution zones — needs Galactic SpeedFilter or custom layer
  2. Multi-robot fleet (SanadV3 ↔ BotBrain) — needs LocoClient arbitration + coordinator

Suggested order: voice-driven navigation, then the greeter mission (the product), then map editing with the erase tool and obstacle paint (clean the spokes, add virtual walls). The map republish relay and the cancel-goal button are already done.