Sanad_lite

Multi-user, browser-audio fork of Sanad. The full Sanad robot stack (arm, macros, camera, live conversation subprocess) was stripped out; what remains is a small FastAPI dashboard for typed-replay TTS and saved-record management where all audio plays in each user's own browser, not on the host machine.

┌────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://<host>:8000                         │
│  ├─ /login                Cookie-session auth                      │
│  ├─ Voice & Audio         Gemini API key, Typed Replay (TTS)       │
│  ├─ Recordings            Saved WAVs — Play / Raw / Download / Del │
│  │                        plus "Delete All"                        │
│  └─ Settings & Logs       Scripts, system prompt, live log tail    │
└────────────────────────────────────────────────────────────────────┘

Run on your laptop

pip install --user \
  fastapi 'uvicorn[standard]' itsdangerous python-multipart pydantic \
  websockets

cd /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite
SANAD_DASHBOARD_HOST=127.0.0.1 python3 main.py

Open http://127.0.0.1:8000 and sign in with:

Username: lkasjda213h
Password: kj812bf@jdon

Setting SANAD_DASHBOARD_HOST=127.0.0.1 keeps the server bound to localhost; omit it to auto-bind to wlan0's IP so colleagues on the LAN can reach it at http://<your-ip>:8000.

The websockets package is needed because the Gemini Live TTS used by Typed Replay opens a WebSocket to Google. Everything else (records list, records delete-all, login, logs) works without it.

Gemini API key — required, none ships with the repo. The api_key in config/core_config.json (gemini_defaults) is intentionally empty (""). Typed Replay / Gemini TTS won't work until you supply one:

  • paste it in the dashboard → Voice & Audio → Gemini API Key (hot-swap, no restart), or
  • export SANAD_GEMINI_API_KEY=AIza... before python3 main.py, or
  • set gemini_defaults.api_key in config/core_config.json.

Get a key at https://aistudio.google.com/apikey.
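
The three sources above can be pictured as a simple fallback chain. The following is a minimal sketch, not the project's actual loader: the function name `resolve_gemini_api_key` is hypothetical, and the precedence order (dashboard value, then environment variable, then config file) is an assumption, since this README does not state which source wins.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_gemini_api_key(
    dashboard_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    config_path: Path = Path("config/core_config.json"),
) -> str:
    """Return the first non-empty key found, or "" if none is configured.

    Assumed precedence (hypothetical): dashboard > env var > config file.
    """
    # 1. A key pasted into the dashboard (hot-swapped at runtime).
    if dashboard_value and dashboard_value.strip():
        return dashboard_value.strip()

    # 2. The SANAD_GEMINI_API_KEY environment variable.
    env_value = env.get("SANAD_GEMINI_API_KEY", "").strip()
    if env_value:
        return env_value

    # 3. gemini_defaults.api_key in the JSON config (ships as "").
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return ""
    return str(config.get("gemini_defaults", {}).get("api_key", "")).strip()
```

An empty return value corresponds to the "no key configured" state in which Typed Replay cannot generate audio.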

The other heavy dependencies (pyaudio, transformers, torch) are listed in requirements.txt but are not required for the lite dashboard. They are leftovers from the parent Sanad project and may still be imported lazily when voice/audio_manager.py or voice/local_tts.py is constructed; main.py catches those failures silently.

Run on the server

Replace the SSH/IP/path placeholders with your server's values:

# 1. Install deps once on the server
ssh <user>@<server-ip> 'pip install itsdangerous fastapi "uvicorn[standard]" python-multipart pydantic websockets'

# 2. Push the lite tree
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=data \
  /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite/ \
  <user>@<server-ip>:~/Sanad_lite/

# 3. Start it on the server (SSH in first, then run)
ssh <user>@<server-ip>
cd ~/Sanad_lite
python3 main.py

Then open http://<server-ip>:8000 and sign in with lkasjda213h / kj812bf@jdon.

To leave it running after you log out, use tmux, screen, nohup, or the systemd unit at shell_scripts/sanad.service (edit the paths inside to match your install).

Login

Credentials are in config/core_config.json:

"auth": {
  "username": "lkasjda213h",
  "password": "kj812bf@jdon"
}

Change them before any non-LAN deployment. The session cookie is signed with a fresh secret each time main.py starts, so a restart logs every user out.
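
To see why a fresh secret invalidates existing sessions, consider the stdlib-only sketch below. It is a stand-in for illustration: the real signing is handled by Starlette's SessionMiddleware (via itsdangerous), and the `sign` / `verify` helpers and the cookie value here are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(value: str, secret: bytes) -> str:
    """Append an HMAC-SHA256 signature to a cookie value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{digest}"


def verify(cookie: str, secret: bytes) -> bool:
    """Check that the cookie was signed with this exact secret."""
    value, _, digest = cookie.rpartition(".")
    expected = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(digest, expected)


# First process start: a random secret is generated in memory.
secret_before_restart = secrets.token_bytes(32)
cookie = sign("user=alice", secret_before_restart)

# After a restart the process holds a different secret,
# so the browser's old cookie no longer verifies.
secret_after_restart = secrets.token_bytes(32)
```

Here `verify(cookie, secret_before_restart)` succeeds while `verify(cookie, secret_after_restart)` fails, which is the behaviour users experience as being logged out.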

For a stronger setup, replace the plaintext check with a bcrypt hash in dashboard/routes/auth.py.
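
The general shape of a hashed check is sketched below. Note the substitution: this example uses PBKDF2 from Python's standard library rather than bcrypt, so that it runs without extra packages; with bcrypt the structure (hash once, store the hash, compare on login) would be similar. The function names, storage format, and iteration count are illustrative assumptions, not the contents of dashboard/routes/auth.py.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex' for storage instead of the plaintext."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${derived.hex()}"


def check_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, _, hash_hex = stored.partition("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(derived.hex(), hash_hex)
```

With this pattern the config file would hold only the output of `hash_password(...)`, and the login route would call `check_password(submitted, stored)`.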

Audio architecture — who plays what, where

| Action | Where audio plays |
| --- | --- |
| Recordings → Play | each viewing user's browser |
| Recordings → Raw | each viewing user's browser |
| Recordings → Download | saves WAV to viewing user's device |
| Recordings → Delete All | wipes data/audio/*.wav on the server |
| Voice & Audio → Typed Replay → Generate & Play | each viewing user's browser |
| Voice & Audio → Typed Replay → Replay Last | each viewing user's browser |

Server-side ALSA / PulseAudio is not touched for any of the above. Both audio paths use the same pattern:

  1. Server generates / loads the WAV bytes.
  2. Server returns them as audio/wav from an HTTP endpoint (/api/records/audio/{name} or /api/typed-replay/audio/last).
  3. Browser fetches the response into new Audio(url) and calls .play().

So if you host the dashboard on machine A and a colleague on machine B opens http://A:8000 and clicks Play, the sound comes out of B's speakers. Machine A stays silent.
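
The server side of that pattern amounts to producing WAV bytes and handing them back with an audio/wav content type. The sketch below illustrates this with the standard library only; the sine tone stands in for real TTS output, and `make_wav_bytes` and `audio_response` are hypothetical names rather than functions from this repo.

```python
import io
import math
import struct
import wave


def make_wav_bytes(duration_s: float = 0.1, rate: int = 16000, freq: float = 440.0) -> bytes:
    """Build a mono 16-bit PCM WAV entirely in memory (stand-in for TTS output)."""
    n_frames = int(duration_s * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(frames)
    return buf.getvalue()


def audio_response(wav_bytes: bytes) -> tuple[int, dict[str, str], bytes]:
    """Shape of the HTTP reply: status, headers, body. No audio device is opened."""
    headers = {"Content-Type": "audio/wav", "Content-Length": str(len(wav_bytes))}
    return 200, headers, wav_bytes
```

Nothing in this path touches a sound card on the host; playback only happens once the browser receives the bytes.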

Directory layout

| Path | Contents |
| --- | --- |
| main.py | Entry point: boots subsystems + dashboard. |
| config.py | Runtime constants derived from config/*_config.json. |
| config/ | Per-subsystem JSON: core, voice, gemini, dashboard. |
| core/ | Brain (callback whitelist + status), skill registry, event bus, config loader, logger. |
| gemini/ | client.py: Gemini Live WebSocket client used by typed_replay for one-shot TTS calls. |
| voice/ | typed_replay.py (server generates, browser plays), audio_manager.py (host PyAudio; only used to share a PyAudio instance with typed_replay; degrades gracefully if PyAudio is missing), local_tts.py (offline SpeechT5; unused in the lite UI but kept for the /api/voice/generate legacy route), audio_devices.py, text_utils.py. |
| dashboard/ | app.py (FastAPI + SessionMiddleware + auth gate), routes/*.py, static/index.html, static/login.html. |
| dashboard/routes/ | auth.py, health.py, system.py, voice.py, logs.py, audio_control.py, scripts.py, records.py, prompt.py, typed_replay.py, plus websockets/log_stream.py. |
| scripts/ | sanad_script.txt (persona), sanad_rule.txt (rules). |
| data/audio/ | Generated WAVs from Typed Replay → Save Last. Wiped by "Delete All". |
| data/motions/ | Persisted dashboard settings (Gemini API key, G1 volume); back-compat path. |
| logs/ | Per-module rotating logs. |
| tests/ | test_smoke.py: Brain whitelist, skill registry, wake-phrase matching, atomic IO, audio devices, isolation. |

Runtime env vars

| Var | Values | Default | Effect |
| --- | --- | --- | --- |
| SANAD_DASHBOARD_HOST | IP or hostname | wlan0's IP | Override the bind address. Use 127.0.0.1 for localhost-only. |
| SANAD_DASHBOARD_INTERFACE | iface name | wlan0 | Pick which interface's IP to auto-bind to. |
| SANAD_GEMINI_API_KEY | string | "" (empty) | Gemini API key. No key ships in the repo: set this, paste one in the dashboard (Voice & Audio → Gemini API Key), or fill gemini_defaults.api_key in config/core_config.json. |
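
How the two dashboard variables interact can be sketched as follows. This is an illustration under stated assumptions, not the code in config.py: `resolve_bind_host` and the injected `lookup_interface_ip` callable are hypothetical, and the fallback to 127.0.0.1 when the interface has no address is an assumption that this README does not confirm.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_bind_host(
    env: Mapping[str, str] = os.environ,
    lookup_interface_ip: Callable[[str], Optional[str]] = lambda iface: None,
) -> str:
    """Pick the address the dashboard should bind to."""
    # An explicit host always wins.
    explicit = env.get("SANAD_DASHBOARD_HOST", "").strip()
    if explicit:
        return explicit

    # Otherwise auto-bind to the chosen interface's IP (wlan0 by default).
    iface = env.get("SANAD_DASHBOARD_INTERFACE", "").strip() or "wlan0"
    ip = lookup_interface_ip(iface)

    # Assumed fallback: stay on localhost if the interface has no address.
    return ip or "127.0.0.1"
```

Passing the interface lookup in as a callable keeps the sketch platform-independent; a real implementation would query the operating system for the interface's address.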

What was stripped vs Sanad (full)

Removed because the lite dashboard never needed them:

  • Motion / arm: motion/, scripts/sanad_arm.txt, config/motion_config.json, dashboard/routes/{motion,macros,replay,skills}.py.
  • Live voice conversation: voice/sanad_voice.py, voice/audio_io.py, voice/live_voice_loop.py, voice/wake_phrase_manager.py, voice/model_script.py, voice/model_subprocess.py, gemini/subprocess.py, gemini/script.py, dashboard/routes/{live_voice,live_subprocess,wake_phrases}.py.
  • Offline brain: local/ (LLM, STT, TTS, VAD), config/local_config.json.
  • Camera / vision: dashboard/routes/vision.py and all /api/vision/* endpoints, the camera tab UI.
  • Examples / demos: examples/.
  • Tabs: Operations, Motion & Replay, Camera & Vision (deprecated), Live Voice Commands card, Wake Phrase Manager card, Live Gemini Process card.

Added by lite:

  • Login page + session cookie auth (dashboard/routes/auth.py, dashboard/static/login.html, SessionMiddleware).
  • Browser-side audio streaming: GET /api/records/audio/{name}?kind={speaker,raw} and GET /api/typed-replay/audio/last.
  • Download button on each saved record.
  • Delete All button that wipes every WAV under data/audio/.
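
As a rough picture of what "Delete All" does on disk, the sketch below removes every WAV directly under a directory. It is an assumption-laden illustration: `delete_all_wavs` is a hypothetical name, and the index reset that the real endpoint also performs is not shown because its format is not described here.

```python
from pathlib import Path


def delete_all_wavs(audio_dir: Path = Path("data/audio")) -> int:
    """Remove every *.wav directly under audio_dir and return how many were deleted."""
    deleted = 0
    for wav_path in audio_dir.glob("*.wav"):
        if wav_path.is_file():
            wav_path.unlink()
            deleted += 1
    return deleted
```

Because the glob is limited to `*.wav`, other files in the same directory are left in place.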

Troubleshooting

| Symptom | Fix |
| --- | --- |
| ModuleNotFoundError: itsdangerous at startup | pip install itsdangerous (required by Starlette's SessionMiddleware). |
| ModuleNotFoundError: websockets when generating typed-replay audio | pip install websockets (gemini/client.py uses it). |
| Redirected to /login on every API call | The session cookie is invalidated on server restart by design; sign in again. |
| Failed to construct audio_mgr — pyaudio not installed warning at startup | Harmless on a laptop. voice/audio_manager.py requires PyAudio + portaudio headers; not needed for any user-facing button. Install with sudo apt install portaudio19-dev && pip install pyaudio if you want it gone. |
| ALSA / PortAudio noise at startup (pcm_dmix.c, Cannot connect to JACK) | Pre-init probe of PortAudio inside pyaudio.PyAudio(). Cosmetic: the lite dashboard never actually opens an ALSA stream. To silence it, drop PyAudio entirely (uninstall + add a _safe_import guard for voice.audio_manager). |
| Gemini TTS attempt N returned no audio — parts: … then 503 | Gemini Live is non-deterministic on short Arabic snippets and sometimes returns reasoning text instead of audio. The retry chain in voice/typed_replay.py:generate_audio tries 3 prompt variants. Lengthen the text or add diacritics if it persists. |
| cannot import name 'X' from 'Project.Sanad.main' | A route is trying to import a global that lite removed. Add a try/except ImportError in that route or drop the route from dashboard/app.py:_REST_ROUTES. |
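
The retry chain mentioned for the "returned no audio" symptom can be pictured roughly as below. This is a sketch of the general technique only: `generate_with_retries`, the `synthesize` callable, and the three prompt templates are hypothetical and are not taken from voice/typed_replay.py.

```python
from typing import Callable, Optional, Sequence


def generate_with_retries(
    text: str,
    synthesize: Callable[[str], Optional[bytes]],
    prompt_templates: Sequence[str] = (
        "{text}",
        "Read this aloud: {text}",
        "Say exactly, as speech only: {text}",
    ),
) -> bytes:
    """Try each prompt variant in turn until one returns audio bytes."""
    for attempt, template in enumerate(prompt_templates, start=1):
        audio = synthesize(template.format(text=text))
        if audio:
            return audio
        print(f"Gemini TTS attempt {attempt} returned no audio")
    # All variants failed; the dashboard surfaces this case as a 503.
    raise RuntimeError("no audio after all prompt variants")
```

Rephrasing the prompt on each attempt, rather than resending the same one, gives the model a different chance to answer with speech instead of text.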

Endpoints

GET    /                                    → dashboard (auth-gated)
GET    /login                               → login page
POST   /api/auth/login                      → {username,password} → set cookie
POST   /api/auth/logout                     → clear cookie
GET    /api/auth/me                         → {authenticated, user}

GET    /api/health                          → {status, brain}
GET    /api/status                          → {brain, voice}
GET    /api/system/info                     → host / interfaces / subsystems

GET    /api/voice/status                    → Gemini connection state
POST   /api/voice/connect                   → connect Gemini Live socket
POST   /api/voice/disconnect                → disconnect
GET    /api/voice/api-key                   → masked current key
POST   /api/voice/api-key                   → {key} → persist new key

POST   /api/typed-replay/say                → {text,record,record_name} → generates, caches
GET    /api/typed-replay/audio/last         → streams cached WAV (browser plays it)
POST   /api/typed-replay/replay-last        → bumps replay counter (audio still client-side)
POST   /api/typed-replay/save-last          → persists cached generation to records
GET    /api/typed-replay/status             → engine + session state
GET    /api/typed-replay/records            → list
DELETE /api/typed-replay/records/{name}     → delete one
POST   /api/typed-replay/records/{name}/rename

GET    /api/records/                        → list saved records
GET    /api/records/audio/{name}?kind=...   → stream a record's WAV
POST   /api/records/delete                  → {record_name} → delete one
POST   /api/records/delete-all              → wipe data/audio/*.wav + reset index

GET    /api/scripts/                        → list persona/rule files
POST   /api/scripts/load                    → {name} → file contents
POST   /api/scripts/save                    → {name,content}
POST   /api/scripts/create                  → {name,content}
POST   /api/scripts/delete                  → {name}

GET    /api/prompt/                         → resolved system prompt
POST   /api/prompt/update                   → {content}
POST   /api/prompt/reload                   → re-read from disk

GET    /api/logs/{module}/tail              → last N log lines
POST   /api/logs/snapshot                   → save snapshot bundle
GET    /api/logs/bundle                     → download all logs as a zip
GET    /api/audio/status                    → mic/spk mute state (server-side, informational)
WS     /ws/logs                             → live log stream
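
For scripted access, a client needs to log in once and then reuse the session cookie on later calls. The sketch below shows one way to do that with the standard library. Several details are assumptions: the login body is sent as JSON (the README only lists the field names), the records endpoint is assumed to return JSON, and `build_login_request`, `make_session`, and `fetch_records` are hypothetical helper names.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "http://127.0.0.1:8000"  # adjust to your host


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """POST /api/auth/login with a JSON body (body encoding is an assumption)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def make_session() -> urllib.request.OpenerDirector:
    """Opener that stores the session cookie set by the login response."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))


def fetch_records(session: urllib.request.OpenerDirector, username: str, password: str):
    """Log in, then list saved records using the same cookie jar."""
    session.open(build_login_request(username, password)).close()
    with session.open(f"{BASE_URL}/api/records/") as resp:
        return json.loads(resp.read())  # assumes the endpoint returns JSON
```

With the server running, `fetch_records(make_session(), "<username>", "<password>")` would perform both calls; without the login step, auth-gated routes redirect to /login.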

License / attribution

Internal project for YS Lootah Technology. Trimmed from Sanad — original Sanad reuses patterns from SanadVoice/gemini_interact and Unitree unitree_sdk2py.