Sanad_lite
Multi-user, browser-audio fork of Sanad. The full Sanad robot stack (arm, macros, camera, live conversation subprocess) was stripped out; what remains is a small FastAPI dashboard for typed-replay TTS and saved-record management where all audio plays in each user's own browser, not on the host machine.
┌────────────────────────────────────────────────────────────────────┐
│ Dashboard (FastAPI) ── http://<host>:8000 │
│ ├─ /login Cookie-session auth │
│ ├─ Voice & Audio Gemini API key, Typed Replay (TTS) │
│ ├─ Recordings Saved WAVs — Play / Raw / Download / Del │
│ │ plus "Delete All" │
│ └─ Settings & Logs Scripts, system prompt, live log tail │
└────────────────────────────────────────────────────────────────────┘
Run on your laptop
pip install --user \
fastapi 'uvicorn[standard]' itsdangerous python-multipart pydantic \
websockets
cd /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite
SANAD_DASHBOARD_HOST=127.0.0.1 python3 main.py
Open http://127.0.0.1:8000 and sign in with:
- Username: lkasjda213h
- Password: kj812bf@jdon
Setting SANAD_DASHBOARD_HOST=127.0.0.1 keeps the server bound to
localhost; omit it to auto-bind to wlan0's IP so colleagues on the LAN
can reach it at http://<your-ip>:8000.
The websockets package is needed because the Gemini Live TTS used by
Typed Replay opens a WebSocket to Google. Everything else (records list,
records delete-all, login, logs) works without it.
The other heavy deps (pyaudio, transformers, torch) are listed in requirements.txt but are not required for the lite dashboard. They are leftovers from the parent Sanad project and may still be imported lazily by voice/audio_manager.py / voice/local_tts.py on construction; failures are caught silently in main.py.
Run on the server
Replace the SSH/IP/path placeholders with your server's values:
# 1. Install deps once on the server
ssh <user>@<server-ip> 'pip install itsdangerous fastapi "uvicorn[standard]" python-multipart pydantic websockets'
# 2. Push the lite tree
rsync -av --delete \
--exclude=__pycache__ --exclude=logs --exclude=data \
/home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite/ \
<user>@<server-ip>:~/Sanad_lite/
# 3. Start it on the server (SSH in first, then run)
ssh <user>@<server-ip>
cd ~/Sanad_lite
python3 main.py
Then open http://<server-ip>:8000 and sign in with lkasjda213h /
kj812bf@jdon.
To leave it running after you log out, use tmux, screen, nohup, or
the systemd unit at shell_scripts/sanad.service (edit the paths inside
to match your install).
Login
Credentials are in config/core_config.json:
"auth": {
"username": "lkasjda213h",
"password": "kj812bf@jdon"
}
Change them before any non-LAN deployment. The session cookie is signed
with a fresh secret each time main.py starts, so a restart logs every
user out.
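Why a restart logs everyone out can be illustrated with a few lines of standard-library Python. This is a sketch of the principle only: it uses a plain HMAC, not the signer that SessionMiddleware actually uses, and the helper names sign and verify are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(value: str, secret: bytes) -> str:
    """Return 'value.signature' using HMAC-SHA256 (illustrative format)."""
    sig = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{sig}"


def verify(cookie: str, secret: bytes):
    """Return the original value if the signature matches, else None."""
    value, _, sig = cookie.rpartition(".")
    expected = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    ok = hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
    return value if ok else None


# Each process start draws a new random secret, as main.py is described as doing.
secret_before_restart = secrets.token_bytes(32)
secret_after_restart = secrets.token_bytes(32)

cookie = sign("user=alice", secret_before_restart)
same_process = verify(cookie, secret_before_restart)   # signature still valid
after_restart = verify(cookie, secret_after_restart)   # signature rejected
```

A cookie issued before the restart no longer verifies under the new secret, so the auth gate treats the browser as signed out. Persisting the secret would keep sessions alive across restarts, at the cost of having one more value to protect.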
For a stronger setup, replace the plaintext check with a bcrypt hash in
dashboard/routes/auth.py.
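As a rough idea of what a hashed check could look like, here is a sketch that swaps in PBKDF2 from the standard library instead of bcrypt (bcrypt needs a third-party package). The function names, the stored-string format and the iteration count are assumptions, not what dashboard/routes/auth.py contains.

```python
import hashlib
import hmac
import os


def hash_password(password: str, *, iterations: int = 200_000) -> str:
    """Return a self-describing hash string; store this instead of the plaintext."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def check_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    parts = stored.split("$")
    if len(parts) != 4 or parts[0] != "pbkdf2_sha256":
        return False
    _, iterations, salt_hex, digest_hex = parts
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

With something like this, config/core_config.json would hold the output of hash_password rather than the password itself, and the login route would call check_password on the submitted value.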
Audio architecture — who plays what, where
| Action | Where audio plays |
|---|---|
| Recordings → Play | each viewing user's browser |
| Recordings → Raw | each viewing user's browser |
| Recordings → Download | saves WAV to viewing user's device |
| Recordings → Delete All | wipes data/audio/*.wav on the server |
| Voice & Audio → Typed Replay → Generate & Play | each viewing user's browser |
| Voice & Audio → Typed Replay → Replay Last | each viewing user's browser |
Server-side ALSA / PulseAudio is not touched for any of the above. Both audio paths use the same pattern:
- Server generates / loads the WAV bytes.
- Server returns them as audio/wav from an HTTP endpoint (/api/records/audio/{name} or /api/typed-replay/audio/last).
- Browser fetches the response into new Audio(url) and calls .play().
So if you host the dashboard on machine A and a colleague on machine
B opens http://A:8000 and clicks Play, the sound comes out of B's
speakers. Machine A stays silent.
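The server half of that pattern (produce WAV bytes, hand them back as audio/wav) can be sketched with the standard library alone. The function names are hypothetical, and the 24 kHz / mono / 16-bit defaults are assumptions about the PCM format, not values read from voice/typed_replay.py.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample (2 = 16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_http_response(pcm: bytes):
    """Return (headers, body) as an HTTP handler would send them."""
    body = pcm_to_wav_bytes(pcm)
    headers = {"Content-Type": "audio/wav", "Content-Length": str(len(body))}
    return headers, body
```

Nothing here opens a sound device: the host only builds bytes, which is why the server needs no working ALSA or PulseAudio setup. Playback happens wherever the browser that fetched the response is running.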
Directory layout
| Path | Contents |
|---|---|
| main.py | Entry point: boots subsystems + dashboard. |
| config.py | Runtime constants derived from config/*_config.json. |
| config/ | Per-subsystem JSON: core, voice, gemini, dashboard. |
| core/ | Brain (callback whitelist + status), skill registry, event bus, config loader, logger. |
| gemini/ | client.py: Gemini Live WebSocket client used by typed_replay for one-shot TTS calls. |
| voice/ | typed_replay.py (server generates, browser plays), audio_manager.py (host PyAudio; only used to share a PyAudio instance with typed_replay; degrades gracefully if PyAudio is missing), local_tts.py (offline SpeechT5; unused in the lite UI but kept for the /api/voice/generate legacy route), audio_devices.py, text_utils.py. |
| dashboard/ | app.py (FastAPI + SessionMiddleware + auth gate), routes/*.py, static/index.html, static/login.html. |
| dashboard/routes/ | auth.py, health.py, system.py, voice.py, logs.py, audio_control.py, scripts.py, records.py, prompt.py, typed_replay.py, plus websockets/log_stream.py. |
| scripts/ | sanad_script.txt (persona), sanad_rule.txt (rules). |
| data/audio/ | Generated WAVs from Typed Replay → Save Last. Wiped by "Delete All". |
| data/motions/ | Persisted dashboard settings (Gemini API key, G1 volume); back-compat path. |
| logs/ | Per-module rotating logs. |
| tests/ | test_smoke.py: Brain whitelist, skill registry, wake-phrase matching, atomic IO, audio devices, isolation. |
Runtime env vars
| Var | Values | Default | Effect |
|---|---|---|---|
| SANAD_DASHBOARD_HOST | IP or hostname | wlan0's IP | Override the bind address. Use 127.0.0.1 for localhost-only. |
| SANAD_DASHBOARD_INTERFACE | iface name | wlan0 | Pick which interface's IP to auto-bind to. |
| SANAD_GEMINI_API_KEY | string | reads from data/motions/config.json | Override the Gemini API key. |
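The precedence between the two bind-address variables can be read as the sketch below. It is not the code in config.py: resolve_bind_host and the iface_ip callback are hypothetical, and falling back to 127.0.0.1 when the interface has no address is an assumption (the actual behaviour in that case is not documented here).

```python
from typing import Callable, Mapping, Optional


def resolve_bind_host(env: Mapping[str, str],
                      iface_ip: Callable[[str], Optional[str]],
                      fallback: str = "127.0.0.1") -> str:
    """Pick the bind address: explicit host first, then the interface's IP."""
    explicit = env.get("SANAD_DASHBOARD_HOST", "").strip()
    if explicit:
        return explicit
    # No explicit host: look up the IP of the chosen interface (wlan0 by default).
    iface = env.get("SANAD_DASHBOARD_INTERFACE", "").strip() or "wlan0"
    return iface_ip(iface) or fallback  # fallback value is an assumption
```

In real use, env would be os.environ and iface_ip would query the operating system. Passing both in keeps the precedence rule easy to check without touching the network.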
What was stripped vs Sanad (full)
Removed because the lite dashboard never needed them:
- Motion / arm: motion/, scripts/sanad_arm.txt, config/motion_config.json, dashboard/routes/{motion,macros,replay,skills}.py.
- Live voice conversation: voice/sanad_voice.py, voice/audio_io.py, voice/live_voice_loop.py, voice/wake_phrase_manager.py, voice/model_script.py, voice/model_subprocess.py, gemini/subprocess.py, gemini/script.py, dashboard/routes/{live_voice,live_subprocess,wake_phrases}.py.
- Offline brain: local/ (LLM, STT, TTS, VAD), config/local_config.json.
- Camera / vision: dashboard/routes/vision.py and all /api/vision/* endpoints, the camera tab UI.
- Examples / demos: examples/.
- Tabs: Operations, Motion & Replay, Camera & Vision (deprecated), Live Voice Commands card, Wake Phrase Manager card, Live Gemini Process card.
Added by lite:
- Login page + session cookie auth (dashboard/routes/auth.py, dashboard/static/login.html, SessionMiddleware).
- Browser-side audio streaming: GET /api/records/audio/{name}?kind={speaker,raw} and GET /api/typed-replay/audio/last.
- Download button on each saved record.
- Delete All button that wipes every WAV under data/audio/.
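The file-removal part of Delete All amounts to something like the sketch below. delete_all_wavs is a hypothetical helper, not the route's actual code, and it leaves out the index reset that the real endpoint also performs.

```python
from pathlib import Path


def delete_all_wavs(audio_dir: Path) -> int:
    """Remove every *.wav directly under audio_dir; return how many were deleted."""
    removed = 0
    for wav in audio_dir.glob("*.wav"):
        if wav.is_file():
            wav.unlink()
            removed += 1
    return removed
```

The glob is deliberately non-recursive and limited to the .wav suffix, so other files that happen to sit in the same directory are left alone.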
Troubleshooting
| Symptom | Fix |
|---|---|
| ModuleNotFoundError: itsdangerous at startup | pip install itsdangerous (required by Starlette's SessionMiddleware). |
| ModuleNotFoundError: websockets when generating typed-replay audio | pip install websockets (gemini/client.py uses it). |
| Redirected to /login on every API call | Session cookie cleared on server restart by design. Sign in again. |
| Failed to construct audio_mgr — pyaudio not installed warning at startup | Harmless on a laptop. voice/audio_manager.py requires PyAudio + portaudio headers; not needed for any user-facing button. Install with sudo apt install portaudio19-dev && pip install pyaudio if you want it gone. |
| ALSA / PortAudio noise at startup (pcm_dmix.c, Cannot connect to JACK) | Pre-init probe of PortAudio inside pyaudio.PyAudio(). Cosmetic: the lite dashboard never actually opens an ALSA stream. To silence it, drop PyAudio entirely (uninstall + add a _safe_import guard for voice.audio_manager). |
| Gemini TTS attempt N returned no audio — parts: … then 503 | Gemini Live is non-deterministic on short Arabic snippets and sometimes returns reasoning text instead of audio. The retry chain in voice/typed_replay.py:generate_audio tries 3 prompt variants. Lengthen the text or add diacritics if it persists. |
| cannot import name 'X' from 'Project.Sanad.main' | A route is trying to import a global that lite removed. Add a try/except ImportError in that route or drop the route from dashboard/app.py:_REST_ROUTES. |
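The import guards mentioned in the last rows could take roughly this shape. The helper is a sketch: its name, signature and logger name are assumptions, and the project's own _safe_import may behave differently.

```python
import importlib
import logging
from typing import Any, Optional

log = logging.getLogger("sanad_lite.sketch")  # hypothetical logger name


def safe_import(module_name: str, attr: Optional[str] = None) -> Any:
    """Import a module (or one attribute of it); return None instead of raising."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        log.warning("optional import %s skipped: %s", module_name, exc)
        return None
    if attr is None:
        return module
    # A name that lite removed simply comes back as None.
    return getattr(module, attr, None)
```

A route would then test the result before using it, for example audio_manager = safe_import("voice.audio_manager"), and skip the feature when the value is None.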
Endpoints
GET / → dashboard (auth-gated)
GET /login → login page
POST /api/auth/login → {username,password} → set cookie
POST /api/auth/logout → clear cookie
GET /api/auth/me → {authenticated, user}
GET /api/health → {status, brain}
GET /api/status → {brain, voice}
GET /api/system/info → host / interfaces / subsystems
GET /api/voice/status → Gemini connection state
POST /api/voice/connect → connect Gemini Live socket
POST /api/voice/disconnect → disconnect
GET /api/voice/api-key → masked current key
POST /api/voice/api-key → {key} → persist new key
POST /api/typed-replay/say → {text,record,record_name} → generates, caches
GET /api/typed-replay/audio/last → streams cached WAV (browser plays it)
POST /api/typed-replay/replay-last → bumps replay counter (audio still client-side)
POST /api/typed-replay/save-last → persists cached generation to records
GET /api/typed-replay/status → engine + session state
GET /api/typed-replay/records → list
DELETE /api/typed-replay/records/{name} → delete one
POST /api/typed-replay/records/{name}/rename
GET /api/records/ → list saved records
GET /api/records/audio/{name}?kind=... → stream a record's WAV
POST /api/records/delete → {record_name} → delete one
POST /api/records/delete-all → wipe data/audio/*.wav + reset index
GET /api/scripts/ → list persona/rule files
POST /api/scripts/load → {name} → file contents
POST /api/scripts/save → {name,content}
POST /api/scripts/create → {name,content}
POST /api/scripts/delete → {name}
GET /api/prompt/ → resolved system prompt
POST /api/prompt/update → {content}
POST /api/prompt/reload → re-read from disk
GET /api/logs/{module}/tail → last N log lines
POST /api/logs/snapshot → save snapshot bundle
GET /api/logs/bundle → download all logs as a zip
GET /api/audio/status → mic/spk mute state (server-side, informational)
WS /ws/logs → live log stream
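For scripting against these endpoints, a standard-library client might look like the sketch below. It assumes the login route accepts a JSON body and that /api/records/ answers with JSON; neither is confirmed above, and the function names are hypothetical.

```python
import json
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Build an opener that keeps the session cookie between calls."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare POST /api/auth/login (JSON body is an assumption)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_records(opener, base_url: str):
    """GET /api/records/ using the cookie obtained at login."""
    with opener.open(f"{base_url.rstrip('/')}/api/records/") as resp:
        return json.loads(resp.read().decode("utf-8"))


# Typical use against a running dashboard:
#   opener, _ = make_session()
#   opener.open(login_request("http://127.0.0.1:8000", "<user>", "<password>")).close()
#   records = list_records(opener, "http://127.0.0.1:8000")
```

The cookie jar matters because every route except the login page sits behind the auth gate; without it, each call would be redirected to /login.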
License / attribution
Internal project for YS Lootah Technology. Trimmed from Sanad — original
Sanad reuses patterns from SanadVoice/gemini_interact and Unitree
unitree_sdk2py.