Sanad_lite

Multi-user, browser-audio fork of Sanad. The full Sanad robot stack (arm, macros, camera, live conversation subprocess) was stripped out; what remains is a small FastAPI dashboard for typed-replay TTS and saved-record management where all audio plays in each user's own browser, not on the host machine.

┌────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://<host>:8000                         │
│  ├─ /login                Cookie-session auth                      │
│  ├─ Voice & Audio         Gemini API key, Typed Replay (TTS)       │
│  ├─ Recordings            Saved WAVs — Play / Raw / Download / Del │
│  │                        plus "Delete All"                        │
│  └─ Settings & Logs       Scripts, system prompt, live log tail    │
└────────────────────────────────────────────────────────────────────┘

Run on your laptop

pip install --user \
  fastapi 'uvicorn[standard]' itsdangerous python-multipart pydantic \
  websockets

cd /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite
SANAD_DASHBOARD_HOST=127.0.0.1 python3 main.py

Open http://127.0.0.1:8000 and sign in with:

Username: lkasjda213h
Password: kj812bf@jdon

Setting SANAD_DASHBOARD_HOST=127.0.0.1 keeps the server bound to localhost. Omit it to auto-bind to the IP of wlan0 (or of the interface named in SANAD_DASHBOARD_INTERFACE) so colleagues on the LAN can reach it at http://<your-ip>:8000.

The websockets package is needed because the Gemini Live TTS used by Typed Replay opens a WebSocket to Google. Everything else (records list, records delete-all, login, logs) works without it.

Gemini API key — required, none ships with the repo. The api_key in config/core_config.json (gemini_defaults) is intentionally empty (""). Typed Replay / Gemini TTS won't work until you supply one:

paste it in the dashboard → Voice & Audio → Gemini API Key (hot-swap, no restart), or

export SANAD_GEMINI_API_KEY=AIza... before python3 main.py, or

set gemini_defaults.api_key in config/core_config.json.

Get a key at https://aistudio.google.com/apikey.
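As a rough illustration of how the environment variable and the config file could be consulted, here is a minimal sketch. Only the names SANAD_GEMINI_API_KEY and gemini_defaults.api_key come from this README. The function name resolve_gemini_api_key and the precedence (environment first, then config file) are assumptions, and the dashboard hot-swap path is not modelled.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_gemini_api_key(
    config_path: Path,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first non-empty key found, or "" if none is configured.

    Assumed order (not confirmed by the project): environment variable
    first, then gemini_defaults.api_key in the JSON config.
    """
    env = os.environ if env is None else env

    env_key = env.get("SANAD_GEMINI_API_KEY", "").strip()
    if env_key:
        return env_key

    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return ""  # missing or unreadable config: treat as "no key"

    return str(config.get("gemini_defaults", {}).get("api_key") or "").strip()
```

An empty return value corresponds to the shipped state of the repo, in which Typed Replay cannot generate audio.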

The other heavy deps (pyaudio, transformers, torch) are listed in requirements.txt but are not required for the lite dashboard. They are leftovers from the parent Sanad project and may still be imported lazily by voice/audio_manager.py / voice/local_tts.py on construction; import failures are caught silently in main.py.

Run on the server

Replace the SSH/IP/path placeholders with your server's values:

# 1. Install deps once on the server
ssh <user>@<server-ip> 'pip install itsdangerous fastapi "uvicorn[standard]" python-multipart pydantic websockets'

# 2. Push the lite tree
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=data \
  /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite/ \
  <user>@<server-ip>:~/Sanad_lite/

# 3. Start it on the server (SSH in first, then run)
ssh <user>@<server-ip>
cd ~/Sanad_lite
python3 main.py

Then open http://<server-ip>:8000 and sign in with lkasjda213h / kj812bf@jdon.

To leave it running after you log out, use tmux, screen, nohup, or the systemd unit at shell_scripts/sanad.service (edit the paths inside to match your install).

Credentials are in config/core_config.json:

"auth": {
  "username": "lkasjda213h",
  "password": "kj812bf@jdon"
}

Change them before any non-LAN deployment. The session cookie is signed with a fresh secret each time main.py starts, so a restart logs every user out.
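To see why a restart logs everyone out, consider a signed cookie whose signature depends on a per-process secret. The sketch below uses the standard library's hmac module as a stand-in for the itsdangerous signing that Starlette's SessionMiddleware relies on. The function names and the cookie layout are illustrative assumptions.

```python
import hashlib
import hmac
import secrets


def sign(value: str, secret: bytes) -> str:
    """Append an HMAC-SHA256 signature to a cookie value."""
    mac = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify(cookie: str, secret: bytes) -> bool:
    """Check that the cookie was signed with this secret."""
    value, _, mac = cookie.rpartition(".")
    expected = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)


secret_before_restart = secrets.token_bytes(32)  # generated at startup
cookie = sign("user=alice", secret_before_restart)

secret_after_restart = secrets.token_bytes(32)   # a new process, a new secret
still_valid = verify(cookie, secret_before_restart)
valid_after_restart = verify(cookie, secret_after_restart)
```

Here still_valid is True and valid_after_restart is False: the browser still holds the old cookie, but the new process can no longer verify it.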

For a stronger setup, replace the plaintext check with a bcrypt hash in dashboard/routes/auth.py.
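The idea of storing a hash instead of the plaintext password can be sketched with the standard library alone. Note the substitution: this uses PBKDF2-HMAC-SHA256 from hashlib rather than bcrypt, because bcrypt needs a third-party package. The function names, the storage format, and the iteration count are assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex' for storage in place of the plaintext."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def check_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, _, digest_hex = stored.partition("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```

With this approach the config would hold the output of hash_password(...) and the login route would call check_password(submitted, stored) instead of comparing strings.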

Audio architecture — who plays what, where

Action	Where audio plays
Recordings → Play	each viewing user's browser
Recordings → Raw	each viewing user's browser
Recordings → Download	saves WAV to viewing user's device
Recordings → Delete All	wipes `data/audio/*.wav` on the server
Voice & Audio → Typed Replay → Generate & Play	each viewing user's browser
Voice & Audio → Typed Replay → Replay Last	each viewing user's browser

Server-side ALSA / PulseAudio is not touched for any of the above. Both audio paths use the same pattern:

Server generates / loads the WAV bytes.
Server returns them as audio/wav from an HTTP endpoint (/api/records/audio/{name} or /api/typed-replay/audio/last).
Browser fetches the response into new Audio(url) and calls .play().

So if you host the dashboard on machine A and a colleague on machine B opens http://A:8000 and clicks Play, the sound comes out of B's speakers. Machine A stays silent.
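The server side of that pattern comes down to producing a complete WAV file in memory and returning it with an audio/wav content type. A minimal sketch using only the standard library follows. The sine tone stands in for real TTS output, and the function name, sample rate, and header dictionary are assumptions. The FastAPI response wrapper is deliberately left out.

```python
import io
import math
import struct
import wave


def make_wav_bytes(seconds: float = 0.25, rate: int = 24_000, freq: float = 440.0) -> bytes:
    """Render a mono 16-bit sine tone and return it as a complete WAV file in memory."""
    n_frames = int(seconds * rate)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    pcm = b"".join(struct.pack("<h", s) for s in samples)

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


body = make_wav_bytes()
headers = {"Content-Type": "audio/wav", "Content-Length": str(len(body))}
```

Nothing in this path opens an audio device on the host. The bytes only become sound once a browser loads them into an Audio element.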

Directory layout

Path	Contents
`main.py`	Entry point — boots subsystems + dashboard.
`config.py`	Runtime constants derived from `config/*_config.json`.
`config/`	Per-subsystem JSON: `core`, `voice`, `gemini`, `dashboard`.
`core/`	Brain (callback whitelist + status), skill registry, event bus, config loader, logger.
`gemini/`	`client.py` — Gemini Live WebSocket client used by typed_replay for one-shot TTS calls.
`voice/`	`typed_replay.py` (server generates, browser plays), `audio_manager.py` (host PyAudio — only used to share a PyAudio instance with typed_replay; degrades gracefully if PyAudio is missing), `local_tts.py` (offline SpeechT5 — unused in the lite UI but kept for the `/api/voice/generate` legacy route), `audio_devices.py`, `text_utils.py`.
`dashboard/`	`app.py` (FastAPI + SessionMiddleware + auth gate), `routes/*.py`, `static/index.html`, `static/login.html`.
`dashboard/routes/`	`auth.py`, `health.py`, `system.py`, `voice.py`, `logs.py`, `audio_control.py`, `scripts.py`, `records.py`, `prompt.py`, `typed_replay.py`, plus `websockets/log_stream.py`.
`scripts/`	`sanad_script.txt` (persona), `sanad_rule.txt` (rules).
`data/audio/`	Generated WAVs from Typed Replay → Save Last. Wiped by "Delete All".
`data/motions/`	Persisted dashboard settings (Gemini API key, G1 volume) — back-compat path.
`logs/`	Per-module rotating logs.
`tests/`	`test_smoke.py` — Brain whitelist, skill registry, wake-phrase matching, atomic IO, audio devices, isolation.

Runtime env vars

Var	Values	Default	Effect
`SANAD_DASHBOARD_HOST`	IP or hostname	wlan0's IP	Override the bind address. Use `127.0.0.1` for localhost-only.
`SANAD_DASHBOARD_INTERFACE`	iface name	`wlan0`	Pick which interface's IP to auto-bind to.
`SANAD_GEMINI_API_KEY`	string	`""` (empty)	Gemini API key. No key ships in the repo — set this, paste one in the dashboard (Voice & Audio → Gemini API Key), or fill `gemini_defaults.api_key` in `config/core_config.json`.
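The interaction between the two bind-address variables can be sketched as below. The variable names and the wlan0 default come from the table. The function name, the lookup_ip helper, and the fallback to 127.0.0.1 when the interface has no address are assumptions.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_bind_host(
    lookup_ip: Callable[[str], Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the address to bind to.

    lookup_ip is a hypothetical helper that maps an interface name to its
    IPv4 address (or None if the interface is down or missing).
    """
    env = os.environ if env is None else env

    explicit = env.get("SANAD_DASHBOARD_HOST", "").strip()
    if explicit:
        return explicit  # explicit override wins

    iface = env.get("SANAD_DASHBOARD_INTERFACE", "").strip() or "wlan0"
    # Falling back to localhost is an assumption, not documented behaviour.
    return lookup_ip(iface) or "127.0.0.1"
```

Passing the lookup as a parameter keeps the sketch platform-neutral, since the way an interface's IP is read differs between operating systems.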

What was stripped vs Sanad (full)

Removed because the lite dashboard never needed them:

Motion / arm: motion/, scripts/sanad_arm.txt, config/motion_config.json, dashboard/routes/{motion,macros,replay,skills}.py.
Live voice conversation: voice/sanad_voice.py, voice/audio_io.py, voice/live_voice_loop.py, voice/wake_phrase_manager.py, voice/model_script.py, voice/model_subprocess.py, gemini/subprocess.py, gemini/script.py, dashboard/routes/{live_voice,live_subprocess,wake_phrases}.py.
Offline brain: local/ (LLM, STT, TTS, VAD), config/local_config.json.
Camera / vision: dashboard/routes/vision.py and all /api/vision/* endpoints, the camera tab UI.
Examples / demos: examples/.
Tabs and cards: the Operations, Motion & Replay, and Camera & Vision (deprecated) tabs; the Live Voice Commands, Wake Phrase Manager, and Live Gemini Process cards.

Added by lite:

Login page + session cookie auth (dashboard/routes/auth.py, dashboard/static/login.html, SessionMiddleware).
Browser-side audio streaming — GET /api/records/audio/{name}?kind={speaker,raw} and GET /api/typed-replay/audio/last.
Download button on each saved record.
Delete All button that wipes every WAV under data/audio/.
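The file-removal half of Delete All is small enough to sketch. The function name and return value are assumptions, and the index reset mentioned in the Endpoints section is omitted because its format is not described here.

```python
from pathlib import Path


def delete_all_wavs(audio_dir: Path) -> int:
    """Remove every *.wav directly under audio_dir; return how many were deleted."""
    removed = 0
    for wav in audio_dir.glob("*.wav"):
        if wav.is_file():
            wav.unlink()
            removed += 1
    return removed
```

The glob is non-recursive, so only files directly under the directory are touched. Files with other extensions are left alone.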

Troubleshooting

Symptom	Fix
`ModuleNotFoundError: itsdangerous` at startup	`pip install itsdangerous` — required by Starlette's `SessionMiddleware`.
`ModuleNotFoundError: websockets` when generating typed-replay audio	`pip install websockets` — `gemini/client.py` uses it.
Redirected to `/login` on every API call	Session cookies stop validating after a server restart, by design (the signing secret is regenerated at startup). Sign in again.
`Failed to construct audio_mgr — pyaudio not installed` warning at startup	Harmless on a laptop. `voice/audio_manager.py` requires PyAudio + portaudio headers; not needed for any user-facing button. Install with `sudo apt install portaudio19-dev && pip install pyaudio` if you want it gone.
ALSA / PortAudio noise at startup (`pcm_dmix.c`, `Cannot connect to JACK`)	Pre-init probe of PortAudio inside `pyaudio.PyAudio()`. Cosmetic — the lite dashboard never actually opens an ALSA stream. To silence it, drop PyAudio entirely (uninstall + add a `_safe_import` guard for `voice.audio_manager`).
`Gemini TTS attempt N returned no audio — parts: …` then 503	Gemini Live is non-deterministic on short Arabic snippets — it sometimes returns reasoning text instead of audio. The retry chain in `voice/typed_replay.py:generate_audio` tries 3 prompt variants. Lengthen the text or add diacritics if it persists.
`cannot import name 'X' from 'Project.Sanad.main'`	A route is trying to import a global that lite removed. Add a `try/except ImportError` in that route or drop the route from `dashboard/app.py:_REST_ROUTES`.
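The retry chain mentioned in the Gemini TTS row can be pictured as a loop over prompt variants that stops at the first attempt yielding audio. This is a sketch under assumptions: the real logic lives in voice/typed_replay.py, and the function name, the synthesize callable, and the three templates below are hypothetical.

```python
from typing import Callable, Optional, Sequence


def generate_with_retries(
    text: str,
    synthesize: Callable[[str], Optional[bytes]],
    templates: Sequence[str],
) -> Optional[bytes]:
    """Try each prompt template in order; return the first non-empty audio.

    synthesize stands in for the Gemini Live call and returns None (or b"")
    when the model answered with text instead of audio.
    """
    for attempt, template in enumerate(templates, start=1):
        audio = synthesize(template.format(text=text))
        if audio:
            return audio
        print(f"TTS attempt {attempt} returned no audio")
    return None  # caller would translate this into a 503


# Hypothetical prompt variants; the real ones live in voice/typed_replay.py.
TEMPLATES = (
    "{text}",
    "Say exactly: {text}",
    "Read this aloud without commentary: {text}",
)
```

Varying the prompt between attempts, rather than resending the same one, gives a non-deterministic model a different chance to respond with audio each time.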

Endpoints

GET    /                                    → dashboard (auth-gated)
GET    /login                               → login page
POST   /api/auth/login                      → {username,password} → set cookie
POST   /api/auth/logout                     → clear cookie
GET    /api/auth/me                         → {authenticated, user}

GET    /api/health                          → {status, brain}
GET    /api/status                          → {brain, voice}
GET    /api/system/info                     → host / interfaces / subsystems

GET    /api/voice/status                    → Gemini connection state
POST   /api/voice/connect                   → connect Gemini Live socket
POST   /api/voice/disconnect                → disconnect
GET    /api/voice/api-key                   → masked current key
POST   /api/voice/api-key                   → {key} → persist new key

POST   /api/typed-replay/say                → {text,record,record_name} → generates, caches
GET    /api/typed-replay/audio/last         → streams cached WAV (browser plays it)
POST   /api/typed-replay/replay-last        → bumps replay counter (audio still client-side)
POST   /api/typed-replay/save-last          → persists cached generation to records
GET    /api/typed-replay/status             → engine + session state
GET    /api/typed-replay/records            → list
DELETE /api/typed-replay/records/{name}     → delete one
POST   /api/typed-replay/records/{name}/rename

GET    /api/records/                        → list saved records
GET    /api/records/audio/{name}?kind=...   → stream a record's WAV
POST   /api/records/delete                  → {record_name} → delete one
POST   /api/records/delete-all              → wipe data/audio/*.wav + reset index

GET    /api/scripts/                        → list persona/rule files
POST   /api/scripts/load                    → {name} → file contents
POST   /api/scripts/save                    → {name,content}
POST   /api/scripts/create                  → {name,content}
POST   /api/scripts/delete                  → {name}

GET    /api/prompt/                         → resolved system prompt
POST   /api/prompt/update                   → {content}
POST   /api/prompt/reload                   → re-read from disk

GET    /api/logs/{module}/tail              → last N log lines
POST   /api/logs/snapshot                   → save snapshot bundle
GET    /api/logs/bundle                     → download all logs as a zip
GET    /api/audio/status                    → mic/spk mute state (server-side, informational)
WS     /ws/logs                             → live log stream
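For scripting against these routes, the sketch below builds a login request and a record-audio URL without sending anything. The paths and the speaker/raw values for kind come from this README. The function names are hypothetical, and the JSON request body is an assumption, since the login route may expect form data instead.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST to /api/auth/login.

    A JSON body is an assumption; the route may expect form data instead.
    """
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def record_audio_url(base_url: str, name: str, kind: str = "speaker") -> str:
    """URL for GET /api/records/audio/{name}?kind=..., with the name percent-encoded."""
    path = f"/api/records/audio/{quote(name, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'kind': kind})}"
```

To actually send the request and keep the session cookie for later calls, an opener built with urllib.request.HTTPCookieProcessor and an http.cookiejar.CookieJar would be one option.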

License / attribution

Internal project for YS Lootah Technology. Trimmed from Sanad — original Sanad reuses patterns from SanadVoice/gemini_interact and Unitree unitree_sdk2py.
