# Sanad_lite

Multi-user, browser-audio fork of [Sanad](../Sanad/). The full Sanad robot stack (arm, macros, camera, live conversation subprocess) was stripped out; what remains is a small FastAPI dashboard for **typed-replay TTS** and **saved-record management** where **all audio plays in each user's own browser**, not on the host machine.

```
┌────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://:8000                               │
│    ├─ /login           Cookie-session auth                         │
│    ├─ Voice & Audio    Gemini API key, Typed Replay (TTS)          │
│    ├─ Recordings       Saved WAVs: Play / Raw / Download / Del     │
│    │                   plus "Delete All"                           │
│    └─ Settings & Logs  Scripts, system prompt, live log tail       │
└────────────────────────────────────────────────────────────────────┘
```

## Run on your laptop

```bash
pip install --user \
    fastapi 'uvicorn[standard]' itsdangerous python-multipart pydantic \
    websockets

cd /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite
SANAD_DASHBOARD_HOST=127.0.0.1 python3 main.py
```

Open and sign in with:

> **Username:** `lkasjda213h`
> **Password:** `kj812bf@jdon`

Setting `SANAD_DASHBOARD_HOST=127.0.0.1` keeps the server bound to localhost; omit it to auto-bind to `wlan0`'s IP so colleagues on the LAN can reach it at `http://:8000`.

The `websockets` package is needed because the Gemini Live TTS used by Typed Replay opens a WebSocket to Google. Everything else (records list, records delete-all, login, logs) works without it.

> **Gemini API key: required, none ships with the repo.** The `api_key`
> in `config/core_config.json` (`gemini_defaults`) is intentionally empty
> (`""`). Typed Replay / Gemini TTS won't work until you supply one:
> - paste it in the dashboard → **Voice & Audio → Gemini API Key** (hot-swap, no restart), **or**
> - `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`, **or**
> - set `gemini_defaults.api_key` in `config/core_config.json`.
>
> Get a key at .
>
> The other heavy deps (`pyaudio`, `transformers`, `torch`) are listed in
> `requirements.txt` but are **not required** for the lite dashboard.
> They were leftovers from the parent Sanad project and may still be
> imported lazily by `voice/audio_manager.py` / `voice/local_tts.py`
> on construction; failures are caught silently in `main.py`.

## Run on the server

Replace the SSH/IP/path placeholders with your server's values:

```bash
# 1. Install deps once on the server
ssh @ 'pip install itsdangerous fastapi "uvicorn[standard]" python-multipart pydantic websockets'

# 2. Push the lite tree
rsync -av --delete \
    --exclude=__pycache__ --exclude=logs --exclude=data \
    /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite/ \
    @:~/Sanad_lite/

# 3. Start it on the server (SSH in first, then run)
ssh @
cd ~/Sanad_lite
python3 main.py
```

Then open `http://:8000` and sign in with **`lkasjda213h`** / **`kj812bf@jdon`**.

To leave it running after you log out, use `tmux`, `screen`, `nohup`, or the systemd unit at `shell_scripts/sanad.service` (edit the paths inside to match your install).

## Login

Credentials are in `config/core_config.json`:

```json
"auth": {
  "username": "lkasjda213h",
  "password": "kj812bf@jdon"
}
```

Change them before any non-LAN deployment. The session cookie is signed with a fresh secret each time `main.py` starts, so a restart logs every user out. For a stronger setup, replace the plaintext check with a bcrypt hash in `dashboard/routes/auth.py`.
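
The hashed-password suggestion above could look roughly like the sketch below. To stay dependency-free it swaps in PBKDF2-HMAC-SHA256 from the Python standard library rather than bcrypt itself. `hash_password`, `verify_password`, and `ITERATIONS` are hypothetical names chosen for illustration; nothing here reflects the actual contents of `dashboard/routes/auth.py`.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumed work factor for illustration; tune it for your hardware.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> str:
    """Return 'salt_hex$digest_hex', suitable for storing instead of plaintext."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return f"{salt.hex()}${digest.hex()}"


def verify_password(candidate: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    recomputed = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(recomputed.hex(), digest_hex)
```

Under this assumption, the config would hold the output of `hash_password(...)` in place of the plaintext `password` value, and the login route would call `verify_password(submitted, stored)` instead of comparing strings directly.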
## Audio architecture: who plays what, where

| Action | Where audio plays |
|---|---|
| Recordings → **Play** | each viewing user's browser |
| Recordings → **Raw** | each viewing user's browser |
| Recordings → **Download** | saves WAV to viewing user's device |
| Recordings → **Delete All** | wipes `data/audio/*.wav` on the server |
| Voice & Audio → **Typed Replay → Generate & Play** | each viewing user's browser |
| Voice & Audio → **Typed Replay → Replay Last** | each viewing user's browser |

Server-side ALSA / PulseAudio is **not** touched for any of the above.

Both audio paths use the same pattern:

1. Server generates / loads the WAV bytes.
2. Server returns them as `audio/wav` from an HTTP endpoint (`/api/records/audio/{name}` or `/api/typed-replay/audio/last`).
3. Browser fetches the response into `new Audio(url)` and calls `.play()`.

So if you host the dashboard on machine **A** and a colleague on machine **B** opens `http://A:8000` and clicks Play, the sound comes out of **B's** speakers. Machine A stays silent.

## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point: boots subsystems + dashboard. |
| `config.py` | Runtime constants derived from `config/*_config.json`. |
| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `dashboard`. |
| `core/` | Brain (callback whitelist + status), skill registry, event bus, config loader, logger. |
| `gemini/` | `client.py`: Gemini Live WebSocket client used by typed_replay for one-shot TTS calls. |
| `voice/` | `typed_replay.py` (server generates, browser plays), `audio_manager.py` (host PyAudio; only used to share a PyAudio instance with typed_replay; degrades gracefully if PyAudio is missing), `local_tts.py` (offline SpeechT5; unused in the lite UI but kept for the `/api/voice/generate` legacy route), `audio_devices.py`, `text_utils.py`. |
| `dashboard/` | `app.py` (FastAPI + SessionMiddleware + auth gate), `routes/*.py`, `static/index.html`, `static/login.html`. |
| `dashboard/routes/` | `auth.py`, `health.py`, `system.py`, `voice.py`, `logs.py`, `audio_control.py`, `scripts.py`, `records.py`, `prompt.py`, `typed_replay.py`, plus `websockets/log_stream.py`. |
| `scripts/` | `sanad_script.txt` (persona), `sanad_rule.txt` (rules). |
| `data/audio/` | Generated WAVs from Typed Replay → Save Last. Wiped by "Delete All". |
| `data/motions/` | Persisted dashboard settings (Gemini API key, G1 volume); back-compat path. |
| `logs/` | Per-module rotating logs. |
| `tests/` | `test_smoke.py`: Brain whitelist, skill registry, wake-phrase matching, atomic IO, audio devices, isolation. |

## Runtime env vars

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_DASHBOARD_HOST` | IP or hostname | wlan0's IP | Override the bind address. Use `127.0.0.1` for localhost-only. |
| `SANAD_DASHBOARD_INTERFACE` | iface name | `wlan0` | Pick which interface's IP to auto-bind to. |
| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo: set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. |

## What was stripped vs Sanad (full)

Removed because the lite dashboard never needed them:

- **Motion / arm:** `motion/`, `scripts/sanad_arm.txt`, `config/motion_config.json`, `dashboard/routes/{motion,macros,replay,skills}.py`.
- **Live voice conversation:** `voice/sanad_voice.py`, `voice/audio_io.py`, `voice/live_voice_loop.py`, `voice/wake_phrase_manager.py`, `voice/model_script.py`, `voice/model_subprocess.py`, `gemini/subprocess.py`, `gemini/script.py`, `dashboard/routes/{live_voice,live_subprocess,wake_phrases}.py`.
- **Offline brain:** `local/` (LLM, STT, TTS, VAD), `config/local_config.json`.
- **Camera / vision:** `dashboard/routes/vision.py` and all `/api/vision/*` endpoints, the camera tab UI.
- **Examples / demos:** `examples/`.
- **Tabs:** Operations, Motion & Replay, Camera & Vision (deprecated), Live Voice Commands card, Wake Phrase Manager card, Live Gemini Process card.

Added by lite:

- **Login page + session cookie auth** (`dashboard/routes/auth.py`, `dashboard/static/login.html`, `SessionMiddleware`).
- **Browser-side audio streaming:** `GET /api/records/audio/{name}?kind={speaker,raw}` and `GET /api/typed-replay/audio/last`.
- **Download button** on each saved record.
- **Delete All button** that wipes every WAV under `data/audio/`.

## Troubleshooting

| Symptom | Fix |
|---|---|
| `ModuleNotFoundError: itsdangerous` at startup | `pip install itsdangerous` (required by Starlette's `SessionMiddleware`). |
| `ModuleNotFoundError: websockets` when generating typed-replay audio | `pip install websockets` (`gemini/client.py` uses it). |
| Redirected to `/login` on every API call | Session cookie cleared on server restart by design; sign in again. |
| `Failed to construct audio_mgr — pyaudio not installed` warning at startup | Harmless on a laptop. `voice/audio_manager.py` requires PyAudio + portaudio headers; not needed for any user-facing button. Install with `sudo apt install portaudio19-dev && pip install pyaudio` if you want it gone. |
| ALSA / PortAudio noise at startup (`pcm_dmix.c`, `Cannot connect to JACK`) | Pre-init probe of PortAudio inside `pyaudio.PyAudio()`. Cosmetic: the lite dashboard never actually opens an ALSA stream. To silence it, drop PyAudio entirely (uninstall + add a `_safe_import` guard for `voice.audio_manager`). |
| `Gemini TTS attempt N returned no audio — parts: …` then 503 | Gemini Live is non-deterministic on short Arabic snippets; it sometimes returns reasoning text instead of audio. The retry chain in `voice/typed_replay.py:generate_audio` tries 3 prompt variants. Lengthen the text or add diacritics if it persists. |
| `cannot import name 'X' from 'Project.Sanad.main'` | A route is trying to import a global that lite removed. Add a `try/except ImportError` in that route or drop the route from `dashboard/app.py:_REST_ROUTES`. |

## Endpoints

```
GET    /                                        → / dashboard (auth-gated)
GET    /login                                   → login page
POST   /api/auth/login                          → {username,password} → set cookie
POST   /api/auth/logout                         → clear cookie
GET    /api/auth/me                             → {authenticated, user}

GET    /api/health                              → {status, brain}
GET    /api/status                              → {brain, voice}
GET    /api/system/info                         → host / interfaces / subsystems

GET    /api/voice/status                        → Gemini connection state
POST   /api/voice/connect                       → connect Gemini Live socket
POST   /api/voice/disconnect                    → disconnect
GET    /api/voice/api-key                       → masked current key
POST   /api/voice/api-key                       → {key} → persist new key

POST   /api/typed-replay/say                    → {text,record,record_name} → generates, caches
GET    /api/typed-replay/audio/last             → streams cached WAV (browser plays it)
POST   /api/typed-replay/replay-last            → bumps replay counter (audio still client-side)
POST   /api/typed-replay/save-last              → persists cached generation to records
GET    /api/typed-replay/status                 → engine + session state
GET    /api/typed-replay/records                → list
DELETE /api/typed-replay/records/{name}         → delete one
POST   /api/typed-replay/records/{name}/rename

GET    /api/records/                            → list saved records
GET    /api/records/audio/{name}?kind=...       → stream a record's WAV
POST   /api/records/delete                      → {record_name} → delete one
POST   /api/records/delete-all                  → wipe data/audio/*.wav + reset index

GET    /api/scripts/                            → list persona/rule files
POST   /api/scripts/load                        → {name} → file contents
POST   /api/scripts/save                        → {name,content}
POST   /api/scripts/create                      → {name,content}
POST   /api/scripts/delete                      → {name}

GET    /api/prompt/                             → resolved system prompt
POST   /api/prompt/update                       → {content}
POST   /api/prompt/reload                       → re-read from disk

GET    /api/logs/{module}/tail                  → last N log lines
POST   /api/logs/snapshot                       → save snapshot bundle
GET    /api/logs/bundle                         → download all logs as a zip

GET    /api/audio/status                        → mic/spk mute state (server-side, informational)

WS     /ws/logs                                 → live log stream
```

## License / attribution

Internal project for YS Lootah Technology. Trimmed from Sanad; the original Sanad reuses patterns from `SanadVoice/gemini_interact` and Unitree `unitree_sdk2py`.