# Sanad_lite

Multi-user, browser-audio fork of [Sanad](../Sanad/). The full Sanad robot
stack (arm, macros, camera, live conversation subprocess) was stripped out;
what remains is a small FastAPI dashboard for **typed-replay TTS** and
**saved-record management** where **all audio plays in each user's own
browser**, not on the host machine.

```
┌────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI)  ──  http://<host>:8000                       │
│   ├─ /login            Cookie-session auth                         │
│   ├─ Voice & Audio     Gemini API key, Typed Replay (TTS)          │
│   ├─ Recordings        Saved WAVs — Play / Raw / Download / Del    │
│   │                    plus "Delete All"                           │
│   └─ Settings & Logs   Scripts, system prompt, live log tail       │
└────────────────────────────────────────────────────────────────────┘
```


## Run on your laptop

```bash
pip install --user \
    fastapi 'uvicorn[standard]' itsdangerous python-multipart pydantic \
    websockets

cd /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite
SANAD_DASHBOARD_HOST=127.0.0.1 python3 main.py
```

Open <http://127.0.0.1:8000> and sign in with:

> **Username:** `lkasjda213h`
> **Password:** `kj812bf@jdon`

Setting `SANAD_DASHBOARD_HOST=127.0.0.1` keeps the server bound to
localhost; omit it to auto-bind to `wlan0`'s IP so colleagues on the LAN
can reach it at `http://<your-ip>:8000`.

The `websockets` package is needed because the Gemini Live TTS used by
Typed Replay opens a WebSocket to Google. Everything else (records list,
records delete-all, login, logs) works without it.

> **Gemini API key — required, none ships with the repo.** The `api_key`
> in `config/core_config.json` (`gemini_defaults`) is intentionally empty
> (`""`). Typed Replay / Gemini TTS won't work until you supply one:
> - paste it in the dashboard → **Voice & Audio → Gemini API Key** (hot-swap, no restart), **or**
> - `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`, **or**
> - set `gemini_defaults.api_key` in `config/core_config.json`.
>
> Get a key at <https://aistudio.google.com/apikey>.

> The other heavy deps (`pyaudio`, `transformers`, `torch`) are listed in
> `requirements.txt` but are **not required** for the lite dashboard.
> They were leftovers from the parent Sanad project and may still be
> imported lazily by `voice/audio_manager.py` / `voice/local_tts.py`
> on construction — failures are caught silently in `main.py`.


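The three ways of supplying the Gemini key described above can be pictured as a lookup chain. The sketch below is only an illustration, not the project's loader: the function name `resolve_gemini_api_key`, the `dashboard_key` argument, and the precedence order (dashboard value, then environment, then config file) are assumptions. Only `SANAD_GEMINI_API_KEY` and `gemini_defaults.api_key` come from this README.

```python
import os
from typing import Any, Mapping, Optional


def resolve_gemini_api_key(
    config: Mapping[str, Any],
    dashboard_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first non-empty key from the three documented sources.

    The precedence used here is an assumption made for illustration.
    """
    env = os.environ if env is None else env
    defaults = config.get("gemini_defaults") or {}
    candidates = [
        dashboard_key,                    # pasted in the dashboard (hypothetical argument)
        env.get("SANAD_GEMINI_API_KEY"),  # exported before `python3 main.py`
        defaults.get("api_key"),          # value from config/core_config.json
    ]
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return ""  # no key anywhere: matches the shipped empty default
```

With the shipped config (`"api_key": ""`) and no environment variable, the function returns an empty string, which is the state in which Typed Replay cannot generate audio.
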
## Run on the server

Replace the SSH/IP/path placeholders with your server's values:

```bash
# 1. Install deps once on the server
ssh <user>@<server-ip> 'pip install itsdangerous fastapi "uvicorn[standard]" python-multipart pydantic websockets'

# 2. Push the lite tree
rsync -av --delete \
    --exclude=__pycache__ --exclude=logs --exclude=data \
    /home/zedx/Robotics_workspace/yslootahtech/Project/Sanad_lite/ \
    <user>@<server-ip>:~/Sanad_lite/

# 3. Start it on the server (SSH in first, then run)
ssh <user>@<server-ip>
cd ~/Sanad_lite
python3 main.py
```

Then open `http://<server-ip>:8000` and sign in with **`lkasjda213h`** /
**`kj812bf@jdon`**.

To leave it running after you log out, use `tmux`, `screen`, `nohup`, or
the systemd unit at `shell_scripts/sanad.service` (edit the paths inside
to match your install).


## Login

Credentials are in `config/core_config.json`:
```json
"auth": {
  "username": "lkasjda213h",
  "password": "kj812bf@jdon"
}
```

Change them before any non-LAN deployment. The session cookie is signed
with a fresh secret each time `main.py` starts, so a restart logs every
user out.

For a stronger setup, replace the plaintext check with a bcrypt hash in
`dashboard/routes/auth.py`.


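As a rough picture of what a hashed check could look like, the sketch below stores a salted digest instead of the plaintext password. It deliberately uses the standard library's PBKDF2 (`hashlib.pbkdf2_hmac`) rather than bcrypt so that it runs without an extra dependency. The function names, the `salt$digest` storage format, and the iteration count are assumptions, not code from `dashboard/routes/auth.py`.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Return 'salt_hex$digest_hex' computed with PBKDF2-HMAC-SHA256."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt_hex, _, digest_hex = stored.partition("$")
    try:
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
    except ValueError:
        return False  # malformed stored value
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

Under this scheme the config file would hold the output of `hash_password(...)` and the login route would call `verify_password(submitted, stored)` instead of comparing strings directly.
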
## Audio architecture — who plays what, where

| Action | Where audio plays |
|---|---|
| Recordings → **Play** | each viewing user's browser |
| Recordings → **Raw** | each viewing user's browser |
| Recordings → **Download** | saves WAV to viewing user's device |
| Recordings → **Delete All** | wipes `data/audio/*.wav` on the server |
| Voice & Audio → **Typed Replay → Generate & Play** | each viewing user's browser |
| Voice & Audio → **Typed Replay → Replay Last** | each viewing user's browser |

Server-side ALSA / PulseAudio is **not** touched for any of the above.
Both audio paths use the same pattern:

1. Server generates / loads the WAV bytes.
2. Server returns them as `audio/wav` from an HTTP endpoint
   (`/api/records/audio/{name}` or `/api/typed-replay/audio/last`).
3. Browser loads that endpoint with `new Audio(url)` and calls `.play()`.

So if you host the dashboard on machine **A** and a colleague on machine
**B** opens `http://A:8000` and clicks Play, the sound comes out of **B's**
speakers. Machine A stays silent.


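Steps 1 and 2 of the pattern above can be sketched with the standard library alone: build WAV bytes in memory, then pair them with the `audio/wav` content type. This is an illustration under assumptions, not the project's route code. The names `make_wav_bytes` and `audio_response`, the 24 kHz mono 16-bit format, and the test tone standing in for generated speech are all hypothetical.

```python
import io
import math
import struct
import wave
from typing import Dict, Sequence, Tuple


def make_wav_bytes(samples: Sequence[float], sample_rate: int = 24_000) -> bytes:
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wav.writeframes(frames)
    return buffer.getvalue()


def audio_response(wav_bytes: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) the way an HTTP handler would send them."""
    headers = {"Content-Type": "audio/wav", "Content-Length": str(len(wav_bytes))}
    return 200, headers, wav_bytes


# Half a second of a 440 Hz tone stands in for generated speech.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24_000) for n in range(12_000)]
status, headers, body = audio_response(make_wav_bytes(tone))
```

In a FastAPI route the same bytes could be returned with `Response(content=wav_bytes, media_type="audio/wav")`. Nothing in this flow opens a sound device on the server, which is why machine A stays silent.
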
## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — boots subsystems + dashboard. |
| `config.py` | Runtime constants derived from `config/*_config.json`. |
| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `dashboard`. |
| `core/` | Brain (callback whitelist + status), skill registry, event bus, config loader, logger. |
| `gemini/` | `client.py` — Gemini Live WebSocket client used by typed_replay for one-shot TTS calls. |
| `voice/` | `typed_replay.py` (server generates, browser plays), `audio_manager.py` (host PyAudio — only used to share a PyAudio instance with typed_replay; degrades gracefully if PyAudio is missing), `local_tts.py` (offline SpeechT5 — unused in the lite UI but kept for the `/api/voice/generate` legacy route), `audio_devices.py`, `text_utils.py`. |
| `dashboard/` | `app.py` (FastAPI + SessionMiddleware + auth gate), `routes/*.py`, `static/index.html`, `static/login.html`. |
| `dashboard/routes/` | `auth.py`, `health.py`, `system.py`, `voice.py`, `logs.py`, `audio_control.py`, `scripts.py`, `records.py`, `prompt.py`, `typed_replay.py`, plus `websockets/log_stream.py`. |
| `scripts/` | `sanad_script.txt` (persona), `sanad_rule.txt` (rules). |
| `data/audio/` | Generated WAVs from Typed Replay → Save Last. Wiped by "Delete All". |
| `data/motions/` | Persisted dashboard settings (Gemini API key, G1 volume) — back-compat path. |
| `logs/` | Per-module rotating logs. |
| `tests/` | `test_smoke.py` — Brain whitelist, skill registry, wake-phrase matching, atomic IO, audio devices, isolation. |


## Runtime env vars

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_DASHBOARD_HOST` | IP or hostname | wlan0's IP | Override the bind address. Use `127.0.0.1` for localhost-only. |
| `SANAD_DASHBOARD_INTERFACE` | iface name | `wlan0` | Pick which interface's IP to auto-bind to. |
| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. |


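How the two dashboard variables interact can be sketched as follows. This is a reading of the table above, not the project's actual startup code: `resolve_bind_host` and the injected `lookup_ip` helper (interface name to IPv4 address) are hypothetical, and falling back to `127.0.0.1` when the interface has no address is an assumption.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_bind_host(
    lookup_ip: Callable[[str], Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the bind address from the two dashboard environment variables.

    lookup_ip is a hypothetical helper that maps an interface name to its
    IPv4 address, or returns None if the interface is down or missing.
    """
    env = os.environ if env is None else env
    explicit = env.get("SANAD_DASHBOARD_HOST", "").strip()
    if explicit:
        return explicit  # an explicit host overrides interface detection
    iface = env.get("SANAD_DASHBOARD_INTERFACE", "").strip() or "wlan0"
    # Falling back to localhost when the lookup fails is an assumption.
    return lookup_ip(iface) or "127.0.0.1"
```

Passing the lookup as a parameter keeps the sketch portable; on Linux a real helper would have to query the network interface, which is outside what this README documents.
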
## What was stripped vs Sanad (full)

Removed because the lite dashboard never needed them:

- **Motion / arm:** `motion/`, `scripts/sanad_arm.txt`, `config/motion_config.json`, `dashboard/routes/{motion,macros,replay,skills}.py`.
- **Live voice conversation:** `voice/sanad_voice.py`, `voice/audio_io.py`, `voice/live_voice_loop.py`, `voice/wake_phrase_manager.py`, `voice/model_script.py`, `voice/model_subprocess.py`, `gemini/subprocess.py`, `gemini/script.py`, `dashboard/routes/{live_voice,live_subprocess,wake_phrases}.py`.
- **Offline brain:** `local/` (LLM, STT, TTS, VAD), `config/local_config.json`.
- **Camera / vision:** `dashboard/routes/vision.py` and all `/api/vision/*` endpoints, the camera tab UI.
- **Examples / demos:** `examples/`.
- **Tabs:** Operations, Motion & Replay, Camera & Vision (deprecated), Live Voice Commands card, Wake Phrase Manager card, Live Gemini Process card.

Added by lite:

- **Login page + session cookie auth** (`dashboard/routes/auth.py`, `dashboard/static/login.html`, `SessionMiddleware`).
- **Browser-side audio streaming** — `GET /api/records/audio/{name}?kind={speaker,raw}` and `GET /api/typed-replay/audio/last`.
- **Download button** on each saved record.
- **Delete All button** that wipes every WAV under `data/audio/`.


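The file-system side of the Delete All button listed above amounts to removing every `*.wav` in one directory. The sketch below shows that step only. The function name `delete_all_wavs` is hypothetical, and resetting the records index, which the real endpoint also does, is not shown because its format is not described in this README.

```python
from pathlib import Path


def delete_all_wavs(audio_dir: Path) -> int:
    """Remove every *.wav directly under audio_dir and return how many were deleted."""
    removed = 0
    for wav in sorted(audio_dir.glob("*.wav")):
        if wav.is_file():  # skip directories that happen to end in .wav
            wav.unlink()
            removed += 1
    return removed
```

Calling `delete_all_wavs(Path("data/audio"))` leaves non-WAV files in place and returns `0` when there is nothing to delete, so the operation is safe to repeat.
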
## Troubleshooting

| Symptom | Fix |
|---|---|
| `ModuleNotFoundError: itsdangerous` at startup | `pip install itsdangerous` — required by Starlette's `SessionMiddleware`. |
| `ModuleNotFoundError: websockets` when generating typed-replay audio | `pip install websockets` — `gemini/client.py` uses it. |
| Redirected to `/login` on every API call | Session cookie cleared on server restart by design — sign in again. |
| `Failed to construct audio_mgr — pyaudio not installed` warning at startup | Harmless on a laptop. `voice/audio_manager.py` requires PyAudio + portaudio headers; not needed for any user-facing button. Install with `sudo apt install portaudio19-dev && pip install pyaudio` if you want it gone. |
| ALSA / PortAudio noise at startup (`pcm_dmix.c`, `Cannot connect to JACK`) | Pre-init probe of PortAudio inside `pyaudio.PyAudio()`. Cosmetic — the lite dashboard never actually opens an ALSA stream. To silence it, drop PyAudio entirely (uninstall + add a `_safe_import` guard for `voice.audio_manager`). |
| `Gemini TTS attempt N returned no audio — parts: …` then 503 | Gemini Live is non-deterministic on short Arabic snippets — it sometimes returns reasoning text instead of audio. The retry chain in `voice/typed_replay.py:generate_audio` tries 3 prompt variants. Lengthen the text or add diacritics if it persists. |
| `cannot import name 'X' from 'Project.Sanad.main'` | A route is trying to import a global that lite removed. Add a `try/except ImportError` in that route or drop the route from `dashboard/app.py:_REST_ROUTES`. |


## Endpoints

```
GET    /                                   → / dashboard (auth-gated)
GET    /login                              → login page
POST   /api/auth/login                     → {username,password} → set cookie
POST   /api/auth/logout                    → clear cookie
GET    /api/auth/me                        → {authenticated, user}

GET    /api/health                         → {status, brain}
GET    /api/status                         → {brain, voice}
GET    /api/system/info                    → host / interfaces / subsystems

GET    /api/voice/status                   → Gemini connection state
POST   /api/voice/connect                  → connect Gemini Live socket
POST   /api/voice/disconnect               → disconnect
GET    /api/voice/api-key                  → masked current key
POST   /api/voice/api-key                  → {key} → persist new key

POST   /api/typed-replay/say               → {text,record,record_name} → generates, caches
GET    /api/typed-replay/audio/last        → streams cached WAV (browser plays it)
POST   /api/typed-replay/replay-last       → bumps replay counter (audio still client-side)
POST   /api/typed-replay/save-last         → persists cached generation to records
GET    /api/typed-replay/status            → engine + session state
GET    /api/typed-replay/records           → list
DELETE /api/typed-replay/records/{name}    → delete one
POST   /api/typed-replay/records/{name}/rename

GET    /api/records/                       → list saved records
GET    /api/records/audio/{name}?kind=...  → stream a record's WAV
POST   /api/records/delete                 → {record_name} → delete one
POST   /api/records/delete-all             → wipe data/audio/*.wav + reset index

GET    /api/scripts/                       → list persona/rule files
POST   /api/scripts/load                   → {name} → file contents
POST   /api/scripts/save                   → {name,content}
POST   /api/scripts/create                 → {name,content}
POST   /api/scripts/delete                 → {name}

GET    /api/prompt/                        → resolved system prompt
POST   /api/prompt/update                  → {content}
POST   /api/prompt/reload                  → re-read from disk

GET    /api/logs/{module}/tail             → last N log lines
POST   /api/logs/snapshot                  → save snapshot bundle
GET    /api/logs/bundle                    → download all logs as a zip
GET    /api/audio/status                   → mic/spk mute state (server-side, informational)
WS     /ws/logs                            → live log stream
```


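For scripting against the endpoints listed above, a client first logs in and then sends the session cookie with later calls. The sketch below only builds the requests, without sending them, so it can be read and run without a server. It assumes the bodies are JSON with the field names shown in the listing; the helper names and the `cookie_header` argument are hypothetical, and the actual routes may expect a different encoding.

```python
import json
import urllib.request


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/auth/login with a JSON body."""
    payload = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/auth/login",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_say_request(base_url: str, text: str, cookie_header: str) -> urllib.request.Request:
    """Build a POST to /api/typed-replay/say that carries the session cookie."""
    payload = json.dumps({"text": text, "record": False}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/typed-replay/say",
        data=payload,
        headers={"Content-Type": "application/json", "Cookie": cookie_header},
        method="POST",
    )
```

To actually send them, `urllib.request.urlopen(request)` works for a single call. For a login followed by further calls, an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))` and an `http.cookiejar.CookieJar` carries the cookie automatically.
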
## License / attribution

Internal project for YS Lootah Technology. Trimmed from Sanad — original
Sanad reuses patterns from `SanadVoice/gemini_interact` and Unitree
`unitree_sdk2py`.