# Sanad_lite

Multi-user, browser-audio fork of [Sanad](../Sanad/). The full Sanad robot
stack (arm, macros, camera, live conversation subprocess) was stripped out;
what remains is a small FastAPI dashboard for **typed-replay TTS** and
**saved-record management** where **all audio plays in each user's own
browser**, not on the host machine.

```
┌────────────────────────────────────────────────────────────────────┐
│  Dashboard (FastAPI) ── http://<host>:8000                         │
│  ├─ /login                Cookie-session auth                      │
│  ├─ Voice & Audio         Gemini API key, Typed Replay (TTS)       │
│  ├─ Recordings            Saved WAVs — Play / Raw / Download / Del │
│  │                        plus "Delete All"                        │
│  └─ Settings & Logs       Scripts, system prompt, live log tail    │
└────────────────────────────────────────────────────────────────────┘
```


## Run on your laptop

```bash
pip install --user \
  fastapi 'uvicorn[standard]' itsdangerous python-multipart pydantic \
  websockets

cd <path-to>/Sanad_lite   # your local checkout of this directory
SANAD_DASHBOARD_HOST=127.0.0.1 python3 main.py
```

Open <http://127.0.0.1:8000> and sign in with:

> **Username:** `lkasjda213h`
> **Password:** `kj812bf@jdon`

Setting `SANAD_DASHBOARD_HOST=127.0.0.1` keeps the server bound to
localhost; omit it to auto-bind to `wlan0`'s IP so colleagues on the LAN
can reach it at `http://<your-ip>:8000`.

The `websockets` package is needed because the Gemini Live TTS used by
Typed Replay opens a WebSocket to Google. Everything else (records list,
records delete-all, login, logs) works without it.

> **Gemini API key — required, none ships with the repo.** The `api_key`
> in `config/core_config.json` (`gemini_defaults`) is intentionally empty
> (`""`). Typed Replay / Gemini TTS won't work until you supply one:
> - paste it in the dashboard → **Voice & Audio → Gemini API Key** (hot-swap, no restart), **or**
> - `export SANAD_GEMINI_API_KEY=AIza...` before `python3 main.py`, **or**
> - set `gemini_defaults.api_key` in `config/core_config.json`.
>
> Get a key at <https://aistudio.google.com/apikey>.
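
Before starting the server, you can check which of the file/env sources above actually holds a key. The sketch below is a stdlib helper built on stated assumptions. `gemini_key_sources` is a hypothetical name. It only reads the two documented locations (the env var and `gemini_defaults.api_key`) and ignores a key pasted into the dashboard. It makes no claim about which source wins when several are set.

```python
# Hypothetical pre-flight helper: which documented key sources are non-empty?
# It does NOT decide precedence between sources and ignores dashboard-pasted keys.
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Mapping, Optional


def gemini_key_sources(
    config_path: str = "config/core_config.json",
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, bool]:
    env = os.environ if env is None else env
    result = {
        "env:SANAD_GEMINI_API_KEY": bool(env.get("SANAD_GEMINI_API_KEY", "").strip()),
    }

    key = ""
    path = Path(config_path)
    if path.is_file():
        cfg = json.loads(path.read_text(encoding="utf-8"))
        # gemini_defaults.api_key ships as "" in the repo
        key = (cfg.get("gemini_defaults") or {}).get("api_key") or ""
    result[f"file:{config_path}"] = bool(key.strip())
    return result
```

Run it from the `Sanad_lite` directory so the relative config path resolves. An all-`False` result means Typed Replay will fail unless a key was pasted in the dashboard.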

> The other heavy deps (`pyaudio`, `transformers`, `torch`) are listed in
> `requirements.txt` but are **not required** for the lite dashboard.
> They are leftovers from the parent Sanad project and may still be
> imported lazily by `voice/audio_manager.py` / `voice/local_tts.py`
> on construction. `main.py` catches those failures, so startup
> continues (a missing PyAudio only produces a warning; see Troubleshooting).


## Run on the server

Replace the SSH/IP/path placeholders with your server's values:

```bash
# 1. Install deps once on the server
ssh <user>@<server-ip> 'pip install itsdangerous fastapi "uvicorn[standard]" python-multipart pydantic websockets'

# 2. Push the lite tree
rsync -av --delete \
  --exclude=__pycache__ --exclude=logs --exclude=data \
  <local-path>/Sanad_lite/ \
  <user>@<server-ip>:~/Sanad_lite/

# 3. Start it on the server (SSH in first, then run)
ssh <user>@<server-ip>
cd ~/Sanad_lite
python3 main.py
```

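Then open `http://<server-ip>:8000` and sign in with **`lkasjda213h`** /
**`kj812bf@jdon`**. Without `SANAD_DASHBOARD_HOST`, the server binds to
`wlan0`'s IP. If the server's LAN interface has another name, start it with
`SANAD_DASHBOARD_INTERFACE=<iface>` (or set `SANAD_DASHBOARD_HOST`).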

To leave it running after you log out, use `tmux`, `screen`, `nohup`, or
the systemd unit at `shell_scripts/sanad.service` (edit the paths inside
to match your install).


## Login

Credentials are in `config/core_config.json`:
```json
"auth": {
  "username": "lkasjda213h",
  "password": "kj812bf@jdon"
}
```

Change them before any non-LAN deployment. The session cookie is signed
with a fresh secret each time `main.py` starts, so a restart logs every
user out.
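
To see why a restart invalidates every cookie, here is a stdlib-only illustration of HMAC-signed cookies. It is not Starlette's `SessionMiddleware` code, which signs through `itsdangerous`. `sign`, `verify`, and the `{"user": ...}` payload are hypothetical. A cookie minted under one secret fails verification under the next one.

```python
# Illustration only (stdlib): a cookie is accepted only if its HMAC matches
# the *current* secret. Not Starlette's SessionMiddleware implementation.
import base64
import hashlib
import hmac
import json
import secrets
from typing import Optional


def sign(payload: dict, secret: bytes) -> str:
    """Encode the payload and append an HMAC-SHA256 tag."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    tag = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"


def verify(cookie: str, secret: bytes) -> Optional[dict]:
    """Return the payload if the tag matches this secret, else None."""
    body, _, tag = cookie.rpartition(".")
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None  # signed with a different (old) secret: treated as logged out
    return json.loads(base64.urlsafe_b64decode(body))


boot_1 = secrets.token_bytes(32)  # secret for the first run of main.py
boot_2 = secrets.token_bytes(32)  # fresh secret after a restart
cookie = sign({"user": "lkasjda213h"}, boot_1)
print(verify(cookie, boot_1))  # {'user': 'lkasjda213h'}
print(verify(cookie, boot_2))  # None
```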

For a stronger setup, replace the plaintext check with a bcrypt hash in
`dashboard/routes/auth.py`.
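
Below is a minimal sketch of that change. It assumes you store a salted hash in place of the plaintext password. It swaps bcrypt for the standard library's `hashlib.pbkdf2_hmac` so it runs without extra packages. `hash_password`, `verify_password`, the `pbkdf2_sha256$...` storage format, and the iteration count are illustrative choices, not existing code in `auth.py`.

```python
# Sketch: salted PBKDF2-SHA256 instead of a plaintext password check.
# bcrypt is swapped for stdlib PBKDF2; all names here are hypothetical.
import base64
import hashlib
import hmac
import os

_ITERATIONS = 600_000  # work factor; tune for your hardware


def hash_password(password: str, *, iterations: int = _ITERATIONS) -> str:
    """Return 'pbkdf2_sha256$<iterations>$<salt_b64>$<digest_b64>'."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    salt_b64 = base64.b64encode(salt).decode()
    digest_b64 = base64.b64encode(digest).decode()
    return f"pbkdf2_sha256${iterations}${salt_b64}${digest_b64}"


def verify_password(password: str, stored: str) -> bool:
    """Check a candidate password against a string from hash_password()."""
    try:
        algo, iters, salt_b64, digest_b64 = stored.split("$")
        if algo != "pbkdf2_sha256":
            return False
        salt = base64.b64decode(salt_b64, validate=True)
        expected = base64.b64decode(digest_b64, validate=True)
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iters))
    except ValueError:  # wrong field count, bad integer, or bad base64
        return False
    return hmac.compare_digest(candidate, expected)  # constant-time comparison
```

Generate the stored string once (for example `print(hash_password("new-password"))`). Keep only that string in the config, under a key of your choosing, and have the login route call `verify_password`.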


## Audio architecture — who plays what, where

| Action | Where it happens |
|---|---|
| Recordings → **Play** | each viewing user's browser |
| Recordings → **Raw** | each viewing user's browser |
| Recordings → **Download** | saves WAV to viewing user's device |
| Recordings → **Delete All** | wipes `data/audio/*.wav` on the server |
| Voice & Audio → **Typed Replay → Generate & Play** | each viewing user's browser |
| Voice & Audio → **Typed Replay → Replay Last** | each viewing user's browser |

Server-side ALSA / PulseAudio is **not** touched for any of the above.
Both audio paths use the same pattern:

1. Server generates / loads the WAV bytes.
2. Server returns them as `audio/wav` from an HTTP endpoint
   (`/api/records/audio/{name}` or `/api/typed-replay/audio/last`).
3. Browser loads the URL with `new Audio(url)` (same origin, so the session
   cookie is sent along) and calls `.play()`.

So if you host the dashboard on machine **A** and a colleague on machine
**B** opens `http://A:8000` and clicks Play, the sound comes out of **B's**
speakers. Machine A stays silent.
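
Steps 1 and 2 can be sketched with the standard library alone. The sketch packs 16-bit mono PCM into a WAV container and pairs it with the `audio/wav` content type the browser's `Audio` element expects. `build_wav_response` is a hypothetical name, and 24 kHz is an assumed default sample rate. The real routes hand the bytes to FastAPI rather than returning a tuple.

```python
# Sketch of steps 1-2: PCM -> WAV bytes + audio/wav headers (stdlib only).
# Hypothetical helper; the sample-rate default is an assumption.
from __future__ import annotations

import io
import math
import struct
import wave


def build_wav_response(pcm: bytes, sample_rate: int = 24_000) -> tuple[dict[str, str], bytes]:
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    body = buf.getvalue()  # wave does not close a caller-supplied BytesIO
    headers = {"Content-Type": "audio/wav", "Content-Length": str(len(body))}
    return headers, body


# Example: 0.1 s of a 440 Hz tone.
rate = 24_000
samples = (int(12_000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate // 10))
pcm = b"".join(struct.pack("<h", s) for s in samples)
headers, body = build_wav_response(pcm, rate)
print(headers["Content-Type"], body[:4], body[8:12])  # audio/wav b'RIFF' b'WAVE'
```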


## Directory layout

| Path | Contents |
|---|---|
| `main.py` | Entry point — boots subsystems + dashboard. |
| `config.py` | Runtime constants derived from `config/*_config.json`. |
| `config/` | Per-subsystem JSON: `core`, `voice`, `gemini`, `dashboard`. |
| `core/` | Brain (callback whitelist + status), skill registry, event bus, config loader, logger. |
| `gemini/` | `client.py` — Gemini Live WebSocket client used by typed_replay for one-shot TTS calls. |
| `voice/` | `typed_replay.py` (server generates, browser plays), `audio_manager.py` (host PyAudio — only used to share a PyAudio instance with typed_replay; degrades gracefully if PyAudio is missing), `local_tts.py` (offline SpeechT5 — unused in the lite UI but kept for the `/api/voice/generate` legacy route), `audio_devices.py`, `text_utils.py`. |
| `dashboard/` | `app.py` (FastAPI + SessionMiddleware + auth gate), `routes/*.py`, `static/index.html`, `static/login.html`. |
| `dashboard/routes/` | `auth.py`, `health.py`, `system.py`, `voice.py`, `logs.py`, `audio_control.py`, `scripts.py`, `records.py`, `prompt.py`, `typed_replay.py`, plus `websockets/log_stream.py`. |
| `scripts/` | `sanad_script.txt` (persona), `sanad_rule.txt` (rules). |
| `data/audio/` | Generated WAVs from Typed Replay → Save Last. Wiped by "Delete All". |
| `data/motions/` | Persisted dashboard settings (Gemini API key, G1 volume) — back-compat path. |
| `logs/` | Per-module rotating logs. |
| `tests/` | `test_smoke.py` — Brain whitelist, skill registry, wake-phrase matching, atomic IO, audio devices, isolation. |


## Runtime env vars

| Var | Values | Default | Effect |
|---|---|---|---|
| `SANAD_DASHBOARD_HOST` | IP or hostname | wlan0's IP | Override the bind address. Use `127.0.0.1` for localhost-only. |
| `SANAD_DASHBOARD_INTERFACE` | iface name | `wlan0` | Pick which interface's IP to auto-bind to. |
| `SANAD_GEMINI_API_KEY` | string | `""` (empty) | Gemini API key. No key ships in the repo — set this, paste one in the dashboard (**Voice & Audio → Gemini API Key**), or fill `gemini_defaults.api_key` in `config/core_config.json`. |
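
The first two rows describe a precedence order. The sketch below makes two assumptions: an empty `SANAD_DASHBOARD_HOST` counts as unset, and the interface lookup uses the Linux `SIOCGIFADDR` ioctl. `resolve_bind_host` and `linux_iface_ipv4` are hypothetical names, not functions in `main.py`.

```python
# Sketch of the bind-address precedence in the table above:
# SANAD_DASHBOARD_HOST if non-empty, else the IPv4 of SANAD_DASHBOARD_INTERFACE,
# else the IPv4 of wlan0. Hypothetical helpers; not the code in main.py.
from __future__ import annotations

import os
import socket
import struct
from typing import Callable, Mapping, Optional

SIOCGIFADDR = 0x8915  # Linux ioctl number: read an interface's IPv4 address


def linux_iface_ipv4(name: str) -> str:
    """Return the IPv4 address of a network interface (Linux only)."""
    import fcntl  # Unix-only module, imported lazily

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        ifreq = struct.pack("256s", name[:15].encode())
        packed = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, ifreq)
    return socket.inet_ntoa(packed[20:24])  # sin_addr inside struct ifreq


def resolve_bind_host(
    env: Optional[Mapping[str, str]] = None,
    iface_ip: Callable[[str], str] = linux_iface_ipv4,
) -> str:
    """Pick the address the dashboard should bind to."""
    env = os.environ if env is None else env
    host = env.get("SANAD_DASHBOARD_HOST", "").strip()
    if host:
        return host
    iface = env.get("SANAD_DASHBOARD_INTERFACE", "").strip() or "wlan0"
    return iface_ip(iface)  # raises OSError if the interface has no IPv4
```

Taking `iface_ip` as a parameter keeps the precedence logic testable on machines without `wlan0`.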


## What was stripped vs Sanad (full)

Removed because the lite dashboard never needed them:

- **Motion / arm:** `motion/`, `scripts/sanad_arm.txt`, `config/motion_config.json`, `dashboard/routes/{motion,macros,replay,skills}.py`.
- **Live voice conversation:** `voice/sanad_voice.py`, `voice/audio_io.py`, `voice/live_voice_loop.py`, `voice/wake_phrase_manager.py`, `voice/model_script.py`, `voice/model_subprocess.py`, `gemini/subprocess.py`, `gemini/script.py`, `dashboard/routes/{live_voice,live_subprocess,wake_phrases}.py`.
- **Offline brain:** `local/` (LLM, STT, TTS, VAD), `config/local_config.json`.
- **Camera / vision:** `dashboard/routes/vision.py`, all `/api/vision/*` endpoints, and the camera tab UI.
- **Examples / demos:** `examples/`.
- **Tabs:** Operations, Motion & Replay, Camera & Vision (deprecated), Live Voice Commands card, Wake Phrase Manager card, Live Gemini Process card.

Added by lite:

- **Login page + session cookie auth** (`dashboard/routes/auth.py`, `dashboard/static/login.html`, `SessionMiddleware`).
- **Browser-side audio streaming** — `GET /api/records/audio/{name}?kind={speaker,raw}` and `GET /api/typed-replay/audio/last`.
- **Download button** on each saved record.
- **Delete All button** that wipes every WAV under `data/audio/`.
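
The filesystem side of **Delete All** amounts to the following sketch. `delete_all_wavs` is a hypothetical helper. The real `POST /api/records/delete-all` also resets a records index whose format is not described here, so that step is left out. The sketch is destructive, so point it only at a scratch directory when experimenting.

```python
# Sketch: remove every *.wav directly under data/audio/ and count them.
# Hypothetical helper; the records-index reset done by the real endpoint is omitted.
from __future__ import annotations

from pathlib import Path


def delete_all_wavs(audio_dir: str | Path = "data/audio") -> int:
    removed = 0
    # sorted() materialises the listing before any file is unlinked.
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        if wav.is_file():
            wav.unlink()
            removed += 1
    return removed
```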


## Troubleshooting

| Symptom | Fix |
|---|---|
| `ModuleNotFoundError: itsdangerous` at startup | `pip install itsdangerous` — required by Starlette's `SessionMiddleware`. |
| `ModuleNotFoundError: websockets` when generating typed-replay audio | `pip install websockets` — `gemini/client.py` uses it. |
| Redirected to `/login` on every API call | Expected after a server restart: the session secret is regenerated on each start, so existing cookies no longer validate. Sign in again. |
| `Failed to construct audio_mgr — pyaudio not installed` warning at startup | Harmless on a laptop. `voice/audio_manager.py` requires PyAudio + portaudio headers; not needed for any user-facing button. Install with `sudo apt install portaudio19-dev && pip install pyaudio` if you want it gone. |
| ALSA / PortAudio noise at startup (`pcm_dmix.c`, `Cannot connect to JACK`) | Pre-init probe of PortAudio inside `pyaudio.PyAudio()`. Cosmetic — the lite dashboard never actually opens an ALSA stream. To silence it, drop PyAudio entirely (uninstall + add a `_safe_import` guard for `voice.audio_manager`). |
| `Gemini TTS attempt N returned no audio — parts: …` then 503 | Gemini Live is non-deterministic on short Arabic snippets — it sometimes returns reasoning text instead of audio. The retry chain in `voice/typed_replay.py:generate_audio` tries 3 prompt variants. Lengthen the text or add diacritics if it persists. |
| `cannot import name 'X' from 'Project.Sanad.main'` | A route is trying to import a global that lite removed. Add a `try/except ImportError` in that route or drop the route from `dashboard/app.py:_REST_ROUTES`. |
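
The retry chain mentioned for `Gemini TTS attempt N returned no audio` follows a common pattern: try each prompt variant in turn and stop at the first one that yields audio. The sketch below illustrates that pattern only. The variant wording, the injected `tts` callable, and `NoAudioError` are hypothetical and not taken from `voice/typed_replay.py`.

```python
# Sketch of "try prompt variants until one returns audio".
# Variant wording, tts callable and NoAudioError are hypothetical.
from __future__ import annotations

import logging
from typing import Callable, Optional

log = logging.getLogger("typed_replay_sketch")

PROMPT_VARIANTS = (
    "Say exactly: {text}",
    "Read this aloud, word for word: {text}",
    "Speak the following text and nothing else: {text}",
)


class NoAudioError(RuntimeError):
    """Raised when every variant came back without audio."""


def generate_with_retries(text: str, tts: Callable[[str], Optional[bytes]]) -> bytes:
    for attempt, template in enumerate(PROMPT_VARIANTS, start=1):
        audio = tts(template.format(text=text))
        if audio:
            return audio
        log.warning("TTS attempt %d returned no audio", attempt)
    raise NoAudioError(f"no audio after {len(PROMPT_VARIANTS)} attempts")
```

Injecting `tts` lets you exercise the loop without a network connection. A route built this way would turn the final exception into the 503 described above.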


## Endpoints

```
GET    /                                    → dashboard (auth-gated)
GET    /login                               → login page
POST   /api/auth/login                      → {username,password} → set cookie
POST   /api/auth/logout                     → clear cookie
GET    /api/auth/me                         → {authenticated, user}

GET    /api/health                          → {status, brain}
GET    /api/status                          → {brain, voice}
GET    /api/system/info                     → host / interfaces / subsystems

GET    /api/voice/status                    → Gemini connection state
POST   /api/voice/connect                   → connect Gemini Live socket
POST   /api/voice/disconnect                → disconnect
GET    /api/voice/api-key                   → masked current key
POST   /api/voice/api-key                   → {key} → persist new key

POST   /api/typed-replay/say                → {text,record,record_name} → generates, caches
GET    /api/typed-replay/audio/last         → streams cached WAV (browser plays it)
POST   /api/typed-replay/replay-last        → bumps replay counter (audio still client-side)
POST   /api/typed-replay/save-last          → persists cached generation to records
GET    /api/typed-replay/status             → engine + session state
GET    /api/typed-replay/records            → list
DELETE /api/typed-replay/records/{name}     → delete one
POST   /api/typed-replay/records/{name}/rename → rename one

GET    /api/records/                        → list saved records
GET    /api/records/audio/{name}?kind=...   → stream a record's WAV
POST   /api/records/delete                  → {record_name} → delete one
POST   /api/records/delete-all              → wipe data/audio/*.wav + reset index

GET    /api/scripts/                        → list persona/rule files
POST   /api/scripts/load                    → {name} → file contents
POST   /api/scripts/save                    → {name,content}
POST   /api/scripts/create                  → {name,content}
POST   /api/scripts/delete                  → {name}

GET    /api/prompt/                         → resolved system prompt
POST   /api/prompt/update                   → {content}
POST   /api/prompt/reload                   → re-read from disk

GET    /api/logs/{module}/tail              → last N log lines
POST   /api/logs/snapshot                   → save snapshot bundle
GET    /api/logs/bundle                     → download all logs as a zip
GET    /api/audio/status                    → mic/spk mute state (server-side, informational)
WS     /ws/logs                             → live log stream
```
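
Below is a minimal stdlib client for scripting against these routes. It logs in once, keeps the session cookie, then fetches a record's WAV. The paths come from the list above, but the helper names are hypothetical. Sending the login body as JSON is an assumption; switch to form encoding if `/api/auth/login` expects that. `kind` takes `speaker` or `raw`.

```python
# Minimal stdlib client sketch: login -> cookie -> download a record's WAV.
# Helper names, JSON login body and the "speaker" default are assumptions.
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_opener() -> urllib.request.OpenerDirector:
    # Stores the session cookie set by /api/auth/login for later requests.
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))


def login_request(base: str, username: str, password: str) -> urllib.request.Request:
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base}/api/auth/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def record_audio_request(base: str, name: str, kind: str = "speaker") -> urllib.request.Request:
    quoted = urllib.parse.quote(name, safe="")  # record names may contain spaces
    query = urllib.parse.urlencode({"kind": kind})
    return urllib.request.Request(f"{base}/api/records/audio/{quoted}?{query}")


def download_record(base: str, username: str, password: str,
                    name: str, dest: str, kind: str = "speaker") -> None:
    opener = make_opener()
    opener.open(login_request(base, username, password)).close()
    with opener.open(record_audio_request(base, name, kind)) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

For example, `download_record("http://127.0.0.1:8000", "<username>", "<password>", "<record-name>", "out.wav")` saves one record locally. Only the request-building helpers are network-free; `download_record` needs a running dashboard.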


## License / attribution

Internal project for YS Lootah Technology. Trimmed from Sanad — original
Sanad reuses patterns from `SanadVoice/gemini_interact` and Unitree
`unitree_sdk2py`.