Saqr/assets/audio/README.md

# Pre-recorded audio library

WAV clips played on the G1 speaker via `AudioClient.PlayStream`. Bypassing
`TtsMaker` saves ~200–700 ms of firmware synthesis buffering per announcement
and eliminates `3104` ("device busy") errors.

## Required format

Every file **must** be:

- **16 kHz** sample rate
- **mono** (1 channel)
- **16-bit signed PCM** (`int16`)
- `.wav` container

Any file that does not match is skipped with a logged warning. The bridge then
handles that phrase according to `tts.mode` in `config/robot_config.json`
(for example, falling back to `TtsMaker` in `recorded_or_tts` mode).

## Expected files

Drop WAVs under the right `<category>/<key>.wav` path so `AudioPlayer` finds them:

```
fixed/
  ready.wav             "Saqr is running. Press R2 plus X to start."
  deactivated.wav       "Saqr deactivated."
  no_camera.wav         "Camera not connected. Please plug in the camera and try again."
  safe.wav              "Safe to enter. Have a good day."
  unsafe_generic.wav    "Please stop. Wear your proper safety equipment."

unsafe_missing/
  helmet.wav            "Please stop. Wear your proper safety equipment. You are missing helmet."
  vest.wav              "Please stop. Wear your proper safety equipment. You are missing vest."
  helmet_vest.wav       "Please stop. Wear your proper safety equipment. You are missing helmet and vest."
```

**Naming rule for `unsafe_missing/`**: the `<key>` is the names of the missing
PPE items, sorted alphabetically and joined with `_`. If someone is missing
both helmet and vest, the bridge looks up `unsafe_missing/helmet_vest.wav`
(not `vest_helmet.wav`). If you extend `compliance.required_ppe` later, add a
clip for every non-empty subset: for 3 required items that is 7 combinations
(2³−1).
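
The naming rule can be sketched in Python as below. The helper names
`clip_key` and `all_clip_keys` are hypothetical and are not taken from the
bridge code; they only illustrate the sort-and-join rule and the enumeration
of every non-empty subset. The third item, `gloves`, is an illustrative
placeholder.

```python
from itertools import combinations


def clip_key(missing):
    """Build the <key> for unsafe_missing/: sorted item names joined with '_'."""
    return "_".join(sorted(missing))


def all_clip_keys(required):
    """Return the key for every non-empty subset of the required PPE items."""
    items = sorted(required)
    return [
        clip_key(subset)
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


if __name__ == "__main__":
    # Order of the input does not matter; the key is always sorted.
    print(clip_key(["vest", "helmet"]))

    # "gloves" is a made-up third item used only to show the 7 combinations.
    for key in all_clip_keys(["helmet", "vest", "gloves"]):
        print(f"unsafe_missing/{key}.wav")
```

Running the enumeration before a recording session gives a checklist of the
clips that would be needed for a given `compliance.required_ppe` list.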

## Converting existing recordings

If your source file has a different sample rate, is stereo, or uses another
sample format, convert it with `ffmpeg`:

```bash
ffmpeg -y -i input.m4a -ac 1 -ar 16000 -sample_fmt s16 fixed/safe.wav
```

Validate with:

```bash
python3 - <<'EOF'
import wave
with wave.open("fixed/safe.wav", "rb") as wf:
    print(wf.getframerate(), "Hz,", wf.getnchannels(), "ch,", wf.getsampwidth()*8, "bit")
EOF
# must print: 16000 Hz, 1 ch, 16 bit
```
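
To check the whole library at once rather than one file at a time, a minimal
sketch along these lines may help. It uses only the standard-library `wave`
module; the function names `wav_format` and `find_bad_clips` are hypothetical,
and the bridge's own validation may differ.

```python
import wave
from pathlib import Path

# Required format: sample rate (Hz), channels, bytes per sample.
EXPECTED = (16000, 1, 2)


def wav_format(path):
    """Return (sample_rate, channels, sample_width_bytes) for a WAV file."""
    with wave.open(str(path), "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth()


def find_bad_clips(root):
    """Return [(path, reason)] for each .wav under root that is not 16 kHz mono 16-bit."""
    bad = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            fmt = wav_format(path)
        except (wave.Error, EOFError) as exc:
            # Non-PCM or truncated files cannot be opened by the wave module.
            bad.append((path, f"unreadable: {exc}"))
            continue
        if fmt != EXPECTED:
            bad.append((path, f"got {fmt[0]} Hz, {fmt[1]} ch, {fmt[2] * 8} bit"))
    return bad
```

Calling `find_bad_clips(".")` from this directory returns an empty list when
every clip matches the required format.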

## Recording tips

- Quiet room; no echo.
- Don't clip — keep peaks below 0 dBFS.
- Leave ~100 ms of silence at the start and end so the clip doesn't pop.
- Target speaking rate: ~3 syllables/sec. The shortest clip (`deactivated`)
  should be ~2 s; longest (`no_camera`) around 5–6 s.
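
Clip length and peak level can be checked after recording with a short
script. This is a sketch using only the standard library; `clip_stats` is a
hypothetical helper name, and it assumes the file is already 16-bit PCM as
required above.

```python
import math
import sys
import wave
from array import array


def clip_stats(path):
    """Return (duration_seconds, peak_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = wf.getframerate()
        frames = wf.getnframes()
        samples = array("h", wf.readframes(frames))
    if sys.byteorder == "big":
        # WAV data is little-endian; swap on big-endian hosts.
        samples.byteswap()
    duration = frames / rate
    peak = max((abs(s) for s in samples), default=0)
    # 32768 is full scale for int16; an all-silent clip has no finite peak.
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return duration, peak_dbfs


if __name__ == "__main__" and len(sys.argv) > 1:
    seconds, peak_db = clip_stats(sys.argv[1])
    print(f"{seconds:.2f} s, peak {peak_db:.1f} dBFS")
```

A peak at or extremely close to 0 dBFS suggests the recording clipped and
should be redone at a lower input gain.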

## Runtime behaviour

In `config/robot_config.json`:

```json
"tts": {
    "mode": "recorded_or_tts"
}
```

Modes:
- `recorded_or_tts` — play WAV if the clip exists, otherwise fall back to `TtsMaker`.
- `recorded_only` — play WAV or stay silent. Useful for demos where you want
  deterministic audio. Will skip any phrase whose clip is missing.
- `tts_only` — ignore the WAV library entirely (current legacy behaviour).
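
The three modes amount to a small decision table. The sketch below is
illustrative only: `choose_output` and its return values are hypothetical
names, not the bridge's actual API, and `clip_exists` is assumed to mean that
a clip was found and passed the format check.

```python
def choose_output(mode, clip_exists):
    """Return 'wav', 'tts', or 'silent' for one announcement."""
    if mode == "tts_only":
        # The WAV library is ignored entirely.
        return "tts"
    if mode == "recorded_only":
        # Never synthesize; a missing clip means no audio.
        return "wav" if clip_exists else "silent"
    if mode == "recorded_or_tts":
        # Prefer the recording, fall back to TtsMaker.
        return "wav" if clip_exists else "tts"
    raise ValueError(f"unknown tts.mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("recorded_or_tts", "recorded_only", "tts_only"):
        for clip_exists in (True, False):
            print(mode, clip_exists, "->", choose_output(mode, clip_exists))
```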

After adding or replacing WAVs, restart the bridge to reload the library
(`sudo systemctl restart saqr-bridge`).