# Pre-recorded audio library
WAV clips played via `AudioClient.PlayStream` on the G1 speaker. Bypassing
`TtsMaker` saves ~200–700 ms of firmware synthesis buffer per announcement
and eliminates 3104 "device busy" errors.
## Required format
Every file **must** be:
- **16 kHz** sample rate
- **mono** (1 channel)
- **16-bit signed PCM** (`int16`)
- `.wav` container
Any file that does not match is logged as a warning and skipped. The bridge
then falls back to `TtsMaker` for that phrase, if `tts.mode` in
`config/robot_config.json` allows it.
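
The format check can be sketched with Python's standard `wave` module. The function name `is_valid_clip` is hypothetical and is not taken from the bridge code; it only mirrors the four requirements listed above.

```python
import wave
from pathlib import Path


def is_valid_clip(path):
    """Return True if `path` is a 16 kHz, mono, 16-bit PCM .wav file.

    Hypothetical helper: the bridge's real check may differ.
    """
    path = Path(path)
    if path.suffix.lower() != ".wav":
        return False
    try:
        with wave.open(str(path), "rb") as wf:
            return (
                wf.getframerate() == 16000
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2  # 2 bytes per sample = 16 bit
                and wf.getcomptype() == "NONE"  # uncompressed PCM
            )
    except (wave.Error, OSError, EOFError):
        # Unreadable, truncated, or non-PCM files count as invalid.
        return False
```

Running it over every `*.wav` before deployment would surface the files the bridge is going to skip.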
## Expected files
Drop WAVs under the right `<category>/<key>.wav` path so `AudioPlayer` finds them:
```
fixed/
  ready.wav           "Saqr is running. Press R2 plus X to start."
  deactivated.wav     "Saqr deactivated."
  no_camera.wav       "Camera not connected. Please plug in the camera and try again."
  safe.wav            "Safe to enter. Have a good day."
  unsafe_generic.wav  "Please stop. Wear your proper safety equipment."
unsafe_missing/
  helmet.wav          "Please stop. Wear your proper safety equipment. You are missing helmet."
  vest.wav            "Please stop. Wear your proper safety equipment. You are missing vest."
  helmet_vest.wav     "Please stop. Wear your proper safety equipment. You are missing helmet and vest."
```
**Naming rule for `unsafe_missing/`**: the `<key>` is the missing PPE names
sorted alphabetically and joined with `_`. So if someone misses both helmet
and vest, the bridge looks up `unsafe_missing/helmet_vest.wav` (not
`vest_helmet.wav`). If you extend `compliance.required_ppe` later, add clips
for every non-empty subset; for 3 required items that's 7 combinations (2³ − 1).
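
As a minimal sketch of the naming rule, the snippet below builds the key for a set of missing items and lists the clip path for every non-empty subset. The function names and the third item `gloves` are illustrative assumptions, not part of the bridge, and PPE names are assumed to be lower-case.

```python
from itertools import combinations


def unsafe_missing_key(missing):
    """Build the clip key: missing PPE names, sorted and joined with '_'."""
    return "_".join(sorted(missing))


def required_clip_paths(required_ppe):
    """List one clip path per non-empty subset of the required PPE items."""
    items = sorted(required_ppe)
    return [
        f"unsafe_missing/{unsafe_missing_key(combo)}.wav"
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


print(unsafe_missing_key(["vest", "helmet"]))  # helmet_vest
print(len(required_clip_paths(["helmet", "vest", "gloves"])))  # 7
```

Comparing the output of `required_clip_paths` with the files on disk is a quick way to spot a missing combination after changing the required PPE list.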
## Converting existing recordings
If your source file is at a different sample rate or stereo, convert with
`ffmpeg`:
```bash
ffmpeg -y -i input.m4a -ac 1 -ar 16000 -sample_fmt s16 fixed/safe.wav
```
Validate with:
```bash
python3 - <<'EOF'
import wave
# Print the sample rate, channel count, and bit depth of the clip.
with wave.open("fixed/safe.wav", "rb") as wf:
    print(wf.getframerate(), "Hz,", wf.getnchannels(), "ch,", wf.getsampwidth()*8, "bit")
EOF
# must print: 16000 Hz, 1 ch, 16 bit
```
## Recording tips
- Quiet room; no echo.
- Don't clip — keep peaks below 0 dBFS.
- Leave ~100 ms of silence at the start and end so the clip doesn't pop.
- Target speaking rate: ~3 syllables/sec. The shortest clip (`deactivated`)
  should be ~2 s; the longest (`no_camera`) around 5–6 s.
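
The clipping and duration tips can be checked after recording. The sketch below uses only the standard library; `clip_stats` is a hypothetical helper, not part of the bridge, and it assumes a 16-bit PCM file as required above.

```python
import array
import math
import sys
import wave


def clip_stats(path):
    """Return (duration_seconds, peak_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        samples = array.array("h", wf.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    # 0 dBFS corresponds to full scale (32768) for int16 samples.
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return n_frames / rate, peak_dbfs
```

A peak at or very near 0 dBFS suggests the recording clipped, and a duration far outside the ranges above suggests the speaking rate is off.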
## Runtime behaviour
In `config/robot_config.json`:
```json
"tts": {
"mode": "recorded_or_tts"
}
```
Modes:
- `recorded_or_tts` — play WAV if the clip exists, otherwise fall back to `TtsMaker`.
- `recorded_only` — play WAV or stay silent. Useful for demos where you want
deterministic audio. Will skip any phrase whose clip is missing.
- `tts_only` — ignore the WAV library entirely (current legacy behaviour).
After adding or replacing WAVs, restart the bridge to reload the library
(`sudo systemctl restart saqr-bridge`).
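
The three modes reduce to a small decision table. The sketch below is an illustration of the behaviour described above, not the bridge's actual code; `choose_output` and its return values are hypothetical names, and `clip_exists` is assumed to mean the clip is present and passed the format check.

```python
def choose_output(mode, clip_exists):
    """Return 'wav', 'tts', or 'silent' for one announcement."""
    if mode == "tts_only":
        return "tts"  # WAV library is ignored entirely
    if mode == "recorded_only":
        return "wav" if clip_exists else "silent"
    if mode == "recorded_or_tts":
        return "wav" if clip_exists else "tts"
    raise ValueError(f"unknown tts.mode: {mode!r}")
```

For example, `choose_output("recorded_only", False)` yields `"silent"`, which matches the note that a phrase with a missing clip is skipped in that mode.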