# Pre-recorded audio library

WAV clips played via `AudioClient.PlayStream` on the G1 speaker. Bypassing
`TtsMaker` saves the ~200–700 ms of firmware synthesis buffering per announcement
and eliminates the `3104` "device busy" errors.

## Required format

Every file **must** be:

- **16 kHz** sample rate
- **mono** (1 channel)
- **16-bit signed PCM** (`int16`)
- `.wav` container

Any file that does not match is logged as a warning and skipped. What happens to
that phrase then depends on `tts.mode` in `config/robot_config.json`: in
`recorded_or_tts` mode the bridge falls back to `TtsMaker`.

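The skip rule above comes down to three header checks. Below is a minimal sketch of such a check using Python's standard `wave` module. `format_problems` and the constant names are hypothetical and are not taken from the bridge's `AudioPlayer`, whose actual validation may differ.

```python
import wave
from pathlib import Path

# Required format from the list above.
EXPECTED_RATE = 16_000   # Hz
EXPECTED_CHANNELS = 1    # mono
EXPECTED_SAMPWIDTH = 2   # bytes per sample -> 16-bit PCM


def format_problems(path: Path) -> list[str]:
    """Return the reasons a clip would be rejected; an empty list means it looks usable."""
    try:
        with wave.open(str(path), "rb") as wf:
            problems = []
            if wf.getframerate() != EXPECTED_RATE:
                problems.append(f"sample rate {wf.getframerate()} Hz, expected {EXPECTED_RATE}")
            if wf.getnchannels() != EXPECTED_CHANNELS:
                problems.append(f"{wf.getnchannels()} channels, expected {EXPECTED_CHANNELS}")
            if wf.getsampwidth() != EXPECTED_SAMPWIDTH:
                problems.append(f"{wf.getsampwidth() * 8}-bit samples, expected 16-bit")
            return problems
    except (wave.Error, EOFError, OSError) as exc:
        # Missing file, or not an integer-PCM WAV the stdlib can parse
        # (e.g. float or compressed audio).
        return [f"unreadable WAV: {exc}"]
```

To catch mismatched clips before restarting the bridge, run it over every `*.wav` in the library (for example via `Path(".").rglob("*.wav")`) and print any non-empty result. The stdlib `wave` reader only accepts integer PCM, so float or compressed WAVs are reported as "unreadable" rather than as a specific format mismatch.
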
## Expected files

Drop WAVs under the right `<category>/<key>.wav` path so `AudioPlayer` finds them:

```
fixed/
  ready.wav           "Saqr is running. Press R2 plus X to start."
  deactivated.wav     "Saqr deactivated."
  no_camera.wav       "Camera not connected. Please plug in the camera and try again."
  safe.wav            "Safe to enter. Have a good day."
  unsafe_generic.wav  "Please stop. Wear your proper safety equipment."

unsafe_missing/
  helmet.wav          "Please stop. Wear your proper safety equipment. You are missing helmet."
  vest.wav            "Please stop. Wear your proper safety equipment. You are missing vest."
  helmet_vest.wav     "Please stop. Wear your proper safety equipment. You are missing helmet and vest."
```

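To find gaps in the library before a run, compare the tree above against what is on disk. This is a minimal sketch that assumes the library root is the directory containing `fixed/` and `unsafe_missing/`. `missing_clips` and `EXPECTED_CLIPS` are hypothetical names, and the sketch does not reproduce how `AudioPlayer` itself resolves paths.

```python
from pathlib import Path

# Clips listed in the tree above, as <category>/<key> (without ".wav").
EXPECTED_CLIPS = [
    "fixed/ready",
    "fixed/deactivated",
    "fixed/no_camera",
    "fixed/safe",
    "fixed/unsafe_generic",
    "unsafe_missing/helmet",
    "unsafe_missing/vest",
    "unsafe_missing/helmet_vest",
]


def missing_clips(library_root: Path) -> list[str]:
    """Return the expected <category>/<key>.wav paths that are not present on disk."""
    return [
        f"{clip}.wav"
        for clip in EXPECTED_CLIPS
        if not (library_root / f"{clip}.wav").is_file()
    ]
```

Each path it returns is a phrase that will either fall back to `TtsMaker` or stay silent, depending on `tts.mode`.
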
**Naming rule for `unsafe_missing/`**: the `<key>` is the names of the missing PPE
items, sorted alphabetically and joined with `_`. So if someone is missing both
the helmet and the vest, the bridge looks up `unsafe_missing/helmet_vest.wav`
(not `vest_helmet.wav`). If you extend `compliance.required_ppe` later, add a clip
for every non-empty subset of the required items: for 3 items that's
7 combinations (2³ − 1).

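The naming rule and the 2ⁿ − 1 count can be sketched with `itertools.combinations`. The helper names are hypothetical, and `gloves` in the tests and usage is an illustrative extra PPE item, not one the project currently requires. Python's `sorted` orders by code point, which matches alphabetical order for lowercase ASCII names like these.

```python
from itertools import combinations


def missing_ppe_key(missing: set[str]) -> str:
    """Build the unsafe_missing/ lookup path: names sorted, then joined with '_'."""
    return "unsafe_missing/" + "_".join(sorted(missing)) + ".wav"


def all_missing_ppe_keys(required: list[str]) -> list[str]:
    """List every clip a required-PPE list needs: one per non-empty subset."""
    return [
        missing_ppe_key(set(subset))
        for size in range(1, len(required) + 1)
        for subset in combinations(sorted(required), size)
    ]
```

For example, `missing_ppe_key({"vest", "helmet"})` gives `unsafe_missing/helmet_vest.wav`. `all_missing_ppe_keys(["helmet", "vest", "gloves"])` yields 7 paths, the last being `unsafe_missing/gloves_helmet_vest.wav`.
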
## Converting existing recordings

If your source file uses a different sample rate or is stereo, convert it with
`ffmpeg`:

```bash
# -ac 1: downmix to mono; -ar 16000: resample to 16 kHz; -sample_fmt s16: 16-bit signed PCM
ffmpeg -y -i input.m4a -ac 1 -ar 16000 -sample_fmt s16 fixed/safe.wav
```

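When you have more than a handful of clips, you can apply the same `ffmpeg` flags to a whole folder. This is a minimal sketch. It assumes `ffmpeg` is on `PATH` and that each source file is already named after its target key (for example, a hypothetical `raw/safe.m4a` becoming `fixed/safe.wav`). `ffmpeg_cmd` and `convert_folder` are illustrative names.

```python
import subprocess
from pathlib import Path


def ffmpeg_cmd(src: Path, dst: Path) -> list[str]:
    """Same flags as the one-off command: mono, 16 kHz, 16-bit signed PCM, overwrite."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-ac", "1", "-ar", "16000", "-sample_fmt", "s16",
        str(dst),
    ]


def convert_folder(src_dir: Path, dst_dir: Path, pattern: str = "*.m4a") -> None:
    """Convert every file in src_dir matching pattern to <dst_dir>/<stem>.wav."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob(pattern)):
        # check=True raises CalledProcessError if ffmpeg fails on a file.
        subprocess.run(ffmpeg_cmd(src, dst_dir / f"{src.stem}.wav"), check=True)
```

`convert_folder(Path("raw"), Path("fixed"))` would then convert every `.m4a` in `raw/`. Pass a different `pattern` for other source formats.
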
Validate with:

```bash
python3 - <<'EOF'
import wave

with wave.open("fixed/safe.wav", "rb") as wf:
    # Sample rate, channel count and bit depth of the converted clip.
    print(wf.getframerate(), "Hz,", wf.getnchannels(), "ch,", wf.getsampwidth() * 8, "bit")
EOF
# must print: 16000 Hz, 1 ch, 16 bit
```

## Recording tips

- Quiet room; no echo.
- Don't clip: keep peaks below 0 dBFS.
- Leave ~100 ms of silence at the start and end so the clip doesn't pop.
- Target speaking rate: ~3 syllables/sec. The shortest clip (`deactivated`)
  should be ~2 s; the longest (`no_camera`, `unsafe_missing/helmet_vest`) around 5–6 s.

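You can check the recording tips above numerically. This minimal sketch uses only the standard library and reports, for a 16-bit mono clip, its duration, its peak level in dBFS, and how much near-silence sits at each end. `clip_stats` is a hypothetical helper. The default `silence_threshold` of 500 (out of 32767) is an arbitrary assumption you will likely need to tune.

```python
import array
import math
import sys
import wave
from pathlib import Path


def clip_stats(path: Path, silence_threshold: int = 500) -> dict[str, float]:
    """Duration, peak level and leading/trailing near-silence of a 16-bit mono WAV."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    n = len(samples)
    peak = max((abs(s) for s in samples), default=0)
    # Indices of samples loud enough to count as speech rather than silence.
    loud = [i for i, s in enumerate(samples) if abs(s) >= silence_threshold]
    lead = loud[0] if loud else n
    trail = n - 1 - loud[-1] if loud else n
    return {
        "duration_s": n / rate,
        # 0 dBFS is full scale (32768); -inf for an all-zero clip.
        "peak_dbfs": 20 * math.log10(peak / 32768) if peak else float("-inf"),
        "lead_silence_ms": 1000 * lead / rate,
        "trail_silence_ms": 1000 * trail / rate,
    }
```

A `peak_dbfs` at or very near 0.0 suggests the clip has clipped. `lead_silence_ms` or `trail_silence_ms` well under 100 suggests the recording was trimmed too tightly.
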
## Runtime behaviour

In `config/robot_config.json`:

```json
"tts": {
  "mode": "recorded_or_tts"
}
```

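To sanity-check the setting outside the bridge, you can read it with a short script. This minimal sketch assumes `tts` is a top-level key of `robot_config.json`, as the snippet suggests, and accepts only the three modes described in this section. `read_tts_mode` is a hypothetical helper. The bridge's own parsing, including any default when the key is absent, may differ.

```python
import json

TTS_MODES = {"recorded_or_tts", "recorded_only", "tts_only"}


def read_tts_mode(config_text: str) -> str:
    """Return tts.mode from robot_config.json contents, rejecting unknown values."""
    mode = json.loads(config_text)["tts"]["mode"]
    if mode not in TTS_MODES:
        raise ValueError(f"unknown tts.mode {mode!r}; expected one of {sorted(TTS_MODES)}")
    return mode
```

`read_tts_mode(Path("config/robot_config.json").read_text())` returns the configured mode, or raises on a typo.
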
Modes:

- `recorded_or_tts`: play the WAV if the clip exists, otherwise fall back to `TtsMaker`.
- `recorded_only`: play the WAV or stay silent; any phrase whose clip is missing
  is skipped. Useful for demos where you want deterministic audio.
- `tts_only`: ignore the WAV library entirely (current legacy behaviour).

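The per-phrase decision that the three modes describe fits in one small function. This is a minimal sketch in which `pick_source` and `Source` are hypothetical names. `has_clip` is assumed to mean the clip was found and passed the format check; a malformed clip is skipped, so it counts as missing.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    WAV = "wav"
    TTS = "tts"


def pick_source(mode: str, has_clip: bool) -> Optional[Source]:
    """Decide how to voice one phrase; None means stay silent."""
    if mode == "tts_only":
        return Source.TTS  # WAV library ignored entirely
    if mode == "recorded_only":
        return Source.WAV if has_clip else None
    if mode == "recorded_or_tts":
        return Source.WAV if has_clip else Source.TTS
    raise ValueError(f"unknown tts.mode: {mode!r}")
```

Unknown modes raise instead of silently picking a default, so a typo in the config is caught early.
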
After adding or replacing WAVs, restart the bridge to reload the library
(`sudo systemctl restart saqr-bridge`).