Pre-recorded audio library
WAV clips played through AudioClient.PlayStream on the G1 speaker. Bypassing
TtsMaker saves the ~200–700 ms the firmware spends synthesizing and buffering
each announcement, and eliminates the "device busy" errors (code 3104).
Required format
Every file must be:
- 16 kHz sample rate
- mono (1 channel)
- 16-bit signed PCM (int16) in a .wav container
Any file that does not match is skipped with a logged warning. What happens to
that phrase then depends on tts.mode in config/robot_config.json: with
recorded_or_tts the bridge falls back to TtsMaker.
Expected files
Drop WAVs under the right <category>/<key>.wav path so AudioPlayer finds them:
fixed/
  ready.wav            "Saqr is running. Press R2 plus X to start."
  deactivated.wav      "Saqr deactivated."
  no_camera.wav        "Camera not connected. Please plug in the camera and try again."
  safe.wav             "Safe to enter. Have a good day."
  unsafe_generic.wav   "Please stop. Wear your proper safety equipment."
unsafe_missing/
  helmet.wav           "Please stop. Wear your proper safety equipment. You are missing helmet."
  vest.wav             "Please stop. Wear your proper safety equipment. You are missing vest."
  helmet_vest.wav      "Please stop. Wear your proper safety equipment. You are missing helmet and vest."
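To check that a library is complete before deploying, the expected paths listed above can be tested for existence. This is a sketch: EXPECTED, clip_path and missing_clips are illustrative names (not part of AudioPlayer), and the library root is assumed to be the directory passed on the command line, defaulting to the current directory.

```python
import sys
from pathlib import Path
from typing import List, Tuple

# (category, key) pairs transcribed from the "Expected files" list in this README.
EXPECTED: List[Tuple[str, str]] = [
    ("fixed", "ready"),
    ("fixed", "deactivated"),
    ("fixed", "no_camera"),
    ("fixed", "safe"),
    ("fixed", "unsafe_generic"),
    ("unsafe_missing", "helmet"),
    ("unsafe_missing", "vest"),
    ("unsafe_missing", "helmet_vest"),
]


def clip_path(root: Path, category: str, key: str) -> Path:
    """Build <root>/<category>/<key>.wav, the layout described in this README."""
    return root / category / f"{key}.wav"


def missing_clips(root: Path) -> List[str]:
    """List every expected clip (as '<category>/<key>.wav') that is absent under root."""
    return [
        f"{category}/{key}.wav"
        for category, key in EXPECTED
        if not clip_path(root, category, key).is_file()
    ]


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name in missing_clips(root):
        print("missing:", name)
```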
Naming rule for unsafe_missing/: the <key> is the names of the missing PPE
items, sorted alphabetically and joined with _. So if someone is missing both
helmet and vest, the bridge looks up unsafe_missing/helmet_vest.wav (not
vest_helmet.wav). If you extend compliance.required_ppe later, add a clip for
every non-empty subset of the required items: for 3 required items that's 7
combinations (2³ − 1).
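The naming rule fits in a few lines of Python, which helps when working out which clips to record after required_ppe grows. This is a sketch that follows the rule as stated here: missing_key and all_missing_keys are hypothetical helpers, not the bridge's own lookup code, and "gloves" is a made-up third item used only for illustration.

```python
from itertools import combinations
from typing import Iterable, List


def missing_key(missing: Iterable[str]) -> str:
    """Key for unsafe_missing/<key>.wav: missing PPE names, sorted, joined with '_'."""
    return "_".join(sorted(missing))


def all_missing_keys(required: Iterable[str]) -> List[str]:
    """One key per non-empty subset of the required PPE (2**n - 1 keys for n items)."""
    items = sorted(required)
    keys = []
    for size in range(1, len(items) + 1):
        # combinations() over a sorted list yields tuples that are already sorted
        for combo in combinations(items, size):
            keys.append("_".join(combo))
    return keys


print(missing_key(["vest", "helmet"]))                      # helmet_vest
print(all_missing_keys(["helmet", "vest"]))                 # ['helmet', 'vest', 'helmet_vest']
print(len(all_missing_keys(["helmet", "vest", "gloves"])))  # 7
```

Each key maps to a file path as f"unsafe_missing/{key}.wav".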
Converting existing recordings
If a source recording has a different sample rate, is stereo, or uses another
format, convert it with ffmpeg:
ffmpeg -y -i input.m4a -ac 1 -ar 16000 -sample_fmt s16 fixed/safe.wav
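For a batch of recordings, the ffmpeg command above can be wrapped in a small Python helper. This is a sketch under stated assumptions: ffmpeg_cmd and convert_dir are hypothetical names, ffmpeg is assumed to be on PATH, and each source file is assumed to already be named after its <key> (for example raw/fixed/safe.m4a, where raw/ is a hypothetical source folder).

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_cmd(src: Path, dst: Path) -> List[str]:
    """argv for the conversion above: mono, 16 kHz, signed 16-bit samples."""
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000",
            "-sample_fmt", "s16", str(dst)]


def convert_dir(src_dir: Path, dst_dir: Path, pattern: str = "*.m4a") -> List[Path]:
    """Convert every matching file in src_dir to dst_dir/<stem>.wav; return the outputs."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(src_dir.glob(pattern)):
        dst = dst_dir / (src.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(ffmpeg_cmd(src, dst), check=True)
        outputs.append(dst)
    return outputs
```

For example, convert_dir(Path("raw/fixed"), Path("fixed")) would write fixed/<stem>.wav for each raw/fixed/*.m4a. Passing an argument list (not a shell string) avoids quoting problems with unusual file names.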
Validate with:
python3 - <<'EOF'
import wave
with wave.open("fixed/safe.wav", "rb") as wf:
    print(wf.getframerate(), "Hz,", wf.getnchannels(), "ch,", wf.getsampwidth()*8, "bit")
EOF
# must print: 16000 Hz, 1 ch, 16 bit
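The one-file check above extends to the whole library. This sketch applies the three format requirements to every <category>/<key>.wav under a root directory using Python's standard wave module; check_wav and check_library are hypothetical names, and the script only reports problems, it does not reproduce how the bridge itself logs or skips files.

```python
import sys
import wave
from pathlib import Path
from typing import List, Tuple

# Required format from this README: 16 kHz, mono, 16-bit signed PCM WAV.
REQUIRED_RATE = 16000
REQUIRED_CHANNELS = 1
REQUIRED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit


def check_wav(path: Path) -> List[str]:
    """Return the problems found in one file (an empty list means it is valid)."""
    try:
        with wave.open(str(path), "rb") as wf:
            rate, channels, width = wf.getframerate(), wf.getnchannels(), wf.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # the wave module only parses uncompressed PCM WAV; anything else lands here
        return [f"unreadable as PCM WAV: {exc}"]
    problems = []
    if rate != REQUIRED_RATE:
        problems.append(f"sample rate {rate} Hz (want {REQUIRED_RATE})")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels (want {REQUIRED_CHANNELS})")
    if width != REQUIRED_SAMPWIDTH:
        problems.append(f"{width * 8}-bit samples (want 16)")
    return problems


def check_library(root: Path) -> List[Tuple[str, List[str]]]:
    """Check every <category>/<key>.wav under root; return only the failing files."""
    failures = []
    for path in sorted(root.glob("*/*.wav")):
        problems = check_wav(path)
        if problems:
            failures.append((str(path.relative_to(root)), problems))
    return failures


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, problems in check_library(root):
        print(f"WARNING {name}: {'; '.join(problems)}")
```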
Recording tips
- Quiet room; no echo.
- Don't clip — keep peaks below 0 dBFS.
- Leave ~100 ms of silence at the start and end so the clip doesn't pop.
- Target speaking rate: ~3 syllables/sec. The shortest clip (deactivated)
  should be ~2 s; the longest (no_camera) around 5–6 s.
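The duration, level and silence targets above can be checked numerically: duration is frames divided by sample rate, and for int16 audio the peak level in dBFS is 20·log10(|peak| / 32768). This is a sketch: clip_stats is a hypothetical helper that expects files already in the required format, and the silence threshold of 500 raw units (about −36 dBFS) is an arbitrary assumption to tune for your room.

```python
import math
import wave
from array import array


def clip_stats(path: str, silence_threshold: int = 500) -> dict:
    """Duration, peak level and leading/trailing silence of a mono int16 WAV.

    silence_threshold is an assumed cutoff in raw int16 units: samples whose
    magnitude does not exceed it count as silence.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM")
        rate = wf.getframerate()
        # interpret the frame bytes as signed 16-bit samples
        # (little-endian hosts such as x86 and ARM assumed)
        samples = array("h", wf.readframes(wf.getnframes()))
    n = len(samples)
    peak = max((abs(s) for s in samples), default=0)
    # indices of samples louder than the threshold
    loud = [i for i, s in enumerate(samples) if abs(s) > silence_threshold]
    lead = loud[0] if loud else n
    trail = (n - 1 - loud[-1]) if loud else n
    return {
        "duration_s": n / rate,
        "peak_dbfs": 20 * math.log10(peak / 32768) if peak else float("-inf"),
        "lead_silence_ms": 1000 * lead / rate,
        "trail_silence_ms": 1000 * trail / rate,
    }
```

For example, clip_stats("fixed/deactivated.wav") should report a duration near 2 s, a peak_dbfs below 0, and leading and trailing silence of roughly 100 ms.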
Runtime behaviour
In config/robot_config.json:
"tts": {
"mode": "recorded_or_tts"
}
Modes:
- recorded_or_tts: play the WAV if the clip exists, otherwise fall back to TtsMaker.
- recorded_only: play the WAV or stay silent. Useful for demos where you want
  deterministic audio. Any phrase whose clip is missing is skipped.
- tts_only: ignore the WAV library entirely (current legacy behaviour).
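The three modes reduce to one small decision per announcement. This sketch assumes that "the clip exists" means the WAV is present and passed the format check; load_mode and choose_output are hypothetical names, and the real bridge may structure this logic differently.

```python
import json
from pathlib import Path
from typing import Optional

VALID_MODES = ("recorded_or_tts", "recorded_only", "tts_only")


def load_mode(config_path: Path) -> str:
    """Read tts.mode from robot_config.json (structure as shown in this README)."""
    with open(config_path, encoding="utf-8") as f:
        mode = json.load(f)["tts"]["mode"]
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tts.mode: {mode!r}")
    return mode


def choose_output(mode: str, clip_available: bool) -> Optional[str]:
    """Return "wav", "tts", or None (stay silent) for one announcement."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tts.mode: {mode!r}")
    if mode == "tts_only":
        return "tts"    # WAV library ignored entirely
    if clip_available:
        return "wav"    # a valid clip exists: play it
    if mode == "recorded_or_tts":
        return "tts"    # no usable clip: fall back to TtsMaker
    return None         # recorded_only: skip the phrase


if __name__ == "__main__":
    for mode in VALID_MODES:
        for available in (True, False):
            avail = "yes" if available else "no"
            print(f"{mode:16} clip={avail:3} -> {choose_output(mode, available)}")
```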
After adding or replacing WAVs, restart the bridge to reload the library
(sudo systemctl restart saqr-bridge).