# Pre-recorded audio library

WAV clips played via `AudioClient.PlayStream` on the G1 speaker. Bypassing `TtsMaker` saves ~200–700 ms of firmware synthesis buffer per announcement and eliminates 3104 "device busy" errors.

## Required format

Every file **must** be:

- **16 kHz** sample rate
- **mono** (1 channel)
- **16-bit signed PCM** (`int16`)
- `.wav` container

Any file that does not match is logged as a warning and skipped (the bridge falls back to `TtsMaker` for that phrase if `tts.mode` in `config/robot_config.json` allows it).

## Expected files

Drop WAVs under the right `/.wav` path so `AudioPlayer` finds them:

```
fixed/
  ready.wav            "Saqr is running. Press R2 plus X to start."
  deactivated.wav      "Saqr deactivated."
  no_camera.wav        "Camera not connected. Please plug in the camera and try again."
  safe.wav             "Safe to enter. Have a good day."
  unsafe_generic.wav   "Please stop. Wear your proper safety equipment."
unsafe_missing/
  helmet.wav           "Please stop. Wear your proper safety equipment. You are missing helmet."
  vest.wav             "Please stop. Wear your proper safety equipment. You are missing vest."
  helmet_vest.wav      "Please stop. Wear your proper safety equipment. You are missing helmet and vest."
```

**Naming rule for `unsafe_missing/`**: the file name (before `.wav`) is the missing PPE names sorted alphabetically and joined with `_`. So if someone is missing both helmet and vest, the bridge looks up `unsafe_missing/helmet_vest.wav` (not `vest_helmet.wav`).

If you extend `compliance.required_ppe` later, add a clip for every non-empty subset of the required items: for 3 required items that's 7 combinations (2³−1).
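The naming rule can be checked with a short script. This is a minimal sketch, not the bridge's actual code: the helper names `clip_name` and `required_clips` are hypothetical, `gloves` is a made-up third PPE item, and alphabetical order is assumed to mean Python's default string sort on lowercase names.

```python
from itertools import combinations


def clip_name(missing):
    """Return the relative clip path for a collection of missing PPE items."""
    if not missing:
        raise ValueError("at least one missing item is required")
    # Sort alphabetically, then join with "_" (the rule described above).
    return "unsafe_missing/" + "_".join(sorted(missing)) + ".wav"


def required_clips(required_ppe):
    """List one clip path per non-empty subset of the required PPE items."""
    items = sorted(required_ppe)
    return [
        clip_name(subset)
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


print(clip_name({"vest", "helmet"}))
# -> unsafe_missing/helmet_vest.wav

# "gloves" is a hypothetical third item, used only to show the 2^3 - 1 count.
print(len(required_clips(["helmet", "vest", "gloves"])))
# -> 7
```

Running `required_clips` against your own `compliance.required_ppe` list gives a checklist of the files to record before deploying.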
## Converting existing recordings

If your source file is stereo or at a different sample rate, convert it with `ffmpeg`:

```bash
ffmpeg -y -i input.m4a -ac 1 -ar 16000 -sample_fmt s16 fixed/safe.wav
```

Validate with:

```bash
python3 - <<'EOF'
import wave
with wave.open("fixed/safe.wav", "rb") as wf:
    print(wf.getframerate(), "Hz,", wf.getnchannels(), "ch,", wf.getsampwidth()*8, "bit")
EOF
# must print: 16000 Hz, 1 ch, 16 bit
```

## Recording tips

- Quiet room; no echo.
- Don't clip: keep peaks below 0 dBFS.
- Leave ~100 ms of silence at the start and end so the clip doesn't pop.
- Target speaking rate: ~3 syllables/sec. The shortest clip (`deactivated`) should be ~2 s; the longest (`no_camera`) around 5–6 s.

## Runtime behaviour

In `config/robot_config.json`:

```json
"tts": { "mode": "recorded_or_tts" }
```

Modes:

- `recorded_or_tts`: play the WAV if the clip exists, otherwise fall back to `TtsMaker`.
- `recorded_only`: play the WAV or stay silent. Useful for demos where you want deterministic audio. Any phrase whose clip is missing is skipped.
- `tts_only`: ignore the WAV library entirely (current legacy behaviour).

After adding or replacing WAVs, restart the bridge to reload the library (`sudo systemctl restart saqr-bridge`).
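The three modes, combined with the format check, amount to a small decision table. The sketch below shows one way to express it; it is not the bridge's implementation. `is_valid_clip` and `choose_output` are hypothetical names, and treating a wrongly formatted file the same as a missing one is an assumption based on the "logged as a warning and skipped" behaviour described under Required format.

```python
import wave
from pathlib import Path

MODES = ("recorded_or_tts", "recorded_only", "tts_only")


def is_valid_clip(path):
    """True if the file is a readable 16 kHz, mono, 16-bit PCM WAV."""
    try:
        with wave.open(str(path), "rb") as wf:
            return (
                wf.getframerate() == 16000
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2  # 2 bytes per sample = 16 bit
            )
    except (wave.Error, OSError, EOFError):
        # Missing, unreadable, or not a PCM WAV file.
        return False


def choose_output(library_dir, clip, mode):
    """Return "wav", "tts", or "silent" for one announcement.

    Assumption: an invalid clip is handled exactly like a missing one.
    """
    if mode not in MODES:
        raise ValueError(f"unknown tts mode: {mode}")
    if mode == "tts_only":
        return "tts"
    if is_valid_clip(Path(library_dir) / clip):
        return "wav"
    return "tts" if mode == "recorded_or_tts" else "silent"
```

For example, `choose_output(".", "fixed/safe.wav", "recorded_only")` returns `"wav"` when the clip is present and correctly formatted, and `"silent"` otherwise, which makes it easy to dry-run a library before restarting the bridge.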