Saqr/assets/audio/README.md


Pre-recorded audio library

WAV clips played via AudioClient.PlayStream on the G1 speaker. Bypassing TtsMaker saves ~200–700 ms of firmware synthesis buffer per announcement and eliminates 3104 "device busy" errors.

Required format

Every file must be:

  • 16 kHz sample rate
  • mono (1 channel)
  • 16-bit signed PCM (int16)
  • .wav container

Any file that does not match this format is logged as a warning and skipped. The bridge then falls back to TtsMaker for that phrase, subject to the tts.mode setting in config/robot_config.json.
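The format rules above can be checked with Python's standard wave module. The following is a minimal sketch, not the bridge's actual loader: the function name format_problems and the constant names are hypothetical, and it only inspects the WAV header.

```python
import wave

# Required format, taken from the list above.
REQUIRED_RATE_HZ = 16000
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def format_problems(path):
    """Return a list of format problems for one clip; empty means usable.

    Hypothetical helper for illustration; the real AudioPlayer may differ.
    """
    try:
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
            channels = wf.getnchannels()
            width = wf.getsampwidth()
    except (wave.Error, EOFError, OSError) as exc:
        # Not a RIFF/WAVE file, not PCM, truncated, or unreadable.
        return [f"not a readable PCM .wav file: {exc}"]

    problems = []
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ}")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels, expected {REQUIRED_CHANNELS}")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{width * 8}-bit samples, expected 16-bit")
    return problems
```

Calling format_problems("fixed/safe.wav") on a compliant clip should return an empty list; anything else names the properties to fix before the clip will be picked up.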

Expected files

Drop WAVs under the right <category>/<key>.wav path so AudioPlayer finds them:

fixed/
  ready.wav             "Saqr is running. Press R2 plus X to start."
  deactivated.wav       "Saqr deactivated."
  no_camera.wav         "Camera not connected. Please plug in the camera and try again."
  safe.wav              "Safe to enter. Have a good day."
  unsafe_generic.wav    "Please stop. Wear your proper safety equipment."

unsafe_missing/
  helmet.wav            "Please stop. Wear your proper safety equipment. You are missing helmet."
  vest.wav              "Please stop. Wear your proper safety equipment. You are missing vest."
  helmet_vest.wav       "Please stop. Wear your proper safety equipment. You are missing helmet and vest."

Naming rule for unsafe_missing/: the <key> is the names of the missing PPE items, sorted alphabetically and joined with _. So if someone is missing both helmet and vest, the bridge looks up unsafe_missing/helmet_vest.wav (not vest_helmet.wav). If you extend compliance.required_ppe later, add a clip for every non-empty subset of the required items; for 3 required items that's 7 combinations (2³ − 1).
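To see which clips a given compliance.required_ppe list implies, the naming rule can be sketched as below. The function names are hypothetical, "gloves" is an invented third item used only for illustration, and the sketch assumes item names are lowercase so that plain string sorting matches the bridge's alphabetical order.

```python
from itertools import combinations


def unsafe_missing_key(missing_items):
    """Build the <key>: missing PPE names sorted alphabetically, joined with '_'."""
    return "_".join(sorted(missing_items))


def all_unsafe_missing_paths(required_ppe):
    """List the clip path for every non-empty subset of the required items."""
    items = sorted(required_ppe)
    paths = []
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            paths.append(f"unsafe_missing/{unsafe_missing_key(subset)}.wav")
    return paths


if __name__ == "__main__":
    # "gloves" is a hypothetical extra item; three items give 2**3 - 1 = 7 clips.
    for path in all_unsafe_missing_paths(["helmet", "vest", "gloves"]):
        print(path)
```

Comparing this list against the files actually present under unsafe_missing/ is a quick way to spot combinations that still need recording.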

Converting existing recordings

If your source file is at a different sample rate or stereo, convert with ffmpeg:

ffmpeg -y -i input.m4a -ac 1 -ar 16000 -sample_fmt s16 fixed/safe.wav

Validate with:

python3 - <<'EOF'
import wave
with wave.open("fixed/safe.wav", "rb") as wf:
    print(wf.getframerate(), "Hz,", wf.getnchannels(), "ch,", wf.getsampwidth()*8, "bit")
EOF
# must print: 16000 Hz, 1 ch, 16 bit

Recording tips

  • Quiet room; no echo.
  • Don't clip — keep peaks below 0 dBFS.
  • Leave ~100 ms of silence at the start and end so the clip doesn't pop.
  • Target speaking rate: ~3 syllables/sec. The shortest clip (deactivated) should be ~2 s; the longest (no_camera) around 5–6 s.
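The duration and peak-level tips above can be checked after recording. This is a rough sketch using only the standard library; clip_stats is a hypothetical helper name, and it assumes the clip is already in the required 16-bit mono format.

```python
import array
import math
import sys
import wave


def clip_stats(path):
    """Return (duration_seconds, peak_dbfs) for a 16-bit mono WAV clip."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        samples = array.array("h", wf.readframes(n_frames))

    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    # 0 dBFS corresponds to full scale (32768) for int16 audio.
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return n_frames / rate, peak_dbfs
```

A peak_dbfs of 0.0 means at least one sample hit full scale, which usually indicates clipping; the duration can be compared against the ~2 s to 5–6 s targets above.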

Runtime behaviour

In config/robot_config.json:

"tts": {
    "mode": "recorded_or_tts"
}

Modes:

  • recorded_or_tts — play WAV if the clip exists, otherwise fall back to TtsMaker.
  • recorded_only — play WAV or stay silent. Useful for demos where you want deterministic audio. Will skip any phrase whose clip is missing.
  • tts_only — ignore the WAV library entirely (current legacy behaviour).
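The three modes amount to a small decision table. The sketch below restates it as code for clarity; choose_source and its return values are hypothetical and are not taken from the bridge's source.

```python
VALID_MODES = {"recorded_or_tts", "recorded_only", "tts_only"}


def choose_source(mode, clip_available):
    """Return "wav", "tts", or None (stay silent) for one announcement.

    Hypothetical illustration of the documented behaviour of tts.mode.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tts.mode: {mode!r}")
    if mode == "tts_only":
        return "tts"   # WAV library is ignored entirely
    if clip_available:
        return "wav"   # both recorded_* modes prefer the clip
    # Clip is missing or was skipped for a format error.
    return "tts" if mode == "recorded_or_tts" else None
```

For example, choose_source("recorded_only", False) returns None, which corresponds to the phrase being skipped.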

After adding or replacing WAVs, restart the bridge to reload the library (sudo systemctl restart saqr-bridge).