Full-day voice-stack refactor. Experiments run and reverted:
- Gemini Live HTTP microservice (incompatible with the Python 3.8
  env; added latency)
- Vosk grammar STT (English lexicon can't decode 'Sanad'; big model
  cold-load too slow on Jetson CPU)
Kept architecture:
- Voice/wake_detector.py — pure-numpy energy state machine with
  adaptive baseline, burst-audio capture for post-hoc verify.
- Voice/marcus_voice.py — orchestrator with 3 modes
  (wake_and_command / always_on / always_on_gated), hysteretic VAD,
  pre-silence trim (300 ms pre-roll), DSP pipeline (DC remove,
  80 Hz HPF, 0.97 pre-emphasis, peak-normalize), faster-whisper
  base.en int8 with beam=8 + temperature fallback [0,0.2,0.4],
  fuzzy-match canonicalisation, GARBAGE_PATTERNS + length filter,
  /s-/ phonetic wake-verify, full-turn debug WAV recording.
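
The DSP chain named above (DC remove, 80 Hz HPF, 0.97 pre-emphasis, peak-normalize) could look roughly like the sketch below. This is not the code in Voice/marcus_voice.py: the function name `preprocess`, the first-order RC high-pass, the 16 kHz sample rate and the 0.95 peak target are all assumptions made for illustration.

```python
import numpy as np


def preprocess(pcm: np.ndarray, sr: int = 16000, hpf_hz: float = 80.0,
               pre_emph: float = 0.97, peak: float = 0.95) -> np.ndarray:
    """Hypothetical pre-Whisper cleanup: DC remove -> HPF -> pre-emphasis -> normalize."""
    x = np.asarray(pcm, dtype=np.float32)
    if x.size == 0:
        return x

    # 1. DC removal: subtract the mean so the waveform is centred on zero.
    x = x - x.mean()

    # 2. First-order RC high-pass (assumed filter type; the source only says "80 Hz HPF").
    rc = 1.0 / (2.0 * np.pi * hpf_hz)
    dt = 1.0 / sr
    a = rc / (rc + dt)
    y = np.empty_like(x)
    y[0] = x[0]
    for n in range(1, x.size):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])

    # 3. Pre-emphasis: z[n] = y[n] - 0.97 * y[n-1], boosting high frequencies.
    z = np.empty_like(y)
    z[0] = y[0]
    z[1:] = y[1:] - pre_emph * y[:-1]

    # 4. Peak-normalize so the loudest sample sits at `peak` (skip pure silence).
    m = float(np.max(np.abs(z)))
    if m > 0.0:
        z = z * (peak / m)
    return z.astype(np.float32)
```

The Python loop in step 2 is slow for long recordings; it is kept here only because it stays within plain numpy.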
Config-driven vocab (zero hardcoded strings in Python):
- stt.wake_words (33 variants of 'Sanad')
- stt.command_vocab (68 canonical phrases)
- stt.garbage_patterns (17 Whisper noise outputs)
- stt.min_transcription_length, stt.command_vocab_cutoff
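
A minimal sketch of how these config keys might drive filtering and canonicalisation, assuming difflib-style fuzzy matching and regex garbage patterns. The config values, the `canonicalise` name and the pass-through behaviour for unmatched text are hypothetical; the real lists live in Config/config_Voice.json and are not reproduced here.

```python
import difflib
import re
from typing import Optional

# Hypothetical stand-in for the stt section of the voice config.
STT_CFG = {
    "command_vocab": ["move forward", "go back", "turn left", "turn right", "stop"],
    "command_vocab_cutoff": 0.75,
    "garbage_patterns": [r"^thanks? for watching", r"^\[.*\]$"],
    "min_transcription_length": 2,
}


def canonicalise(text: str, cfg: dict = STT_CFG) -> Optional[str]:
    """Return a canonical phrase, the cleaned text, or None if it should be dropped."""
    t = text.strip().lower()

    # Length filter: drop empty or near-empty transcriptions.
    if len(t) < cfg["min_transcription_length"]:
        return None

    # Garbage filter: drop known noise outputs (assumed here to be regexes).
    if any(re.search(p, t) for p in cfg["garbage_patterns"]):
        return None

    # Fuzzy match against the canonical vocabulary; keep only the best hit.
    hits = difflib.get_close_matches(
        t, cfg["command_vocab"], n=1, cutoff=cfg["command_vocab_cutoff"]
    )
    # Assumption: text with no close match is passed through unchanged.
    return hits[0] if hits else t
```

With these sample values, `canonicalise("move forwerd")` maps to `"move forward"`, while `canonicalise("Thanks for watching!")` is rejected as garbage.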
Command parser widened (Brain/command_parser.py):
- _RE_SIMPLE_DIR — bare direction + verb+direction combos
  ('left', 'go back', 'move forward', 'step right', ...)
- _RE_STOP_SIMPLE — bare stop/halt/wait/pause/freeze/hold
- All motion constants sourced from config_Navigation.json
  (move_map + step_duration_sec) via API/zmq_api.py; no more
  hardcoded 0.3 / 2.0 magic numbers.
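
The two widened patterns could be sketched as below. The real regexes in Brain/command_parser.py are not shown in this commit message, so the patterns, the `parse` helper, the verb list, the move_map keys and the sign conventions are illustrative guesses.

```python
import re
from typing import Optional, Tuple

Velocity = Tuple[float, float, float]  # (vx, vy, vyaw)

# Hypothetical stand-ins for values loaded from config_Navigation.json.
MOVE_MAP = {
    "forward": (0.3, 0.0, 0.0),
    "back": (-0.3, 0.0, 0.0),
    "left": (0.0, 0.0, 0.3),    # assumes positive vyaw turns left
    "right": (0.0, 0.0, -0.3),
}
STEP_DURATION_SEC = 2.0

_DIRECTION = r"(?P<dir>forward|backward|back|left|right)"
# Bare direction, or an optional motion verb followed by a direction.
RE_SIMPLE_DIR = re.compile(rf"^(?:(?:go|move|step|turn)\s+)?{_DIRECTION}$")
# Bare stop words only.
RE_STOP_SIMPLE = re.compile(r"^(?:stop|halt|wait|pause|freeze|hold)$")


def parse(text: str) -> Optional[Tuple[str, Optional[Velocity], float]]:
    """Return (action, velocity, duration) or None if the text is not a simple command."""
    t = re.sub(r"[.!?,]+$", "", text.strip().lower()).strip()
    if RE_STOP_SIMPLE.match(t):
        return ("stop", None, 0.0)
    m = RE_SIMPLE_DIR.match(t)
    if m:
        direction = m.group("dir")
        if direction == "backward":
            direction = "back"
        return ("move", MOVE_MAP[direction], STEP_DURATION_SEC)
    return None
```

Checking the stop pattern first means a bare "hold" or "wait" is never treated as a motion command.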
API/audio_api.py — _play_pcm now uses AudioClient.PlayStream with
automatic resampling to 16 kHz (matches Sanad's proven pattern).
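
The resampling step might be sketched as follows. This uses plain linear interpolation as a stand-in, since the method inside _play_pcm is not stated here; the helper name `resample_to_16k` is hypothetical, and the AudioClient.PlayStream call is deliberately left out.

```python
import numpy as np


def resample_to_16k(pcm: np.ndarray, src_rate: int, dst_rate: int = 16000) -> np.ndarray:
    """Resample mono int16 PCM to dst_rate by linear interpolation (no anti-alias filter)."""
    pcm = np.asarray(pcm)
    if src_rate == dst_rate or pcm.size == 0:
        return pcm.astype(np.int16)

    n_out = int(round(pcm.size * dst_rate / src_rate))
    src_t = np.arange(pcm.size) / src_rate   # timestamps of the input samples
    dst_t = np.arange(n_out) / dst_rate      # timestamps of the output samples
    out = np.interp(dst_t, src_t, pcm.astype(np.float32))

    # Round and clip back into the int16 range before casting.
    return np.clip(np.round(out), -32768, 32767).astype(np.int16)
```

Linear interpolation is adequate for speech prompts, but downsampling content with energy above 8 kHz would alias; a proper low-pass stage would be needed in that case.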
Removed:
- Voice/vosk_stt.py (and all Vosk references in marcus_voice.py)
- Models/vosk-model-small-en-us-0.15/ (40 MB model + zip)
- All Vosk keys from Config/config_Voice.json
Documentation synced across README, Doc/architecture.md,
Doc/pipeline.md, Doc/functions.md, Doc/controlling.md,
Doc/MARCUS_API.md, Doc/environment.md changelog.
Known limitation: faster-whisper base.en on Jetson CPU + G1
far-field mic yields ~50% command-transcription accuracy due
to model capacity and mic reverberation. Wake + ack + recording
+ trim + Whisper + fuzzy + brain + motion all verified working
end-to-end. Future improvement path (unused): close-talking USB
mic via pactl_parec, or Gemini Live via HTTP microservice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
91 lines
3.0 KiB
Python
"""
zmq_api.py — ZMQ velocity + command interface to Holosoma

Previously the PUB socket was bound at module import time. That made the
module unsafe to re-import from any multiprocessing child (e.g. the LiDAR
SLAM_worker spawn), because the child would try to rebind the same port
and crash with `Address already in use`.

The bind now lives in init_zmq() — call it once from the brain entrypoint.
Child processes can import this module without any network side effects.
"""
import json
import os
import time
import zmq
from Core.config_loader import load_config
from Core.logger import log

_cfg = load_config("ZMQ")

ZMQ_HOST = _cfg["zmq_host"]
ZMQ_PORT = _cfg["zmq_port"]
STOP_ITERATIONS = _cfg["stop_iterations"]
STOP_DELAY = _cfg["stop_delay"]
STEP_PAUSE = _cfg["step_pause"]

# Shared state. These stay None until init_zmq() is called.
ctx: zmq.Context = None
sock: zmq.Socket = None
_INIT_SETTLE = 0.5  # seconds to let PUB tell subscribers it's alive


def init_zmq() -> zmq.Socket:
    """
    Bind the PUB socket. Idempotent — safe to call more than once.
    Call from the main (parent) process only. Do NOT call from multiprocessing
    children — they inherit nothing useful from the bound socket anyway.
    """
    global ctx, sock
    if sock is not None:
        return sock
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUB)
    sock.bind(f"tcp://{ZMQ_HOST}:{ZMQ_PORT}")
    time.sleep(_INIT_SETTLE)
    log(f"ZMQ PUB bound on tcp://{ZMQ_HOST}:{ZMQ_PORT} (pid={os.getpid()})",
        "info", "zmq")
    return sock


def _ensure_sock() -> zmq.Socket:
    if sock is None:
        raise RuntimeError(
            "zmq_api not initialized — call init_zmq() from the brain "
            "entrypoint before using send_vel/send_cmd/gradual_stop"
        )
    return sock


def get_socket():
    """Return the shared ZMQ PUB socket (for odometry to reuse)."""
    return _ensure_sock()


def send_vel(vx: float = 0.0, vy: float = 0.0, vyaw: float = 0.0):
    """Send velocity to Holosoma. vx m/s | vy m/s | vyaw rad/s"""
    _ensure_sock().send_string(json.dumps({"vel": {"vx": vx, "vy": vy, "vyaw": vyaw}}))


def gradual_stop():
    """Smooth deceleration to zero over ~1 second."""
    s = _ensure_sock()
    for _ in range(STOP_ITERATIONS):
        s.send_string(json.dumps({"vel": {"vx": 0.0, "vy": 0.0, "vyaw": 0.0}}))
        time.sleep(STOP_DELAY)


def send_cmd(cmd: str):
    """Send Holosoma state command: start | walk | stand | stop"""
    _ensure_sock().send_string(json.dumps({"cmd": cmd}))


# Load navigation constants from config (pure data, safe at import time).
# MOVE_MAP[direction] = (vx, vy, vyaw). STEP_DURATION_SEC is how long one
# "step" of a bare directional command lasts (2 s at default velocities
# ≈ 60 cm forward or 34° turn). Both live in config_Navigation.json so
# motion can be retuned without editing Python.
_nav = load_config("Navigation")
MOVE_MAP = {k: tuple(v) for k, v in _nav["move_map"].items()}
STEP_DURATION_SEC = float(_nav.get("step_duration_sec", 2.0))
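
The "60 cm forward or 34° turn" figure in the MOVE_MAP comment follows from 0.3 m/s and 0.3 rad/s held for 2 s, if those are the default velocities (an inference from the comment and the commit message, not a value read from config_Navigation.json). A rough sketch of how a caller might execute one step, with the send functions injected so it runs without a socket; `run_step`, the 10 Hz republish rate and the demo map are all hypothetical:

```python
import math
import time
from typing import Callable, Dict, Tuple

Velocity = Tuple[float, float, float]  # (vx m/s, vy m/s, vyaw rad/s)


def run_step(direction: str,
             move_map: Dict[str, Velocity],
             step_duration_sec: float,
             send_vel: Callable[[float, float, float], None],
             gradual_stop: Callable[[], None],
             rate_hz: float = 10.0,
             sleep: Callable[[float], None] = time.sleep) -> Tuple[float, float, float]:
    """Stream one velocity for step_duration_sec, then stop.

    Returns the nominal displacement (metres x, metres y, degrees of yaw),
    ignoring acceleration and the stop ramp.
    """
    vx, vy, vyaw = move_map[direction]
    ticks = max(1, int(round(step_duration_sec * rate_hz)))
    for _ in range(ticks):
        send_vel(vx, vy, vyaw)   # republish so a late subscriber still sees it
        sleep(1.0 / rate_hz)
    gradual_stop()
    return (vx * step_duration_sec,
            vy * step_duration_sec,
            math.degrees(vyaw * step_duration_sec))


# Assumed defaults: 0.3 m/s forward, 0.3 rad/s yaw, 2 s per step.
DEMO_MOVE_MAP = {"forward": (0.3, 0.0, 0.0), "left": (0.0, 0.0, 0.3)}
```

In the real module the two callables would be `send_vel` and `gradual_stop` from zmq_api, used after `init_zmq()` has run in the parent process.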