Marcus

Author	SHA1	Message	Date
kassam	bcb2fbbcdf	Update 2026-04-28 15:55:43	2026-04-28 15:55:45 +04:00
kassam	211d4f52ab	Update 2026-04-27 09:39:12	2026-04-27 09:39:13 +04:00
kassam	5d839d4f4e	Voice: finalise on faster-whisper + energy wake, remove Vosk	2026-04-24 14:32:28 +04:00

Full-day voice-stack refactor.

Experiments run and reverted:
- Gemini Live HTTP microservice (Python 3.8 env incompat, latency)
- Vosk grammar STT (English lexicon can't decode 'Sanad'; big model cold-load too slow on Jetson CPU)

Kept architecture:
- Voice/wake_detector.py: pure-numpy energy state machine with adaptive baseline, burst-audio capture for post-hoc verify.
- Voice/marcus_voice.py: orchestrator with 3 modes (wake_and_command / always_on / always_on_gated), hysteretic VAD, pre-silence trim (300 ms pre-roll), DSP pipeline (DC remove, 80 Hz HPF, 0.97 pre-emphasis, peak-normalize), faster-whisper base.en int8 with beam=8 + temperature fallback [0,0.2,0.4], fuzzy-match canonicalisation, GARBAGE_PATTERNS + length filter, /s-/ phonetic wake-verify, full-turn debug WAV recording.

Config-driven vocab (zero hardcoded strings in Python):
- stt.wake_words (33 variants of 'Sanad')
- stt.command_vocab (68 canonical phrases)
- stt.garbage_patterns (17 Whisper noise outputs)
- stt.min_transcription_length, stt.command_vocab_cutoff

Command parser widened (Brain/command_parser.py):
- _RE_SIMPLE_DIR: bare direction + verb+direction combos ('left', 'go back', 'move forward', 'step right', ...)
- _RE_STOP_SIMPLE: bare stop/halt/wait/pause/freeze/hold
- All motion constants sourced from config_Navigation.json (move_map + step_duration_sec) via API/zmq_api.py; no more hardcoded 0.3 / 2.0 magic numbers.

API/audio_api.py: _play_pcm now uses AudioClient.PlayStream with automatic resampling to 16 kHz (matches Sanad's proven pattern).

Removed:
- Voice/vosk_stt.py (and all Vosk references in marcus_voice.py)
- Models/vosk-model-small-en-us-0.15/ (40 MB model + zip)
- All Vosk keys from Config/config_Voice.json

Documentation synced across README, Doc/architecture.md, Doc/pipeline.md, Doc/functions.md, Doc/controlling.md, Doc/MARCUS_API.md, Doc/environment.md changelog.

Known limitation: faster-whisper base.en on Jetson CPU + G1 far-field mic yields ~50% command-transcription accuracy due to model capacity and mic reverberation. Wake + ack + recording + trim + Whisper + fuzzy + brain + motion all verified working end-to-end.

Future improvement path (unused): close-talking USB mic via pactl_parec, or Gemini Live via HTTP microservice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kassam	2e3cc1ba5b	Update 2026-04-22 11:39:53	2026-04-22 11:39:54 +04:00
kassam	ac9271c62b	Update 2026-04-22 10:57:22	2026-04-22 10:57:23 +04:00
kassam	e0f6acd5c7	Update 2026-04-21 16:10:00	2026-04-21 16:10:03 +04:00
kassam	8491be7f1e	Initial project commit	2026-04-12 18:50:22 +04:00

7 Commits
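
Commit 5d839d4f4e mentions a garbage-pattern filter, a length filter, and fuzzy-match canonicalisation driven by the stt.* config keys. The repository code is not shown here, so the following is only a minimal sketch of how such a post-transcription step might look, using Python's standard difflib. The function name `canonicalise`, the `CONFIG` dict, and every sample phrase, pattern, and threshold value are hypothetical stand-ins, not the project's actual data.

```python
import difflib
import re

# Hypothetical stand-ins for stt.command_vocab, stt.garbage_patterns,
# stt.min_transcription_length and stt.command_vocab_cutoff.
CONFIG = {
    "command_vocab": ["move forward", "go back", "turn left", "turn right", "stop"],
    "garbage_patterns": [r"^thanks for watching", r"^\[.*\]$"],
    "min_transcription_length": 3,
    "command_vocab_cutoff": 0.75,
}


def canonicalise(text, cfg=CONFIG):
    """Map a raw transcription to a canonical phrase, or None if rejected."""
    norm = text.strip().lower()

    # 1. Drop known noise outputs before any further processing.
    for pattern in cfg["garbage_patterns"]:
        if re.search(pattern, norm):
            return None

    # 2. Strip punctuation and collapse whitespace.
    cleaned = " ".join(re.sub(r"[^a-z' ]+", " ", norm).split())

    # 3. Length filter: very short strings are treated as noise.
    if len(cleaned) < cfg["min_transcription_length"]:
        return None

    # 4. Fuzzy match against the canonical vocabulary; the cutoff is a
    #    similarity ratio in [0, 1] as defined by difflib.SequenceMatcher.
    matches = difflib.get_close_matches(
        cleaned, cfg["command_vocab"], n=1, cutoff=cfg["command_vocab_cutoff"]
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["Move forwards!", "Thanks for watching.", "uh", "stop"]:
        print(repr(raw), "->", canonicalise(raw))
```

In this sketch the garbage check runs on the raw lowercased text so that bracketed markers are still recognisable, and the cutoff is read from configuration so that no threshold is hardcoded in the function body.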