Full-day voice-stack refactor. Experiments run and reverted:
- Gemini Live HTTP microservice (Python 3.8 env incompat, latency)
- Vosk grammar STT (English lexicon can't decode 'Sanad'; big model
cold-load too slow on Jetson CPU)
Kept architecture:
- Voice/wake_detector.py — pure-numpy energy state machine with
adaptive baseline, burst-audio capture for post-hoc verify.
- Voice/marcus_voice.py — orchestrator with 3 modes
(wake_and_command / always_on / always_on_gated), hysteretic VAD,
pre-silence trim (300 ms pre-roll), DSP pipeline (DC remove,
80 Hz HPF, 0.97 pre-emphasis, peak-normalize), faster-whisper
base.en int8 with beam=8 + temperature fallback [0,0.2,0.4],
fuzzy-match canonicalisation, GARBAGE_PATTERNS + length filter,
/s-/ phonetic wake-verify, full-turn debug WAV recording.
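The DSP chain above, sketched in numpy (order and constants as listed; the first-order HPF design is an assumption, not the actual code in marcus_voice.py):

```python
import numpy as np

def preprocess(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    """DC remove -> ~80 Hz HPF -> 0.97 pre-emphasis -> peak-normalize.
    Sketch only: the real pipeline's filter design may differ."""
    x = audio.astype(np.float32)
    x -= x.mean()                        # DC removal
    rc = 1.0 / (2.0 * np.pi * 80.0)      # first-order high-pass at ~80 Hz
    a = rc / (rc + 1.0 / sr)
    y = np.empty_like(x)
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    pre = np.empty_like(y)               # pre-emphasis: y[n] - 0.97*y[n-1]
    pre[0] = y[0]
    pre[1:] = y[1:] - 0.97 * y[:-1]
    peak = np.abs(pre).max()
    return pre / peak if peak > 0 else pre
```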
Config-driven vocab (zero hardcoded strings in Python):
- stt.wake_words (33 variants of 'Sanad')
- stt.command_vocab (68 canonical phrases)
- stt.garbage_patterns (17 Whisper noise outputs)
- stt.min_transcription_length, stt.command_vocab_cutoff
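For reference, the `stt` block of `Config/config_Voice.json` presumably looks something like this — key names come from the list above, but every value here is abridged and illustrative:

```json
{
  "stt": {
    "wake_words": ["sanad", "sanad.", "so nad", "sun ad"],
    "command_vocab": ["move forward", "turn left", "turn right", "stop"],
    "garbage_patterns": ["thank you.", "you", "..."],
    "min_transcription_length": 3,
    "command_vocab_cutoff": 0.6
  }
}
```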
Command parser widened (Brain/command_parser.py):
- _RE_SIMPLE_DIR — bare direction + verb+direction combos
('left', 'go back', 'move forward', 'step right', ...)
- _RE_STOP_SIMPLE — bare stop/halt/wait/pause/freeze/hold
- All motion constants sourced from config_Navigation.json
(move_map + step_duration_sec) via API/zmq_api.py; no more
hardcoded 0.3 / 2.0 magic numbers.
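A hedged sketch of what the widened patterns might look like — the real ones live in Brain/command_parser.py; the verbs and dispatch shape here are illustrative:

```python
import re

# Hypothetical shapes of _RE_SIMPLE_DIR / _RE_STOP_SIMPLE (illustrative only).
_RE_SIMPLE_DIR = re.compile(
    r"^(?:(?:go|move|step|turn)\s+)?(left|right|forward|back(?:ward)?)$"
)
_RE_STOP_SIMPLE = re.compile(r"^(stop|halt|wait|pause|freeze|hold)$")

def parse(cmd: str):
    """Map a bare direction / verb+direction / bare stop word to an action dict."""
    cmd = cmd.strip().lower()
    if _RE_STOP_SIMPLE.match(cmd):
        return {"action": "stop"}
    m = _RE_SIMPLE_DIR.match(cmd)
    if m:
        return {"action": "move", "direction": m.group(1)}
    return None
```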
API/audio_api.py — _play_pcm now uses AudioClient.PlayStream with
automatic resampling to 16 kHz (matches Sanad's proven pattern).
Removed:
- Voice/vosk_stt.py (and all Vosk references in marcus_voice.py)
- Models/vosk-model-small-en-us-0.15/ (40 MB model + zip)
- All Vosk keys from Config/config_Voice.json
Documentation synced across README, Doc/architecture.md,
Doc/pipeline.md, Doc/functions.md, Doc/controlling.md,
Doc/MARCUS_API.md, Doc/environment.md changelog.
Known limitation: faster-whisper base.en on Jetson CPU + G1
far-field mic yields ~50% command-transcription accuracy due
to model capacity and mic reverberation. Wake + ack + recording
+ trim + Whisper + fuzzy + brain + motion all verified working
end-to-end. Future improvement path (unused): close-talking USB
mic via pactl_parec, or Gemini Live via HTTP microservice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Marcus — Function Inventory

**Robot persona:** Sanad (wake word + self-intro)
**Updated:** 2026-04-21

Every callable in the codebase, grouped by layer. Generated from AST, kept in sync with the source. See `architecture.md` for where each module lives and `pipeline.md` for how they connect.

**Totals:** 25 importable modules · 73 top-level functions · 9 public classes.

---

## `run_marcus.py` — entrypoint

Script only. Prepends `PROJECT_ROOT` to `sys.path`, then calls `Brain.marcus_brain.run_terminal()` in `__main__`.

---

## `Core/` — foundation, no external deps

| File | Function | Purpose |
|---|---|---|
| `env_loader.py` | `_find_env_file()`, `_load_dotenv(path)` | find + parse `.env` into `os.environ`; exports `PROJECT_ROOT` |
| `config_loader.py` | `load_config(name)`, `config_path(relative)` | cached reader for `Config/config_{name}.json` |
| `log_backend.py` | `_rotating_handler(path)` + **class `Logs`** | custom logging engine; all handlers are `RotatingFileHandler` (5 MB × 3) |
| `logger.py` | `get_logger(module)`, `log(msg, level, module)`, `log_and_print(msg, level, module)` | project-wide logging façade |

**`Core.log_backend.Logs`** methods:
`__init__(default_log_level, main_log_file)`, `_choose_fallback_log_dir`, `_normalize_log_name`, `_is_writable_path`, `_with_fallback`, `resolve_log_path`, `construct_path`, `log_to_file`, `LogEngine(folder, log_name)`, `LogsMessages(msg, type, folder, file)`, `print_and_log(...)`.
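A minimal sketch of the cached-reader pattern described for `config_loader.py` — only the signatures come from the table; the internals are assumed:

```python
import json
from pathlib import Path

PROJECT_ROOT = Path(".")        # exported by Core.env_loader in the real code
_cache: dict[str, dict] = {}

def config_path(relative: str) -> Path:
    """Resolve a path under Config/ relative to the project root."""
    return PROJECT_ROOT / "Config" / relative

def load_config(name: str) -> dict:
    """Read Config/config_{name}.json once, then serve from the cache."""
    if name not in _cache:
        with open(config_path(f"config_{name}.json"), encoding="utf-8") as f:
            _cache[name] = json.load(f)
    return _cache[name]
```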

---

## `API/` — subsystem wrappers (Brain imports only from here)

| File | Public functions |
|---|---|
| `zmq_api.py` | `init_zmq()`, `get_socket()`, `send_vel(vx, vy, vyaw)`, `gradual_stop()`, `send_cmd(cmd)` |
| `camera_api.py` | `start_camera()`, `stop_camera()`, `get_frame()`, `get_frame_age()`, `get_raw_refs()`, `camera_loop()` |
| `llava_api.py` | `call_llava(prompt, img_b64, num_predict, use_history)`, `ask(command, img_b64)`, `ask_goal(goal, img_b64)`, `ask_talk(command, img_b64, facts)`, `ask_verify(target, condition, img_b64)`, `ask_patrol(img_b64)`, `remember_fact(fact)`, `add_to_history(user_msg, assistant_msg)`, `parse_json(raw)` |
| `yolo_api.py` | `init_yolo(raw_frame_ref, frame_lock)` + 8 stubs rebound on success: `yolo_sees`, `yolo_count`, `yolo_closest`, `yolo_summary`, `yolo_ppe_violations`, `yolo_person_too_close`, `yolo_all_classes`, `yolo_fps` |
| `odometry_api.py` | `init_odometry(zmq_sock)`, `get_position()` |
| `memory_api.py` | `init_memory()`, `log_cmd(cmd, response, duration)`, `log_detection(class_name, position, distance)`, `place_save(name)`, `place_goto(name)`, `places_list_str()` |
| `arm_api.py` | `do_arm(action)` — G1 GR00T stub |
| `imgsearch_api.py` | `init_imgsearch(get_frame_fn, send_vel_fn, gradual_stop_fn, llava_fn, yolo_sees_fn, model)`, `get_searcher()` |
| `audio_api.py` | **class `AudioAPI`** (see below) |
| `lidar_api.py` | `init_lidar()`, `obstacle_ahead(radius)`, `get_slam_pose()`, `get_nav_cmd()`, `get_loc_state()`, `get_safety_reasons()`, `get_lidar_status()`, `get_client()`, `stop_lidar()` |

**`API.audio_api.AudioAPI`** methods:
`speak(text, lang="en")`, `record(seconds)` → np.int16 array, `play_pcm(audio_16k)`, `save_recording(audio, name)`, properties `is_speaking`, `is_available`. Internal: `_init_sdk`, `_mute_mic`, `_unmute_mic`, `_resample`, `_play_pcm`, `_record_builtin`, `_record_parec`.
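`_resample` is internal, but the 16 kHz playback target suggests something like this linear-interpolation sketch (the actual resampling method is an assumption):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, src_rate: int) -> np.ndarray:
    """Linear-interpolation resample of int16 PCM to 16 kHz. Sketch only;
    the real _resample may use a different algorithm."""
    if src_rate == 16000:
        return audio
    x = audio.astype(np.float32)
    n_out = int(round(len(x) * 16000 / src_rate))
    t_out = np.linspace(0, len(x) - 1, n_out)   # fractional sample positions
    return np.interp(t_out, np.arange(len(x)), x).astype(np.int16)
```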

---

## `Voice/` — mic + TTS + wake + STT

| File | Public API |
|---|---|
| `builtin_mic.py` | `_find_g1_local_ip()` + **class `BuiltinMic`** |
| `builtin_tts.py` | **class `BuiltinTTS`** |
| `wake_detector.py` | **dataclass `WakeConfig`** + **class `WakeDetector`** |
| `marcus_voice.py` | module-level `WAKE_WORDS`, `COMMAND_VOCAB`, `GARBAGE_PATTERNS` (populated from config), helpers `_has_wake_word`, `_strip_wake_word`, `_strip_wake_word_once`, `_closest_command`, **class `VoiceModule`** |

**`Voice.builtin_mic.BuiltinMic`** — G1 UDP multicast mic:
`__init__(group, port, buf_max, read_timeout)`, `start()`, `stop()`, `read_chunk(num_bytes)`, `read_seconds(seconds)`, `flush()`; internal `_recv_loop`.

**`Voice.builtin_tts.BuiltinTTS`** — wraps `AudioClient.TtsMaker`:
`__init__(audio_client, default_speaker_id=0)`, `speak(text, speaker_id=None, block=True)`.

**`Voice.wake_detector.WakeDetector`** — pure-numpy energy wake detector:
`__init__(cfg: WakeConfig)`, `process(pcm_bytes) -> bool`, `reset()`, `get_last_burst() -> np.ndarray | None`. Internal: `_step(window)` state machine stepped per 50 ms analysis window; adaptive `_baseline_buf` keeps a rolling mean of idle-silence RMS; the triggering burst audio is captured for post-hoc Whisper verification.
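The energy state machine can be approximated like this — thresholds, window size, and the single-state simplification are illustrative; the real `WakeDetector` adds burst capture and more states:

```python
import numpy as np
from collections import deque

class EnergyWake:
    """Fires when a window's RMS far exceeds an adaptive idle baseline (sketch)."""
    def __init__(self, ratio: float = 4.0, baseline_len: int = 20):
        self.ratio = ratio
        self.baseline = deque(maxlen=baseline_len)   # rolling idle-RMS history

    def process(self, window: np.ndarray) -> bool:
        rms = float(np.sqrt(np.mean(window.astype(np.float32) ** 2)))
        base = sum(self.baseline) / len(self.baseline) if self.baseline else rms
        if self.baseline and rms > self.ratio * max(base, 1.0):
            return True                  # loud burst relative to idle baseline
        self.baseline.append(rms)        # quiet window: update rolling baseline
        return False
```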

**`Voice.marcus_voice.VoiceModule`** — voice orchestrator. Drives the wake detector, verifies each fire with a lightweight Whisper decode (wake-word substring OR /s-/ phonetic match), records commands with a hysteretic VAD, trims pre-speech silence, transcribes via faster-whisper, fuzzy-normalises near-misses to canonical commands, and dispatches to the brain.

`__init__(audio_api, on_command=None, on_wake=None)`, `start()`, `stop()`, `is_running` property. Internal: `_get_fw()` lazy faster-whisper loader, `_read_mic_raw` / `_read_mic_gained`, `_record_command()` with adaptive VAD + pre-silence trim, `_transcribe(audio)` Whisper decode + garbage filter, `_transcribe_command(audio)` thin wrapper, `_normalize_command(text)` fuzzy-match to `COMMAND_VOCAB`, `_handle_wake()` / `_voice_loop()` / `_voice_loop_wake()` / `_voice_loop_always_on(gated)`, `_save_unk_wav(audio)` for post-mortem debugging.
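`_normalize_command` fuzzy-matches to `COMMAND_VOCAB` with a cutoff (`stt.command_vocab_cutoff`); stdlib `difflib` is one way to sketch it — the real matcher may use a different similarity measure, and this vocab is abridged:

```python
from difflib import get_close_matches

COMMAND_VOCAB = ["move forward", "turn left", "turn right", "stop"]  # abridged

def normalize_command(text: str, cutoff: float = 0.6):
    """Snap a noisy transcription to the closest canonical command, or None."""
    hits = get_close_matches(text.lower().strip(), COMMAND_VOCAB, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```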

---

## `Vision/`

| File | Public API |
|---|---|
| `marcus_yolo.py` | `start_yolo(raw_frame_ref, frame_lock)`, `yolo_sees(class, min_confidence)`, `yolo_count(class)`, `yolo_closest(class)`, `yolo_all_classes()`, `yolo_summary()`, `yolo_ppe_violations()`, `yolo_person_too_close(threshold)`, `yolo_is_running()`, `yolo_fps()`, `_resolve_device(requested)` + **class `Detection`** |
| `marcus_imgsearch.py` | **class `ImageSearch`** + prompt helpers `_build_compare_prompt`, `_build_single_prompt`, image utils `_load_image_b64`, `_numpy_to_b64`, `_resize_b64` |

**`Vision.marcus_yolo.Detection`** — a single detection's metadata:
`__init__(class_name, confidence, x1, y1, x2, y2, frame_w, frame_h)`, props `size_ratio`, `position`, `distance_estimate`, method `to_dict()`, `__repr__`.
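`size_ratio` and `position` are presumably pure bbox/frame geometry; a sketch under that assumption — these formulas are guesses, not read from `Vision/marcus_yolo.py`:

```python
class Detection:
    """Sketch of the geometry-derived properties (formulas assumed)."""
    def __init__(self, class_name, confidence, x1, y1, x2, y2, frame_w, frame_h):
        self.class_name, self.confidence = class_name, confidence
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2
        self.frame_w, self.frame_h = frame_w, frame_h

    @property
    def size_ratio(self) -> float:
        # fraction of the frame covered by the bounding box
        return ((self.x2 - self.x1) * (self.y2 - self.y1)) / (self.frame_w * self.frame_h)

    @property
    def position(self) -> str:
        # horizontal thirds: left / center / right
        cx = (self.x1 + self.x2) / 2 / self.frame_w
        return "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"
```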

**`Vision.marcus_imgsearch.ImageSearch`** — rotate-and-compare search:
`__init__(get_frame_fn, send_vel_fn, gradual_stop_fn, llava_fn, yolo_sees_fn, model)`, `search(ref_img_b64, hint, max_steps, direction, yolo_prefilter)`, `search_from_file(image_path, hint, max_steps, direction)`, `abort()`.

---

## `Navigation/`

| File | Public API |
|---|---|
| `goal_nav.py` | `navigate_to_goal(goal, max_steps)`; private `_goal_yolo_target`, `_extract_extra_condition`, `_verify_condition` |
| `patrol.py` | `patrol(duration_minutes, alert_callback)` |
| `marcus_odometry.py` | **class `Odometry`** |

**`Navigation.marcus_odometry.Odometry`** — ROS2 `/dog_odom` + dead-reckoning fallback:

- lifecycle: `__init__()`, `start(zmq_sock)`, `stop()`, `reset()`, `is_running()`
- pose: `get_position()` → `{x, y, heading, source}`, `get_distance_from_start()`, `status_str()`, `__repr__`
- movement: `walk_distance(meters, speed, direction)`, `turn_degrees(degrees, speed)`, `navigate_to(x, y, heading, speed)`, `return_to_start(speed)`, `patrol_route(waypoints, speed, loop)`
- internal: `_init_own_zmq`, `_reset_state`, `_try_start_ros2`, `_dead_reckoning_loop`, `_send_vel`, `_gradual_stop`, `_check_stale`, `_time_based_walk`, `_time_based_turn`
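The `_dead_reckoning_loop` fallback presumably integrates commanded body-frame velocity over time; a minimal sketch of that integration step (an assumption, not the module's actual code):

```python
import math

def integrate_pose(pose: dict, vx: float, vy: float, vyaw: float, dt: float) -> dict:
    """Advance a {x, y, heading} pose by body-frame velocities over dt seconds."""
    h = pose["heading"]
    return {
        # rotate body-frame velocity into the world frame, then integrate
        "x": pose["x"] + (vx * math.cos(h) - vy * math.sin(h)) * dt,
        "y": pose["y"] + (vx * math.sin(h) + vy * math.cos(h)) * dt,
        "heading": h + vyaw * dt,
    }
```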

---

## `Brain/`

| File | Public API |
|---|---|
| `marcus_brain.py` | `init_brain()`, `process_command(cmd)` → `{type, speak, action, elapsed}`, `get_brain_status()`, `shutdown()`, `run_terminal()`; private `_init_voice`, `_handle_llava`, `_handle_talk`, `_handle_search`, `_warmup_llava` |
| `command_parser.py` | `init_autonomous(auto_instance)`, `try_local_command(cmd)` (regex-table dispatcher); `_print_help`, `_print_examples` |
| `executor.py` | `execute(d)`, `execute_action(move, duration)`, `move_step(move, duration)`, `merge_actions(actions)`; `_obstacle_check` |
| `marcus_memory.py` | **class `Memory`** + utils `_read_json`, `_write_json`, `_sanitize_name`, `_fuzzy_match`, `_new_session_id` |

**`Brain.marcus_memory.Memory`** — places + sessions store, JSON-backed:

- places: `save_place(name, x, y, heading)`, `get_place(name)`, `delete_place(name)`, `list_places()`, `rename_place(old, new)`, `places_count()`
- sessions: `start_session()`, `end_session()`, `log_command(cmd, response, duration_s)`, `log_detection(class, pos, dist, x, y)`, `log_alert(type, detail)`, `get_last_command()`, `get_last_n_commands(n)`, `get_session_detections()`, `commands_count()`, `session_duration_str()`
- history: `last_session_summary()`, `previous_session_detections()`, `previous_session_places()`, `all_sessions()`
- internal: `_load_places`, `_start_autosave`, `_flush_session`, `_emergency_save`, `_write_summary`, `_prune_old_sessions`, `_get_previous_session_dir`
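A sketch of the JSON-backed places pattern the table implies — the file layout, path, and helper internals here are all assumptions:

```python
import json
from pathlib import Path

PLACES_FILE = Path("places.json")   # hypothetical path; the real layout may differ

def _read_json(path: Path) -> dict:
    return json.loads(path.read_text()) if path.exists() else {}

def _write_json(path: Path, data: dict) -> None:
    path.write_text(json.dumps(data, indent=2))

def save_place(name: str, x: float, y: float, heading: float) -> None:
    """Upsert a named pose into the JSON store."""
    places = _read_json(PLACES_FILE)
    places[name.lower()] = {"x": x, "y": y, "heading": heading}
    _write_json(PLACES_FILE, places)

def get_place(name: str):
    return _read_json(PLACES_FILE).get(name.lower())
```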

---

## `Autonomous/`

`marcus_autonomous.py` — **class `AutonomousMode`**: patrol-and-map state machine.

- `__init__(get_frame_fn, send_vel_fn, gradual_stop_fn, yolo_sees_fn, yolo_summary_fn, yolo_all_classes_fn, yolo_closest_fn, odom_fn, call_llava_fn, patrol_prompt, mem, models_dir)`
- lifecycle: `enable()`, `disable()`, `is_enabled()`, `status()`, `save_snapshot()`
- internal: `_explore_loop`, `_move_forward`, `_turn`, `_assess_scene`, `_create_map_dir`, `_save_observations`, `_save_path`, `_save_frame`, `_generate_summary`, `_save_session`, `_print_summary`

---

## `Server/` & `Bridge/`

| File | Public API |
|---|---|
| `Server/marcus_server.py` | `async handler(websocket)`, `async broadcast_frames()`, `async run_server(host, port)`, `main()`; helpers `_get_interface_ips`, `_check_lidar` |
| `Bridge/ros2_zmq_bridge.py` | **class `ROS2ZMQBridge`** (`_vel_cb`, `_cmd_cb`) + `main()` — standalone tool, not imported by Marcus |

---

## Suggested import surface for integration code

If you're writing glue on top of Marcus, the stable public surface is:

```python
# brain orchestration
from Brain.marcus_brain import init_brain, process_command, shutdown

# direct robot control (bypasses brain)
from API.zmq_api import init_zmq, send_vel, gradual_stop, send_cmd
from API.yolo_api import yolo_sees, yolo_summary, yolo_closest
from API.camera_api import start_camera, get_frame
from API.audio_api import AudioAPI  # .speak(text), .record(seconds)
from API.lidar_api import init_lidar, obstacle_ahead, get_slam_pose, stop_lidar
from API.memory_api import init_memory, log_cmd, log_detection, place_save, place_goto

# voice pipeline
from Voice.marcus_voice import VoiceModule
from Voice.builtin_mic import BuiltinMic
from Voice.builtin_tts import BuiltinTTS

# navigation
from Navigation.goal_nav import navigate_to_goal
from Navigation.patrol import patrol
from Navigation.marcus_odometry import Odometry

# autonomous mode
from Autonomous.marcus_autonomous import AutonomousMode
```

---

## Convention notes

- **All layers above Core must import from `API.*` only** (not directly from `Vision/`, `Navigation/`, `Voice/`). Enforced by convention, not the language.
- **Underscore prefix = private.** `_foo` is internal; don't import it outside the module unless you're the test harness.
- **Stub rebinding pattern** (e.g. `API.yolo_api`): module-level placeholders get replaced with real implementations inside `init_*()` on success. If init fails, callers keep getting the safe stub (e.g. `yolo_sees` returns `False`).

- **Error returns are consistent per layer**: the API layer returns `None` / empty dict / `False`; the Brain layer returns structured dicts (`{"type", "speak", "action", "elapsed"}`); no exception leaks to the terminal loop except at startup (`init_brain()` will raise to surface hardware issues like missing CUDA).
|