# Marcus — Function Inventory
**Robot persona:** Sanad (wake word + self-intro)
**Updated:** 2026-04-21
Every callable in the codebase, grouped by layer. Generated from the AST and kept in sync with the source. See `architecture.md` for where each module lives and `pipeline.md` for how they connect.
**Totals:** 25 importable modules · 73 top-level functions · 9 public classes.
---
## `run_marcus.py` — entrypoint
Script only. Prepends `PROJECT_ROOT` to `sys.path`, then calls `Brain.marcus_brain.run_terminal()` in `__main__`.
---
## `Core/` — foundation, no external deps
| File | Function | Purpose |
|---|---|---|
| `env_loader.py` | `_find_env_file()`, `_load_dotenv(path)` | find + parse `.env` into `os.environ`; exports `PROJECT_ROOT` |
| `config_loader.py` | `load_config(name)`, `config_path(relative)` | cached reader for `Config/config_{name}.json` |
| `log_backend.py` | `_rotating_handler(path)` + **class `Logs`** | custom logging engine; all handlers are `RotatingFileHandler` (5 MB per file, 3 backups) |
| `logger.py` | `get_logger(module)`, `log(msg, level, module)`, `log_and_print(msg, level, module)` | project-wide logging façade |
**`Core.log_backend.Logs`** methods:
`__init__(default_log_level, main_log_file)`, `_choose_fallback_log_dir`, `_normalize_log_name`, `_is_writable_path`, `_with_fallback`, `resolve_log_path`, `construct_path`, `log_to_file`, `LogEngine(folder, log_name)`, `LogsMessages(msg, type, folder, file)`, `print_and_log(...)`.
---
## `API/` — subsystem wrappers (Brain imports only from here)
| File | Public functions |
|---|---|
| `zmq_api.py` | `init_zmq()`, `get_socket()`, `send_vel(vx, vy, vyaw)`, `gradual_stop()`, `send_cmd(cmd)` |
| `camera_api.py` | `start_camera()`, `stop_camera()`, `get_frame()`, `get_frame_age()`, `get_raw_refs()`, `camera_loop()` |
| `llava_api.py` | `call_llava(prompt, img_b64, num_predict, use_history)`, `ask(command, img_b64)`, `ask_goal(goal, img_b64)`, `ask_talk(command, img_b64, facts)`, `ask_verify(target, condition, img_b64)`, `ask_patrol(img_b64)`, `remember_fact(fact)`, `add_to_history(user_msg, assistant_msg)`, `parse_json(raw)` |
| `yolo_api.py` | `init_yolo(raw_frame_ref, frame_lock)` + 8 stubs rebound on success: `yolo_sees`, `yolo_count`, `yolo_closest`, `yolo_summary`, `yolo_ppe_violations`, `yolo_person_too_close`, `yolo_all_classes`, `yolo_fps` |
| `odometry_api.py` | `init_odometry(zmq_sock)`, `get_position()` |
| `memory_api.py` | `init_memory()`, `log_cmd(cmd, response, duration)`, `log_detection(class_name, position, distance)`, `place_save(name)`, `place_goto(name)`, `places_list_str()` |
| `arm_api.py` | `do_arm(action)` — G1 GR00T stub |
| `imgsearch_api.py` | `init_imgsearch(get_frame_fn, send_vel_fn, gradual_stop_fn, llava_fn, yolo_sees_fn, model)`, `get_searcher()` |
| `audio_api.py` | **class `AudioAPI`** (see below) |
| `lidar_api.py` | `init_lidar()`, `obstacle_ahead(radius)`, `get_slam_pose()`, `get_nav_cmd()`, `get_loc_state()`, `get_safety_reasons()`, `get_lidar_status()`, `get_client()`, `stop_lidar()` |
**`API.audio_api.AudioAPI`** methods:
`speak(text, lang="en")`, `record(seconds)` → np.int16 array, `play_pcm(audio_16k)`, `save_recording(audio, name)`, properties `is_speaking`, `is_available`. Internal: `_init_sdk`, `_mute_mic`, `_unmute_mic`, `_resample`, `_play_pcm`, `_record_builtin`, `_record_parec`.
---
## `Voice/` — audio I/O + Gemini Live STT + TtsMaker
| File | Public API |
|---|---|
| `audio_io.py` | `_find_g1_local_ip()`, `_resample_int16`, `_as_int16_array`, abstract **classes `Mic`, `Speaker`**, concrete **classes `BuiltinMic`, `BuiltinSpeaker`**, **dataclass `AudioIO`** with `from_profile()` factory |
| `builtin_mic.py` | **class `BuiltinMic`** (subclass of `audio_io.BuiltinMic` that adds `read_seconds()` for `AudioAPI.record()`) |
| `builtin_tts.py` | **class `BuiltinTTS`** (used by `AudioAPI.speak()`) |
| `gemini_script.py` | module-level `_load_voice_cfg()`, `_audio_energy()`, **class `GeminiBrain`** |
| `turn_recorder.py` | **class `TurnRecorder`** |
| `marcus_voice.py` | module-level `WAKE_WORDS`, `COMMAND_VOCAB`, `GARBAGE_PATTERNS` (populated from config), helpers `_has_wake_word`, `_strip_wake_word`, `_strip_wake_word_once`, `_closest_command`, **class `VoiceModule`** |
**`Voice.audio_io.BuiltinMic`** — G1 UDP multicast mic (Sanad-pattern port):
`__init__(group, port, buf_max)`, `start()`, `stop()`, `read_chunk(num_bytes)`, `flush()`; internal `_recv_loop`.
**`Voice.audio_io.BuiltinSpeaker`** — streaming wrapper over `AudioClient.PlayStream` (built but idle in STT-only mode; TtsMaker owns the speaker):
`__init__(audio_client, app_name=None)`, `begin_stream()`, `send_chunk(pcm, source_rate)`, `wait_finish()`, `stop()`, properties `interrupted`, `total_sent_sec`. Internal `_stop_play_api()`.
**`Voice.audio_io.AudioIO`** — paired mic + speaker bundle:
`@classmethod from_profile(profile_id, *, audio_client=None) -> AudioIO`, `start()`, `stop()`. Only `"builtin"` profile supported (Anker/Hollyland USB profiles dropped).
**`Voice.builtin_tts.BuiltinTTS`** — wraps `AudioClient.TtsMaker`:
`__init__(audio_client, default_speaker_id=0)`, `speak(text, speaker_id=None, block=True)`.
**`Voice.gemini_script.GeminiBrain`** — Gemini Live STT-only brain (Sanad `gemini/script.py` port):
`__init__(audio_io, recorder, voice_name=None, system_prompt="", *, api_key, on_transcript=None, on_command=None)`, `start()`, `stop()`, `async run()`. Internal: `_thread_main()` runs an asyncio loop in a worker thread, `_build_config(types)` returns `LiveConnectConfig(response_modalities=["TEXT"], input_audio_transcription, system_instruction)`, `_send_mic_loop(session, types)` streams 32 ms PCM chunks, `_receive_loop(session)` extracts `input_transcription.text` → callbacks + `model_turn` text → log + recorder.
**`Voice.turn_recorder.TurnRecorder`** — per-turn WAV saver:
`__init__(enabled, out_dir, user_rate, robot_rate)`, `capture_user(pcm_bytes)`, `capture_robot(pcm_bytes)`, `add_user_text(text)`, `add_robot_text(text)`, `finish_turn() -> dict`. Internal: `_save_wav`, `_append_index`. In STT-only mode `<ts>_robot.wav` is never written (Gemini emits text, not audio).
**`Voice.marcus_voice.VoiceModule`** — voice orchestrator. Builds `AudioIO.from_profile("builtin", audio_client=ac)`, spawns `GeminiBrain` with `_on_gemini_transcript` (transcript log) and `_dispatch_gemini_command` (wake-word gate + fuzzy match → on_command callback) hooks. Forwards every "Sanad + X" transcript to Marcus's brain via the user-supplied `on_command` callback.
`__init__(audio_api, on_command=None, on_wake=None)`, `start()`, `stop()`, `flush_mic()`, `is_speaking` property. Internal: `_voice_loop` (calls `_voice_loop_gemini`), `_voice_loop_gemini` (assembles AudioIO + TurnRecorder + GeminiBrain), `_on_gemini_transcript(text)`, `_dispatch_gemini_command(text, lang)`, `_normalize_command(text)`. The `flush_mic()` hook is called by `Brain/marcus_brain._on_command` before AND after `audio_api.speak()` to prevent TtsMaker output from being transcribed back as user input.
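The flush-before-and-after pattern that `_on_command` applies around `speak()` can be sketched like this (the `flush_mic`/`speak` names come from the inventory; the wrapper function itself is illustrative):

```python
def speak_without_echo(voice, audio_api, text: str) -> None:
    """Speak a reply without the TTS output leaking back into STT.

    Flushing before speak() drops any audio buffered while the command
    was being processed; flushing after drops the robot's own voice as
    picked up by the mic during playback.
    """
    voice.flush_mic()
    audio_api.speak(text)
    voice.flush_mic()
```

Without the second flush, TtsMaker playback captured by the builtin mic would be transcribed by Gemini and could re-trigger the wake-word gate.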
---
## `Vision/`
| File | Public API |
|---|---|
| `marcus_yolo.py` | `start_yolo(raw_frame_ref, frame_lock)`, `yolo_sees(class, min_confidence)`, `yolo_count(class)`, `yolo_closest(class)`, `yolo_all_classes()`, `yolo_summary()`, `yolo_ppe_violations()`, `yolo_person_too_close(threshold)`, `yolo_is_running()`, `yolo_fps()`, `_resolve_device(requested)` + **class `Detection`** |
| `marcus_imgsearch.py` | **class `ImageSearch`** + prompt helpers `_build_compare_prompt`, `_build_single_prompt`, image utils `_load_image_b64`, `_numpy_to_b64`, `_resize_b64` |
**`Vision.marcus_yolo.Detection`** — a single detection's metadata:
`__init__(class_name, confidence, x1, y1, x2, y2, frame_w, frame_h)`, props `size_ratio`, `position`, `distance_estimate`, method `to_dict()`, `__repr__`.
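A minimal sketch of how the `Detection` properties above could be derived from the bounding box. The attribute and method names come from the inventory; the exact formulas (thirds for `position`, area-ratio buckets for `distance_estimate`) are assumptions:

```python
class Detection:
    """One YOLO detection with geometry-derived convenience properties."""

    def __init__(self, class_name, confidence, x1, y1, x2, y2, frame_w, frame_h):
        self.class_name = class_name
        self.confidence = confidence
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2
        self.frame_w, self.frame_h = frame_w, frame_h

    @property
    def size_ratio(self) -> float:
        """Box area as a fraction of the frame area."""
        return ((self.x2 - self.x1) * (self.y2 - self.y1)) / (self.frame_w * self.frame_h)

    @property
    def position(self) -> str:
        """Coarse horizontal position: left / center / right third of the frame."""
        cx = (self.x1 + self.x2) / 2
        if cx < self.frame_w / 3:
            return "left"
        if cx > 2 * self.frame_w / 3:
            return "right"
        return "center"

    @property
    def distance_estimate(self) -> str:
        """Rough range bucket from apparent size (bigger box = closer)."""
        if self.size_ratio > 0.30:
            return "very close"
        if self.size_ratio > 0.10:
            return "close"
        return "far"

    def to_dict(self) -> dict:
        return {
            "class": self.class_name,
            "confidence": self.confidence,
            "position": self.position,
            "distance": self.distance_estimate,
        }

    def __repr__(self):
        return f"Detection({self.class_name} {self.confidence:.2f} @ {self.position})"
```

Monocular size-to-distance is inherently coarse, which is presumably why the inventory exposes buckets rather than metres here, with LiDAR (`API.lidar_api`) covering the precise case.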
**`Vision.marcus_imgsearch.ImageSearch`** — rotate-and-compare search:
`__init__(get_frame_fn, send_vel_fn, gradual_stop_fn, llava_fn, yolo_sees_fn, model)`, `search(ref_img_b64, hint, max_steps, direction, yolo_prefilter)`, `search_from_file(image_path, hint, max_steps, direction)`, `abort()`.
---
## `Navigation/`
| File | Public API |
|---|---|
| `goal_nav.py` | `navigate_to_goal(goal, max_steps)`; private `_goal_yolo_target`, `_extract_extra_condition`, `_verify_condition` |
| `patrol.py` | `patrol(duration_minutes, alert_callback)` |
| `marcus_odometry.py` | **class `Odometry`** |
**`Navigation.marcus_odometry.Odometry`** — ROS2 `/dog_odom` + dead-reckoning fallback:
- lifecycle: `__init__()`, `start(zmq_sock)`, `stop()`, `reset()`, `is_running()`
- pose: `get_position()` → `{x, y, heading, source}`, `get_distance_from_start()`, `status_str()`, `__repr__`
- movement: `walk_distance(meters, speed, direction)`, `turn_degrees(degrees, speed)`, `navigate_to(x, y, heading, speed)`, `return_to_start(speed)`, `patrol_route(waypoints, speed, loop)`
- internal: `_init_own_zmq`, `_reset_state`, `_try_start_ros2`, `_dead_reckoning_loop`, `_send_vel`, `_gradual_stop`, `_check_stale`, `_time_based_walk`, `_time_based_turn`
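The dead-reckoning fallback named in `_dead_reckoning_loop` amounts to integrating the last commanded body-frame velocities over time. A minimal sketch of one integration step, assuming the textbook planar model rather than the module's actual code:

```python
import math


def integrate_pose(x: float, y: float, heading: float,
                   vx: float, vy: float, vyaw: float, dt: float):
    """Advance a planar pose by body-frame velocities over dt seconds.

    heading is in radians; vx is forward, vy is lateral (robot frame),
    vyaw is the turn rate. The body-to-world rotation uses the
    pre-step heading, then heading is advanced and wrapped.
    """
    x += (vx * math.cos(heading) - vy * math.sin(heading)) * dt
    y += (vx * math.sin(heading) + vy * math.cos(heading)) * dt
    heading = (heading + vyaw * dt) % (2 * math.pi)
    return x, y, heading
```

Velocity integration drifts without correction, which matches the inventory's framing of it as a *fallback* (`source` in `get_position()` distinguishing ROS2 odometry from dead reckoning) and the presence of `_check_stale`.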
---
## `Brain/`
| File | Public API |
|---|---|
| `marcus_brain.py` | `init_brain()`, `process_command(cmd)` → `{type, speak, action, elapsed}`, `get_brain_status()`, `shutdown()`, `run_terminal()`; private `_init_voice`, `_handle_llava`, `_handle_talk`, `_handle_search`, `_warmup_llava` |
| `command_parser.py` | `init_autonomous(auto_instance)`, `try_local_command(cmd)` (regex-table dispatcher); `_print_help`, `_print_examples` |
| `executor.py` | `execute(d)`, `execute_action(move, duration)`, `move_step(move, duration)`, `merge_actions(actions)`; `_obstacle_check` |
| `marcus_memory.py` | **class `Memory`** + utils `_read_json`, `_write_json`, `_sanitize_name`, `_fuzzy_match`, `_new_session_id` |
**`Brain.marcus_memory.Memory`** — places + sessions store, JSON-backed:
- places: `save_place(name, x, y, heading)`, `get_place(name)`, `delete_place(name)`, `list_places()`, `rename_place(old, new)`, `places_count()`
- sessions: `start_session()`, `end_session()`, `log_command(cmd, response, duration_s)`, `log_detection(class, pos, dist, x, y)`, `log_alert(type, detail)`, `get_last_command()`, `get_last_n_commands(n)`, `get_session_detections()`, `commands_count()`, `session_duration_str()`
- history: `last_session_summary()`, `previous_session_detections()`, `previous_session_places()`, `all_sessions()`
- internal: `_load_places`, `_start_autosave`, `_flush_session`, `_emergency_save`, `_write_summary`, `_prune_old_sessions`, `_get_previous_session_dir`
---
## `Autonomous/`
`marcus_autonomous.py` — **class `AutonomousMode`**: patrol-and-map state machine.
- `__init__(get_frame_fn, send_vel_fn, gradual_stop_fn, yolo_sees_fn, yolo_summary_fn, yolo_all_classes_fn, yolo_closest_fn, odom_fn, call_llava_fn, patrol_prompt, mem, models_dir)`
- lifecycle: `enable()`, `disable()`, `is_enabled()`, `status()`, `save_snapshot()`
- internal: `_explore_loop`, `_move_forward`, `_turn`, `_assess_scene`, `_create_map_dir`, `_save_observations`, `_save_path`, `_save_frame`, `_generate_summary`, `_save_session`, `_print_summary`
---
## `Server/` & `Bridge/`
| File | Public API |
|---|---|
| `Server/marcus_server.py` | `async handler(websocket)`, `async broadcast_frames()`, `async run_server(host, port)`, `main()`; helpers `_get_interface_ips`, `_check_lidar` |
| `Bridge/ros2_zmq_bridge.py` | **class `ROS2ZMQBridge`** (`_vel_cb`, `_cmd_cb`) + `main()` — standalone tool, not imported by Marcus |
---
## Suggested import surface for integration code
If you're writing glue on top of Marcus, the stable public surface is:
```python
# brain orchestration
from Brain.marcus_brain import init_brain, process_command, shutdown
# direct robot control (bypasses brain)
from API.zmq_api import init_zmq, send_vel, gradual_stop, send_cmd
from API.yolo_api import yolo_sees, yolo_summary, yolo_closest
from API.camera_api import start_camera, get_frame
from API.audio_api import AudioAPI # .speak(text), .record(seconds)
from API.lidar_api import init_lidar, obstacle_ahead, get_slam_pose, stop_lidar
from API.memory_api import init_memory, log_cmd, log_detection, place_save, place_goto
# voice pipeline
from Voice.marcus_voice import VoiceModule
from Voice.audio_io import AudioIO, BuiltinMic, BuiltinSpeaker
from Voice.builtin_tts import BuiltinTTS # used by AudioAPI.speak()
from Voice.gemini_script import GeminiBrain
from Voice.turn_recorder import TurnRecorder
# navigation
from Navigation.goal_nav import navigate_to_goal
from Navigation.patrol import patrol
from Navigation.marcus_odometry import Odometry
# autonomous mode
from Autonomous.marcus_autonomous import AutonomousMode
```
---
## Convention notes
- **All layers above Core must import from `API.*` only** (not directly from `Vision/`, `Navigation/`, `Voice/`). Enforced by convention, not the language.
- **Underscore prefix = private.** `_foo` is internal; don't import it outside the module unless you're the test harness.
- **Stub rebinding pattern** (e.g. `API.yolo_api`): module-level placeholders get replaced with real implementations inside `init_*()` on success. If init fails, callers keep getting the safe stub (e.g. `yolo_sees` returns `False`).
- **Error returns are consistent per layer**: API layer returns `None` / empty dict / `False`; Brain layer returns structured dicts (`{"type","speak","action","elapsed"}`); no exception leaks to the terminal loop except at startup (`init_brain()` will raise to surface hardware issues like missing CUDA).