Marcus — Function Inventory

Robot persona: Sanad (wake word + self-intro) · Updated: 2026-04-21

Every callable in the codebase, grouped by layer. Generated from AST, kept in sync with the source. See architecture.md for where each module lives and pipeline.md for how they connect.

Totals: 25 importable modules · 73 top-level functions · 9 public classes.


run_marcus.py — entrypoint

Script only. Prepends PROJECT_ROOT to sys.path, then calls Brain.marcus_brain.run_terminal() in __main__.


Core/ — foundation, no external deps

| File | Function | Purpose |
|---|---|---|
| env_loader.py | _find_env_file(), _load_dotenv(path) | find + parse .env into os.environ; exports PROJECT_ROOT |
| config_loader.py | load_config(name), config_path(relative) | cached reader for Config/config_{name}.json |
| log_backend.py | _rotating_handler(path) + class Logs | custom logging engine; all handlers are RotatingFileHandler (5 MB × 3) |
| logger.py | get_logger(module), log(msg, level, module), log_and_print(msg, level, module) | project-wide logging façade |
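
The "cached reader" row can be illustrated with a minimal sketch. It assumes the cache is a plain in-process memo (here `functools.lru_cache`) and that `PROJECT_ROOT` is a `pathlib.Path`; neither detail is stated in this inventory. The `root` parameter is a hypothetical test hook and is not part of the documented `load_config(name)` signature.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Optional

# Assumption: the real PROJECT_ROOT is exported by Core.env_loader.
PROJECT_ROOT = Path.cwd()


def config_path(relative: str, root: Optional[Path] = None) -> Path:
    """Resolve a path relative to the project root (root is a test hook)."""
    return (root or PROJECT_ROOT) / relative


@lru_cache(maxsize=None)
def load_config(name: str, root: Optional[Path] = None) -> dict:
    """Parse Config/config_{name}.json once; repeat calls return the cache."""
    path = config_path(f"Config/config_{name}.json", root)
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With this kind of cache every caller receives the same dict object, so a caller that mutates it changes what later callers see; copying before modification avoids that.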

Core.log_backend.Logs methods: __init__(default_log_level, main_log_file), _choose_fallback_log_dir, _normalize_log_name, _is_writable_path, _with_fallback, resolve_log_path, construct_path, log_to_file, LogEngine(folder, log_name), LogsMessages(msg, type, folder, file), print_and_log(...).


API/ — subsystem wrappers (Brain imports only from here)

| File | Public functions |
|---|---|
| zmq_api.py | init_zmq(), get_socket(), send_vel(vx, vy, vyaw), gradual_stop(), send_cmd(cmd) |
| camera_api.py | start_camera(), stop_camera(), get_frame(), get_frame_age(), get_raw_refs(), camera_loop() |
| llava_api.py | call_llava(prompt, img_b64, num_predict, use_history), ask(command, img_b64), ask_goal(goal, img_b64), ask_talk(command, img_b64, facts), ask_verify(target, condition, img_b64), ask_patrol(img_b64), remember_fact(fact), add_to_history(user_msg, assistant_msg), parse_json(raw) |
| yolo_api.py | init_yolo(raw_frame_ref, frame_lock) + 8 stubs rebound on success: yolo_sees, yolo_count, yolo_closest, yolo_summary, yolo_ppe_violations, yolo_person_too_close, yolo_all_classes, yolo_fps |
| odometry_api.py | init_odometry(zmq_sock), get_position() |
| memory_api.py | init_memory(), log_cmd(cmd, response, duration), log_detection(class_name, position, distance), place_save(name), place_goto(name), places_list_str() |
| arm_api.py | do_arm(action): G1 GR00T stub |
| imgsearch_api.py | init_imgsearch(get_frame_fn, send_vel_fn, gradual_stop_fn, llava_fn, yolo_sees_fn, model), get_searcher() |
| audio_api.py | class AudioAPI (see below) |
| lidar_api.py | init_lidar(), obstacle_ahead(radius), get_slam_pose(), get_nav_cmd(), get_loc_state(), get_safety_reasons(), get_lidar_status(), get_client(), stop_lidar() |

API.audio_api.AudioAPI methods: speak(text, lang="en"), record(seconds) → np.int16 array, play_pcm(audio_16k), save_recording(audio, name), properties is_speaking, is_available. Internal: _init_sdk, _mute_mic, _unmute_mic, _resample, _play_pcm, _record_builtin, _record_parec.


Voice/ — audio I/O + Gemini Live STT + TtsMaker

| File | Public API |
|---|---|
| audio_io.py | _find_g1_local_ip(), _resample_int16, _as_int16_array, abstract classes Mic, Speaker, concrete classes BuiltinMic, BuiltinSpeaker, dataclass AudioIO with from_profile() factory |
| builtin_mic.py | class BuiltinMic (subclass of audio_io.BuiltinMic that adds read_seconds() for AudioAPI.record()) |
| builtin_tts.py | class BuiltinTTS (used by AudioAPI.speak()) |
| gemini_script.py | module-level _load_voice_cfg(), _audio_energy(), class GeminiBrain |
| turn_recorder.py | class TurnRecorder |
| marcus_voice.py | module-level WAKE_WORDS, COMMAND_VOCAB, GARBAGE_PATTERNS (populated from config), helpers _has_wake_word, _strip_wake_word, _strip_wake_word_once, _closest_command, class VoiceModule |

Voice.audio_io.BuiltinMic — G1 UDP multicast mic (Sanad-pattern port): __init__(group, port, buf_max), start(), stop(), read_chunk(num_bytes), flush(); internal _recv_loop.
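
The read_chunk / flush pair implies a buffer sitting between the receive loop and the consumer. The sketch below shows one way such a buffer might behave. It assumes buf_max is a cap in bytes and that the oldest audio is dropped on overflow; the inventory specifies neither. `MicBuffer` and `feed` are hypothetical names, and the UDP socket itself is left out.

```python
import threading


class MicBuffer:
    """Thread-safe byte buffer that drops the oldest audio when full."""

    def __init__(self, buf_max: int) -> None:
        self._buf = bytearray()
        self._buf_max = buf_max
        self._lock = threading.Lock()

    def feed(self, data: bytes) -> None:
        """Called by the receive loop for every incoming payload."""
        with self._lock:
            self._buf.extend(data)
            overflow = len(self._buf) - self._buf_max
            if overflow > 0:
                del self._buf[:overflow]  # keep only the newest buf_max bytes

    def read_chunk(self, num_bytes: int) -> bytes:
        """Return exactly num_bytes, or b"" if not enough audio is buffered."""
        with self._lock:
            if len(self._buf) < num_bytes:
                return b""
            chunk = bytes(self._buf[:num_bytes])
            del self._buf[:num_bytes]
            return chunk

    def flush(self) -> None:
        """Discard everything buffered so far."""
        with self._lock:
            self._buf.clear()
```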

Voice.audio_io.BuiltinSpeaker — streaming wrapper over AudioClient.PlayStream (built but idle in STT-only mode; TtsMaker owns the speaker): __init__(audio_client, app_name=None), begin_stream(), send_chunk(pcm, source_rate), wait_finish(), stop(), properties interrupted, total_sent_sec. Internal _stop_play_api().

Voice.audio_io.AudioIO — paired mic + speaker bundle: @classmethod from_profile(profile_id, *, audio_client=None) -> AudioIO, start(), stop(). Only "builtin" profile supported (Anker/Hollyland USB profiles dropped).

Voice.builtin_tts.BuiltinTTS — wraps AudioClient.TtsMaker: __init__(audio_client, default_speaker_id=0), speak(text, speaker_id=None, block=True).

Voice.gemini_script.GeminiBrain — Gemini Live STT-only brain (Sanad gemini/script.py port): __init__(audio_io, recorder, voice_name=None, system_prompt="", *, api_key, on_transcript=None, on_command=None), start(), stop(), async run(). Internal: _thread_main() runs an asyncio loop in a worker thread, _build_config(types) returns LiveConnectConfig(response_modalities=["TEXT"], input_audio_transcription, system_instruction), _send_mic_loop(session, types) streams 32 ms PCM chunks, _receive_loop(session) extracts input_transcription.text → callbacks + model_turn text → log + recorder.
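
To make the "32 ms PCM chunks" figure concrete, the arithmetic can be sketched as follows. The 16 kHz mono 16-bit format is an assumption (the inventory mentions int16 audio and a 16 kHz playback path but does not state the streaming format), and both helper names are hypothetical.

```python
def chunk_size_bytes(sample_rate: int = 16_000, chunk_ms: int = 32,
                     sample_width: int = 2, channels: int = 1) -> int:
    """Bytes of PCM covering chunk_ms of audio."""
    samples = sample_rate * chunk_ms // 1000
    return samples * sample_width * channels


def iter_chunks(pcm: bytes, size: int):
    """Yield consecutive size-byte slices; a short tail is dropped."""
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]
```

Under those assumptions one chunk is 512 samples, or 1024 bytes.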

Voice.turn_recorder.TurnRecorder — per-turn WAV saver: __init__(enabled, out_dir, user_rate, robot_rate), capture_user(pcm_bytes), capture_robot(pcm_bytes), add_user_text(text), add_robot_text(text), finish_turn() -> dict. Internal: _save_wav, _append_index. In STT-only mode <ts>_robot.wav is never written (Gemini emits text, not audio).
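
A per-turn WAV writer can be sketched with the standard `wave` module. This version assumes mono 16-bit PCM and skips empty captures, which would also explain why no robot file appears when no robot audio was captured. The function name and the return value are illustrative only.

```python
import wave
from pathlib import Path


def save_wav(path: Path, pcm_bytes: bytes, rate: int) -> bool:
    """Write mono 16-bit PCM to path; return False for empty captures."""
    if not pcm_bytes:
        return False
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)      # assumption: mono
        wf.setsampwidth(2)      # assumption: 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(pcm_bytes)
    return True
```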

Voice.marcus_voice.VoiceModule: voice orchestrator. Builds AudioIO.from_profile("builtin", audio_client=ac) and spawns GeminiBrain with two hooks: _on_gemini_transcript (transcript log) and _dispatch_gemini_command (wake-word gate + fuzzy match → on_command callback). Every "Sanad + X" transcript is forwarded to Marcus's brain via the user-supplied on_command callback.

  • public: __init__(audio_api, on_command=None, on_wake=None), start(), stop(), flush_mic(), is_speaking property
  • internal: _voice_loop (calls _voice_loop_gemini), _voice_loop_gemini (assembles AudioIO + TurnRecorder + GeminiBrain), _on_gemini_transcript(text), _dispatch_gemini_command(text, lang), _normalize_command(text)
  • echo guard: Brain/marcus_brain._on_command calls flush_mic() before AND after audio_api.speak() so that TtsMaker output is not transcribed back as user input
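
The wake-word gate and fuzzy match can be sketched as below. This is an illustration under stated assumptions, not the module's actual logic: the word lists are placeholders (the real ones come from config), `difflib` stands in for whatever matcher the code uses, and `dispatch` and the un-prefixed helper names are hypothetical.

```python
import difflib
import re

# Hypothetical values; the real lists are populated from config.
WAKE_WORDS = ("sanad",)
COMMAND_VOCAB = ("move forward", "turn left", "turn right", "stop")


def has_wake_word(text: str) -> bool:
    words = re.findall(r"[a-z']+", text.lower())
    return any(w in WAKE_WORDS for w in words)


def strip_wake_word(text: str) -> str:
    """Remove the first wake word plus any punctuation right after it."""
    pattern = r"\b(?:%s)\b[\s,.:!?]*" % "|".join(map(re.escape, WAKE_WORDS))
    return re.sub(pattern, "", text, count=1, flags=re.IGNORECASE).strip()


def closest_command(text: str, cutoff: float = 0.6):
    """Return the best fuzzy match from COMMAND_VOCAB, or None."""
    hits = difflib.get_close_matches(text.lower(), COMMAND_VOCAB,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None


def dispatch(text: str, on_command) -> bool:
    """Gate on the wake word, normalize, then forward to on_command."""
    if not has_wake_word(text):
        return False
    command = strip_wake_word(text)
    on_command(closest_command(command) or command)
    return True
```

Transcripts without the wake word are dropped before any matching happens, so ambient speech never reaches the brain callback.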


Vision/

| File | Public API |
|---|---|
| marcus_yolo.py | start_yolo(raw_frame_ref, frame_lock), yolo_sees(class, min_confidence), yolo_count(class), yolo_closest(class), yolo_all_classes(), yolo_summary(), yolo_ppe_violations(), yolo_person_too_close(threshold), yolo_is_running(), yolo_fps(), _resolve_device(requested) + class Detection |
| marcus_imgsearch.py | class ImageSearch + prompt helpers _build_compare_prompt, _build_single_prompt, image utils _load_image_b64, _numpy_to_b64, _resize_b64 |

Vision.marcus_yolo.Detection — a single detection's metadata: __init__(class_name, confidence, x1, y1, x2, y2, frame_w, frame_h), props size_ratio, position, distance_estimate, method to_dict(), __repr__.
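
As a rough illustration of what those properties might compute, here is a sketch with the same constructor fields. All formulas and thresholds are assumptions made for the example: size_ratio as box area over frame area, position as the horizontal third containing the box centre, and distance_estimate as a coarse bucket derived from size_ratio. The real class may differ on every point.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    class_name: str
    confidence: float
    x1: int
    y1: int
    x2: int
    y2: int
    frame_w: int
    frame_h: int

    @property
    def size_ratio(self) -> float:
        """Fraction of the frame area covered by the box."""
        box = max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)
        return box / float(self.frame_w * self.frame_h)

    @property
    def position(self) -> str:
        """Horizontal third of the frame containing the box centre."""
        cx = (self.x1 + self.x2) / 2 / self.frame_w
        return "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"

    @property
    def distance_estimate(self) -> str:
        """Coarse range bucket from box size (thresholds are made up)."""
        r = self.size_ratio
        return "close" if r > 0.30 else "medium" if r > 0.08 else "far"

    def to_dict(self) -> dict:
        return {"class": self.class_name,
                "confidence": round(self.confidence, 2),
                "position": self.position,
                "distance": self.distance_estimate,
                "size_ratio": round(self.size_ratio, 3)}
```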

Vision.marcus_imgsearch.ImageSearch — rotate-and-compare search: __init__(get_frame_fn, send_vel_fn, gradual_stop_fn, llava_fn, yolo_sees_fn, model), search(ref_img_b64, hint, max_steps, direction, yolo_prefilter), search_from_file(image_path, hint, max_steps, direction), abort().
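
The rotate-and-compare idea can be sketched with injected callables, mirroring the constructor's dependency-injection style. In this sketch `match_fn` is a hypothetical stand-in for the combined YOLO prefilter and LLaVA comparison, and the turn rate, step time, and "positive vyaw turns left" sign convention are all assumptions.

```python
import time


def rotate_and_compare(get_frame_fn, send_vel_fn, gradual_stop_fn, match_fn,
                       max_steps=12, direction="left",
                       turn_rate=0.4, step_s=0.5, sleep_fn=time.sleep):
    """Turn in place step by step until match_fn(frame) is truthy."""
    vyaw = turn_rate if direction == "left" else -turn_rate
    for step in range(max_steps):
        frame = get_frame_fn()
        if frame is not None and match_fn(frame):
            gradual_stop_fn()
            return {"found": True, "steps": step}
        send_vel_fn(0.0, 0.0, vyaw)   # rotate only: vx = vy = 0
        sleep_fn(step_s)
        gradual_stop_fn()             # settle before grabbing the next frame
    return {"found": False, "steps": max_steps}
```

Passing the motion and vision functions in as arguments keeps the search loop testable without hardware, since fakes can replace every callable.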


Navigation/

| File | Public API |
|---|---|
| goal_nav.py | navigate_to_goal(goal, max_steps); private _goal_yolo_target, _extract_extra_condition, _verify_condition |
| patrol.py | patrol(duration_minutes, alert_callback) |
| marcus_odometry.py | class Odometry |

Navigation.marcus_odometry.Odometry — ROS2 /dog_odom + dead-reckoning fallback:

  • lifecycle: __init__(), start(zmq_sock), stop(), reset(), is_running()
  • pose: get_position() → {x, y, heading, source}, get_distance_from_start(), status_str(), __repr__
  • movement: walk_distance(meters, speed, direction), turn_degrees(degrees, speed), navigate_to(x, y, heading, speed), return_to_start(speed), patrol_route(waypoints, speed, loop)
  • internal: _init_own_zmq, _reset_state, _try_start_ros2, _dead_reckoning_loop, _send_vel, _gradual_stop, _check_stale, _time_based_walk, _time_based_turn
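
The dead-reckoning fallback amounts to integrating the commanded velocity over time. A minimal sketch of one integration step follows. It assumes body-frame velocities (vx forward, vy left, in m/s), vyaw in rad/s, and heading stored in degrees; the inventory does not state these units, and `integrate_pose` is a hypothetical name.

```python
import math


def integrate_pose(x, y, heading_deg, vx, vy, vyaw, dt):
    """Advance a 2-D pose by one time step of body-frame velocity."""
    h = math.radians(heading_deg)
    # Rotate the body-frame velocity into the world frame, then integrate.
    x += (vx * math.cos(h) - vy * math.sin(h)) * dt
    y += (vx * math.sin(h) + vy * math.cos(h)) * dt
    heading_deg = (heading_deg + math.degrees(vyaw * dt)) % 360.0
    return x, y, heading_deg
```

Because every step adds a little error, an estimate like this drifts over time, which is presumably why a source field is reported alongside the pose.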

Brain/

| File | Public API |
|---|---|
| marcus_brain.py | init_brain(), process_command(cmd) → {type, speak, action, elapsed}, get_brain_status(), shutdown(), run_terminal(); private _init_voice, _handle_llava, _handle_talk, _handle_search, _warmup_llava |
| command_parser.py | init_autonomous(auto_instance), try_local_command(cmd) (regex-table dispatcher); _print_help, _print_examples |
| executor.py | execute(d), execute_action(move, duration), move_step(move, duration), merge_actions(actions); _obstacle_check |
| marcus_memory.py | class Memory + utils _read_json, _write_json, _sanitize_name, _fuzzy_match, _new_session_id |
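
A regex-table dispatcher of the kind attributed to try_local_command can be sketched as follows. The patterns, handler names, and returned dicts are all hypothetical. The sketch also assumes that returning None means "not handled locally", so that the caller can fall back to the LLaVA path.

```python
import re


def _move(direction, meters=None):
    return {"action": "move", "direction": direction,
            "meters": float(meters) if meters else 1.0}


def _turn(direction):
    return {"action": "turn", "direction": direction}


def _stop():
    return {"action": "stop"}


# (pattern, handler) pairs, tried in order; the first full match wins.
COMMAND_TABLE = [
    (re.compile(r"(?:go|move|walk)\s+(forward|backward)(?:\s+(\d+(?:\.\d+)?))?"),
     _move),
    (re.compile(r"turn\s+(left|right)"), _turn),
    (re.compile(r"stop|halt"), _stop),
]


def try_local_command(cmd: str):
    """Return a handler result, or None if no local pattern matches."""
    text = cmd.strip().lower()
    for pattern, handler in COMMAND_TABLE:
        match = pattern.fullmatch(text)
        if match:
            return handler(*match.groups())
    return None
```

Adding a command then means appending one row to the table instead of growing an if/elif chain.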

Brain.marcus_memory.Memory — places + sessions store, JSON-backed:

  • places: save_place(name, x, y, heading), get_place(name), delete_place(name), list_places(), rename_place(old, new), places_count()
  • sessions: start_session(), end_session(), log_command(cmd, response, duration_s), log_detection(class, pos, dist, x, y), log_alert(type, detail), get_last_command(), get_last_n_commands(n), get_session_detections(), commands_count(), session_duration_str()
  • history: last_session_summary(), previous_session_detections(), previous_session_places(), all_sessions()
  • internal: _load_places, _start_autosave, _flush_session, _emergency_save, _write_summary, _prune_old_sessions, _get_previous_session_dir
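
The places half of such a store can be sketched in a few lines. This is a simplified stand-in, not the real Memory class: the name sanitizing rule, the write-to-temp-then-rename save, and the `difflib` fuzzy lookup with a 0.7 cutoff are assumptions, and `Places` is a hypothetical name.

```python
import difflib
import json
import os
import re
import tempfile


def sanitize_name(name: str) -> str:
    """Lower-case and collapse anything non-alphanumeric to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def write_json(path: str, data: dict) -> None:
    """Write atomically: dump to a temp file, then rename over the target."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp, path)


class Places:
    def __init__(self, path: str) -> None:
        self.path = path
        self.places = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.places = json.load(fh)

    def save_place(self, name, x, y, heading) -> None:
        self.places[sanitize_name(name)] = {"x": x, "y": y, "heading": heading}
        write_json(self.path, self.places)

    def get_place(self, name):
        """Exact or near-miss lookup; None when nothing is close enough."""
        hits = difflib.get_close_matches(sanitize_name(name),
                                         list(self.places), n=1, cutoff=0.7)
        return self.places[hits[0]] if hits else None
```

Fuzzy lookup matters for a voice-driven robot because transcribed place names rarely match the saved spelling exactly.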

Autonomous/

marcus_autonomous.py: class AutonomousMode, a patrol-and-map state machine.

  • __init__(get_frame_fn, send_vel_fn, gradual_stop_fn, yolo_sees_fn, yolo_summary_fn, yolo_all_classes_fn, yolo_closest_fn, odom_fn, call_llava_fn, patrol_prompt, mem, models_dir)
  • lifecycle: enable(), disable(), is_enabled(), status(), save_snapshot()
  • internal: _explore_loop, _move_forward, _turn, _assess_scene, _create_map_dir, _save_observations, _save_path, _save_frame, _generate_summary, _save_session, _print_summary

Server/ & Bridge/

| File | Public API |
|---|---|
| Server/marcus_server.py | async handler(websocket), async broadcast_frames(), async run_server(host, port), main(); helpers _get_interface_ips, _check_lidar |
| Bridge/ros2_zmq_bridge.py | class ROS2ZMQBridge (_vel_cb, _cmd_cb) + main(); standalone tool, not imported by Marcus |

Suggested import surface for integration code

If you're writing glue on top of Marcus, the stable public surface is:

```python
# brain orchestration
from Brain.marcus_brain import init_brain, process_command, shutdown

# direct robot control (bypasses brain)
from API.zmq_api     import init_zmq, send_vel, gradual_stop, send_cmd
from API.yolo_api    import yolo_sees, yolo_summary, yolo_closest
from API.camera_api  import start_camera, get_frame
from API.audio_api   import AudioAPI            # .speak(text), .record(seconds)
from API.lidar_api   import init_lidar, obstacle_ahead, get_slam_pose, stop_lidar
from API.memory_api  import init_memory, log_cmd, log_detection, place_save, place_goto

# voice pipeline
from Voice.marcus_voice  import VoiceModule
from Voice.audio_io      import AudioIO, BuiltinMic, BuiltinSpeaker
from Voice.builtin_tts   import BuiltinTTS         # used by AudioAPI.speak()
from Voice.gemini_script import GeminiBrain
from Voice.turn_recorder import TurnRecorder

# navigation
from Navigation.goal_nav         import navigate_to_goal
from Navigation.patrol           import patrol
from Navigation.marcus_odometry  import Odometry

# autonomous mode
from Autonomous.marcus_autonomous import AutonomousMode
```

Convention notes

  • All layers above Core must import from API.* only (not directly from Vision/, Navigation/, Voice/). Enforced by convention, not the language.
  • Underscore prefix = private. _foo is internal; don't import it outside the module unless you're the test harness.
  • Stub rebinding pattern (e.g. API.yolo_api): module-level placeholders get replaced with real implementations inside init_*() on success. If init fails, callers keep getting the safe stub (e.g. yolo_sees returns False).
  • Error returns are consistent per layer: API layer returns None / empty dict / False; Brain layer returns structured dicts ({"type","speak","action","elapsed"}); no exception leaks to the terminal loop except at startup (init_brain() will raise to surface hardware issues like missing CUDA).
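
The stub rebinding pattern from the notes above can be shown in a single-module sketch. Here `load_backend` and `person_visible` are hypothetical; the real init_yolo takes (raw_frame_ref, frame_lock) and rebinds eight names, not one.

```python
def yolo_sees(class_name: str, min_confidence: float = 0.5) -> bool:
    """Safe stub: until init succeeds, nothing is ever 'seen'."""
    return False


def init_yolo(load_backend) -> bool:
    """Try to load the real detector; rebind the stub only on success.

    load_backend is a hypothetical callable that returns an object with a
    yolo_sees(class_name, min_confidence) method, or raises on failure.
    """
    global yolo_sees
    try:
        backend = load_backend()
    except Exception:
        return False          # callers keep getting the stub
    yolo_sees = backend.yolo_sees
    return True


def person_visible() -> bool:
    # Looks up the module-level name at call time, so it sees the rebinding.
    return yolo_sees("person")
```

One consequence of rebinding by name in Python: `from module import yolo_sees` executed before init copies a reference to the stub, and that reference is not updated later. Code that needs the live implementation either imports after init or accesses the function as an attribute of the module.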