Marcus — System Architecture

Project: Marcus | YS Lootah Technology | Hardware: Unitree G1 EDU Humanoid (29 DOF) + Jetson Orin NX (16GB) | Updated: 2026-04-06


Overview

Marcus is a fully offline humanoid robot AI system. The brain runs on Jetson Orin NX with no cloud dependencies. It uses vision-language models (Qwen2.5-VL via Ollama) for understanding commands, YOLO for real-time object detection, dead reckoning for position tracking, and persistent memory across sessions.

Two operating modes:

  • Terminal mode (run_marcus.py) — direct keyboard control on the Jetson
  • Server mode (Server/marcus_server.py) — WebSocket server allowing remote CLI or GUI clients

Both modes use the same brain — identical command processing, same YOLO, same memory, same movement control.


Project Structure

Marcus/
├── run_marcus.py                 # Entrypoint — terminal mode
├── .env                          # Machine-specific: PROJECT_BASE, PROJECT_NAME
│
├── Core/                         # Foundation layer — no external deps
│   ├── env_loader.py             # Reads .env, resolves PROJECT_ROOT
│   ├── config_loader.py          # load_config(name) → reads Config/config_{name}.json
│   ├── Logger.py                 # Logging engine (file-based, no console output)
│   └── logger.py                 # Project wrapper: log(), log_and_print(), get_logger()
│
├── Config/                       # ALL configuration — one JSON per module
│   ├── config_ZMQ.json           # ZMQ host, port, stop params
│   ├── config_Camera.json        # RealSense resolution, fps, quality
│   ├── config_Brain.json         # Ollama model, prompts, num_predict values
│   ├── config_Vision.json        # YOLO model path, confidence, tracked classes
│   ├── config_Navigation.json    # move_map, goal aliases, YOLO goal classes
│   ├── config_Patrol.json        # patrol duration, proximity threshold
│   ├── config_Arm.json           # arm actions, aliases, availability flag
│   ├── config_Odometry.json      # speeds, tolerances, ROS2 topic
│   ├── config_Memory.json        # session/places paths
│   ├── config_Network.json       # Jetson IPs (eth0/wlan0), ports
│   ├── config_ImageSearch.json   # search defaults
│   └── marcus_prompts.yaml       # All LLaVA/Qwen prompts (main, goal, patrol, talk, verify)
│
├── API/                          # Interface layer — one file per subsystem
│   ├── zmq_api.py                # ZMQ PUB socket: send_vel(), gradual_stop(), send_cmd()
│   ├── camera_api.py             # RealSense thread: start/stop_camera(), get_frame()
│   ├── llava_api.py              # LLaVA queries: call_llava(), ask(), ask_goal(), ask_patrol()
│   ├── yolo_api.py               # YOLO interface: init_yolo(), yolo_sees(), yolo_summary()...
│   ├── odometry_api.py           # Odometry wrapper: init_odometry(), get_position()
│   ├── memory_api.py             # Memory wrapper: init_memory(), log_cmd(), place_save/goto()
│   ├── arm_api.py                # Arm gestures: do_arm(), ARM_ACTIONS, ALL_ARM_NAMES
│   └── imgsearch_api.py          # Image search wrapper: init_imgsearch(), get_searcher()
│
├── Brain/                        # Decision logic — imports ONLY from API/
│   ├── marcus_brain.py           # Orchestrator: init_brain(), process_command(), run_terminal()
│   ├── command_parser.py         # 14 regex patterns + try_local_command() dispatcher
│   ├── executor.py               # execute_action(), merge_actions(), execute()
│   └── marcus_memory.py          # Session + place memory (Memory class, 817 lines)
│
├── Navigation/                   # Movement + position tracking
│   ├── goal_nav.py               # navigate_to_goal() — YOLO+LLaVA hybrid visual search
│   ├── patrol.py                 # patrol() — autonomous HSE patrol with PPE detection
│   └── marcus_odometry.py        # Odometry class — dead reckoning + ROS2 fallback
│
├── Vision/                       # Computer vision
│   ├── marcus_yolo.py            # YOLO background inference: Detection class + query API
│   └── marcus_imgsearch.py       # ImageSearch class — reference image comparison
│
├── Server/                       # WebSocket server (runs on Jetson)
│   └── marcus_server.py          # Full brain over WebSocket — same as run_marcus.py
│
├── Client/                       # Remote clients (run on workstation)
│   ├── marcus_cli.py             # Terminal CLI client with color output
│   └── marcus_client.py          # Tkinter GUI client (3 tabs: Nav/Camera/LiDAR)
│
├── Bridge/                       # ROS2 integration
│   └── ros2_zmq_bridge.py        # ROS2 /cmd_vel → ZMQ velocity bridge
│
├── Autonomous/                   # Autonomous exploration mode
│   └── marcus_autonomous.py      # AutonomousMode — office exploration + mapping
│
├── Models/                       # AI model weights
│   ├── yolov8m.pt                # YOLOv8 medium (50MB)
│   └── Modelfile                 # Ollama model definition (FROM qwen2.5vl:7b)
│
├── Data/                         # Runtime-generated data ONLY (no code)
│   ├── Brain/Sessions/           # session_{id}_{date}/ — commands, detections, alerts
│   ├── Brain/Exploration/        # Autonomous mode map data
│   ├── History/Places/           # places.json — persistent named locations
│   ├── History/Sessions/         # Session history
│   ├── History/Prompts/          # Prompt history
│   ├── Navigation/Maps/          # SLAM occupancy grids
│   ├── Navigation/Waypoints/     # Saved waypoint files
│   ├── Vision/Camera/            # Captured camera frames
│   ├── Vision/Videos/            # Recorded video clips
│   └── Vision/Frames/            # Detection snapshots
│
├── Doc/                          # Documentation
│   ├── architecture.md           # This file
│   ├── controlling.md            # Startup guide + command reference
│   ├── MARCUS_API.md             # API reference
│   └── note.txt                  # Quick notes
│
├── logs/                         # Runtime logs (one per module)
│   ├── brain.log
│   ├── camera.log
│   ├── server.log
│   ├── zmq.log
│   └── main.log
│
└── Legacy/                       # Archived originals
    └── marcus_nav.py             # Original standalone prototype

Layer Architecture

┌─────────────────────────────────────────────────┐
│                  Entrypoints                     │
│  run_marcus.py (terminal)                        │
│  Server/marcus_server.py (WebSocket)             │
└──────────────────┬──────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────┐
│                Brain Layer                       │
│  marcus_brain.py    — init_brain()               │
│                     — process_command(cmd)        │
│  command_parser.py  — 14 regex local commands    │
│  executor.py        — execute LLaVA decisions    │
│  marcus_memory.py   — session + place memory     │
└──────────────────┬──────────────────────────────┘
                   │ imports only from API/
┌──────────────────▼──────────────────────────────┐
│                 API Layer                        │
│  zmq_api     camera_api    llava_api            │
│  yolo_api    odometry_api  memory_api           │
│  arm_api     imgsearch_api                      │
└──────────────────┬──────────────────────────────┘
                   │ wraps
┌──────────────────▼──────────────────────────────┐
│            Navigation / Vision                   │
│  goal_nav.py        marcus_yolo.py              │
│  patrol.py          marcus_imgsearch.py         │
│  marcus_odometry.py                              │
└──────────────────┬──────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────┐
│               Core Layer                         │
│  env_loader.py   config_loader.py               │
│  Logger.py       logger.py                      │
└──────────────────┬──────────────────────────────┘
                   │ reads
┌──────────────────▼──────────────────────────────┐
│              Config / .env                       │
│  11 JSON files + marcus_prompts.yaml            │
└─────────────────────────────────────────────────┘

Rule: Brain never imports from Vision/ or Navigation/ directly. It goes through the API layer.


File-by-File Documentation

Core/

env_loader.py (34 lines)

Reads .env from the project root to resolve PROJECT_ROOT. Uses a minimal built-in parser (no python-dotenv dependency). Exports PROJECT_ROOT as a Path object resolved from __file__, so it works regardless of where the script is called from. Fallback default: /home/unitree.

config_loader.py (30 lines)

load_config(name) reads Config/config_{name}.json and caches the result. All modules call this instead of hardcoding constants. Also provides config_path(relative) to resolve relative paths (e.g., "Models/yolov8m.pt") to absolute paths from PROJECT_ROOT.
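
A minimal usage sketch, assuming the two helpers behave as described above (the example config name and path are taken from the project structure):

```python
# Hypothetical usage of the Core config helpers; values shown are illustrative.
from Core.config_loader import load_config, config_path

vision_cfg = load_config("Vision")              # reads and caches Config/config_Vision.json
model_path = config_path("Models/yolov8m.pt")   # relative path -> absolute path under PROJECT_ROOT
print(model_path, vision_cfg)
```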

Logger.py (186 lines)

Full logging engine from AI_Photographer. File-based only (no console output by default). Creates per-module log files in logs/. Handles write permission fallbacks, log name normalization, and corrupt log recovery.

logger.py (51 lines)

Project wrapper around Logger.py. Provides:

  • log(message, level, module) — write to logs/{module}.log
  • log_and_print(message, level, module) — write + print
  • get_logger(module) — get configured Logs instance

API/

Each API file wraps one subsystem. They read their own config via load_config(), handle import errors gracefully with fallback stubs, and export clean public functions.
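
The fallback pattern looks roughly like this (a sketch, not the exact code; names follow the yolo_api.py exports documented below):

```python
# Sketch of the graceful-import pattern used across the API layer (illustrative only).
try:
    from Vision.marcus_yolo import yolo_sees
    YOLO_AVAILABLE = True
except Exception:
    YOLO_AVAILABLE = False

    def yolo_sees(class_name, min_confidence=None):
        return False          # safe default: never crash the brain on a missing dependency
```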

zmq_api.py (49 lines)

Creates a ZMQ PUB socket on startup (binds to tcp://127.0.0.1:{zmq_port}). Holosoma's RL policy connects to this socket as SUB and receives velocity commands at 50Hz.

Exports:

  • send_vel(vx, vy, vyaw) — send velocity to Holosoma
  • gradual_stop() — 20 zero-velocity messages over 1 second
  • send_cmd(cmd) — send state command: "start", "walk", "stand", "stop"
  • get_socket() — return the shared PUB socket (for odometry to reuse)
  • MOVE_MAP — direction-to-velocity lookup: {"forward": (0.3, 0, 0), "left": (0, 0, 0.3), ...}

Config: config_ZMQ.json — host, port, stop_iterations, stop_delay, step_pause
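
A usage sketch of the exports above (the resend interval below is an assumption; the real value comes from config_ZMQ.json):

```python
# Illustrative: walk forward for ~2 seconds using the MOVE_MAP "forward" velocity.
import time
from API.zmq_api import send_vel, gradual_stop, MOVE_MAP

vx, vy, vyaw = MOVE_MAP["forward"]       # (0.3, 0, 0)
end = time.time() + 2.0
while time.time() < end:
    send_vel(vx, vy, vyaw)               # consumed by Holosoma's SUB socket
    time.sleep(0.05)                     # resend interval is an assumption (see step_pause)
gradual_stop()                           # 20 zero-velocity messages over 1 second
```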

camera_api.py (111 lines)

Background thread captures RealSense D435I frames continuously. Stores both raw BGR (for YOLO) and base64 JPEG (for LLaVA). Auto-reconnects on USB drops with exponential backoff (2s → 4s → 8s, max 10s).

Exports:

  • start_camera() — starts thread, returns (raw_frame_ref, raw_lock) for YOLO
  • stop_camera() — stops the thread
  • get_frame() — returns latest base64 JPEG (or last known good frame)
  • get_frame_age() — seconds since last successful frame
  • get_raw_refs() — returns shared numpy frame + lock for YOLO

Config: config_Camera.json — width (424), height (240), fps (15), jpeg_quality (70)
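
A usage sketch (the freshness threshold here is an assumption, not a config value):

```python
# Illustrative: start the capture thread and fetch a frame for LLaVA.
from API.camera_api import start_camera, get_frame, get_frame_age, stop_camera

raw_ref, raw_lock = start_camera()       # shared refs are later passed to init_yolo()
if get_frame_age() < 2.0:                # skip stale frames after a USB drop (threshold assumed)
    frame_b64 = get_frame()              # base64 JPEG, ready for llava_api.ask()
stop_camera()
```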

llava_api.py (107 lines)

Interface to Ollama's vision-language model (Qwen2.5-VL 3B). Manages conversation history (6-turn sliding window) and user-told facts for context injection.

Exports:

  • call_llava(prompt, img_b64, num_predict, use_history) — raw LLM call
  • ask(command, img_b64) — send command + image, get structured JSON response
  • ask_goal(goal, img_b64) — check if goal reached during navigation
  • ask_patrol(img_b64) — assess scene during autonomous patrol
  • parse_json(raw) — extract JSON from LLM output
  • add_to_history(user_msg, assistant_msg) — add to conversation context
  • remember_fact(fact) — store persistent fact (e.g., "Kassam is the programmer")
  • OLLAMA_MODEL — current model name from config

Config: config_Brain.json — ollama_model, max_history, num_predict values, prompts
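
A usage sketch, assuming ask() returns the parsed decision dict shown later in the Data Flow section:

```python
# Illustrative vision-language round trip.
from API.camera_api import get_frame
from API.llava_api import ask, remember_fact

remember_fact("Kassam is the programmer")   # persisted fact, injected into later prompts
decision = ask("turn right", get_frame())
# e.g. {"actions": [{"move": "right", "duration": 2.0}], "speak": "Turning right"}
print(decision["speak"])
```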

yolo_api.py (66 lines)

Lazy-loads YOLO from Vision/marcus_yolo.py. If import fails, all functions return safe defaults (empty sets, False, 0). No crash on missing dependencies.

Exports:

  • init_yolo(raw_frame_ref, frame_lock) — start background inference
  • yolo_sees(class_name) — is class currently detected?
  • yolo_count(class_name) — how many instances?
  • yolo_closest(class_name) — nearest Detection object
  • yolo_summary() — human-readable summary: "2 persons (left, close) | 1 chair"
  • yolo_ppe_violations() — list of PPE violations
  • yolo_person_too_close(threshold) — safety proximity check
  • yolo_all_classes() — set of all currently detected classes
  • yolo_fps() — current inference rate
  • YOLO_AVAILABLE — True if YOLO loaded successfully
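
A usage sketch of the query functions (wiring follows camera_api as documented above):

```python
# Illustrative YOLO queries against the background inference thread.
from API.camera_api import start_camera
from API.yolo_api import init_yolo, yolo_sees, yolo_count, yolo_closest, yolo_summary

raw_ref, lock = start_camera()
init_yolo(raw_ref, lock)

if yolo_sees("person"):
    print(yolo_count("person"), "person(s), closest:", yolo_closest("person"))
print(yolo_summary())                     # e.g. "2 persons (left, close) | 1 chair"
```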

odometry_api.py (40 lines)

Wraps Navigation/marcus_odometry.py. Passes the shared ZMQ socket to avoid port conflicts.

Exports:

  • init_odometry(zmq_sock) — start tracking, returns success bool
  • get_position() — returns {"x": float, "y": float, "heading": float, "source": str}
  • odom — the Odometry instance (or None)
  • ODOM_AVAILABLE — True if running

memory_api.py (109 lines)

Wraps Brain/marcus_memory.py. Also contains place memory functions that combine memory + odometry.

Exports:

  • init_memory() — start session, load places
  • log_cmd(cmd, response, duration) — log command to session
  • log_detection(class_name, position, distance) — log YOLO detection with odometry position
  • place_save(name) — save current position as named place
  • place_goto(name) — navigate to saved place using odometry
  • places_list_str() — formatted table of all saved places
  • mem — Memory instance (or None)
  • MEMORY_AVAILABLE — True if running
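
A usage sketch of the place-memory round trip (return values are not relied on here):

```python
# Illustrative: save, list, and revisit a named place, then log the command.
from API.memory_api import init_memory, place_save, place_goto, places_list_str, log_cmd

init_memory()                             # start session, load Data/History/Places/places.json
place_save("door")                        # current odometry position becomes "door"
print(places_list_str())                  # formatted table of saved places
place_goto("door")                        # navigate back using odometry
log_cmd("go to door", "Done", 4.2)        # duration in seconds
```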

arm_api.py (16 lines)

Stub for GR00T N1.5 arm control. Currently prints a message. ARM_ACTIONS and ARM_ALIASES loaded from config_Arm.json.

Exports:

  • do_arm(action) — execute arm gesture (currently stub)
  • ARM_ACTIONS — dict of action name → action ID
  • ARM_ALIASES — dict of common names → action ID
  • ALL_ARM_NAMES — set of all recognized arm command names
  • ARM_AVAILABLE — False (pending GR00T integration)

imgsearch_api.py (38 lines)

Wraps Vision/marcus_imgsearch.py. Wires camera, ZMQ, LLaVA, and YOLO into the ImageSearch class.

Exports:

  • init_imgsearch(get_frame_fn, send_vel_fn, ...) — wire dependencies
  • get_searcher() — return ImageSearch instance (or None)

Brain/

marcus_brain.py (372 lines)

The orchestrator. Contains all the brain's public functions used by both terminal and server modes.

Key functions:

  • init_brain() — initializes all subsystems in order: camera → YOLO → odometry → memory → image search → Holosoma boot → LLaVA warmup
  • process_command(cmd) → dict — routes a command through the full pipeline and returns {"type", "speak", "action", "elapsed"}. Pipeline order:
    1. YOLO status check
    2. Image search (search/)
    3. Natural language goal auto-detect ("find a person", "look for a bottle")
    4. Explicit goal (goal/ ...)
    5. Patrol (patrol)
    6. Local commands (place memory, odometry, help) via command_parser.py
    7. Talk-only questions (what/who/where/how)
    8. Greetings (hi/hello/salam) — instant, no AI
    9. "Come to me" shortcut — instant forward 2s
    10. Multi-step compound ("turn right then walk forward")
    11. Standard LLaVA command — full AI inference
  • run_terminal() — terminal input loop (used by run_marcus.py)
  • get_brain_status() — returns dict of all subsystem states
  • shutdown() — clean stop of all subsystems
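
Both entrypoints drive the brain the same way; a minimal sketch using the return keys listed above:

```python
# Illustrative: the same calls underpin terminal mode and the WebSocket server.
from Brain.marcus_brain import init_brain, process_command, shutdown

init_brain()                              # camera -> YOLO -> odometry -> memory -> LLaVA warmup
result = process_command("turn right then walk forward")
print(result["type"], result["action"], result["speak"], result["elapsed"])
shutdown()
```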

command_parser.py (300 lines)

14 compiled regex patterns that intercept commands before they reach LLaVA. Handles:

| Pattern | Example | Action |
| --- | --- | --- |
| _RE_REMEMBER | "remember this as door" | Save current position |
| _RE_GOTO | "go to door" | Navigate to saved place |
| _RE_FORGET | "forget door" | Delete saved place |
| _RE_RENAME | "rename door to entrance" | Rename place |
| _RE_WALK_DIST | "walk 1 meter" | Precise odometry walk |
| _RE_WALK_BACK | "walk backward 2 meters" | Precise backward walk |
| _RE_TURN_DEG | "turn right 90 degrees" | Precise odometry turn |
| _RE_PATROL_RT | "patrol: door → desk → exit" | Named waypoint patrol |
| _RE_LAST_CMD | "last command" | Recall from session |
| _RE_DO_AGAIN | "do that again" | Repeat last command |
| _RE_UNDO | "undo" | Reverse last movement |
| _RE_LAST_SESS | "last session" | Previous session summary |
| _RE_WHERE | "where am I" | Current odometry position |
| _RE_GO_HOME | "go home" | Return to start position |

Also handles: session summary, help text, examples text.
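
For illustration, a pattern in the spirit of _RE_WALK_DIST might look like the sketch below; the real expressions in command_parser.py are not reproduced here.

```python
# Hypothetical local-command pattern; the project's actual regexes differ.
import re

_RE_WALK_DIST = re.compile(r"^walk\s+(\d+(?:\.\d+)?)\s*(m|meter|meters)\b", re.IGNORECASE)

m = _RE_WALK_DIST.match("walk 1 meter")
if m:
    meters = float(m.group(1))            # handed to the odometry walk_distance() API
```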

executor.py (81 lines)

Executes LLaVA movement decisions. Converts the JSON action list into sustained ZMQ velocity commands.

Functions:

  • execute_action(move, duration) — single movement step. Uses MOVE_MAP for velocities, intercepts arm names that LLaVA sometimes puts in the actions list
  • move_step(move, duration) — lightweight version for goal/patrol loops (no full gradual_stop between steps)
  • merge_actions(actions) — combines consecutive same-direction steps: 5x right 1.0s → 1x right 5.0s
  • execute(d) — full decision execution: movements in sequence, arm gesture in background thread
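
A sketch of the merging idea; the action shape {"move", "duration"} matches the LLaVA decisions shown in the Data Flow section, but this is not the project's exact code:

```python
# Illustrative merge of consecutive same-direction steps.
def merge_actions(actions):
    merged = []
    for step in actions:
        if merged and merged[-1]["move"] == step["move"]:
            merged[-1]["duration"] += step["duration"]
        else:
            merged.append(dict(step))
    return merged

# 5 x {"move": "right", "duration": 1.0}  ->  [{"move": "right", "duration": 5.0}]
```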

marcus_memory.py (817 lines)

Persistent session and place memory. Thread-safe with atomic JSON writes.

Place memory:

  • Save named positions with odometry coordinates
  • Fuzzy name matching (typo tolerance)
  • Name sanitization (special chars → underscores)
  • Rename, delete, list operations

Session memory:

  • Per-session folders: session_{id}_{date}/
  • Logs: commands.json, detections.json, alerts.json, summary.txt
  • 60-second auto-flush in background thread
  • Emergency save via atexit on crash
  • YOLO detection deduplication (5-second window)
  • Cross-session recall ("what did you do last session?")
  • Auto-prune old sessions (keeps last 50)
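
Atomic JSON writes typically follow a write-temp-then-replace pattern; a sketch under that assumption (not the actual implementation):

```python
# Illustrative atomic JSON write: readers never observe a half-written file.
import json, os, tempfile

def atomic_write_json(path, data):
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp, path)             # atomic rename on POSIX
    except Exception:
        os.unlink(tmp)
        raise
```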

Navigation/

goal_nav.py (154 lines)

Visual goal navigation. Robot rotates continuously while scanning for a target using YOLO (fast, 0.4s checks) with LLaVA fallback (slow but handles non-YOLO classes).

How it works:

  1. Parse goal to extract YOLO target class (via aliases: "guy" → "person", "sofa" → "couch")
  2. Start continuous rotation in background thread
  3. YOLO fast-check every 0.4s — if target class found:
    • Extract compound condition ("holding a phone", "wearing red")
    • If compound: ask LLaVA to verify ("Is the person holding a phone? yes/no")
    • If verified (or no compound): stop and report
  4. LLaVA fallback for non-YOLO classes: send goal_prompt with the image and check whether the response reports reached: true
  5. Max steps limit (40 default), Ctrl+C to abort

Config: config_Navigation.json — goal_aliases, yolo_goal_classes, max_steps, rotation_speed
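
The YOLO fast path, reduced to a sketch (rotation is inlined here; the real module rotates in a background thread and adds the LLaVA fallback and compound-condition check):

```python
# Illustrative skeleton of the YOLO fast path inside navigate_to_goal().
import time
from API.yolo_api import yolo_sees, yolo_closest
from API.zmq_api import send_vel, gradual_stop

def scan_for(target_class, max_steps=40, rotation_speed=0.3):
    for _ in range(max_steps):
        send_vel(0, 0, rotation_speed)        # keep rotating while scanning
        if yolo_sees(target_class):           # fast 0.4 s check
            gradual_stop()
            return yolo_closest(target_class) # Detection with position + distance estimate
        time.sleep(0.4)
    gradual_stop()
    return None
```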

patrol.py (106 lines)

Autonomous HSE inspection patrol. Timed loop with YOLO PPE detection and LLaVA scene assessment.

How it works:

  1. YOLO checks for PPE violations (no helmet, no vest) and logs alerts
  2. Safety: stop if person too close (size_ratio > 0.3)
  3. LLaVA assesses scene: observation, alert, next_move, duration
  4. Executes lightweight movement steps between checks
  5. All detections and alerts logged to session memory

Config: config_Patrol.json — default_duration_minutes, proximity_threshold

marcus_odometry.py (808 lines)

Precise position tracking and movement control.

Dual source (priority order):

  1. ROS2 /dog_odom — joint encoder data, ±2cm accuracy (currently disabled due to DDS memory conflict)
  2. Dead reckoning — velocity × time integration at 20Hz, ±10cm accuracy

Movement API:

  • walk_distance(meters, speed, direction) — odometry feedback loop, 5cm tolerance, safety timeout
  • turn_degrees(degrees, speed) — heading feedback with 0°/360° wrap-around, 2° tolerance
  • navigate_to(x, y, heading) — rotate to face target, walk straight, rotate to final heading
  • return_to_start() — navigate back to where start() was called
  • patrol_route(waypoints, loop) — walk through list of waypoints in order

All movements have time-based fallbacks when odometry isn't running. Speed clamped at 0.4 m/s. KeyboardInterrupt handling with gradual stop.
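
Dead reckoning amounts to integrating the commanded velocity over time; a sketch of one 20Hz update (frame and sign conventions are assumptions):

```python
# Illustrative dead-reckoning update, called ~20 times per second.
import math

def integrate(state, vx, vy, vyaw, dt=0.05):
    """state = {"x", "y", "heading"}; heading in degrees, vx/vy in m/s, vyaw in rad/s."""
    h = math.radians(state["heading"])
    state["x"] += (vx * math.cos(h) - vy * math.sin(h)) * dt   # body frame -> world frame
    state["y"] += (vx * math.sin(h) + vy * math.cos(h)) * dt
    state["heading"] = (state["heading"] + math.degrees(vyaw) * dt) % 360.0
    return state
```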


Vision/

marcus_yolo.py (474 lines)

Background YOLO inference engine. Runs in a daemon thread, reads from the shared camera frame buffer.

Detection class: Each detection has class_name, confidence, bbox, position (left/center/right), distance_estimate (very close/close/medium/far), size_ratio.

Public API:

  • start_yolo(raw_frame_ref, frame_lock) — start inference thread
  • yolo_sees(class_name, min_confidence) — check if class detected
  • yolo_count(class_name) — count instances
  • yolo_closest(class_name) — largest bbox (closest object)
  • yolo_summary() — "2 persons (left, close) | 1 chair (center, medium)"
  • yolo_ppe_violations() — PPE-specific detections
  • yolo_person_too_close(threshold) — safety proximity check

Config: config_Vision.json — model path, confidence (0.45), 19 tracked COCO classes

marcus_imgsearch.py (501 lines)

Image-guided search. User provides a reference photo; robot rotates and LLaVA compares camera frames to the reference.

How it works:

  1. Load reference image (resize to 336x336 for efficiency)
  2. Start continuous rotation
  3. Optional YOLO pre-filter (find "person" class before running LLaVA)
  4. LLaVA comparison: sends [reference, current_frame] as two images
  5. Parse JSON response: found, confidence (low/medium/high), position, description
  6. Stop on medium/high confidence match

Supports text-only search (no reference image) using a hint description instead.


Server/

marcus_server.py (224 lines)

WebSocket server that wraps the full Marcus brain. Initializes all subsystems (camera, YOLO, odometry, memory, LLaVA) on startup, then accepts commands via WebSocket.

Architecture:

  • Calls init_brain() from marcus_brain.py — same init as terminal mode
  • Each incoming "command" message runs process_command(cmd) in a thread pool
  • Broadcasts camera frames to all clients at ~10Hz
  • Auto-detects eth0 and wlan0 IPs for the connection banner

WebSocket message types:

| Client sends | Server responds |
| --- | --- |
| {"type": "command", "command": "turn left"} | {"type": "thinking"} then {"type": "decision", "action": "LEFT", "speak": "Turning left", ...} |
| {"type": "capture"} | {"type": "capture_result", "ok": true, "data": "<base64>"} |
| {"type": "ping"} | {"type": "pong", "lidar": true, "status": {...}} |

Config: config_Network.json — jetson_ip, jetson_wlan_ip, websocket_port
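
A minimal client exchange using the message types above (a sketch with the websockets library; the project's real clients live in Client/):

```python
# Illustrative WebSocket exchange against marcus_server.py.
import asyncio, json
import websockets

async def main():
    uri = "ws://192.168.123.164:8765"                     # eth0 IP + port from config_Network.json
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"type": "command", "command": "turn left"}))
        while True:
            msg = json.loads(await ws.recv())             # frames and status messages also arrive here
            print(msg.get("type"))
            if msg.get("type") == "decision":
                break

asyncio.run(main())
```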


Client/

marcus_cli.py (288 lines)

Terminal CLI client for remote control. Connects to the server via WebSocket.

Features:

  • Connection menu: choose eth0 / wlan0 / custom IP
  • Color-coded output: green=forward, cyan=turn, red=stop, orange=greeting/local
  • Displays Marcus: <speak text> for every response
  • System commands: status, camera, profile <name>, capture, help, q
  • Async receiver for real-time decision display while typing
  • Command history (not persisted)

marcus_client.py (1021 lines)

Tkinter GUI client with 3 tabs:

  • Navigation — live camera view, command entry, quick buttons, decision log
  • Camera — profile switcher, custom resolution, capture, preview toggle
  • LiDAR — full SLAM Commander (runs locally via SlamEngineClient from G1_Lootah/Lidar)

Bridge/

ros2_zmq_bridge.py (66 lines)

ROS2 Foxy node that subscribes to /cmd_vel (TwistStamped) and holosoma/other_input (String), forwarding to the ZMQ PUB socket. Requires Python 3.8 and a sourced ROS2 environment. Used when external ROS2 nodes need to send velocity commands to Holosoma.


Autonomous/

marcus_autonomous.py (516 lines)

Autonomous office exploration mode. Marcus moves freely, identifies areas and objects, builds a live map, saves everything to a session folder.

State machine: IDLE → EXPLORING → IDLE

Exploration loop:

  1. Safety: stop if person too close
  2. Record YOLO detections + odometry path point
  3. Every 5 steps: LLaVA scene assessment (area_type, objects, observation)
  4. Move forward; turn when blocked (alternates left/right)
  5. Save interesting frames to disk
  6. Auto-flush to disk every 20 steps

Output: Data/Brain/Exploration/map_{id}_{date}/ — observations.json, path.json, summary.txt, frames/


Data Flow

Command: "turn right"

User types "turn right"
  │
  ▼
process_command("turn right")
  │ (no regex match — falls through to LLaVA)
  ▼
llava_api.ask("turn right", camera_frame)
  │ sends to Ollama qwen2.5vl:3b
  ▼
LLaVA returns: {"actions":[{"move":"right","duration":2.0}], "speak":"Turning right"}
  │
  ▼
executor.execute(d)
  │ merge_actions → execute_action("right", 2.0)
  ▼
zmq_api.send_vel(vyaw=-0.3) × 40 times over 2.0 seconds
  │
  ▼
Holosoma RL policy receives velocity → robot turns right
  │
  ▼
zmq_api.gradual_stop() → 20 zero-velocity messages

Command: "remember this as door"

User types "remember this as door"
  │
  ▼
process_command("remember this as door")
  │ matches _RE_REMEMBER regex
  ▼
command_parser.try_local_command()
  │ calls memory_api.place_save("door")
  ▼
odometry_api.get_position() → {"x": 1.2, "y": 0.5, "heading": 90.0}
  │
  ▼
marcus_memory.Memory.save_place("door", x=1.2, y=0.5, heading=90.0)
  │ atomic write to Data/History/Places/places.json
  ▼
Returns: {"type": "local", "speak": "Done", "action": "LOCAL"}

Command: "goal/ find a person"

User types "goal/ find a person"
  │
  ▼
process_command() → navigate_to_goal("find a person")
  │
  ▼
_goal_yolo_target("find a person") → "person"
  │ YOLO mode (not LLaVA fallback)
  ▼
Start continuous rotation thread (vyaw=0.3)
  │
  ▼
Loop every 0.4s:
  │ yolo_sees("person") → False → keep rotating
  │ yolo_sees("person") → False → keep rotating
  │ yolo_sees("person") → True!
  │   ▼
  │   _extract_extra_condition() → None (no compound)
  │   ▼
  │   gradual_stop()
  │   yolo_closest("person") → Detection(center, close)
  │   log_detection("person", "center", "close")
  ▼
Returns: {"type": "goal", "speak": "Goal navigation: find a person"}

Hardware Stack

Unitree G1 EDU (29 DOF)
  │
  ├── Jetson Orin NX (16GB VRAM)
  │     ├── Holosoma RL policy (50Hz) — locomotion joints 0-11
  │     ├── Ollama + Qwen2.5-VL 3B — vision-language understanding
  │     ├── YOLOv8m — real-time object detection (CPU, 320px)
  │     └── Marcus Brain — this project
  │
  ├── RealSense D435I — RGB camera (424x240 @ 15fps)
  │
  ├── Livox Mid360 LiDAR — 3D point cloud (via SlamEngineClient)
  │
  └── ZMQ PUB/SUB — velocity commands (tcp://127.0.0.1:5556)
        ├── Marcus Brain PUB → Holosoma SUB
        └── ROS2 Bridge PUB → Holosoma SUB (alternative)

Startup Order

  1. Holosoma — must be running first (RL locomotion policy)
  2. Marcus Server (python3 -m Server.marcus_server) — or Brain (python3 run_marcus.py)
  3. Client (python3 -m Client.marcus_cli) — connects to server

Server mode and terminal mode cannot run simultaneously, since both bind ZMQ port 5556.


Config Reference

| File | Key values |
| --- | --- |
| config_ZMQ.json | zmq_host: 127.0.0.1, zmq_port: 5556 |
| config_Camera.json | 424x240 @ 15fps, JPEG quality 70 |
| config_Brain.json | qwen2.5vl:3b, history 6 turns, prompts |
| config_Vision.json | yolov8m.pt, confidence 0.45, 19 classes |
| config_Navigation.json | move_map velocities, goal aliases |
| config_Network.json | eth0: 192.168.123.164, wlan0: 10.255.254.86, port 8765 |
| config_Odometry.json | walk 0.25 m/s, turn 0.25 rad/s, 5cm tolerance |
| config_Memory.json | Data/Brain/Sessions, Data/History/Places |
| config_Patrol.json | 5 min default, proximity 0.3 |
| config_Arm.json | 16 gestures, arm_available: false (GR00T pending) |

Line Count Summary

| Layer | Files | Lines |
| --- | --- | --- |
| Core | 4 | 301 |
| API | 8 | 536 |
| Brain | 4 | 1,570 |
| Navigation | 3 | 1,068 |
| Vision | 2 | 975 |
| Server | 1 | 224 |
| Client | 2 | 1,309 |
| Bridge | 1 | 66 |
| Autonomous | 1 | 516 |
| Entrypoint | 1 | 16 |
| Total | 27 | 6,581 |