# Marcus — System Architecture **Project**: Marcus | YS Lootah Technology **Hardware**: Unitree G1 EDU Humanoid (29 DOF) + Jetson Orin NX (16GB) **Updated**: 2026-04-06 --- ## Overview Marcus is a fully offline humanoid robot AI system. The brain runs on Jetson Orin NX with no cloud dependencies. It uses vision-language models (Qwen2.5-VL via Ollama) for understanding commands, YOLO for real-time object detection, dead reckoning for position tracking, and persistent memory across sessions. Two operating modes: - **Terminal mode** (`run_marcus.py`) — direct keyboard control on the Jetson - **Server mode** (`Server/marcus_server.py`) — WebSocket server allowing remote CLI or GUI clients Both modes use the **same brain** — identical command processing, same YOLO, same memory, same movement control. --- ## Project Structure ``` Marcus/ ├── run_marcus.py # Entrypoint — terminal mode ├── .env # Machine-specific: PROJECT_BASE, PROJECT_NAME │ ├── Core/ # Foundation layer — no external deps │ ├── env_loader.py # Reads .env, resolves PROJECT_ROOT │ ├── config_loader.py # load_config(name) → reads Config/config_{name}.json │ ├── Logger.py # Logging engine (file-based, no console output) │ └── logger.py # Project wrapper: log(), log_and_print(), get_logger() │ ├── Config/ # ALL configuration — one JSON per module │ ├── config_ZMQ.json # ZMQ host, port, stop params │ ├── config_Camera.json # RealSense resolution, fps, quality │ ├── config_Brain.json # Ollama model, prompts, num_predict values │ ├── config_Vision.json # YOLO model path, confidence, tracked classes │ ├── config_Navigation.json # move_map, goal aliases, YOLO goal classes │ ├── config_Patrol.json # patrol duration, proximity threshold │ ├── config_Arm.json # arm actions, aliases, availability flag │ ├── config_Odometry.json # speeds, tolerances, ROS2 topic │ ├── config_Memory.json # session/places paths │ ├── config_Network.json # Jetson IPs (eth0/wlan0), ports │ ├── config_ImageSearch.json # search defaults │ └── marcus_prompts.yaml # All LLaVA/Qwen prompts (main, goal, patrol, talk, verify) │ ├── API/ # Interface layer — one file per subsystem │ ├── zmq_api.py # ZMQ PUB socket: send_vel(), gradual_stop(), send_cmd() │ ├── camera_api.py # RealSense thread: start/stop_camera(), get_frame() │ ├── llava_api.py # LLaVA queries: call_llava(), ask(), ask_goal(), ask_patrol() │ ├── yolo_api.py # YOLO interface: init_yolo(), yolo_sees(), yolo_summary()... 
│ ├── odometry_api.py # Odometry wrapper: init_odometry(), get_position() │ ├── memory_api.py # Memory wrapper: init_memory(), log_cmd(), place_save/goto() │ ├── arm_api.py # Arm gestures: do_arm(), ARM_ACTIONS, ALL_ARM_NAMES │ └── imgsearch_api.py # Image search wrapper: init_imgsearch(), get_searcher() │ ├── Brain/ # Decision logic — imports ONLY from API/ │ ├── marcus_brain.py # Orchestrator: init_brain(), process_command(), run_terminal() │ ├── command_parser.py # 14 regex patterns + try_local_command() dispatcher │ ├── executor.py # execute_action(), merge_actions(), execute() │ └── marcus_memory.py # Session + place memory (Memory class, 817 lines) │ ├── Navigation/ # Movement + position tracking │ ├── goal_nav.py # navigate_to_goal() — YOLO+LLaVA hybrid visual search │ ├── patrol.py # patrol() — autonomous HSE patrol with PPE detection │ └── marcus_odometry.py # Odometry class — dead reckoning + ROS2 fallback │ ├── Vision/ # Computer vision │ ├── marcus_yolo.py # YOLO background inference: Detection class + query API │ └── marcus_imgsearch.py # ImageSearch class — reference image comparison │ ├── Server/ # WebSocket server (runs on Jetson) │ └── marcus_server.py # Full brain over WebSocket — same as run_marcus.py │ ├── Client/ # Remote clients (run on workstation) │ ├── marcus_cli.py # Terminal CLI client with color output │ └── marcus_client.py # Tkinter GUI client (3 tabs: Nav/Camera/LiDAR) │ ├── Bridge/ # ROS2 integration │ └── ros2_zmq_bridge.py # ROS2 /cmd_vel → ZMQ velocity bridge │ ├── Autonomous/ # Autonomous exploration mode │ └── marcus_autonomous.py # AutonomousMode — office exploration + mapping │ ├── Models/ # AI model weights │ ├── yolov8m.pt # YOLOv8 medium (50MB) │ └── Modelfile # Ollama model definition (FROM qwen2.5vl:7b) │ ├── Data/ # Runtime-generated data ONLY (no code) │ ├── Brain/Sessions/ # session_{id}_{date}/ — commands, detections, alerts │ ├── Brain/Exploration/ # Autonomous mode map data │ ├── History/Places/ # places.json — persistent named locations │ ├── History/Sessions/ # Session history │ ├── History/Prompts/ # Prompt history │ ├── Navigation/Maps/ # SLAM occupancy grids │ ├── Navigation/Waypoints/ # Saved waypoint files │ ├── Vision/Camera/ # Captured camera frames │ ├── Vision/Videos/ # Recorded video clips │ └── Vision/Frames/ # Detection snapshots │ ├── Doc/ # Documentation │ ├── architecture.md # This file │ ├── controlling.md # Startup guide + command reference │ ├── MARCUS_API.md # API reference │ └── note.txt # Quick notes │ ├── logs/ # Runtime logs (one per module) │ ├── brain.log │ ├── camera.log │ ├── server.log │ ├── zmq.log │ └── main.log │ └── Legacy/ # Archived originals └── marcus_nav.py # Original standalone prototype ``` --- ## Layer Architecture ``` ┌─────────────────────────────────────────────────┐ │ Entrypoints │ │ run_marcus.py (terminal) │ │ Server/marcus_server.py (WebSocket) │ └──────────────────┬──────────────────────────────┘ │ ┌──────────────────▼──────────────────────────────┐ │ Brain Layer │ │ marcus_brain.py — init_brain() │ │ — process_command(cmd) │ │ command_parser.py — 14 regex local commands │ │ executor.py — execute LLaVA decisions │ │ marcus_memory.py — session + place memory │ └──────────────────┬──────────────────────────────┘ │ imports only from API/ ┌──────────────────▼──────────────────────────────┐ │ API Layer │ │ zmq_api camera_api llava_api │ │ yolo_api odometry_api memory_api │ │ arm_api imgsearch_api │ └──────────────────┬──────────────────────────────┘ │ wraps 
┌──────────────────▼──────────────────────────────┐ │ Navigation / Vision │ │ goal_nav.py marcus_yolo.py │ │ patrol.py marcus_imgsearch.py │ │ marcus_odometry.py │ └──────────────────┬──────────────────────────────┘ │ ┌──────────────────▼──────────────────────────────┐ │ Core Layer │ │ env_loader.py config_loader.py │ │ Logger.py logger.py │ └──────────────────┬──────────────────────────────┘ │ reads ┌──────────────────▼──────────────────────────────┐ │ Config / .env │ │ 11 JSON files + marcus_prompts.yaml │ └─────────────────────────────────────────────────┘ ``` **Rule**: Brain never imports from Vision/ or Navigation/ directly. It goes through the API layer. --- ## File-by-File Documentation ### Core/ #### `env_loader.py` (34 lines) Reads `.env` from the project root to resolve `PROJECT_ROOT`. Uses a minimal built-in parser (no `python-dotenv` dependency). Exports `PROJECT_ROOT` as a `Path` object resolved from `__file__`, so it works regardless of where the script is called from. Fallback default: `/home/unitree`. #### `config_loader.py` (30 lines) `load_config(name)` reads `Config/config_{name}.json` and caches the result. All modules call this instead of hardcoding constants. Also provides `config_path(relative)` to resolve relative paths (e.g., `"Models/yolov8m.pt"`) to absolute paths from PROJECT_ROOT. #### `Logger.py` (186 lines) Full logging engine from AI_Photographer. File-based only (no console output by default). Creates per-module log files in `logs/`. Handles write permission fallbacks, log name normalization, and corrupt log recovery. #### `logger.py` (51 lines) Project wrapper around `Logger.py`. Provides: - `log(message, level, module)` — write to `logs/{module}.log` - `log_and_print(message, level, module)` — write + print - `get_logger(module)` — get configured Logs instance --- ### API/ Each API file wraps one subsystem. They read their own config via `load_config()`, handle import errors gracefully with fallback stubs, and export clean public functions. #### `zmq_api.py` (49 lines) Creates a ZMQ PUB socket on startup (binds to `tcp://127.0.0.1:{zmq_port}`). Holosoma's RL policy connects to this socket as SUB and receives velocity commands at 50Hz. **Exports:** - `send_vel(vx, vy, vyaw)` — send velocity to Holosoma - `gradual_stop()` — 20 zero-velocity messages over 1 second - `send_cmd(cmd)` — send state command: "start", "walk", "stand", "stop" - `get_socket()` — return the shared PUB socket (for odometry to reuse) - `MOVE_MAP` — direction-to-velocity lookup: `{"forward": (0.3, 0, 0), "left": (0, 0, 0.3), ...}` **Config:** `config_ZMQ.json` — host, port, stop_iterations, stop_delay, step_pause #### `camera_api.py` (111 lines) Background thread captures RealSense D435I frames continuously. Stores both raw BGR (for YOLO) and base64 JPEG (for LLaVA). Auto-reconnects on USB drops with exponential backoff (2s → 4s → 8s, max 10s). **Exports:** - `start_camera()` — starts thread, returns `(raw_frame_ref, raw_lock)` for YOLO - `stop_camera()` — stops the thread - `get_frame()` — returns latest base64 JPEG (or last known good frame) - `get_frame_age()` — seconds since last successful frame - `get_raw_refs()` — returns shared numpy frame + lock for YOLO **Config:** `config_Camera.json` — width (424), height (240), fps (15), jpeg_quality (70) #### `llava_api.py` (107 lines) Interface to Ollama's vision-language model (Qwen2.5-VL 3B). Manages conversation history (6-turn sliding window) and user-told facts for context injection. 
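For orientation, here is a minimal sketch of the kind of request a wrapper like this makes against Ollama's standard `/api/generate` endpoint. The default port, the `query_vlm` helper, and its parameter names are illustrative assumptions; this is not the project's actual `call_llava()` implementation.

```python
# Illustrative only — a minimal sketch of the kind of call llava_api presumably wraps,
# assuming Ollama's standard /api/generate HTTP endpoint on its default local port.
# The query_vlm() helper and its parameters are hypothetical, not the project's API.
import base64
import json
import requests

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # default Ollama port (assumption)

def query_vlm(prompt: str, image_path: str, model: str = "qwen2.5vl:3b",
              num_predict: int = 128) -> dict:
    """Send one prompt + image to Ollama and try to parse a structured JSON reply."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "images": [img_b64],            # Ollama accepts base64-encoded images
        "stream": False,
        "options": {"num_predict": num_predict},
    }, timeout=120)
    resp.raise_for_status()
    raw = resp.json()["response"]

    # The brain expects structured JSON (e.g. {"actions": [...], "speak": "..."});
    # strip any extra text around the first {...} block before parsing.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        return json.loads(raw[start:end + 1])
    return {"speak": raw}
```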
**Exports:** - `call_llava(prompt, img_b64, num_predict, use_history)` — raw LLM call - `ask(command, img_b64)` — send command + image, get structured JSON response - `ask_goal(goal, img_b64)` — check if goal reached during navigation - `ask_patrol(img_b64)` — assess scene during autonomous patrol - `parse_json(raw)` — extract JSON from LLM output - `add_to_history(user_msg, assistant_msg)` — add to conversation context - `remember_fact(fact)` — store persistent fact (e.g., "Kassam is the programmer") - `OLLAMA_MODEL` — current model name from config **Config:** `config_Brain.json` — ollama_model, max_history, num_predict values, prompts #### `yolo_api.py` (66 lines) Lazy-loads YOLO from `Vision/marcus_yolo.py`. If import fails, all functions return safe defaults (empty sets, False, 0). No crash on missing dependencies. **Exports:** - `init_yolo(raw_frame_ref, frame_lock)` — start background inference - `yolo_sees(class_name)` — is class currently detected? - `yolo_count(class_name)` — how many instances? - `yolo_closest(class_name)` — nearest Detection object - `yolo_summary()` — human-readable summary: "2 persons (left, close) | 1 chair" - `yolo_ppe_violations()` — list of PPE violations - `yolo_person_too_close(threshold)` — safety proximity check - `yolo_all_classes()` — set of all currently detected classes - `yolo_fps()` — current inference rate - `YOLO_AVAILABLE` — True if YOLO loaded successfully #### `odometry_api.py` (40 lines) Wraps `Navigation/marcus_odometry.py`. Passes the shared ZMQ socket to avoid port conflicts. **Exports:** - `init_odometry(zmq_sock)` — start tracking, returns success bool - `get_position()` — returns `{"x": float, "y": float, "heading": float, "source": str}` - `odom` — the Odometry instance (or None) - `ODOM_AVAILABLE` — True if running #### `memory_api.py` (109 lines) Wraps `Brain/marcus_memory.py`. Also contains place memory functions that combine memory + odometry. **Exports:** - `init_memory()` — start session, load places - `log_cmd(cmd, response, duration)` — log command to session - `log_detection(class_name, position, distance)` — log YOLO detection with odometry position - `place_save(name)` — save current position as named place - `place_goto(name)` — navigate to saved place using odometry - `places_list_str()` — formatted table of all saved places - `mem` — Memory instance (or None) - `MEMORY_AVAILABLE` — True if running #### `arm_api.py` (16 lines) Stub for GR00T N1.5 arm control. Currently prints a message. ARM_ACTIONS and ARM_ALIASES loaded from `config_Arm.json`. **Exports:** - `do_arm(action)` — execute arm gesture (currently stub) - `ARM_ACTIONS` — dict of action name → action ID - `ARM_ALIASES` — dict of common names → action ID - `ALL_ARM_NAMES` — set of all recognized arm command names - `ARM_AVAILABLE` — False (pending GR00T integration) #### `imgsearch_api.py` (38 lines) Wraps `Vision/marcus_imgsearch.py`. Wires camera, ZMQ, LLaVA, and YOLO into the ImageSearch class. **Exports:** - `init_imgsearch(get_frame_fn, send_vel_fn, ...)` — wire dependencies - `get_searcher()` — return ImageSearch instance (or None) --- ### Brain/ #### `marcus_brain.py` (372 lines) **The orchestrator.** Contains all the brain's public functions used by both terminal and server modes. 
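As a rough illustration of how an entrypoint drives this module, here is a sketch of a terminal loop built only on the documented `init_brain()` / `process_command()` / `shutdown()` calls. The loop, prompt string, and error handling are assumptions; the real `run_marcus.py` may differ.

```python
# A minimal sketch of how an entrypoint consumes the brain API described here.
# init_brain / process_command / shutdown are documented above; the loop itself
# and its error handling are illustrative, not the project's actual run_marcus.py.
from Brain.marcus_brain import init_brain, process_command, shutdown

def main() -> None:
    init_brain()  # camera → YOLO → odometry → memory → image search → LLaVA warmup
    try:
        while True:
            cmd = input("marcus> ").strip()
            if cmd in ("q", "quit", "exit"):
                break
            result = process_command(cmd)  # {"type", "speak", "action", "elapsed"}
            print(f'[{result["type"]}] {result["speak"]} ({result["elapsed"]:.1f}s)')
    except KeyboardInterrupt:
        pass
    finally:
        shutdown()  # stop camera, YOLO, odometry threads and flush session memory

if __name__ == "__main__":
    main()
```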
**Key functions:** - `init_brain()` — initializes all subsystems in order: camera → YOLO → odometry → memory → image search → Holosoma boot → LLaVA warmup - `process_command(cmd) → dict` — routes a command through the full pipeline and returns `{"type", "speak", "action", "elapsed"}`. Pipeline order: 1. YOLO status check 2. Image search (`search/`) 3. Natural language goal auto-detect ("find a person", "look for a bottle") 4. Explicit goal (`goal/ ...`) 5. Patrol (`patrol`) 6. Local commands (place memory, odometry, help) via `command_parser.py` 7. Talk-only questions (what/who/where/how) 8. Greetings (hi/hello/salam) — instant, no AI 9. "Come to me" shortcut — instant forward 2s 10. Multi-step compound ("turn right then walk forward") 11. Standard LLaVA command — full AI inference - `run_terminal()` — terminal input loop (used by `run_marcus.py`) - `get_brain_status()` — returns dict of all subsystem states - `shutdown()` — clean stop of all subsystems #### `command_parser.py` (300 lines) 14 compiled regex patterns that intercept commands before they reach LLaVA. Handles: | Pattern | Example | Action | |---------|---------|--------| | `_RE_REMEMBER` | "remember this as door" | Save current position | | `_RE_GOTO` | "go to door" | Navigate to saved place | | `_RE_FORGET` | "forget door" | Delete saved place | | `_RE_RENAME` | "rename door to entrance" | Rename place | | `_RE_WALK_DIST` | "walk 1 meter" | Precise odometry walk | | `_RE_WALK_BACK` | "walk backward 2 meters" | Precise backward walk | | `_RE_TURN_DEG` | "turn right 90 degrees" | Precise odometry turn | | `_RE_PATROL_RT` | "patrol: door → desk → exit" | Named waypoint patrol | | `_RE_LAST_CMD` | "last command" | Recall from session | | `_RE_DO_AGAIN` | "do that again" | Repeat last command | | `_RE_UNDO` | "undo" | Reverse last movement | | `_RE_LAST_SESS` | "last session" | Previous session summary | | `_RE_WHERE` | "where am I" | Current odometry position | | `_RE_GO_HOME` | "go home" | Return to start position | Also handles: session summary, help text, examples text. #### `executor.py` (81 lines) Executes LLaVA movement decisions. Converts the JSON action list into sustained ZMQ velocity commands. **Functions:** - `execute_action(move, duration)` — single movement step. Uses `MOVE_MAP` for velocities, intercepts arm names that LLaVA sometimes puts in the actions list - `move_step(move, duration)` — lightweight version for goal/patrol loops (no full gradual_stop between steps) - `merge_actions(actions)` — combines consecutive same-direction steps: 5x right 1.0s → 1x right 5.0s - `execute(d)` — full decision execution: movements in sequence, arm gesture in background thread #### `marcus_memory.py` (817 lines) Persistent session and place memory. Thread-safe with atomic JSON writes. **Place memory:** - Save named positions with odometry coordinates - Fuzzy name matching (typo tolerance) - Name sanitization (special chars → underscores) - Rename, delete, list operations **Session memory:** - Per-session folders: `session_{id}_{date}/` - Logs: commands.json, detections.json, alerts.json, summary.txt - 60-second auto-flush in background thread - Emergency save via `atexit` on crash - YOLO detection deduplication (5-second window) - Cross-session recall ("what did you do last session?") - Auto-prune old sessions (keeps last 50) --- ### Navigation/ #### `goal_nav.py` (154 lines) Visual goal navigation. 
Robot rotates continuously while scanning for a target using YOLO (fast, 0.4s checks) with LLaVA fallback (slow but handles non-YOLO classes). **How it works:** 1. Parse goal to extract YOLO target class (via aliases: "guy" → "person", "sofa" → "couch") 2. Start continuous rotation in background thread 3. YOLO fast-check every 0.4s — if target class found: - Extract compound condition ("holding a phone", "wearing red") - If compound: ask LLaVA to verify ("Is the person holding a phone? yes/no") - If verified (or no compound): stop and report 4. LLaVA fallback for non-YOLO classes: send goal_prompt with image, check if `reached: true` 5. Max steps limit (40 default), Ctrl+C to abort **Config:** `config_Navigation.json` — goal_aliases, yolo_goal_classes, max_steps, rotation_speed #### `patrol.py` (106 lines) Autonomous HSE inspection patrol. Timed loop with YOLO PPE detection and LLaVA scene assessment. **How it works:** 1. YOLO checks for PPE violations (no helmet, no vest) and logs alerts 2. Safety: stop if person too close (size_ratio > 0.3) 3. LLaVA assesses scene: observation, alert, next_move, duration 4. Executes lightweight movement steps between checks 5. All detections and alerts logged to session memory **Config:** `config_Patrol.json` — default_duration_minutes, proximity_threshold #### `marcus_odometry.py` (808 lines) Precise position tracking and movement control. **Dual source** (priority order): 1. ROS2 `/dog_odom` — joint encoder data, ±2cm accuracy (currently disabled due to DDS memory conflict) 2. Dead reckoning — velocity × time integration at 20Hz, ±10cm accuracy **Movement API:** - `walk_distance(meters, speed, direction)` — odometry feedback loop, 5cm tolerance, safety timeout - `turn_degrees(degrees, speed)` — heading feedback with 0°/360° wrap-around, 2° tolerance - `navigate_to(x, y, heading)` — rotate to face target, walk straight, rotate to final heading - `return_to_start()` — navigate back to where `start()` was called - `patrol_route(waypoints, loop)` — walk through list of waypoints in order All movements have time-based fallbacks when odometry isn't running. Speed clamped at 0.4 m/s. KeyboardInterrupt handling with gradual stop. --- ### Vision/ #### `marcus_yolo.py` (474 lines) Background YOLO inference engine. Runs in a daemon thread, reads from the shared camera frame buffer. **Detection class:** Each detection has class_name, confidence, bbox, position (left/center/right), distance_estimate (very close/close/medium/far), size_ratio. **Public API:** - `start_yolo(raw_frame_ref, frame_lock)` — start inference thread - `yolo_sees(class_name, min_confidence)` — check if class detected - `yolo_count(class_name)` — count instances - `yolo_closest(class_name)` — largest bbox (closest object) - `yolo_summary()` — "2 persons (left, close) | 1 chair (center, medium)" - `yolo_ppe_violations()` — PPE-specific detections - `yolo_person_too_close(threshold)` — safety proximity check **Config:** `config_Vision.json` — model path, confidence (0.45), 19 tracked COCO classes #### `marcus_imgsearch.py` (501 lines) Image-guided search. User provides a reference photo; robot rotates and LLaVA compares camera frames to the reference. **How it works:** 1. Load reference image (resize to 336x336 for efficiency) 2. Start continuous rotation 3. Optional YOLO pre-filter (find "person" class before running LLaVA) 4. LLaVA comparison: sends [reference, current_frame] as two images 5. Parse JSON response: found, confidence (low/medium/high), position, description 6. 
Stop on medium/high confidence match Supports text-only search (no reference image) using hint description. --- ### Server/ #### `marcus_server.py` (224 lines) WebSocket server that wraps the full Marcus brain. Initializes all subsystems (camera, YOLO, odometry, memory, LLaVA) on startup, then accepts commands via WebSocket. **Architecture:** - Calls `init_brain()` from `marcus_brain.py` — same init as terminal mode - Each incoming `"command"` message runs `process_command(cmd)` in a thread pool - Broadcasts camera frames to all clients at ~10Hz - Auto-detects eth0 and wlan0 IPs for the connection banner **WebSocket message types:** | Client sends | Server responds | |---|---| | `{"type": "command", "command": "turn left"}` | `{"type": "thinking"}` then `{"type": "decision", "action": "LEFT", "speak": "Turning left", ...}` | | `{"type": "capture"}` | `{"type": "capture_result", "ok": true, "data": ""}` | | `{"type": "ping"}` | `{"type": "pong", "lidar": true, "status": {...}}` | **Config:** `config_Network.json` — jetson_ip, jetson_wlan_ip, websocket_port --- ### Client/ #### `marcus_cli.py` (288 lines) Terminal CLI client for remote control. Connects to the server via WebSocket. **Features:** - Connection menu: choose eth0 / wlan0 / custom IP - Color-coded output: green=forward, cyan=turn, red=stop, orange=greeting/local - Displays `Marcus: ` for every response - System commands: `status`, `camera`, `profile `, `capture`, `help`, `q` - Async receiver for real-time decision display while typing - Command history (not persisted) #### `marcus_client.py` (1021 lines) Tkinter GUI client with 3 tabs: - **Navigation** — live camera view, command entry, quick buttons, decision log - **Camera** — profile switcher, custom resolution, capture, preview toggle - **LiDAR** — full SLAM Commander (runs locally via SlamEngineClient from G1_Lootah/Lidar) --- ### Bridge/ #### `ros2_zmq_bridge.py` (66 lines) ROS2 Foxy node that subscribes to `/cmd_vel` (TwistStamped) and `holosoma/other_input` (String), forwarding to the ZMQ PUB socket. Requires Python 3.8 + ROS2 sourced. Used when external ROS2 nodes need to send velocity commands to Holosoma. --- ### Autonomous/ #### `marcus_autonomous.py` (516 lines) Autonomous office exploration mode. Marcus moves freely, identifies areas and objects, builds a live map, saves everything to a session folder. **State machine:** IDLE → EXPLORING → IDLE **Exploration loop:** 1. Safety: stop if person too close 2. Record YOLO detections + odometry path point 3. Every 5 steps: LLaVA scene assessment (area_type, objects, observation) 4. Move forward; turn when blocked (alternates left/right) 5. Save interesting frames to disk 6. 
Auto-flush to disk every 20 steps

**Output:** `Data/Brain/Exploration/map_{id}_{date}/` — observations.json, path.json, summary.txt, frames/

---

## Data Flow

### Command: "turn right"

```
User types "turn right"
        │
        ▼
process_command("turn right")
        │   (no regex match — falls through to LLaVA)
        ▼
llava_api.ask("turn right", camera_frame)
        │   sends to Ollama qwen2.5vl:3b
        ▼
LLaVA returns: {"actions":[{"move":"right","duration":2.0}], "speak":"Turning right"}
        │
        ▼
executor.execute(d)
        │   merge_actions → execute_action("right", 2.0)
        ▼
zmq_api.send_vel(vyaw=-0.3) × 40 times over 2.0 seconds
        │
        ▼
Holosoma RL policy receives velocity → robot turns right
        │
        ▼
zmq_api.gradual_stop() → 20 zero-velocity messages
```

### Command: "remember this as door"

```
User types "remember this as door"
        │
        ▼
process_command("remember this as door")
        │   matches _RE_REMEMBER regex
        ▼
command_parser.try_local_command()
        │   calls memory_api.place_save("door")
        ▼
odometry_api.get_position() → {"x": 1.2, "y": 0.5, "heading": 90.0}
        │
        ▼
marcus_memory.Memory.save_place("door", x=1.2, y=0.5, heading=90.0)
        │   atomic write to Data/History/Places/places.json
        ▼
Returns: {"type": "local", "speak": "Done", "action": "LOCAL"}
```

### Command: "goal/ find a person"

```
User types "goal/ find a person"
        │
        ▼
process_command() → navigate_to_goal("find a person")
        │
        ▼
_goal_yolo_target("find a person") → "person"
        │   YOLO mode (not LLaVA fallback)
        ▼
Start continuous rotation thread (vyaw=0.3)
        │
        ▼
Loop every 0.4s:
        │   yolo_sees("person") → False → keep rotating
        │   yolo_sees("person") → False → keep rotating
        │   yolo_sees("person") → True!
        │       ▼
        │   _extract_extra_condition() → None (no compound)
        │       ▼
        │   gradual_stop()
        │   yolo_closest("person") → Detection(center, close)
        │   log_detection("person", "center", "close")
        ▼
Returns: {"type": "goal", "speak": "Goal navigation: find a person"}
```

---

## Hardware Stack

```
Unitree G1 EDU (29 DOF)
│
├── Jetson Orin NX (16GB unified memory)
│   ├── Holosoma RL policy (50Hz) — locomotion joints 0-11
│   ├── Ollama + Qwen2.5-VL 3B — vision-language understanding
│   ├── YOLOv8m — real-time object detection (CPU, 320px)
│   └── Marcus Brain — this project
│
├── RealSense D435I — RGB camera (424x240 @ 15fps)
│
├── Livox Mid360 LiDAR — 3D point cloud (via SlamEngineClient)
│
└── ZMQ PUB/SUB — velocity commands (tcp://127.0.0.1:5556)
    ├── Marcus Brain PUB → Holosoma SUB
    └── ROS2 Bridge PUB → Holosoma SUB (alternative)
```

---

## Startup Order

1. **Holosoma** — must be running first (RL locomotion policy)
2. **Marcus Server** (`python3 -m Server.marcus_server`) — or Brain (`python3 run_marcus.py`)
3. **Client** (`python3 -m Client.marcus_cli`) — connects to server

Cannot run Server and Brain simultaneously (both bind ZMQ port 5556).
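To make the "turn right" walkthrough above concrete, and to show why only one process may bind port 5556, here is a minimal sketch of the timed velocity stream over the ZMQ PUB socket. The JSON wire format and the helper names are assumptions rather than the actual `zmq_api` implementation; the 20 Hz rate and the 20-message gradual stop follow the numbers quoted above.

```python
# Illustrative sketch of the timed velocity stream behind the "turn right" walkthrough.
# The ZMQ wire format (JSON here) is an assumption; the project's zmq_api defines its own.
import json
import time
import zmq

ctx = zmq.Context.instance()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")         # same port noted in config_ZMQ.json; stays bound
                                         # for the process lifetime, hence one process only

def send_vel(vx: float, vy: float, vyaw: float) -> None:
    pub.send_string(json.dumps({"vx": vx, "vy": vy, "vyaw": vyaw}))

def turn_right(duration: float = 2.0, rate_hz: int = 20) -> None:
    """Stream a constant yaw velocity for `duration` seconds, then ramp to zero."""
    for _ in range(int(duration * rate_hz)):   # e.g. 40 messages over 2.0 s
        send_vel(0.0, 0.0, -0.3)               # -0.3 yaw = turn right, per MOVE_MAP
        time.sleep(1.0 / rate_hz)
    for _ in range(20):                        # gradual_stop(): 20 zero-velocity messages
        send_vel(0.0, 0.0, 0.0)
        time.sleep(0.05)
```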
---

## Config Reference

| File | Key values |
|------|-----------|
| `config_ZMQ.json` | zmq_host: 127.0.0.1, zmq_port: 5556 |
| `config_Camera.json` | 424x240 @ 15fps, JPEG quality 70 |
| `config_Brain.json` | qwen2.5vl:3b, history 6 turns, prompts |
| `config_Vision.json` | yolov8m.pt, confidence 0.45, 19 classes |
| `config_Navigation.json` | move_map velocities, goal aliases |
| `config_Network.json` | eth0: 192.168.123.164, wlan0: 10.255.254.86, port 8765 |
| `config_Odometry.json` | walk 0.25 m/s, turn 0.25 rad/s, 5cm tolerance |
| `config_Memory.json` | Data/Brain/Sessions, Data/History/Places |
| `config_Patrol.json` | 5 min default, proximity 0.3 |
| `config_Arm.json` | 16 gestures, arm_available: false (GR00T pending) |

---

## Line Count Summary

| Layer | Files | Lines |
|-------|-------|-------|
| Core | 4 | 301 |
| API | 8 | 536 |
| Brain | 4 | 1,570 |
| Navigation | 3 | 1,068 |
| Vision | 2 | 975 |
| Server | 1 | 224 |
| Client | 2 | 1,309 |
| Bridge | 1 | 66 |
| Autonomous | 1 | 516 |
| Entrypoint | 1 | 16 |
| **Total** | **27** | **6,581** |
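Finally, a short sketch of the config-consumption pattern these files imply, using the `load_config()` and `config_path()` helpers documented under Core/. Key names other than `zmq_host` and `zmq_port` (listed in the Config Reference table) are assumptions about the JSON contents.

```python
# Sketch only: how a subsystem typically pulls its settings, per the Core/ section.
# Key names other than zmq_host / zmq_port are assumed, not verified field names.
from Core.config_loader import load_config, config_path

zmq_cfg = load_config("ZMQ")          # reads Config/config_ZMQ.json (cached)
vision_cfg = load_config("Vision")    # reads Config/config_Vision.json

endpoint = f'tcp://{zmq_cfg["zmq_host"]}:{zmq_cfg["zmq_port"]}'   # tcp://127.0.0.1:5556
yolo_weights = config_path("Models/yolov8m.pt")                   # absolute path from PROJECT_ROOT
confidence = vision_cfg.get("confidence", 0.45)                   # default noted above
```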