
Marcus — Control & Startup Guide

Robot persona: Sanad (wake word + self-intro; project code lives under Marcus/)
Updated: 2026-04-21


Quick Start

Prerequisites (Jetson Orin NX, JetPack 5.1.1)

```bash
# Terminal 1 — Start Holosoma (locomotion policy, in hsinference env)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq \
  --task.state-input zmq \
  --task.interface eth0
```

```bash
# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list                # confirm qwen2.5vl:3b present
```
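Beyond eyeballing `ollama list`, model presence can be checked programmatically before boot. A minimal sketch, assuming Ollama's standard HTTP API on its default port 11434 (the /api/tags endpoint returns the same model list):

```python
import json
import urllib.request

def model_available(name: str, host: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama model whose name starts with `name` is present."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=3) as resp:
            models = json.load(resp).get("models", [])
    except OSError:
        return False  # server not running / unreachable
    return any(m.get("name", "").startswith(name) for m in models)
```

Usage: `model_available("qwen2.5vl:3b")` returns False when the server is down, so it doubles as an "is Ollama up" probe.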

Option A — Terminal Mode (on Jetson)

```bash
# Terminal 3 — Start Marcus Brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```

Direct keyboard control + voice input (say "Sanad" to wake). Expected banner on boot:

```text
================================================
         SANAD AI BRAIN — READY
================================================
  model     : qwen2.5vl:3b
  yolo      : True
  odometry  : True
  memory    : True
  lidar     : True
  voice     : True
  camera    : 424x240@15
```

Option B — Server + Client (remote)

```bash
# Terminal 3 (Jetson) — Start Server
conda activate marcus
cd ~/Marcus
python3 -m Server.marcus_server

# Terminal 4 (Workstation) — Connect Client
cd ~/Robotics_workspace/yslootahtech/Project/Marcus
python3 -m Client.marcus_cli
```

Client prompts for connection:

```text
Connection options:
  1) eth0  — 192.168.123.164:8765
  2) wlan0 — 10.255.254.86:8765
  3) custom
Choose [1/2/3] or IP:
```

Or skip prompt: python3 -m Client.marcus_cli --ip 192.168.123.164 --port 8765


Voice

  • Wake word: "Sanad" — gated at dispatch time on Gemini's transcript. Common mishearings ("Sannad", "Senad", "Sa nad", etc.) all accepted via the 33-entry config_Voice.json::stt.wake_words fuzzy list. Word-boundary match, not substring (so "standard" doesn't trigger off "sand").
  • Mic: G1 on-board array mic, captured via UDP multicast 239.168.123.161:5555 (16 kHz mono, 16-bit PCM). No USB mic, no acoustic wake detector.
  • STT: Gemini Live (gemini-2.5-flash-native-audio-preview-12-2025) with response_modalities=["TEXT"] — Gemini does the transcription. The mic is streamed in 32 ms chunks; Gemini's server-side VAD decides turn boundaries. The Gemini WebSocket runs in a separate Python 3.10+ subprocess (Voice/gemini_runner.py) because google-genai doesn't support Python 3.8 (which marcus is pinned to). Marcus spawns the runner via the gemini_sdk conda env and reads JSON-line transcripts off its stdout. Requires pip install google-genai inside the gemini_sdk env (not the marcus env) and an API key in MARCUS_GEMINI_API_KEY (or SANAD_GEMINI_API_KEY fallback). Set MARCUS_GEMINI_PYTHON (or stt.gemini_python_path) if the gemini_sdk env lives somewhere besides ~/miniconda3/envs/gemini_sdk/.
  • TTS: Unitree client.TtsMaker() → G1 body speaker. English only. Gemini does NOT speak — only Marcus's brain reply is spoken, via TtsMaker.
  • Echo prevention: VoiceModule.flush_mic() is called by Marcus's brain before AND after audio_api.speak() so TtsMaker output isn't transcribed back into Gemini as a fake user utterance.
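The word-boundary gating described above can be sketched in a few lines. The wake list here is a hypothetical subset of the 33-entry fuzzy list, and `has_wake_word` mirrors but is not the real `_has_wake_word`:

```python
import re

# Illustrative subset; the full 33-entry list lives in
# config_Voice.json::stt.wake_words.
WAKE_WORDS = ["sanad", "sannad", "senad", "sa nad"]

def has_wake_word(transcript: str) -> bool:
    """Match wake words on word boundaries, never as substrings."""
    text = transcript.lower()
    for word in WAKE_WORDS:
        # \b keeps longer words like "standard" from triggering
        if re.search(r"\b" + re.escape(word) + r"\b", text):
            return True
    return False
```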

Interaction flow: speak "Sanad" + your request → Gemini transcribes (Marcus prints USER: ...) → wake-word gate passes → brain handles it (motion, VLM Q&A, place memory, …) → reply spoken through G1 speaker.

Examples:

  • "Sanad, turn right" → robot turns right, brain says "Done"
  • "Sanad, what do you see" → Qwen2.5-VL describes the camera frame, brain speaks the description
  • "Sanad" alone (no payload) → no dispatch (the persona prompt tells Gemini to acknowledge silently)
  • "what do you see" (no "Sanad") → wake-word gate blocks, no dispatch, no reply (avoids false motion from background chatter)

To disable voice entirely, set subsystems.voice: false in config_Brain.json — Marcus will boot text-only without opening the Gemini WebSocket.

Tuning knobs — all in config_Voice.json::stt:

  • Real "Sanad" misheard by Gemini and not matching wake_words → check logs/transcript.log for the HEARD line, add the variant to wake_words
  • Commands transcribed wrong → field accuracy is mostly Gemini's job; for room-specific tuning try gemini_vad_silence_duration_ms (longer = more patience for hesitations)
  • VAD too eager / too slow → gemini_vad_start_sensitivity (HIGH / LOW) and gemini_vad_end_sensitivity (LOW for slow speech, HIGH to cut early)
  • Filler words triggering dispatch → expand garbage_patterns
  • Robot too talkative / too terse → edit gemini_system_prompt (or point gemini_system_prompt_file at a .txt for richer personas)
  • Session reconnects too aggressive → raise gemini_max_consecutive_errors
  • Disable per-turn WAV saves → gemini_record_enabled: false
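Put together, the `stt` block might look like the fragment below. Field names are the knobs listed above; the values shown are illustrative, not the shipped defaults:

```json
{
  "stt": {
    "wake_words": ["sanad", "sannad", "senad", "sa nad"],
    "gemini_vad_start_sensitivity": "HIGH",
    "gemini_vad_end_sensitivity": "LOW",
    "gemini_vad_silence_duration_ms": 800,
    "garbage_patterns": ["uh", "um", "hmm"],
    "gemini_max_consecutive_errors": 5,
    "gemini_record_enabled": false,
    "gemini_python_path": "~/miniconda3/envs/gemini_sdk/bin/python"
  }
}
```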

Command Reference

Movement

| Command | Action |
| --- | --- |
| turn left / turn right | Rotate (2s default) |
| walk forward / move back | Walk (2s default) |
| walk 1 meter | Precise odometry walk |
| walk backward 2 meters | Precise backward walk |
| turn right 90 degrees | Precise odometry turn |
| turn right then walk forward | Multi-step compound |
| come to me / come here | Forward 2s (instant, no AI) |
| stop | Gradual stop |
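The multi-step compound row suggests commands are split into sequential steps. A hypothetical sketch; `split_compound` is not Marcus's real dispatcher, which lives under Brain/ and may work differently:

```python
from typing import List

def split_compound(command: str) -> List[str]:
    """Split 'turn right then walk forward' into sequential steps."""
    return [step.strip() for step in command.lower().split(" then ") if step.strip()]
```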

Vision

| Command | Action |
| --- | --- |
| what do you see | Qwen2.5-VL describes camera view |
| describe the room | Qwen2.5-VL scene description |
| is anyone here | Qwen2.5-VL person check |
| yolo | Show YOLO detection status |

Goal Navigation

| Command | Action |
| --- | --- |
| goal/ stop when you see a person | YOLO fast search + stop |
| goal/ find a laptop | YOLO + Qwen-VL search |
| goal/ stop when you see a guy holding a phone | YOLO + Qwen-VL compound verification |
| find a person | Auto-detected as goal (no prefix needed) |
| look for a bottle | Auto-detected as goal |

Place Memory

| Command | Action |
| --- | --- |
| remember this as door | Save current position |
| go to door | Navigate to saved place |
| places | List all saved places |
| forget door | Delete place |
| rename door to entrance | Rename place |
| where am I | Show odometry position |
| go home | Return to start position |

Patrol

| Command | Action |
| --- | --- |
| patrol | Autonomous patrol (prompts for duration) |
| patrol: door → desk → exit | Named waypoint patrol |

Image Search (requires subsystems.imgsearch: true)

| Command | Action |
| --- | --- |
| search/ /path/to/photo.jpg | Find target from reference image |
| search/ /path/to/photo.jpg person in blue shirt | Image + hint |
| search/ person in blue shirt | Text-only search |

Session Memory

| Command | Action |
| --- | --- |
| last command | Show last typed command |
| do that again | Repeat last command |
| undo | Reverse last movement |
| last session | Previous session summary |
| session summary | Current session stats |

Autonomous Mode

| Command | Action |
| --- | --- |
| auto on | Start autonomous exploration |
| auto off | Stop |
| auto status | Current step / observations |
| auto save | Snapshot observations to disk |

System

| Command | Action |
| --- | --- |
| help | Command reference |
| example | Usage examples |
| lidar / lidar status | SLAM engine pose + health |
| q / quit | Shutdown |

Client-Only Commands (CLI)

| Command | Action |
| --- | --- |
| status | Ping server + LiDAR status |
| camera | Get camera configuration |
| profile low/medium/high/full | Switch camera profile |
| capture | Take a photo |

Subsystem flags (Config/config_Brain.json)

Control what initializes at boot. Defaults:

```json
"subsystems": {
  "lidar":      true,
  "voice":      true,
  "imgsearch":  false,
  "autonomous": true
}
```

Set any to false to skip that subsystem's init. Boot time drops roughly:

  • voice: false → ~1 s faster (no Gemini WebSocket open, no mic thread)
  • lidar: false → ~1 s faster (no SLAM subprocess spawn)
  • imgsearch: false → already the default; re-enable only when you need search/ …
  • autonomous: false → minor, but removes the AutonomousMode init
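Flag handling can be pictured as a merge of config over defaults. A sketch; `load_subsystems` is an illustrative helper, not Marcus's actual boot code:

```python
import json

# Defaults mirror the config block above.
DEFAULTS = {"lidar": True, "voice": True, "imgsearch": False, "autonomous": True}

def load_subsystems(config_text: str) -> dict:
    """Merge the config's subsystems block over the defaults."""
    flags = DEFAULTS.copy()
    flags.update(json.loads(config_text).get("subsystems", {}))
    return flags
```

Boot code would then consult `flags["voice"]` etc. before spawning each subsystem.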

Network Configuration

| Interface | IP | Use |
| --- | --- | --- |
| eth0 | 192.168.123.164 | Robot internal network (Jetson ↔ G1 ↔ LiDAR) |
| wlan0 | 10.255.254.86 | Office WiFi (Jetson ↔ Workstation) |

| Service | Port | Protocol |
| --- | --- | --- |
| Marcus WebSocket | 8765 | ws:// |
| ZMQ velocity (→ Holosoma) | 5556 | tcp:// (PUB/SUB) |
| Ollama API | 11434 | HTTP (localhost only) |
| G1 audio multicast (mic) | 5555 | UDP multicast 239.168.123.161 |
| Livox Mid-360 (LiDAR) | 192.168.123.120 | UDP (Livox SDK) |

Most values configurable in Config/config_Network.json and config_Voice.json::mic_udp.


Troubleshooting

| Issue | Cause | Fix |
| --- | --- | --- |
| Banner shows SANAD AI BRAIN — READY but nothing moves | Holosoma not running | Start Holosoma (Terminal 1) first |
| RuntimeError: CUDA not available on boot | Wrong torch build on Jetson | See Doc/environment.md section 9.2 — reinstall the NVIDIA Jetson torch wheel |
| llama runner process has terminated: %!w(<nil>) | Ollama compute graph OOM | Already capped at num_batch=128 / num_ctx=2048. Check free -h; kill stale Ollama runners: pkill -f "ollama runner" |
| Traceback mentioning multiprocessing/spawn.py + ZMQ port 5556 | Old import-time ZMQ bind regressed | Pull latest API/zmq_api.py — must call init_zmq() from the parent only |
| [Camera] No frame for 10s during warmup | Ollama blocking the main thread, or USB bandwidth | Warmup is ~10-15 s on first Qwen load; subsequent commands are fast |
| Wake word never fires | Gemini transcribed but _has_wake_word rejected | Check logs/transcript.log — if HEARD ... shows what Gemini heard but no CMD ... follows, the transcript has a misheard "Sanad" variant; add the root form to config_Voice.json::stt.wake_words |
| Voice silent on boot | Missing Gemini API key | Check logs/voice.log for "No Gemini API key found". Set export MARCUS_GEMINI_API_KEY='...' before launching run_marcus.py |
| google-genai not installed in runner stderr | Package missing in gemini_sdk env | Activate the gemini_sdk conda env and pip install google-genai THERE (not in marcus) |
| no Python 3.10+ env found for the Gemini runner | gemini_sdk env in non-default path | Set export MARCUS_GEMINI_PYTHON=/path/to/gemini_sdk/bin/python or edit stt.gemini_python_path |
| Mic silent | G1 audio service not publishing | Run python3 Voice/builtin_mic.py standalone — must print "OK — mic is capturing audio" |
| [LiDAR] No data yet (will keep trying) | SLAM worker still spawning (normal) or Livox network | First ~5 s is normal; if it persists, ping 192.168.123.120 |
| Client can't connect | Wrong IP or server not running | Verify ollama serve & and python3 -m Server.marcus_server are both up |

File Locations

| What | Path |
| --- | --- |
| Brain code | ~/Marcus/Brain/ |
| Server | ~/Marcus/Server/marcus_server.py |
| Voice | ~/Marcus/Voice/{audio_io,builtin_mic,builtin_tts,gemini_script,turn_recorder,marcus_voice}.py |
| Config | ~/Marcus/Config/ |
| Prompts | ~/Marcus/Config/marcus_prompts.yaml |
| YOLO model | ~/Marcus/Models/yolov8m.pt |
| Session data | ~/Marcus/Data/Brain/Sessions/ |
| Places | ~/Marcus/Data/History/Places/places.json |
| Logs | ~/Marcus/logs/ |

See Doc/architecture.md for full project structure and file-by-file documentation. See Doc/environment.md for the verified Jetson software stack. See Doc/pipeline.md for the end-to-end data flow. See Doc/functions.md for the full function inventory (AST-generated).


Language policy

English only. Arabic was removed from the codebase on 2026-04-21:

  • Config/config_Voice.json::stt.wake_words — English fuzzy variants only (33 entries), excludes common English words that would false-trigger (said, sand, sunday, etc.)
  • Config/marcus_prompts.yaml — no Arabic examples left in any of the 7 prompts
  • API/audio_api.py::speak(text) — rejects non-ASCII (the G1 TtsMaker silently maps Arabic to Chinese, which nobody wants)
  • Brain/marcus_brain.py — greeting and talk-pattern regexes match English only

If you need Arabic back, the cleanest paths are either Piper TTS (offline) or edge-tts (online) — see git log for the removed implementations.
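The ASCII rejection described for API/audio_api.py::speak can be expressed as a one-line guard; `speak_guard` is an illustrative name, not the real function:

```python
def speak_guard(text: str) -> bool:
    """Return True only if text is safe to hand to the G1 TtsMaker (ASCII only)."""
    return text.isascii()
```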


Logs

All .log files in logs/ rotate at 5 MB × 3 backups by default. To change:

```bash
export MARCUS_LOG_MAX_BYTES=10000000      # 10 MB per file
export MARCUS_LOG_BACKUP_COUNT=5          # keep 5 rotations
export MARCUS_LOG_DIR=/var/log/marcus     # move logs off SD card
```

Per-module log files:

  • brain.log, camera.log, lidar.log, zmq.log, server.log, main.log — via Core.logger.log()
  • voice.log — via stdlib logging in audio_api.py + marcus_voice.py
  • Session JSON: Data/Brain/Sessions/session_NNN_YYYY-MM-DD/{commands,detections,alerts,places}.json