Marcus — Control & Startup Guide
Robot persona: Sanad (wake word + self-intro; project code lives under Marcus/)
Updated: 2026-04-21
Quick Start
Prerequisites (Jetson Orin NX, JetPack 5.1.1)
```bash
# Terminal 1 — Start Holosoma (locomotion policy, in hsinference env)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq \
  --task.state-input zmq \
  --task.interface eth0
```

```bash
# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list  # confirm qwen2.5vl:3b is present
```
Option A — Terminal Mode (on Jetson)
```bash
# Terminal 3 — Start Marcus Brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```
Direct keyboard control + voice input (say "Sanad" to wake). Expected banner on boot:
```
================================================
 SANAD AI BRAIN — READY
================================================
model    : qwen2.5vl:3b
yolo     : True
odometry : True
memory   : True
lidar    : True
voice    : True
camera   : 424x240@15
```
Option B — Server + Client (remote)
```bash
# Terminal 3 (Jetson) — Start Server
conda activate marcus
cd ~/Marcus
python3 -m Server.marcus_server
```

```bash
# Terminal 4 (Workstation) — Connect Client
cd ~/Robotics_workspace/yslootahtech/Project/Marcus
python3 -m Client.marcus_cli
```
The client prompts for a connection:

```
Connection options:
  1) eth0  — 192.168.123.164:8765
  2) wlan0 — 10.255.254.86:8765
  3) custom
Choose [1/2/3] or IP:
```

Or skip the prompt: `python3 -m Client.marcus_cli --ip 192.168.123.164 --port 8765`
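Before launching the client, it can help to confirm the server port is actually reachable. A minimal stdlib sketch (this helper is illustrative, not part of the Marcus CLI):

```python
import socket

def server_reachable(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Default eth0 endpoint from the connection prompt above
    print(server_reachable("192.168.123.164", 8765))
```

If this returns False for both interfaces, check that `python3 -m Server.marcus_server` is running on the Jetson before debugging the client side.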
Voice
- Wake word: "Sanad" — gated at dispatch time on Gemini's transcript. Common mishearings ("Sannad", "Senad", "Sa nad", etc.) are all accepted via the 33-entry `config_Voice.json::stt.wake_words` fuzzy list. Matching is word-boundary, not substring (so "standard" doesn't trigger off "sand").
- Mic: G1 on-board array mic, captured via UDP multicast `239.168.123.161:5555` (16 kHz mono, 16-bit PCM). No USB mic, no acoustic wake detector.
- STT: Gemini Live (`gemini-2.5-flash-native-audio-preview-12-2025`) with `response_modalities=["TEXT"]` — Gemini does the transcription. The mic is streamed in 32 ms chunks; Gemini's server-side VAD decides turn boundaries. The Gemini WebSocket runs in a separate Python 3.10+ subprocess (`Voice/gemini_runner.py`) because `google-genai` doesn't support Python 3.8 (which marcus is pinned to). Marcus spawns the runner via the `gemini_sdk` conda env and reads JSON-line transcripts off its stdout. Requires `pip install google-genai` inside the gemini_sdk env (not the marcus env) and an API key in `MARCUS_GEMINI_API_KEY` (or `SANAD_GEMINI_API_KEY` as fallback). Set `MARCUS_GEMINI_PYTHON` (or `stt.gemini_python_path`) if the gemini_sdk env lives somewhere besides `~/miniconda3/envs/gemini_sdk/`.
- TTS: Unitree `client.TtsMaker()` → G1 body speaker. English only. Gemini does NOT speak — only Marcus's brain reply is spoken, via TtsMaker.
- Echo prevention: `VoiceModule.flush_mic()` is called by Marcus's brain before AND after `audio_api.speak()` so TtsMaker output isn't transcribed back into Gemini as a fake user utterance.
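The spawn-and-read handoff described above (launch a newer-Python env as a subprocess, consume JSON lines off its stdout) can be sketched as follows. The child command and the `{"type": "transcript", "text": ...}` field names here are illustrative stand-ins, not the real `gemini_runner.py` protocol:

```python
import json
import subprocess
import sys

# Stand-in child that emits one JSON transcript line, mimicking the
# runner's stdout stream. (Field names are assumptions for illustration.)
CHILD = (
    'import json; '
    'print(json.dumps({"type": "transcript", "text": "Sanad, turn right"}))'
)

def read_transcripts(python_exe: str = sys.executable):
    """Spawn the runner and yield parsed JSON-line messages from its stdout."""
    proc = subprocess.Popen(
        [python_exe, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        line = line.strip()
        if line:
            yield json.loads(line)
    proc.wait()

if __name__ == "__main__":
    for msg in read_transcripts():
        print(msg["text"])
```

The key design point is that only serialized JSON crosses the process boundary, so the Python 3.8 parent never imports `google-genai` at all.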
Interaction flow: speak "Sanad" + your request → Gemini transcribes (Marcus prints USER: ...) → wake-word gate passes → brain handles it (motion, VLM Q&A, place memory, …) → reply spoken through G1 speaker.
Examples:
- "Sanad, turn right" → robot turns right, brain says "Done"
- "Sanad, what do you see" → Qwen2.5-VL describes the camera frame, brain speaks the description
- "Sanad" alone (no payload) → no dispatch (the persona prompt tells Gemini to acknowledge silently)
- "what do you see" (no "Sanad") → wake-word gate blocks, no dispatch, no reply (avoids false motion from background chatter)
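The word-boundary gate behind these examples can be sketched with a single regex. The variant list below is a small illustrative subset; the real 33-entry list lives in `config_Voice.json::stt.wake_words`, and the helper name here is hypothetical:

```python
import re

# Illustrative subset of the fuzzy wake-word list (the real list has 33 entries).
WAKE_WORDS = ["sanad", "sannad", "senad", "sa nad"]

_WAKE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in WAKE_WORDS) + r")\b",
    re.IGNORECASE,
)

def has_wake_word(transcript: str) -> bool:
    """Word-boundary match: 'Sanad, turn right' passes, 'standard' does not."""
    return _WAKE_RE.search(transcript) is not None
```

Because `\b` anchors both ends, a variant embedded inside a longer word (e.g. "sanads") never fires, which is what keeps background chatter from dispatching commands.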
To disable voice entirely, set `subsystems.voice: false` in `config_Brain.json` — Marcus will boot text-only without opening the Gemini WebSocket.
Tuning knobs — all in `config_Voice.json::stt`:

- Real "Sanad" misheard by Gemini and not matching `wake_words` → check `logs/transcript.log` for the `HEARD` line, add the variant to `wake_words`
- Commands transcribed wrong → field accuracy is mostly Gemini's job; for room-specific tuning try `gemini_vad_silence_duration_ms` (longer = more patience for hesitations)
- VAD too eager / too slow → `gemini_vad_start_sensitivity` (HIGH/LOW) and `gemini_vad_end_sensitivity` (LOW for slow speech, HIGH to cut early)
- Filler words triggering dispatch → expand `garbage_patterns`
- Robot too talkative / too terse → edit `gemini_system_prompt` (or point `gemini_system_prompt_file` at a `.txt` for richer personas)
- Session reconnects too aggressive → raise `gemini_max_consecutive_errors`
- Disable per-turn WAV saves → `gemini_record_enabled: false`
Command Reference
Movement
| Command | Action |
|---|---|
| `turn left` / `turn right` | Rotate (2s default) |
| `walk forward` / `move back` | Walk (2s default) |
| `walk 1 meter` | Precise odometry walk |
| `walk backward 2 meters` | Precise backward walk |
| `turn right 90 degrees` | Precise odometry turn |
| `turn right then walk forward` | Multi-step compound |
| `come to me` / `come here` | Forward 2s (instant, no AI) |
| `stop` | Gradual stop |
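The precise-odometry phrasings in the table ("walk 1 meter", "turn right 90 degrees") imply a small parse step before dispatch. A sketch of such a parser, assuming a simple regex grammar (this is illustrative, not the actual brain's command routing):

```python
import re

# Hypothetical parser for the precise-movement phrasings above.
WALK_RE = re.compile(
    r"walk(?:\s+(backward|back|forward))?\s+([\d.]+)\s+meters?", re.IGNORECASE
)
TURN_RE = re.compile(r"turn\s+(left|right)\s+([\d.]+)\s+degrees?", re.IGNORECASE)

def parse_precise(cmd: str):
    """Return ('walk', signed_meters) or ('turn', side, degrees), else None."""
    m = WALK_RE.search(cmd)
    if m:
        sign = -1.0 if (m.group(1) or "").lower() in ("backward", "back") else 1.0
        return ("walk", sign * float(m.group(2)))
    m = TURN_RE.search(cmd)
    if m:
        return ("turn", m.group(1).lower(), float(m.group(2)))
    return None  # not a precise command; falls through to timed/default moves
```

Commands without an explicit distance or angle ("walk forward", "come here") return None and would take the timed 2-second default path instead.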
Vision
| Command | Action |
|---|---|
| `what do you see` | Qwen2.5-VL describes camera view |
| `describe the room` | Qwen2.5-VL scene description |
| `is anyone here` | Qwen2.5-VL person check |
| `yolo` | Show YOLO detection status |
Goal Navigation
| Command | Action |
|---|---|
| `goal/ stop when you see a person` | YOLO fast search + stop |
| `goal/ find a laptop` | YOLO + Qwen-VL search |
| `goal/ stop when you see a guy holding a phone` | YOLO + Qwen-VL compound verification |
| `find a person` | Auto-detected as goal (no prefix needed) |
| `look for a bottle` | Auto-detected as goal |
Place Memory
| Command | Action |
|---|---|
| `remember this as door` | Save current position |
| `go to door` | Navigate to saved place |
| `places` | List all saved places |
| `forget door` | Delete place |
| `rename door to entrance` | Rename place |
| `where am I` | Show odometry position |
| `go home` | Return to start position |
Patrol
| Command | Action |
|---|---|
| `patrol` | Autonomous patrol (prompts for duration) |
| `patrol: door → desk → exit` | Named waypoint patrol |
Image Search (requires `subsystems.imgsearch: true`)
| Command | Action |
|---|---|
| `search/ /path/to/photo.jpg` | Find target from reference image |
| `search/ /path/to/photo.jpg person in blue shirt` | Image + hint |
| `search/ person in blue shirt` | Text-only search |
Session Memory
| Command | Action |
|---|---|
| `last command` | Show last typed command |
| `do that again` | Repeat last command |
| `undo` | Reverse last movement |
| `last session` | Previous session summary |
| `session summary` | Current session stats |
Autonomous Mode
| Command | Action |
|---|---|
| `auto on` | Start autonomous exploration |
| `auto off` | Stop |
| `auto status` | Current step / observations |
| `auto save` | Snapshot observations to disk |
System
| Command | Action |
|---|---|
| `help` | Command reference |
| `example` | Usage examples |
| `lidar` / `lidar status` | SLAM engine pose + health |
| `q` / `quit` | Shutdown |
Client-Only Commands (CLI)
| Command | Action |
|---|---|
| `status` | Ping server + LiDAR status |
| `camera` | Get camera configuration |
| `profile low/medium/high/full` | Switch camera profile |
| `capture` | Take a photo |
Subsystem flags (Config/config_Brain.json)
Control what initializes at boot. Defaults:

```json
"subsystems": {
  "lidar": true,
  "voice": true,
  "imgsearch": false,
  "autonomous": true
}
```
Set any to false to skip that subsystem's init. Approximate boot-time savings:

- `voice: false` → ~1 s faster (no Gemini WebSocket open, no mic thread)
- `lidar: false` → ~1 s faster (no SLAM subprocess spawn)
- `imgsearch: false` → already the default; re-enable only when you need `search/ …`
- `autonomous: false` → minor, but removes the AutonomousMode init
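A sketch of how boot code might consume these flags, merging the config over the documented defaults (illustrative; the real init path lives in `run_marcus.py` and the brain):

```python
import json

# Defaults as documented above; config values override them.
DEFAULTS = {"lidar": True, "voice": True, "imgsearch": False, "autonomous": True}

def enabled_subsystems(config_text: str) -> list:
    """Return the names of subsystems that should be initialized at boot."""
    cfg = json.loads(config_text)
    flags = {**DEFAULTS, **cfg.get("subsystems", {})}
    return [name for name, on in flags.items() if on]

# Example: a text-only boot with voice disabled
cfg = '{"subsystems": {"voice": false}}'
```

Merging over defaults means a config that only lists the flags it changes still boots with sane values for the rest.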
Network Configuration
| Interface | IP | Use |
|---|---|---|
| `eth0` | 192.168.123.164 | Robot internal network (Jetson ↔ G1 ↔ LiDAR) |
| `wlan0` | 10.255.254.86 | Office WiFi (Jetson ↔ Workstation) |

| Service | Port | Protocol |
|---|---|---|
| Marcus WebSocket | 8765 | ws:// |
| ZMQ velocity (→ Holosoma) | 5556 | tcp:// (PUB/SUB) |
| Ollama API | 11434 | HTTP (localhost only) |
| G1 audio multicast (mic) | 5555 | UDP multicast 239.168.123.161 |
| Livox Mid-360 (LiDAR) | n/a (device at 192.168.123.120) | UDP (Livox SDK) |

Most values are configurable in `Config/config_Network.json` and `config_Voice.json::mic_udp`.
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Banner shows SANAD AI BRAIN — READY but nothing moves | Holosoma not running | Start Holosoma (Terminal 1) first |
| `RuntimeError: CUDA not available` on boot | Wrong torch build on Jetson | See Doc/environment.md section 9.2 — reinstall the NVIDIA Jetson torch wheel |
| `llama runner process has terminated: %!w(<nil>)` | Ollama compute graph OOM | Already capped at `num_batch=128` / `num_ctx=2048`. Check `free -h`; kill stale Ollama runners: `pkill -f "ollama runner"` |
| Traceback mentioning `multiprocessing/spawn.py` + ZMQ port 5556 | Old import-time ZMQ bind regressed | Pull latest `API/zmq_api.py` — must call `init_zmq()` from the parent only |
| `[Camera] No frame for 10s` during warmup | Ollama blocking the main thread, or USB bandwidth | Warmup is ~10–15 s on first Qwen load; subsequent commands are fast |
| Wake word never fires | Gemini transcribed but `_has_wake_word` rejected | Check `logs/transcript.log` — if `HEARD ...` shows what Gemini heard but no `CMD ...` follows, the transcript has a misheard "Sanad" variant; add the root form to `config_Voice.json::stt.wake_words` |
| Voice silent on boot | Missing Gemini API key | Check `logs/voice.log` for `No Gemini API key found`. Set `export MARCUS_GEMINI_API_KEY='...'` before launching `run_marcus.py` |
| `google-genai not installed` in runner stderr | Package missing in gemini_sdk env | Activate the gemini_sdk conda env and `pip install google-genai` THERE (not in marcus) |
| `no Python 3.10+ env found` for the Gemini runner | gemini_sdk env in non-default path | Set `export MARCUS_GEMINI_PYTHON=/path/to/gemini_sdk/bin/python` or edit `stt.gemini_python_path` |
| Mic silent | G1 audio service not publishing | Run `python3 Voice/builtin_mic.py` standalone — must print "OK — mic is capturing audio" |
| `[LiDAR] No data yet (will keep trying)` | SLAM worker still spawning (normal) or Livox network | First ~5 s is normal. If it persists, `ping 192.168.123.120` |
| Client can't connect | Wrong IP or server not running | Verify `ollama serve &` and `python3 -m Server.marcus_server` are both up |
File Locations
| What | Path |
|---|---|
| Brain code | ~/Marcus/Brain/ |
| Server | ~/Marcus/Server/marcus_server.py |
| Voice | ~/Marcus/Voice/{audio_io,builtin_mic,builtin_tts,gemini_script,turn_recorder,marcus_voice}.py |
| Config | ~/Marcus/Config/ |
| Prompts | ~/Marcus/Config/marcus_prompts.yaml |
| YOLO model | ~/Marcus/Models/yolov8m.pt |
| Session data | ~/Marcus/Data/Brain/Sessions/ |
| Places | ~/Marcus/Data/History/Places/places.json |
| Logs | ~/Marcus/logs/ |
See Doc/architecture.md for full project structure and file-by-file documentation.
See Doc/environment.md for the verified Jetson software stack.
See Doc/pipeline.md for the end-to-end data flow.
See Doc/functions.md for the full function inventory (AST-generated).
Language policy
English only. Arabic was removed from the codebase on 2026-04-21:

- `Config/config_Voice.json::stt.wake_words` — English fuzzy variants only (33 entries); excludes common English words that would false-trigger (said, sand, sunday, etc.)
- `Config/marcus_prompts.yaml` — no Arabic examples left in any of the 7 prompts
- `API/audio_api.py::speak(text)` — rejects non-ASCII (the G1 TtsMaker silently maps Arabic to Chinese, which nobody wants)
- `Brain/marcus_brain.py` — greeting and talk-pattern regexes match English only

If you need Arabic back, the cleanest paths are either Piper TTS (offline) or edge-tts (online) — see `git log` for the removed implementations.
Logs
All `.log` files in `logs/` rotate at 5 MB × 3 backups by default. To change:

```bash
export MARCUS_LOG_MAX_BYTES=10000000   # 10 MB per file
export MARCUS_LOG_BACKUP_COUNT=5       # keep 5 rotations
export MARCUS_LOG_DIR=/var/log/marcus  # move logs off SD card
```

Per-module log files:

- `brain.log`, `camera.log`, `lidar.log`, `zmq.log`, `server.log`, `main.log` — via `Core.logger.log()`
- `voice.log` — via stdlib `logging` in `audio_api.py` + `marcus_voice.py`
- Session JSON: `Data/Brain/Sessions/session_NNN_YYYY-MM-DD/{commands,detections,alerts,places}.json`