# Marcus — Control & Startup Guide

**Robot persona:** Sanad (wake word + self-intro; project code lives under `Marcus/`)
**Updated**: 2026-04-21

---

## Quick Start

### Prerequisites (Jetson Orin NX, JetPack 5.1.1)

```bash
# Terminal 1 — Start Holosoma (locomotion policy, in hsinference env)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq \
  --task.state-input zmq \
  --task.interface eth0

# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list   # confirm qwen2.5vl:3b present
```

### Option A — Terminal Mode (on Jetson)

```bash
# Terminal 3 — Start Marcus Brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```

Direct keyboard control + voice input (say **"Sanad"** to wake). Expected banner on boot:

```
================================================
           SANAD AI BRAIN — READY
================================================
model    : qwen2.5vl:3b
yolo     : True
odometry : True
memory   : True
lidar    : True
voice    : True
camera   : 424x240@15
```

### Option B — Server + Client (remote)

```bash
# Terminal 3 (Jetson) — Start Server
conda activate marcus
cd ~/Marcus
python3 -m Server.marcus_server

# Terminal 4 (Workstation) — Connect Client
cd ~/Robotics_workspace/yslootahtech/Project/Marcus
python3 -m Client.marcus_cli
```

Client prompts for connection:

```
Connection options:
  1) eth0  — 192.168.123.164:8765
  2) wlan0 — 10.255.254.86:8765
  3) custom
Choose [1/2/3] or IP:
```

Or skip the prompt: `python3 -m Client.marcus_cli --ip 192.168.123.164 --port 8765`

---

## Voice

- **Wake word:** "Sanad" — gated at dispatch time on Gemini's transcript. Common mishearings ("Sannad", "Senad", "Sa nad", etc.) are all accepted via the 33-entry `config_Voice.json::stt.wake_words` fuzzy list. Word-boundary match, not substring (so "standard" doesn't trigger off "sand").
- **Mic:** G1 on-board array mic, captured via UDP multicast `239.168.123.161:5555` (16 kHz mono, 16-bit PCM). No USB mic, no acoustic wake detector.
- **STT:** Gemini Live (`gemini-2.5-flash-native-audio-preview-12-2025`) with `response_modalities=["TEXT"]` — Gemini does the transcription. The mic is streamed in 32 ms chunks; Gemini's server-side VAD decides turn boundaries. **The Gemini WebSocket runs in a separate Python 3.10+ subprocess** (`Voice/gemini_runner.py`) because `google-genai` doesn't support Python 3.8 (which the marcus env is pinned to). Marcus spawns the runner via the `gemini_sdk` conda env and reads JSON-line transcripts off its stdout. Requires `pip install google-genai` **inside the gemini_sdk env** (not the marcus env) and an API key in `MARCUS_GEMINI_API_KEY` (or `SANAD_GEMINI_API_KEY` fallback). Set `MARCUS_GEMINI_PYTHON` (or `stt.gemini_python_path`) if the gemini_sdk env lives somewhere besides `~/miniconda3/envs/gemini_sdk/`.
- **TTS:** Unitree `client.TtsMaker()` → G1 body speaker. English only. Gemini does NOT speak — only Marcus's brain reply is spoken, via TtsMaker.
- **Echo prevention:** `VoiceModule.flush_mic()` is called by Marcus's brain before AND after `audio_api.speak()` so TtsMaker output isn't transcribed back into Gemini as a fake user utterance.

Interaction flow: speak "Sanad" + your request → Gemini transcribes (Marcus prints `USER: ...`) → wake-word gate passes → brain handles it (motion, VLM Q&A, place memory, …) → reply spoken through G1 speaker.
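The word-boundary gate described above can be sketched in a few lines. This is a minimal illustration, not the actual `_has_wake_word` implementation — the variant list here is a subset of the real 33-entry `stt.wake_words`:

```python
import re

# Illustrative subset of config_Voice.json::stt.wake_words (real list has 33 entries)
WAKE_WORDS = ["sanad", "sannad", "senad", "sa nad"]

def has_wake_word(transcript: str) -> bool:
    """Word-boundary match against the fuzzy variant list — substrings don't count."""
    text = transcript.lower()
    return any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in WAKE_WORDS)
```

Because `\b` requires a word boundary on both sides, a variant embedded inside a longer word (e.g. "sanads") never matches, which is what keeps common English words from false-triggering.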
Examples:

- "Sanad, turn right" → robot turns right, brain says "Done"
- "Sanad, what do you see" → Qwen2.5-VL describes the camera frame, brain speaks the description
- "Sanad" alone (no payload) → no dispatch (the persona prompt tells Gemini to acknowledge silently)
- "what do you see" (no "Sanad") → wake-word gate blocks, no dispatch, no reply (avoids false motion from background chatter)

To disable voice entirely, set `subsystems.voice: false` in `config_Brain.json` — Marcus will boot text-only without opening the Gemini WebSocket.

**Tuning knobs** — all in `config_Voice.json::stt`:

- Real "Sanad" misheard by Gemini and not matching `wake_words` → check `logs/transcript.log` for the `HEARD` line, add the variant to `wake_words`
- Commands transcribed wrong → transcription accuracy is mostly Gemini's job; for room-specific tuning try `gemini_vad_silence_duration_ms` (longer = more patience for hesitations)
- VAD too eager / too slow → `gemini_vad_start_sensitivity` (`HIGH` / `LOW`) and `gemini_vad_end_sensitivity` (`LOW` for slow speech, `HIGH` to cut early)
- Filler words triggering dispatch → expand `garbage_patterns`
- Robot too talkative / too terse → edit `gemini_system_prompt` (or point `gemini_system_prompt_file` at a `.txt` for richer personas)
- Session reconnects too aggressive → raise `gemini_max_consecutive_errors`
- Disable per-turn WAV saves → `gemini_record_enabled: false`

---

## Command Reference

### Movement

| Command | Action |
|---------|--------|
| `turn left` / `turn right` | Rotate (2s default) |
| `walk forward` / `move back` | Walk (2s default) |
| `walk 1 meter` | Precise odometry walk |
| `walk backward 2 meters` | Precise backward walk |
| `turn right 90 degrees` | Precise odometry turn |
| `turn right then walk forward` | Multi-step compound |
| `come to me` / `come here` | Forward 2s (instant, no AI) |
| `stop` | Gradual stop |

### Vision

| Command | Action |
|---------|--------|
| `what do you see` | Qwen2.5-VL describes camera view |
| `describe the room` | Qwen2.5-VL scene description |
| `is anyone here` | Qwen2.5-VL person check |
| `yolo` | Show YOLO detection status |

### Goal Navigation

| Command | Action |
|---------|--------|
| `goal/ stop when you see a person` | YOLO fast search + stop |
| `goal/ find a laptop` | YOLO + Qwen-VL search |
| `goal/ stop when you see a guy holding a phone` | YOLO + Qwen-VL compound verification |
| `find a person` | Auto-detected as goal (no prefix needed) |
| `look for a bottle` | Auto-detected as goal |

### Place Memory

| Command | Action |
|---------|--------|
| `remember this as door` | Save current position |
| `go to door` | Navigate to saved place |
| `places` | List all saved places |
| `forget door` | Delete place |
| `rename door to entrance` | Rename place |
| `where am I` | Show odometry position |
| `go home` | Return to start position |

### Patrol

| Command | Action |
|---------|--------|
| `patrol` | Autonomous patrol (prompts for duration) |
| `patrol: door → desk → exit` | Named waypoint patrol |

### Image Search (requires `subsystems.imgsearch: true`)

| Command | Action |
|---------|--------|
| `search/ /path/to/photo.jpg` | Find target from reference image |
| `search/ /path/to/photo.jpg person in blue shirt` | Image + hint |
| `search/ person in blue shirt` | Text-only search |

### Session Memory

| Command | Action |
|---------|--------|
| `last command` | Show last typed command |
| `do that again` | Repeat last command |
| `undo` | Reverse last movement |
| `last session` | Previous session summary |
| `session summary` | Current session stats |

### Autonomous Mode

| Command | Action |
|---------|--------|
| `auto on` | Start autonomous exploration |
| `auto off` | Stop |
| `auto status` | Current step / observations |
| `auto save` | Snapshot observations to disk |

### System

| Command | Action |
|---------|--------|
| `help` | Command reference |
| `example` | Usage examples |
| `lidar` / `lidar status` | SLAM engine pose + health |
| `q` / `quit` | Shutdown |

### Client-Only Commands (CLI)

| Command | Action |
|---------|--------|
| `status` | Ping server + LiDAR status |
| `camera` | Get camera configuration |
| `profile low/medium/high/full` | Switch camera profile |
| `capture` | Take a photo |

---

## Subsystem flags (`Config/config_Brain.json`)

Control what initializes at boot. Defaults:

```jsonc
"subsystems": {
  "lidar": true,
  "voice": true,
  "imgsearch": false,
  "autonomous": true
}
```

Set any to `false` to skip that subsystem's init. Boot time drops roughly:

- `voice: false` → ~1 s faster (no Gemini WebSocket open, no mic thread)
- `lidar: false` → ~1 s faster (no SLAM subprocess spawn)
- `imgsearch: false` → already the default; re-enable only when you need `search/ …`
- `autonomous: false` → minor, but removes the AutonomousMode init

---

## Network Configuration

| Interface | IP | Use |
|-----------|-----|------|
| `eth0` | 192.168.123.164 | Robot internal network (Jetson ↔ G1 ↔ LiDAR) |
| `wlan0` | 10.255.254.86 | Office WiFi (Jetson ↔ Workstation) |

| Service | Port | Protocol |
|---------|------|----------|
| Marcus WebSocket | 8765 | ws:// |
| ZMQ velocity (→ Holosoma) | 5556 | tcp:// (PUB/SUB) |
| Ollama API | 11434 | HTTP (localhost only) |
| G1 audio multicast (mic) | 5555 | UDP multicast 239.168.123.161 |
| Livox Mid-360 (LiDAR) | 192.168.123.120 | UDP (Livox SDK) |

Most values are configurable in `Config/config_Network.json` and `config_Voice.json::mic_udp`.

---

## Troubleshooting

| Issue | Cause | Fix |
|-------|-------|-----|
| Banner shows `SANAD AI BRAIN — READY` but nothing moves | Holosoma not running | Start Holosoma (Terminal 1) first |
| `RuntimeError: CUDA not available` on boot | Wrong torch build on Jetson | See `Doc/environment.md` section 9.2 — reinstall the NVIDIA Jetson torch wheel |
| `llama runner process has terminated: %!w()` | Ollama compute graph OOM | Already capped at `num_batch=128 / num_ctx=2048`. Check `free -h`; kill stale Ollama runners: `pkill -f "ollama runner"` |
| Traceback mentioning `multiprocessing/spawn.py` + ZMQ port 5556 | Old import-time ZMQ bind regressed | Pull latest `API/zmq_api.py` — must call `init_zmq()` from the parent only |
| `[Camera] No frame for 10s` during warmup | Ollama blocking the main thread, or USB bandwidth | Warmup is ~10–15 s on first Qwen load; subsequent commands are fast |
| Wake word never fires | Gemini transcribed but `_has_wake_word` rejected | Check `logs/transcript.log` — if `HEARD ...` shows what Gemini heard but no `CMD ...` follows, the transcript has a misheard "Sanad" variant; add the root form to `config_Voice.json::stt.wake_words` |
| Voice silent on boot | Missing Gemini API key | Check `logs/voice.log` for `No Gemini API key found`. Set `export MARCUS_GEMINI_API_KEY='...'` before launching `run_marcus.py` |
| `google-genai not installed` in runner stderr | Package missing in gemini_sdk env | Activate the gemini_sdk conda env and `pip install google-genai` THERE (not in marcus) |
| `no Python 3.10+ env found for the Gemini runner` | gemini_sdk env in non-default path | Set `export MARCUS_GEMINI_PYTHON=/path/to/gemini_sdk/bin/python` or edit `stt.gemini_python_path` |
| Mic silent | G1 audio service not publishing | Run `python3 Voice/builtin_mic.py` standalone — must print "OK — mic is capturing audio" |
| `[LiDAR] No data yet (will keep trying)` | SLAM worker still spawning (normal) or Livox network | First ~5 s is normal. If it persists, `ping 192.168.123.120` |
| Client can't connect | Wrong IP or server not running | Verify `ollama serve &` and `python3 -m Server.marcus_server` are both up |

---

## File Locations

| What | Path |
|------|------|
| Brain code | `~/Marcus/Brain/` |
| Server | `~/Marcus/Server/marcus_server.py` |
| Voice | `~/Marcus/Voice/{audio_io,builtin_mic,builtin_tts,gemini_script,turn_recorder,marcus_voice}.py` |
| Config | `~/Marcus/Config/` |
| Prompts | `~/Marcus/Config/marcus_prompts.yaml` |
| YOLO model | `~/Marcus/Models/yolov8m.pt` |
| Session data | `~/Marcus/Data/Brain/Sessions/` |
| Places | `~/Marcus/Data/History/Places/places.json` |
| Logs | `~/Marcus/logs/` |

See `Doc/architecture.md` for full project structure and file-by-file documentation.
See `Doc/environment.md` for the verified Jetson software stack.
See `Doc/pipeline.md` for the end-to-end data flow.
See `Doc/functions.md` for the full function inventory (AST-generated).

---

## Language policy

**English only.** Arabic was removed from the codebase on 2026-04-21:

- `Config/config_Voice.json::stt.wake_words` — English fuzzy variants only (33 entries), excludes common English words that would false-trigger (`said`, `sand`, `sunday`, etc.)
- `Config/marcus_prompts.yaml` — no Arabic examples left in any of the 7 prompts
- `API/audio_api.py::speak(text)` — rejects non-ASCII (the G1 TtsMaker silently maps Arabic to Chinese, which nobody wants)
- `Brain/marcus_brain.py` — greeting and talk-pattern regexes match English only

If you need Arabic back, the cleanest paths are either Piper TTS (offline) or edge-tts (online) — see `git log` for the removed implementations.

---

## Logs

All `.log` files in `logs/` rotate at **5 MB × 3 backups** by default.
To change:

```bash
export MARCUS_LOG_MAX_BYTES=10000000    # 10 MB per file
export MARCUS_LOG_BACKUP_COUNT=5        # keep 5 rotations
export MARCUS_LOG_DIR=/var/log/marcus   # move logs off SD card
```

Per-module log files:

- `brain.log`, `camera.log`, `lidar.log`, `zmq.log`, `server.log`, `main.log` — via `Core.logger.log()`
- `voice.log` — via stdlib `logging` in `audio_api.py` + `marcus_voice.py`
- Session JSON: `Data/Brain/Sessions/session_NNN_YYYY-MM-DD/{commands,detections,alerts,places}.json`
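The size/count/dir env vars map naturally onto Python's stdlib `RotatingFileHandler`. A minimal sketch of how a per-module logger could honor them — the `make_logger` helper is illustrative, not the actual `Core.logger` implementation:

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def make_logger(name: str) -> logging.Logger:
    """Build a per-module rotating logger honoring the MARCUS_LOG_* env vars."""
    log_dir = os.environ.get("MARCUS_LOG_DIR", "logs")
    max_bytes = int(os.environ.get("MARCUS_LOG_MAX_BYTES", 5_000_000))  # 5 MB default
    backups = int(os.environ.get("MARCUS_LOG_BACKUP_COUNT", 3))         # 3 rotations default
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(
        os.path.join(log_dir, f"{name}.log"),
        maxBytes=max_bytes,
        backupCount=backups,
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

When `brain.log` crosses the size cap, the handler renames it to `brain.log.1` (shifting older backups up to the count limit) and starts a fresh file, so total disk use stays bounded at roughly `max_bytes × (backups + 1)` per module.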