Marcus/Doc/controlling.md

Marcus — Control & Startup Guide

Robot persona: Sanad (wake word + self-intro; project code lives under Marcus/)
Updated: 2026-04-21


Quick Start

Prerequisites (Jetson Orin NX, JetPack 5.1.1)

# Terminal 1 — Start Holosoma (locomotion policy, in hsinference env)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq \
  --task.state-input zmq \
  --task.interface eth0

# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list                # confirm qwen2.5vl:3b present
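
The same check can be scripted against the Ollama HTTP API on port 11434, whose /api/tags endpoint lists the pulled models. A minimal sketch (the function names here are ours, not part of Marcus):

```python
import json
import urllib.request

OLLAMA = "http://127.0.0.1:11434"  # Ollama API port, see Network Configuration

def has_model(tags: dict, name: str) -> bool:
    """True if an /api/tags reply contains a model whose name starts with `name`."""
    return any(m.get("name", "").startswith(name) for m in tags.get("models", []))

def ollama_ready(name: str = "qwen2.5vl:3b") -> bool:
    """Ask the local Ollama server whether the model is available."""
    try:
        with urllib.request.urlopen(f"{OLLAMA}/api/tags", timeout=5) as resp:
            return has_model(json.load(resp), name)
    except OSError:
        return False
```

If `ollama_ready()` returns False, either the server is down or qwen2.5vl:3b has not been pulled yet.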

Option A — Terminal Mode (on Jetson)

# Terminal 3 — Start Marcus Brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py

Direct keyboard control + voice input (say "Sanad" to wake). Expected banner on boot:

================================================
         SANAD AI BRAIN — READY
================================================
  model     : qwen2.5vl:3b
  yolo      : True
  odometry  : True
  memory    : True
  lidar     : True
  voice     : True
  camera    : 424x240@15

Option B — Server + Client (remote)

# Terminal 3 (Jetson) — Start Server
conda activate marcus
cd ~/Marcus
python3 -m Server.marcus_server

# Terminal 4 (Workstation) — Connect Client
cd ~/Robotics_workspace/yslootahtech/Project/Marcus
python3 -m Client.marcus_cli

Client prompts for connection:

  Connection options:
    1) eth0  — 192.168.123.164:8765
    2) wlan0 — 10.255.254.86:8765
    3) custom
  Choose [1/2/3] or IP:

Or skip the prompt: python3 -m Client.marcus_cli --ip 192.168.123.164 --port 8765
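
The selection logic behind that prompt can be sketched as follows (the helper name is hypothetical; the IPs are the ones from the Network Configuration table below):

```python
# Hypothetical helper mirroring the client's connection prompt.
KNOWN = {
    "1": ("192.168.123.164", 8765),  # eth0
    "2": ("10.255.254.86", 8765),    # wlan0
}

def choose_endpoint(answer: str, default_port: int = 8765):
    """Resolve a prompt answer ('1', '2', or a raw IP[:port]) to (ip, port)."""
    answer = answer.strip()
    if answer in KNOWN:
        return KNOWN[answer]
    # Anything else is treated as a custom IP, optionally with ':port'.
    if ":" in answer:
        ip, port = answer.rsplit(":", 1)
        return ip, int(port)
    return answer, default_port
```

This is a simplification of option 3 (the real client asks for the IP in a follow-up prompt), but it captures the mapping.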


Voice

  • Wake word: "Sanad" (variants "sannad", "sanat", "sunnat" — see config_Voice.json::stt.wake_words_en)
  • Mic: G1 on-board array mic, captured via UDP multicast 239.168.123.161:5555 (16 kHz mono, 16-bit PCM). No USB mic needed.
  • STT: Whisper tiny (wake detection) + Whisper small (command transcription) — both run locally.
  • TTS: Unitree client.TtsMaker() → G1 body speaker. English only.
  • Barge-in: say something while Marcus is speaking and the mic buffer flushes on the next command.

Interaction flow: say "Sanad" → hear "Listening" → speak your command → see transcript on console → Marcus answers through the speaker.

To disable voice entirely, set subsystems.voice: false in config_Brain.json — Marcus will boot text-only ~2 s faster.
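
For reference, receiving that multicast PCM stream takes only the Python standard library. A minimal sketch, assuming each UDP datagram carries raw little-endian 16-bit samples (the helper names are ours, not Marcus's):

```python
import socket
import struct

MCAST_GRP, MCAST_PORT = "239.168.123.161", 5555  # from the Voice notes above
SAMPLE_RATE = 16_000                             # 16 kHz mono, 16-bit PCM

def pcm16_to_floats(payload: bytes) -> list[float]:
    """Convert little-endian 16-bit PCM samples to floats in [-1, 1)."""
    n = len(payload) // 2
    samples = struct.unpack(f"<{n}h", payload[: n * 2])
    return [s / 32768.0 for s in samples]

def open_mic_socket() -> socket.socket:
    """Join the G1 audio multicast group and return a bound UDP socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_PORT))
    mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

# Usage (on the Jetson, with the G1 audio service publishing):
#   sock = open_mic_socket()
#   payload, _ = sock.recvfrom(4096)
#   floats = pcm16_to_floats(payload)
```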


Command Reference

Movement

| Command | Action |
|---|---|
| turn left / turn right | Rotate (2s default) |
| walk forward / move back | Walk (2s default) |
| walk 1 meter | Precise odometry walk |
| walk backward 2 meters | Precise backward walk |
| turn right 90 degrees | Precise odometry turn |
| turn right then walk forward | Multi-step compound |
| come to me / come here | Forward 2s (instant, no AI) |
| stop | Gradual stop |
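
A parser for the precise walk/turn forms above might look like this minimal sketch (the regexes and the sign conventions are our assumptions, not Marcus's actual implementation):

```python
import re

# Hypothetical parser for the precise-odometry movement commands.
WALK = re.compile(r"walk(?:\s+(backward|back|forward))?\s+([\d.]+)\s+meters?")
TURN = re.compile(r"turn\s+(left|right)\s+([\d.]+)\s+degrees?")

def parse_motion(text: str):
    """Return ('walk', signed_meters) or ('turn', signed_degrees), else None."""
    text = text.lower().strip()
    if m := WALK.search(text):
        sign = -1.0 if m.group(1) in ("backward", "back") else 1.0
        return ("walk", sign * float(m.group(2)))
    if m := TURN.search(text):
        sign = -1.0 if m.group(1) == "right" else 1.0  # assume left-positive yaw
        return ("turn", sign * float(m.group(2)))
    return None
```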

Vision

| Command | Action |
|---|---|
| what do you see | Qwen2.5-VL describes camera view |
| describe the room | Qwen2.5-VL scene description |
| is anyone here | Qwen2.5-VL person check |
| yolo | Show YOLO detection status |

Goal Navigation

| Command | Action |
|---|---|
| goal/ stop when you see a person | YOLO fast search + stop |
| goal/ find a laptop | YOLO + Qwen-VL search |
| goal/ stop when you see a guy holding a phone | YOLO + Qwen-VL compound verification |
| find a person | Auto-detected as goal (no prefix needed) |
| look for a bottle | Auto-detected as goal |
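
The prefix-free auto-detection could be as simple as a verb check. A hedged sketch (the verb list is an assumption):

```python
# Hypothetical heuristic for treating plain phrases as goals (no "goal/" prefix).
GOAL_VERBS = ("find", "look for", "search for", "stop when you see")

def is_goal(text: str) -> bool:
    """True if the command should be routed to goal navigation."""
    text = text.lower().strip()
    if text.startswith("goal/"):
        return True
    return any(text.startswith(v) for v in GOAL_VERBS)
```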

Place Memory

| Command | Action |
|---|---|
| remember this as door | Save current position |
| go to door | Navigate to saved place |
| places | List all saved places |
| forget door | Delete place |
| rename door to entrance | Rename place |
| where am I | Show odometry position |
| go home | Return to start position |
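
Places persist in Data/History/Places/places.json (see File Locations). The exact schema is internal; this sketch only illustrates remember/rename against a name-to-pose mapping, with the x/y/yaw field names as assumptions:

```python
import json
from pathlib import Path

PLACES_FILE = Path("places.json")  # real path: ~/Marcus/Data/History/Places/places.json

def load_places(path: Path = PLACES_FILE) -> dict:
    """Read the saved places, or return an empty map if none exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {}

def remember(name: str, pose: tuple, path: Path = PLACES_FILE) -> None:
    """'remember this as <name>': store the current odometry pose."""
    places = load_places(path)
    places[name] = {"x": pose[0], "y": pose[1], "yaw": pose[2]}
    path.write_text(json.dumps(places, indent=2))

def rename(old: str, new: str, path: Path = PLACES_FILE) -> None:
    """'rename <old> to <new>'."""
    places = load_places(path)
    places[new] = places.pop(old)
    path.write_text(json.dumps(places, indent=2))
```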

Patrol

| Command | Action |
|---|---|
| patrol | Autonomous patrol (prompts for duration) |
| patrol: door → desk → exit | Named waypoint patrol |
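
Parsing a named-waypoint route is a split on the arrow separator. A sketch (the function name is hypothetical; it also accepts the ASCII "->"):

```python
# Hypothetical parser for "patrol: door → desk → exit".
def parse_patrol(text: str) -> list[str]:
    """Return the ordered waypoint names after the 'patrol:' prefix."""
    _, _, route = text.partition(":")
    route = route.replace("->", "→")  # normalize ASCII arrows
    return [w.strip() for w in route.split("→") if w.strip()]
```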

Image Search (requires subsystems.imgsearch: true)

| Command | Action |
|---|---|
| search/ /path/to/photo.jpg | Find target from reference image |
| search/ /path/to/photo.jpg person in blue shirt | Image + hint |
| search/ person in blue shirt | Text-only search |

Session Memory

| Command | Action |
|---|---|
| last command | Show last typed command |
| do that again | Repeat last command |
| undo | Reverse last movement |
| last session | Previous session summary |
| session summary | Current session stats |
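
Undo amounts to replaying the last movement with its direction inverted. A hedged sketch (the tuple representation of a movement is our assumption):

```python
# Hypothetical "undo" logic: invert the direction of the last movement.
INVERSE = {"forward": "back", "back": "forward", "left": "right", "right": "left"}

def invert_move(kind: str, direction: str, amount: float):
    """Return the movement that reverses (kind, direction, amount)."""
    return (kind, INVERSE[direction], amount)
```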

Autonomous Mode

| Command | Action |
|---|---|
| auto on | Start autonomous exploration |
| auto off | Stop |
| auto status | Current step / observations |
| auto save | Snapshot observations to disk |

System

| Command | Action |
|---|---|
| help | Command reference |
| example | Usage examples |
| lidar / lidar status | SLAM engine pose + health |
| q / quit | Shutdown |

Client-Only Commands (CLI)

| Command | Action |
|---|---|
| status | Ping server + LiDAR status |
| camera | Get camera configuration |
| profile low/medium/high/full | Switch camera profile |
| capture | Take a photo |

Subsystem flags (Config/config_Brain.json)

Control what initializes at boot. Defaults:

"subsystems": {
  "lidar":      true,
  "voice":      true,
  "imgsearch":  false,
  "autonomous": true
}
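
A sketch of how boot might gate subsystem initialization on these flags (the helper is illustrative, not the actual Brain code):

```python
import json

# Defaults matching the config fragment above; user config overrides them.
DEFAULTS = {"lidar": True, "voice": True, "imgsearch": False, "autonomous": True}

def enabled_subsystems(config_text: str) -> list[str]:
    """Return the subsystems that should initialize, honoring defaults."""
    flags = {**DEFAULTS, **json.loads(config_text).get("subsystems", {})}
    return [name for name, on in flags.items() if on]
```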

Set any to false to skip that subsystem's init. Boot time drops roughly:

  • voice: false → ~2 s faster (no Whisper model load)
  • lidar: false → ~1 s faster (no SLAM subprocess spawn)
  • imgsearch: false → already the default; re-enable only when you need search/ …
  • autonomous: false → minor, but removes the AutonomousMode init

Network Configuration

| Interface | IP | Use |
|---|---|---|
| eth0 | 192.168.123.164 | Robot internal network (Jetson ↔ G1 ↔ LiDAR) |
| wlan0 | 10.255.254.86 | Office WiFi (Jetson ↔ Workstation) |

| Service | Port | Protocol |
|---|---|---|
| Marcus WebSocket | 8765 | ws:// |
| ZMQ velocity (→ Holosoma) | 5556 | tcp:// (PUB/SUB) |
| Ollama API | 11434 | HTTP (localhost only) |
| G1 audio multicast (mic) | 5555 | UDP multicast 239.168.123.161 |
| Livox Mid-360 (LiDAR) | 192.168.123.120 | UDP (Livox SDK) |

Most values configurable in Config/config_Network.json and config_Voice.json::mic_udp.
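
The velocity channel to Holosoma is a ZMQ PUB/SUB link on port 5556. The actual wire format is internal to Holosoma, so the following is only an illustrative sketch with assumed field names:

```python
import json

# Illustrative only: the real message schema on port 5556 belongs to Holosoma.
def velocity_msg(vx: float, vy: float, wz: float) -> bytes:
    """Pack a body-frame velocity command as UTF-8 JSON for a PUB socket."""
    return json.dumps({"vx": vx, "vy": vy, "wz": wz}).encode()

# A pyzmq publisher would then do roughly:
#   pub = zmq.Context().socket(zmq.PUB); pub.bind("tcp://*:5556")
#   pub.send(velocity_msg(0.3, 0.0, 0.0))
```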


Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| Banner shows SANAD AI BRAIN — READY but nothing moves | Holosoma not running | Start Holosoma (Terminal 1) first |
| RuntimeError: CUDA not available on boot | Wrong torch build on Jetson | See Doc/environment.md section 9.2 — reinstall the NVIDIA Jetson torch wheel |
| llama runner process has terminated: %!w(<nil>) | Ollama compute graph OOM | Already capped at num_batch=128 / num_ctx=2048. Check free -h; kill stale Ollama runners: pkill -f "ollama runner" |
| Traceback mentioning multiprocessing/spawn.py + ZMQ port 5556 | Old import-time ZMQ bind regressed | Pull the latest API/zmq_api.py — init_zmq() must be called from the parent process only |
| [Camera] No frame for 10s during warmup | Ollama blocking the main thread, or USB bandwidth | Warmup takes ~10–15 s on the first Qwen load; subsequent commands are fast |
| Wake word never fires | Whisper transcribes the wake word differently | Check logs/voice.log — if it transcribes as "sunnat"/"sannat", add that variant to config_Voice.json::stt.wake_words_en |
| Mic silent | G1 audio service not publishing | Run python3 Voice/builtin_mic.py standalone — it must print "OK — mic is capturing audio" |
| [LiDAR] No data yet (will keep trying) | SLAM worker still spawning (normal) or Livox network | The first ~5 s are normal; if it persists, ping 192.168.123.120 |
| Client can't connect | Wrong IP or server not running | Verify that ollama serve and python3 -m Server.marcus_server are both running |

File Locations

| What | Path |
|---|---|
| Brain code | ~/Marcus/Brain/ |
| Server | ~/Marcus/Server/marcus_server.py |
| Voice | ~/Marcus/Voice/{builtin_mic,builtin_tts,marcus_voice}.py |
| Config | ~/Marcus/Config/ |
| Prompts | ~/Marcus/Config/marcus_prompts.yaml |
| YOLO model | ~/Marcus/Models/yolov8m.pt |
| Session data | ~/Marcus/Data/Brain/Sessions/ |
| Places | ~/Marcus/Data/History/Places/places.json |
| Logs | ~/Marcus/logs/ |

See Doc/architecture.md for full project structure and file-by-file documentation. See Doc/environment.md for the verified Jetson software stack. See Doc/pipeline.md for the end-to-end data flow.