# Marcus — Control & Startup Guide

**Robot persona:** Sanad (wake word + self-intro; project code lives under `Marcus/`)

**Updated:** 2026-04-21

---

## Quick Start

### Prerequisites (Jetson Orin NX, JetPack 5.1.1)

```bash
# Terminal 1 — Start Holosoma (locomotion policy, in hsinference env)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq \
  --task.state-input zmq \
  --task.interface eth0

# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list   # confirm qwen2.5vl:3b present
```

### Option A — Terminal Mode (on Jetson)

```bash
# Terminal 3 — Start Marcus Brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```

Direct keyboard control + voice input (say **"Sanad"** to wake). Expected banner on boot:

```
================================================
SANAD AI BRAIN — READY
================================================
model    : qwen2.5vl:3b
yolo     : True
odometry : True
memory   : True
lidar    : True
voice    : True
camera   : 424x240@15
```

### Option B — Server + Client (remote)

```bash
# Terminal 3 (Jetson) — Start Server
conda activate marcus
cd ~/Marcus
python3 -m Server.marcus_server

# Terminal 4 (Workstation) — Connect Client
cd ~/Robotics_workspace/yslootahtech/Project/Marcus
python3 -m Client.marcus_cli
```

The client prompts for a connection:

```
Connection options:
  1) eth0  — 192.168.123.164:8765
  2) wlan0 — 10.255.254.86:8765
  3) custom
Choose [1/2/3] or IP:
```

Or skip the prompt: `python3 -m Client.marcus_cli --ip 192.168.123.164 --port 8765`

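The selection logic can be sketched with `argparse`: when `--ip` is given the prompt is skipped, otherwise the numbered options map to the two known interfaces. This is a minimal sketch; `KNOWN` and `pick_endpoint` are illustrative names, not `Client/marcus_cli`'s actual code.

```python
import argparse

# Hypothetical sketch of the client's endpoint selection. The IPs and port
# match the prompt above; everything else is an assumption.
KNOWN = {
    "1": ("192.168.123.164", 8765),  # eth0 (robot internal network)
    "2": ("10.255.254.86", 8765),    # wlan0 (office WiFi)
}

def pick_endpoint(argv=None):
    ap = argparse.ArgumentParser()
    ap.add_argument("--ip")
    ap.add_argument("--port", type=int, default=8765)
    args = ap.parse_args(argv)
    if args.ip:                       # flags given: skip the prompt entirely
        return args.ip, args.port
    choice = input("Choose [1/2/3] or IP: ").strip()
    # Any input that isn't a known option number is treated as a custom IP.
    return KNOWN.get(choice, (choice, args.port))
```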
---
## Voice

- **Wake word:** "Sanad" (variants "sannad", "sanat", "sunnat" — see `config_Voice.json::stt.wake_words_en`)
- **Mic:** G1 on-board array mic, captured via UDP multicast `239.168.123.161:5555` (16 kHz mono, 16-bit PCM). No USB mic needed.
- **STT:** Whisper `tiny` (wake detection) + Whisper `small` (command transcription) — both run locally.
- **TTS:** Unitree `client.TtsMaker()` → G1 body speaker. English only.
- **Barge-in:** say something while Marcus is speaking and the mic buffer flushes on the next command.

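Wake detection reduces to matching the variant list against each Whisper transcript. A minimal sketch: the variant set mirrors `config_Voice.json::stt.wake_words_en`, but the matching logic itself is an illustration, not the code Marcus ships.

```python
# Illustrative wake-word check over a Whisper transcript.
WAKE_WORDS = {"sanad", "sannad", "sanat", "sunnat"}

def heard_wake_word(transcript):
    # Strip the punctuation Whisper tends to add, then check word by word.
    words = transcript.lower().replace(",", " ").replace(".", " ").split()
    return any(w in WAKE_WORDS for w in words)

heard_wake_word("Sanad, what do you see?")  # True
```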
Interaction flow: say "Sanad" → hear *"Listening"* → speak your command → see transcript on console → Marcus answers through the speaker.

To disable voice entirely, set `subsystems.voice: false` in `config_Brain.json` — Marcus will boot text-only ~2 s faster.

---

## Command Reference

### Movement
| Command | Action |
|---------|--------|
| `turn left` / `turn right` | Rotate (2 s default) |
| `walk forward` / `move back` | Walk (2 s default) |
| `walk 1 meter` | Precise odometry walk |
| `walk backward 2 meters` | Precise backward walk |
| `turn right 90 degrees` | Precise odometry turn |
| `turn right then walk forward` | Multi-step compound |
| `come to me` / `come here` | Forward 2 s (instant, no AI) |
| `stop` | Gradual stop |

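Commands like `walk 1 meter` or `turn right 90 degrees` carry an optional direction, magnitude, and unit. A rough sketch of how such phrases might be split before dispatch; the regex and the "no magnitude means the 2 s default" convention are assumptions, not Marcus's actual parser.

```python
import re

# Hypothetical movement-command parser. Returns (action, direction,
# magnitude, unit); magnitude None means "use the 2 s default".
MOVE_RE = re.compile(
    r"\b(walk|move|turn)\b\s*(forward|back(?:ward)?|left|right)?"
    r"\s*(\d+(?:\.\d+)?)?\s*(meters?|degrees?)?"
)

def parse_move(text):
    m = MOVE_RE.search(text.lower())
    if not m:
        return None
    action, direction, value, unit = m.groups()
    return action, direction, float(value) if value else None, unit

parse_move("walk 1 meter")           # ('walk', None, 1.0, 'meter')
parse_move("turn right 90 degrees")  # ('turn', 'right', 90.0, 'degrees')
```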
### Vision
| Command | Action |
|---------|--------|
| `what do you see` | Qwen2.5-VL describes camera view |
| `describe the room` | Qwen2.5-VL scene description |
| `is anyone here` | Qwen2.5-VL person check |
| `yolo` | Show YOLO detection status |

### Goal Navigation
| Command | Action |
|---------|--------|
| `goal/ stop when you see a person` | YOLO fast search + stop |
| `goal/ find a laptop` | YOLO + Qwen-VL search |
| `goal/ stop when you see a guy holding a phone` | YOLO + Qwen-VL compound verification |
| `find a person` | Auto-detected as goal (no prefix needed) |
| `look for a bottle` | Auto-detected as goal |

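The auto-detection in the last two rows can be sketched as a prefix-or-verb check. The trigger list below is an assumption drawn from the examples in the table, not Marcus's actual dispatch code.

```python
# Illustrative goal detection: an explicit "goal/" prefix, or a leading
# search verb with no prefix needed.
GOAL_VERBS = ("find ", "look for ", "search for ")

def as_goal(command):
    cmd = command.strip().lower()
    if cmd.startswith("goal/"):
        return cmd[len("goal/"):].strip()
    if cmd.startswith(GOAL_VERBS):   # auto-detected goal
        return cmd
    return None

as_goal("goal/ find a laptop")  # 'find a laptop'
as_goal("find a person")        # 'find a person'
as_goal("walk forward")         # None
```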
### Place Memory
| Command | Action |
|---------|--------|
| `remember this as door` | Save current position |
| `go to door` | Navigate to saved place |
| `places` | List all saved places |
| `forget door` | Delete place |
| `rename door to entrance` | Rename place |
| `where am I` | Show odometry position |
| `go home` | Return to start position |

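A sketch of what could back `remember this as door` / `go to door`: a name-to-pose map persisted as JSON. The schema and helper names are assumptions for illustration; the real store lives at `~/Marcus/Data/History/Places/places.json`.

```python
import json
import math
from pathlib import Path

# Hypothetical place store (real path: ~/Marcus/Data/History/Places/places.json).
PLACES = Path("places.json")

def save_place(name, x, y, yaw):
    """'remember this as <name>': persist the current odometry pose."""
    places = json.loads(PLACES.read_text()) if PLACES.exists() else {}
    places[name] = {"x": x, "y": y, "yaw": yaw}
    PLACES.write_text(json.dumps(places, indent=2))

def heading_to(name, cur_x, cur_y):
    """'go to <name>': distance and bearing from the current pose."""
    p = json.loads(PLACES.read_text())[name]
    dx, dy = p["x"] - cur_x, p["y"] - cur_y
    return math.hypot(dx, dy), math.atan2(dy, dx)
```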
### Patrol
| Command | Action |
|---------|--------|
| `patrol` | Autonomous patrol (prompts for duration) |
| `patrol: door → desk → exit` | Named waypoint patrol |

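The named-waypoint form can be split on the arrows. This is a minimal sketch; accepting both "→" and "->" as separators is an assumption.

```python
# Illustrative parser for "patrol: door → desk → exit".
def parse_patrol(cmd):
    if not cmd.lower().startswith("patrol:"):
        return None
    route = cmd.split(":", 1)[1].replace("->", "→")
    return [w.strip() for w in route.split("→") if w.strip()]

parse_patrol("patrol: door → desk → exit")  # ['door', 'desk', 'exit']
```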
### Image Search (requires `subsystems.imgsearch: true`)
| Command | Action |
|---------|--------|
| `search/ /path/to/photo.jpg` | Find target from reference image |
| `search/ /path/to/photo.jpg person in blue shirt` | Image + hint |
| `search/ person in blue shirt` | Text-only search |

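The three `search/` forms differ only in whether the first token is an image path. A sketch of the split; the file-extension heuristic is an assumption based on the table, not Marcus's actual logic.

```python
from pathlib import Path

# Hypothetical split of a "search/" command into (reference image, text hint).
def parse_search(cmd):
    body = cmd.removeprefix("search/").strip()
    parts = body.split(maxsplit=1)
    if parts and Path(parts[0]).suffix.lower() in {".jpg", ".jpeg", ".png"}:
        return parts[0], (parts[1] if len(parts) > 1 else "")
    return None, body  # no image path: text-only search
```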
### Session Memory
| Command | Action |
|---------|--------|
| `last command` | Show last typed command |
| `do that again` | Repeat last command |
| `undo` | Reverse last movement |
| `last session` | Previous session summary |
| `session summary` | Current session stats |

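`undo` can be realized by replaying the inverse of the last movement. The opposite-pairs table and fallback below are an illustration, not Marcus's session-memory code.

```python
# Sketch: keep a history of movements and emit the inverse on "undo".
OPPOSITE = {
    "walk forward": "walk backward", "walk backward": "walk forward",
    "turn left": "turn right", "turn right": "turn left",
}
history = []

def execute(cmd):
    if cmd == "undo":
        if not history:
            return "nothing to undo"
        return OPPOSITE.get(history.pop(), "stop")  # no known inverse: stop
    history.append(cmd)
    return cmd
```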
### Autonomous Mode
| Command | Action |
|---------|--------|
| `auto on` | Start autonomous exploration |
| `auto off` | Stop |
| `auto status` | Current step / observations |
| `auto save` | Snapshot observations to disk |

### System
| Command | Action |
|---------|--------|
| `help` | Command reference |
| `example` | Usage examples |
| `lidar` / `lidar status` | SLAM engine pose + health |
| `q` / `quit` | Shutdown |

### Client-Only Commands (CLI)
| Command | Action |
|---------|--------|
| `status` | Ping server + LiDAR status |
| `camera` | Get camera configuration |
| `profile low/medium/high/full` | Switch camera profile |
| `capture` | Take a photo |

---

## Subsystem flags (`Config/config_Brain.json`)

Control what initializes at boot. Defaults:

```jsonc
"subsystems": {
  "lidar": true,
  "voice": true,
  "imgsearch": false,
  "autonomous": true
}
```

Set any flag to `false` to skip that subsystem's init. Approximate boot-time savings:

- `voice: false` → ~2 s faster (no Whisper model load)
- `lidar: false` → ~1 s faster (no SLAM subprocess spawn)
- `imgsearch: false` → already the default; re-enable only when you need `search/ …`
- `autonomous: false` → minor, but skips the AutonomousMode init

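Gating init on these flags can be sketched as below. The loader and the `boot` helper are illustrative names, not Marcus's actual boot code.

```python
import json

# Read the "subsystems" block shown above and report what would start.
def load_subsystem_flags(path="Config/config_Brain.json"):
    with open(path) as f:
        cfg = json.load(f)
    return {k: bool(v) for k, v in cfg.get("subsystems", {}).items()}

def boot(flags):
    """Return the subsystems that would be initialized, in boot order."""
    order = ["lidar", "voice", "imgsearch", "autonomous"]
    return [name for name in order if flags.get(name)]
```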
---

## Network Configuration

| Interface | IP | Use |
|-----------|-----|-----|
| `eth0` | 192.168.123.164 | Robot internal network (Jetson ↔ G1 ↔ LiDAR) |
| `wlan0` | 10.255.254.86 | Office WiFi (Jetson ↔ Workstation) |

| Service | Port | Protocol |
|---------|------|----------|
| Marcus WebSocket | 8765 | ws:// |
| ZMQ velocity (→ Holosoma) | 5556 | tcp:// (PUB/SUB) |
| Ollama API | 11434 | HTTP (localhost only) |
| G1 audio multicast (mic) | 5555 | UDP multicast 239.168.123.161 |
| Livox Mid-360 (LiDAR) | device IP 192.168.123.120 | UDP (Livox SDK) |

Most of these values are configurable in `Config/config_Network.json` and `config_Voice.json::mic_udp`.

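For the mic row, the stream can be consumed with a plain multicast socket; the group and port come from the table above, while the framing (raw 16-bit PCM) and the helper name are assumptions.

```python
import socket
import struct

def open_mic_socket(group="239.168.123.161", port=5555, join=True):
    """UDP multicast receiver for the G1 mic (16 kHz mono, 16-bit PCM)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    if join:  # join the multicast group on the default interface
        mreq = struct.pack("4sl", socket.inet_aton(group), socket.INADDR_ANY)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return s

# sock = open_mic_socket()
# pcm, _ = sock.recvfrom(4096)  # blocks until the G1 audio service publishes
```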
---

## Troubleshooting

| Issue | Cause | Fix |
|-------|-------|-----|
| Banner shows `SANAD AI BRAIN — READY` but nothing moves | Holosoma not running | Start Holosoma (Terminal 1) first |
| `RuntimeError: CUDA not available` on boot | Wrong torch build on Jetson | See `Doc/environment.md` section 9.2 — reinstall the NVIDIA Jetson torch wheel |
| `llama runner process has terminated: %!w(<nil>)` | Ollama compute graph OOM | Already capped at `num_batch=128 / num_ctx=2048`. Check `free -h`; kill stale Ollama runners: `pkill -f "ollama runner"` |
| Traceback mentioning `multiprocessing/spawn.py` + ZMQ port 5556 | Old import-time ZMQ bind regressed | Pull latest `API/zmq_api.py` — `init_zmq()` must be called from the parent process only |
| `[Camera] No frame for 10s` during warmup | Ollama blocking the main thread, or USB bandwidth | Warmup takes ~10–15 s on the first Qwen load; subsequent commands are fast |
| Wake word never fires | Whisper transcribing something else | Check `logs/voice.log` — if it transcribes "sunnat"/"sannat", add that variant to `config_Voice.json::stt.wake_words_en` |
| Mic silent | G1 audio service not publishing | Run `python3 Voice/builtin_mic.py` standalone — it must print "OK — mic is capturing audio" |
| `[LiDAR] No data yet (will keep trying)` | SLAM worker still spawning (normal) or Livox network issue | The first ~5 s is normal. If it persists, `ping 192.168.123.120` |
| Client can't connect | Wrong IP or server not running | Verify `ollama serve` and `python3 -m Server.marcus_server` are both running |

---

## File Locations

| What | Path |
|------|------|
| Brain code | `~/Marcus/Brain/` |
| Server | `~/Marcus/Server/marcus_server.py` |
| Voice | `~/Marcus/Voice/{builtin_mic,builtin_tts,marcus_voice}.py` |
| Config | `~/Marcus/Config/` |
| Prompts | `~/Marcus/Config/marcus_prompts.yaml` |
| YOLO model | `~/Marcus/Models/yolov8m.pt` |
| Session data | `~/Marcus/Data/Brain/Sessions/` |
| Places | `~/Marcus/Data/History/Places/places.json` |
| Logs | `~/Marcus/logs/` |

See `Doc/architecture.md` for full project structure and file-by-file documentation.
See `Doc/environment.md` for the verified Jetson software stack.
See `Doc/pipeline.md` for the end-to-end data flow.