Marcus/Doc/controlling.md
kassam 5d839d4f4e Voice: finalise on faster-whisper + energy wake, remove Vosk
Full-day voice-stack refactor. Experiments run and reverted:
- Gemini Live HTTP microservice (Python 3.8 env incompat, latency)
- Vosk grammar STT (English lexicon can't decode 'Sanad'; big model
  cold-load too slow on Jetson CPU)

Kept architecture:
- Voice/wake_detector.py — pure-numpy energy state machine with
  adaptive baseline, burst-audio capture for post-hoc verify.
- Voice/marcus_voice.py — orchestrator with 3 modes
  (wake_and_command / always_on / always_on_gated), hysteretic VAD,
  pre-silence trim (300 ms pre-roll), DSP pipeline (DC remove,
  80 Hz HPF, 0.97 pre-emphasis, peak-normalize), faster-whisper
  base.en int8 with beam=8 + temperature fallback [0,0.2,0.4],
  fuzzy-match canonicalisation, GARBAGE_PATTERNS + length filter,
  /s-/ phonetic wake-verify, full-turn debug WAV recording.

Config-driven vocab (zero hardcoded strings in Python):
- stt.wake_words (33 variants of 'Sanad')
- stt.command_vocab (68 canonical phrases)
- stt.garbage_patterns (17 Whisper noise outputs)
- stt.min_transcription_length, stt.command_vocab_cutoff

Command parser widened (Brain/command_parser.py):
- _RE_SIMPLE_DIR — bare direction + verb+direction combos
  ('left', 'go back', 'move forward', 'step right', ...)
- _RE_STOP_SIMPLE — bare stop/halt/wait/pause/freeze/hold
- All motion constants sourced from config_Navigation.json
  (move_map + step_duration_sec) via API/zmq_api.py; no more
  hardcoded 0.3 / 2.0 magic numbers.

API/audio_api.py — _play_pcm now uses AudioClient.PlayStream with
automatic resampling to 16 kHz (matches Sanad's proven pattern).

Removed:
- Voice/vosk_stt.py (and all Vosk references in marcus_voice.py)
- Models/vosk-model-small-en-us-0.15/ (40 MB model + zip)
- All Vosk keys from Config/config_Voice.json

Documentation synced across README, Doc/architecture.md,
Doc/pipeline.md, Doc/functions.md, Doc/controlling.md,
Doc/MARCUS_API.md, Doc/environment.md changelog.

Known limitation: faster-whisper base.en on Jetson CPU + G1
far-field mic yields ~50% command-transcription accuracy due
to model capacity and mic reverberation. Wake + ack + recording
+ trim + Whisper + fuzzy + brain + motion all verified working
end-to-end. Future improvement path (unused): close-talking USB
mic via pactl_parec, or Gemini Live via HTTP microservice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:32:28 +04:00

# Marcus — Control & Startup Guide
**Robot persona:** Sanad (wake word + self-intro; project code lives under `Marcus/`)
**Updated:** 2026-04-21
---
## Quick Start
### Prerequisites (Jetson Orin NX, JetPack 5.1.1)
```bash
# Terminal 1 — Start Holosoma (locomotion policy, in hsinference env)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq \
  --task.state-input zmq \
  --task.interface eth0
# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list # confirm qwen2.5vl:3b present
```
### Option A — Terminal Mode (on Jetson)
```bash
# Terminal 3 — Start Marcus Brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```
Direct keyboard control + voice input (say **"Sanad"** to wake). Expected banner on boot:
```
================================================
SANAD AI BRAIN — READY
================================================
model : qwen2.5vl:3b
yolo : True
odometry : True
memory : True
lidar : True
voice : True
camera : 424x240@15
```
### Option B — Server + Client (remote)
```bash
# Terminal 3 (Jetson) — Start Server
conda activate marcus
cd ~/Marcus
python3 -m Server.marcus_server
# Terminal 4 (Workstation) — Connect Client
cd ~/Robotics_workspace/yslootahtech/Project/Marcus
python3 -m Client.marcus_cli
```
Client prompts for connection:
```
Connection options:
1) eth0 — 192.168.123.164:8765
2) wlan0 — 10.255.254.86:8765
3) custom
Choose [1/2/3] or IP:
```
Or skip the prompt: `python3 -m Client.marcus_cli --ip 192.168.123.164 --port 8765`
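If you need a custom client, the server speaks plain WebSocket on port 8765 (see Network Configuration below). A minimal sketch with the `websockets` package; the `"status"` payload is an assumption, since the real message schema is defined by `Server/marcus_server.py`:
```python
# Minimal WebSocket client sketch. The send/receive payloads are
# illustrative assumptions; check Server/marcus_server.py for the
# real message schema.
import asyncio
import websockets

async def main():
    uri = "ws://192.168.123.164:8765"  # eth0 address from the prompt above
    async with websockets.connect(uri) as ws:
        await ws.send("status")        # hypothetical command payload
        reply = await ws.recv()
        print(reply)

asyncio.run(main())
```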
---
## Voice
- **Wake word:** "Sanad" (Whisper mishears it as "Stop", "Sand", "Set", "Send" — all accepted via the /s-/ phonetic rule; see `config_Voice.json::stt.wake_words` for the 33 fuzzy variants).
- **Mic:** G1 on-board array mic, captured via UDP multicast `239.168.123.161:5555` (16 kHz mono, 16-bit PCM). No USB mic needed.
- **Wake detection:** custom energy-envelope state machine (pure numpy, no ML) — fires on any 0.35-1.5 s speech burst followed by silence, and adapts its baseline to room ambient (see the capture sketch after this list).
- **Wake verify:** lightweight Whisper decode on the triggering burst. Accepts if it contains a wake-word variant OR starts with `s`/`sh`/`z` (Whisper's consistent signature for "Sanad"). Rejects pure noise / non-s speech silently.
- **STT (command):** faster-whisper `base.en` int8 on CPU — loads in ~1.5 s on the first wake, cached afterwards (transcription sketch below).
- **TTS:** Unitree `client.TtsMaker()` → G1 body speaker. English only.
- **Barge-in:** the mic is muted during TTS playback, then flushed on return to listening.
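The capture and wake pieces fit in a few dozen lines of plain Python. A minimal sketch, assuming the multicast format above (16 kHz mono s16le PCM on `239.168.123.161:5555`); the thresholds and burst logic are simplified illustrations, not the actual `Voice/wake_detector.py`:
```python
# Sketch of mic capture + energy-envelope wake detection. Thresholds
# and the burst state machine are simplified for illustration.
import socket
import struct

import numpy as np

GROUP, PORT, RATE = "239.168.123.161", 5555, 16000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

baseline = None   # adaptive ambient-energy estimate
burst_ms = 0.0    # length of the current speech burst

while True:
    data, _ = sock.recvfrom(4096)
    pcm = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
    energy = float(np.sqrt(np.mean(pcm ** 2)))       # frame RMS
    chunk_ms = 1000.0 * len(pcm) / RATE

    if baseline is None:
        baseline = energy
    baseline = 0.995 * baseline + 0.005 * energy     # slow adaptation

    if energy > 3.0 * baseline:                      # speech-like frame
        burst_ms += chunk_ms
    else:
        # Silence after a 0.35-1.5 s burst => candidate wake event
        if 350 <= burst_ms <= 1500:
            print("wake candidate: hand burst to Whisper verify")
        burst_ms = 0.0
```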
Interaction flow: say "Sanad" → hear *"Yes"* → speak your command → see transcript on console → Marcus answers through the speaker.
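The wake-verify and command-transcription steps map onto the public faster-whisper API. A sketch using the decode settings from the commit message above (`base.en`, int8, beam=8, temperature fallback); the accept/reject wrapper is a simplification of what `Voice/marcus_voice.py` does:
```python
# Sketch of wake-verify + command transcription with faster-whisper.
# Decode parameters follow the commit message; the accept logic is
# a simplified stand-in for the real fuzzy/garbage filtering.
from faster_whisper import WhisperModel

model = WhisperModel("base.en", device="cpu", compute_type="int8")

def transcribe(audio_f32):
    """audio_f32: 1-D float32 numpy array at 16 kHz."""
    segments, _info = model.transcribe(
        audio_f32,
        beam_size=8,
        temperature=[0.0, 0.2, 0.4],  # fallback ladder
        language="en",
    )
    return " ".join(seg.text for seg in segments).strip().lower()

def wake_verified(burst_audio, wake_words):
    text = transcribe(burst_audio)
    # Accept a known variant, or the /s-/ signature Whisper
    # consistently produces for "Sanad"
    return any(w in text for w in wake_words) or text.startswith(("s", "sh", "z"))
```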
Three voice modes selectable via `config_Voice.json::stt.mode`:
- `wake_and_command` (default) — wake word required before each command
- `always_on` — continuously transcribe + dispatch every utterance
- `always_on_gated` — always listen + log, dispatch only if utterance contains "Sanad"
To disable voice entirely, set `subsystems.voice: false` in `config_Brain.json` — Marcus then boots text-only, ~2 s faster.
**Tuning knobs** (when false wakes or rejected real wakes) — all in `config_Voice.json::stt`:
- Too many false wakes from coughs/claps → raise `speech_threshold` or `min_word_duration`
- Real "Sanad" being rejected → check the log line `wake REJECTED — %r` to see what Whisper heard; widen `wake_words` if needed
- Commands transcribed wrong → check `whisper: lp=%.2f nsp=%.2f text=%r` log line; lower `whisper_no_speech_threshold` or tighten `whisper_log_prob_threshold`
- "I didn't catch that" on silence → raise `min_transcription_length`
- Latency too high → set `wake_ack: "none"` (skip "Yes" TTS, save ~1.7 s/cycle)
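All of these knobs, plus the mode switch, live in the same `stt` block of `config_Voice.json`. A trimmed sketch of its shape; key names are the ones referenced in this guide, values are illustrative, and `"..."` marks entries elided here:
```jsonc
// Illustrative shape only; values are examples, not the shipped defaults.
"stt": {
  "mode": "wake_and_command",          // or "always_on" / "always_on_gated"
  "wake_words": ["sanad", "sand", "send", "..."],
  "command_vocab": ["..."],
  "garbage_patterns": ["..."],
  "speech_threshold": 3.0,
  "min_word_duration": 0.35,
  "min_transcription_length": 3,
  "command_vocab_cutoff": 0.7,
  "whisper_no_speech_threshold": 0.6,
  "whisper_log_prob_threshold": -1.0,
  "wake_ack": "yes"                    // "none" skips the TTS ack (~1.7 s/cycle)
}
```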
---
## Command Reference
### Movement
| Command | Action |
|---------|--------|
| `turn left` / `turn right` | Rotate (2s default) |
| `walk forward` / `move back` | Walk (2s default) |
| `walk 1 meter` | Precise odometry walk |
| `walk backward 2 meters` | Precise backward walk |
| `turn right 90 degrees` | Precise odometry turn |
| `turn right then walk forward` | Multi-step compound |
| `come to me` / `come here` | Forward 2s (instant, no AI) |
| `stop` | Gradual stop |
### Vision
| Command | Action |
|---------|--------|
| `what do you see` | Qwen2.5-VL describes camera view |
| `describe the room` | Qwen2.5-VL scene description |
| `is anyone here` | Qwen2.5-VL person check |
| `yolo` | Show YOLO detection status |
### Goal Navigation
| Command | Action |
|---------|--------|
| `goal/ stop when you see a person` | YOLO fast search + stop |
| `goal/ find a laptop` | YOLO + Qwen-VL search |
| `goal/ stop when you see a guy holding a phone` | YOLO + Qwen-VL compound verification |
| `find a person` | Auto-detected as goal (no prefix needed) |
| `look for a bottle` | Auto-detected as goal |
### Place Memory
| Command | Action |
|---------|--------|
| `remember this as door` | Save current position |
| `go to door` | Navigate to saved place |
| `places` | List all saved places |
| `forget door` | Delete place |
| `rename door to entrance` | Rename place |
| `where am I` | Show odometry position |
| `go home` | Return to start position |
### Patrol
| Command | Action |
|---------|--------|
| `patrol` | Autonomous patrol (prompts for duration) |
| `patrol: door → desk → exit` | Named waypoint patrol |
### Image Search (requires `subsystems.imgsearch: true`)
| Command | Action |
|---------|--------|
| `search/ /path/to/photo.jpg` | Find target from reference image |
| `search/ /path/to/photo.jpg person in blue shirt` | Image + hint |
| `search/ person in blue shirt` | Text-only search |
### Session Memory
| Command | Action |
|---------|--------|
| `last command` | Show last typed command |
| `do that again` | Repeat last command |
| `undo` | Reverse last movement |
| `last session` | Previous session summary |
| `session summary` | Current session stats |
### Autonomous Mode
| Command | Action |
|---------|--------|
| `auto on` | Start autonomous exploration |
| `auto off` | Stop |
| `auto status` | Current step / observations |
| `auto save` | Snapshot observations to disk |
### System
| Command | Action |
|---------|--------|
| `help` | Command reference |
| `example` | Usage examples |
| `lidar` / `lidar status` | SLAM engine pose + health |
| `q` / `quit` | Shutdown |
### Client-Only Commands (CLI)
| Command | Action |
|---------|--------|
| `status` | Ping server + LiDAR status |
| `camera` | Get camera configuration |
| `profile low/medium/high/full` | Switch camera profile |
| `capture` | Take a photo |
---
## Subsystem flags (`Config/config_Brain.json`)
Control what initializes at boot. Defaults:
```jsonc
"subsystems": {
"lidar": true,
"voice": true,
"imgsearch": false,
"autonomous": true
}
```
Set any to `false` to skip that subsystem's init. Boot time drops roughly:
- `voice: false` → ~2 s faster (no Whisper model load)
- `lidar: false` → ~1 s faster (no SLAM subprocess spawn)
- `imgsearch: false` → already the default; re-enable only when you need `search/ …`
- `autonomous: false` → minor, but removes the AutonomousMode init
---
## Network Configuration
| Interface | IP | Use |
|-----------|-----|------|
| `eth0` | 192.168.123.164 | Robot internal network (Jetson ↔ G1 ↔ LiDAR) |
| `wlan0` | 10.255.254.86 | Office WiFi (Jetson ↔ Workstation) |

| Service | Port | Protocol |
|---------|------|----------|
| Marcus WebSocket | 8765 | ws:// |
| ZMQ velocity (→ Holosoma) | 5556 | tcp:// (PUB/SUB) |
| Ollama API | 11434 | HTTP (localhost only) |
| G1 audio multicast (mic) | 5555 | UDP multicast 239.168.123.161 |
| Livox Mid-360 (LiDAR) | 192.168.123.120 | UDP (Livox SDK) |
Most values are configurable in `Config/config_Network.json` and `config_Voice.json::mic_udp`.
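For a sense of the velocity link: it is a plain ZMQ PUB/SUB socket on port 5556. A minimal `pyzmq` publisher sketch; the JSON field names are an assumption, since the real wire format is defined by `API/zmq_api.py` and Holosoma's `--task.velocity-input zmq`:
```python
# Hypothetical velocity publisher on the ZMQ port from the table above.
# The payload shape is an assumption; consult API/zmq_api.py for the
# real wire format expected by Holosoma.
import time
import zmq

ctx = zmq.Context.instance()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")

time.sleep(0.5)  # let subscribers connect (PUB/SUB slow-joiner)
pub.send_json({"vx": 0.3, "vy": 0.0, "yaw_rate": 0.0})  # illustrative fields
```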
---
## Troubleshooting
| Issue | Cause | Fix |
|-------|-------|-----|
| Banner shows `SANAD AI BRAIN — READY` but nothing moves | Holosoma not running | Start Holosoma (Terminal 1) first |
| `RuntimeError: CUDA not available` on boot | Wrong torch build on Jetson | See `Doc/environment.md` section 9.2 — reinstall the NVIDIA Jetson torch wheel |
| `llama runner process has terminated: %!w(<nil>)` | Ollama compute graph OOM | Already capped at `num_batch=128 / num_ctx=2048`. Check `free -h`; kill stale Ollama runners: `pkill -f "ollama runner"` |
| Traceback mentioning `multiprocessing/spawn.py` + ZMQ port 5556 | Old import-time ZMQ bind regressed | Pull latest `API/zmq_api.py` — must call `init_zmq()` from the parent only |
| `[Camera] No frame for 10s` during warmup | Ollama blocking the main thread, or USB bandwidth | Warmup is ~10-15 s on the first Qwen load; subsequent commands are fast |
| Wake word never fires | Energy burst below floor, or Whisper verify rejecting | Check `logs/voice.log` — if you see `wake REJECTED — 'X'`, add X's root variant to `config_Voice.json::stt.wake_words`. If `baseline=0` persists, your ambient exceeds the floor — raise `speech_threshold`. |
| Mic silent | G1 audio service not publishing | Run `python3 Voice/builtin_mic.py` standalone — must print "OK — mic is capturing audio" |
| `[LiDAR] No data yet (will keep trying)` | SLAM worker still spawning (normal) or Livox network | First ~5 s normal. If persists, `ping 192.168.123.120` |
| Client can't connect | Wrong IP or server not running | Verify `ollama serve &` and `python3 -m Server.marcus_server` are both up |
---
## File Locations
| What | Path |
|------|------|
| Brain code | `~/Marcus/Brain/` |
| Server | `~/Marcus/Server/marcus_server.py` |
| Voice | `~/Marcus/Voice/{builtin_mic,builtin_tts,wake_detector,marcus_voice}.py` |
| Config | `~/Marcus/Config/` |
| Prompts | `~/Marcus/Config/marcus_prompts.yaml` |
| YOLO model | `~/Marcus/Models/yolov8m.pt` |
| Session data | `~/Marcus/Data/Brain/Sessions/` |
| Places | `~/Marcus/Data/History/Places/places.json` |
| Logs | `~/Marcus/logs/` |
See `Doc/architecture.md` for full project structure and file-by-file documentation.
See `Doc/environment.md` for the verified Jetson software stack.
See `Doc/pipeline.md` for the end-to-end data flow.
See `Doc/functions.md` for the full function inventory (AST-generated).
---
## Language policy
**English only.** Arabic was removed from the codebase on 2026-04-21:
- `Config/config_Voice.json::stt.wake_words` — English fuzzy variants only (33 entries), excludes common English words that would false-trigger (`said`, `sand`, `sunday`, etc.)
- `Config/marcus_prompts.yaml` — no Arabic examples left in any of the 7 prompts
- `API/audio_api.py::speak(text)` — rejects non-ASCII (the G1 TtsMaker silently maps Arabic to Chinese, which nobody wants); see the guard sketch after this list
- `Brain/marcus_brain.py` — greeting and talk-pattern regexes match English only
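The non-ASCII rejection in `speak()` reduces to a one-line check. A sketch of the idea with a simplified signature, not the actual `API/audio_api.py` code:
```python
# Sketch of the speak() ASCII guard; the TTS plumbing around it
# (Unitree TtsMaker dispatch) is omitted here.
def speak(text: str) -> None:
    if not text.isascii():
        raise ValueError("English/ASCII only: TtsMaker mangles other scripts")
    # ... hand off to the G1 TtsMaker ...
```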
If you need Arabic back, the cleanest paths are either Piper TTS (offline) or edge-tts (online) — see `git log` for the removed implementations.
---
## Logs
All `.log` files in `logs/` rotate at **5 MB × 3 backups** by default. To change:
```bash
export MARCUS_LOG_MAX_BYTES=10000000 # 10 MB per file
export MARCUS_LOG_BACKUP_COUNT=5 # keep 5 rotations
export MARCUS_LOG_DIR=/var/log/marcus # move logs off SD card
```
Per-module log files:
- `brain.log`, `camera.log`, `lidar.log`, `zmq.log`, `server.log`, `main.log` — via `Core.logger.log()`
- `voice.log` — via stdlib `logging` in `audio_api.py` + `marcus_voice.py`
- Session JSON: `Data/Brain/Sessions/session_NNN_YYYY-MM-DD/{commands,detections,alerts,places}.json`
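For reference, the env-driven rotation above maps onto the stdlib `RotatingFileHandler`. A minimal sketch of the wiring, illustrative rather than a copy of the actual `Core.logger` implementation:
```python
# Illustrative env-driven log rotation with the stdlib; mirrors the
# MARCUS_LOG_* variables documented above.
import logging
import os
from logging.handlers import RotatingFileHandler

log_dir = os.environ.get("MARCUS_LOG_DIR", "logs")
max_bytes = int(os.environ.get("MARCUS_LOG_MAX_BYTES", 5 * 1024 * 1024))
backups = int(os.environ.get("MARCUS_LOG_BACKUP_COUNT", 3))

os.makedirs(log_dir, exist_ok=True)
handler = RotatingFileHandler(
    os.path.join(log_dir, "brain.log"), maxBytes=max_bytes, backupCount=backups
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logging.getLogger("brain").addHandler(handler)
```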