# Marcus — Control & Startup Guide
**Robot persona:** Sanad (wake word + self-intro; project code lives under `Marcus/`)

**Updated:** 2026-04-21

---
## Quick Start
### Prerequisites (Jetson Orin NX, JetPack 5.1.1)
```bash
# Terminal 1 — Start Holosoma (locomotion policy, in hsinference env)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq \
  --task.state-input zmq \
  --task.interface eth0
# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list # confirm qwen2.5vl:3b present
```
### Option A — Terminal Mode (on Jetson)
```bash
# Terminal 3 — Start Marcus Brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```
Direct keyboard control + voice input (say **"Sanad"** to wake). Expected banner on boot:
```
================================================
SANAD AI BRAIN — READY
================================================
model : qwen2.5vl:3b
yolo : True
odometry : True
memory : True
lidar : True
voice : True
camera : 424x240@15
```
### Option B — Server + Client (remote)
```bash
# Terminal 3 (Jetson) — Start Server
conda activate marcus
cd ~/Marcus
python3 -m Server.marcus_server
# Terminal 4 (Workstation) — Connect Client
cd ~/Robotics_workspace/yslootahtech/Project/Marcus
python3 -m Client.marcus_cli
```
Client prompts for connection:
```
Connection options:
1) eth0 — 192.168.123.164:8765
2) wlan0 — 10.255.254.86:8765
3) custom
Choose [1/2/3] or IP:
```
Or skip prompt: `python3 -m Client.marcus_cli --ip 192.168.123.164 --port 8765`
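
The connection prompt boils down to mapping a menu choice (or a raw IP typed at the prompt) to an `(ip, port)` pair. A minimal sketch of that logic, with the two interface defaults from above (function name and structure are illustrative, not the actual `marcus_cli` code):

```python
# Defaults mirror the client's connection menu: eth0 and wlan0.
DEFAULTS = {
    "1": ("192.168.123.164", 8765),  # eth0 — robot internal network
    "2": ("10.255.254.86", 8765),    # wlan0 — office WiFi
}

def resolve_choice(choice, custom_ip=None, port=8765):
    """Map a menu choice ("1"/"2"/"3") or a directly typed IP to (ip, port)."""
    choice = choice.strip()
    if choice in DEFAULTS:
        return DEFAULTS[choice]
    if choice == "3":                 # "custom" — caller supplies the IP
        return (custom_ip, port)
    return (choice, port)             # user typed an IP directly

print(resolve_choice("1"))
print(resolve_choice("10.0.0.5"))
```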

---
## Voice
- **Wake word:** "Sanad" (variants "sannad", "sanat", "sunnat" — see `config_Voice.json::stt.wake_words_en`)
- **Mic:** G1 on-board array mic, captured via UDP multicast `239.168.123.161:5555` (16 kHz mono, 16-bit PCM). No USB mic needed.
- **STT:** Whisper `tiny` (wake detection) + Whisper `small` (command transcription) — both run locally.
- **TTS:** Unitree `client.TtsMaker()` → G1 body speaker. English only.
- **Barge-in:** say something while Marcus is speaking and the mic buffer flushes on the next command.
Interaction flow: say "Sanad" → hear *"Listening"* → speak your command → see transcript on console → Marcus answers through the speaker.
To disable voice entirely, set `subsystems.voice: false` in `config_Brain.json` — Marcus will boot text-only ~2 s faster.
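
Wake detection reduces to a whole-word match of the transcript against the variant list. A minimal sketch, assuming the variant list from `config_Voice.json::stt.wake_words_en` (the function itself is illustrative, not the actual `marcus_voice` code):

```python
import re

# Variant list as documented for config_Voice.json::stt.wake_words_en
WAKE_WORDS = ["sanad", "sannad", "sanat", "sunnat"]

def heard_wake_word(transcript, wake_words=WAKE_WORDS):
    """Case-insensitive whole-word match against the wake-word variants."""
    words = re.findall(r"[a-z]+", transcript.lower())
    return any(w in wake_words for w in words)

print(heard_wake_word("Hey Sanad, come here"))   # matches "sanad"
print(heard_wake_word("sand castle"))            # "sand" is not a variant
```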

---
## Command Reference
### Movement
| Command | Action |
|---------|--------|
| `turn left` / `turn right` | Rotate (2s default) |
| `walk forward` / `move back` | Walk (2s default) |
| `walk 1 meter` | Precise odometry walk |
| `walk backward 2 meters` | Precise backward walk |
| `turn right 90 degrees` | Precise odometry turn |
| `turn right then walk forward` | Multi-step compound |
| `come to me` / `come here` | Forward 2s (instant, no AI) |
| `stop` | Gradual stop |
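
The "precise odometry walk" commands carry a direction and a distance. A hypothetical regex sketch of how such a command could be parsed (not the actual brain parser):

```python
import re

def parse_walk(cmd):
    """Extract (direction, meters) from commands like 'walk 1 meter'
    or 'walk backward 2 meters'; return None for anything else."""
    m = re.match(r"walk(?:\s+(forward|backward))?\s+([\d.]+)\s+meters?", cmd.lower())
    if not m:
        return None
    return (m.group(1) or "forward", float(m.group(2)))

print(parse_walk("walk 1 meter"))
print(parse_walk("walk backward 2 meters"))
```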
### Vision
| Command | Action |
|---------|--------|
| `what do you see` | Qwen2.5-VL describes camera view |
| `describe the room` | Qwen2.5-VL scene description |
| `is anyone here` | Qwen2.5-VL person check |
| `yolo` | Show YOLO detection status |
### Goal Navigation
| Command | Action |
|---------|--------|
| `goal/ stop when you see a person` | YOLO fast search + stop |
| `goal/ find a laptop` | YOLO + Qwen-VL search |
| `goal/ stop when you see a guy holding a phone` | YOLO + Qwen-VL compound verification |
| `find a person` | Auto-detected as goal (no prefix needed) |
| `look for a bottle` | Auto-detected as goal |
### Place Memory
| Command | Action |
|---------|--------|
| `remember this as door` | Save current position |
| `go to door` | Navigate to saved place |
| `places` | List all saved places |
| `forget door` | Delete place |
| `rename door to entrance` | Rename place |
| `where am I` | Show odometry position |
| `go home` | Return to start position |
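
Place memory amounts to a name → pose mapping persisted as JSON (the real file is `~/Marcus/Data/History/Places/places.json`). A minimal sketch with an assumed `{x, y, yaw}` schema; the demo path and helper names are illustrative:

```python
import json
import pathlib
import tempfile

# Demo file only — the real store is Data/History/Places/places.json
PLACES = pathlib.Path(tempfile.gettempdir()) / "places_demo.json"
PLACES.unlink(missing_ok=True)

def remember(name, x, y, yaw):
    """'remember this as door' — save the current odometry pose under a name."""
    places = json.loads(PLACES.read_text()) if PLACES.exists() else {}
    places[name] = {"x": x, "y": y, "yaw": yaw}
    PLACES.write_text(json.dumps(places, indent=2))

def rename(old, new):
    """'rename door to entrance' — a key move in the mapping."""
    places = json.loads(PLACES.read_text())
    places[new] = places.pop(old)
    PLACES.write_text(json.dumps(places, indent=2))

remember("door", 1.2, 0.4, 90.0)
rename("door", "entrance")
```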
### Patrol
| Command | Action |
|---------|--------|
| `patrol` | Autonomous patrol (prompts for duration) |
| `patrol: door → desk → exit` | Named waypoint patrol |
### Image Search (requires `subsystems.imgsearch: true`)
| Command | Action |
|---------|--------|
| `search/ /path/to/photo.jpg` | Find target from reference image |
| `search/ /path/to/photo.jpg person in blue shirt` | Image + hint |
| `search/ person in blue shirt` | Text-only search |
### Session Memory
| Command | Action |
|---------|--------|
| `last command` | Show last typed command |
| `do that again` | Repeat last command |
| `undo` | Reverse last movement |
| `last session` | Previous session summary |
| `session summary` | Current session stats |
### Autonomous Mode
| Command | Action |
|---------|--------|
| `auto on` | Start autonomous exploration |
| `auto off` | Stop |
| `auto status` | Current step / observations |
| `auto save` | Snapshot observations to disk |
### System
| Command | Action |
|---------|--------|
| `help` | Command reference |
| `example` | Usage examples |
| `lidar` / `lidar status` | SLAM engine pose + health |
| `q` / `quit` | Shutdown |
### Client-Only Commands (CLI)
| Command | Action |
|---------|--------|
| `status` | Ping server + LiDAR status |
| `camera` | Get camera configuration |
| `profile low/medium/high/full` | Switch camera profile |
| `capture` | Take a photo |
---
## Subsystem flags (`Config/config_Brain.json`)
Control what initializes at boot. Defaults:
```jsonc
"subsystems": {
  "lidar": true,
  "voice": true,
  "imgsearch": false,
  "autonomous": true
}
```
Set any to `false` to skip that subsystem's init. Boot time drops roughly:
- `voice: false` → ~2 s faster (no Whisper model load)
- `lidar: false` → ~1 s faster (no SLAM subprocess spawn)
- `imgsearch: false` → already the default; re-enable only when you need `search/ …`
- `autonomous: false` → minor, but removes the AutonomousMode init
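
The gating implied by these flags is simple: initialize a subsystem only when its flag is true. A hypothetical sketch (the real init spawns Whisper, the SLAM subprocess, etc.):

```python
import json

# Example config with voice disabled, as when trading it for boot speed
cfg = json.loads("""
{"subsystems": {"lidar": true, "voice": false,
                "imgsearch": false, "autonomous": true}}
""")

def boot(subsystems):
    """Initialize only enabled subsystems; return the names that started."""
    started = []
    for name, enabled in subsystems.items():
        if enabled:               # real code would spawn the subsystem here
            started.append(name)
    return started

print(boot(cfg["subsystems"]))    # voice and imgsearch are skipped
```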

---
## Network Configuration
| Interface | IP | Use |
|-----------|-----|------|
| `eth0` | 192.168.123.164 | Robot internal network (Jetson ↔ G1 ↔ LiDAR) |
| `wlan0` | 10.255.254.86 | Office WiFi (Jetson ↔ Workstation) |

| Service | Port | Protocol |
|---------|------|----------|
| Marcus WebSocket | 8765 | ws:// |
| ZMQ velocity (→ Holosoma) | 5556 | tcp:// (PUB/SUB) |
| Ollama API | 11434 | HTTP (localhost only) |
| G1 audio multicast (mic) | 5555 | UDP multicast 239.168.123.161 |
| Livox Mid-360 (LiDAR) | 192.168.123.120 | UDP (Livox SDK) |

Most values are configurable in `Config/config_Network.json` and `config_Voice.json::mic_udp`.

---
## Troubleshooting
| Issue | Cause | Fix |
|-------|-------|-----|
| Banner shows `SANAD AI BRAIN — READY` but nothing moves | Holosoma not running | Start Holosoma (Terminal 1) first |
| `RuntimeError: CUDA not available` on boot | Wrong torch build on Jetson | See `Doc/environment.md` section 9.2 — reinstall the NVIDIA Jetson torch wheel |
| `llama runner process has terminated: %!w(<nil>)` | Ollama compute graph OOM | Already capped at `num_batch=128 / num_ctx=2048`. Check `free -h`; kill stale Ollama runners: `pkill -f "ollama runner"` |
| Traceback mentioning `multiprocessing/spawn.py` + ZMQ port 5556 | Old import-time ZMQ bind regressed | Pull latest `API/zmq_api.py` — must call `init_zmq()` from the parent only |
| `[Camera] No frame for 10s` during warmup | Ollama blocking the main thread, or USB bandwidth | First Qwen load takes ~10–15 s to warm up; subsequent commands are fast |
| Wake word never fires | Whisper hearing something else | Check `logs/voice.log` — if it transcribes as "sunnat"/"sannat", add your variant to `config_Voice.json::stt.wake_words_en` |
| Mic silent | G1 audio service not publishing | Run `python3 Voice/builtin_mic.py` standalone — must print "OK — mic is capturing audio" |
| `[LiDAR] No data yet (will keep trying)` | SLAM worker still spawning (normal) or Livox network | First ~5 s normal. If persists, `ping 192.168.123.120` |
| Client can't connect | Wrong IP or server not running | Verify `ollama serve &` and `python3 -m Server.marcus_server` are both up |
---
## File Locations
| What | Path |
|------|------|
| Brain code | `~/Marcus/Brain/` |
| Server | `~/Marcus/Server/marcus_server.py` |
| Voice | `~/Marcus/Voice/{builtin_mic,builtin_tts,marcus_voice}.py` |
| Config | `~/Marcus/Config/` |
| Prompts | `~/Marcus/Config/marcus_prompts.yaml` |
| YOLO model | `~/Marcus/Models/yolov8m.pt` |
| Session data | `~/Marcus/Data/Brain/Sessions/` |
| Places | `~/Marcus/Data/History/Places/places.json` |
| Logs | `~/Marcus/logs/` |
See `Doc/architecture.md` for full project structure and file-by-file documentation.
See `Doc/environment.md` for the verified Jetson software stack.
See `Doc/pipeline.md` for the end-to-end data flow.
See `Doc/functions.md` for the full function inventory (AST-generated).

---
## Language policy
**English only.** Arabic was removed from the codebase on 2026-04-21:
- `Config/config_Voice.json::stt.wake_words_en` — only English variants (`sanad`, `sannad`, `sanat`, `sunnat`)
- `Config/marcus_prompts.yaml` — no Arabic examples left in any of the 7 prompts
- `API/audio_api.py::speak(text)` — rejects non-ASCII (the G1 TtsMaker silently maps Arabic to Chinese, which nobody wants)
- `Brain/marcus_brain.py` — greeting and talk-pattern regexes match English only
If you need Arabic back, the cleanest paths are either Piper TTS (offline) or edge-tts (online) — see `git log` for the removed implementations.

---
## Logs
All `.log` files in `logs/` rotate at **5 MB × 3 backups** by default. To change:
```bash
export MARCUS_LOG_MAX_BYTES=10000000 # 10 MB per file
export MARCUS_LOG_BACKUP_COUNT=5 # keep 5 rotations
export MARCUS_LOG_DIR=/var/log/marcus # move logs off SD card
```
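
These overrides map naturally onto Python's stdlib `RotatingFileHandler`. A sketch of how the env vars could be wired up, assuming the documented 5 MB × 3 defaults (the wiring is illustrative, not the actual `Core.logger` code):

```python
import logging
import logging.handlers
import os

# Read the documented overrides, falling back to the 5 MB x 3 defaults
max_bytes = int(os.environ.get("MARCUS_LOG_MAX_BYTES", 5 * 1024 * 1024))
backups = int(os.environ.get("MARCUS_LOG_BACKUP_COUNT", 3))
log_dir = os.environ.get("MARCUS_LOG_DIR", "logs")

os.makedirs(log_dir, exist_ok=True)
handler = logging.handlers.RotatingFileHandler(
    os.path.join(log_dir, "brain.log"),
    maxBytes=max_bytes, backupCount=backups)

logger = logging.getLogger("brain")
logger.addHandler(handler)
logger.warning("log rotation: %d bytes x %d backups", max_bytes, backups)
```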
Per-module log files:
- `brain.log`, `camera.log`, `lidar.log`, `zmq.log`, `server.log`, `main.log` — via `Core.logger.log()`
- `voice.log` — via stdlib `logging` in `audio_api.py` + `marcus_voice.py`
- Session JSON: `Data/Brain/Sessions/session_NNN_YYYY-MM-DD/{commands,detections,alerts,places}.json`