kassam 5d839d4f4e Voice: finalise on faster-whisper + energy wake, remove Vosk
Full-day voice-stack refactor. Experiments run and reverted:
- Gemini Live HTTP microservice (Python 3.8 env incompat, latency)
- Vosk grammar STT (English lexicon can't decode 'Sanad'; big model
  cold-load too slow on Jetson CPU)

Kept architecture:
- Voice/wake_detector.py — pure-numpy energy state machine with
  adaptive baseline, burst-audio capture for post-hoc verify.
- Voice/marcus_voice.py — orchestrator with 3 modes
  (wake_and_command / always_on / always_on_gated), hysteretic VAD,
  pre-silence trim (300 ms pre-roll), DSP pipeline (DC remove,
  80 Hz HPF, 0.97 pre-emphasis, peak-normalize), faster-whisper
  base.en int8 with beam=8 + temperature fallback [0,0.2,0.4],
  fuzzy-match canonicalisation, GARBAGE_PATTERNS + length filter,
  /s-/ phonetic wake-verify, full-turn debug WAV recording.
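
The energy state machine can be sketched in a few lines. The class below is a simplified, stdlib-only illustration of the idea (the real detector is numpy-based; the class name, trigger ratio, and smoothing factor here are hypothetical, not the actual wake_detector.py API):

```python
# Minimal sketch of an adaptive-baseline energy wake gate (illustrative only;
# names and thresholds are hypothetical, not the actual wake_detector.py API).
import math

class EnergyWakeGate:
    def __init__(self, ratio=3.0, alpha=0.05):
        self.baseline = None   # adaptive noise-floor estimate (EMA of frame RMS)
        self.ratio = ratio     # frame must exceed baseline * ratio to trigger
        self.alpha = alpha     # EMA smoothing factor for the baseline

    def feed(self, frame):
        """frame: iterable of PCM samples; returns True on a wake trigger."""
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if self.baseline is None:
            self.baseline = rms        # seed the noise floor from the first frame
            return False
        triggered = rms > self.baseline * self.ratio
        if not triggered:
            # only adapt the noise floor on quiet frames, so speech
            # doesn't drag the baseline up and deafen the gate
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * rms
        return triggered
```

The burst-audio capture for post-hoc verify would sit on top of this: on a trigger, hand the buffered frames to the STT stage instead of acting on energy alone.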

Config-driven vocab (zero hardcoded strings in Python):
- stt.wake_words (33 variants of 'Sanad')
- stt.command_vocab (68 canonical phrases)
- stt.garbage_patterns (17 Whisper noise outputs)
- stt.min_transcription_length, stt.command_vocab_cutoff
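
As an illustration, the stt.* block has roughly this shape (abridged; the wake-word spellings appear in the persona changelog below, but the remaining entries and numeric values here are hypothetical placeholders, not the shipped config):

```json
{
  "stt": {
    "wake_words": ["sanad", "sannad", "sanat", "sunnat"],
    "command_vocab": ["stop", "go forward", "go back", "turn left", "turn right"],
    "garbage_patterns": ["thank you.", "thanks for watching!"],
    "min_transcription_length": 3,
    "command_vocab_cutoff": 0.6
  }
}
```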

Command parser widened (Brain/command_parser.py):
- _RE_SIMPLE_DIR — bare direction + verb+direction combos
  ('left', 'go back', 'move forward', 'step right', ...)
- _RE_STOP_SIMPLE — bare stop/halt/wait/pause/freeze/hold
- All motion constants sourced from config_Navigation.json
  (move_map + step_duration_sec) via API/zmq_api.py; no more
  hardcoded 0.3 / 2.0 magic numbers.
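
A hedged sketch of what such regex fast-paths look like (patterns and the return shape are illustrative, not the exact ones in Brain/command_parser.py):

```python
# Illustrative bare-direction / bare-stop fast paths; hypothetical patterns,
# not the real _RE_SIMPLE_DIR / _RE_STOP_SIMPLE definitions.
import re

_RE_SIMPLE_DIR = re.compile(
    r"^(?:(?:go|move|step|walk)\s+)?(?P<dir>forward|back(?:ward)?|left|right)$",
    re.IGNORECASE,
)
_RE_STOP_SIMPLE = re.compile(r"^(?:stop|halt|wait|pause|freeze|hold)$", re.IGNORECASE)

def fast_parse(text):
    """Return a cheap intent dict, or None to fall through to the LLM."""
    text = text.strip().lower()
    if _RE_STOP_SIMPLE.match(text):
        return {"intent": "stop"}
    m = _RE_SIMPLE_DIR.match(text)
    if m:
        return {"intent": "move", "direction": m.group("dir")}
    return None
```

The point of the fast path is latency: a regex hit answers in milliseconds, while anything returning None still goes to Qwen.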

API/audio_api.py — _play_pcm now uses AudioClient.PlayStream with
automatic resampling to 16 kHz (matches Sanad's proven pattern).
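
For illustration only, the rate conversion itself can be sketched with naive linear interpolation (the real path goes through AudioClient.PlayStream; resample() here is a hypothetical helper, not the SDK API):

```python
# Illustrative linear-interpolation resampler (not the actual AudioClient code):
# converts a PCM sample list from src_rate to dst_rate (e.g. 48000 -> 16000).
def resample(samples, src_rate, dst_rate):
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional source index
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```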

Removed:
- Voice/vosk_stt.py (and all Vosk references in marcus_voice.py)
- Models/vosk-model-small-en-us-0.15/ (40 MB model + zip)
- All Vosk keys from Config/config_Voice.json

Documentation synced across README, Doc/architecture.md,
Doc/pipeline.md, Doc/functions.md, Doc/controlling.md,
Doc/MARCUS_API.md, Doc/environment.md changelog.

Known limitation: faster-whisper base.en on Jetson CPU + G1
far-field mic yields ~50% command-transcription accuracy due
to model capacity and mic reverberation. Wake + ack + recording
+ trim + Whisper + fuzzy + brain + motion all verified working
end-to-end. Future improvement path (unused): close-talking USB
mic via pactl_parec, or Gemini Live via HTTP microservice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:32:28 +04:00


# Marcus — Environment & Version Reference

Project: Marcus | YS Lootah Technology
Robot persona: Sanad (wake word + self-intro; codebase stays under Marcus/)
Hardware: Unitree G1 EDU Humanoid (29 DOF) + Jetson Orin NX 16 GB
Deployment host: unitree@192.168.123.164 (hostname ubuntu)
Conda env: marcus
Captured: 2026-04-12 (updated 2026-04-24)

This document is the canonical record of the verified GPU-accelerated software stack running on the Jetson Orin NX. It covers system software, Python environment, Marcus runtime dependencies, installation recipe, verification commands, and known quirks. Pair it with architecture.md (what the code does) and controlling.md (how to drive it).


## 1. Hardware

| Item | Value |
| --- | --- |
| Robot | Unitree G1 EDU humanoid, 29 DoF |
| Compute | Jetson Orin NX 16 GB (integrated Ampere GPU, 8.7 capability, tensor cores) |
| Camera | Intel RealSense D435 (424x240 @ 15 fps, BGR8) |
| LiDAR (optional) | loaded via API/lidar_api.py + Lidar/SLAM_worker.py |
| Network | eth0 192.168.123.164 (Holosoma + Marcus), wlan0 10.255.254.86 |

## 2. System software (Jetson)

| Layer | Version | Source of truth |
| --- | --- | --- |
| Kernel | Linux 5.10.104-tegra aarch64 | uname -a |
| OS | Ubuntu 20.04.6 LTS | /etc/os-release |
| L4T | R35.3.1 (2023-03-19 build, GCID 32827747) | /etc/nv_tegra_release |
| JetPack | 5.1.1 (derived from L4T R35.3.1) | nvidia-l4t-core 35.3.1-20230319081403 |
| CUDA runtime | 11.4.19-1 | dpkg -l cuda-runtime-11-4 |
| CUDA toolkit (nvcc) | 11.4.315 (Built 2022-10-23) | nvcc --version |
| cuDNN | 8.6.0.166-1+cuda11.4 | dpkg -l libcudnn8 |
| CUDA install paths | /usr/local/cuda, /usr/local/cuda-11, /usr/local/cuda-11.4 | ls /usr/local |

## 3. Python environment

| Layer | Version |
| --- | --- |
| Conda | 25.11.1 |
| Active env | marcus at /home/unitree/miniconda3/envs/marcus |
| Python | 3.8.20 (Jetson stock Python for JetPack 5) |
| pip | 25.0.1 (user site ~/.local/lib/python3.8/site-packages) |
| which python3 | /home/unitree/miniconda3/envs/marcus/bin/python3 |

Other conda envs on the machine (not used by Marcus): base, gemini, gmr, marcus_tts, saqr, teleimager, tv, twist2, unitree_lerobot, plus the Holosoma-side hsinference under ~/.holosoma_deps/miniconda3.

Note: Python 3.8 is EOL (Oct 2024). It is retained because JetPack 5 ships it and NVIDIA's pre-built Jetson torch wheels for this generation target cp38. Upgrading requires either JetPack 6 or a from-source torch build.


## 4. PyTorch stack (GPU-critical)

The torch install must be the NVIDIA Jetson wheel, not the PyPI wheel. PyPI torch for aarch64 is CPU-only; only NVIDIA's builds expose CUDA on Jetson.

| Item | Expected | Verified |
| --- | --- | --- |
| `torch.__version__` | 2.1.0a0+41361538.nv23.06 | yes |
| `torch.cuda.is_available()` | True | yes |
| `torch.version.cuda` | 11.4 | yes |
| `torch.backends.cudnn.version()` | 8600 (= cuDNN 8.6.0) | yes |
| `torch.cuda.get_device_name(0)` | Orin | yes |
| `torch.cuda.get_device_capability(0)` | (8, 7) (Ampere + tensor cores) | yes |
| `torchvision.__version__` | 0.16.1 (built from source against the Jetson torch) | yes |
| `torchvision.ops.nms(...).device` | cuda:0 | yes |

Capability 8.7 gives us FP16 tensor cores — the GPU-path FP16 kwarg in Vision/marcus_yolo.py is meaningful here, not placebo.


## 5. Ultralytics / YOLO runtime

| Item | Value |
| --- | --- |
| ultralytics | 8.4.21 |
| Weights | Models/yolov8m.pt (~50 MB, auto-fetched if missing) |
| yolo checks GPU line | GPU: Orin, 15389MiB, CUDA: 11.4 |
| Marcus config device | cuda (hard-required — no CPU fallback) |
| Marcus config half | true (FP16) |
| Marcus config imgsz | 320 |
| First inference warmup | ~45 s (cuDNN kernel autotune) |
| Steady-state FPS on Orin | ~21.9 fps at imgsz=320 FP16 |

The 21.9 fps figure is measured via the smoke test in section 11 below. It comfortably exceeds the 15 fps camera stream, so YOLO is no longer the pipeline bottleneck.


## 6. Ollama / vision-language model

| Item | Value |
| --- | --- |
| Ollama CLI client | 0.20.0 |
| Ollama server | 0.20.0 (curl http://localhost:11434/api/version) |
| Python ollama package | 0.6.1 (no `__version__` attribute — use pip show ollama) |
| Models installed | qwen2.5vl:3b (3.2 GB), llava:7b (4.7 GB) |
| Marcus-configured model | qwen2.5vl:3b (Config/config_Brain.json) |
| Resident VRAM when loaded | ~11 GB (includes KV cache + vision projector) |
| Processor placement | 100% GPU per ollama ps |

Headroom note: with Qwen2.5-VL resident (~11 GB) + YOLO (~0.5 GB) + camera buffers + the ZMQ bridge, you have ~4 GB free on the 16 GB Orin NX. Comfortable but not unlimited — if image-search (which sends two images to Qwen at once) ever OOMs, enable quantized KV cache via OLLAMA_KV_CACHE_TYPE=q8_0.


## 7. Marcus runtime Python dependencies

Captured from importlib on 2026-04-12, marcus env on the Jetson.

| Module | Version | Site |
| --- | --- | --- |
| numpy | 1.24.4 | user |
| cv2 (opencv-python) | 4.13.0 | user |
| PIL (Pillow) | 10.4.0 | user |
| yaml (PyYAML) | 6.0.3 | user |
| zmq (pyzmq) | 27.1.0 | user |
| websockets | 13.1 | conda env |
| pyrealsense2 | 2.55.1.6486 | user |
| dotenv | (no `__version__`) | user |
| ollama (python client) | 0.6.1 | user |
| requests | 2.32.4 | user |
| ultralytics | 8.4.21 | user |
| torch | 2.1.0a0+41361538.nv23.06 | user |
| torchvision | 0.16.1 (egg) | user |
| matplotlib | 3.7.5 | user (via ultralytics) |
| scipy | 1.10.1 | user (via ultralytics) |
| psutil | 7.2.2 | user (via ultralytics) |
| polars | 1.8.2 | user (via ultralytics) |
| ultralytics-thop | 2.0.18 | user |

"user" = ~/.local/lib/python3.8/site-packages. Most Marcus deps live here rather than in the conda env's site-packages because of how JetPack ships system libs with --user installs.


## 8. Marcus project modules — import status

All 25 project modules import cleanly from the marcus env at /home/unitree/Marcus:

```text
OK   Core.config_loader      Core.env_loader
OK   Core.log_backend        Core.logger
OK   Voice.builtin_mic       Voice.builtin_tts       Voice.marcus_voice
OK   Vision.marcus_yolo      Vision.marcus_imgsearch
OK   API.llava_api           API.yolo_api            API.camera_api
OK   API.zmq_api             API.imgsearch_api       API.odometry_api
OK   API.memory_api          API.arm_api             API.audio_api
OK   Navigation.goal_nav     Navigation.patrol       Navigation.marcus_odometry
OK   Brain.marcus_brain      Brain.marcus_memory     Brain.command_parser
OK   Autonomous.marcus_autonomous
```

Notable changes: Voice/marcus_gemini_voice.py was deleted on 2026-04-21, and Core/Logger.py was renamed to Core/log_backend.py.


## 9. Installation recipe (reproducing this environment)

Run these steps on a fresh marcus conda env, in order. They reproduce the exact stack above.

### 9.1 Fix the Jetson clock if needed

Jetsons have no RTC battery; after power cycles the clock may reset to 1970 and break TLS (cert validation fails with "issued certificate not yet valid"). systemd-timesyncd may refuse NTP — fall back to manual date:

```bash
sudo systemctl restart systemd-timesyncd
sudo timedatectl set-ntp true                      # may fail on Jetson
# fallback:
sudo date -s "YYYY-MM-DD HH:MM:SS"
sudo hwclock --systohc
```

### 9.2 Remove CPU-only torch and install the NVIDIA Jetson wheel

```bash
conda activate marcus
pip uninstall -y torch torchvision torchaudio

cd ~
wget https://developer.download.nvidia.com/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
# (JP 5.1.1 uses the same nv23.06 wheel; v511 returns 404 on this build)

pip install ~/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
```

Verify before proceeding:

```bash
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Expect: 2.1.0a0+41361538.nv23.06 True Orin
```

### 9.3 Build torchvision 0.16.1 from source (matches torch 2.1.0)

```bash
sudo apt install -y libjpeg-dev zlib1g-dev libpython3-dev \
                    libavcodec-dev libavformat-dev libswscale-dev

cd ~
git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision
cd torchvision
export BUILD_VERSION=0.16.1
python3 setup.py install --user
```

Build takes ~15-25 min on Orin NX. Verify:

```bash
python3 -c "
import torch, torchvision
x = torch.rand(5, 4).cuda(); s = torch.rand(5).cuda()
print(torchvision.__version__, torchvision.ops.nms(x, s, 0.5).device)
"
# Expect: 0.16.1 cuda:0
```

### 9.4 Ollama server + model

```bash
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list                          # confirm qwen2.5vl:3b present
ollama pull qwen2.5vl:3b             # if missing (~3 GB)
ollama run qwen2.5vl:3b "hi"         # warm model into VRAM
ollama ps                            # PROCESSOR must say "100% GPU"
```

### 9.5 Other deps

Already present via pip install --user from earlier setup — see section 7 for versions. No action needed unless reinstalling from scratch.


## 10. Marcus launch sequence

Full terminal-mode bring-up:

```bash
# Terminal 1 — Holosoma locomotion policy (runs in hsinference env, not marcus)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
  src/holosoma_inference/holosoma_inference/run_policy.py \
  inference:g1-29dof-loco \
  --task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
  --task.velocity-input zmq --task.state-input zmq --task.interface eth0

# Terminal 2 — Ollama server (leave running)
ollama serve & sleep 3

# Terminal 3 — Marcus brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```

Expected Marcus YOLO init line:

```text
[YOLO] Model loaded ✅ | device: cuda (Orin) | FP16 | 19 tracked classes
```

If instead you get RuntimeError: [YOLO] CUDA not available — torch.cuda.is_available() == False, the torch install is wrong or was overwritten by a pip install torch somewhere — redo section 9.2.


## 11. Verification commands (copy/paste)

Full-stack version check:

```bash
cat /etc/nv_tegra_release
dpkg -l | grep -E "nvidia-l4t-core|cuda-runtime|libcudnn8" | awk '{print $2, $3}'
nvcc --version | tail -n2
python3 --version
python3 -c "import torch, torchvision; print('torch', torch.__version__, '| cuda', torch.cuda.is_available(), '| cudnn', torch.backends.cudnn.version(), '| gpu', torch.cuda.get_device_name(0), '| tv', torchvision.__version__)"
python3 -c "import ultralytics; print('ultralytics', ultralytics.__version__)"
ollama --version
curl -s http://localhost:11434/api/version
ollama list
ollama ps
```

YOLO warmup + steady-state FPS (the gold-standard GPU smoke test):

```bash
cd ~/Marcus
python3 - <<'EOF'
import sys, os, time, threading
sys.path.insert(0, os.getcwd())
os.environ.setdefault("PROJECT_BASE", "/home/unitree")
os.environ.setdefault("PROJECT_NAME", "Marcus")
import numpy as np
import Vision.marcus_yolo as my
from Vision.marcus_yolo import (
    start_yolo, yolo_fps, yolo_is_running, _resolve_device, YOLO_DEVICE
)

dev, half = _resolve_device(YOLO_DEVICE)
print(f"[resolve] device={dev!r} half={half}")

raw, lock = [None], threading.Lock()
assert start_yolo(raw_frame_ref=raw, frame_lock=lock)
raw[0] = np.random.randint(0, 255, (240, 424, 3), dtype=np.uint8)

for i in range(15):
    time.sleep(1)
    print(f"  t={i+1:2d}s  fps={yolo_fps():.1f}")

time.sleep(5)
print(f"[final] fps={yolo_fps():.1f}")
my._yolo_running[0] = False
time.sleep(0.3)
EOF
```

GPU live telemetry while Marcus runs:

```bash
tegrastats --interval 500 | grep -oE "GR3D_FREQ [0-9]+%"
```

nvidia-smi is absent on Jetson — tegrastats is the equivalent.
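
If you want the load figure programmatically rather than via grep, the GR3D_FREQ token can be pulled out of a tegrastats line like this (the sample line below is illustrative; only the GR3D_FREQ token format is assumed):

```python
# Extract GPU load percentage from a tegrastats output line.
import re

_GR3D = re.compile(r"GR3D_FREQ (\d+)%")

def gpu_load(tegrastats_line):
    """Return the GR3D_FREQ percentage as an int, or None if absent."""
    m = _GR3D.search(tegrastats_line)
    return int(m.group(1)) if m else None

# Hypothetical sample line for illustration:
sample = "RAM 9318/15389MB (lfb 4x2MB) GR3D_FREQ 62% CPU [12%,8%,5%,3%]"
print(gpu_load(sample))
```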


## 12. Known quirks

  1. No RTC battery — clock resets to 1970 on every full power cycle. Fix before any wget/pip install that hits HTTPS. See 9.1.
  2. ollama python lib has no __version__ — use pip show ollama instead of ollama.__version__.
  3. nvidia-smi not available — normal on Jetson. Use tegrastats and torch.cuda.* APIs.
  4. Ollama server "could not connect" warning on first ollama list/ollama ps just means the server isn't running yet. Start it with ollama serve & before Marcus.
  5. YOLO first inference ~45 s — cuDNN kernel autotune + FP16 conversion on cold start. The first user command after python3 run_marcus.py will feel slow; subsequent commands are steady-state. A YOLO warmup pass in init_brain() would hide this — open item.
  6. Holosoma and Marcus share ZMQ port 5556; run_marcus.py (terminal) and Server/marcus_server.py (websocket) cannot run simultaneously. Pick one.
  7. NVIDIA torch wheel is at /jp/v512/ on developer.download.nvidia.com even though this host is JetPack 5.1.1. The nv23.06 wheel is shared across JP 5.1.x (same CUDA 11.4 + cuDNN 8.6 runtime). /jp/v511/pytorch/ 404s — use v512.
  8. PyPI torch is CPU-only on aarch64 — any pip install torch with no wheel argument will silently replace the NVIDIA build with a CPU wheel and break Marcus startup (Marcus is now hard-configured to refuse CPU). If that happens, redo 9.2.

## 13. GPU-only policy (enforced in code)

As of 2026-04-12, Vision/marcus_yolo.py::_resolve_device raises RuntimeError instead of falling back to CPU when any of:

  • Config/config_Vision.json has yolo_device: "cpu"
  • torch is not installed
  • torch.cuda.is_available() returns False

API/yolo_api.py::init_yolo was also updated to propagate that RuntimeError (previously it caught Exception and silently disabled YOLO, leaving Marcus running blind). The brain crashes at init_brain() with a clear message if the GPU is unreachable — preferred over silent degradation on a safety-sensitive robot.
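
The policy reduces to a small decision function. The sketch below is a simplified stand-in for _resolve_device, not the actual implementation (cuda_available is injected here so the logic is testable off-robot; the real function reads torch and config directly):

```python
# Simplified sketch of the GPU-only hard-fail policy (hypothetical signature;
# not the real Vision/marcus_yolo.py::_resolve_device).
def resolve_device(configured, cuda_available):
    """Return (device, use_half), or raise instead of degrading to CPU."""
    if configured == "cpu":
        raise RuntimeError("[YOLO] CPU device configured — GPU-only policy forbids it")
    if not cuda_available:
        raise RuntimeError(
            "[YOLO] CUDA not available — torch.cuda.is_available() == False"
        )
    return "cuda", True   # FP16 is worthwhile on Ampere (capability 8.7)
```

Crashing loudly at init is the design choice here: on a safety-sensitive robot, a refused start is preferable to a silently blind vision stack.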

Config file (Config/config_Vision.json):

```json
{
  "yolo_model_path": "Models/yolov8m.pt",
  "yolo_confidence": 0.45,
  "yolo_iou": 0.45,
  "yolo_device": "cuda",
  "yolo_half": true,
  "yolo_img_size": 320,
  "tracked_classes": [ ... ],
  "ppe_violation_classes": [ "no-helmet", "no_helmet", "no-vest", "no_vest" ]
}
```

## 14. Change log

| Date | Change |
| --- | --- |
| 2026-04-12 | Initial environment.md — full stack captured, GPU bring-up verified end to end. Steady-state YOLOv8m FPS on Orin NX measured at 21.9. Ollama Qwen2.5-VL verified at 100% GPU. |
| 2026-04-12 | Vision/marcus_yolo.py rewired to load config_Vision.json, added _resolve_device() with hard-fail on missing CUDA (GPU-only policy). API/yolo_api.py updated to propagate RuntimeError. Config/config_Vision.json set yolo_device=cuda, yolo_half=true. |
| 2026-04-12 | Installed NVIDIA Jetson torch 2.1.0a0+41361538.nv23.06 (replacing CPU-only PyPI 2.4.1) + built torchvision 0.16.1 from source against it. Verified nms device = cuda:0. |
| 2026-04-12 | Fixed llama.cpp compute-graph OOM on Jetson: added num_batch=128 + num_ctx=2048 caps in Config/config_Brain.json, propagated through API/llava_api.py and Vision/marcus_imgsearch.py. Qwen2.5-VL compute graph drops from ~7.5 GiB to ~1.8 GiB. |
| 2026-04-21 | Restructure: moved ZMQ bind out of API/zmq_api.py import time into init_zmq(); fixes LiDAR SLAM worker spawn crash. Added loud GPU-requirement banner in API/yolo_api.py. Dropped num_predict_main 200→120. Made inner-loop sleeps in goal_nav/autonomous/imgsearch conditional. Renamed Core/Logger.py → Core/log_backend.py (case-collision fix). Updated Doc/MARCUS_API.md to current state. |
| 2026-04-21 | Voice restructure: added Voice/builtin_mic.py (G1 array mic via UDP multicast 239.168.123.161:5555) and Voice/builtin_tts.py (thin AudioClient.TtsMaker wrapper). Rewired Voice/marcus_voice.py to use BuiltinMic. Refactored API/audio_api.py::speak() to use BuiltinTTS — removed ~110 lines of edge-tts + pydub + Piper plumbing. Deleted Voice/marcus_gemini_voice.py. Added subsystems.{lidar,voice,imgsearch,autonomous} gate in config_Brain.json::init_brain(). |
| 2026-04-21 | Persona swap: robot identifies as Sanad. Wake words ["sanad","sannad","sanat","sunnat"], speaker.app_name="sanad", all Qwen prompts say "You are Sanad", banner reads SANAD AI BRAIN — READY, hardcoded self-intro says "I am Sanad". Project directory, class names, filenames, and PROJECT_NAME=Marcus env var unchanged. |
| 2026-04-21 | English-only sweep: stripped 5.8 KB of Arabic examples from marcus_prompts.yaml, removed Arabic talk-pattern and greeting regexes in Brain/marcus_brain.py, dropped Arabic wake words from config_Voice.json, changed user-facing prints Marcus: … → Sanad: … in executor.py, marcus_brain.py, marcus_cli.py. Verified: 0 Arabic chars in live code/config. |
| 2026-04-21 | Logs hardened: Core/log_backend.py now uses RotatingFileHandler (5 MB × 3 backups, env-tunable via MARCUS_LOG_MAX_BYTES / MARCUS_LOG_BACKUP_COUNT) for all three code paths (main_handler, LogEngine, LogsMessages). API/audio_api.py + Voice/marcus_voice.py also rotate voice.log. default_logs_dir fixed: "Logs" → "logs" (matches actual directory; no more case-collision recreation). |
| 2026-04-21 | Dead code removed: deleted Legacy/marcus_nav.py (unused + Arabic), deleted Config/config_Memory.json (orphan — never loaded). Config count: 13 → 12 JSON files + marcus_prompts.yaml. |
| 2026-04-21 | Orphan config keys wired up (0 orphans remaining): config_ImageSearch.json → Vision/marcus_imgsearch.py (4 constants), config_Voice.mic_udp.read_timeout_sec → Voice/builtin_mic.py, config_Camera.{timeout_ms, stale_threshold_s, reconnect_delay_s} → API/camera_api.py, config_Odometry.json (10 keys) → Navigation/marcus_odometry.py. All 156 config keys now referenced by code. |
| 2026-04-21 | Subprocess leak fix: AudioAPI._record_parec now wraps Popen in try/finally with terminate → wait(1.0) → kill fallback; orphan parec processes can no longer survive Ctrl-C. Last-resort proc.kill() catches only OSError (not bare except). |
| 2026-04-21 | Modelfile corrected: Models/Modelfile now FROM qwen2.5vl:3b (was :7b) with a header explaining it's an optional build template — runtime uses ollama pull qwen2.5vl:3b directly. |
| 2026-04-21 | Final verification: 14-dimension smoke test green — no Arabic, no dead dirs, 0 orphan keys, every FileHandler rotates, no bare except: pass, no stale Models_marcus / marcus_llava refs, 25/25 modules import. |
| 2026-04-24 | Voice finalised on faster-whisper + custom energy wake. Added Voice/wake_detector.py (pure-numpy energy state machine, adaptive noise floor, burst-audio capture for verify). Rewrote Voice/marcus_voice.py around it: three operating modes (wake_and_command / always_on / always_on_gated), hysteretic record VAD, pre-speech silence trim (300 ms pre-roll preserved), faster-whisper base.en int8 CPU decode, fuzzy-match canonicalisation against command_vocab, GARBAGE_PATTERNS + length filter for noise hallucinations, /s-/ phonetic wake verify (accepts Whisper mishearings of "Sanad" like "Stop"/"Set"/"Sand"). Tried and reverted: Gemini Live WebSocket (Python 3.8 incompatibility + latency), Vosk grammar STT (English lexicon can't decode "Sanad"; big model cold-load too slow on Jetson). All voice tunables (33 wake_words, 68 command_vocab, 17 garbage_patterns, ~25 threshold/VAD/Whisper keys) live in config_Voice.json::stt.* — zero hardcoded strings in Voice/. |
| 2026-04-24 | Command parser widened: Brain/command_parser.py now has _RE_SIMPLE_DIR (left, go back, move forward, step right, etc.) and _RE_STOP_SIMPLE (stop, halt, wait, pause, freeze) regex fast-paths — these bare-direction / bare-stop commands now skip Qwen entirely (~50 ms vs ~5 s). Motion velocities and step duration pulled from config_Navigation.json::{move_map, step_duration_sec} via API/zmq_api.py; command_parser no longer contains hardcoded 0.3 / 2.0 magic numbers. |
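
The fuzzy-match canonicalisation step mentioned in the voice entry above can be sketched with stdlib difflib (difflib stands in for whatever matcher marcus_voice.py actually uses; the vocab below is illustrative and the cutoff mirrors the role of stt.command_vocab_cutoff):

```python
# Sketch of cutoff-gated fuzzy canonicalisation against a command vocab
# (illustrative; not the actual marcus_voice.py matcher).
import difflib

def canonicalise(transcript, vocab, cutoff=0.6):
    """Map a noisy Whisper transcript onto the closest canonical phrase,
    or return None so the caller can discard the utterance."""
    hits = difflib.get_close_matches(transcript.lower().strip(), vocab,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None

# Hypothetical abridged vocab (the real one has 68 canonical phrases):
VOCAB = ["go forward", "go back", "turn left", "turn right", "stop"]
```

The cutoff is what keeps hallucinated noise out: a transcript that resembles nothing in the vocab falls below it and is dropped rather than acted on.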