Marcus/Doc/environment.md
2026-04-12 18:50:22 +04:00

# Marcus — Environment & Version Reference
**Project**: Marcus | YS Lootah Technology
**Hardware**: Unitree G1 EDU Humanoid (29 DOF) + Jetson Orin NX 16 GB
**Deployment host**: `unitree@192.168.123.164` (hostname `ubuntu`)
**Conda env**: `marcus`
**Captured**: 2026-04-12
This document is the canonical record of the verified GPU-accelerated software stack running on the Jetson Orin NX. It covers system software, Python environment, Marcus runtime dependencies, installation recipe, verification commands, and known quirks. Pair it with `architecture.md` (what the code does) and `controlling.md` (how to drive it).
---
## 1. Hardware
| Item | Value |
|---|---|
| Robot | Unitree G1 EDU humanoid, 29 DoF |
| Compute | Jetson Orin NX 16 GB (integrated Ampere GPU, 8.7 capability, tensor cores) |
| Camera | Intel RealSense D435 (424x240 @ 15 fps, BGR8) |
| LiDAR | (optional) loaded via `API/lidar_api.py` + `Lidar/SLAM_worker.py` |
| Network | `eth0` 192.168.123.164 (Holosoma + Marcus), `wlan0` 10.255.254.86 |
---
## 2. System software (Jetson)
| Layer | Version | Source of truth |
|---|---|---|
| Kernel | `Linux 5.10.104-tegra aarch64` | `uname -a` |
| OS | Ubuntu 20.04.6 LTS | `/etc/os-release` |
| L4T | R35.3.1 (2023-03-19 build, GCID 32827747) | `/etc/nv_tegra_release` |
| JetPack | **5.1.1** (derived from L4T R35.3.1) | `nvidia-l4t-core 35.3.1-20230319081403` |
| CUDA runtime | `11.4.19-1` | `dpkg -l cuda-runtime-11-4` |
| CUDA toolkit (nvcc) | `11.4.315` (Built 2022-10-23) | `nvcc --version` |
| cuDNN | `8.6.0.166-1+cuda11.4` | `dpkg -l libcudnn8` |
| CUDA install paths | `/usr/local/cuda`, `/usr/local/cuda-11`, `/usr/local/cuda-11.4` | `ls /usr/local` |
---
## 3. Python environment
| Layer | Version |
|---|---|
| Conda | `25.11.1` |
| Active env | `marcus` at `/home/unitree/miniconda3/envs/marcus` |
| Python | `3.8.20` (Jetson stock Python for JetPack 5) |
| pip | `25.0.1` (user site `~/.local/lib/python3.8/site-packages`) |
| `which python3` | `/home/unitree/miniconda3/envs/marcus/bin/python3` |
Other conda envs on the machine (not used by Marcus): `base`, `gemini`, `gmr`, `marcus_tts`, `saqr`, `teleimager`, `tv`, `twist2`, `unitree_lerobot`, plus the Holosoma-side `hsinference` under `~/.holosoma_deps/miniconda3`.
Note: Python 3.8 is EOL (Oct 2024). It is retained because JetPack 5 ships it and NVIDIA's pre-built Jetson torch wheels for this generation target cp38. Upgrading requires either JetPack 6 or a from-source torch build.
---
## 4. PyTorch stack (GPU-critical)
**The torch install must be the NVIDIA Jetson wheel, not the PyPI wheel.** PyPI `torch` for aarch64 is CPU-only; only NVIDIA's builds expose CUDA on Jetson.
| Item | Expected | Verified |
|---|---|---|
| `torch.__version__` | `2.1.0a0+41361538.nv23.06` | yes |
| `torch.cuda.is_available()` | `True` | yes |
| `torch.version.cuda` | `11.4` | yes |
| `torch.backends.cudnn.version()` | `8600` (= cuDNN 8.6.0) | yes |
| `torch.cuda.get_device_name(0)` | `Orin` | yes |
| `torch.cuda.get_device_capability(0)` | `(8, 7)` (Ampere + tensor cores) | yes |
| `torchvision.__version__` | `0.16.1` (built from source against the Jetson torch) | yes |
| `torchvision.ops.nms(...).device` | `cuda:0` | yes |
Capability 8.7 gives us FP16 tensor cores — the GPU-path FP16 kwarg in `Vision/marcus_yolo.py` is meaningful here, not placebo.
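A capability gate along these lines (a hypothetical sketch, not the actual `Vision/marcus_yolo.py` logic) is the kind of check that makes the FP16 kwarg meaningful: FP16 tensor cores arrived with Volta (capability 7.0), and Orin reports (8, 7).

```python
# Hypothetical sketch: decide whether FP16 inference is worthwhile from the
# CUDA compute capability tuple, as reported by
# torch.cuda.get_device_capability(0). FP16 tensor cores exist from Volta
# (7, 0) upward; the Orin NX reports (8, 7).
def fp16_worthwhile(capability):
    """Return True when the GPU has FP16 tensor cores."""
    return tuple(capability) >= (7, 0)

print(fp16_worthwhile((8, 7)))  # Orin NX -> True
print(fp16_worthwhile((5, 3)))  # older Maxwell-class Jetson -> False
```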
---
## 5. Ultralytics / YOLO runtime
| Item | Value |
|---|---|
| `ultralytics` | `8.4.21` |
| Weights | `Models/yolov8m.pt` (~50 MB, auto-fetched if missing) |
| `yolo checks` GPU line | `GPU: Orin, 15389MiB`, `CUDA: 11.4` |
| Marcus config device | `cuda` (hard-required — no CPU fallback) |
| Marcus config half | `true` (FP16) |
| Marcus config imgsz | `320` |
| First inference warmup | ~45 s (cuDNN kernel autotune) |
| **Steady-state FPS on Orin** | **~21.9 fps** at imgsz=320 FP16 |
The 21.9 fps figure is measured via the smoke test in section 11 below. It comfortably exceeds the 15 fps camera stream, so YOLO is no longer the pipeline bottleneck.
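The table's settings map onto Ultralytics `predict()` keyword arguments roughly as follows. This is a hedged sketch — the real wiring lives in `Vision/marcus_yolo.py`, and the helper name here is invented:

```python
# Hypothetical helper mapping Marcus's vision config onto Ultralytics
# predict() keyword arguments. Keys mirror Config/config_Vision.json.
def predict_kwargs(cfg):
    return {
        "device": cfg["yolo_device"],   # "cuda" (hard-required, no CPU fallback)
        "half": cfg["yolo_half"],       # FP16 on the Orin tensor cores
        "imgsz": cfg["yolo_img_size"],  # 320 -> ~21.9 fps steady state
        "conf": cfg["yolo_confidence"],
        "iou": cfg["yolo_iou"],
        "verbose": False,
    }

cfg = {"yolo_device": "cuda", "yolo_half": True, "yolo_img_size": 320,
       "yolo_confidence": 0.45, "yolo_iou": 0.45}
kw = predict_kwargs(cfg)
# Used roughly as: model = YOLO("Models/yolov8m.pt"); model.predict(frame, **kw)
```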
---
## 6. Ollama / vision-language model
| Item | Value |
|---|---|
| Ollama CLI client | `0.20.0` |
| Ollama server | `0.20.0` (`curl http://localhost:11434/api/version`) |
| Python `ollama` package | `0.6.1` (no `__version__` attribute — use `pip show ollama`) |
| Models installed | `qwen2.5vl:3b` (3.2 GB), `llava:7b` (4.7 GB) |
| Marcus-configured model | `qwen2.5vl:3b` (`Config/config_Brain.json`) |
| Resident VRAM when loaded | ~11 GB (includes KV cache + vision projector) |
| Processor placement | **`100% GPU`** per `ollama ps` |
Headroom note: with Qwen2.5-VL resident (~11 GB) + YOLO (~0.5 GB) + camera buffers + the ZMQ bridge, you have ~4 GB free on the 16 GB Orin NX. Comfortable but not unlimited — if image-search (which sends two images to Qwen at once) ever OOMs, enable quantized KV cache via `OLLAMA_KV_CACHE_TYPE=q8_0`.
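For reference, a minimal sketch of the request shape that ultimately reaches the server's `/api/chat` endpoint (payload construction only; the helper name is mine, and actually sending it requires the running server from section 9.4):

```python
import base64
import json

# Hypothetical helper: build an Ollama /api/chat payload carrying one image.
# Images travel in the message's "images" list as base64-encoded bytes.
def vlm_chat_payload(prompt, jpeg_bytes, model="qwen2.5vl:3b"):
    return {
        "model": model,
        "stream": False,
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        }],
    }

payload = vlm_chat_payload("What do you see?", b"\xff\xd8fake-jpeg-bytes")
body = json.dumps(payload)  # POST this to http://localhost:11434/api/chat
```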
---
## 7. Marcus runtime Python dependencies
Captured from `importlib` on 2026-04-12, `marcus` env on the Jetson.
| Module | Version | Site |
|---|---|---|
| `numpy` | 1.24.4 | user |
| `cv2` (opencv-python) | 4.13.0 | user |
| `PIL` (Pillow) | 10.4.0 | user |
| `yaml` (PyYAML) | 6.0.3 | user |
| `zmq` (pyzmq) | 27.1.0 | user |
| `websockets` | 13.1 | conda env |
| `pyrealsense2` | 2.55.1.6486 | user |
| `dotenv` | (no `__version__`) | user |
| `ollama` (python client) | 0.6.1 | user |
| `requests` | 2.32.4 | user |
| `ultralytics` | 8.4.21 | user |
| `torch` | 2.1.0a0+41361538.nv23.06 | user |
| `torchvision` | 0.16.1 (egg) | user |
| `matplotlib` | 3.7.5 | user (via ultralytics) |
| `scipy` | 1.10.1 | user (via ultralytics) |
| `psutil` | 7.2.2 | user (via ultralytics) |
| `polars` | 1.8.2 | user (via ultralytics) |
| `ultralytics-thop` | 2.0.18 | user |
"user" = `~/.local/lib/python3.8/site-packages`. Most Marcus deps live there rather than in the conda env's site-packages because the stack was installed with `pip install --user` against the JetPack system Python, whose system libraries the Jetson wheels expect.
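The table can be re-audited from any env with a stdlib-only loop (Python 3.8 ships `importlib.metadata` in the standard library; the helper name here is mine):

```python
from importlib import metadata

# Return the installed distribution's version string, or None when absent.
def dist_version(name):
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Distribution names (e.g. "opencv-python"), not import names (e.g. "cv2").
for pkg in ["numpy", "opencv-python", "pyzmq", "ultralytics", "torch"]:
    print(f"{pkg:>15}: {dist_version(pkg) or 'NOT INSTALLED'}")
```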
---
## 8. Marcus project modules — import status
All 18 project modules import cleanly from the `marcus` env at `/home/unitree/Marcus`:
```
OK Core.config_loader
OK Core.env_loader
OK Vision.marcus_yolo
OK Vision.marcus_imgsearch
OK API.llava_api
OK API.yolo_api
OK API.camera_api
OK API.zmq_api
OK API.imgsearch_api
OK API.odometry_api
OK API.memory_api
OK API.arm_api
OK Navigation.goal_nav
OK Navigation.patrol
OK Navigation.marcus_odometry
OK Brain.marcus_brain
OK Brain.marcus_memory
OK Autonomous.marcus_autonomous
```
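The list above was produced with a loop along these lines (a reconstruction, not the exact script used on the Jetson):

```python
import importlib

# Try importing each module by dotted name; report OK / FAIL with the error.
def check_imports(names):
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
            print(f"OK   {name}")
        except Exception as exc:  # ImportError, but also init-time crashes
            results[name] = exc
            print(f"FAIL {name}: {exc}")
    return results

# Demo with stdlib modules; on the Jetson, pass the 18 dotted Marcus module
# names from the block above (with cwd = ~/Marcus on sys.path).
check_imports(["json", "zlib"])
```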
---
## 9. Installation recipe (reproducing this environment)
Run these steps on a fresh `marcus` conda env, in order. They reproduce the exact stack above.
### 9.1 Fix the Jetson clock if needed
Jetsons have no RTC battery; after power cycles the clock may reset to 1970 and break TLS (cert validation fails with "issued certificate not yet valid"). `systemd-timesyncd` may refuse NTP — fall back to manual date:
```bash
sudo systemctl restart systemd-timesyncd
sudo timedatectl set-ntp true # may fail on Jetson
# fallback:
sudo date -s "YYYY-MM-DD HH:MM:SS"
sudo hwclock --systohc
```
### 9.2 Remove CPU-only torch and install the NVIDIA Jetson wheel
```bash
conda activate marcus
pip uninstall -y torch torchvision torchaudio
cd ~
wget https://developer.download.nvidia.com/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
# (JP 5.1.1 uses the same nv23.06 wheel; v511 returns 404 on this build)
pip install ~/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
```
Verify before proceeding:
```bash
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Expect: 2.1.0a0+41361538.nv23.06 True Orin
```
### 9.3 Build torchvision 0.16.1 from source (matches torch 2.1.0)
```bash
sudo apt install -y libjpeg-dev zlib1g-dev libpython3-dev \
libavcodec-dev libavformat-dev libswscale-dev
cd ~
git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision
cd torchvision
export BUILD_VERSION=0.16.1
python3 setup.py install --user
```
Build takes ~15–25 min on Orin NX. Verify:
```bash
python3 -c "
import torch, torchvision
x = torch.rand(5, 4).cuda(); s = torch.rand(5).cuda()
print(torchvision.__version__, torchvision.ops.nms(x, s, 0.5).device)
"
# Expect: 0.16.1 cuda:0
```
### 9.4 Ollama server + model
```bash
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list # confirm qwen2.5vl:3b present
ollama pull qwen2.5vl:3b # if missing (~3 GB)
ollama run qwen2.5vl:3b "hi" # warm model into VRAM
ollama ps # PROCESSOR must say "100% GPU"
```
### 9.5 Other deps
Already present via `pip install --user` from earlier setup — see section 7 for versions. No action needed unless reinstalling from scratch.
---
## 10. Marcus launch sequence
Full terminal-mode bring-up:
```bash
# Terminal 1 — Holosoma locomotion policy (runs in hsinference env, not marcus)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
src/holosoma_inference/holosoma_inference/run_policy.py \
inference:g1-29dof-loco \
--task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
--task.velocity-input zmq --task.state-input zmq --task.interface eth0
# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
# Terminal 3 — Marcus brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```
Expected Marcus YOLO init line:
```
[YOLO] Model loaded ✅ | device: cuda (Orin) | FP16 | 19 tracked classes
```
If instead you get `RuntimeError: [YOLO] CUDA not available — torch.cuda.is_available() == False`, the torch install is wrong or was overwritten by a `pip install torch` somewhere — redo section 9.2.
---
## 11. Verification commands (copy/paste)
Full-stack version check:
```bash
cat /etc/nv_tegra_release
dpkg -l | grep -E "nvidia-l4t-core|cuda-runtime|libcudnn8" | awk '{print $2, $3}'
nvcc --version | tail -n2
python3 --version
python3 -c "import torch, torchvision; print('torch', torch.__version__, '| cuda', torch.cuda.is_available(), '| cudnn', torch.backends.cudnn.version(), '| gpu', torch.cuda.get_device_name(0), '| tv', torchvision.__version__)"
python3 -c "import ultralytics; print('ultralytics', ultralytics.__version__)"
ollama --version
curl -s http://localhost:11434/api/version
ollama list
ollama ps
```
YOLO warmup + steady-state FPS (the gold-standard GPU smoke test):
```bash
cd ~/Marcus
python3 - <<'EOF'
import sys, os, time, threading
sys.path.insert(0, os.getcwd())
os.environ.setdefault("PROJECT_BASE", "/home/unitree")
os.environ.setdefault("PROJECT_NAME", "Marcus")
import numpy as np
import Vision.marcus_yolo as my
from Vision.marcus_yolo import (
start_yolo, yolo_fps, yolo_is_running, _resolve_device, YOLO_DEVICE
)
dev, half = _resolve_device(YOLO_DEVICE)
print(f"[resolve] device={dev!r} half={half}")
raw, lock = [None], threading.Lock()
assert start_yolo(raw_frame_ref=raw, frame_lock=lock)
raw[0] = np.random.randint(0, 255, (240, 424, 3), dtype=np.uint8)
for i in range(15):
time.sleep(1)
print(f" t={i+1:2d}s fps={yolo_fps():.1f}")
time.sleep(5)
print(f"[final] fps={yolo_fps():.1f}")
my._yolo_running[0] = False
time.sleep(0.3)
EOF
```
GPU live telemetry while Marcus runs:
```bash
tegrastats --interval 500 | grep -oE "GR3D_FREQ [0-9]+%"
```
`nvidia-smi` is absent on Jetson — `tegrastats` is the equivalent.
---
## 12. Known quirks
1. **No RTC battery** — clock resets to 1970 on every full power cycle. Fix before any `wget`/`pip install` that hits HTTPS. See 9.1.
2. **`ollama` python lib has no `__version__`** — use `pip show ollama` instead of `ollama.__version__`.
3. **`nvidia-smi` not available** — normal on Jetson. Use `tegrastats` and `torch.cuda.*` APIs.
4. **Ollama server "could not connect" warning** on first `ollama list`/`ollama ps` just means the server isn't running yet. Start it with `ollama serve &` before Marcus.
5. **YOLO first inference ~45 s** — cuDNN kernel autotune + FP16 conversion on cold start. The first user command after `python3 run_marcus.py` will feel slow; subsequent commands are steady-state. A YOLO warmup pass in `init_brain()` would hide this — open item.
6. **Holosoma and Marcus share ZMQ port 5556**: `run_marcus.py` (terminal) and `Server/marcus_server.py` (websocket) cannot run simultaneously. Pick one.
7. **NVIDIA torch wheel is at `/jp/v512/`** on developer.download.nvidia.com even though this host is JetPack 5.1.1. The `nv23.06` wheel is shared across JP 5.1.x (same CUDA 11.4 + cuDNN 8.6 runtime). `/jp/v511/pytorch/` 404s — use `v512`.
8. **PyPI torch is CPU-only on aarch64** — any `pip install torch` with no wheel argument will silently replace the NVIDIA build with a CPU wheel and break Marcus startup (Marcus is now hard-configured to refuse CPU). If that happens, redo 9.2.
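A cheap guard against quirk 8, assuming (as the versions in this document show) that NVIDIA's Jetson builds always carry an `nv` local-version tag like `2.1.0a0+41361538.nv23.06`, while PyPI CPU wheels (e.g. `2.4.1`) never do:

```python
# Heuristic check that the installed torch is the NVIDIA Jetson build:
# NVIDIA wheels carry an "nv" tag in the local version segment after "+".
def looks_like_jetson_torch(version):
    _, _, local = version.partition("+")
    return "nv" in local

print(looks_like_jetson_torch("2.1.0a0+41361538.nv23.06"))  # True
print(looks_like_jetson_torch("2.4.1"))  # PyPI CPU wheel -> False

# On the robot: import torch; assert looks_like_jetson_torch(torch.__version__)
```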
---
## 13. GPU-only policy (enforced in code)
As of 2026-04-12, `Vision/marcus_yolo.py::_resolve_device` raises `RuntimeError` instead of falling back to CPU when any of:
- `Config/config_Vision.json` has `yolo_device: "cpu"`
- `torch` is not installed
- `torch.cuda.is_available()` returns False
`API/yolo_api.py::init_yolo` was also updated to **propagate** that `RuntimeError` (previously it caught `Exception` and silently disabled YOLO, leaving Marcus running blind). The brain crashes at `init_brain()` with a clear message if the GPU is unreachable — preferred over silent degradation on a safety-sensitive robot.
Config file (`Config/config_Vision.json`):
```json
{
"yolo_model_path": "Models/yolov8m.pt",
"yolo_confidence": 0.45,
"yolo_iou": 0.45,
"yolo_device": "cuda",
"yolo_half": true,
"yolo_img_size": 320,
"tracked_classes": [ ... ],
"ppe_violation_classes": [ "no-helmet", "no_helmet", "no-vest", "no_vest" ]
}
```
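The enforced behaviour can be sketched in isolation — a simplified re-implementation for illustration only; the real `_resolve_device` lives in `Vision/marcus_yolo.py` and queries `torch` directly:

```python
# Simplified model of the GPU-only device resolution: return (device, half)
# or raise RuntimeError. cuda_available stands in for torch.cuda.is_available().
def resolve_device(configured, half, cuda_available):
    if configured == "cpu":
        raise RuntimeError("[YOLO] CPU device is forbidden by the GPU-only policy")
    if not cuda_available:
        raise RuntimeError(
            "[YOLO] CUDA not available — torch.cuda.is_available() == False")
    return configured, half

print(resolve_device("cuda", True, cuda_available=True))  # ('cuda', True)
```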
---
## 14. Change log
| Date | Change |
|---|---|
| 2026-04-12 | Initial environment.md — full stack captured, GPU bring-up verified end to end. Steady-state YOLOv8m FPS on Orin NX measured at 21.9. Ollama Qwen2.5-VL verified at 100% GPU. |
| 2026-04-12 | `Vision/marcus_yolo.py` rewired to load `config_Vision.json`, added `_resolve_device()` with hard-fail on missing CUDA (GPU-only policy). `API/yolo_api.py` updated to propagate `RuntimeError`. `Config/config_Vision.json` set `yolo_device=cuda`, `yolo_half=true`. |
| 2026-04-12 | Installed NVIDIA Jetson torch `2.1.0a0+41361538.nv23.06` (replacing CPU-only PyPI `2.4.1`) + built torchvision `0.16.1` from source against it. Verified `nms device = cuda:0`. |