Marcus/Doc/environment.md
2026-04-12 18:50:22 +04:00

# Marcus — Environment & Version Reference
**Project**: Marcus | YS Lootah Technology
**Hardware**: Unitree G1 EDU Humanoid (29 DOF) + Jetson Orin NX 16 GB
**Deployment host**: `unitree@192.168.123.164` (hostname `ubuntu`)
**Conda env**: `marcus`
**Captured**: 2026-04-12
This document is the canonical record of the verified GPU-accelerated software stack running on the Jetson Orin NX. It covers system software, Python environment, Marcus runtime dependencies, installation recipe, verification commands, and known quirks. Pair it with `architecture.md` (what the code does) and `controlling.md` (how to drive it).
---
## 1. Hardware
| Item | Value |
|---|---|
| Robot | Unitree G1 EDU humanoid, 29 DoF |
| Compute | Jetson Orin NX 16 GB (integrated Ampere GPU, 8.7 capability, tensor cores) |
| Camera | Intel RealSense D435 (424x240 @ 15 fps, BGR8) |
| LiDAR | (optional) loaded via `API/lidar_api.py` + `Lidar/SLAM_worker.py` |
| Network | `eth0` 192.168.123.164 (Holosoma + Marcus), `wlan0` 10.255.254.86 |
---
## 2. System software (Jetson)
| Layer | Version | Source of truth |
|---|---|---|
| Kernel | `Linux 5.10.104-tegra aarch64` | `uname -a` |
| OS | Ubuntu 20.04.6 LTS | `/etc/os-release` |
| L4T | R35.3.1 (2023-03-19 build, GCID 32827747) | `/etc/nv_tegra_release` |
| JetPack | **5.1.1** (derived from L4T R35.3.1) | `nvidia-l4t-core 35.3.1-20230319081403` |
| CUDA runtime | `11.4.19-1` | `dpkg -l cuda-runtime-11-4` |
| CUDA toolkit (nvcc) | `11.4.315` (Built 2022-10-23) | `nvcc --version` |
| cuDNN | `8.6.0.166-1+cuda11.4` | `dpkg -l libcudnn8` |
| CUDA install paths | `/usr/local/cuda`, `/usr/local/cuda-11`, `/usr/local/cuda-11.4` | `ls /usr/local` |
---
## 3. Python environment
| Layer | Version |
|---|---|
| Conda | `25.11.1` |
| Active env | `marcus` at `/home/unitree/miniconda3/envs/marcus` |
| Python | `3.8.20` (Jetson stock Python for JetPack 5) |
| pip | `25.0.1` (user site `~/.local/lib/python3.8/site-packages`) |
| `which python3` | `/home/unitree/miniconda3/envs/marcus/bin/python3` |
Other conda envs on the machine (not used by Marcus): `base`, `gemini`, `gmr`, `marcus_tts`, `saqr`, `teleimager`, `tv`, `twist2`, `unitree_lerobot`, plus the Holosoma-side `hsinference` under `~/.holosoma_deps/miniconda3`.
Note: Python 3.8 is EOL (Oct 2024). It is retained because JetPack 5 ships it and NVIDIA's pre-built Jetson torch wheels for this generation target cp38. Upgrading requires either JetPack 6 or a from-source torch build.
---
## 4. PyTorch stack (GPU-critical)
**The torch install must be the NVIDIA Jetson wheel, not the PyPI wheel.** PyPI `torch` for aarch64 is CPU-only; only NVIDIA's builds expose CUDA on Jetson.
| Item | Expected | Verified |
|---|---|---|
| `torch.__version__` | `2.1.0a0+41361538.nv23.06` | yes |
| `torch.cuda.is_available()` | `True` | yes |
| `torch.version.cuda` | `11.4` | yes |
| `torch.backends.cudnn.version()` | `8600` (= cuDNN 8.6.0) | yes |
| `torch.cuda.get_device_name(0)` | `Orin` | yes |
| `torch.cuda.get_device_capability(0)` | `(8, 7)` (Ampere + tensor cores) | yes |
| `torchvision.__version__` | `0.16.1` (built from source against the Jetson torch) | yes |
| `torchvision.ops.nms(...).device` | `cuda:0` | yes |
Capability 8.7 gives us FP16 tensor cores — the GPU-path FP16 kwarg in `Vision/marcus_yolo.py` is meaningful here, not placebo.
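A capability gate along these lines (a hypothetical sketch, not the actual `Vision/marcus_yolo.py` logic) is the kind of check that makes the FP16 kwarg meaningful: FP16 tensor cores arrived with Volta (capability 7.0), and Orin reports (8, 7).

```python
# Hypothetical sketch: decide whether FP16 inference is worthwhile from the
# CUDA compute capability tuple, as reported by
# torch.cuda.get_device_capability(0). FP16 tensor cores exist from Volta
# (7, 0) upward; the Orin NX reports (8, 7).
def fp16_worthwhile(capability):
    """Return True when the GPU has FP16 tensor cores."""
    return tuple(capability) >= (7, 0)

print(fp16_worthwhile((8, 7)))  # Orin NX -> True
print(fp16_worthwhile((5, 3)))  # older Maxwell-class Jetson -> False
```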
---
## 5. Ultralytics / YOLO runtime
| Item | Value |
|---|---|
| `ultralytics` | `8.4.21` |
| Weights | `Models/yolov8m.pt` (~50 MB, auto-fetched if missing) |
| `yolo checks` GPU line | `GPU: Orin, 15389MiB`, `CUDA: 11.4` |
| Marcus config device | `cuda` (hard-required — no CPU fallback) |
| Marcus config half | `true` (FP16) |
| Marcus config imgsz | `320` |
| First inference warmup | ~45 s (cuDNN kernel autotune) |
| **Steady-state FPS on Orin** | **~21.9 fps** at imgsz=320 FP16 |
The 21.9 fps figure is measured via the smoke test in section 11 below. It comfortably exceeds the 15 fps camera stream, so YOLO is no longer the pipeline bottleneck.
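The table's settings map onto Ultralytics `predict()` keyword arguments roughly as follows. This is a hedged sketch — the real wiring lives in `Vision/marcus_yolo.py`, and the helper name here is invented:

```python
# Hypothetical helper mapping Marcus's vision config onto Ultralytics
# predict() keyword arguments. Keys mirror Config/config_Vision.json.
def predict_kwargs(cfg):
    return {
        "device": cfg["yolo_device"],   # "cuda" (hard-required, no CPU fallback)
        "half": cfg["yolo_half"],       # FP16 on the Orin tensor cores
        "imgsz": cfg["yolo_img_size"],  # 320 -> ~21.9 fps steady state
        "conf": cfg["yolo_confidence"],
        "iou": cfg["yolo_iou"],
        "verbose": False,
    }

cfg = {"yolo_device": "cuda", "yolo_half": True, "yolo_img_size": 320,
       "yolo_confidence": 0.45, "yolo_iou": 0.45}
kw = predict_kwargs(cfg)
# Used roughly as: model = YOLO("Models/yolov8m.pt"); model.predict(frame, **kw)
```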
---
## 6. Ollama / vision-language model
| Item | Value |
|---|---|
| Ollama CLI client | `0.20.0` |
| Ollama server | `0.20.0` (`curl http://localhost:11434/api/version`) |
| Python `ollama` package | `0.6.1` (no `__version__` attribute — use `pip show ollama`) |
| Models installed | `qwen2.5vl:3b` (3.2 GB), `llava:7b` (4.7 GB) |
| Marcus-configured model | `qwen2.5vl:3b` (`Config/config_Brain.json`) |
| Resident VRAM when loaded | ~11 GB (includes KV cache + vision projector) |
| Processor placement | **`100% GPU`** per `ollama ps` |
Headroom note: with Qwen2.5-VL resident (~11 GB) + YOLO (~0.5 GB) + camera buffers + the ZMQ bridge, you have ~4 GB free on the 16 GB Orin NX. Comfortable but not unlimited — if image-search (which sends two images to Qwen at once) ever OOMs, enable quantized KV cache via `OLLAMA_KV_CACHE_TYPE=q8_0`.
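For reference, a minimal sketch of the request shape that ultimately reaches the server's `/api/chat` endpoint (payload construction only; the helper name is mine, and actually sending it requires the running server from section 9.4):

```python
import base64
import json

# Hypothetical helper: build an Ollama /api/chat payload carrying one image.
# Images travel in the message's "images" list as base64-encoded bytes.
def vlm_chat_payload(prompt, jpeg_bytes, model="qwen2.5vl:3b"):
    return {
        "model": model,
        "stream": False,
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        }],
    }

payload = vlm_chat_payload("What do you see?", b"\xff\xd8fake-jpeg-bytes")
body = json.dumps(payload)  # POST this to http://localhost:11434/api/chat
```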
---
## 7. Marcus runtime Python dependencies
Captured from `importlib` on 2026-04-12, `marcus` env on the Jetson.
| Module | Version | Site |
|---|---|---|
| `numpy` | 1.24.4 | user |
| `cv2` (opencv-python) | 4.13.0 | user |
| `PIL` (Pillow) | 10.4.0 | user |
| `yaml` (PyYAML) | 6.0.3 | user |
| `zmq` (pyzmq) | 27.1.0 | user |
| `websockets` | 13.1 | conda env |
| `pyrealsense2` | 2.55.1.6486 | user |
| `dotenv` | (no `__version__`) | user |
| `ollama` (python client) | 0.6.1 | user |
| `requests` | 2.32.4 | user |
| `ultralytics` | 8.4.21 | user |
| `torch` | 2.1.0a0+41361538.nv23.06 | user |
| `torchvision` | 0.16.1 (egg) | user |
| `matplotlib` | 3.7.5 | user (via ultralytics) |
| `scipy` | 1.10.1 | user (via ultralytics) |
| `psutil` | 7.2.2 | user (via ultralytics) |
| `polars` | 1.8.2 | user (via ultralytics) |
| `ultralytics-thop` | 2.0.18 | user |
"user" = `~/.local/lib/python3.8/site-packages`. Most Marcus deps live there rather than in the conda env's site-packages because the stack was installed with `pip install --user` against the JetPack system Python, whose system libraries the Jetson wheels expect.
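The table can be re-audited from any env with a stdlib-only loop (Python 3.8 ships `importlib.metadata` in the standard library; the helper name here is mine):

```python
from importlib import metadata

# Return the installed distribution's version string, or None when absent.
def dist_version(name):
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Distribution names (e.g. "opencv-python"), not import names (e.g. "cv2").
for pkg in ["numpy", "opencv-python", "pyzmq", "ultralytics", "torch"]:
    print(f"{pkg:>15}: {dist_version(pkg) or 'NOT INSTALLED'}")
```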
---
## 8. Marcus project modules — import status
All 18 project modules import cleanly from the `marcus` env at `/home/unitree/Marcus`:
```
OK Core.config_loader
OK Core.env_loader
OK Vision.marcus_yolo
OK Vision.marcus_imgsearch
OK API.llava_api
OK API.yolo_api
OK API.camera_api
OK API.zmq_api
OK API.imgsearch_api
OK API.odometry_api
OK API.memory_api
OK API.arm_api
OK Navigation.goal_nav
OK Navigation.patrol
OK Navigation.marcus_odometry
OK Brain.marcus_brain
OK Brain.marcus_memory
OK Autonomous.marcus_autonomous
```
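The list above was produced with a loop along these lines (a reconstruction, not the exact script used on the Jetson):

```python
import importlib

# Try importing each module by dotted name; report OK / FAIL with the error.
def check_imports(names):
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
            print(f"OK   {name}")
        except Exception as exc:  # ImportError, but also init-time crashes
            results[name] = exc
            print(f"FAIL {name}: {exc}")
    return results

# Demo with stdlib modules; on the Jetson, pass the 18 dotted Marcus module
# names from the block above (with cwd = ~/Marcus on sys.path).
check_imports(["json", "zlib"])
```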
---
## 9. Installation recipe (reproducing this environment)
Run these steps on a fresh `marcus` conda env, in order. They reproduce the exact stack above.
### 9.1 Fix the Jetson clock if needed
Jetsons have no RTC battery; after power cycles the clock may reset to 1970 and break TLS (cert validation fails with "issued certificate not yet valid"). `systemd-timesyncd` may refuse NTP — fall back to manual date:
```bash
sudo systemctl restart systemd-timesyncd
sudo timedatectl set-ntp true # may fail on Jetson
# fallback:
sudo date -s "YYYY-MM-DD HH:MM:SS"
sudo hwclock --systohc
```
### 9.2 Remove CPU-only torch and install the NVIDIA Jetson wheel
```bash
conda activate marcus
pip uninstall -y torch torchvision torchaudio
cd ~
wget https://developer.download.nvidia.com/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
# (JP 5.1.1 uses the same nv23.06 wheel; v511 returns 404 on this build)
pip install ~/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
```
Verify before proceeding:
```bash
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Expect: 2.1.0a0+41361538.nv23.06 True Orin
```
### 9.3 Build torchvision 0.16.1 from source (matches torch 2.1.0)
```bash
sudo apt install -y libjpeg-dev zlib1g-dev libpython3-dev \
libavcodec-dev libavformat-dev libswscale-dev
cd ~
git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision
cd torchvision
export BUILD_VERSION=0.16.1
python3 setup.py install --user
```
Build takes ~15–25 min on Orin NX. Verify:
```bash
python3 -c "
import torch, torchvision
x = torch.rand(5, 4).cuda(); s = torch.rand(5).cuda()
print(torchvision.__version__, torchvision.ops.nms(x, s, 0.5).device)
"
# Expect: 0.16.1 cuda:0
```
### 9.4 Ollama server + model
```bash
ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list # confirm qwen2.5vl:3b present
ollama pull qwen2.5vl:3b # if missing (~3 GB)
ollama run qwen2.5vl:3b "hi" # warm model into VRAM
ollama ps # PROCESSOR must say "100% GPU"
```
### 9.5 Other deps
Already present via `pip install --user` from earlier setup — see section 7 for versions. No action needed unless reinstalling from scratch.
---
## 10. Marcus launch sequence
Full terminal-mode bring-up:
```bash
# Terminal 1 — Holosoma locomotion policy (runs in hsinference env, not marcus)
source ~/.holosoma_deps/miniconda3/bin/activate hsinference
cd ~/holosoma
~/.holosoma_deps/miniconda3/envs/hsinference/bin/python3 \
src/holosoma_inference/holosoma_inference/run_policy.py \
inference:g1-29dof-loco \
--task.model-path src/holosoma_inference/holosoma_inference/models/loco/g1_29dof/fastsac_g1_29dof.onnx \
--task.velocity-input zmq --task.state-input zmq --task.interface eth0
# Terminal 2 — Ollama server (leave running)
ollama serve > /tmp/ollama.log 2>&1 &
# Terminal 3 — Marcus brain
conda activate marcus
cd ~/Marcus
python3 run_marcus.py
```
Expected Marcus YOLO init line:
```
[YOLO] Model loaded ✅ | device: cuda (Orin) | FP16 | 19 tracked classes
```
If instead you get `RuntimeError: [YOLO] CUDA not available — torch.cuda.is_available() == False`, the torch install is wrong or was overwritten by a `pip install torch` somewhere — redo section 9.2.
---
## 11. Verification commands (copy/paste)
Full-stack version check:
```bash
cat /etc/nv_tegra_release
dpkg -l | grep -E "nvidia-l4t-core|cuda-runtime|libcudnn8" | awk '{print $2, $3}'
nvcc --version | tail -n2
python3 --version
python3 -c "import torch, torchvision; print('torch', torch.__version__, '| cuda', torch.cuda.is_available(), '| cudnn', torch.backends.cudnn.version(), '| gpu', torch.cuda.get_device_name(0), '| tv', torchvision.__version__)"
python3 -c "import ultralytics; print('ultralytics', ultralytics.__version__)"
ollama --version
curl -s http://localhost:11434/api/version
ollama list
ollama ps
```
YOLO warmup + steady-state FPS (the gold-standard GPU smoke test):
```bash
cd ~/Marcus
python3 - <<'EOF'
import sys, os, time, threading
sys.path.insert(0, os.getcwd())
os.environ.setdefault("PROJECT_BASE", "/home/unitree")
os.environ.setdefault("PROJECT_NAME", "Marcus")
import numpy as np
import Vision.marcus_yolo as my
from Vision.marcus_yolo import (
start_yolo, yolo_fps, yolo_is_running, _resolve_device, YOLO_DEVICE
)
dev, half = _resolve_device(YOLO_DEVICE)
print(f"[resolve] device={dev!r} half={half}")
raw, lock = [None], threading.Lock()
assert start_yolo(raw_frame_ref=raw, frame_lock=lock)
raw[0] = np.random.randint(0, 255, (240, 424, 3), dtype=np.uint8)
for i in range(15):
time.sleep(1)
print(f" t={i+1:2d}s fps={yolo_fps():.1f}")
time.sleep(5)
print(f"[final] fps={yolo_fps():.1f}")
my._yolo_running[0] = False
time.sleep(0.3)
EOF
```
GPU live telemetry while Marcus runs:
```bash
tegrastats --interval 500 | grep -oE "GR3D_FREQ [0-9]+%"
```
`nvidia-smi` is absent on Jetson — `tegrastats` is the equivalent.
---
## 12. Known quirks
1. **No RTC battery** — clock resets to 1970 on every full power cycle. Fix before any `wget`/`pip install` that hits HTTPS. See 9.1.
2. **`ollama` python lib has no `__version__`** — use `pip show ollama` instead of `ollama.__version__`.
3. **`nvidia-smi` not available** — normal on Jetson. Use `tegrastats` and `torch.cuda.*` APIs.
4. **Ollama server "could not connect" warning** on first `ollama list`/`ollama ps` just means the server isn't running yet. Start it with `ollama serve &` before Marcus.
5. **YOLO first inference ~45 s** — cuDNN kernel autotune + FP16 conversion on cold start. The first user command after `python3 run_marcus.py` will feel slow; subsequent commands are steady-state. A YOLO warmup pass in `init_brain()` would hide this — open item.
6. **Holosoma and Marcus share ZMQ port 5556**: `run_marcus.py` (terminal) and `Server/marcus_server.py` (websocket) cannot run simultaneously. Pick one.
7. **NVIDIA torch wheel is at `/jp/v512/`** on developer.download.nvidia.com even though this host is JetPack 5.1.1. The `nv23.06` wheel is shared across JP 5.1.x (same CUDA 11.4 + cuDNN 8.6 runtime). `/jp/v511/pytorch/` 404s — use `v512`.
8. **PyPI torch is CPU-only on aarch64** — any `pip install torch` with no wheel argument will silently replace the NVIDIA build with a CPU wheel and break Marcus startup (Marcus is now hard-configured to refuse CPU). If that happens, redo 9.2.
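A cheap guard against quirk 8, assuming (as the versions in this document show) that NVIDIA's Jetson builds always carry an `nv` local-version tag like `2.1.0a0+41361538.nv23.06`, while PyPI CPU wheels (e.g. `2.4.1`) never do:

```python
# Heuristic check that the installed torch is the NVIDIA Jetson build:
# NVIDIA wheels carry an "nv" tag in the local version segment after "+".
def looks_like_jetson_torch(version):
    _, _, local = version.partition("+")
    return "nv" in local

print(looks_like_jetson_torch("2.1.0a0+41361538.nv23.06"))  # True
print(looks_like_jetson_torch("2.4.1"))  # PyPI CPU wheel -> False

# On the robot: import torch; assert looks_like_jetson_torch(torch.__version__)
```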
---
## 13. GPU-only policy (enforced in code)
As of 2026-04-12, `Vision/marcus_yolo.py::_resolve_device` raises `RuntimeError` instead of falling back to CPU when any of:
- `Config/config_Vision.json` has `yolo_device: "cpu"`
- `torch` is not installed
- `torch.cuda.is_available()` returns False
`API/yolo_api.py::init_yolo` was also updated to **propagate** that `RuntimeError` (previously it caught `Exception` and silently disabled YOLO, leaving Marcus running blind). The brain crashes at `init_brain()` with a clear message if the GPU is unreachable — preferred over silent degradation on a safety-sensitive robot.
Config file (`Config/config_Vision.json`):
```json
{
"yolo_model_path": "Models/yolov8m.pt",
"yolo_confidence": 0.45,
"yolo_iou": 0.45,
"yolo_device": "cuda",
"yolo_half": true,
"yolo_img_size": 320,
"tracked_classes": [ ... ],
"ppe_violation_classes": [ "no-helmet", "no_helmet", "no-vest", "no_vest" ]
}
```
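The enforced behaviour can be sketched in isolation — a simplified re-implementation for illustration only; the real `_resolve_device` lives in `Vision/marcus_yolo.py` and queries `torch` directly:

```python
# Simplified model of the GPU-only device resolution: return (device, half)
# or raise RuntimeError. cuda_available stands in for torch.cuda.is_available().
def resolve_device(configured, half, cuda_available):
    if configured == "cpu":
        raise RuntimeError("[YOLO] CPU device is forbidden by the GPU-only policy")
    if not cuda_available:
        raise RuntimeError(
            "[YOLO] CUDA not available — torch.cuda.is_available() == False")
    return configured, half

print(resolve_device("cuda", True, cuda_available=True))  # ('cuda', True)
```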
---
## 14. Change log
| Date | Change |
|---|---|
| 2026-04-12 | Initial environment.md — full stack captured, GPU bring-up verified end to end. Steady-state YOLOv8m FPS on Orin NX measured at 21.9. Ollama Qwen2.5-VL verified at 100% GPU. |
| 2026-04-12 | `Vision/marcus_yolo.py` rewired to load `config_Vision.json`, added `_resolve_device()` with hard-fail on missing CUDA (GPU-only policy). `API/yolo_api.py` updated to propagate `RuntimeError`. `Config/config_Vision.json` set `yolo_device=cuda`, `yolo_half=true`. |
| 2026-04-12 | Installed NVIDIA Jetson torch `2.1.0a0+41361538.nv23.06` (replacing CPU-only PyPI `2.4.1`) + built torchvision `0.16.1` from source against it. Verified `nms device = cuda:0`. |