
Marcus — Full API & Developer Reference

Project: Marcus | YS Lootah Technology | Jetson Orin NX + G1 EDU
Scripts: ~/Models_marcus/marcus_llava.py + ~/Models_marcus/marcus_yolo.py
Updated: April 4, 2026


Table of Contents

  1. Configuration Variables
  2. ZMQ — Holosoma Communication
  3. Camera Functions
  4. YOLO Vision Module
  5. LLaVA AI Functions
  6. Arm SDK
  7. Movement Functions
  8. Prompt Engineering
  9. Goal Navigation
  10. Autonomous Patrol
  11. Main Loop
  12. JSON Schema Reference
  13. Environment & Paths
  14. Quick Reference Card

1. Configuration Variables

Defined at the top of marcus_llava.py. Edit here to change global behavior.

| Variable | Default | Description |
| --- | --- | --- |
| ZMQ_HOST | "127.0.0.1" | Holosoma ZMQ host |
| ZMQ_PORT | 5556 | Holosoma ZMQ port |
| ZMQ_YOLO_PORT | 5557 | YOLO ZMQ port (standalone mode) |
| OLLAMA_MODEL | "llava:7b" | LLaVA model via Ollama |
| CAM_WIDTH | 424 | Camera capture width (px) |
| CAM_HEIGHT | 240 | Camera capture height (px) |
| CAM_FPS | 15 | Camera frame rate |
| CAM_QUALITY | 70 | JPEG quality sent to LLaVA |
| STOP_ITERATIONS | 20 | gradual_stop message count |
| STOP_DELAY | 0.05 | Seconds between stop messages |
| STEP_PAUSE | 0.3 | Pause between consecutive action steps (s) |
| ARM_SDK_PATH | /home/unitree/unitree_sdk2_python | Arm SDK path |
| ARM_INTERFACE | "eth0" | Network interface for arm SDK |

Defined at top of marcus_yolo.py:

| Variable | Default | Description |
| --- | --- | --- |
| YOLO_MODEL_PATH | .../Model/yolov8m.pt | YOLO model path |
| YOLO_CONFIDENCE | 0.45 | Minimum detection confidence |
| YOLO_IOU | 0.45 | NMS IOU threshold |
| YOLO_DEVICE | "cpu" | Inference device ("cpu" or "cuda") |
| YOLO_IMG_SIZE | 320 | Inference image size (smaller = faster) |

2. ZMQ — Holosoma Communication

Setup

import zmq, json, time

ctx  = zmq.Context()
sock = ctx.socket(zmq.PUB)
sock.bind("tcp://127.0.0.1:5556")
time.sleep(0.5)   # give the Holosoma subscriber time to connect

send_vel(vx, vy, vyaw)

Send velocity command to Holosoma.

def send_vel(vx: float = 0.0, vy: float = 0.0, vyaw: float = 0.0):
    sock.send_string(json.dumps({"vel": {"vx": vx, "vy": vy, "vyaw": vyaw}}))

| Parameter | Unit | Safe range | Effect |
| --- | --- | --- | --- |
| vx | m/s | -0.2 to 0.4 | Forward (+) / Backward (-) |
| vy | m/s | -0.2 to 0.2 | Lateral |
| vyaw | rad/s | -0.3 to 0.3 | Turn left (+) / right (-) |

send_vel(vx=0.3)        # walk forward
send_vel(vx=-0.2)       # walk backward
send_vel(vyaw=0.3)      # turn left
send_vel(vyaw=-0.3)     # turn right
send_vel(0, 0, 0)       # zero velocity (use gradual_stop() instead)

gradual_stop()

Smooth deceleration to zero over ~1 second.

def gradual_stop():
    for _ in range(STOP_ITERATIONS):   # 20 iterations
        send_vel(0.0, 0.0, 0.0)
        time.sleep(STOP_DELAY)         # 0.05s each = 1s total

Always use this instead of a single zero-velocity message. ZMQ PUB/SUB offers no delivery guarantee, so repeating the stop command 20 times ensures at least one message gets through.

send_cmd(cmd)

def send_cmd(cmd: str):
    sock.send_string(json.dumps({"cmd": cmd}))

| Command | Effect |
| --- | --- |
| "start" | Activate policy |
| "walk" | Switch to walking mode |
| "stand" | Return to standing |
| "stop" | Deactivate (only after gradual_stop) |

Startup sequence:

send_cmd("start"); time.sleep(0.5)
send_cmd("walk");  time.sleep(0.5)
# Now ready for velocity commands
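
Putting the pieces together, a minimal end-to-end sketch using only the calls above (durations are illustrative, and the sketch assumes a single send_vel() command persists until the next one):

send_cmd("start"); time.sleep(0.5)   # activate policy
send_cmd("walk");  time.sleep(0.5)   # switch to walking mode

send_vel(vx=0.3)                     # walk forward at 0.3 m/s
time.sleep(2.0)                      # keep walking for ~2 s

gradual_stop()                       # decelerate smoothly to zero
send_cmd("stop")                     # deactivate only after gradual_stop()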

3. Camera Functions

Architecture

Two consumers share one camera feed:

  • latest_frame_b64[0] — base64 JPEG for LLaVA
  • _raw_frame[0] — raw BGR numpy array for YOLO

Both protected by separate locks (camera_lock, _raw_lock).

camera_loop()

Background thread — auto-reconnects on USB drops.

def camera_loop():
    while camera_alive[0]:
        pipeline = rs.pipeline()
        cfg = rs.config()
        cfg.enable_stream(rs.stream.color, 424, 240, rs.format.bgr8, 15)
        pipeline.start(cfg)
        while camera_alive[0]:
            frames = pipeline.wait_for_frames(timeout_ms=3000)
            frame  = np.asanyarray(...)   # color frame → BGR numpy array
            with _raw_lock:
                _raw_frame[0] = frame.copy()              # → YOLO
            with camera_lock:
                latest_frame_b64[0] = encode_jpeg(frame)  # → LLaVA

get_frame()

Returns latest base64 JPEG for LLaVA.

def get_frame():
    with camera_lock:
        return latest_frame_b64[0]   # None if not ready

Camera specs:

| Property | Value |
| --- | --- |
| Device | RealSense D435I (serial: 243622073459) |
| Capture | 424×240 @ 15fps |
| Format | BGR8 |
| Encoding | JPEG quality 70, base64 UTF-8 |
| Why 424×240 | Reduces USB bandwidth drops during Ollama GPU inference |
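
get_frame() returns None until camera_loop() has produced its first frame, so callers typically poll before the first LLaVA call. A minimal sketch (wait_for_first_frame is a hypothetical helper, not part of the scripts):

import time

def wait_for_first_frame(timeout_s: float = 5.0):
    # Poll get_frame() until the camera thread publishes a frame or we time out.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        img = get_frame()          # None while the camera is still starting up
        if img is not None:
            return img
        time.sleep(0.1)
    return None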

4. YOLO Vision Module

Import (in marcus_llava.py)

from marcus_yolo import (
    start_yolo,
    yolo_sees, yolo_count, yolo_closest,
    yolo_summary, yolo_ppe_violations,
    yolo_person_too_close, yolo_all_classes, yolo_fps,
)

# Start YOLO sharing the camera frame
YOLO_AVAILABLE = start_yolo(raw_frame_ref=_raw_frame, frame_lock=_raw_lock)

start_yolo(raw_frame_ref, frame_lock)

Loads YOLO model and starts inference background thread.

def start_yolo(raw_frame_ref=None, frame_lock=None) -> bool:

Returns True on success, False if model fails to load.

yolo_sees(class_name, min_confidence)

yolo_sees("person")          # True if person detected
yolo_sees("chair", 0.6)      # True with stricter confidence

Returns bool. Instant — no LLaVA call.

yolo_count(class_name)

n = yolo_count("person")     # 0, 1, 2...

yolo_closest(class_name)

Returns the Detection object with the largest bounding box (closest to robot).

p = yolo_closest("person")
if p:
    print(p.position)          # "left" / "center" / "right"
    print(p.distance_estimate) # "very close" / "close" / "medium" / "far"
    print(p.confidence)        # 0.0 to 1.0
    print(p.size_ratio)        # fraction of frame area

yolo_summary()

yolo_summary()
# → "1 person (center, close) | 2 chairs (right, medium) | 1 laptop (left, far)"

yolo_ppe_violations()

violations = yolo_ppe_violations()
# → ["no helmet (left)", "no vest (center)"]
# Requires custom PPE model — returns [] with yolov8m.pt

yolo_person_too_close(threshold)

if yolo_person_too_close(threshold=0.25):
    gradual_stop()   # person covers >25% of frame

yolo_all_classes()

classes = yolo_all_classes()
# → {"person", "chair", "laptop"}

yolo_fps()

print(f"{yolo_fps():.1f}fps")   # e.g. 4.4fps on CPU

Detection class properties

| Property | Type | Description |
| --- | --- | --- |
| class_name | str | e.g. "person" |
| confidence | float | 0.0 to 1.0 |
| position | str | "left" / "center" / "right" |
| distance_estimate | str | "very close" / "close" / "medium" / "far" |
| size_ratio | float | bbox area / frame area |
| cx, cy | int | bbox center coordinates |
| x1, y1, x2, y2 | int | bounding box corners |
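
These properties combine naturally with the movement API in section 7. An illustrative sketch that turns toward the closest detected person (the turn durations are assumptions, not values from the scripts):

p = yolo_closest("person")
if p is None:
    print("no person in view")
elif p.distance_estimate == "very close":
    gradual_stop()                      # too close, hold position
elif p.position == "left":
    execute_action("left", 0.5)         # small turn toward the person
elif p.position == "right":
    execute_action("right", 0.5)
else:
    execute_action("forward", 1.0)      # roughly centered, approach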

5. LLaVA AI Functions

ask(command, img_b64)

Main command processor.

def ask(command: str, img_b64) -> dict:

| Parameter | Description |
| --- | --- |
| command | Natural language command |
| img_b64 | Base64 JPEG camera frame |

Returns dict with actions, arm, speak, abort.

Options:

options={"temperature": 0.0, "num_predict": 200}

Response time: 4-8 s per call (about 14 s on the first call while the model warms up)
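
A typical call pattern, assuming the camera thread is already running:

img = get_frame()                       # latest base64 JPEG, or None
if img is not None:
    d = ask("walk forward and wave", img)
    print(d.get("speak"))               # what Marcus will say out loud
    execute(d)                          # runs the actions, then the arm gesture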

ask_goal(goal, img_b64)

Used in goal navigation loop.

def ask_goal(goal: str, img_b64) -> dict:

Returns: reached (bool), next_move (str), duration (float), speak (str)

ask_patrol(img_b64)

Used in autonomous patrol.

Returns: observation (str), alert (str|None), next_move (str), duration (float)

_call_llava(prompt, img_b64, num_predict)

Internal helper — sends to Ollama API.

r = ollama.chat(
    model="llava:7b",
    messages=[{"role": "user", "content": prompt, "images": [img_b64]}],
    options={"temperature": 0.0, "num_predict": 200}
)

_parse_json(raw)

Extracts JSON from LLaVA response. Strips markdown fences automatically.

raw = '```json\n{"move": "left"}\n```'
d   = _parse_json(raw)   # → {"move": "left"}
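
A minimal sketch of the fence-stripping idea (the real helper may differ in details):

import json, re

def _parse_json_sketch(raw: str) -> dict:
    # Remove a leading ```json / ``` fence and a trailing ``` fence, then parse.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)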

6. Arm SDK

Class: G1ArmActionClient (from unitree_sdk2py.g1.arm.g1_arm_action_client)
Method: ExecuteAction(action_id: int) -> int (returns 0 on success)

do_arm(action)

def do_arm(action):   # action: str name or int ID

Action ID Map

| Friendly name | Action ID | Description |
| --- | --- | --- |
| wave | 26 | High wave |
| raise_right | 23 | Right hand up |
| raise_left | 15 | Both hands up |
| both_up | 15 | Both hands up |
| clap | 17 | Clap hands |
| high_five | 18 | High five |
| hug | 19 | Hug pose |
| heart | 20 | Heart shape |
| right_heart | 21 | Right hand heart |
| reject | 22 | Reject gesture |
| shake_hand | 27 | Shake hand |
| face_wave | 25 | Wave at face level |
| lower | 99 | Release to default |

Notes

  • Runs in background thread — does not block movement
  • Error 7404 = robot was moving during arm command — always gradual_stop() first
  • The ALL_ARM_NAMES set intercepts arm-action words that LLaVA sometimes places in the actions list
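
A minimal sketch of the name-to-ID dispatch that do_arm() implies (ARM_ACTIONS, arm_client, and the threading details are assumptions; arm_client stands for an initialized G1ArmActionClient):

import threading

ARM_ACTIONS = {"wave": 26, "raise_right": 23, "clap": 17, "lower": 99}   # subset of the table above

def do_arm_sketch(action):
    # Accept either a friendly name (looked up in ARM_ACTIONS) or a raw integer ID.
    action_id = ARM_ACTIONS.get(action) if isinstance(action, str) else int(action)
    if action_id is None:
        return
    def _run():
        ret = arm_client.ExecuteAction(action_id)
        if ret == 7404:
            print("Arm error 7404: robot was moving, call gradual_stop() first")
    # Background thread so the gesture does not block walking commands.
    threading.Thread(target=_run, daemon=True).start()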

7. Movement Functions

execute_action(move, duration)

Executes a single movement step.

def execute_action(move: str, duration: float):
  • Intercepts arm names → routes to do_arm()
  • Calls gradual_stop() after each step
  • Waits STEP_PAUSE (0.3s) between steps

_merge_actions(actions)

Merges consecutive same-direction steps into one smooth movement.

# LLaVA returns:
[{"move":"right","duration":1.0}, {"move":"right","duration":1.0},
 {"move":"right","duration":1.0}, {"move":"right","duration":1.0},
 {"move":"right","duration":1.0}]

# _merge_actions produces:
[{"move":"right","duration":5.0}]  # one smooth 5-second rotation
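
A possible implementation of the merging step (illustrative, not necessarily the exact code):

def _merge_actions_sketch(actions):
    merged = []
    for step in actions:
        if merged and merged[-1].get("move") == step.get("move"):
            merged[-1]["duration"] += step.get("duration", 0.0)   # extend the previous step
        else:
            merged.append(dict(step))                             # start a new step
    return merged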

execute(d)

Runs full LLaVA decision.

def execute(d: dict):
    # 1. Check abort
    # 2. _merge_actions() — smooth consecutive steps
    # 3. execute_action() for each step in order
    # 4. do_arm() in background thread

_move_step(move, duration)

Lightweight step for goal/patrol loops — no full gradual_stop() between checks.

def _move_step(move: str, duration: float):
    # send velocity for duration seconds
    # single zero-vel + 0.1s pause — then immediately check YOLO again

MOVE_MAP

MOVE_MAP = {
    "forward":  ( 0.3,  0.0,  0.0),   # vx m/s
    "backward": (-0.2,  0.0,  0.0),
    "left":     ( 0.0,  0.0,  0.3),   # vyaw rad/s
    "right":    ( 0.0,  0.0, -0.3),
}
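
A minimal sketch of how _move_step() can combine MOVE_MAP with the behavior described above (the exact timing logic is an assumption):

def _move_step_sketch(move: str, duration: float):
    vx, vy, vyaw = MOVE_MAP.get(move, (0.0, 0.0, 0.0))
    send_vel(vx, vy, vyaw)          # start moving
    time.sleep(duration)            # hold the velocity for the step duration
    send_vel(0.0, 0.0, 0.0)         # single zero-velocity message (lightweight stop)
    time.sleep(0.1)                 # brief pause, then the caller checks YOLO again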

8. Prompt Engineering

MAIN_PROMPT

Controls LLaVA's response format for all standard commands.

Key rules embedded in prompt:

  • actions is a list — one entry per step
  • arm is never a move value
  • "90 degrees" = 5.0s duration
  • "1 step" = 1.0s duration

To add arm examples or change behavior, edit the examples section of MAIN_PROMPT.
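
For example, under these rules the command "turn right 90 degrees" should produce a response shaped like the standard schema in section 12 (values illustrative):

{
  "actions": [{"move": "right", "duration": 5.0}],
  "arm": null,
  "speak": "Turning right ninety degrees.",
  "abort": null
}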

GOAL_PROMPT

Used inside navigate_to_goal() as LLaVA fallback. Forces {"reached": bool, "next_move": str, "duration": float, "speak": str}.

PATROL_PROMPT

Used inside patrol() for scene assessment. Forces {"observation": str, "alert": str|null, "next_move": str, "duration": float}.


9. Goal Navigation

navigate_to_goal(goal, max_steps)

def navigate_to_goal(goal: str, max_steps: int = 40):

Flow:

  1. Extract YOLO target from goal text (_goal_yolo_target())
  2. Move left 0.4s (lightweight step)
  3. After MIN_STEPS_BEFORE_CHECK (3) steps — check YOLO every step
  4. If yolo_sees(target) → gradual_stop() → print result → return
  5. Falls back to LLaVA if class not in YOLO set

Why a minimum number of steps? It prevents a false stop from a stale camera frame captured before the robot has actually moved.

YOLO class aliases in goals

_GOAL_ALIASES = {
    "guy": "person", "man": "person", "woman": "person",
    "human": "person", "people": "person", "someone": "person",
    "table": "dining table", "sofa": "couch",
}
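
A minimal sketch of the goal-text to YOLO-class extraction that _goal_yolo_target() performs (the candidate class list here is an assumption, not the full COCO set):

def _goal_yolo_target_sketch(goal: str):
    text = goal.lower()
    for alias, cls in _GOAL_ALIASES.items():     # map "guy" → "person", etc.
        if alias in text:
            return cls
    for cls in ("person", "chair", "laptop", "dining table", "couch"):
        if cls in text:
            return cls
    return None        # no YOLO class found → navigate_to_goal falls back to LLaVA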

Examples

navigate_to_goal("stop when you see a person")
navigate_to_goal("keep turning left until you see a guy")
navigate_to_goal("find a chair and stop in front of it")
navigate_to_goal("stop when you are close to the laptop")
navigate_to_goal("stop at the end of the corridor")   # LLaVA fallback

10. Autonomous Patrol

patrol(duration_minutes, alert_callback)

def patrol(duration_minutes: float = 5.0, alert_callback=None):

Each patrol step:

  1. YOLO PPE violations check (instant)
  2. yolo_person_too_close() safety check — pauses if True
  3. LLaVA scene assessment → navigation decision
  4. _move_step() to next position

Custom alert handler:

def my_alert(text: str):
    print(f"SECURITY: {text}")
    # send notification, sound alarm, etc.

patrol(duration_minutes=10.0, alert_callback=my_alert)
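
A sketch of a single patrol iteration following the steps above (timings and thresholds are assumptions):

def patrol_step_sketch(alert_callback=None):
    notify = alert_callback or print
    for v in yolo_ppe_violations():              # 1. instant PPE check
        notify(v)
    if yolo_person_too_close(threshold=0.25):    # 2. safety check: pause if too close
        gradual_stop()
        return
    img = get_frame()
    if img is None:
        return
    d = ask_patrol(img)                          # 3. LLaVA scene assessment
    if d.get("alert"):
        notify(d["alert"])
    _move_step(d.get("next_move", "forward"), d.get("duration", 1.0))   # 4. move on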

11. Main Loop

while True:
    cmd = input("Command: ").strip()

    if cmd.lower() in ("q", "quit", "exit"):
        break

    # YOLO query — never sent to LLaVA
    if any(w in cmd.lower() for w in ("yolo", "are you using yolo", "vision")):
        print(f"  YOLO: {yolo_summary()} | {yolo_fps():.1f}fps")
        continue

    # Goal navigation
    if cmd.lower().startswith("goal:"):
        navigate_to_goal(cmd[5:].strip())
        continue

    # Patrol
    if cmd.lower() == "patrol":
        patrol(duration_minutes=...)
        continue

    # Standard LLaVA command
    img = get_frame()
    d   = ask(cmd, img)
    execute(d)

12. JSON Schema Reference

Standard command response

{
  "actions": [
    {"move": "forward|backward|left|right|stop", "duration": 2.0},
    {"move": "right", "duration": 2.0}
  ],
  "arm": "wave|raise_right|raise_left|clap|high_five|hug|heart|shake_hand|face_wave|null",
  "speak": "What Marcus says out loud",
  "abort": null
}

Goal navigation response

{
  "reached": false,
  "next_move": "left",
  "duration": 0.5,
  "speak": "I see boxes but no person yet"
}

Patrol assessment response

{
  "observation": "I see a person working at a desk",
  "alert": null,
  "next_move": "forward",
  "duration": 1.0
}

Field definitions

| Field | Type | Values |
| --- | --- | --- |
| move | str or null | "forward", "backward", "left", "right", "stop", null |
| duration | float | seconds (max 5.0 per step) |
| arm | str or null | action name or null |
| speak | str | one sentence |
| abort | str or null | reason string or null |
| reached | bool | true only if goal visually confirmed |

13. Environment & Paths

Conda environments

| Env | Python | Location | Purpose |
| --- | --- | --- | --- |
| marcus | 3.8 | /home/unitree/miniconda3/envs/marcus | Marcus brain + YOLO |
| hsinference | 3.10 | ~/.holosoma_deps/miniconda3/envs/hsinference | Holosoma policy |

Always use full path:

/home/unitree/miniconda3/envs/marcus/bin/python3 ~/Models_marcus/marcus_llava.py

Key file paths

| File | Path |
| --- | --- |
| Marcus brain | ~/Models_marcus/marcus_llava.py |
| YOLO module | ~/Models_marcus/marcus_yolo.py |
| YOLO model | ~/Models_marcus/Model/yolov8m.pt |
| Loco model | ~/holosoma/.../models/loco/g1_29dof/fastsac_g1_29dof.onnx |
| LLaVA weights | ~/.ollama/models/ |
| Arm SDK | ~/unitree_sdk2_python/ |

Python imports

import ollama          # LLaVA via Ollama
import zmq             # Holosoma communication
import json, time, base64, threading, sys, io
import numpy as np
import pyrealsense2 as rs
from PIL import Image
from marcus_yolo import start_yolo, yolo_sees, yolo_summary  # YOLO
from unitree_sdk2py.g1.arm.g1_arm_action_client import G1ArmActionClient  # Arm

14. Quick Reference Card

STARTUP:
  Tab 1: source ~/.holosoma_deps/miniconda3/bin/activate hsinference
          cd ~/holosoma && sudo jetson_clocks
          python3 run_policy.py inference:g1-29dof-loco \
            --task.velocity-input zmq --task.state-input zmq --task.interface eth0

  Tab 2: ollama serve &
          /home/unitree/miniconda3/envs/marcus/bin/python3 ~/Models_marcus/marcus_llava.py
          (YOLO starts automatically — no Tab 3 needed)

COMMANDS:
  walk forward · turn right · turn left · move back
  turn right 90 degrees · turn left 3 steps
  what do you see · inspect the office
  wave · raise your right arm · clap · high five
  goal: stop when you see a person
  goal: keep turning left until you see a guy
  patrol
  are you using yolo
  q

VELOCITIES:
  forward  vx=+0.3 m/s    backward vx=-0.2 m/s
  left     vyaw=+0.3       right    vyaw=-0.3

KEY FUNCTIONS:
  send_vel(vx, vy, vyaw)    gradual_stop()       send_cmd(str)
  get_frame() → b64         ask(cmd, img) → dict  execute(dict)
  yolo_sees("person")       yolo_summary()        yolo_closest("person")
  navigate_to_goal(goal)    patrol(minutes)        do_arm("wave")

ARM IDs:
  wave=26  raise_right=23  raise_left=15  clap=17
  high_five=18  hug=19  heart=20  reject=22  shake_hand=27

SAFETY:
  gradual_stop() — always — never cut velocity abruptly
  Never send_cmd("stop") while moving
  camera_alive[0] = False — stops camera thread on exit
  Error 7404 — robot was moving during arm command — stop first

Marcus — YS Lootah Technology | Kassam | April 2026