GoWelcome

An autonomous Unitree Go2 "backyard greeter". The robot dog wanders a yard, spots a person with on-board vision, walks up to them via visual servoing, and plays a friendly greeting (audio + gestures) -- all while steering clear of roads and vehicles.

GoWelcome is mapless, reactive, and NON-ROS2: a single Python process running a fixed-rate state machine on top of either the Unitree app protocol (unitree_webrtc_connect, the default) or the official unitree_sdk2py (DDS). No SLAM, no global planner -- just perceive, decide, move.


Behaviour: the 5-state machine

Arbitration priority each tick (highest first): e-stop / pause → stale-perception halt → AVOID_DANGER (cars/obstacles) → GPS hold (no fix) → finish GREET → BOUNDARY (geofence) → APPROACH → WANDER + dog-play. So cars always win, the dog never chases a person out of the area, and idle play only happens when nothing else is going on.

   +----------+  person (conf>thr)  +-----------+  box fills   +---------+
   |  WANDER  | ------------------> | APPROACH  | -----------> |  GREET  |
   |  + play  | <------ lost ------ |           |  stop_ratio  | wav+wag |
   +----------+                     +-----------+              +---------+
     ^   ^  ^                                                       |
     |   |  +---------------------- cooldown -----------------------+
     |   |
     |   |   back inside   +--------------+  near geofence edge (GPS):
     |   +---------------- |   BOUNDARY   |  home back toward centre
     |                     +--------------+
     |
     |   clear             +--------------+  road / vehicle near (preempts
     +-------------------- | AVOID_DANGER |  any state): back up + pivot away
                           +--------------+
  • WANDER -- cruise forward (wander.forward_speed) with a gentle yaw sweep, scanning the yard, and occasionally performing an idle dog-play action (see below). It also keeps its distance from the road/cars: as pavement or a vehicle appears ahead it veers toward the clear side and slows down (the vision-only containment — see below). Transitions to APPROACH when a person is detected above perception.person_conf.
  • APPROACH -- run the visual servo (below) to centre and close on the person. Falls back to WANDER if the person is missing for loop.person_lost_frames consecutive frames. Transitions to GREET when the person's box fills servo.stop_height_ratio of the frame height.
  • GREET -- settle to a full stop (greet.settle_time), play the greeting wav, run greet.gestures (spaced by greet.gesture_gap), then return to WANDER and ignore people for greet.cooldown seconds.
  • AVOID_DANGER -- preempts every other state (highest safety). Triggered by HSV road coverage above perception.road_trigger_coverage, or by a vehicle box whose height exceeds perception.danger_min_height_ratio of the frame. Backs up (avoid.backup_speed), then pivots away from the road until the view has stayed clear for avoid.clear_frames consecutive frames.
  • BOUNDARY -- optional GPS keep-in-area behaviour (off by default; see below). When enabled and the dog nears the geofence edge it homes back toward the centre, and won't leave the area even to chase a person.

A perception time-out (safety.perception_timeout) in any state stops the robot until fresh frames arrive.
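The arbitration order listed at the top of this section boils down to a single if/elif chain evaluated every tick. The sketch below assumes a hypothetical Inputs summary and arbitrate() function (neither is GoWelcome's actual code); only the state names and the ordering come from this README, and returning None stands for "stop and hold".

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    WANDER = auto()
    APPROACH = auto()
    GREET = auto()
    AVOID_DANGER = auto()
    BOUNDARY = auto()


@dataclass
class Inputs:
    """Hypothetical per-tick summary of what the arbiter looks at."""
    estop_or_paused: bool = False
    perception_age_s: float = 0.0    # seconds since the last fresh frame
    danger: bool = False             # road-coverage or near-vehicle trigger
    geofence_on: bool = False
    gps_fix: bool = True
    greeting_in_progress: bool = False
    near_fence_edge: bool = False
    person_visible: bool = False


def arbitrate(i: Inputs, perception_timeout: float = 0.5) -> Optional[State]:
    """Return the state to run this tick, or None for 'stop and hold'.

    perception_timeout = 0.5 s is a placeholder, not GoWelcome's default.
    """
    if i.estop_or_paused:
        return None                      # e-stop / pause beats everything
    if i.perception_age_s > perception_timeout:
        return None                      # stale perception: halt
    if i.danger:
        return State.AVOID_DANGER        # cars and road always win
    if i.geofence_on and not i.gps_fix:
        return None                      # GPS hold (no_fix_behavior: stop)
    if i.greeting_in_progress:
        return State.GREET               # let a started greeting finish
    if i.geofence_on and i.near_fence_edge:
        return State.BOUNDARY            # never chase a person out of the area
    if i.person_visible:
        return State.APPROACH
    return State.WANDER                  # idle play is scheduled inside WANDER
```

Writing the priorities as one ordered chain makes the "cars always win" guarantee visible at a glance: a lower branch can only run if every higher one declined.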


Stay-in-area: vision (default), GPS optional

The dog patrols an open area with no physical fence. By default it stays in the area with vision — no GPS needed:

  • Soft road/car repulsion (WANDER): as pavement appears in the lower frame (HSV road mask), or a vehicle is detected (YOLO), the dog veers toward the clear side and slows down before the hard reaction kicks in, actively keeping its distance from the road/cars (avoid.soft_road_coverage, avoid.road_repulsion_gain, avoid.car_repulsion_gain).
  • Hard reaction (AVOID_DANGER): up close (road fills the centre past perception.road_trigger_coverage, or a near car), it backs up and pivots away. The firmware LiDAR hard-stop sits underneath all of it.

This relies on a usable visual border (a clear grass→pavement edge, decent lighting) and is the recommended setup. Tune the gains in config.py (AvoidConfig); set avoid.soft_avoid_enabled = False to keep only the hard reaction.
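A rough sketch of the soft road repulsion, assuming the frame has already been converted to OpenCV's 8-bit HSV (hue 0-179, e.g. via cv2.cvtColor) and that coverage is measured separately in the left and right halves of the bottom crop. The function names, the left/right split and all default numbers are illustrative assumptions, not AvoidConfig's real fields or values; car repulsion (steering away from a YOLO vehicle box) would add a similar term and is left out.

```python
import numpy as np


def road_coverage(hsv, lower, upper, crop_frac=0.4):
    """Fraction of road-coloured pixels in the bottom crop_frac of an HSV image,
    returned as (left_half, right_half). Same per-pixel test as cv2.inRange."""
    h = hsv.shape[0]
    crop = hsv[int(h * (1.0 - crop_frac)):]
    lo = np.asarray(lower, dtype=hsv.dtype)
    hi = np.asarray(upper, dtype=hsv.dtype)
    mask = np.all((crop >= lo) & (crop <= hi), axis=-1)
    mid = mask.shape[1] // 2
    return float(mask[:, :mid].mean()), float(mask[:, mid:].mean())


def soft_repulsion(left_cov, right_cov, forward_speed,
                   soft_coverage=0.05, gain=1.0, max_slowdown=0.7):
    """Return (vx, vyaw) for WANDER: veer toward the clearer side and slow down.

    vyaw follows the README convention (CCW/left positive). All defaults are
    illustrative placeholders, not AvoidConfig values.
    """
    mean_cov = 0.5 * (left_cov + right_cov)
    if mean_cov < soft_coverage:
        return forward_speed, 0.0               # road not close yet: no change
    vyaw = gain * (right_cov - left_cov)        # more road on the right -> turn left
    slow = min(max_slowdown, (mean_cov - soft_coverage) / (1.0 - soft_coverage))
    return forward_speed * (1.0 - slow), vyaw
```

Capping the slowdown (max_slowdown) keeps the dog moving so it can actually steer away; stopping outright is left to the hard AVOID_DANGER reaction.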

Optional: GPS geofence (--geofence)

For a hard metric boundary, add an external GPS receiver and enable the geofence with --geofence (off by default). It adds the BOUNDARY state, homing the dog back toward a centre point.

⚠️ The Go2 has no built-in GPS. This requires an external GPS receiver on the onboard computer (e.g. a USB u-blox-class receiver plugged into the Jetson), read via gpsd or serial NMEA. Standard (non-RTK) GPS is only accurate to a few metres under open sky and can wander much further near buildings or trees, so keep geofence.radius_m well inside the real edge (use RTK GPS for tight bounds near a road).

How it works:

  1. On startup the first good fix becomes the fence centre (center_mode: onstart), or set explicit center_lat/lon (center_mode: fixed), or press "Set fence centre here" on the dashboard.
  2. The dog roams freely within geofence.radius_m. Within geofence.margin_m of the edge it enters BOUNDARY and homes back toward the centre, steering with the GPS course-over-ground (no compass needed), until release_m back inside (hysteresis).
  3. Fail-safe: if GPS is lost/stale (gps.stale_after) the dog stops (geofence.no_fix_behavior: stop) rather than roam blind near the edge.
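The steps above need three pieces: distance from the centre, hysteresis between the margin and release distances, and a heading error computed from course-over-ground. A minimal stdlib sketch, assuming a spherical Earth, a course-over-ground in degrees clockwise from north, and that margin_m and release_m are both measured inward from the fence edge with release_m > margin_m; the class, function names and gains are hypothetical.

```python
import math

EARTH_R_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_R_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(x, y)) % 360.0


class Geofence:
    """Hypothetical BOUNDARY trigger with hysteresis (assumes release_m > margin_m)."""

    def __init__(self, center, radius_m, margin_m, release_m):
        self.center = center            # (lat, lon) of the fence centre
        self.radius_m, self.margin_m, self.release_m = radius_m, margin_m, release_m
        self.homing = False

    def update(self, lat, lon):
        """Return True while the dog should be homing toward the centre."""
        inside_by = self.radius_m - haversine_m(*self.center, lat, lon)
        if not self.homing and inside_by < self.margin_m:
            self.homing = True          # entered the margin band: start homing
        elif self.homing and inside_by > self.release_m:
            self.homing = False         # far enough back inside: release
        return self.homing


def homing_yaw_rate(lat, lon, cog_deg, center, kp=0.02, max_rate=0.6):
    """vyaw (rad/s, CCW/left+) that turns the course-over-ground toward the centre."""
    err = (bearing_deg(lat, lon, *center) - cog_deg + 180.0) % 360.0 - 180.0
    # Bearings grow clockwise but vyaw grows counter-clockwise, so a target to
    # the right (err > 0) needs a negative yaw rate.
    return max(-max_rate, min(max_rate, -kp * err))
```

Course-over-ground is only meaningful while the dog is moving, which is one reason a no-compass design keeps walking forward while it turns back toward the centre.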

GPS source (gps.source): auto (probe gpsd, else serial), gpsd, serial (NMEA on gps.serial_port), or mock (a simulated receiver that integrates the commanded motion — for testing). --gps/--radius imply --geofence.
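The mock source can be pictured as dead reckoning: integrate the commanded body-frame velocities into a position, then convert metres to degrees. A minimal sketch under a flat-Earth approximation (adequate over a backyard); MockGPS and its heading convention (0 = north, counter-clockwise positive to match vyaw) are assumptions, not the project's simulator.

```python
import math

EARTH_R_M = 6_371_000.0


class MockGPS:
    """Hypothetical simulated receiver: dead-reckons commanded body velocities
    into a lat/lon fix using a flat-Earth (local tangent plane) approximation."""

    def __init__(self, lat, lon, heading_rad=0.0):
        # heading_rad: 0 = facing north, positive counter-clockwise (like vyaw).
        self.lat, self.lon, self.heading = lat, lon, heading_rad

    def step(self, vx, vy, vyaw, dt):
        """Integrate one tick of (vx forward+, vy left+, vyaw CCW+) over dt seconds."""
        self.heading += vyaw * dt
        c, s = math.cos(self.heading), math.sin(self.heading)
        north = vx * c - vy * s          # rotate body frame into north/east
        east = -vx * s - vy * c
        self.lat += math.degrees(north * dt / EARTH_R_M)
        self.lon += math.degrees(east * dt / (EARTH_R_M * math.cos(math.radians(self.lat))))
        return self.lat, self.lon
```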


Act like a dog (idle play)

While WANDERing, an idle scheduler occasionally performs a random dog action (play.actions: stretch, wiggle, scrape/dig, dance1, wallow, ...), pausing briefly to do the trick. Intensity (play.mode, default moderate) sets the average gap between actions: calm (~75 s), moderate (~30 s), playful (~15 s). Set it at startup with --play, or change it any time from the dashboard. (The greeting itself adds Hello/Heart "wags".) Play never overrides safety, the geofence, or a greeting.
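One way to implement the intensity modes is a jittered countdown that only runs while the dog is idle in WANDER. The mean gaps below are the ones quoted above; the PlayScheduler class, the ±30 % uniform jitter and pausing the countdown outside idle time are assumptions for illustration.

```python
import random

# Mean seconds between idle actions, as quoted for each play mode above.
PLAY_INTERVALS = {"calm": 75.0, "moderate": 30.0, "playful": 15.0}


class PlayScheduler:
    """Hypothetical idle-play timer: returns a random action roughly every N s."""

    def __init__(self, actions, mode="moderate", jitter=0.3, rng=None):
        self.actions = list(actions)
        self.jitter = jitter
        self.rng = rng if rng is not None else random.Random()
        self.set_mode(mode)

    def set_mode(self, mode):
        """Switch intensity (e.g. from the dashboard) and restart the countdown."""
        self.mode = mode
        self._reset_timer()

    def _reset_timer(self):
        mean = PLAY_INTERVALS[self.mode]
        # Uniform jitter around the mean so the tricks don't look clockwork.
        self.remaining = mean * self.rng.uniform(1 - self.jitter, 1 + self.jitter)

    def tick(self, dt, idle):
        """Call once per control tick. idle must be True only while WANDERing with
        nothing else to do; the countdown pauses otherwise. Returns an action or None."""
        if not idle:
            return None
        self.remaining -= dt
        if self.remaining > 0:
            return None
        self._reset_timer()
        return self.rng.choice(self.actions)
```

Because the caller decides when idle is True, play can never preempt safety, the geofence or a greeting: those states simply stop feeding the timer.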


Architecture

   camera frames
        |
        v
+----------------------+      latest()       +---------------------------+
|  PerceptionThread    | ------------------> |  GoWelcomeStateMachine    |
|  (background thread) |  PerceptionResult   |  step(dt) -> State        |
|  YOLO + HSV road     |                     |  WANDER/APPROACH/GREET/   |
+----------------------+                     |  AVOID_DANGER/BOUNDARY    |
        ^                                    +---------------------------+
        | get_frame()                                      |
        |                                                  | drive() / stop() /
        |                                                  | gesture() / play_greeting()
        |                                                  v
        |                                    +---------------------------+
        +----------------------------------- |      RobotInterface       |
                                             |  (abstract contract)      |
                                             +---------------------------+
                                                           |
                +----------------------+-------------------+--+
                |                      |                      |
                v                      v                      v
       +------------------+   +------------------+   +------------------+
       | Go2WebRTCRobot   |   |    Go2Robot      |   |    MockRobot     |
       | unitree_webrtc_  |   | unitree_sdk2py   |   | webcam / video,  |
       | connect (DEFAULT)|   | over CycloneDDS  |   | no hardware, for |
       | wifi, AIR/PRO/EDU|   | (--transport dds)|   | off-robot dev/CI |
       | + AudioHub audio |   | wired / EDU      |   |                  |
       +------------------+   +------------------+   +------------------+
              (async bridge: WebRTC event loop on its own thread)
  • config.py -- every tunable lives here in grouped dataclasses (GoWelcomeConfig). CLI flags in main.py override a handful at startup.
  • gowelcome/types.py -- frozen data contracts: State, Detection, RoadInfo, PerceptionResult. The shared language between layers.
  • gowelcome/robot/interface.py -- the RobotInterface and AudioBackend ABCs plus the GESTURES vocabulary. The behaviour layer talks only to these.
  • PerceptionThread -- grabs frames from the robot, runs YOLO + the HSV road mask off the control loop, and publishes the newest PerceptionResult via latest().
  • GoWelcomeStateMachine -- the reactive brain; step(dt) reads the latest perception and issues robot commands, returning the current State.
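The RobotInterface contract can be pictured as an abstract base class exposing the commands named in the diagram. This is a sketch only: the method names come from the diagram, but the argument lists, return types and the GESTURES contents (beyond the Hello/Heart wags mentioned elsewhere) are guesses, and RecordingRobot is a hypothetical test double rather than the real MockRobot.

```python
from abc import ABC, abstractmethod

# Hypothetical gesture vocabulary; only Hello and Heart are named in this README.
GESTURES = ("Hello", "Heart")


class RobotInterface(ABC):
    """Sketch of the contract the behaviour layer talks to (signatures assumed)."""

    @abstractmethod
    def get_frame(self):
        """Return the newest camera frame (e.g. a BGR numpy array) or None."""

    @abstractmethod
    def drive(self, vx: float, vy: float, vyaw: float) -> None:
        """Body-frame velocity: vx forward+, vy left+, vyaw CCW/left+ (rad/s)."""

    @abstractmethod
    def stop(self) -> None:
        """Command zero velocity."""

    @abstractmethod
    def gesture(self, name: str) -> None:
        """Perform one entry of GESTURES."""

    @abstractmethod
    def play_greeting(self) -> None:
        """Play the greeting clip through whatever audio path the transport has."""


class RecordingRobot(RobotInterface):
    """Hypothetical hardware-free double that just records commands."""

    def __init__(self):
        self.log = []

    def get_frame(self):
        return None

    def drive(self, vx, vy, vyaw):
        self.log.append(("drive", vx, vy, vyaw))

    def stop(self):
        self.log.append(("stop",))

    def gesture(self, name):
        if name not in GESTURES:
            raise ValueError(f"unknown gesture: {name}")
        self.log.append(("gesture", name))

    def play_greeting(self):
        self.log.append(("greet",))
```

Because the state machine only sees this interface, swapping WebRTC, DDS or a mock is a constructor choice, and behaviour tests can assert on the recorded command log.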

Velocity convention everywhere (matches SportClient.Move): vx forward+, vy left+, vyaw CCW/left+ (rad/s).


Transports & greeting audio

GoWelcome talks to the robot through one of two transports (--transport):

| Transport | Library | Works on | Greeting audio | When |
|---|---|---|---|---|
| webrtc (default) | unitree_webrtc_connect (app protocol) | Go2 AIR/PRO/EDU over wifi | from the dog's speaker (AudioHub) | default; no jailbreak |
| dds | official unitree_sdk2py (CycloneDDS) | Go2 EDU, wired | none on the Go2 → host speaker | --transport dds |

Greeting from the dog (WebRTC default). On startup GoWelcome uploads assets/greeting.wav to the robot via AudioHub and plays it by uuid on each greeting — sound comes from the Go2's own speaker. Pick the method with --audio-method:

  • audiohub (default) — upload once, play_by_uuid per greeting (persistent, low latency).
  • stream — stream the file live each greeting via an aiortc MediaPlayer.
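The "upload once, play by uuid" idea behind the audiohub method amounts to caching the uuid returned by the upload. This synchronous sketch assumes a hub object with a hypothetical upload() method alongside play_by_uuid(); the real unitree_webrtc_connect audio API runs on an asyncio event loop and its method names and signatures may differ, so treat it as the caching pattern only.

```python
from typing import Optional, Protocol


class AudioHubLike(Protocol):
    """Hypothetical interface; the real connector's audio API is async and may differ."""

    def upload(self, wav_path: str) -> str: ...          # returns a uuid

    def play_by_uuid(self, uuid: str) -> None: ...


class GreetingPlayer:
    """Upload the greeting once at startup, then replay it by uuid each greeting."""

    def __init__(self, hub: AudioHubLike, wav_path: str):
        self.hub, self.wav_path = hub, wav_path
        self._uuid: Optional[str] = None

    def prepare(self) -> None:
        """Call once at startup (and again after reset())."""
        if self._uuid is None:
            self._uuid = self.hub.upload(self.wav_path)

    def play(self) -> None:
        self.prepare()               # no-op when the clip is already uploaded
        self.hub.play_by_uuid(self._uuid)

    def reset(self) -> None:
        """Forget the cached uuid, e.g. if the robot's audio library was cleared."""
        self._uuid = None
```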

DDS transport audio. The official SDK has no Go2 audio path (its AudioClient is G1-only). On --transport dds, greeting audio falls back to a pluggable host backend (--audio host|go2|null) that plays on the machine running GoWelcome. The field-proven pattern (from the team's G1 Sanad stack) is a USB/Bluetooth speaker on the onboard computer, pinned by its PulseAudio sink:

pactl list short sinks                      # find your speaker's sink
python main.py --transport dds --interface eth0 \
  --audio-device alsa_output.usb-Anker_PowerConf_A3321-DEV-SN1-01.analog-stereo

Drop your clip at assets/greeting.wav — see assets/greeting.README.md. (DDS host playback wants 16 kHz mono 16-bit PCM; AudioHub accepts any wav.)
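A quick pre-flight check for the DDS host-playback format, using only the standard-library wave module. check_host_wav is a hypothetical helper, not part of GoWelcome; note that wave.open() raises wave.Error for compressed (non-PCM) files, which is itself a sign the clip needs converting.

```python
import wave
from typing import List


def check_host_wav(path: str) -> List[str]:
    """Return problems that would stop DDS host playback (wants 16 kHz mono 16-bit PCM).

    An empty list means the file matches. wave.open() itself raises wave.Error
    for compressed (non-PCM) files.
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append(f"sample rate is {w.getframerate()} Hz, want 16000")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, want 1 (mono)")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, want 16-bit")
    return problems
```

If the check reports problems, ffmpeg can convert the clip, for example: ffmpeg -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le assets/greeting.wav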


Install

Off-robot (mock / development / tests)

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt        # numpy, opencv-python, ultralytics, simpleaudio

You do not need any robot library for --mock.

On the real robot — WebRTC (default)

sudo apt install -y portaudio19-dev
pip install -r requirements.txt          # includes unitree_webrtc_connect

For Go2 firmware ≥ 1.1.15 you also need the per-device AES-128 key (once): fetch it with the connector's examples/fetch_aes_key.py, then pass --aes-key.

On the real robot — DDS (alternative, EDU/wired)

Install the official Unitree SDK on the robot's host (not a plain pip install):

# https://github.com/unitreerobotics/unitree_sdk2_python
git clone https://github.com/unitreerobotics/unitree_sdk2_python
cd unitree_sdk2_python && pip install -e .   # pulls in cyclonedds

pyserial is not required.


Run

# Off-robot, webcam index 0, silent audio:
python main.py --mock --audio null --source 0
#   or:  ./scripts/run_mock.sh

# Off-robot from a video file with the debug window:
python main.py --mock --source backyard.mp4

# Real Go2 over WebRTC (default) — greeting plays from the dog's speaker:
python main.py --robot-ip 192.168.1.50            # add --aes-key <hex> on fw >= 1.1.15
#   or:  ./welcome.sh --robot-ip 192.168.1.50     # TEST SUSPENDED FIRST

# Real Go2 over DDS (EDU/wired):
python main.py --transport dds --interface eth0

# Useful flags:
python main.py --robot-ip 192.168.1.50 --device cuda --headless --web
python main.py --mock --dry-run          # perceive + decide, never move

| Flag | Config field set |
|---|---|
| --mock | mock |
| --transport | transport (webrtc/dds) |
| --robot-ip | webrtc.ip (localsta) |
| --serial | webrtc.serial_number |
| --aes-key | webrtc.aes_128_key (fw ≥ 1.1.15) |
| --connection | webrtc.connection_method |
| --audio-method | webrtc.audio_method (audiohub/stream) |
| --interface | network.interface (dds) |
| --device | perception.device |
| --model | perception.model_path |
| --source | camera.mock_source |
| --wav | greet.wav_path |
| --audio | audio.backend (host/go2/null) |
| --audio-device | audio.output_device (PulseAudio sink) |
| --no-avoidance | safety.use_lidar_avoidance = False |
| --headless | headless (no cv2 window) |
| --dry-run | dry_run (decide but never move) |
| --conf | perception.person_conf |
| --web | web.enabled (control dashboard) |
| --web-port | web.port (default 8080) |
| --geofence | enables the GPS geofence (default: vision-only) |
| --gps | gps.source (auto/gpsd/serial/mock) |
| --radius | geofence.radius_m (metres) |
| --play | play.mode (calm/moderate/playful) |
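How the flags above can override config.py is easy to sketch with grouped dataclasses and argparse: start from the defaults and only overwrite fields whose flag was actually passed. This trimmed-down GoWelcomeConfig is hypothetical; the field names and the person_conf, web.port and web.host defaults follow this README, while the device default of "cpu" is an assumption.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class PerceptionConfig:
    person_conf: float = 0.80
    device: str = "cpu"              # assumed default, not taken from config.py


@dataclass
class WebConfig:
    enabled: bool = False
    port: int = 8080
    host: str = "0.0.0.0"


@dataclass
class GoWelcomeConfig:
    mock: bool = False
    dry_run: bool = False
    # default_factory gives every config its own sub-config instances.
    perception: PerceptionConfig = field(default_factory=PerceptionConfig)
    web: WebConfig = field(default_factory=WebConfig)


def build_config(argv=None) -> GoWelcomeConfig:
    p = argparse.ArgumentParser()
    p.add_argument("--mock", action="store_true")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--conf", type=float)
    p.add_argument("--device")
    p.add_argument("--web", action="store_true")
    p.add_argument("--web-port", type=int)
    args = p.parse_args(argv)

    cfg = GoWelcomeConfig()
    # Only override what the user passed; config.py stays the source of truth.
    cfg.mock |= args.mock
    cfg.dry_run |= args.dry_run
    cfg.web.enabled |= args.web
    if args.conf is not None:
        cfg.perception.person_conf = args.conf
    if args.device is not None:
        cfg.perception.device = args.device
    if args.web_port is not None:
        cfg.web.port = args.web_port
    return cfg
```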

A live cv2 window (unless --headless) draws green person boxes, red danger boxes, the road-coverage percentage, and the current state. Press ESC in the window (or Ctrl-C in the terminal) to quit.

Control dashboard (HTTP)

Add --web and open http://<dog-ip>:8080/ from any laptop or phone on the same network; it is the easiest way to keep an eye on the dog when running --headless. The page shows the live camera (with the detection/state overlay) plus controls: change play mode (calm/moderate/playful), pause/resume, E-STOP, and "Set fence centre here", with a live status panel (state, GPS fix, in/out of fence).

./welcome.sh --robot-ip 192.168.1.50 --headless --web    # dog, browse to its IP:8080
./welcome.sh --mock --source 0 --web --web-port 9000      # off-robot demo

Endpoints: / (dashboard), /stream.mjpg (raw MJPEG), /snapshot.jpg, /status.json (status), POST /control (commands), /healthz. The server is stdlib-only (http.server, plus cv2.imencode from the existing OpenCV dependency), so it adds no extra packages, and several viewers can connect at once.

⚠️ Security: the dashboard binds 0.0.0.0 with no authentication, so anyone on the same network can watch the camera at http://<dog-ip>:<port>/ and also use the controls (POST /control: pause, play mode, E-STOP). That's intended for a trusted home LAN. On an untrusted network, set web.host = "127.0.0.1" in config.py (and view only via an SSH tunnel), or leave --web off.
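A minimal stdlib-only sketch of two of the endpoints (/healthz and /status.json), using ThreadingHTTPServer so several clients can be served at once. make_server and serve_in_background are hypothetical helpers, not GoWelcome's web module; note the sketch defaults to 127.0.0.1, the safer binding recommended above, whereas the real dashboard binds 0.0.0.0.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_server(get_status, host="127.0.0.1", port=8080):
    """Hypothetical server exposing /healthz and /status.json (stdlib only).

    get_status is a callable returning a JSON-serialisable dict.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                body, ctype = b"ok\n", "text/plain"
            elif self.path == "/status.json":
                body, ctype = json.dumps(get_status()).encode(), "application/json"
            else:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep per-request logs out of the control-loop console

    # ThreadingHTTPServer handles each client on its own thread (Python 3.7+).
    return ThreadingHTTPServer((host, port), Handler)


def serve_in_background(server):
    """Run the server on a daemon thread so it never blocks the control loop."""
    t = threading.Thread(target=server.serve_forever, daemon=True)
    t.start()
    return t
```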


Tuning

Everything is in config.py, grouped by subsystem. Common knobs:

  • Detection -- perception.person_conf (default 0.80), perception.device, perception.model_path, perception.danger_classes, perception.danger_min_height_ratio.
  • Road mask -- perception.road_hsv_lower/upper, road_crop_frac, road_trigger_coverage.
  • Approach feel -- servo.kp_yaw, servo.max_yaw_rate, servo.yaw_deadband, servo.kp_forward, servo.max_forward, servo.stop_height_ratio, servo.yaw_sign (flip if your camera mounting inverts left/right).
  • Wander -- wander.forward_speed, wander.yaw_sweep_rate/period.
  • Greeting -- greet.gestures, greet.gesture_gap, greet.cooldown.
  • Safety caps -- safety.max_vx/max_vy/max_vyaw, safety.perception_timeout, safety.command_timeout.

Visual-servoing math

The servo turns a single person bounding box into a (vx, vyaw) command each tick. Let frame_w, frame_h be the frame size and the box have centre cx and height h.

Horizontal (yaw) error -- normalised to [-1, 1], + = right of centre:

err = (cx - frame_w/2) / (frame_w/2)        # Detection.horizontal_offset

Yaw command -- a P(ID) controller on err, with a deadband and clamp:

vyaw = yaw_sign * PID(err)        clamped to +/- servo.max_yaw_rate
       |err| < servo.yaw_deadband -> vyaw = 0

With the default yaw_sign = -1: a target to the right (err > 0) yields vyaw < 0 (a clockwise/right turn) -- the robot turns toward the person. Flip yaw_sign if your mounting inverts this.

Distance proxy -- how much of the frame height the box fills:

height_ratio = h / frame_h        # Detection.height_ratio
arrived = height_ratio >= servo.stop_height_ratio   (default 0.50)

Forward command -- proportional to remaining distance, throttled when the heading error is large so the dog squares up before charging, and zeroed on arrival:

vx = kp_forward * (stop_height_ratio - height_ratio)
vx *= exp(-forward_heading_falloff * |err|)         # slow down off-axis
vx  = clamp(vx, 0, servo.max_forward)               # never reverse to approach
arrived -> vx = 0

All commands then pass the global safety caps (safety.max_*) before reaching drive().
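Putting the servo equations together gives a small pure function. This is a sketch assuming a P-only yaw loop (the I and D terms are omitted) and an (x1, y1, x2, y2) pixel box; stop_height_ratio = 0.50 and yaw_sign = -1 come from this README, while every other default in ServoParams is a placeholder, not a tuned value.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ServoParams:
    # stop_height_ratio and yaw_sign defaults are from this README; the other
    # values are placeholders, not GoWelcome's tuned defaults.
    kp_yaw: float = 1.2
    max_yaw_rate: float = 0.8
    yaw_deadband: float = 0.05
    kp_forward: float = 1.5
    max_forward: float = 0.5
    stop_height_ratio: float = 0.50
    forward_heading_falloff: float = 2.0
    yaw_sign: float = -1.0


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def servo(box, frame_w: int, frame_h: int,
          p: Optional[ServoParams] = None) -> Tuple[float, float, bool]:
    """box = (x1, y1, x2, y2) in pixels -> (vx, vyaw, arrived). P-only on yaw."""
    p = p if p is not None else ServoParams()
    x1, y1, x2, y2 = box
    cx, h = (x1 + x2) / 2, y2 - y1
    err = (cx - frame_w / 2) / (frame_w / 2)        # + = right of centre
    height_ratio = h / frame_h                      # distance proxy
    arrived = height_ratio >= p.stop_height_ratio

    vyaw = 0.0
    if abs(err) >= p.yaw_deadband:
        vyaw = clamp(p.yaw_sign * p.kp_yaw * err, -p.max_yaw_rate, p.max_yaw_rate)

    vx = 0.0
    if not arrived:
        vx = p.kp_forward * (p.stop_height_ratio - height_ratio)
        vx *= math.exp(-p.forward_heading_falloff * abs(err))  # slow down off-axis
        vx = clamp(vx, 0.0, p.max_forward)                    # never reverse
    return vx, vyaw, arrived
```

Keeping the servo a pure function of the box and frame size makes it easy to unit-test off-robot; the global safety.max_* caps would still be applied afterwards, outside this function.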


Safety notes

  • E-stop = Ctrl-C. SIGINT/SIGTERM set a stop flag; the loop then stops the robot and runs a clean shutdown() (zero velocity, release avoidance, close camera/audio). ESC in the debug window does the same.
  • Test suspended first. Always run a new build with the Go2 hung off the ground and a hand on Ctrl-C before letting it walk.
  • LiDAR firmware hard-stop. On real hardware, drive() routes through the Go2 ObstaclesAvoidClient when safety.use_lidar_avoidance is on (default). This is a firmware-level last line of defence on top of GoWelcome's own AVOID_DANGER logic. --no-avoidance disables it (use with care).
  • Velocity caps. Every command is clamped by safety.max_vx/vy/vyaw after the controllers run, so a controller bug can't command an unsafe speed.
  • Stale perception. If no fresh frame arrives within safety.perception_timeout, the robot stops.
  • --dry-run runs the full perception + decision pipeline but never sends a non-zero velocity -- handy for validating behaviour safely.
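The signal-flag, velocity-cap, stale-perception and dry-run rules above can be combined into one gate that every command passes through before drive(). This is a sketch under assumptions: SafetyGate and its default limits are hypothetical (they are not the safety.* defaults), timestamps come from time.monotonic(), and install_signal_handlers() must be called from the main thread because Python only allows signal handlers to be set there.

```python
import signal
import threading
import time


def _cap(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))


class SafetyGate:
    """Hypothetical last stage before drive(): stop flag, dry-run, stale
    perception, then velocity caps. Default limits are placeholders."""

    def __init__(self, max_vx=0.6, max_vy=0.3, max_vyaw=1.0,
                 perception_timeout=0.5, dry_run=False):
        self.max_vx, self.max_vy, self.max_vyaw = max_vx, max_vy, max_vyaw
        self.perception_timeout = perception_timeout
        self.dry_run = dry_run
        self.stop_requested = threading.Event()

    def install_signal_handlers(self):
        """SIGINT/SIGTERM only set a flag; the loop then stops and shuts down.
        Must be called from the main thread."""
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, lambda signum, frame: self.stop_requested.set())

    def apply(self, vx, vy, vyaw, last_frame_time, now=None):
        """Return the (vx, vy, vyaw) that may actually be sent to drive()."""
        now = time.monotonic() if now is None else now
        if self.stop_requested.is_set() or self.dry_run:
            return 0.0, 0.0, 0.0
        if now - last_frame_time > self.perception_timeout:
            return 0.0, 0.0, 0.0                # stale perception: stop
        return (_cap(vx, self.max_vx), _cap(vy, self.max_vy),
                _cap(vyaw, self.max_vyaw))
```

Funnelling every command through one gate means a controller bug upstream can at worst produce a capped velocity, never an uncapped one.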