2026-04-12 18:52:37 +04:00

AI Photographer

Production-oriented robot photographer stack for Unitree G1.

Quick Start

  1. Set API key in config:
    • edit Data/Settings/config.json -> gemini.api_key
  2. Run launcher:
    • cd Scripts && ./photo_sanad.sh
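
Step 1 can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the project: the helper name set_gemini_api_key is hypothetical, and it assumes only that Data/Settings/config.json is a JSON object with a gemini section as described above.

```python
import json
from pathlib import Path


def set_gemini_api_key(config_path: Path, api_key: str) -> dict:
    """Write api_key into the "gemini" section of the JSON config and return the updated config."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Create the "gemini" section if it is missing; every other key is left untouched.
    config.setdefault("gemini", {})["api_key"] = api_key
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

From the repository root this could be called as set_gemini_api_key(Path("Data/Settings/config.json"), "<your key>") before running the launcher.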

Launcher behavior:

  • resets mode.current_mode to the configured mode.default_mode on each launch,
  • resolves the active PulseAudio speaker/microphone and exports them for the runtime,
  • starts the direct camera service (Core/direct_camera_service.py in the teleimager conda env) for the full runtime path,
  • starts Gemini runtime (gemini conda env),
  • runs Gemini/voice_sanad.py as the main process,
  • keeps manual as the default startup mode unless changed in Data/Settings/config.json,
  • keeps Gemini/voice_sanad.py, the dashboard server, the direct camera server, and replay/trigger services running in the full runtime path across mode switches,
  • arms autonomous services in the full runtime path so switching dashboard mode from manual to ai works without a restart.

Optional startup profile:

  • MANUAL_LEAN_RUNTIME=1
    • manual-mode voice + dashboard only
    • skips direct camera, DDS, replay, uploader, and autonomous startup
    • capture/replay services are unavailable in that profile
    • this is an explicit reduced profile, not the normal production mode

Project Layout

  • Scripts/
    • startup and ops shell entrypoints
    • photo_sanad.sh, fix_realsense_usb.sh
    • direct_camera_samples_server.py remains as a compatibility wrapper
    • fix_realsense_usb.sh supports --check, --fix, and --serials
  • Data/
    • categorized runtime assets and state
    • Settings/config.json
    • Scripts/photo_command_ai.txt, Scripts/sanad_script.txt
    • Runtime/upload_db.json, generated runtime JSON state files
    • Settings/config.json -> camera.preferred_realsense_serial selects the preferred default RealSense by serial
    • generated runtime JSON files are created on demand during execution
  • Data/Audio/
    • fixed prerecorded AI situation prompts (*.wav)
    • matching raw Gemini captures (*_raw.wav)
  • Data/Settings/audio_prompt_records.json
    • prompt recording metadata for files in Data/Audio/
  • photos/people/
    • AI face-recognition registry
    • each returning guest gets a folder with face/scene references, metadata, and captured-photo links
  • photos/Captures/
    • final saved runtime captures from dashboard, manual trigger, and AI capture flow
  • photos/samples/
    • standalone direct-camera sample captures
  • Logs/
    • one stable log file per runtime component
    • examples: voice_sanad.log, gemini_voice.log, photo_server.log, direct_camera.log
  • Web/
    • operator dashboard frontend (gallery.html, gallery.js, style.css)
    • direct camera service frontend (direct_camera.html, direct_camera.js, direct_camera.css)
  • Data/G1/
    • replay/gesture motion files (*.jsonl)
    • dashboard-recorded replay captures are saved directly here
  • Core/
    • shared runtime foundations (settings.py, Logger.py, error_events.py, direct_camera_service.py)
    • direct_camera_service.py handles camera backend/API and serves its UI from Web/
  • Gemini/
    • voice orchestration (voice_sanad.py, gemini_voice.py, sanad_text_utils.py)
  • Server/
    • dashboard/API/capture/upload (photo_server.py, capture_service.py, direct_camera_client.py, uploader.py)
  • Modes/AI/
    • autonomous vision/intent/session manager (autonomous_manager.py, vision_detector.py, camera_module.py)
  • Modes/Manual/
    • controller + replay + trigger loop (controller.py, replay_engine.py, trigger_loop.py)

Runtime Modes

Mode is persisted in Data/Settings/config.json under mode.current_mode:

  • manual
    • Gemini conversation stays available when gemini.mic_enabled=true
    • voice request_photo / yes_photo / no_photo disabled
    • R2+X replay/capture path stays available in the full runtime path
    • Gemini/voice_sanad.py, dashboard server, direct camera server, and replay services stay running
    • autonomous services can be armed in the background, but autonomous flow stays paused until mode becomes ai
  • ai
    • voice request_photo / yes_photo / no_photo enabled
    • R2+X replay/capture path still works
    • Gemini/voice_sanad.py, dashboard server, direct camera server, and replay services continue running without restart
    • autonomous flow runs live when AUTONOMOUS_ENABLE=1
    • on stable visual intent, AI identifies or enrolls a single guest, optionally greets them with a short hand replay, asks for photo confirmation, and guides guests into frame; it then captures the photo, playing the active replay during the shot when AI capture replay is enabled
    • returning guests are recognized from photos/people/ and can be greeted as returning visitors

Command-mode functionality was extracted from this project and moved to:

  • G1_Lootah/AI_Command

Remote Controls

  • R2+X
    • replay + photographer talk + capture pipeline
  • R2+L1
    • global hard cancel safety combo
    • active in runtime loops to cancel pending capture/replay and reset active interaction

Mode APIs:

  • GET /api/mode
  • GET /api/set_mode?mode=manual|ai
  • GET /api/mode_policy
  • GET /api/mic
  • GET /api/set_mic?enabled=0|1
  • GET /api/detector_backend
  • GET /api/set_detector_backend?backend=normal|yolo
  • GET /api/ai_readiness
  • GET /api/ai_options
  • GET /api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...
  • GET /api/autonomous_state
  • GET /api/runtime_health
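
Because the mode endpoints listed above are plain GET requests with query parameters, a client only needs to build the right URL. The sketch below shows one way to do that; the helper names are hypothetical, and the dashboard host and port are placeholders because this document does not state them.

```python
from urllib.parse import urlencode

VALID_MODES = {"manual", "ai"}  # values documented for /api/set_mode


def api_url(base_url: str, endpoint: str, **params: str) -> str:
    """Build a dashboard API URL such as <base>/api/set_mode?mode=ai."""
    url = f"{base_url.rstrip('/')}/api/{endpoint}"
    return f"{url}?{urlencode(params)}" if params else url


def set_mode_url(base_url: str, mode: str) -> str:
    """Return the URL that switches the runtime mode, rejecting unknown modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return api_url(base_url, "set_mode", mode=mode)
```

The resulting URL can be passed to urllib.request.urlopen or any HTTP client. The response format is not documented here, so callers should inspect it before relying on specific fields.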

Autonomous Flow

Autonomous services are armed by environment:

  • AUTONOMOUS_ENABLE=1
    • allows Modes/AI/autonomous_manager.py to run inside voice_sanad.py
    • in manual mode it stays paused
    • core services still remain up in manual (voice_sanad.py, dashboard, direct camera server, replay/trigger)
    • switching dashboard mode to ai starts autonomous flow live without a restart
  • AUTONOMOUS_ENABLE=0
    • disables autonomous manager entirely
    • manual trigger loop + voice runtime still work

Session state machine:

  • IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE
  • strict readiness block state: IDLE_BLOCKED when required YOLO readiness is not met
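
The state chain above can be expressed as a transition table, which makes the optional RETAKE_CONFIRM step explicit. This is a sketch under stated assumptions rather than the code in Modes/AI/autonomous_manager.py: the IDLE <-> IDLE_BLOCKED edges are inferred from the readiness note, and cancel or decline paths (for example the R2+L1 hard cancel) are not modeled.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


S = SessionState

# Allowed forward transitions. RETAKE_CONFIRM is optional, so COUNTDOWN may
# also go straight to COMPLETE.
ALLOWED = {
    S.IDLE: {S.WAIT_CONFIRM, S.IDLE_BLOCKED},  # IDLE_BLOCKED edge is an assumption
    S.IDLE_BLOCKED: {S.IDLE},                  # assumption: unblocks back to IDLE
    S.WAIT_CONFIRM: {S.FRAMING},
    S.FRAMING: {S.COUNTDOWN},
    S.COUNTDOWN: {S.RETAKE_CONFIRM, S.COMPLETE},
    S.RETAKE_CONFIRM: {S.COMPLETE},
    S.COMPLETE: {S.IDLE},
}


def advance(current: SessionState, target: SessionState) -> SessionState:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```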

Dashboard / API Highlights

  • GET /: gallery dashboard
  • GET /preview.mjpg: live preview
    • preview is off by default and starts only when requested from the dashboard
    • preview camera/OpenCV is loaded lazily when preview is requested
  • Camera control APIs:
    • GET /api/camera_health
    • GET /api/camera_sources
    • GET /api/set_camera_source?source=...
    • GET /api/set_camera_resolution?width=...&height=...&fps=...
    • GET /api/set_preferred_camera?serial=...
    • dashboard can switch camera source, show active camera info, change resolution live, and save a preferred RealSense serial into Data/Settings/config.json
  • Audio prompt APIs:
    • GET /api/audio_prompts
    • GET /api/set_audio_prompt_mode?mode=audio|gemini
    • GET /api/set_audio_prompt_fallback?enabled=0|1
    • GET /api/audio_prompt_record_status
    • GET /api/download_audio_prompt?key=...
    • GET /api/delete_audio_prompt?key=...
    • POST /api/upload_audio_prompt
    • POST /api/audio_prompt_record
    • dashboard can upload, replace, download, delete, inspect, and record prerecorded AI prompt clips stored in Data/Audio/
    • dashboard can switch fixed AI situation speech between recorded audio and Gemini without restart
  • GET /api/autonomous_state: runtime autonomous state panel data (lock/retake/health fields)
  • GET /api/runtime_health: component health (WS/mic/speaker/gate/restarts)
  • GET /api/mic and GET /api/set_mic
    • microphone ON/OFF toggle for both modes
  • GET /api/ai_readiness: strict AI readiness + block reason
  • GET /api/ai_options and GET /api/set_ai_options: AI option toggles (hard target lock, retake prompt, greeting and capture replay, face recognition)
  • Replay APIs:
    • GET /api/replays
    • GET /api/get_replay
    • GET /api/set_replay?name=...
    • GET /api/delete_replay?name=...
    • GET /api/rename_replay?old=...&new=...
    • GET /api/download_replay?name=...
    • GET /api/replay_record_status
    • GET /api/replay_record_start?name=...&seconds=...
    • GET /api/replay_test_status
    • GET /api/test_replay?name=...
    • POST /api/upload_replay
    • dashboard can record new replays into Data/G1, replay-test them, rename them, download them, delete them, upload new .jsonl replays, and set the active replay
    • replay recording and replay test are allowed only while runtime mode is manual
  • People APIs:
    • GET /api/people
    • GET /api/person_image?id=...&kind=face|scene
    • GET /api/download_person?id=...
    • GET /api/delete_person?id=...
    • GET /api/reset_people
    • POST /api/upload_person
    • dashboard can upload face photos for future recognition, download a saved guest package, delete one guest, or reset the full registry
  • GET /api/capture: capture via unified pipeline
  • GET /api/photos, GET /api/sessions, GET /api/delete, GET /api/reupload
  • GET /api/errors: structured error counters

Configuration

Source of truth:

  • Data/Settings/config.json loaded by Core/settings.py

Environment overrides are supported (timing, ports, upload settings, camera, etc.).

Direct camera serial selection precedence:

  • REALSENSE_SERIAL
  • PREFERRED_REALSENSE_SERIAL
  • Data/Settings/config.json -> camera.preferred_realsense_serial
  • teleimager camera config serial
  • any other detected RealSense
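
The precedence list above amounts to "first non-empty candidate wins". A minimal sketch of that lookup follows; the function name and its arguments are hypothetical, and it assumes the serial is taken as-is without checking that the device is actually connected, which the real service may do differently.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_realsense_serial(
    config: Mapping,
    teleimager_serial: Optional[str],
    detected_serials: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a RealSense serial following the documented precedence order."""
    env = os.environ if env is None else env
    candidates = [
        env.get("REALSENSE_SERIAL"),
        env.get("PREFERRED_REALSENSE_SERIAL"),
        config.get("camera", {}).get("preferred_realsense_serial"),
        teleimager_serial,
    ]
    for serial in candidates:
        if serial:  # skip unset and empty values
            return serial
    # Last resort: any other detected RealSense, or None if there is none.
    return detected_serials[0] if detected_serials else None
```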

Dashboard camera behavior:

  • main production dashboard can switch between available camera sources without restarting the runtime
  • camera status panel shows requested source, active source, backend, active profile, preferred serial, and active RealSense serial
  • resolution changes are applied live through the direct camera service
  • Save As Default stores the preferred RealSense serial into Data/Settings/config.json

AI prerecorded prompt behavior:

  • Data/Audio/ stores fixed WAV clips by prompt key
  • Data/Settings/audio_prompt_records.json stores prompt recording metadata and raw-output file references
  • audio_prompts.files in Data/Settings/config.json maps each key to its filename
  • audio_prompts.mode controls fixed AI situation speech:
    • audio: use recorded clips first for AI situation prompts
    • gemini: use Gemini speech instead for those same fixed prompts
  • audio_prompts.fallback_to_gemini controls whether missing prompt clips fall back to Gemini text
  • dashboard prompt library manages upload/download/delete, text-to-record generation, speech mode, and fallback state
  • imported prerecorded prompts currently cover 18 situation keys; missing keys continue through Gemini fallback until recorded
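
The way audio_prompts.mode, audio_prompts.files, and audio_prompts.fallback_to_gemini combine can be sketched as a single lookup. This is an illustration of the rules listed above, not the project's implementation: the function name and return shape are hypothetical, and the defaults used for missing settings are assumptions.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_prompt_source(
    key: str, audio_prompts: Mapping, audio_dir: Path
) -> Tuple[str, Optional[Path]]:
    """Return ("audio", clip_path), ("gemini", None), or ("none", None) for a prompt key."""
    # Assumption: a missing mode setting behaves like "audio".
    if audio_prompts.get("mode", "audio") == "gemini":
        return ("gemini", None)
    filename = audio_prompts.get("files", {}).get(key)
    if filename:
        clip = audio_dir / filename
        if clip.is_file():
            return ("audio", clip)
    # The clip is unmapped or missing on disk: fall back only if enabled.
    if audio_prompts.get("fallback_to_gemini", False):
        return ("gemini", None)
    return ("none", None)
```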

AI replay behavior:

  • vision.autonomous_greeting_replay_enabled controls the short greeting gesture when intent stabilizes
  • vision.autonomous_greeting_replay_file selects the greeting replay file
  • vision.autonomous_capture_replay_enabled controls whether AI photo capture uses the active replay during the shot
  • replay.active_file in Data/Settings/config.json is the single persisted active-replay setting
  • the active replay is shared between manual R2+X, dashboard capture choreography, and AI capture when AI capture replay is enabled
  • replay inventory is shared across the full Data/G1 tree

Face recognition behavior:

  • vision.face_recognition_enabled enables single-guest recognition/enrollment in AI mode
  • vision.face_recognition_threshold controls the similarity threshold for matching a returning guest
  • new guests are enrolled into photos/people/
  • successful AI captures are linked back into the guest folder for future reference
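
To illustrate what a similarity threshold does in this kind of matching, the sketch below compares a face embedding against stored references and returns the best guest at or above the threshold. Everything about the representation is an assumption: this document does not say how faces are encoded or which similarity measure is used, so cosine similarity over plain float vectors stands in here, and all names are hypothetical.

```python
from math import sqrt
from typing import Mapping, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_guest(
    embedding: Sequence[float],
    registry: Mapping[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the id of the most similar guest at or above threshold, else None."""
    best_id, best_score = None, threshold
    for guest_id, reference in registry.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = guest_id, score
    return best_id
```

A None result corresponds to the new-guest case, where the guest would be enrolled instead of greeted as a returning visitor. Raising the threshold makes false matches less likely but causes more returning guests to be treated as new.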

Vision model configuration (Data/Settings/config.json -> vision):

  • detection_backend: normal or yolo (runtime switchable from dashboard in AI mode)
  • yolo_runtime: ultralytics (production) or opencv (legacy ONNX parser)
  • yolo_ultralytics_device: inference device for ultralytics (cpu, 0, 0,1, ...)
  • person_yolo_onnx: path to YOLO ONNX person model
  • face_yolo_onnx: path to YOLO ONNX face model
  • group_min_people: minimum people count to mark a group
  • group_link_distance_px: max centroid-link distance for group clustering
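
The two group settings can be read as parameters of a simple linking step: people whose centroids are within group_link_distance_px of each other are linked, and a linked cluster counts as a group once it has at least group_min_people members. The following sketch shows that idea with a small union-find. It is an assumption about how the settings could be applied, not the code in Modes/AI/vision_detector.py, and the function name is hypothetical.

```python
from math import hypot
from typing import List, Sequence, Tuple


def find_groups(
    centroids: Sequence[Tuple[float, float]],
    link_distance_px: float,
    min_people: int,
) -> List[List[int]]:
    """Return clusters (as lists of indices into centroids) with at least min_people members."""
    parent = list(range(len(centroids)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of people whose centroids are close enough.
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            dx = centroids[i][0] - centroids[j][0]
            dy = centroids[i][1] - centroids[j][1]
            if hypot(dx, dy) <= link_distance_px:
                parent[find(i)] = find(j)

    clusters: dict = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return [members for members in clusters.values() if len(members) >= min_people]
```

Linking is transitive in this sketch: two people farther apart than the link distance still end up in the same group if a third person stands between them.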

Documentation

  • Current_runtime.md: detailed current runtime behavior and script chain.

Data Layout

  • Data/Settings/
    • config.json
  • Data/Scripts/
    • photo_command_ai.txt
    • sanad_script.txt
  • Data/Runtime/
    • runtime health/state/error JSON files
  • Data/Audio/
    • prerecorded AI prompt WAV files
    • matching _raw.wav Gemini output captures
  • Data/Settings/audio_prompt_records.json
    • prompt recording metadata for files in Data/Audio/

Notes

  • config.py is intentionally removed; runtime config is JSON + env overrides.
  • legacy AI mover/autonomous prototype scripts were removed from the production tree.
  • generated artifacts (__pycache__, runtime logs) should not be committed.
  • generated runtime files such as Data/Runtime/runtime_health.json, Data/Runtime/autonomous_state.json, Data/Runtime/error_counters.json, Data/Runtime/error_events.jsonl, and Logs/*.log may be absent in a clean checkout until the runtime starts.