AI Photographer
Production-oriented robot photographer stack for Unitree G1.
Quick Start
- Set the API key in config: edit Data/Settings/config.json -> gemini.api_key
- Run the launcher:
  cd Scripts && ./photo_sanad.sh
Launcher behavior:
- resets mode.current_mode to the configured mode.default_mode on each launch,
- resolves the active PulseAudio speaker/microphone and exports them for the runtime,
- starts the direct camera service (Core/direct_camera_service.py in the teleimager conda env) for the full runtime path,
- starts the Gemini runtime (gemini conda env),
- runs Gemini/voice_sanad.py as the main process,
- keeps manual as the default startup mode unless changed in Data/Settings/config.json,
- keeps Gemini/voice_sanad.py, the dashboard server, the direct camera server, and replay/trigger services running in the full runtime path across mode switches,
- arms autonomous services in the full runtime path so switching dashboard mode from manual to ai works without a restart.
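The launcher is a shell script, but the per-launch mode reset it performs can be pictured with the Python sketch below. It assumes both keys sit under a top-level "mode" object in config.json, as the dotted names suggest, and the "manual" fallback for a missing default_mode is an assumption, not documented behavior.

```python
import json
from pathlib import Path


def apply_default_mode(config: dict) -> str:
    """Copy mode.default_mode into mode.current_mode (in place) and return it.

    Falling back to "manual" when default_mode is missing is an assumption.
    """
    mode_section = config.setdefault("mode", {})
    default_mode = mode_section.get("default_mode", "manual")
    mode_section["current_mode"] = default_mode
    return default_mode


def reset_mode_on_launch(config_path: Path = Path("Data/Settings/config.json")) -> str:
    """Load config.json, reset the persisted mode, and write the file back."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    mode = apply_default_mode(config)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return mode
```

Splitting the dict logic from the file I/O keeps the reset rule easy to check without touching the real config file.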
Optional startup profile:
- MANUAL_LEAN_RUNTIME=1: manual-mode voice + dashboard only
- skips direct camera, DDS, replay, uploader, and autonomous startup
- capture/replay services are unavailable in that profile
- this is an explicit reduced profile, not the normal production mode
Project Layout
- Scripts/ - startup and ops shell entrypoints
  - photo_sanad.sh, fix_realsense_usb.sh
  - direct_camera_samples_server.py remains as a compatibility wrapper
  - fix_realsense_usb.sh supports --check, --fix, and --serials
- Data/ - categorized runtime assets and state
  - Settings/config.json
  - Scripts/photo_command_ai.txt, Scripts/sanad_script.txt
  - Runtime/upload_db.json, generated runtime JSON state files
  - Settings/config.json -> camera.preferred_realsense_serial selects the preferred default RealSense by serial
  - generated runtime JSON files are created on demand during execution
- Data/Audio/
  - fixed prerecorded AI situation prompts (*.wav)
  - matching raw Gemini captures (*_raw.wav)
- Data/Settings/audio_prompt_records.json - prompt recording metadata for files in Data/Audio/
- photos/people/ - AI face-recognition registry
  - each enrolled guest gets a folder with face/scene references, metadata, and captured-photo links
- photos/Captures/ - final saved runtime captures from dashboard, manual trigger, and AI capture flow
- photos/samples/ - standalone direct-camera sample captures
- Logs/ - one stable log file per runtime component
  - examples: voice_sanad.log, gemini_voice.log, photo_server.log, direct_camera.log
- Web/
  - operator dashboard frontend (gallery.html, gallery.js, style.css)
  - direct camera service frontend (direct_camera.html, direct_camera.js, direct_camera.css)
- Data/G1/
  - replay/gesture motion files (*.jsonl)
  - dashboard-recorded replay captures are saved directly here
- Core/ - shared runtime foundations (settings.py, Logger.py, error_events.py, direct_camera_service.py)
  - direct_camera_service.py handles the camera backend/API and serves its UI from Web/
- Gemini/ - voice orchestration (voice_sanad.py, gemini_voice.py, sanad_text_utils.py)
- Server/ - dashboard/API/capture/upload (photo_server.py, capture_service.py, direct_camera_client.py, uploader.py)
- Modes/AI/ - autonomous vision/intent/session manager (autonomous_manager.py, vision_detector.py, camera_module.py)
- Modes/Manual/ - controller + replay + trigger loop (controller.py, replay_engine.py, trigger_loop.py)
Runtime Modes
Mode is persisted in Data/Settings/config.json under mode.current_mode:
- manual
  - Gemini conversation stays available when gemini.mic_enabled=true
  - voice request_photo / yes_photo / no_photo disabled
  - R2+X replay/capture path stays available in the full runtime path
  - Gemini/voice_sanad.py, dashboard server, direct camera server, and replay services stay running
  - autonomous services can be armed in the background, but autonomous flow stays paused until mode becomes ai
- ai
  - voice request_photo / yes_photo / no_photo enabled
  - R2+X replay/capture path still works
  - Gemini/voice_sanad.py, dashboard server, direct camera server, and replay services continue running without restart
  - autonomous flow runs live when AUTONOMOUS_ENABLE=1
  - on stable visual intent, AI identifies or enrolls a single guest, optionally greets with a short hand replay, asks for photo confirmation, guides guests into frame, then captures using the active replay during the shot when AI capture replay is enabled
  - returning guests are recognized from photos/people/ and can be greeted as returning visitors
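The per-mode voice policy above reduces to a small decision function. This is a hypothetical helper, not code from Gemini/voice_sanad.py: the function name and the "chat" intent used in the usage line are illustrative, and treating a disabled microphone as blocking every voice intent is an assumption.

```python
# Hypothetical policy helper; names mirror the README, not a real module in this repo.
PHOTO_INTENTS = frozenset({"request_photo", "yes_photo", "no_photo"})


def voice_intent_allowed(mode: str, intent: str, mic_enabled: bool) -> bool:
    """Decide whether a recognised voice intent should be acted on."""
    if mode not in ("manual", "ai"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not mic_enabled:
        # Assumption: with gemini.mic_enabled=false no voice input reaches the runtime.
        return False
    if intent in PHOTO_INTENTS:
        # Photo voice intents are disabled in manual and enabled in ai.
        return mode == "ai"
    # Other Gemini conversation is treated as available in both modes.
    return True
```

For example, voice_intent_allowed("manual", "yes_photo", True) is False, while the same call in "ai" mode is True.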
Command-mode functionality was extracted from this project and moved to:
G1_Lootah/AI_Command
Remote Controls
- R2+X - replay + photographer talk + capture pipeline
- R2+L1 - global hard cancel safety combo
  - active in runtime loops to cancel pending capture/replay and reset the active interaction
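A minimal sketch of mapping held buttons to these two combos. The button names and action labels are hypothetical, and giving the hard-cancel combo priority when both are held is an assumption drawn from its "safety" role, not confirmed controller behavior.

```python
from typing import AbstractSet, Optional

# Hypothetical button names and action labels.
HARD_CANCEL = frozenset({"R2", "L1"})
REPLAY_CAPTURE = frozenset({"R2", "X"})


def resolve_combo(pressed: AbstractSet[str]) -> Optional[str]:
    """Map the currently held buttons to an action; the safety combo is checked first."""
    if HARD_CANCEL <= pressed:
        return "hard_cancel"
    if REPLAY_CAPTURE <= pressed:
        return "replay_capture"
    return None
```

Checking the cancel combo first means an operator holding R2+L1+X always gets the cancel.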
Mode APIs:
- GET /api/mode
- GET /api/set_mode?mode=manual|ai
- GET /api/mode_policy
- GET /api/mic
- GET /api/set_mic?enabled=0|1
- GET /api/detector_backend
- GET /api/set_detector_backend?backend=normal|yolo
- GET /api/ai_readiness
- GET /api/ai_options
- GET /api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...
- GET /api/autonomous_state
- GET /api/runtime_health
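Because every mode API is a plain GET with query parameters, a small client can script mode switches. The sketch below is hypothetical: the README does not state the dashboard host or port, so BASE_URL is a placeholder, and get_json assumes the endpoints answer with JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder: the dashboard host/port is not documented here.
BASE_URL = "http://127.0.0.1:8080"


def api_url(endpoint: str, **params) -> str:
    """Build a GET URL for a dashboard API endpoint; booleans become 0/1."""
    query = urlencode({k: int(v) if isinstance(v, bool) else v for k, v in params.items()})
    return f"{BASE_URL}/api/{endpoint}" + (f"?{query}" if query else "")


def get_json(endpoint: str, timeout: float = 5.0, **params):
    """Call an endpoint and decode its body (assumes a JSON response)."""
    with urlopen(api_url(endpoint, **params), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: get_json("set_mode", mode="ai") requests /api/set_mode?mode=ai, and api_url("set_ai_options", retake_prompt_enabled=True) encodes the toggle as retake_prompt_enabled=1.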
Autonomous Flow
Autonomous services are armed by environment:
- AUTONOMOUS_ENABLE=1
  - allows Modes/AI/autonomous_manager.py to run inside voice_sanad.py
  - in manual mode it stays paused
  - core services still remain up in manual (voice_sanad.py, dashboard, direct camera server, replay/trigger)
  - switching dashboard mode to ai starts autonomous flow live without a restart
- AUTONOMOUS_ENABLE=0
  - disables the autonomous manager entirely
  - manual trigger loop + voice runtime still work
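The interaction between AUTONOMOUS_ENABLE and the runtime mode gives three outcomes: disabled, paused, or running. A sketch of that gate follows; the function names are hypothetical, and treating an unset variable as disabled is an assumption.

```python
import os
from typing import Mapping, Optional


def autonomous_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when AUTONOMOUS_ENABLE=1; an unset variable is assumed to mean disabled."""
    env = os.environ if env is None else env
    return env.get("AUTONOMOUS_ENABLE", "0").strip() == "1"


def autonomous_status(mode: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return "disabled", "paused", or "running" for the autonomous manager."""
    if not autonomous_enabled(env):
        return "disabled"  # manager never starts; trigger loop and voice still run
    # Armed: runs live in ai mode, stays paused in manual.
    return "running" if mode == "ai" else "paused"
```

Passing an explicit env mapping keeps the gate testable without mutating the process environment.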
Session state machine:
- IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE
- strict readiness block state: IDLE_BLOCKED when required YOLO readiness is not met
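The happy path above can be encoded as a transition table. This sketch includes only the forward steps listed in the README plus the optional RETAKE_CONFIRM skip; how the real manager enters and leaves IDLE_BLOCKED is assumed here, and cancel or decline paths are left out.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


S = SessionState
# Forward path from the README; COUNTDOWN may skip the optional RETAKE_CONFIRM.
TRANSITIONS = {
    S.IDLE: {S.WAIT_CONFIRM},
    S.WAIT_CONFIRM: {S.FRAMING},
    S.FRAMING: {S.COUNTDOWN},
    S.COUNTDOWN: {S.RETAKE_CONFIRM, S.COMPLETE},
    S.RETAKE_CONFIRM: {S.COMPLETE},
    S.COMPLETE: {S.IDLE},
    S.IDLE_BLOCKED: {S.IDLE},
}


def transition(current: SessionState, target: SessionState, yolo_ready: bool = True) -> SessionState:
    """Validate one step; an idle session is held in IDLE_BLOCKED while YOLO readiness is unmet."""
    if current in (S.IDLE, S.IDLE_BLOCKED) and not yolo_ready:
        return S.IDLE_BLOCKED
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

A hard cancel (R2+L1) would sit outside this table and simply reset the session to IDLE.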
Dashboard / API Highlights
- GET /gallery - dashboard
- GET /preview.mjpg - live preview
  - preview is off by default and starts only when requested from the dashboard
  - preview camera/OpenCV is loaded lazily when preview is requested
- Camera control APIs:
  - GET /api/camera_health
  - GET /api/camera_sources
  - GET /api/set_camera_source?source=...
  - GET /api/set_camera_resolution?width=...&height=...&fps=...
  - GET /api/set_preferred_camera?serial=...
  - dashboard can switch camera source, show active camera info, change resolution live, and save a preferred RealSense serial into Data/Settings/config.json
- Audio prompt APIs:
  - GET /api/audio_prompts
  - GET /api/set_audio_prompt_mode?mode=audio|gemini
  - GET /api/set_audio_prompt_fallback?enabled=0|1
  - GET /api/audio_prompt_record_status
  - GET /api/download_audio_prompt?key=...
  - GET /api/delete_audio_prompt?key=...
  - POST /api/upload_audio_prompt
  - POST /api/audio_prompt_record
  - dashboard can upload, replace, download, delete, inspect, and record prerecorded AI prompt clips stored in Data/Audio/
  - dashboard can switch fixed AI situation speech between recorded audio and Gemini without restart
- GET /api/autonomous_state - runtime autonomous state panel data (lock/retake/health fields)
- GET /api/runtime_health - component health (WS/mic/speaker/gate/restarts)
- GET /api/mic and GET /api/set_mic - microphone ON/OFF toggle for both modes
- GET /api/ai_readiness - strict AI readiness + block reason
- GET /api/ai_options and GET /api/set_ai_options - AI option toggles (hard target lock, retake prompt, greeting/capture replay, face recognition)
- Replay APIs:
  - GET /api/replays
  - GET /api/get_replay
  - GET /api/set_replay?name=...
  - GET /api/delete_replay?name=...
  - GET /api/rename_replay?old=...&new=...
  - GET /api/download_replay?name=...
  - GET /api/replay_record_status
  - GET /api/replay_record_start?name=...&seconds=...
  - GET /api/replay_test_status
  - GET /api/test_replay?name=...
  - POST /api/upload_replay
  - dashboard can record new replays into Data/G1, replay-test them, rename them, download them, delete them, upload new .jsonl replays, and set the active replay
  - replay recording and replay test are allowed only while runtime mode is manual
- People APIs:
  - GET /api/people
  - GET /api/person_image?id=...&kind=face|scene
  - GET /api/download_person?id=...
  - GET /api/delete_person?id=...
  - GET /api/reset_people
  - POST /api/upload_person
  - dashboard can upload face photos for future recognition, download a saved guest package, delete one guest, or reset the full registry
- GET /api/capture - capture via unified pipeline
- GET /api/photos, GET /api/sessions, GET /api/delete, GET /api/reupload
- GET /api/errors - structured error counters
Configuration
Source of truth:
Data/Settings/config.json, loaded by Core/settings.py
Environment overrides are supported (timing, ports, upload settings, camera, etc.).
Direct camera serial selection precedence (highest priority first):
- REALSENSE_SERIAL
- PREFERRED_REALSENSE_SERIAL
- Data/Settings/config.json -> camera.preferred_realsense_serial
- teleimager camera config serial
- any other detected RealSense
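The precedence list above can be read as "first configured serial that is actually present wins". A minimal sketch, assuming REALSENSE_SERIAL and PREFERRED_REALSENSE_SERIAL are environment variables and that a higher-priority serial which is not currently detected is skipped rather than treated as an error; the function name and parameters are hypothetical.

```python
import os
from typing import Mapping, Optional, Sequence


def select_realsense_serial(
    detected: Sequence[str],
    config: Mapping,
    teleimager_serial: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Walk the README's precedence order and return the first detected serial."""
    env = os.environ if env is None else env
    candidates = [
        env.get("REALSENSE_SERIAL"),  # assumed environment variable
        env.get("PREFERRED_REALSENSE_SERIAL"),  # assumed environment variable
        config.get("camera", {}).get("preferred_realsense_serial"),
        teleimager_serial,
    ]
    for serial in candidates:
        # Assumption: a serial that is not plugged in falls through to the next source.
        if serial and str(serial) in detected:
            return str(serial)
    # Last resort: any other detected RealSense.
    return detected[0] if detected else None
```

With detected = ["A", "B", "C"] and only the config preferring "C", the result is "C"; if nothing matches, the first detected camera is used.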
Dashboard camera behavior:
- main production dashboard can switch between available camera sources without restarting the runtime
- camera status panel shows requested source, active source, backend, active profile, preferred serial, and active RealSense serial
- resolution changes are applied live through the direct camera service
- Save As Default stores the preferred RealSense serial into Data/Settings/config.json
AI prerecorded prompt behavior:
- Data/Audio/ stores fixed WAV clips by prompt key
- Data/Settings/audio_prompt_records.json stores prompt recording metadata and raw-output file references
- audio_prompts.files in Data/Settings/config.json maps each key to its filename
- audio_prompts.mode controls fixed AI situation speech:
  - audio: use recorded clips first for AI situation prompts
  - gemini: use Gemini speech instead for those same fixed prompts
- audio_prompts.fallback_to_gemini controls whether missing prompt clips fall back to Gemini text
- dashboard prompt library manages upload/download/delete, text-to-record generation, speech mode, and fallback state
- imported prerecorded prompts currently cover 18 situation keys; missing keys continue through Gemini fallback until recorded
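The interplay of audio_prompts.mode, audio_prompts.files, and audio_prompts.fallback_to_gemini can be sketched as one lookup. The defaults used when a key is absent ("audio" mode, fallback on) and the "skip" outcome when no clip exists and fallback is off are assumptions, not documented runtime behavior.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_prompt_source(
    key: str, audio_prompts: Mapping, audio_dir: Path = Path("Data/Audio")
) -> Tuple[str, Optional[Path]]:
    """Pick how a fixed AI situation prompt should be voiced."""
    if audio_prompts.get("mode", "audio") == "gemini":
        return ("gemini", None)
    filename = audio_prompts.get("files", {}).get(key)
    if filename:
        clip = audio_dir / filename
        if clip.is_file():
            return ("audio", clip)
    # No recorded clip for this key: fall back to Gemini only if allowed.
    if audio_prompts.get("fallback_to_gemini", True):
        return ("gemini", None)
    return ("skip", None)  # assumed behavior: stay silent
```

This mirrors the note that missing keys keep going through Gemini fallback until a clip is recorded.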
AI replay behavior:
- vision.autonomous_greeting_replay_enabled controls the short greeting gesture when intent stabilizes
- vision.autonomous_greeting_replay_file selects the greeting replay file
- vision.autonomous_capture_replay_enabled controls whether AI photo capture uses the active replay during the shot
- replay.active_file in Data/Settings/config.json is the single persisted active-replay setting
- the active replay is shared between manual R2+X, dashboard capture choreography, and AI capture when AI capture replay is enabled
- replay inventory is shared across the full Data/G1 tree
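Since replay inventory spans the whole Data/G1 tree and one persisted replay.active_file is shared by every capture path, listing and resolving replays might look like the sketch below. It assumes active_file is stored relative to Data/G1; the real value may be a full path, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Mapping, Optional

REPLAY_ROOT = Path("Data/G1")


def list_replays(root: Path = REPLAY_ROOT) -> List[str]:
    """Every *.jsonl replay under the tree, as sorted POSIX paths relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.jsonl") if p.is_file())


def active_replay_path(config: Mapping, root: Path = REPLAY_ROOT) -> Optional[Path]:
    """Resolve replay.active_file; assumes it is stored relative to Data/G1."""
    name = config.get("replay", {}).get("active_file")
    if not name:
        return None
    path = root / name
    return path if path.is_file() else None
```

Using rglob rather than a flat glob keeps replays in subfolders visible to the inventory.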
Face recognition behavior:
- vision.face_recognition_enabled enables single-guest recognition/enrollment in AI mode
- vision.face_recognition_threshold controls the similarity threshold for matching a returning guest
- new guests are enrolled into photos/people/
- successful AI captures are linked back into the guest folder for future reference
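To show how a similarity threshold separates "returning guest" from "new guest", here is a sketch that compares a face embedding against stored references with cosine similarity. The README only says "similarity threshold"; cosine similarity over fixed-length embeddings, one reference vector per guest, and the function names are all assumptions.

```python
import math
from typing import Mapping, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def match_guest(
    embedding: Sequence[float], registry: Mapping[str, Sequence[float]], threshold: float
) -> Optional[str]:
    """Return the best-matching guest id at or above threshold, else None (enroll as new)."""
    best_id, best_score = None, threshold
    for guest_id, reference in registry.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = guest_id, score
    return best_id
```

Raising the threshold makes false "welcome back" greetings rarer, at the cost of re-enrolling some returning guests.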
Vision model configuration (Data/Settings/config.json -> vision):
- detection_backend: normal or yolo (runtime switchable from dashboard in AI mode)
- yolo_runtime: ultralytics (production) or opencv (legacy ONNX parser)
- yolo_ultralytics_device: inference device for ultralytics (cpu, 0, 0,1, ...)
- person_yolo_onnx: path to YOLO ONNX person model
- face_yolo_onnx: path to YOLO ONNX face model
- group_min_people: minimum people count to mark a group
- group_link_distance_px: max centroid-link distance for group clustering
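group_link_distance_px and group_min_people suggest single-link clustering of person centroids: two people are linked when their centroids are within the distance, and a connected cluster counts as a group once it reaches the minimum size. The sketch below implements that reading with union-find; whether vision_detector.py clusters exactly this way is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def group_clusters(
    centroids: Sequence[Tuple[float, float]], link_distance_px: float, min_people: int
) -> List[List[int]]:
    """Single-link clustering via union-find; returns index clusters with >= min_people members."""
    parent = list(range(len(centroids)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= link_distance_px:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return [members for members in clusters.values() if len(members) >= min_people]
```

Because links chain, three people standing 50 px apart in a row form one group at a 60 px link distance even though the outer two are 100 px apart.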
Documentation
Current_runtime.md: detailed current runtime behavior and script chain.
Data Layout
- Data/Settings/config.json
- Data/Scripts/ - photo_command_ai.txt, sanad_script.txt
- Data/Runtime/ - runtime health/state/error JSON files
- Data/Audio/ - prerecorded AI prompt WAV files
  - matching _raw.wav Gemini output captures
- Data/Settings/audio_prompt_records.json - prompt recording metadata for files in Data/Audio/
Notes
- config.py is intentionally removed; runtime config is JSON + env overrides.
- Legacy AI mover/autonomous prototype scripts were removed from the production tree.
- Generated artifacts (__pycache__, runtime logs) should not be committed.
- Generated runtime files such as Data/Runtime/runtime_health.json, Data/Runtime/autonomous_state.json, Data/Runtime/error_counters.json, Data/Runtime/error_events.jsonl, and Logs/*.log may be absent in a clean checkout until the runtime starts.
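Because these state files may not exist yet, any tool that reads them should tolerate their absence. A minimal sketch with hypothetical helper names; it assumes error_events.jsonl holds one JSON object per line, as the extension suggests, and it does not handle partially written files.

```python
import json
from pathlib import Path
from typing import Any, List


def read_runtime_json(path: Path, default: Any = None) -> Any:
    """Load a generated state file (e.g. runtime_health.json), or return default if absent."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {} if default is None else default


def read_runtime_jsonl(path: Path) -> List[Any]:
    """Load one JSON object per non-blank line (e.g. error_events.jsonl); [] if absent."""
    try:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
    except FileNotFoundError:
        return []
    return [json.loads(line) for line in lines if line.strip()]
```

Returning an empty container instead of raising lets a dashboard or script render a clean checkout without special cases.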