kassam/AI_Photographer

kassam abf0fb5688 Initial project commit

2026-04-12 18:52:37 +04:00

13 KiB

AI Photographer

Production-oriented robot photographer stack for Unitree G1.

Quick Start

Set API key in config:
- edit Data/Settings/config.json -> gemini.api_key
Run launcher:
- cd Scripts && ./photo_sanad.sh
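
The first step can also be scripted. The sketch below assumes Data/Settings/config.json is plain JSON with a top-level gemini object; set_gemini_api_key is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path


def set_gemini_api_key(config_path, api_key):
    """Write api_key into gemini.api_key of a JSON config file (hypothetical helper)."""
    path = Path(config_path)
    # Start from the existing config if present, otherwise from an empty one.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("gemini", {})["api_key"] = api_key
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (not run here): set_gemini_api_key("Data/Settings/config.json", "<your key>")
```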

Launcher behavior:

resets mode.current_mode to the configured mode.default_mode on each launch,
resolves the active PulseAudio speaker/microphone and exports them for the runtime,
starts the direct camera service (Core/direct_camera_service.py in the teleimager conda env) for the full runtime path,
starts Gemini runtime (gemini conda env),
runs Gemini/voice_sanad.py as the main process,
keeps manual as the default startup mode unless changed in Data/Settings/config.json,
keeps Gemini/voice_sanad.py, the dashboard server, the direct camera server, and replay/trigger services running in the full runtime path across mode switches,
arms autonomous services in the full runtime path so switching dashboard mode from manual to ai works without a restart.
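
The mode reset in the first item can be pictured as follows. The real launcher is a shell script; this Python sketch only illustrates the logic, and reset_current_mode is a hypothetical name.

```python
def reset_current_mode(config):
    """Return a copy of config with mode.current_mode reset to mode.default_mode."""
    mode = dict(config.get("mode", {}))
    # "manual" is the documented default startup mode; using it when
    # default_mode is missing is an assumption of this sketch.
    mode["current_mode"] = mode.get("default_mode", "manual")
    return {**config, "mode": mode}
```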

Optional startup profile:

MANUAL_LEAN_RUNTIME=1
- manual-mode voice + dashboard only
- skips direct camera, DDS, replay, uploader, and autonomous startup
- capture/replay services are unavailable in that profile
- this is an explicit reduced profile, not the normal production mode
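
As a rough illustration of the profile switch, the sketch below maps the MANUAL_LEAN_RUNTIME variable to the set of components that would start. The component labels and the planned_components function are illustrative names only; the actual launcher logic lives in Scripts/photo_sanad.sh and may differ.

```python
import os

# Labels are illustrative, not identifiers taken from the project.
LEAN_COMPONENTS = ["voice", "dashboard"]
FULL_ONLY_COMPONENTS = ["direct_camera", "dds", "replay", "uploader", "autonomous"]


def planned_components(env=None):
    """Return the component labels that would start for the given environment."""
    env = os.environ if env is None else env
    if env.get("MANUAL_LEAN_RUNTIME") == "1":
        # Reduced profile: manual-mode voice + dashboard only.
        return list(LEAN_COMPONENTS)
    return LEAN_COMPONENTS + FULL_ONLY_COMPONENTS
```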

Project Layout

Scripts/
- startup and ops shell entrypoints
- photo_sanad.sh, fix_realsense_usb.sh
- direct_camera_samples_server.py remains as a compatibility wrapper
- fix_realsense_usb.sh supports --check, --fix, and --serials
Data/
- categorized runtime assets and state
- Settings/config.json
- Scripts/photo_command_ai.txt, Scripts/sanad_script.txt
- Runtime/upload_db.json, generated runtime JSON state files
- Settings/config.json -> camera.preferred_realsense_serial selects the preferred default RealSense by serial
- generated runtime JSON files are created on demand during execution
Data/Audio/
- fixed prerecorded AI situation prompts (*.wav)
- matching raw Gemini captures (*_raw.wav)
Data/Settings/audio_prompt_records.json
- prompt recording metadata for files in Data/Audio/
photos/people/
- AI face-recognition registry
- each returning guest gets a folder with face/scene references, metadata, and captured-photo links
photos/Captures/
- final saved runtime captures from dashboard, manual trigger, and AI capture flow
photos/samples/
- standalone direct-camera sample captures
Logs/
- one stable log file per runtime component
- examples: voice_sanad.log, gemini_voice.log, photo_server.log, direct_camera.log
Web/
- operator dashboard frontend (gallery.html, gallery.js, style.css)
- direct camera service frontend (direct_camera.html, direct_camera.js, direct_camera.css)
Data/G1/
- replay/gesture motion files (*.jsonl)
- dashboard-recorded replay captures are saved directly here
Core/
- shared runtime foundations (settings.py, Logger.py, error_events.py, direct_camera_service.py)
- direct_camera_service.py handles camera backend/API and serves its UI from Web/
Gemini/
- voice orchestration (voice_sanad.py, gemini_voice.py, sanad_text_utils.py)
Server/
- dashboard/API/capture/upload (photo_server.py, capture_service.py, direct_camera_client.py, uploader.py)
Modes/AI/
- autonomous vision/intent/session manager (autonomous_manager.py, vision_detector.py, camera_module.py)
Modes/Manual/
- controller + replay + trigger loop (controller.py, replay_engine.py, trigger_loop.py)

Runtime Modes

Mode is persisted in Data/Settings/config.json under mode.current_mode:

manual
- Gemini conversation stays available when gemini.mic_enabled=true
- voice request_photo / yes_photo / no_photo disabled
- R2+X replay/capture path stays available in the full runtime path
- Gemini/voice_sanad.py, dashboard server, direct camera server, and replay services stay running
- autonomous services can be armed in the background, but autonomous flow stays paused until mode becomes ai
ai
- voice request_photo / yes_photo / no_photo enabled
- R2+X replay/capture path still works
- Gemini/voice_sanad.py, dashboard server, direct camera server, and replay services continue running without restart
- autonomous flow runs live when AUTONOMOUS_ENABLE=1
- on stable visual intent, AI identifies or enrolls a single guest, optionally greets with a short hand replay, asks for photo confirmation, and guides guests into frame; it then captures the photo, running the active replay during the shot when AI capture replay is enabled
- returning guests are recognized from photos/people/ and can be greeted as returning visitors
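
The differences between the two modes can be summarized as a small policy table. This is a sketch of the rules listed above, not the project's own code; mode_policy and its dictionary keys are hypothetical names.

```python
def mode_policy(mode, autonomous_enable):
    """Summarize which interaction paths are active for a runtime mode."""
    if mode not in ("manual", "ai"):
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        # Voice request_photo / yes_photo / no_photo only work in ai mode.
        "voice_photo_intents": mode == "ai",
        # R2+X replay/capture is available in both modes (full runtime path).
        "r2_x_capture": True,
        # Autonomous flow runs live only in ai mode with AUTONOMOUS_ENABLE=1.
        "autonomous_flow": mode == "ai" and bool(autonomous_enable),
    }
```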

Command-mode functionality was extracted from this project and moved to:

G1_Lootah/AI_Command

Remote Controls

R2+X
- replay + photographer talk + capture pipeline
R2+L1
- global hard cancel safety combo
- active in runtime loops to cancel pending capture/replay and reset active interaction

Mode APIs:

GET /api/mode
GET /api/set_mode?mode=manual|ai
GET /api/mode_policy
GET /api/mic
GET /api/set_mic?enabled=0|1
GET /api/detector_backend
GET /api/set_detector_backend?backend=normal|yolo
GET /api/ai_readiness
GET /api/ai_options
GET /api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...
GET /api/autonomous_state
GET /api/runtime_health
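
A minimal client sketch for these endpoints, using only the Python standard library. BASE_URL (host and port) is an assumption, since this README does not state where the dashboard server listens; api_url and call are hypothetical helpers, and the response body is returned as raw text because its format is not documented here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: adjust to the real dashboard address


def api_url(path, **params):
    """Build a full URL for an API path with optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def call(path, **params):
    """Send a GET request and return the response body as text."""
    with urlopen(api_url(path, **params), timeout=5) as resp:
        return resp.read().decode("utf-8")


# Examples (require a running dashboard server):
#   call("/api/mode")
#   call("/api/set_mode", mode="ai")
#   call("/api/set_mic", enabled=1)
```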

Autonomous Flow

Autonomous services are armed by environment variable:

AUTONOMOUS_ENABLE=1
- allows Modes/AI/autonomous_manager.py to run inside voice_sanad.py
- in manual mode it stays paused
- core services still remain up in manual (voice_sanad.py, dashboard, direct camera server, replay/trigger)
- switching dashboard mode to ai starts autonomous flow live without a restart
AUTONOMOUS_ENABLE=0
- disables autonomous manager entirely
- manual trigger loop + voice runtime still work

Session state machine:

IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE
strict readiness block state: IDLE_BLOCKED when required YOLO readiness is not met
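
The state chain above can be written down as a transition table. This sketch encodes only the documented edges; the IDLE <-> IDLE_BLOCKED edges are an assumption about how the readiness block is entered and left, and cancel paths (such as R2+L1) are not modeled. State and can_transition are hypothetical names.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


TRANSITIONS = {
    # IDLE -> IDLE_BLOCKED and back is an assumption (readiness not met / met again).
    State.IDLE: {State.WAIT_CONFIRM, State.IDLE_BLOCKED},
    State.IDLE_BLOCKED: {State.IDLE},
    State.WAIT_CONFIRM: {State.FRAMING},
    State.FRAMING: {State.COUNTDOWN},
    # RETAKE_CONFIRM is optional, so COUNTDOWN may go straight to COMPLETE.
    State.COUNTDOWN: {State.RETAKE_CONFIRM, State.COMPLETE},
    State.RETAKE_CONFIRM: {State.COMPLETE},
    State.COMPLETE: {State.IDLE},
}


def can_transition(src, dst):
    """Return True if dst is a documented (or assumed) successor of src."""
    return dst in TRANSITIONS[src]
```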

Dashboard / API Highlights

GET /
- gallery dashboard
GET /preview.mjpg
- live preview
- preview is off by default and starts only when requested from the dashboard
- preview camera/OpenCV is loaded lazily when preview is requested
Camera control APIs:
- GET /api/camera_health
- GET /api/camera_sources
- GET /api/set_camera_source?source=...
- GET /api/set_camera_resolution?width=...&height=...&fps=...
- GET /api/set_preferred_camera?serial=...
- dashboard can switch camera source, show active camera info, change resolution live, and save a preferred RealSense serial into Data/Settings/config.json
Audio prompt APIs:
- GET /api/audio_prompts
- GET /api/set_audio_prompt_mode?mode=audio|gemini
- GET /api/set_audio_prompt_fallback?enabled=0|1
- GET /api/audio_prompt_record_status
- GET /api/download_audio_prompt?key=...
- GET /api/delete_audio_prompt?key=...
- POST /api/upload_audio_prompt
- POST /api/audio_prompt_record
- dashboard can upload, replace, download, delete, inspect, and record prerecorded AI prompt clips stored in Data/Audio/
- dashboard can switch fixed AI situation speech between recorded audio and Gemini without restart
GET /api/autonomous_state
- runtime autonomous state panel data (lock/retake/health fields)
GET /api/runtime_health
- component health (WS/mic/speaker/gate/restarts)
GET /api/mic and GET /api/set_mic
- microphone ON/OFF toggle for both modes
GET /api/ai_readiness
- strict AI readiness + block reason
GET /api/ai_options and GET /api/set_ai_options
- AI option toggles: hard target lock, retake prompt, greeting/capture replay, and face recognition
Replay APIs:
- GET /api/replays
- GET /api/get_replay
- GET /api/set_replay?name=...
- GET /api/delete_replay?name=...
- GET /api/rename_replay?old=...&new=...
- GET /api/download_replay?name=...
- GET /api/replay_record_status
- GET /api/replay_record_start?name=...&seconds=...
- GET /api/replay_test_status
- GET /api/test_replay?name=...
- POST /api/upload_replay
- dashboard can record new replays into Data/G1, replay-test them, rename them, download them, delete them, upload new .jsonl replays, and set the active replay
- replay recording and replay test are allowed only while runtime mode is manual
People APIs:
- GET /api/people
- GET /api/person_image?id=...&kind=face|scene
- GET /api/download_person?id=...
- GET /api/delete_person?id=...
- GET /api/reset_people
- POST /api/upload_person
- dashboard can upload face photos for future recognition, download a saved guest package, delete one guest, or reset the full registry
GET /api/capture
- capture via unified pipeline
GET /api/photos, GET /api/sessions, GET /api/delete, GET /api/reupload
GET /api/errors
- structured error counters

Configuration

Source of truth:

Data/Settings/config.json loaded by Core/settings.py

Environment overrides are supported (timing, ports, upload settings, camera, etc.).

Direct camera serial selection precedence:

REALSENSE_SERIAL
PREFERRED_REALSENSE_SERIAL
Data/Settings/config.json -> camera.preferred_realsense_serial
teleimager camera config serial
any other detected RealSense
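
The precedence list can be expressed as a short lookup. This sketch assumes the first non-empty value wins and does not check that the chosen serial is actually connected, which the real service may do; select_realsense_serial and its parameters are hypothetical names.

```python
import os


def select_realsense_serial(config, teleimager_serial=None, detected_serials=(), env=None):
    """Pick a RealSense serial following the documented precedence order."""
    env = os.environ if env is None else env
    candidates = [
        env.get("REALSENSE_SERIAL"),
        env.get("PREFERRED_REALSENSE_SERIAL"),
        config.get("camera", {}).get("preferred_realsense_serial"),
        teleimager_serial,
    ]
    for serial in candidates:
        if serial:
            return serial
    # Last resort: any other detected RealSense, or None if there is none.
    return detected_serials[0] if detected_serials else None
```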

Dashboard camera behavior:

main production dashboard can switch between available camera sources without restarting the runtime
camera status panel shows requested source, active source, backend, active profile, preferred serial, and active RealSense serial
resolution changes are applied live through the direct camera service
Save As Default stores the preferred RealSense serial into Data/Settings/config.json

AI prerecorded prompt behavior:

Data/Audio/ stores fixed WAV clips by prompt key
Data/Settings/audio_prompt_records.json stores prompt recording metadata and raw-output file references
audio_prompts.files in Data/Settings/config.json maps each key to its filename
audio_prompts.mode controls fixed AI situation speech:
- audio: use recorded clips first for AI situation prompts
- gemini: use Gemini speech instead for those same fixed prompts
audio_prompts.fallback_to_gemini controls whether missing prompt clips fall back to Gemini text
dashboard prompt library manages upload/download/delete, text-to-record generation, speech mode, and fallback state
imported prerecorded prompts currently cover 18 situation keys; missing keys continue through Gemini fallback until recorded
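
How these settings interact for a single prompt key can be sketched as below. The function takes the audio_prompts object from the config as a dict; resolve_prompt and the "skip" result (what happens when a clip is missing and fallback is off) are assumptions of this sketch, not documented behavior.

```python
from pathlib import Path


def resolve_prompt(key, audio_prompts, audio_dir="Data/Audio"):
    """Return ("audio", path), ("gemini", key), or ("skip", key) for a prompt key."""
    if audio_prompts.get("mode") == "gemini":
        return ("gemini", key)
    # mode == "audio": use the recorded clip first, if it is mapped and present.
    filename = audio_prompts.get("files", {}).get(key)
    clip = Path(audio_dir) / filename if filename else None
    if clip is not None and clip.is_file():
        return ("audio", str(clip))
    if audio_prompts.get("fallback_to_gemini"):
        return ("gemini", key)
    return ("skip", key)  # assumption: nothing is spoken without a clip or fallback
```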

AI replay behavior:

vision.autonomous_greeting_replay_enabled controls the short greeting gesture when intent stabilizes
vision.autonomous_greeting_replay_file selects the greeting replay file
vision.autonomous_capture_replay_enabled controls whether AI photo capture uses the active replay during the shot
replay.active_file in Data/Settings/config.json is the single persisted active-replay setting
the active replay is shared between manual R2+X, dashboard capture choreography, and AI capture when AI capture replay is enabled
replay inventory is shared across the full Data/G1 tree
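
Since the replay inventory spans the full Data/G1 tree, listing it amounts to a recursive search for *.jsonl files. A minimal sketch, with list_replays as a hypothetical helper; how the project itself names or sorts replays is not specified here.

```python
from pathlib import Path


def list_replays(root="Data/G1"):
    """Return sorted replay paths (relative to root) found anywhere under root."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*.jsonl"))
```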

Face recognition behavior:

vision.face_recognition_enabled enables single-guest recognition/enrollment in AI mode
vision.face_recognition_threshold controls the similarity threshold for matching a returning guest
new guests are enrolled into photos/people/
successful AI captures are linked back into the guest folder for future reference
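
To illustrate how a similarity threshold separates returning guests from new ones, the sketch below compares a face embedding against stored references using cosine similarity. Both the use of embeddings and the cosine metric are assumptions; the README does not say which representation or metric the project uses, and match_guest is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_guest(embedding, registry, threshold):
    """Return the best-matching guest id, or None if no score reaches threshold."""
    best_id, best_score = None, -1.0
    for guest_id, reference in registry.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_id, best_score = guest_id, score
    # Below the threshold the guest is treated as new (to be enrolled).
    return best_id if best_score >= threshold else None
```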

Vision model configuration (Data/Settings/config.json -> vision):

detection_backend: normal or yolo (runtime switchable from dashboard in AI mode)
yolo_runtime: ultralytics (production) or opencv (legacy ONNX parser)
yolo_ultralytics_device: inference device for ultralytics (cpu, 0, 0,1, ...)
person_yolo_onnx: path to YOLO ONNX person model
face_yolo_onnx: path to YOLO ONNX face model
group_min_people: minimum people count to mark a group
group_link_distance_px: max centroid-link distance for group clustering
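
The last two settings suggest a clustering step: people whose centroids lie within group_link_distance_px of each other are linked, and a linked cluster counts as a group once it reaches group_min_people. The sketch below uses single-linkage clustering with union-find as one plausible reading; the project's actual algorithm is not described here, and find_groups is a hypothetical name.

```python
import math


def find_groups(centroids, group_min_people, group_link_distance_px):
    """Return lists of centroid indices that form groups under the two settings."""
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of people whose centroids are close enough.
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= group_link_distance_px:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return [members for members in clusters.values() if len(members) >= group_min_people]
```

Note that linking is transitive: two people farther apart than the link distance still end up in the same group if a third person stands between them.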

Documentation

Current_runtime.md: detailed current runtime behavior and script chain.

Data Layout

Data/Settings/
- config.json
Data/Scripts/
- photo_command_ai.txt
- sanad_script.txt
Data/Runtime/
- runtime health/state/error JSON files
Data/Audio/
- prerecorded AI prompt WAV files
- matching _raw.wav Gemini output captures
Data/Settings/audio_prompt_records.json
- prompt recording metadata for files in Data/Audio/

Notes

config.py was intentionally removed; runtime config is JSON + env overrides.
Legacy AI mover/autonomous prototype scripts were removed from the production tree.
Generated artifacts (__pycache__, runtime logs) should not be committed.
Generated runtime files such as Data/Runtime/runtime_health.json, Data/Runtime/autonomous_state.json, Data/Runtime/error_counters.json, Data/Runtime/error_events.jsonl, and Logs/*.log may be absent in a clean checkout until the runtime starts.
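
Tools that read these generated files should therefore tolerate their absence. A minimal sketch, assuming the *.json files hold a single JSON document; read_runtime_json is a hypothetical helper.

```python
import json
from pathlib import Path


def read_runtime_json(path, default=None):
    """Load a generated runtime JSON file, returning default if it does not exist yet."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        # Expected in a clean checkout before the runtime has started.
        return default


# Example (not run here): read_runtime_json("Data/Runtime/runtime_health.json", default={})
```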
