# AI Photographer

Production-oriented robot photographer stack for Unitree G1.

## Quick Start

1. Set the API key in config:
   - edit `Data/Settings/config.json` -> `gemini.api_key`
2. Run the launcher:
   - `cd Scripts && ./photo_sanad.sh`

Launcher behavior:

- resets `mode.current_mode` to the configured `mode.default_mode` on each launch (`manual` unless changed in `Data/Settings/config.json`),
- resolves the active PulseAudio speaker/microphone and exports them for the runtime,
- starts the direct camera service (`Core/direct_camera_service.py` in the `teleimager` conda env) for the full runtime path,
- starts the Gemini runtime (`gemini` conda env),
- runs `Gemini/voice_sanad.py` as the main process,
- keeps `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and the replay/trigger services running across mode switches in the full runtime path,
- arms autonomous services in the full runtime path so switching the dashboard mode from `manual` to `ai` works without a restart.
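The per-launch mode reset amounts to a small rewrite of `config.json`. Below is a minimal Python sketch of that step, assuming `mode.current_mode` and `mode.default_mode` are nested keys under a top-level `mode` object. `reset_mode_to_default` is a hypothetical helper for illustration, not code taken from `photo_sanad.sh`.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def reset_mode_to_default(config_path: Path) -> str:
    """Hypothetical helper: copy mode.default_mode into mode.current_mode.

    Assumes both keys live under a top-level "mode" object and falls back
    to "manual" (the documented default) when default_mode is missing.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    mode = config.setdefault("mode", {})
    default = mode.get("default_mode", "manual")
    mode["current_mode"] = default

    # Write to a temp file in the same directory, then swap it in with
    # os.replace so readers never observe a half-written config.json.
    fd, tmp_name = tempfile.mkstemp(dir=config_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(config, tmp, indent=2)
        tmp.write("\n")
    # mkstemp creates the file as 0600; keep the original file's permissions.
    shutil.copymode(config_path, tmp_name)
    os.replace(tmp_name, config_path)
    return default
```

Called from the project root as `reset_mode_to_default(Path("Data/Settings/config.json"))`, it returns the restored mode. The temp-file-plus-`os.replace` pattern keeps the swap atomic on POSIX filesystems, so a dashboard reading the config mid-launch sees either the old or the new file, never a truncated one.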
Optional startup profile:

- `MANUAL_LEAN_RUNTIME=1`
  - manual-mode voice + dashboard only
  - skips direct camera, DDS, replay, uploader, and autonomous startup
  - capture/replay services are unavailable in this profile
  - this is an explicit reduced profile, not the normal production mode

## Project Layout

- `Scripts/`
  - startup and ops shell entrypoints
  - `photo_sanad.sh`, `fix_realsense_usb.sh`
  - `direct_camera_samples_server.py` remains as a compatibility wrapper
  - `fix_realsense_usb.sh` supports `--check`, `--fix`, and `--serials`
- `Data/`
  - categorized runtime assets and state
  - `Settings/config.json`
  - `Scripts/photo_command_ai.txt`, `Scripts/sanad_script.txt`
  - `Runtime/upload_db.json`, generated runtime JSON state files
  - `Settings/config.json -> camera.preferred_realsense_serial` selects the preferred default RealSense by serial
  - generated runtime JSON files are created on demand during execution
- `Data/Audio/`
  - fixed prerecorded AI situation prompts (`*.wav`)
  - matching raw Gemini captures (`*_raw.wav`)
- `Data/Settings/audio_prompt_records.json`
  - prompt recording metadata for files in `Data/Audio/`
- `photos/people/`
  - AI face-recognition registry
  - each returning guest gets a folder with face/scene references, metadata, and captured-photo links
- `photos/Captures/`
  - final saved runtime captures from the dashboard, manual trigger, and AI capture flow
- `photos/samples/`
  - standalone direct-camera sample captures
- `Logs/`
  - one stable log file per runtime component
  - examples: `voice_sanad.log`, `gemini_voice.log`, `photo_server.log`, `direct_camera.log`
- `Web/`
  - operator dashboard frontend (`gallery.html`, `gallery.js`, `style.css`)
  - direct camera service frontend (`direct_camera.html`, `direct_camera.js`, `direct_camera.css`)
- `Data/G1/`
  - replay/gesture motion files (`*.jsonl`)
  - dashboard-recorded replay captures are saved directly here
- `Core/`
  - shared runtime foundations (`settings.py`, `Logger.py`, `error_events.py`, `direct_camera_service.py`)
  - `direct_camera_service.py` handles the camera backend/API and serves its UI from `Web/`
- `Gemini/`
  - voice orchestration (`voice_sanad.py`, `gemini_voice.py`, `sanad_text_utils.py`)
- `Server/`
  - dashboard/API/capture/upload (`photo_server.py`, `capture_service.py`, `direct_camera_client.py`, `uploader.py`)
- `Modes/AI/`
  - autonomous vision/intent/session manager (`autonomous_manager.py`, `vision_detector.py`, `camera_module.py`)
- `Modes/Manual/`
  - controller + replay + trigger loop (`controller.py`, `replay_engine.py`, `trigger_loop.py`)

## Runtime Modes

Mode is persisted in `Data/Settings/config.json` under `mode.current_mode`:

- `manual`
  - Gemini conversation stays available when `gemini.mic_enabled=true`
  - voice `request_photo` / `yes_photo` / `no_photo` handling is disabled
  - the `R2+X` replay/capture path stays available in the full runtime path
  - `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and replay services stay running
  - autonomous services can be armed in the background, but the autonomous flow stays paused until the mode becomes `ai`
- `ai`
  - voice `request_photo` / `yes_photo` / `no_photo` handling is enabled
  - the `R2+X` replay/capture path still works
  - `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and replay services continue running without a restart
  - the autonomous flow runs live when `AUTONOMOUS_ENABLE=1`
  - on stable visual intent, the AI identifies or enrolls a single guest, optionally greets with a short hand replay, asks for photo confirmation, guides guests into frame, and then captures; when AI capture replay is enabled, the active replay plays during the shot
  - returning guests are recognized from `photos/people/` and can be greeted as returning visitors

Command-mode functionality was extracted from this project and moved to:

- `G1_Lootah/AI_Command`

## Remote Controls

- `R2+X`
  - replay + photographer talk + capture pipeline
- `R2+L1`
  - global hard-cancel safety combo
  - active in runtime loops to cancel pending capture/replay and reset the active interaction

Mode APIs:

- `GET /api/mode`
- `GET /api/set_mode?mode=manual|ai`
- `GET /api/mode_policy`
- `GET /api/mic`
- `GET /api/set_mic?enabled=0|1`
- `GET /api/detector_backend`
- `GET /api/set_detector_backend?backend=normal|yolo`
- `GET /api/ai_readiness`
- `GET /api/ai_options`
- `GET /api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...`
- `GET /api/autonomous_state`
- `GET /api/runtime_health`

## Autonomous Flow

Autonomous services are armed by environment:

- `AUTONOMOUS_ENABLE=1`
  - allows `Modes/AI/autonomous_manager.py` to run inside `voice_sanad.py`
  - in `manual` mode it stays paused
  - core services still remain up in `manual` (`voice_sanad.py`, dashboard, direct camera server, replay/trigger)
  - switching the dashboard mode to `ai` starts the autonomous flow live without a restart
- `AUTONOMOUS_ENABLE=0`
  - disables the autonomous manager entirely
  - the manual trigger loop and voice runtime still work

Session state machine:

- `IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE`
- strict readiness block state: `IDLE_BLOCKED` when required YOLO readiness is not met

## Dashboard / API Highlights

- `GET /` gallery dashboard
- `GET /preview.mjpg` live preview
  - preview is off by default and starts only when requested from the dashboard
  - the preview camera/OpenCV is loaded lazily when preview is requested
- Camera control APIs:
  - `GET /api/camera_health`
  - `GET /api/camera_sources`
  - `GET /api/set_camera_source?source=...`
  - `GET /api/set_camera_resolution?width=...&height=...&fps=...`
  - `GET /api/set_preferred_camera?serial=...`
  - the dashboard can switch camera source, show active camera info, change resolution live, and save a preferred RealSense serial into `Data/Settings/config.json`
- Audio prompt APIs:
  - `GET /api/audio_prompts`
  - `GET /api/set_audio_prompt_mode?mode=audio|gemini`
  - `GET /api/set_audio_prompt_fallback?enabled=0|1`
  - `GET /api/audio_prompt_record_status`
  - `GET /api/download_audio_prompt?key=...`
  - `GET /api/delete_audio_prompt?key=...`
  - `POST /api/upload_audio_prompt`
  - `POST /api/audio_prompt_record`
  - the dashboard can upload, replace, download, delete, inspect, and record prerecorded AI prompt clips stored in `Data/Audio/`
  - the dashboard can switch fixed AI situation speech between recorded audio and Gemini without a restart
- `GET /api/autonomous_state` runtime autonomous state panel data (lock/retake/health fields)
- `GET /api/runtime_health` component health (WS/mic/speaker/gate/restarts)
- `GET /api/mic` and `GET /api/set_mic` microphone ON/OFF toggle for both modes
- `GET /api/ai_readiness` strict AI readiness + block reason
- `GET /api/ai_options` and `GET /api/set_ai_options` for the hard-lock, retake, greeting/capture replay, and face-recognition options
- Replay APIs:
  - `GET /api/replays`
  - `GET /api/get_replay`
  - `GET /api/set_replay?name=...`
  - `GET /api/delete_replay?name=...`
  - `GET /api/rename_replay?old=...&new=...`
  - `GET /api/download_replay?name=...`
  - `GET /api/replay_record_status`
  - `GET /api/replay_record_start?name=...&seconds=...`
  - `GET /api/replay_test_status`
  - `GET /api/test_replay?name=...`
  - `POST /api/upload_replay`
  - the dashboard can record new replays into `Data/G1`, replay-test, rename, download, and delete them, upload new `.jsonl` replays, and set the active replay
  - replay recording and replay testing are allowed only while the runtime mode is `manual`
- People APIs:
  - `GET /api/people`
  - `GET /api/person_image?id=...&kind=face|scene`
  - `GET /api/download_person?id=...`
  - `GET /api/delete_person?id=...`
  - `GET /api/reset_people`
  - `POST /api/upload_person`
  - the dashboard can upload face photos for future recognition, download a saved guest package, delete one guest, or reset the full registry
- `GET /api/capture` capture via the unified pipeline
- `GET /api/photos`, `GET /api/sessions`, `GET /api/delete`, `GET /api/reupload`
- `GET /api/errors` structured error counters

## Configuration

Source of truth:

- `Data/Settings/config.json`, loaded by `Core/settings.py`

Environment overrides are supported (timing, ports, upload settings, camera, etc.).

Direct camera serial selection precedence (highest first):

1. `REALSENSE_SERIAL`
2. `PREFERRED_REALSENSE_SERIAL`
3. `Data/Settings/config.json -> camera.preferred_realsense_serial`
4. teleimager camera config serial
5. any other detected RealSense

Dashboard camera behavior:

- the main production dashboard can switch between available camera sources without restarting the runtime
- the camera status panel shows the requested source, active source, backend, active profile, preferred serial, and active RealSense serial
- resolution changes are applied live through the direct camera service
- `Save As Default` stores the preferred RealSense serial into `Data/Settings/config.json`

AI prerecorded prompt behavior:

- `Data/Audio/` stores fixed WAV clips by prompt key
- `Data/Settings/audio_prompt_records.json` stores prompt recording metadata and raw-output file references
- `audio_prompts.files` in `Data/Settings/config.json` maps each key to its filename
- `audio_prompts.mode` controls fixed AI situation speech:
  - `audio`: use recorded clips first for AI situation prompts
  - `gemini`: use Gemini speech instead for those same fixed prompts
- `audio_prompts.fallback_to_gemini` controls whether missing prompt clips fall back to Gemini text
- the dashboard prompt library manages upload/download/delete, text-to-record generation, speech mode, and fallback state
- imported prerecorded prompts currently cover 18 situation keys; missing keys go through the Gemini fallback until they are recorded

AI replay behavior:

- `vision.autonomous_greeting_replay_enabled` controls the short greeting gesture when intent stabilizes
- `vision.autonomous_greeting_replay_file` selects the greeting replay file
- `vision.autonomous_capture_replay_enabled` controls whether AI photo capture uses the active replay during the shot
- `replay.active_file` in `Data/Settings/config.json` is the single persisted active-replay setting
- the active replay is shared between manual `R2+X`, dashboard capture choreography, and AI capture when AI capture replay is enabled
- replay inventory is shared across the full `Data/G1` tree

Face recognition behavior:

- `vision.face_recognition_enabled` enables single-guest recognition/enrollment in AI mode
- `vision.face_recognition_threshold` controls the similarity threshold for matching a returning guest
- new guests are enrolled into `photos/people/`
- successful AI captures are linked back into the guest folder for future reference

Vision model configuration (`Data/Settings/config.json` -> `vision`):

- `detection_backend`: `normal` or `yolo` (switchable at runtime from the dashboard in AI mode)
- `yolo_runtime`: `ultralytics` (production) or `opencv` (legacy ONNX parser)
- `yolo_ultralytics_device`: inference device for ultralytics (`cpu`, `0`, `0,1`, ...)
- `person_yolo_onnx`: path to the YOLO ONNX person model
- `face_yolo_onnx`: path to the YOLO ONNX face model
- `group_min_people`: minimum people count to mark a group
- `group_link_distance_px`: maximum centroid-link distance (in pixels) for group clustering

## Documentation

- `Current_runtime.md`: detailed current runtime behavior and script chain.

## Data Layout

- `Data/Settings/`
  - `config.json`
- `Data/Scripts/`
  - `photo_command_ai.txt`
  - `sanad_script.txt`
- `Data/Runtime/`
  - runtime health/state/error JSON files
- `Data/Audio/`
  - prerecorded AI prompt WAV files
  - matching `_raw.wav` Gemini output captures
- `Data/Settings/audio_prompt_records.json`
  - prompt recording metadata for files in `Data/Audio/`

## Notes

- `config.py` is intentionally removed; runtime config is JSON + env overrides.
- Legacy AI mover/autonomous prototype scripts were removed from the production tree.
- Generated artifacts (`__pycache__`, runtime logs) should not be committed.
- Generated runtime files such as `Data/Runtime/runtime_health.json`, `Data/Runtime/autonomous_state.json`, `Data/Runtime/error_counters.json`, `Data/Runtime/error_events.jsonl`, and `Logs/*.log` may be absent in a clean checkout until the runtime starts.
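Since these generated files may be missing in a clean checkout, any tool that inspects them should treat an absent file as "no state yet" rather than an error. A minimal Python sketch under that assumption; `load_runtime_state` and `load_error_events` are hypothetical helpers, and the runtime's own readers in `Core/` may behave differently. It also assumes `error_events.jsonl` holds one JSON object per line, as the `.jsonl` extension suggests.

```python
import json
from pathlib import Path
from typing import Any


def load_runtime_state(path: Path, default: Any = None) -> Any:
    """Hypothetical reader for files like Data/Runtime/runtime_health.json.

    Returns `default` when the file does not exist yet (runtime never
    started) or is empty/partially written.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return default
    if not text.strip():
        return default
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return default


def load_error_events(path: Path) -> list[dict]:
    """Hypothetical reader for a JSON Lines file such as error_events.jsonl.

    Returns [] when the file is absent; blank or malformed lines are skipped.
    """
    if not path.exists():
        return []
    events = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue
    return events
```

Returning a default instead of raising lets a status script run against a fresh checkout and simply report that no runtime state exists yet.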