
# AI Photographer

Production-oriented robot photographer stack for Unitree G1.

## Quick Start

1. Set the Gemini API key in the config:
   - edit `Data/Settings/config.json` -> `gemini.api_key`
2. Run the launcher:
   - `cd Scripts && ./photo_sanad.sh`
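
Setting the key can also be scripted. The following is a minimal sketch, assuming `config.json` is a plain JSON object with a nested `gemini` section; the helper name `set_gemini_api_key` is hypothetical and not part of this project.

```python
import json
from pathlib import Path


def set_gemini_api_key(config_path, api_key):
    """Write api_key into the `gemini.api_key` field of a JSON config file."""
    path = Path(config_path)
    # Start from the existing file (if any) so other settings are preserved.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("gemini", {})["api_key"] = api_key
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example, run from the repository root (the key is a placeholder):
# set_gemini_api_key("Data/Settings/config.json", "YOUR_API_KEY")
```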

Launcher behavior:
- resets `mode.current_mode` to the configured `mode.default_mode` on each launch
- resolves the active PulseAudio speaker/microphone and exports them for the runtime
- starts the direct camera service (`Core/direct_camera_service.py` in the `teleimager` conda env) for the full runtime path
- starts the Gemini runtime (`gemini` conda env)
- runs `Gemini/voice_sanad.py` as the main process
- keeps `manual` as the default startup mode unless `mode.default_mode` is changed in `Data/Settings/config.json`
- keeps `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and the replay/trigger services running across mode switches in the full runtime path
- arms autonomous services in the full runtime path so that switching the dashboard mode from `manual` to `ai` works without a restart

Optional startup profile:
- `MANUAL_LEAN_RUNTIME=1`
  - manual-mode voice + dashboard only
  - skips direct camera, DDS, replay, uploader, and autonomous startup
  - capture/replay services are unavailable in that profile
  - this is an explicit reduced profile, not the normal production mode
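
If the launcher is started from another Python process, the lean profile can be selected by setting the variable in the child environment. This is only a sketch: `lean_runtime_env` and `launch_lean` are hypothetical helpers, and the only assumption about the launcher is that it reads `MANUAL_LEAN_RUNTIME` from its environment as described above.

```python
import os
import subprocess


def lean_runtime_env(base_env=None):
    """Return a copy of the environment with the lean profile switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["MANUAL_LEAN_RUNTIME"] = "1"
    return env


def launch_lean(scripts_dir="Scripts"):
    """Run the launcher from its own directory with the lean profile set."""
    return subprocess.run(
        ["./photo_sanad.sh"],
        cwd=scripts_dir,
        env=lean_runtime_env(),
        check=False,
    )
```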

## Project Layout

- `Scripts/`
  - startup and ops shell entrypoints
  - `photo_sanad.sh`, `fix_realsense_usb.sh`
  - `direct_camera_samples_server.py` remains as a compatibility wrapper
  - `fix_realsense_usb.sh` supports `--check`, `--fix`, and `--serials`
- `Data/`
  - categorized runtime assets and state
  - `Settings/config.json`
  - `Scripts/photo_command_ai.txt`, `Scripts/sanad_script.txt`
  - `Runtime/upload_db.json`, generated runtime JSON state files
  - `Settings/config.json -> camera.preferred_realsense_serial` selects the preferred default RealSense by serial
  - generated runtime JSON files are created on demand during execution
- `Data/Audio/`
  - fixed prerecorded AI situation prompts (`*.wav`)
  - matching raw Gemini captures (`*_raw.wav`)
- `Data/Settings/audio_prompt_records.json`
  - prompt recording metadata for files in `Data/Audio/`
- `photos/people/`
  - AI face-recognition registry
  - each returning guest gets a folder with face/scene references, metadata, and captured-photo links
- `photos/Captures/`
  - final saved runtime captures from dashboard, manual trigger, and AI capture flow
- `photos/samples/`
  - standalone direct-camera sample captures
- `Logs/`
  - one stable log file per runtime component
  - examples: `voice_sanad.log`, `gemini_voice.log`, `photo_server.log`, `direct_camera.log`
- `Web/`
  - operator dashboard frontend (`gallery.html`, `gallery.js`, `style.css`)
  - direct camera service frontend (`direct_camera.html`, `direct_camera.js`, `direct_camera.css`)
- `Data/G1/`
  - replay/gesture motion files (`*.jsonl`)
  - dashboard-recorded replay captures are saved directly here
- `Core/`
  - shared runtime foundations (`settings.py`, `Logger.py`, `error_events.py`, `direct_camera_service.py`)
  - `direct_camera_service.py` handles camera backend/API and serves its UI from `Web/`
- `Gemini/`
  - voice orchestration (`voice_sanad.py`, `gemini_voice.py`, `sanad_text_utils.py`)
- `Server/`
  - dashboard/API/capture/upload (`photo_server.py`, `capture_service.py`, `direct_camera_client.py`, `uploader.py`)
- `Modes/AI/`
  - autonomous vision/intent/session manager (`autonomous_manager.py`, `vision_detector.py`, `camera_module.py`)
- `Modes/Manual/`
  - controller + replay + trigger loop (`controller.py`, `replay_engine.py`, `trigger_loop.py`)

## Runtime Modes

Mode is persisted in `Data/Settings/config.json` under `mode.current_mode`:
- `manual`
  - Gemini conversation stays available when `gemini.mic_enabled=true`
  - voice `request_photo / yes_photo / no_photo` disabled
  - `R2+X` replay/capture path stays available in the full runtime path
  - `Gemini/voice_sanad.py`, dashboard server, direct camera server, and replay services stay running
  - autonomous services can be armed in the background, but autonomous flow stays paused until mode becomes `ai`
- `ai`
  - voice `request_photo / yes_photo / no_photo` enabled
  - `R2+X` replay/capture path still works
  - `Gemini/voice_sanad.py`, dashboard server, direct camera server, and replay services continue running without restart
  - autonomous flow runs live when `AUTONOMOUS_ENABLE=1`
  - on stable visual intent, the AI flow identifies or enrolls a single guest, optionally greets them with a short hand replay, asks for photo confirmation, and guides guests into frame; it then captures the photo, playing the active replay during the shot when AI capture replay is enabled
  - returning guests are recognized from `photos/people/` and can be greeted as returning visitors
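
The differences between the two modes can be summarized as a small lookup. This is an illustrative sketch only: the function name and dictionary keys are hypothetical and do not describe the response of `GET /api/mode_policy`. The replay record/test restriction is the one listed under the Replay APIs.

```python
def mode_policy(mode, autonomous_enable=True):
    """Summarize which features are active in a given runtime mode."""
    if mode not in ("manual", "ai"):
        raise ValueError(f"unknown mode: {mode!r}")
    is_ai = mode == "ai"
    return {
        # voice request_photo / yes_photo / no_photo
        "voice_photo_intents": is_ai,
        # R2+X replay/capture path (full runtime path) works in both modes
        "r2_x_replay_capture": True,
        # autonomous flow runs live only in `ai` with AUTONOMOUS_ENABLE=1
        "autonomous_flow_live": is_ai and autonomous_enable,
        # replay recording and replay test are allowed only in `manual`
        "replay_record_and_test": not is_ai,
    }
```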

Command-mode functionality was extracted from this project and moved to:
- `G1_Lootah/AI_Command`

## Remote Controls

- `R2+X`
  - replay + photographer talk + capture pipeline
- `R2+L1`
  - global hard cancel safety combo
  - active in runtime loops to cancel pending capture/replay and reset active interaction

Mode APIs:
- `GET /api/mode`
- `GET /api/set_mode?mode=manual|ai`
- `GET /api/mode_policy`
- `GET /api/mic`
- `GET /api/set_mic?enabled=0|1`
- `GET /api/detector_backend`
- `GET /api/set_detector_backend?backend=normal|yolo`
- `GET /api/ai_readiness`
- `GET /api/ai_options`
- `GET /api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...`
- `GET /api/autonomous_state`
- `GET /api/runtime_health`
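
Because these endpoints are plain `GET` requests, they can be called from any HTTP client. The sketch below uses only the Python standard library. The base URL is a placeholder (this README does not state the dashboard host or port), the helper names are hypothetical, and no response format is assumed, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder address; use the host/port of your dashboard server.
BASE_URL = "http://127.0.0.1:8000"


def api_url(path, base_url=BASE_URL, **params):
    """Build a dashboard API URL, encoding any query parameters."""
    query = urlencode(params)
    return f"{base_url}{path}" + (f"?{query}" if query else "")


def set_mode_url(mode, base_url=BASE_URL):
    """Build the URL that switches the runtime mode."""
    if mode not in ("manual", "ai"):
        raise ValueError("mode must be 'manual' or 'ai'")
    return api_url("/api/set_mode", base_url, mode=mode)


def call(url, timeout=5.0):
    """GET the URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example against a running dashboard:
# print(call(api_url("/api/mode")))
# print(call(set_mode_url("ai")))
```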

## Autonomous Flow

Autonomous services are armed through the `AUTONOMOUS_ENABLE` environment variable:
- `AUTONOMOUS_ENABLE=1`
  - allows `Modes/AI/autonomous_manager.py` to run inside `voice_sanad.py`
  - in `manual` mode it stays paused
  - core services still remain up in `manual` (`voice_sanad.py`, dashboard, direct camera server, replay/trigger)
  - switching dashboard mode to `ai` starts autonomous flow live without a restart
- `AUTONOMOUS_ENABLE=0`
  - disables autonomous manager entirely
  - manual trigger loop + voice runtime still work

Session state machine:
- `IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE`
- strict readiness block: the session sits in `IDLE_BLOCKED` while required YOLO readiness is not met
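
The happy path through these states can be sketched as a small transition function. This is not the code in `Modes/AI/autonomous_manager.py`. It assumes that the optional `RETAKE_CONFIRM` step is skipped when the retake prompt is disabled, that `IDLE_BLOCKED` returns to the normal flow once readiness recovers, and that the triggering event for each step (visual intent, confirmation, and so on) has already occurred. `next_state`, `walk`, and the `readiness_ok` flag are hypothetical names.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


def next_state(state, *, retake_prompt_enabled=True, readiness_ok=True):
    """Advance one step along the documented happy path."""
    S = SessionState
    if state in (S.IDLE, S.IDLE_BLOCKED):
        # Assumption: readiness is re-checked whenever the session is idle.
        return S.WAIT_CONFIRM if readiness_ok else S.IDLE_BLOCKED
    if state is S.COUNTDOWN:
        # RETAKE_CONFIRM is optional, so it is skipped when disabled.
        return S.RETAKE_CONFIRM if retake_prompt_enabled else S.COMPLETE
    happy_path = {
        S.WAIT_CONFIRM: S.FRAMING,
        S.FRAMING: S.COUNTDOWN,
        S.RETAKE_CONFIRM: S.COMPLETE,
        S.COMPLETE: S.IDLE,
    }
    return happy_path[state]


def walk(retake_prompt_enabled=True):
    """Return the state names of one full session, from IDLE back to IDLE."""
    state = SessionState.IDLE
    path = [state]
    while True:
        state = next_state(state, retake_prompt_enabled=retake_prompt_enabled)
        path.append(state)
        if state is SessionState.IDLE:
            return [s.value for s in path]
```

Cancellation (for example the `R2+L1` combo) and declined confirmations are deliberately left out of this sketch.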

## Dashboard / API Highlights

- `GET /` gallery dashboard
- `GET /preview.mjpg` live preview
  - preview is off by default and starts only when requested from the dashboard
  - preview camera/OpenCV is loaded lazily when preview is requested
- Camera control APIs:
  - `GET /api/camera_health`
  - `GET /api/camera_sources`
  - `GET /api/set_camera_source?source=...`
  - `GET /api/set_camera_resolution?width=...&height=...&fps=...`
  - `GET /api/set_preferred_camera?serial=...`
  - dashboard can switch camera source, show active camera info, change resolution live, and save a preferred RealSense serial into `Data/Settings/config.json`
- Audio prompt APIs:
  - `GET /api/audio_prompts`
  - `GET /api/set_audio_prompt_mode?mode=audio|gemini`
  - `GET /api/set_audio_prompt_fallback?enabled=0|1`
  - `GET /api/audio_prompt_record_status`
  - `GET /api/download_audio_prompt?key=...`
  - `GET /api/delete_audio_prompt?key=...`
  - `POST /api/upload_audio_prompt`
  - `POST /api/audio_prompt_record`
  - dashboard can upload, replace, download, delete, inspect, and record prerecorded AI prompt clips stored in `Data/Audio/`
  - dashboard can switch fixed AI situation speech between recorded audio and Gemini without restart
- `GET /api/autonomous_state` runtime autonomous state panel data (lock/retake/health fields)
- `GET /api/runtime_health` component health (WS/mic/speaker/gate/restarts)
- `GET /api/mic` and `GET /api/set_mic`
  - microphone ON/OFF toggle for both modes
- `GET /api/ai_readiness` strict AI readiness + block reason
- `GET /api/ai_options` and `GET /api/set_ai_options` for AI option toggles (hard target lock, retake prompt, greeting/capture replay, face recognition)
- Replay APIs:
  - `GET /api/replays`
  - `GET /api/get_replay`
  - `GET /api/set_replay?name=...`
  - `GET /api/delete_replay?name=...`
  - `GET /api/rename_replay?old=...&new=...`
  - `GET /api/download_replay?name=...`
  - `GET /api/replay_record_status`
  - `GET /api/replay_record_start?name=...&seconds=...`
  - `GET /api/replay_test_status`
  - `GET /api/test_replay?name=...`
  - `POST /api/upload_replay`
  - dashboard can record new replays into `Data/G1`, replay-test them, rename them, download them, delete them, upload new `.jsonl` replays, and set the active replay
  - replay recording and replay test are allowed only while runtime mode is `manual`
- People APIs:
  - `GET /api/people`
  - `GET /api/person_image?id=...&kind=face|scene`
  - `GET /api/download_person?id=...`
  - `GET /api/delete_person?id=...`
  - `GET /api/reset_people`
  - `POST /api/upload_person`
  - dashboard can upload face photos for future recognition, download a saved guest package, delete one guest, or reset the full registry
- `GET /api/capture` capture via unified pipeline
- `GET /api/photos`, `GET /api/sessions`, `GET /api/delete`, `GET /api/reupload`
- `GET /api/errors` structured error counters

## Configuration

Source of truth:
- `Data/Settings/config.json` loaded by `Core/settings.py`

Environment overrides are supported (timing, ports, upload settings, camera, etc.).
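
The general pattern of "JSON file first, environment on top" can be sketched as follows. This is not the logic in `Core/settings.py`: the override table, the `EXAMPLE_DEFAULT_MODE` variable name, and the `load_config` helper are hypothetical, since this README does not list the actual override variable names.

```python
import json
import os


def load_config(path, overrides, env=None):
    """Load a JSON config, then apply env overrides.

    `overrides` maps an environment variable name to a dotted config key,
    e.g. {"EXAMPLE_DEFAULT_MODE": "mode.default_mode"} (hypothetical name).
    """
    env = os.environ if env is None else env
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    for env_name, dotted_key in overrides.items():
        if env_name not in env:
            continue
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        # Values stay strings here; type coercion is omitted for brevity.
        node[leaf] = env[env_name]
    return config
```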

Direct camera serial selection precedence:
- `REALSENSE_SERIAL`
- `PREFERRED_REALSENSE_SERIAL`
- `Data/Settings/config.json -> camera.preferred_realsense_serial`
- teleimager camera config serial
- any other detected RealSense
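
The precedence list above amounts to "first non-empty candidate wins". Here is a minimal sketch of that rule; the function name and its arguments are hypothetical, and it does not check whether the chosen serial is actually connected, which the real service may do.

```python
import os


def resolve_realsense_serial(config, teleimager_serial=None,
                             detected_serials=(), env=None):
    """Pick a RealSense serial using the documented precedence order."""
    env = os.environ if env is None else env
    candidates = [
        env.get("REALSENSE_SERIAL"),
        env.get("PREFERRED_REALSENSE_SERIAL"),
        config.get("camera", {}).get("preferred_realsense_serial"),
        teleimager_serial,
    ]
    for serial in candidates:
        if serial:
            return str(serial)
    # Last resort: any other detected RealSense, if there is one.
    return detected_serials[0] if detected_serials else None
```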

Dashboard camera behavior:
- main production dashboard can switch between available camera sources without restarting the runtime
- camera status panel shows requested source, active source, backend, active profile, preferred serial, and active RealSense serial
- resolution changes are applied live through the direct camera service
- `Save As Default` stores the preferred RealSense serial into `Data/Settings/config.json`

AI prerecorded prompt behavior:
- `Data/Audio/` stores fixed WAV clips by prompt key
- `Data/Settings/audio_prompt_records.json` stores prompt recording metadata and raw-output file references
- `audio_prompts.files` in `Data/Settings/config.json` maps each key to its filename
- `audio_prompts.mode` controls fixed AI situation speech:
  - `audio`: use recorded clips first for AI situation prompts
  - `gemini`: use Gemini speech instead for those same fixed prompts
- `audio_prompts.fallback_to_gemini` controls whether missing prompt clips fall back to Gemini text
- dashboard prompt library manages upload/download/delete, text-to-record generation, speech mode, and fallback state
- imported prerecorded prompts currently cover 18 situation keys; missing keys continue through Gemini fallback until recorded
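
The interaction between `audio_prompts.mode`, `audio_prompts.files`, and `audio_prompts.fallback_to_gemini` can be sketched as a single decision function. This is an illustration under assumptions, not the project's implementation: the function name, the returned tuples, the default values, and the `"skip"` outcome when fallback is disabled are all hypothetical.

```python
from pathlib import Path


def resolve_prompt(key, audio_prompts, audio_dir="Data/Audio"):
    """Decide how a fixed AI situation prompt should be spoken.

    Returns ("audio", path), ("gemini", key), or ("skip", None).
    `audio_prompts` is the `audio_prompts` section of config.json.
    """
    if audio_prompts.get("mode", "audio") == "gemini":
        return ("gemini", key)
    # mode == "audio": prefer the recorded clip mapped to this key.
    filename = audio_prompts.get("files", {}).get(key)
    if filename:
        path = Path(audio_dir) / filename
        if path.is_file():
            return ("audio", str(path))
    # Clip missing: fall back to Gemini only if that is enabled.
    if audio_prompts.get("fallback_to_gemini", False):
        return ("gemini", key)
    return ("skip", None)
```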

AI replay behavior:
- `vision.autonomous_greeting_replay_enabled` controls the short greeting gesture when intent stabilizes
- `vision.autonomous_greeting_replay_file` selects the greeting replay file
- `vision.autonomous_capture_replay_enabled` controls whether AI photo capture uses the active replay during the shot
- `replay.active_file` in `Data/Settings/config.json` is the single persisted active-replay setting
- the active replay is shared between manual `R2+X`, dashboard capture choreography, and AI capture when AI capture replay is enabled
- replay inventory is shared across the full `Data/G1` tree
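
Since the inventory spans the whole `Data/G1` tree and replays are `*.jsonl` files, a listing can be sketched with a recursive glob. `list_replays` is a hypothetical helper, and returning paths relative to the root is an assumption about naming, not a statement of how `GET /api/replays` identifies files.

```python
from pathlib import Path


def list_replays(root="Data/G1"):
    """Return all `*.jsonl` replay files under root, as sorted relative paths."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*.jsonl")
        if p.is_file()
    )
```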

Face recognition behavior:
- `vision.face_recognition_enabled` enables single-guest recognition/enrollment in AI mode
- `vision.face_recognition_threshold` controls the similarity threshold for matching a returning guest
- new guests are enrolled into `photos/people/`
- successful AI captures are linked back into the guest folder for future reference
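
To illustrate what a similarity threshold does, the sketch below matches a face embedding against a registry and returns a guest only when the best score reaches the threshold. Everything here is an assumption for illustration: this README does not say which embedding model or similarity measure the project uses, so cosine similarity and the names `cosine_similarity` and `match_guest` are stand-ins.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_guest(embedding, registry, threshold):
    """Return the best-matching guest id, or None if no score reaches threshold.

    `registry` maps guest id -> reference embedding (hypothetical structure).
    A result of None would mean "treat as a new guest and enroll".
    """
    best_id, best_score = None, threshold
    for guest_id, reference in registry.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = guest_id, score
    return best_id
```

Raising the threshold makes false matches less likely but causes more returning guests to be enrolled again as new.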

Vision model configuration (`Data/Settings/config.json` -> `vision`):
- `detection_backend`: `normal` or `yolo` (runtime switchable from dashboard in AI mode)
- `yolo_runtime`: `ultralytics` (production) or `opencv` (legacy ONNX parser)
- `yolo_ultralytics_device`: inference device for ultralytics (`cpu`, `0`, `0,1`, ...)
- `person_yolo_onnx`: path to YOLO ONNX person model
- `face_yolo_onnx`: path to YOLO ONNX face model
- `group_min_people`: minimum people count to mark a group
- `group_link_distance_px`: max centroid-link distance for group clustering
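
The two group settings can be read as parameters of a single-link clustering over detected person centroids: people whose centroids lie within `group_link_distance_px` of each other are linked, and a linked cluster counts as a group once it has at least `group_min_people` members. The sketch below implements that reading with a small union-find. It is an assumed interpretation, not the code in `vision_detector.py`, and `cluster_groups` is a hypothetical name.

```python
import math


def cluster_groups(centroids, link_distance_px, min_people):
    """Single-link clustering of (x, y) centroids.

    Returns lists of centroid indices for clusters with >= min_people members.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= link_distance_px:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return [members for members in clusters.values() if len(members) >= min_people]
```

Note that single-link clustering chains: two people farther apart than the link distance still end up in one group if someone stands between them.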

## Documentation

- `Current_runtime.md`: detailed current runtime behavior and script chain.

## Data Layout

- `Data/Settings/`
  - `config.json`
  - `audio_prompt_records.json`: prompt recording metadata for files in `Data/Audio/`
- `Data/Scripts/`
  - `photo_command_ai.txt`
  - `sanad_script.txt`
- `Data/Runtime/`
  - runtime health/state/error JSON files
- `Data/Audio/`
  - prerecorded AI prompt WAV files
  - matching `_raw.wav` Gemini output captures

## Notes

- `config.py` was intentionally removed; runtime config is JSON + env overrides.
- Legacy AI mover/autonomous prototype scripts were removed from the production tree.
- Generated artifacts (`__pycache__`, runtime logs) should not be committed.
- Generated runtime files such as `Data/Runtime/runtime_health.json`, `Data/Runtime/autonomous_state.json`, `Data/Runtime/error_counters.json`, `Data/Runtime/error_events.jsonl`, and `Logs/*.log` may be absent in a clean checkout until the runtime starts.