# AI Photographer

Production-oriented robot photographer stack for Unitree G1.

## Quick Start

1. Set API key in config:
   - edit `Data/Settings/config.json` -> `gemini.api_key`
2. Run launcher:
   - `cd Scripts && ./photo_sanad.sh`

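Step 1 can also be scripted. The following is a minimal sketch, assuming `config.json` is a plain JSON object with a nested `gemini` object; the helper name `set_api_key` is hypothetical and not part of this project.

```python
import json
from pathlib import Path


def set_api_key(config_path: Path, api_key: str) -> None:
    """Write api_key into `gemini.api_key`, leaving other settings untouched."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Create the `gemini` object if it is missing (assumed layout).
    config.setdefault("gemini", {})["api_key"] = api_key
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

For example, `set_api_key(Path("Data/Settings/config.json"), "<your key>")` run from the project root. Note that the file is rewritten with two-space indentation, which may differ from its current formatting.
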
Launcher behavior:
- resets `mode.current_mode` to the configured `mode.default_mode` on each launch,
- resolves the active PulseAudio speaker/microphone and exports them for the runtime,
- starts the direct camera service (`Core/direct_camera_service.py` in the `teleimager` conda env) for the full runtime path,
- starts Gemini runtime (`gemini` conda env),
- runs `Gemini/voice_sanad.py` as the main process,
- keeps `manual` as the default startup mode unless changed in `Data/Settings/config.json`,
- keeps `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and replay/trigger services running in the full runtime path across mode switches,
- arms autonomous services in the full runtime path so switching dashboard mode from `manual` to `ai` works without a restart.

Optional startup profile:
- `MANUAL_LEAN_RUNTIME=1`
  - manual-mode voice + dashboard only
  - skips direct camera, DDS, replay, uploader, and autonomous startup
  - capture/replay services are unavailable in that profile
  - this is an explicit reduced profile, not the normal production mode

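The difference between the full runtime path and the lean profile can be sketched as a simple service selection. This is an illustration only: the service labels below are hypothetical shorthand for the components named above, and the real launcher is a shell script whose logic may differ.

```python
import os

# Hypothetical labels for the components named in this README.
CORE_SERVICES = ["voice", "dashboard"]
FULL_ONLY_SERVICES = ["direct_camera", "dds", "replay", "uploader", "autonomous"]


def services_to_start(env=None):
    """Return the service labels to start for the current startup profile."""
    env = os.environ if env is None else env
    lean = env.get("MANUAL_LEAN_RUNTIME", "0") == "1"
    if lean:
        # Lean profile: manual-mode voice + dashboard only.
        return list(CORE_SERVICES)
    return CORE_SERVICES + FULL_ONLY_SERVICES
```

Passing a plain dict instead of `os.environ` makes the selection easy to check without touching the real environment.
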
## Project Layout

- `Scripts/`
  - startup and ops shell entrypoints
  - `photo_sanad.sh`, `fix_realsense_usb.sh`
  - `direct_camera_samples_server.py` remains as a compatibility wrapper
  - `fix_realsense_usb.sh` supports `--check`, `--fix`, and `--serials`
- `Data/`
  - categorized runtime assets and state
  - `Settings/config.json`
  - `Scripts/photo_command_ai.txt`, `Scripts/sanad_script.txt`
  - `Runtime/upload_db.json`, generated runtime JSON state files
  - `Settings/config.json -> camera.preferred_realsense_serial` selects the preferred default RealSense by serial
  - generated runtime JSON files are created on demand during execution
- `Data/Audio/`
  - fixed prerecorded AI situation prompts (`*.wav`)
  - matching raw Gemini captures (`*_raw.wav`)
- `Data/Settings/audio_prompt_records.json`
  - prompt recording metadata for files in `Data/Audio/`
- `photos/people/`
  - AI face-recognition registry
  - each returning guest gets a folder with face/scene references, metadata, and captured-photo links
- `photos/Captures/`
  - final saved runtime captures from dashboard, manual trigger, and AI capture flow
- `photos/samples/`
  - standalone direct-camera sample captures
- `Logs/`
  - one stable log file per runtime component
  - examples: `voice_sanad.log`, `gemini_voice.log`, `photo_server.log`, `direct_camera.log`
- `Web/`
  - operator dashboard frontend (`gallery.html`, `gallery.js`, `style.css`)
  - direct camera service frontend (`direct_camera.html`, `direct_camera.js`, `direct_camera.css`)
- `Data/G1/`
  - replay/gesture motion files (`*.jsonl`)
  - dashboard-recorded replay captures are saved directly here
- `Core/`
  - shared runtime foundations (`settings.py`, `Logger.py`, `error_events.py`, `direct_camera_service.py`)
  - `direct_camera_service.py` handles camera backend/API and serves its UI from `Web/`
- `Gemini/`
  - voice orchestration (`voice_sanad.py`, `gemini_voice.py`, `sanad_text_utils.py`)
- `Server/`
  - dashboard/API/capture/upload (`photo_server.py`, `capture_service.py`, `direct_camera_client.py`, `uploader.py`)
- `Modes/AI/`
  - autonomous vision/intent/session manager (`autonomous_manager.py`, `vision_detector.py`, `camera_module.py`)
- `Modes/Manual/`
  - controller + replay + trigger loop (`controller.py`, `replay_engine.py`, `trigger_loop.py`)

## Runtime Modes

Mode is persisted in `Data/Settings/config.json` under `mode.current_mode`:
- `manual`
  - Gemini conversation stays available when `gemini.mic_enabled=true`
  - voice `request_photo / yes_photo / no_photo` disabled
  - `R2+X` replay/capture path stays available in the full runtime path
  - `Gemini/voice_sanad.py`, dashboard server, direct camera server, and replay services stay running
  - autonomous services can be armed in the background, but autonomous flow stays paused until mode becomes `ai`
- `ai`
  - voice `request_photo / yes_photo / no_photo` enabled
  - `R2+X` replay/capture path still works
  - `Gemini/voice_sanad.py`, dashboard server, direct camera server, and replay services continue running without restart
  - autonomous flow runs live when `AUTONOMOUS_ENABLE=1`
  - on stable visual intent, AI identifies or enrolls a single guest, optionally greets with a short hand replay, asks for photo confirmation, guides guests into frame, then captures using the active replay during the shot when AI capture replay is enabled
  - returning guests are recognized from `photos/people/` and can be greeted as returning visitors

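The per-mode rules above reduce to two small predicates. This is a sketch of the policy as described in this README, not the project's actual code; the function names are hypothetical.

```python
# Voice intents that are only honored in `ai` mode, per the list above.
PHOTO_INTENTS = {"request_photo", "yes_photo", "no_photo"}


def voice_intent_allowed(intent: str, current_mode: str) -> bool:
    """Photo intents are gated by mode; other conversation is not gated here."""
    if intent in PHOTO_INTENTS:
        return current_mode == "ai"
    return True


def autonomous_flow_running(current_mode: str, autonomous_enable: bool) -> bool:
    """Autonomous flow runs only in `ai` mode with AUTONOMOUS_ENABLE=1."""
    return current_mode == "ai" and autonomous_enable
```

With `AUTONOMOUS_ENABLE=1` and mode `manual`, `autonomous_flow_running` returns `False`, which corresponds to the "armed but paused" state.
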
Command-mode functionality was extracted from this project and moved to:
- `G1_Lootah/AI_Command`

## Remote Controls

- `R2+X`
  - replay + photographer talk + capture pipeline
- `R2+L1`
  - global hard cancel safety combo
  - active in runtime loops to cancel pending capture/replay and reset active interaction

Mode APIs:
- `GET /api/mode`
- `GET /api/set_mode?mode=manual|ai`
- `GET /api/mode_policy`
- `GET /api/mic`
- `GET /api/set_mic?enabled=0|1`
- `GET /api/detector_backend`
- `GET /api/set_detector_backend?backend=normal|yolo`
- `GET /api/ai_readiness`
- `GET /api/ai_options`
- `GET /api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...`
- `GET /api/autonomous_state`
- `GET /api/runtime_health`

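Because every mode API is a `GET` with query parameters, a small URL builder is enough to script them. The sketch below uses only the Python standard library; `BASE_URL` is a placeholder (this README does not state the dashboard host or port), and the response format is not assumed, so `fetch` returns raw bytes.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8080"  # placeholder host/port, adjust to your setup


def api_url(path: str, **params) -> str:
    """Build a dashboard API URL; booleans are sent as 0/1 as the API expects."""
    encoded = urlencode(
        {k: int(v) if isinstance(v, bool) else v for k, v in params.items()}
    )
    return f"{BASE_URL}{path}?{encoded}" if encoded else f"{BASE_URL}{path}"


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Issue the GET request and return the raw response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, `fetch(api_url("/api/set_mode", mode="ai"))` switches to AI mode, and `api_url("/api/set_mic", enabled=False)` yields a URL ending in `enabled=0`.
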
## Autonomous Flow

Autonomous services are armed by environment:
- `AUTONOMOUS_ENABLE=1`
  - allows `Modes/AI/autonomous_manager.py` to run inside `voice_sanad.py`
  - in `manual` mode it stays paused
  - core services still remain up in `manual` (`voice_sanad.py`, dashboard, direct camera server, replay/trigger)
  - switching dashboard mode to `ai` starts autonomous flow live without a restart
- `AUTONOMOUS_ENABLE=0`
  - disables autonomous manager entirely
  - manual trigger loop + voice runtime still work

Session state machine:
- `IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE`
- strict readiness block state: `IDLE_BLOCKED` when required YOLO readiness is not met

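The happy path of that state machine can be written down as a transition function. This is a sketch under assumptions: it covers only the forward path listed above (no cancel or decline branches), it assumes `RETAKE_CONFIRM` is entered only when the retake prompt option is enabled, and `next_state` is a hypothetical name rather than a function from `autonomous_manager.py`.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


# Transitions that do not depend on any option or readiness check.
_FIXED_NEXT = {
    State.WAIT_CONFIRM: State.FRAMING,
    State.FRAMING: State.COUNTDOWN,
    State.RETAKE_CONFIRM: State.COMPLETE,
    State.COMPLETE: State.IDLE,
}


def next_state(state: State, *, retake_prompt_enabled: bool, yolo_ready: bool = True) -> State:
    """Advance one step along the happy path."""
    if state in (State.IDLE, State.IDLE_BLOCKED):
        # Strict readiness: stay blocked until required YOLO readiness is met.
        return State.WAIT_CONFIRM if yolo_ready else State.IDLE_BLOCKED
    if state is State.COUNTDOWN:
        # RETAKE_CONFIRM is optional (assumed to follow the retake option).
        return State.RETAKE_CONFIRM if retake_prompt_enabled else State.COMPLETE
    return _FIXED_NEXT[state]
```

Stepping from `State.IDLE` with `retake_prompt_enabled=True` visits every state in the listed order and ends back at `State.IDLE`.
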
## Dashboard / API Highlights

- `GET /` gallery dashboard
- `GET /preview.mjpg` live preview
  - preview is off by default and starts only when requested from the dashboard
  - preview camera/OpenCV is loaded lazily when preview is requested
- Camera control APIs:
  - `GET /api/camera_health`
  - `GET /api/camera_sources`
  - `GET /api/set_camera_source?source=...`
  - `GET /api/set_camera_resolution?width=...&height=...&fps=...`
  - `GET /api/set_preferred_camera?serial=...`
  - dashboard can switch camera source, show active camera info, change resolution live, and save a preferred RealSense serial into `Data/Settings/config.json`
- Audio prompt APIs:
  - `GET /api/audio_prompts`
  - `GET /api/set_audio_prompt_mode?mode=audio|gemini`
  - `GET /api/set_audio_prompt_fallback?enabled=0|1`
  - `GET /api/audio_prompt_record_status`
  - `GET /api/download_audio_prompt?key=...`
  - `GET /api/delete_audio_prompt?key=...`
  - `POST /api/upload_audio_prompt`
  - `POST /api/audio_prompt_record`
  - dashboard can upload, replace, download, delete, inspect, and record prerecorded AI prompt clips stored in `Data/Audio/`
  - dashboard can switch fixed AI situation speech between recorded audio and Gemini without restart
- `GET /api/autonomous_state` runtime autonomous state panel data (lock/retake/health fields)
- `GET /api/runtime_health` component health (WS/mic/speaker/gate/restarts)
- `GET /api/mic` and `GET /api/set_mic`
  - microphone ON/OFF toggle for both modes
- `GET /api/ai_readiness` strict AI readiness + block reason
- `GET /api/ai_options` and `GET /api/set_ai_options` for hard lock/retake toggles
- Replay APIs:
  - `GET /api/replays`
  - `GET /api/get_replay`
  - `GET /api/set_replay?name=...`
  - `GET /api/delete_replay?name=...`
  - `GET /api/rename_replay?old=...&new=...`
  - `GET /api/download_replay?name=...`
  - `GET /api/replay_record_status`
  - `GET /api/replay_record_start?name=...&seconds=...`
  - `GET /api/replay_test_status`
  - `GET /api/test_replay?name=...`
  - `POST /api/upload_replay`
  - dashboard can record new replays into `Data/G1`, replay-test them, rename them, download them, delete them, upload new `.jsonl` replays, and set the active replay
  - replay recording and replay test are allowed only while runtime mode is `manual`
- People APIs:
  - `GET /api/people`
  - `GET /api/person_image?id=...&kind=face|scene`
  - `GET /api/download_person?id=...`
  - `GET /api/delete_person?id=...`
  - `GET /api/reset_people`
  - `POST /api/upload_person`
  - dashboard can upload face photos for future recognition, download a saved guest package, delete one guest, or reset the full registry
- `GET /api/capture` capture via unified pipeline
- `GET /api/photos`, `GET /api/sessions`, `GET /api/delete`, `GET /api/reupload`
- `GET /api/errors` structured error counters

## Configuration

Source of truth:
- `Data/Settings/config.json` loaded by `Core/settings.py`

Environment overrides are supported (timing, ports, upload settings, camera, etc.).

Direct camera serial selection precedence:
- `REALSENSE_SERIAL`
- `PREFERRED_REALSENSE_SERIAL`
- `Data/Settings/config.json -> camera.preferred_realsense_serial`
- teleimager camera config serial
- any other detected RealSense

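Read top to bottom as highest to lowest priority (an assumption about the ordering), the precedence above amounts to "first non-empty value wins". The sketch below illustrates that rule; `select_realsense_serial` and its parameters are hypothetical, and unlike a real service it does not check whether the chosen serial is actually connected.

```python
import os


def select_realsense_serial(config, teleimager_serial, detected_serials, env=None):
    """Pick a RealSense serial using the precedence listed above."""
    env = os.environ if env is None else env
    candidates = [
        env.get("REALSENSE_SERIAL"),
        env.get("PREFERRED_REALSENSE_SERIAL"),
        config.get("camera", {}).get("preferred_realsense_serial"),
        teleimager_serial,
    ]
    for serial in candidates:
        if serial:  # skip unset or empty values
            return serial
    # Last resort: any other detected RealSense, or None if there is none.
    return detected_serials[0] if detected_serials else None
```

Passing `env` explicitly keeps the function easy to exercise without modifying the process environment.
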
Dashboard camera behavior:
- main production dashboard can switch between available camera sources without restarting the runtime
- camera status panel shows requested source, active source, backend, active profile, preferred serial, and active RealSense serial
- resolution changes are applied live through the direct camera service
- `Save As Default` stores the preferred RealSense serial into `Data/Settings/config.json`

AI prerecorded prompt behavior:
- `Data/Audio/` stores fixed WAV clips by prompt key
- `Data/Settings/audio_prompt_records.json` stores prompt recording metadata and raw-output file references
- `audio_prompts.files` in `Data/Settings/config.json` maps each key to its filename
- `audio_prompts.mode` controls fixed AI situation speech:
  - `audio`: use recorded clips first for AI situation prompts
  - `gemini`: use Gemini speech instead for those same fixed prompts
- `audio_prompts.fallback_to_gemini` controls whether missing prompt clips fall back to Gemini text
- dashboard prompt library manages upload/download/delete, text-to-record generation, speech mode, and fallback state
- imported prerecorded prompts currently cover 18 situation keys; missing keys continue through Gemini fallback until recorded

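How `mode`, `files`, and `fallback_to_gemini` interact can be sketched as a single lookup. This is an illustration of the rules above, not the runtime's code: `resolve_prompt_source` is a hypothetical name, the prompt key in the usage note is made up, and the behavior when a clip is missing with fallback disabled is an assumption (the sketch simply reports `"skip"`).

```python
def resolve_prompt_source(key, audio_prompts, available_files):
    """Return ("audio", filename), ("gemini", None), or ("skip", None)."""
    if audio_prompts["mode"] == "gemini":
        return ("gemini", None)
    # mode == "audio": prefer the recorded clip mapped to this key.
    filename = audio_prompts.get("files", {}).get(key)
    if filename and filename in available_files:
        return ("audio", filename)
    if audio_prompts.get("fallback_to_gemini", False):
        return ("gemini", None)
    return ("skip", None)  # assumed outcome; not specified in this README
```

For example, with `{"mode": "audio", "files": {"greeting": "greeting.wav"}, "fallback_to_gemini": True}` and an empty `Data/Audio/`, the key `"greeting"` resolves to `("gemini", None)`.
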
AI replay behavior:
- `vision.autonomous_greeting_replay_enabled` controls the short greeting gesture when intent stabilizes
- `vision.autonomous_greeting_replay_file` selects the greeting replay file
- `vision.autonomous_capture_replay_enabled` controls whether AI photo capture uses the active replay during the shot
- `replay.active_file` in `Data/Settings/config.json` is the single persisted active-replay setting
- the active replay is shared between manual `R2+X`, dashboard capture choreography, and AI capture when AI capture replay is enabled
- replay inventory is shared across the full `Data/G1` tree

Face recognition behavior:
- `vision.face_recognition_enabled` enables single-guest recognition/enrollment in AI mode
- `vision.face_recognition_threshold` controls the similarity threshold for matching a returning guest
- new guests are enrolled into `photos/people/`
- successful AI captures are linked back into the guest folder for future reference

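To make the role of the threshold concrete, here is a sketch of threshold-based matching. It assumes faces are compared as embedding vectors with cosine similarity, which this README does not confirm; the metric, the registry shape, and the name `match_guest` are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_guest(embedding, registry, threshold):
    """Return the best-matching guest id at or above threshold, else None."""
    best_id, best_score = None, threshold
    for guest_id, reference in registry.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = guest_id, score
    return best_id
```

A `None` result corresponds to the "new guest" case, where the guest would be enrolled rather than greeted as returning. Raising the threshold reduces false matches at the cost of re-enrolling some returning guests.
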
Vision model configuration (`Data/Settings/config.json` -> `vision`):
- `detection_backend`: `normal` or `yolo` (runtime switchable from dashboard in AI mode)
- `yolo_runtime`: `ultralytics` (production) or `opencv` (legacy ONNX parser)
- `yolo_ultralytics_device`: inference device for ultralytics (`cpu`, `0`, `0,1`, ...)
- `person_yolo_onnx`: path to YOLO ONNX person model
- `face_yolo_onnx`: path to YOLO ONNX face model
- `group_min_people`: minimum people count to mark a group
- `group_link_distance_px`: max centroid-link distance for group clustering

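The two group settings can be illustrated with a simple single-linkage clustering over person centroids: two people are linked when their centroids are within `group_link_distance_px`, and a linked cluster counts as a group once it has at least `group_min_people` members. This is one plausible reading of those settings, not the detector's confirmed algorithm; `find_groups` is a hypothetical name.

```python
import math


def find_groups(centroids, group_min_people, group_link_distance_px):
    """Return sorted index lists of linked clusters that qualify as groups."""
    unvisited = set(range(len(centroids)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        cluster, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            # People within the link distance of person i join the cluster.
            linked = [
                j for j in unvisited
                if math.dist(centroids[i], centroids[j]) <= group_link_distance_px
            ]
            unvisited.difference_update(linked)
            cluster.extend(linked)
            frontier.extend(linked)
        if len(cluster) >= group_min_people:
            groups.append(sorted(cluster))
    return sorted(groups)
```

Links chain: three people spaced 50 px apart in a row form one group at a 60 px link distance even though the outer two are 100 px apart.
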
## Documentation

- `Current_runtime.md`: detailed current runtime behavior and script chain.

## Data Layout

- `Data/Settings/`
  - `config.json`
- `Data/Scripts/`
  - `photo_command_ai.txt`
  - `sanad_script.txt`
- `Data/Runtime/`
  - runtime health/state/error JSON files
- `Data/Audio/`
  - prerecorded AI prompt WAV files
  - matching `_raw.wav` Gemini output captures
- `Data/Settings/audio_prompt_records.json`
  - prompt recording metadata for files in `Data/Audio/`

## Notes

- `config.py` is intentionally removed; runtime config is JSON + env overrides.
- legacy AI mover/autonomous prototype scripts were removed from the production tree.
- Generated artifacts (`__pycache__`, runtime logs) should not be committed.
- Generated runtime files such as `Data/Runtime/runtime_health.json`, `Data/Runtime/autonomous_state.json`, `Data/Runtime/error_counters.json`, `Data/Runtime/error_events.jsonl`, and `Logs/*.log` may be absent in a clean checkout until the runtime starts.
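
Tooling that reads the generated runtime files should therefore tolerate their absence. A minimal sketch, with `read_runtime_json` as a hypothetical helper name:

```python
import json
from pathlib import Path


def read_runtime_json(path, default=None):
    """Load a generated runtime JSON file, or return default if it is absent."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        # Clean checkout or runtime not started yet: nothing generated so far.
        return default
```

For example, `read_runtime_json("Data/Runtime/runtime_health.json", default={})` returns `{}` before the first launch. The helper is meant for the `.json` files only; `error_events.jsonl` holds one JSON document per line and needs line-by-line parsing.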