AI_Photographer/Current_runtime.md
2026-04-12 18:52:37 +04:00

# Current Runtime
Production runtime architecture (as implemented now):
`photo_sanad.sh -> voice_sanad.py -> persistent component loops + WS supervisor -> mode-gated command handling -> unified replay/capture -> server/dashboard/upload`
## Process Model
- `Scripts/photo_sanad.sh` resets `mode.current_mode` to `mode.default_mode` on each launch.
- Default full runtime path:
  - startup mode is normally `manual`
  - `photo_sanad.sh` resolves active PulseAudio sink/source, launches `Core/direct_camera_service.py` in the `teleimager` env, then starts `Gemini/voice_sanad.py` in the `gemini` env
  - `Core/direct_camera_service.py` is backend-first and serves external UI assets from `Web/direct_camera.html`, `Web/direct_camera.css`, and `Web/direct_camera.js`
  - preferred RealSense default is read from `Data/Settings/config.json -> camera.preferred_realsense_serial` and falls back to another detected camera if absent
  - in the default full runtime path, `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and replay/trigger services remain running in both `manual` and `ai`
  - in the full runtime path, `AUTONOMOUS_ENABLE` is auto-armed by default so dashboard mode can switch from `manual` to `ai` without restart
- Optional lean manual path:
  - `MANUAL_LEAN_RUNTIME=1`
  - skips the direct camera server and the heavy manual/AI runtime services
  - keeps Gemini + dashboard only
- `Gemini/voice_sanad.py` is the single long-lived orchestrator process. It starts these loops once and keeps them alive:
  - `capture_mic`
  - `receive_audio`
  - `play_audio`
  - `keepalive`
  - `Modes/Manual/trigger_loop.py` in the full runtime path
  - autonomous-mode supervisor in the full runtime path
  - `Modes/AI/autonomous_manager.py` only while `AUTONOMOUS_ENABLE=1` and runtime mode is `ai`
  - Runtime health writer (`Data/Runtime/runtime_health.json`)
  - Mode policy sync
- Runtime logs are centralized under `Logs/`, with one stable file per component.
- Gemini WebSocket is managed by a dedicated reconnect supervisor. WS reconnect does not restart other loops.
- In the normal full runtime path, mode switches change gating/state only; they do not tear down `voice_sanad.py`, the dashboard server, the direct camera server, or replay services.
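
The separation between the WS reconnect supervisor and the other persistent loops can be illustrated with a small `asyncio` sketch. This is not the code in `Gemini/voice_sanad.py`: `ws_supervisor`, `run_demo`, the backoff values, and the stand-in `keepalive` coroutine are hypothetical names chosen for illustration, and the real runtime may not use `asyncio` at all.

```python
import asyncio


async def ws_supervisor(connect, backoff_sec=0.01, max_backoff_sec=0.1, max_attempts=None):
    """Keep one WebSocket session alive, reconnecting with capped exponential backoff.

    `connect` is a hypothetical coroutine function that returns when the session
    ends and raises when the connection drops. Only this task reconnects; no
    other loop is restarted.
    """
    attempts = 0
    delay = backoff_sec
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            await connect()
            delay = backoff_sec  # clean session end: reset the backoff
        except asyncio.CancelledError:
            raise
        except Exception:
            delay = min(delay * 2, max_backoff_sec)  # failure: back off further
        await asyncio.sleep(delay)
    return attempts


async def run_demo():
    """Run a stand-in keepalive loop next to a flaky WS connection."""
    ticks = {"keepalive": 0}
    calls = {"ws": 0}

    async def keepalive():
        while True:
            ticks["keepalive"] += 1
            await asyncio.sleep(0.005)

    async def flaky_connect():
        calls["ws"] += 1
        if calls["ws"] < 3:
            raise ConnectionError("socket dropped")
        await asyncio.sleep(0.01)  # pretend the session stayed up briefly

    loop_task = asyncio.create_task(keepalive())
    attempts = await ws_supervisor(flaky_connect, max_attempts=3)
    still_running = not loop_task.done()  # keepalive was never restarted
    loop_task.cancel()
    return attempts, still_running, ticks["keepalive"]


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

In this sketch the keepalive task keeps ticking through two failed connection attempts, which mirrors the rule that a WS reconnect does not restart other loops.
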
## Mode Control
- Source of truth is `Data/Settings/config.json`:
  - `mode.current_mode` in `manual|ai`
  - launcher resets `mode.current_mode` to `mode.default_mode` at process start
- API updates mode live via `Server/photo_server.py`:
  - `/api/set_mode`
- Voice command gating is enforced on user transcription events in `Gemini/gemini_voice.py`.
## Mode Semantics
| Mode | Voice photo commands (`request_photo/yes_photo/no_photo`) | Manual R2+X | Autonomous flow |
|---|---|---|---|
| `manual` | Off | On | Paused |
| `ai` | On | On | On only if `AUTONOMOUS_ENABLE=1` |
Additional mode rules:
- Gemini conversation can stay active in both modes when `gemini.mic_enabled=true`.
- In full runtime, `manual` still includes the direct camera server, replay/trigger, uploader, and dashboard capture services.
- In full runtime, switching between `manual` and `ai` (including from the dashboard) does not restart the process and does not stop `voice_sanad.py`, the dashboard server, the direct camera server, or replay/trigger services.
- If `AUTONOMOUS_ENABLE=1`, autonomous manager is armed and starts live when runtime mode becomes `ai`.
- Switching back to `manual` pauses autonomous flow again.
- In optional `MANUAL_LEAN_RUNTIME=1`, capture/replay/autonomous services are intentionally unavailable.
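
The mode table above can be expressed as a small policy function. This is a sketch of the full runtime path only (lean runtime is not modeled); `ModePolicy` and `mode_policy` are hypothetical names, not identifiers from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    voice_photo_commands: bool  # request_photo / yes_photo / no_photo
    manual_trigger: bool        # R2+X remote combo
    autonomous_flow: bool       # autonomous manager running


def mode_policy(mode, autonomous_enable):
    """Return what is allowed in `mode`, following the Mode Semantics table."""
    if mode == "manual":
        return ModePolicy(voice_photo_commands=False, manual_trigger=True,
                          autonomous_flow=False)
    if mode == "ai":
        return ModePolicy(voice_photo_commands=True, manual_trigger=True,
                          autonomous_flow=bool(autonomous_enable))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(mode_policy("manual", autonomous_enable=True))
    print(mode_policy("ai", autonomous_enable=True))
```

Because the policy is a pure function of `mode` and `AUTONOMOUS_ENABLE`, a mode switch only changes what is gated; nothing has to be restarted.
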
Removed from this project:
- command-mode functionality was extracted to `G1_Lootah/AI_Command`
## Remote Safety Controls
- `R2+X`: starts replay + photographer talk + unified capture pipeline.
- `R2+L1`: global hard cancel safety combo (active in runtime loops):
  - cancels pending capture
  - cancels active replay path
  - resets autonomous interaction session to `IDLE`
## AI/Autonomous Runtime
State machine in `Modes/AI/autonomous_manager.py`:
`IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE`
Special blocked state:
`IDLE_BLOCKED` when strict YOLO readiness fails.
Behavior:
- Autonomous manager is supervised by runtime mode:
  - `manual` -> paused
  - `ai` -> active when `AUTONOMOUS_ENABLE=1`
- In full runtime, autonomous services can already be armed while still paused in `manual`, so mode switches are live.
- In full runtime, the direct camera server and replay infrastructure remain started while autonomous manager is paused in `manual`.
- On stable intent, manager opens audio gate, triggers a short greeting-hand replay, and asks whether the visitor wants a photo.
- On stable single-person intent, manager can identify a returning guest or enroll a new guest into `photos/people/`.
- Group-first greeting is used when a group is detected.
- Confirmation uses flag commands from voice layer (`request_photo.flag`, `confirm_yes.flag`, `confirm_no.flag`).
- Hard target lock can pin one subject/group through the session.
- Framing checks: center, size, blur, exposure, headroom, eye-line.
- AI greeting replay is controlled by `vision.autonomous_greeting_replay_enabled` and `vision.autonomous_greeting_replay_file`.
- AI photo-time replay is controlled by `vision.autonomous_capture_replay_enabled`.
- When AI photo-time replay is enabled, autonomous capture uses the active replay file from `Data/Settings/config.json -> replay.active_file`, same as manual `R2+X`.
- When a capture succeeds for an identified guest, the saved photo is attached into that guest's folder in `photos/people/`.
- After capture, retake recommendation can move flow to `RETAKE_CONFIRM` (max retakes from config).
- On completion: CTA prompt and cooldown reset.
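
The state flow above can be sketched as a transition table. The state names come from this document; every event name (`intent_stable`, `framing_ok`, and so on) is hypothetical, and the transitions out of `WAIT_CONFIRM` on refusal, out of `RETAKE_CONFIRM`, and out of `IDLE_BLOCKED` are assumptions about behavior that the document does not spell out.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


# (current state, hypothetical event) -> next state
TRANSITIONS = {
    (State.IDLE, "yolo_not_ready"): State.IDLE_BLOCKED,
    (State.IDLE_BLOCKED, "yolo_ready"): State.IDLE,          # assumption
    (State.IDLE, "intent_stable"): State.WAIT_CONFIRM,
    (State.WAIT_CONFIRM, "confirm_yes"): State.FRAMING,
    (State.WAIT_CONFIRM, "confirm_no"): State.IDLE,          # assumption
    (State.FRAMING, "framing_ok"): State.COUNTDOWN,
    (State.COUNTDOWN, "captured"): State.COMPLETE,
    (State.COUNTDOWN, "retake_recommended"): State.RETAKE_CONFIRM,
    (State.RETAKE_CONFIRM, "confirm_yes"): State.FRAMING,    # assumption
    (State.RETAKE_CONFIRM, "confirm_no"): State.COMPLETE,    # assumption
    (State.COMPLETE, "cooldown_done"): State.IDLE,
}


def step(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    if event == "hard_cancel":  # R2+L1 resets the session to IDLE from anywhere
        return State.IDLE
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.IDLE
    for event in ["intent_stable", "confirm_yes", "framing_ok", "captured", "cooldown_done"]:
        state = step(state, event)
        print(event, "->", state.value)
```
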
## Vision Runtime
`Modes/AI/vision_detector.py` provides:
- Backend selection `normal|yolo`.
- YOLO runtime selection `ultralytics|opencv`.
- Person/face detection and group clustering.
- Intent detection using depth-first logic and bbox-area fallback.
- Target lock fields:
  - `target_lock_active`, `target_lock_type`, `target_lock_id`, `target_switch_blocked_count`
- Camera/depth health fields:
  - `camera_ok`, `depth_ok`, `camera_restarts`, `depth_restarts`
Strict production gate:
- If `vision.yolo_strict_required=true` and YOLO readiness is not valid, AI session is blocked in `IDLE_BLOCKED`.
## Gemini Integration
`Gemini/gemini_voice.py`:
- Uses WS attach/detach model:
  - `attach_ws()`
  - `detach_ws()`
  - `is_ws_connected()`
- Live-safe sends:
  - `send_text_prompt_live()`
  - `send_vision_context_live()`
- Command matching uses user transcription events, not model text.
- Continuous vision context is streamed from autonomous manager.
- Context can be silent (`vision.gemini_context_silent=true`), in which case model audio is suppressed for context-only turns.
- Exposes runtime health snapshot for dashboard/API.
- In `manual`, Gemini conversation can remain available while AI photo flags stay disabled.
- Mic state is controlled live through `/api/mic` and `/api/set_mic`.
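
The rule that commands match user transcription only, and only in `ai` mode, can be sketched as below. The event shape (`source`, `text`), the trigger phrases, and the mapping from command names to flag files are all assumptions made for illustration; the actual matching in `Gemini/gemini_voice.py` may differ.

```python
import re

# Assumed mapping from voice command names to the flag files named in this document.
COMMAND_FLAGS = {
    "request_photo": "request_photo.flag",
    "yes_photo": "confirm_yes.flag",
    "no_photo": "confirm_no.flag",
}

# Hypothetical trigger phrases, checked in order.
PHRASES = {
    "take a photo": "request_photo",
    "yes": "yes_photo",
    "no": "no_photo",
}


def flag_for_event(event, mode):
    """Return the flag file to touch for this event, or None if it is gated out."""
    if event.get("source") != "user":  # ignore model text entirely
        return None
    if mode != "ai":                   # photo commands are off in manual
        return None
    text = event.get("text", "").lower()
    for phrase, command in PHRASES.items():
        # Whole-word match so that "know" does not trigger "no".
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            return COMMAND_FLAGS[command]
    return None


if __name__ == "__main__":
    print(flag_for_event({"source": "user", "text": "Please take a photo"}, "ai"))
    print(flag_for_event({"source": "model", "text": "Shall I take a photo?"}, "ai"))
```
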
## Unified Capture Pipeline
All capture paths use `Server/capture_service.py`:
- Replay execution + trigger marker callback capture.
- Timed fallback capture if trigger marker is missing.
- Capture retries using watchdog settings:
  - `watchdog.camera_capture_retry_count`
  - `watchdog.camera_capture_retry_delay_sec`
- Upload trigger flag is touched after successful capture.
Replay integrity is validated at startup and fallback replay can be selected automatically.
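
A minimal sketch of the capture retry policy, assuming `camera_capture_retry_count` means the number of retries after the first attempt (the document does not say whether the first attempt is included). `capture_with_retry` and `capture_fn` are hypothetical names, not functions from `Server/capture_service.py`.

```python
import time


def capture_with_retry(capture_fn, watchdog_cfg, sleep=time.sleep):
    """Call capture_fn, retrying only the capture call on failure.

    watchdog_cfg is the `watchdog` block of the config as a dict.
    """
    retries = int(watchdog_cfg.get("camera_capture_retry_count", 0))
    delay = float(watchdog_cfg.get("camera_capture_retry_delay_sec", 0.0))
    last_error = None
    for attempt in range(retries + 1):
        try:
            return capture_fn()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"capture failed after {retries + 1} attempts") from last_error
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.
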
## Component Recovery (Watchdog)
- WS failure: reconnect WS channel only.
- Mic failure: restart mic component only.
- Speaker failure: restart speaker component only.
- Detector frame starvation: recover detector camera/depth inputs only.
- Capture camera failure: retry capture call only.
- Process stays alive unless a fatal startup error occurs (for example, an empty Gemini API key).
## Server and Dashboard
`Server/photo_server.py` + `Web/gallery.js` provide:
- Mode APIs:
  - `/api/mode`, `/api/set_mode`, `/api/mode_policy`
- Mic APIs:
  - `/api/mic`, `/api/set_mic`
- Detector/AI readiness APIs:
  - `/api/detector_backend`, `/api/set_detector_backend`
  - `/api/ai_readiness`
- AI options APIs:
  - `/api/ai_options`
  - `/api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...`
- Replay APIs:
  - `/api/replays`
  - `/api/get_replay`
  - `/api/set_replay?name=...`
  - `/api/delete_replay?name=...`
  - `/api/rename_replay?old=...&new=...`
  - `/api/download_replay?name=...`
  - `/api/replay_record_status`
  - `/api/replay_record_start?name=...&seconds=...`
  - `/api/replay_test_status`
  - `/api/test_replay?name=...`
  - `/api/upload_replay`
- Runtime state APIs:
  - `/api/autonomous_state`
  - `/api/runtime_health`
- Camera APIs:
  - `/api/camera_health`
  - `/api/camera_sources`
  - `/api/set_camera_source?source=...`
  - `/api/set_camera_resolution?width=...&height=...&fps=...`
  - `/api/set_preferred_camera?serial=...`
- Photo APIs:
  - `/api/capture`, `/api/photos`, `/api/delete`, `/api/reupload`, `/api/upload_now`, `/api/download_zip`
- Live preview:
  - `/preview.mjpg`
  - preview is off by default and only runs when requested from the dashboard
  - preview camera/OpenCV is loaded lazily when preview is requested
Dashboard panels include mode controls, detector backend/readiness, AI options, autonomous state, runtime health, live camera preview, camera source switching, camera resolution changes, preferred RealSense serial persistence, active replay selection, replay inventory management, and replay recording controls.
Replay-management rules:
- replay inventory covers the full `Data/G1` tree
- replay recording is allowed only in `manual`
- replay test/play is allowed only in `manual`
- rename/download/delete/upload remain available from the dashboard inventory tools
- People APIs:
  - `/api/people`
  - `/api/person_image?id=...&kind=face|scene`
  - `/api/download_person?id=...`
  - `/api/delete_person?id=...`
  - `/api/reset_people`
  - `/api/upload_person`
- Audio prompt APIs:
  - `/api/audio_prompts`
  - `/api/set_audio_prompt_mode?mode=audio|gemini`
  - `/api/set_audio_prompt_fallback?enabled=0|1`
  - `/api/audio_prompt_record_status`
  - `/api/download_audio_prompt?key=...`
  - `/api/delete_audio_prompt?key=...`
  - `/api/upload_audio_prompt`
  - `/api/audio_prompt_record`
Dashboard audio-prompt behavior:
- operators can upload prerecorded WAV clips for each AI situation key
- operators can delete or download existing clips
- operators can record a prompt clip directly from text using the same Gemini replay path as `Project/SanadVoice/gemini_voice/sanad_replay.py`
- operators can switch fixed AI situation speech between:
  - `audio`: recorded prompt clips first
  - `gemini`: Gemini speech for those same fixed situations
- if a prompt clip is missing while `audio_prompts.mode=audio`, runtime falls back to Gemini text when `audio_prompts.fallback_to_gemini=true`
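
The prompt-source rules above can be sketched as one resolution function. `resolve_prompt`, the `clips` mapping, and the situation key `greeting` are hypothetical; returning `("none", None)` when a clip is missing and fallback is disabled is an assumption, since the document does not describe that case.

```python
def resolve_prompt(key, audio_prompts_cfg, clips):
    """Decide how a fixed AI situation is spoken.

    audio_prompts_cfg: the `audio_prompts` config block as a dict.
    clips: mapping of situation key -> recorded clip path.
    Returns a (source, payload) tuple.
    """
    mode = audio_prompts_cfg.get("mode", "audio")
    if mode == "gemini":
        return ("gemini", key)
    if key in clips:
        return ("audio", clips[key])
    if audio_prompts_cfg.get("fallback_to_gemini", False):
        return ("gemini", key)  # clip missing: fall back to Gemini text
    return ("none", None)       # assumption: nothing is played


if __name__ == "__main__":
    clips = {"greeting": "Data/Audio/greeting.wav"}  # hypothetical clip path
    print(resolve_prompt("greeting", {"mode": "audio"}, clips))
    print(resolve_prompt("countdown", {"mode": "audio", "fallback_to_gemini": True}, clips))
```
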
Dashboard people-registry behavior:
- sidebar shows enrolled guests with face + scene thumbnails
- operators can upload a new face image to create or extend a guest profile
- operators can attach additional photos to an existing guest profile
- operators can download or delete one guest, or reset the whole registry
`Core/direct_camera_service.py` serves its own camera UI from external web assets under `Web/` rather than embedding HTML/CSS/JS in Python.
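
Since the dashboard APIs take their arguments as query strings, a client can build request URLs with the standard library. In this sketch the base address `http://localhost:8080` and the replay name are placeholders, `api_url` is a hypothetical helper, and the HTTP method each endpoint expects is not stated in this document.

```python
from urllib.parse import urlencode


def api_url(base, endpoint, **params):
    """Build a dashboard API URL with properly encoded query parameters."""
    query = urlencode(params)
    url = f"{base.rstrip('/')}{endpoint}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    base = "http://localhost:8080"  # placeholder address
    print(api_url(base, "/api/runtime_health"))
    print(api_url(base, "/api/set_replay", name="wave hello"))
    print(api_url(base, "/api/set_ai_options",
                  hard_target_lock_enabled=1, retake_prompt_enabled=0))
```
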
## Runtime State Files
- `Data/Settings/config.json`
- `Data/Runtime/autonomous_state.json`
- `Data/Runtime/runtime_health.json`
- `Data/Runtime/error_counters.json`
- `Data/Runtime/error_events.jsonl`
- `Data/Runtime/upload_db.json`
- `Data/Audio/`
- `Data/Settings/audio_prompt_records.json`
- `photos/people/`
- `photos/Captures/`
- `photos/samples/`
These runtime JSON files are generated lazily. In a clean project tree, some of them will not exist until the corresponding component starts writing state.
## Core Config Blocks
- `mode`: runtime mode.
- `vision`: backend/runtime, strict YOLO, context stream settings, hard lock, retake, framing thresholds, greeting replay, AI capture replay, and face-recognition controls.
- `watchdog`: WS backoff, component restart delay, capture retry policy.
## Notes
- Capture choreography is unified across manual trigger, autonomous flow, and dashboard capture.
- Hands/replay behavior during capture remains driven by replay files in `Data/G1`.
- Replay recordings created from the dashboard are stored directly under `Data/G1` and become selectable as active replays without restart.
- Imported AI prompt recordings are stored under `Data/Audio/` and indexed by `Data/Settings/audio_prompt_records.json`.
- In `audio` prompt mode, AI detection, greeting, confirmation, countdown, refusal, retake, and thank-you situations use recorded clips first.
- After a fixed prompt finishes, runtime returns to normal Gemini conversation flow automatically.