# Current Runtime

Production runtime architecture (as implemented now):

`photo_sanad.sh -> voice_sanad.py -> persistent component loops + WS supervisor -> mode-gated command handling -> unified replay/capture -> server/dashboard/upload`

## Process Model

- `Scripts/photo_sanad.sh` resets `mode.current_mode` to `mode.default_mode` on each launch.
- Default full runtime path:
  - startup mode is normally `manual`
  - `photo_sanad.sh` resolves active PulseAudio sink/source, launches `Core/direct_camera_service.py` in the `teleimager` env, then starts `Gemini/voice_sanad.py` in the `gemini` env
  - `Core/direct_camera_service.py` is backend-first and serves external UI assets from `Web/direct_camera.html`, `Web/direct_camera.css`, and `Web/direct_camera.js`
  - preferred RealSense default is read from `Data/Settings/config.json -> camera.preferred_realsense_serial` and falls back to another detected camera if absent
  - in the default full runtime path, `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and replay/trigger services remain running in both `manual` and `ai`
  - in the full runtime path, `AUTONOMOUS_ENABLE` is auto-armed by default so dashboard mode can switch from `manual` to `ai` without restart
- Optional lean manual path:
  - `MANUAL_LEAN_RUNTIME=1`
  - skips the direct camera server and the heavy manual/AI runtime services
  - keeps Gemini + dashboard only
- `Gemini/voice_sanad.py` is the single long-lived orchestrator process.
- Runtime logs are centralized under `Logs/`, with one stable file per component.
- `Gemini/voice_sanad.py` starts these loops once and keeps them alive:
  - `capture_mic`
  - `receive_audio`
  - `play_audio`
  - `keepalive`
  - `Modes/Manual/trigger_loop.py` in the full runtime path
  - autonomous-mode supervisor in the full runtime path
  - `Modes/AI/autonomous_manager.py` only while `AUTONOMOUS_ENABLE=1` and runtime mode is `ai`
  - Runtime health writer (`Data/Runtime/runtime_health.json`)
  - Mode policy sync
- Gemini WebSocket is managed by a dedicated reconnect supervisor. WS reconnect does not restart other loops.
- In the normal full runtime path, mode switches change gating/state only; they do not tear down `voice_sanad.py`, the dashboard server, the direct camera server, or replay services.

## Mode Control

- Source of truth is `Data/Settings/config.json`:
  - `mode.current_mode` in `manual|ai`
  - launcher resets `mode.current_mode` to `mode.default_mode` at process start
- API updates mode live via `Server/photo_server.py`:
  - `/api/set_mode`
- Voice command gating is enforced on user transcription events in `Gemini/gemini_voice.py`.

## Mode Semantics

| Mode | Voice photo commands (`request_photo/yes_photo/no_photo`) | Manual R2+X | Autonomous flow |
|---|---|---|---|
| `manual` | Off | On | Paused |
| `ai` | On | On | On only if `AUTONOMOUS_ENABLE=1` |

Additional mode rules:

- Gemini conversation can stay active in both modes when `gemini.mic_enabled=true`.
- In full runtime, `manual` still includes the direct camera server, replay/trigger, uploader, and dashboard capture services.
- In full runtime, switching dashboard mode from `manual` to `ai` does not restart the process.
- In full runtime, switching between `manual` and `ai` does not stop `voice_sanad.py`, the dashboard server, the direct camera server, or replay/trigger services.
- If `AUTONOMOUS_ENABLE=1`, autonomous manager is armed and starts live when runtime mode becomes `ai`.
- Switching back to `manual` pauses autonomous flow again.
- In optional `MANUAL_LEAN_RUNTIME=1`, capture/replay/autonomous services are intentionally unavailable.

Removed from this project:

- command-mode functionality was extracted to `G1_Lootah/AI_Command`

## Remote Safety Controls

- `R2+X`: starts replay + photographer talk + unified capture pipeline.
- `R2+L1`: global hard cancel safety combo (active in runtime loops):
  - cancels pending capture
  - cancels active replay path
  - resets autonomous interaction session to `IDLE`

## AI/Autonomous Runtime

State machine in `Modes/AI/autonomous_manager.py`:

`IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE`

Special blocked state: `IDLE_BLOCKED` when strict YOLO readiness fails.

Behavior:

- Autonomous manager is supervised by runtime mode:
  - `manual` -> paused
  - `ai` -> active when `AUTONOMOUS_ENABLE=1`
- In full runtime, autonomous services can already be armed while still paused in `manual`, so mode switches are live.
- In full runtime, the direct camera server and replay infrastructure remain started while autonomous manager is paused in `manual`.
- On stable intent, manager opens audio gate, triggers a short greeting-hand replay, and asks whether the visitor wants a photo.
- On stable single-person intent, manager can identify a returning guest or enroll a new guest into `photos/people/`.
- Group-first greeting is used when group is detected.
- Confirmation uses flag commands from voice layer (`request_photo.flag`, `confirm_yes.flag`, `confirm_no.flag`).
- Hard target lock can pin one subject/group through the session.
- Framing checks: center, size, blur, exposure, headroom, eye-line.
- AI greeting replay is controlled by `vision.autonomous_greeting_replay_enabled` and `vision.autonomous_greeting_replay_file`.
- AI photo-time replay is controlled by `vision.autonomous_capture_replay_enabled`.
- When AI photo-time replay is enabled, autonomous capture uses the active replay file from `Data/Settings/config.json -> replay.active_file`, same as manual `R2+X`.
- When a capture succeeds for an identified guest, the saved photo is attached into that guest's folder in `photos/people/`.
- After capture, retake recommendation can move flow to `RETAKE_CONFIRM` (max retakes from config).
- On completion: CTA prompt and cooldown reset.

## Vision Runtime

`Modes/AI/vision_detector.py` provides:

- Backend selection `normal|yolo`.
- YOLO runtime selection `ultralytics|opencv`.
- Person/face detection and group clustering.
- Intent detection using depth-first logic and bbox-area fallback.
- Target lock fields:
  - `target_lock_active`, `target_lock_type`, `target_lock_id`, `target_switch_blocked_count`
- Camera/depth health fields:
  - `camera_ok`, `depth_ok`, `camera_restarts`, `depth_restarts`

Strict production gate:

- If `vision.yolo_strict_required=true` and YOLO readiness is not valid, AI session is blocked in `IDLE_BLOCKED`.

## Gemini Integration

`Gemini/gemini_voice.py`:

- Uses WS attach/detach model:
  - `attach_ws()`
  - `detach_ws()`
  - `is_ws_connected()`
- Live-safe sends:
  - `send_text_prompt_live()`
  - `send_vision_context_live()`
- Command matching uses user transcription events, not model text.
- Continuous vision context is streamed from autonomous manager.
- Context can be silent (`vision.gemini_context_silent=true`) and model audio is suppressed for context-only turns.
- Exposes runtime health snapshot for dashboard/API.
- In `manual`, Gemini conversation can remain available while AI photo flags stay disabled.
- Mic state is controlled live through `/api/mic` and `/api/set_mic`.

## Unified Capture Pipeline

All capture paths use `Server/capture_service.py`:

- Replay execution + trigger marker callback capture.
- Timed fallback capture if trigger marker is missing.
- Capture retries using watchdog settings:
  - `watchdog.camera_capture_retry_count`
  - `watchdog.camera_capture_retry_delay_sec`
- Upload trigger flag is touched after successful capture.

Replay integrity is validated at startup and fallback replay can be selected automatically.

## Component Recovery (Watchdog)

- WS failure: reconnect WS channel only.
- Mic failure: restart mic component only.
- Speaker failure: restart speaker component only.
- Detector frame starvation: recover detector camera/depth inputs only.
- Capture camera failure: retry capture call only.
- Process stays alive unless startup fatal occurs (for example empty Gemini API key).

## Server and Dashboard

`Server/photo_server.py` + `Web/gallery.js` provide:

- Mode APIs:
  - `/api/mode`, `/api/set_mode`, `/api/mode_policy`
- Mic APIs:
  - `/api/mic`, `/api/set_mic`
- Detector/AI readiness APIs:
  - `/api/detector_backend`, `/api/set_detector_backend`
  - `/api/ai_readiness`
- AI options APIs:
  - `/api/ai_options`
  - `/api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...`
- Replay APIs:
  - `/api/replays`
  - `/api/get_replay`
  - `/api/set_replay?name=...`
  - `/api/delete_replay?name=...`
  - `/api/rename_replay?old=...&new=...`
  - `/api/download_replay?name=...`
  - `/api/replay_record_status`
  - `/api/replay_record_start?name=...&seconds=...`
  - `/api/replay_test_status`
  - `/api/test_replay?name=...`
  - `/api/upload_replay`
- Runtime state APIs:
  - `/api/autonomous_state`
  - `/api/runtime_health`
- Camera APIs:
  - `/api/camera_health`
  - `/api/camera_sources`
  - `/api/set_camera_source?source=...`
  - `/api/set_camera_resolution?width=...&height=...&fps=...`
  - `/api/set_preferred_camera?serial=...`
- Photo APIs:
  - `/api/capture`, `/api/photos`, `/api/delete`, `/api/reupload`, `/api/upload_now`, `/api/download_zip`
- Live preview:
  - `/preview.mjpg`
  - preview is off by default and only runs when requested from the dashboard
  - preview camera/OpenCV is loaded lazily when preview is requested

Dashboard panels include mode controls, detector backend/readiness, AI options, autonomous state, runtime health, live camera preview, camera source switching, camera resolution changes, preferred RealSense serial persistence, active replay selection, replay inventory management, and replay recording controls.

Replay-management rules:

- replay inventory covers the full `Data/G1` tree
- replay recording is allowed only in `manual`
- replay test/play is allowed only in `manual`
- rename/download/delete/upload remain available from the dashboard inventory tools

Additional APIs:

- People APIs:
  - `/api/people`
  - `/api/person_image?id=...&kind=face|scene`
  - `/api/download_person?id=...`
  - `/api/delete_person?id=...`
  - `/api/reset_people`
  - `/api/upload_person`
- Audio prompt APIs:
  - `/api/audio_prompts`
  - `/api/set_audio_prompt_mode?mode=audio|gemini`
  - `/api/set_audio_prompt_fallback?enabled=0|1`
  - `/api/audio_prompt_record_status`
  - `/api/download_audio_prompt?key=...`
  - `/api/delete_audio_prompt?key=...`
  - `/api/upload_audio_prompt`
  - `/api/audio_prompt_record`

Dashboard audio-prompt behavior:

- operators can upload prerecorded WAV clips for each AI situation key
- operators can delete or download existing clips
- operators can record a prompt clip directly from text using the same Gemini replay path as `Project/SanadVoice/gemini_voice/sanad_replay.py`
- operators can switch fixed AI situation speech between:
  - `audio`: recorded prompt clips first
  - `gemini`: Gemini speech for those same fixed situations
- if a prompt clip is missing while `audio_prompts.mode=audio`, runtime falls back to Gemini text when `audio_prompts.fallback_to_gemini=true`

Dashboard people-registry behavior:

- sidebar shows enrolled guests with face + scene thumbnails
- operators can upload a new face image to create or extend a guest profile
- operators can attach additional photos to an existing guest profile
- operators can download or delete one guest, or reset the whole registry

`Core/direct_camera_service.py` serves its own camera UI from external web assets under `Web/` rather than embedding HTML/CSS/JS in Python.

## Runtime State Files

- `Data/Settings/config.json`
- `Data/Runtime/autonomous_state.json`
- `Data/Runtime/runtime_health.json`
- `Data/Runtime/error_counters.json`
- `Data/Runtime/error_events.jsonl`
- `Data/Runtime/upload_db.json`
- `Data/Audio/`
- `Data/Settings/audio_prompt_records.json`
- `photos/people/`
- `photos/Captures/`
- `photos/samples/`

These runtime JSON files are generated lazily. In a clean project tree, some of them will not exist until the corresponding component starts writing state.

## Core Config Blocks

- `mode`: runtime mode.
- `vision`: backend/runtime, strict YOLO, context stream settings, hard lock, retake, framing thresholds, greeting replay, AI capture replay, and face-recognition controls.
- `watchdog`: WS backoff, component restart delay, capture retry policy.

## Notes

- Capture choreography is unified across manual trigger, autonomous flow, and dashboard capture.
- Hands/replay behavior during capture remains driven by replay files in `Data/G1`.
- Replay recordings created from the dashboard are stored directly under `Data/G1` and become selectable as active replays without restart.
- Imported AI prompt recordings are stored under `Data/Audio/` and indexed by `Data/Settings/audio_prompt_records.json`.
- In `audio` prompt mode, AI detection, greeting, confirmation, countdown, refusal, retake, and thank-you situations use recorded clips first.
- After a fixed prompt finishes, runtime returns to normal Gemini conversation flow automatically.
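
As a rough illustration of the Mode Semantics table, the gating can be written as a pure function of the runtime mode and `AUTONOMOUS_ENABLE`. This is a minimal sketch, not the project's code: `ModePolicy` and `resolve_mode_policy` are hypothetical names introduced here, and the real gating is spread across `Gemini/gemini_voice.py`, the trigger loop, and the autonomous-mode supervisor.

```python
from dataclasses import dataclass

# The two runtime modes named in `mode.current_mode`.
VALID_MODES = ("manual", "ai")


@dataclass(frozen=True)
class ModePolicy:
    """Hypothetical container for the three columns of the Mode Semantics table."""

    voice_photo_commands: bool  # request_photo / yes_photo / no_photo
    manual_trigger: bool        # R2+X
    autonomous_flow: bool       # autonomous manager active (not paused)


def resolve_mode_policy(mode: str, autonomous_enable: bool) -> ModePolicy:
    """Map (mode, AUTONOMOUS_ENABLE) to what is gated on or off.

    Mirrors the table only; it does not model lean runtime
    (`MANUAL_LEAN_RUNTIME=1`) or the strict YOLO gate.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    is_ai = mode == "ai"
    return ModePolicy(
        voice_photo_commands=is_ai,                  # Off in manual, On in ai
        manual_trigger=True,                         # On in both modes
        autonomous_flow=is_ai and autonomous_enable, # Paused unless ai + armed
    )


if __name__ == "__main__":
    for mode in VALID_MODES:
        for armed in (False, True):
            print(mode, armed, resolve_mode_policy(mode, armed))
```

Keeping the policy a pure function of its two inputs matches the document's point that mode switches change gating only: nothing needs to be restarted for the result to change.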