2026-04-12 18:52:37 +04:00


Current Runtime

Production runtime architecture (as implemented now):

photo_sanad.sh -> voice_sanad.py -> persistent component loops + WS supervisor -> mode-gated command handling -> unified replay/capture -> server/dashboard/upload

Process Model

  • Scripts/photo_sanad.sh resets mode.current_mode to mode.default_mode on each launch.
  • Default full runtime path:
    • startup mode is normally manual
    • photo_sanad.sh resolves the active PulseAudio sink/source, launches Core/direct_camera_service.py in the teleimager env, then starts Gemini/voice_sanad.py in the gemini env
    • Core/direct_camera_service.py is backend-first and serves external UI assets from Web/direct_camera.html, Web/direct_camera.css, and Web/direct_camera.js
    • the preferred RealSense default is read from Data/Settings/config.json -> camera.preferred_realsense_serial and falls back to another detected camera if absent
    • Gemini/voice_sanad.py, the dashboard server, the direct camera server, and replay/trigger services remain running in both manual and ai
    • AUTONOMOUS_ENABLE is auto-armed by default, so dashboard mode can switch from manual to ai without a restart
  • Optional lean manual path:
    • enabled with MANUAL_LEAN_RUNTIME=1
    • skips the direct camera server and the heavy manual/AI runtime services
    • keeps Gemini + dashboard only
  • Gemini/voice_sanad.py is the single long-lived orchestrator process.
  • Runtime logs are centralized under Logs/, with one stable file per component.
  • It starts these loops once and keeps them alive:
    • capture_mic
    • receive_audio
    • play_audio
    • keepalive
    • Modes/Manual/trigger_loop.py in the full runtime path
    • autonomous-mode supervisor in the full runtime path
    • Modes/AI/autonomous_manager.py only while AUTONOMOUS_ENABLE=1 and runtime mode is ai
    • runtime health writer (Data/Runtime/runtime_health.json)
    • mode policy sync
  • Gemini WebSocket is managed by a dedicated reconnect supervisor. WS reconnect does not restart other loops.
  • In the normal full runtime path, mode switches change gating/state only; they do not tear down voice_sanad.py, the dashboard server, the direct camera server, or replay services.
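The reconnect behaviour above (a WS supervisor that retries on its own while the other loops keep running) can be sketched with asyncio. This is a minimal illustration, not the code in Gemini/voice_sanad.py: `ws_supervisor`, `connect_ws`, `keepalive_loop`, and the backoff numbers are hypothetical; the real backoff values come from the watchdog config block.

```python
import asyncio

# Hypothetical sketch: names and backoff values are illustrative, not taken from voice_sanad.py.


async def ws_supervisor(connect_ws, max_attempts=None, base_delay=0.5, max_delay=30.0):
    """Reconnect only the WebSocket, with exponential backoff between failed sessions.

    connect_ws is an async callable that runs one WS session and either returns
    (clean close) or raises ConnectionError (drop). No other loop is touched here.
    """
    attempt = 0
    delay = base_delay
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            await connect_ws()
            delay = base_delay  # a clean session resets the backoff
        except ConnectionError:
            await asyncio.sleep(delay)  # back off before the next reconnect
            delay = min(delay * 2, max_delay)
    return attempt


async def keepalive_loop(ticks, log):
    # Stand-in for capture_mic / play_audio / keepalive: unaffected by WS drops.
    for i in range(ticks):
        log.append(i)
        await asyncio.sleep(0)


async def demo():
    log = []
    outcomes = iter([True, True, False])  # two dropped sessions, then a clean one

    async def fake_connect():
        if next(outcomes):
            raise ConnectionError("ws dropped")

    attempts, _ = await asyncio.gather(
        ws_supervisor(fake_connect, max_attempts=3, base_delay=0.001),
        keepalive_loop(5, log),
    )
    return attempts, len(log)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (3, 5)
```

In production the supervisor would run with `max_attempts=None`; the limit exists only so the demo terminates.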

Mode Control

  • Source of truth is Data/Settings/config.json:
    • mode.current_mode in manual|ai
    • launcher resets mode.current_mode to mode.default_mode at process start
  • API updates mode live via Server/photo_server.py:
    • /api/set_mode
  • Voice command gating is enforced on user transcription events in Gemini/gemini_voice.py.
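The mode lifecycle above (reset at launch, then live updates) amounts to two small writes to Data/Settings/config.json. The sketch below is a Python illustration under assumptions: the real reset happens in Scripts/photo_sanad.sh, the /api/set_mode request parameters are not documented here, and `reset_mode_on_launch` and `set_mode_live` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

VALID_MODES = ("manual", "ai")


def _load(path: Path) -> dict:
    return json.loads(path.read_text())


def _save(path: Path, config: dict) -> None:
    path.write_text(json.dumps(config, indent=2))


def reset_mode_on_launch(config_path: Path) -> str:
    """Launcher step: copy mode.default_mode into mode.current_mode."""
    config = _load(config_path)
    mode = config.setdefault("mode", {})
    default = mode.get("default_mode", "manual")  # "manual" fallback is an assumption
    if default not in VALID_MODES:
        raise ValueError(f"invalid mode.default_mode: {default!r}")
    mode["current_mode"] = default
    _save(config_path, config)
    return default


def set_mode_live(config_path: Path, new_mode: str) -> None:
    """What a live mode switch persists: only mode.current_mode changes."""
    if new_mode not in VALID_MODES:
        raise ValueError(f"invalid mode: {new_mode!r}")
    config = _load(config_path)
    config.setdefault("mode", {})["current_mode"] = new_mode
    _save(config_path, config)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "config.json"
        _save(path, {"mode": {"default_mode": "manual", "current_mode": "ai"}})
        after_launch = reset_mode_on_launch(path)
        set_mode_live(path, "ai")
        return after_launch, _load(path)["mode"]["current_mode"]
```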

Mode Semantics

| Mode | Voice photo commands (request_photo/yes_photo/no_photo) | Manual R2+X | Autonomous flow |
|---|---|---|---|
| manual | Off | On | Paused |
| ai | On | On | On only if AUTONOMOUS_ENABLE=1 |

Additional mode rules:

  • Gemini conversation can stay active in both modes when gemini.mic_enabled=true.
  • In full runtime, manual still includes the direct camera server, replay/trigger, uploader, and dashboard capture services.
  • In full runtime, switching dashboard mode from manual to ai does not restart the process.
  • In full runtime, switching between manual and ai does not stop voice_sanad.py, the dashboard server, the direct camera server, or replay/trigger services.
  • If AUTONOMOUS_ENABLE=1, autonomous manager is armed and starts live when runtime mode becomes ai.
  • Switching back to manual pauses autonomous flow again.
  • In optional MANUAL_LEAN_RUNTIME=1, capture/replay/autonomous services are intentionally unavailable.
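The Mode Semantics table can be read as a pure function of the runtime mode and AUTONOMOUS_ENABLE, which is why a mode switch only needs to recompute gating rather than restart anything. A minimal sketch covering the full runtime path only; `ModePolicy` and `mode_policy` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    voice_photo_commands: bool  # request_photo / yes_photo / no_photo
    manual_trigger: bool        # R2+X
    autonomous_live: bool       # autonomous manager running rather than paused


def mode_policy(mode: str, autonomous_enable: bool) -> ModePolicy:
    """Encode the Mode Semantics table; switching modes only recomputes this value."""
    if mode == "manual":
        return ModePolicy(voice_photo_commands=False, manual_trigger=True, autonomous_live=False)
    if mode == "ai":
        return ModePolicy(voice_photo_commands=True, manual_trigger=True, autonomous_live=autonomous_enable)
    raise ValueError(f"unknown mode: {mode!r}")
```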

Removed from this project:

  • command-mode functionality was extracted to G1_Lootah/AI_Command

Remote Safety Controls

  • R2+X: starts replay + photographer talk + unified capture pipeline.
  • R2+L1: global hard cancel safety combo (active in runtime loops):
    • cancels pending capture
    • cancels active replay path
    • resets autonomous interaction session to IDLE
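One way to make the safety combo dominate is to test it before the capture combo, so holding R2+L1 cancels even if X is also down. A minimal sketch, assuming button names arrive as strings; `classify_combo`, `SessionState`, and `apply_hard_cancel` are hypothetical, and only the three cancel effects come from the list above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

HARD_CANCEL = frozenset({"R2", "L1"})
CAPTURE = frozenset({"R2", "X"})


def classify_combo(pressed: FrozenSet[str]) -> Optional[str]:
    """Map held buttons to an action; the safety combo is checked first so it always wins."""
    if HARD_CANCEL <= pressed:
        return "hard_cancel"
    if CAPTURE <= pressed:
        return "capture"
    return None


@dataclass
class SessionState:
    pending_capture: bool = False
    replay_active: bool = False
    autonomous_state: str = "IDLE"


def apply_hard_cancel(state: SessionState) -> None:
    # The three documented R2+L1 effects.
    state.pending_capture = False
    state.replay_active = False
    state.autonomous_state = "IDLE"
```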

AI/Autonomous Runtime

State machine in Modes/AI/autonomous_manager.py:

IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE

Special blocked state:

IDLE_BLOCKED when strict YOLO readiness fails.
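The state machine and the strict YOLO gate can be expressed as a transition table plus a check on which idle state applies. This is a sketch under assumptions, not Modes/AI/autonomous_manager.py: the documented main path is encoded as-is, while the RETAKE_CONFIRM -> FRAMING edge, the "return to idle from any state" rule, and the function names are hypothetical.

```python
# Documented main path; the RETAKE_CONFIRM -> FRAMING edge is an assumption.
MAIN_PATH = {
    "IDLE": {"WAIT_CONFIRM"},
    "WAIT_CONFIRM": {"FRAMING"},
    "FRAMING": {"COUNTDOWN"},
    "COUNTDOWN": {"RETAKE_CONFIRM", "COMPLETE"},
    "RETAKE_CONFIRM": {"FRAMING", "COMPLETE"},
    "COMPLETE": {"IDLE"},
}
IDLE_STATES = ("IDLE", "IDLE_BLOCKED")


def idle_state(yolo_strict_required: bool, yolo_ready: bool) -> str:
    """Strict production gate: without valid YOLO readiness, idle means IDLE_BLOCKED."""
    return "IDLE_BLOCKED" if yolo_strict_required and not yolo_ready else "IDLE"


def transition(current: str, target: str, *, yolo_strict_required: bool, yolo_ready: bool) -> str:
    idle = idle_state(yolo_strict_required, yolo_ready)
    if target in IDLE_STATES:
        return idle  # cancel/refusal/completion may return to idle from anywhere (assumption)
    if idle == "IDLE_BLOCKED":
        return idle  # no session step is taken while the gate fails (assumption)
    if target not in MAIN_PATH.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```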

Behavior:

  • Autonomous manager is supervised by runtime mode:
    • manual -> paused
    • ai -> active when AUTONOMOUS_ENABLE=1
  • In full runtime, autonomous services can already be armed while still paused in manual, so mode switches are live.
  • In full runtime, the direct camera server and replay infrastructure remain started while autonomous manager is paused in manual.
  • On stable intent, the manager opens the audio gate, triggers a short greeting-hand replay, and asks whether the visitor wants a photo.
  • On stable single-person intent, the manager can identify a returning guest or enroll a new guest into photos/people/.
  • A group-first greeting is used when a group is detected.
  • Confirmation uses flag commands from voice layer (request_photo.flag, confirm_yes.flag, confirm_no.flag).
  • Hard target lock can pin one subject/group through the session.
  • Framing checks: center, size, blur, exposure, headroom, eye-line.
  • AI greeting replay is controlled by vision.autonomous_greeting_replay_enabled and vision.autonomous_greeting_replay_file.
  • AI photo-time replay is controlled by vision.autonomous_capture_replay_enabled.
  • When AI photo-time replay is enabled, autonomous capture uses the active replay file from Data/Settings/config.json -> replay.active_file, same as manual R2+X.
  • When a capture succeeds for an identified guest, the saved photo is attached into that guest's folder in photos/people/.
  • After capture, a retake recommendation can move the flow to RETAKE_CONFIRM (the maximum number of retakes comes from config).
  • On completion, the manager gives a CTA prompt and resets the cooldown.
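The geometric part of the framing checks (center, size, headroom) can be computed from the subject's bounding box alone; blur, exposure, and eye-line need pixel or landmark data and are left out here. A minimal sketch assuming a normalized top-left-origin box; `Box`, `framing_issues`, and every threshold value are hypothetical stand-ins for the framing thresholds in the vision config block.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Subject bounding box in normalized frame coordinates (0..1, origin top-left)."""
    x: float
    y: float
    w: float
    h: float


# Hypothetical thresholds; the real ones are the framing thresholds in the vision config block.
MAX_CENTER_OFFSET = 0.15
MIN_HEIGHT_FRAC, MAX_HEIGHT_FRAC = 0.35, 0.90
MIN_HEADROOM, MAX_HEADROOM = 0.03, 0.20


def framing_issues(box: Box) -> List[str]:
    """Return the geometric checks that fail; an empty list means these checks pass."""
    issues = []
    center_x = box.x + box.w / 2
    if abs(center_x - 0.5) > MAX_CENTER_OFFSET:
        issues.append("center")
    if box.h < MIN_HEIGHT_FRAC:
        issues.append("too_small")
    elif box.h > MAX_HEIGHT_FRAC:
        issues.append("too_large")
    headroom = box.y  # gap between the top of the frame and the top of the subject
    if headroom < MIN_HEADROOM:
        issues.append("headroom_tight")
    elif headroom > MAX_HEADROOM:
        issues.append("headroom_loose")
    return issues
```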

Vision Runtime

Modes/AI/vision_detector.py provides:

  • Backend selection normal|yolo.
  • YOLO runtime selection ultralytics|opencv.
  • Person/face detection and group clustering.
  • Intent detection using depth-first logic and bbox-area fallback.
  • Target lock fields:
    • target_lock_active, target_lock_type, target_lock_id, target_switch_blocked_count
  • Camera/depth health fields:
    • camera_ok, depth_ok, camera_restarts, depth_restarts
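Depth-first intent with a bbox-area fallback, and a hard target lock that counts refused switches, might look like the sketch below. Only the target-lock field names come from the list above; `has_intent`, `TargetLock.offer`, `release`, the "person"/"group" values, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

INTENT_MAX_DEPTH_M = 1.8     # hypothetical threshold
INTENT_MIN_AREA_FRAC = 0.08  # hypothetical threshold (bbox area / frame area)


def has_intent(depth_m: Optional[float], bbox_area_frac: float) -> bool:
    """Depth-first intent check with a bbox-area fallback."""
    if depth_m is not None and depth_m > 0:
        return depth_m <= INTENT_MAX_DEPTH_M  # a valid depth reading decides on its own
    return bbox_area_frac >= INTENT_MIN_AREA_FRAC  # no usable depth: use apparent size


@dataclass
class TargetLock:
    target_lock_active: bool = False
    target_lock_type: Optional[str] = None  # e.g. "person" or "group" (assumed values)
    target_lock_id: Optional[int] = None
    target_switch_blocked_count: int = 0

    def offer(self, candidate_id: int, candidate_type: str) -> bool:
        """Lock onto the first candidate; refuse and count later switch attempts."""
        if not self.target_lock_active:
            self.target_lock_active = True
            self.target_lock_type = candidate_type
            self.target_lock_id = candidate_id
            return True
        if (candidate_id, candidate_type) == (self.target_lock_id, self.target_lock_type):
            return True
        self.target_switch_blocked_count += 1
        return False

    def release(self) -> None:
        """End of session: clear the lock but keep the blocked-switch counter."""
        self.target_lock_active = False
        self.target_lock_type = None
        self.target_lock_id = None
```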

Strict production gate:

  • If vision.yolo_strict_required=true and YOLO readiness is not valid, AI session is blocked in IDLE_BLOCKED.

Gemini Integration

Gemini/gemini_voice.py:

  • Uses WS attach/detach model:
    • attach_ws()
    • detach_ws()
    • is_ws_connected()
  • Live-safe sends:
    • send_text_prompt_live()
    • send_vision_context_live()
  • Command matching uses user transcription events, not model text.
  • Continuous vision context is streamed from autonomous manager.
  • Context can be silent (vision.gemini_context_silent=true) and model audio is suppressed for context-only turns.
  • Exposes runtime health snapshot for dashboard/API.
  • In manual, Gemini conversation can remain available while AI photo flags stay disabled.
  • Mic state is controlled live through /api/mic and /api/set_mic.
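Matching commands only on user transcription events, gated by mode, and signalling the autonomous manager through flag files can be sketched as follows. The flag file names come from the AI/Autonomous Runtime section; the phrase patterns, the flag directory, and `handle_user_transcription` are hypothetical, not the matching rules in Gemini/gemini_voice.py.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical phrase patterns; only the flag file names come from the runtime docs.
COMMAND_PATTERNS = {
    "request_photo.flag": re.compile(r"\b(take|want)\b.*\bphoto\b"),
    "confirm_yes.flag": re.compile(r"^\s*(yes|yeah|sure)\b"),
    "confirm_no.flag": re.compile(r"^\s*(no|nope|not now)\b"),
}


def handle_user_transcription(text: str, mode: str, flag_dir: Path) -> Optional[str]:
    """Match a *user* transcription event; model output text is never passed in here."""
    if mode != "ai":
        return None  # voice photo commands are off in manual
    lowered = text.lower()
    for flag_name, pattern in COMMAND_PATTERNS.items():
        if pattern.search(lowered):
            (flag_dir / flag_name).touch()  # picked up by the autonomous manager
            return flag_name
    return None


def demo():
    with tempfile.TemporaryDirectory() as d:
        flags = Path(d)
        a = handle_user_transcription("Can you take a photo of us?", "ai", flags)
        b = handle_user_transcription("Yes please", "ai", flags)
        c = handle_user_transcription("Yes please", "manual", flags)
        return a, b, c, sorted(p.name for p in flags.iterdir())
```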

Unified Capture Pipeline

All capture paths use Server/capture_service.py:

  • Replay execution + trigger marker callback capture.
  • Timed fallback capture if trigger marker is missing.
  • Capture retries using watchdog settings:
    • watchdog.camera_capture_retry_count
    • watchdog.camera_capture_retry_delay_sec
  • Upload trigger flag is touched after successful capture.

Replay integrity is validated at startup and fallback replay can be selected automatically.
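The capture flow (marker-triggered capture with a timed fallback, retries on the capture call only, then an upload flag) can be sketched with the standard threading module. This is an illustration only: `capture_with_fallback`, its parameters, and the upload-flag path are hypothetical stand-ins for Server/capture_service.py. `retry_count` and `retry_delay_sec` correspond to watchdog.camera_capture_retry_count and watchdog.camera_capture_retry_delay_sec; treating the count as extra attempts after the first is an assumption.

```python
import tempfile
import threading
import time
from pathlib import Path
from typing import Callable, Tuple


def capture_with_fallback(
    marker: threading.Event,
    fallback_sec: float,
    capture_fn: Callable[[], Path],
    retry_count: int,
    retry_delay_sec: float,
    upload_flag: Path,
) -> Tuple[str, Path, int]:
    # 1. Capture on the replay's trigger marker, or on a timer if the marker never fires.
    source = "marker" if marker.wait(timeout=fallback_sec) else "timed_fallback"

    # 2. Retry only the capture call (retry_count = extra attempts, an assumption).
    last_error = None
    for attempt in range(1, retry_count + 2):
        try:
            photo = capture_fn()
            break
        except OSError as exc:
            last_error = exc
            if attempt <= retry_count:
                time.sleep(retry_delay_sec)
    else:
        raise RuntimeError("capture failed after retries") from last_error

    # 3. Tell the uploader there is something new.
    upload_flag.touch()
    return source, photo, attempt


def demo():
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        calls = []

        def flaky_capture() -> Path:
            calls.append(1)
            if len(calls) == 1:
                raise OSError("camera busy")  # first attempt fails
            photo = root / f"photo_{len(calls)}.jpg"
            photo.write_bytes(b"jpeg")
            return photo

        marker = threading.Event()
        threading.Timer(0.01, marker.set).start()  # replay reaches its trigger marker
        on_marker = capture_with_fallback(marker, 2.0, flaky_capture, 2, 0.0, root / "upload.flag")
        no_marker = capture_with_fallback(threading.Event(), 0.01, flaky_capture, 2, 0.0, root / "upload.flag")
        return on_marker[0], on_marker[2], no_marker[0], no_marker[2], (root / "upload.flag").exists()
```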

Component Recovery (Watchdog)

  • WS failure: reconnect WS channel only.
  • Mic failure: restart mic component only.
  • Speaker failure: restart speaker component only.
  • Detector frame starvation: recover detector camera/depth inputs only.
  • Capture camera failure: retry capture call only.
  • The process stays alive unless a fatal startup error occurs (for example, an empty Gemini API key).
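The recovery rules map each failure kind to an action scoped to one component, with a single fatal startup check. A minimal dispatch sketch; the failure keys, action labels, `Watchdog`, and `check_startup` are hypothetical, and the per-kind counters only loosely mirror Data/Runtime/error_counters.json.

```python
from typing import Callable, Dict, List, Tuple


class Watchdog:
    """Dispatch each failure kind to a recovery scoped to that one component."""

    def __init__(self, recoveries: Dict[str, Callable[[], None]]):
        self.recoveries = recoveries
        self.counters: Dict[str, int] = {}

    def report(self, failure: str) -> bool:
        self.counters[failure] = self.counters.get(failure, 0) + 1
        recover = self.recoveries.get(failure)
        if recover is None:
            return False  # unknown failure: count it, keep the process alive
        recover()
        return True


def check_startup(gemini_api_key: str) -> None:
    """The documented fatal case: refuse to start without a Gemini API key."""
    if not gemini_api_key.strip():
        raise SystemExit("fatal: empty Gemini API key")


def demo() -> Tuple[List[str], Dict[str, int]]:
    actions: List[str] = []
    wd = Watchdog({
        "ws": lambda: actions.append("reconnect ws"),
        "mic": lambda: actions.append("restart capture_mic"),
        "speaker": lambda: actions.append("restart play_audio"),
        "detector_frames": lambda: actions.append("reopen detector camera/depth"),
        "capture_camera": lambda: actions.append("retry capture call"),
    })
    for failure in ("mic", "ws", "mic"):
        wd.report(failure)
    return actions, wd.counters
```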

Server and Dashboard

Server/photo_server.py + Web/gallery.js provide:

  • Mode APIs:
    • /api/mode, /api/set_mode, /api/mode_policy
  • Mic APIs:
    • /api/mic, /api/set_mic
  • Detector/AI readiness APIs:
    • /api/detector_backend, /api/set_detector_backend
    • /api/ai_readiness
  • AI options APIs:
    • /api/ai_options
    • /api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...
  • Replay APIs:
    • /api/replays
    • /api/get_replay
    • /api/set_replay?name=...
    • /api/delete_replay?name=...
    • /api/rename_replay?old=...&new=...
    • /api/download_replay?name=...
    • /api/replay_record_status
    • /api/replay_record_start?name=...&seconds=...
    • /api/replay_test_status
    • /api/test_replay?name=...
    • /api/upload_replay
  • Runtime state APIs:
    • /api/autonomous_state
    • /api/runtime_health
  • Camera APIs:
    • /api/camera_health
    • /api/camera_sources
    • /api/set_camera_source?source=...
    • /api/set_camera_resolution?width=...&height=...&fps=...
    • /api/set_preferred_camera?serial=...
  • Photo APIs:
    • /api/capture, /api/photos, /api/delete, /api/reupload, /api/upload_now, /api/download_zip
  • Live preview:
    • /preview.mjpg
    • preview is off by default and only runs when requested from the dashboard
    • preview camera/OpenCV is loaded lazily when preview is requested
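The query-string APIs above take `0|1` flags and plain values, so a client only needs to encode booleans as integers. A minimal sketch using `urllib.parse.urlencode`; the host and port in `BASE_URL` and the helper name `api_url` are hypothetical, and whether an endpoint accepts a partial parameter set (for example, only some of the set_ai_options flags) is not stated in this document.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # hypothetical host and port


def api_url(path: str, **params) -> str:
    """Build a dashboard API URL; bools become the 0|1 values the endpoints take."""
    query = urlencode({k: int(v) if isinstance(v, bool) else v for k, v in params.items()})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"
```

For example, `api_url("/api/set_camera_resolution", width=1280, height=720, fps=30)` yields `http://localhost:8080/api/set_camera_resolution?width=1280&height=720&fps=30`, which can then be passed to `urllib.request.urlopen` or any HTTP client.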

Dashboard panels include mode controls, detector backend/readiness, AI options, autonomous state, runtime health, live camera preview, camera source switching, camera resolution changes, preferred RealSense serial persistence, active replay selection, replay inventory management, and replay recording controls.

Replay-management rules:

  • replay inventory covers the full Data/G1 tree
  • replay recording is allowed only in manual
  • replay test/play is allowed only in manual
  • rename/download/delete/upload remain available from the dashboard inventory tools

Additional server APIs:

  • People APIs:
    • /api/people
    • /api/person_image?id=...&kind=face|scene
    • /api/download_person?id=...
    • /api/delete_person?id=...
    • /api/reset_people
    • /api/upload_person
  • Audio prompt APIs:
    • /api/audio_prompts
    • /api/set_audio_prompt_mode?mode=audio|gemini
    • /api/set_audio_prompt_fallback?enabled=0|1
    • /api/audio_prompt_record_status
    • /api/download_audio_prompt?key=...
    • /api/delete_audio_prompt?key=...
    • /api/upload_audio_prompt
    • /api/audio_prompt_record
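The replay-management rules listed above (manual-only recording and test playback, an inventory that covers the whole Data/G1 tree) can be sketched as a permission check plus a directory rescan. Rescanning on every call is one simple way to make new recordings selectable without a restart. `replay_action_allowed`, `replay_inventory`, and the action names are hypothetical; the real inventory may also filter by file type, which is not documented here.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

MANUAL_ONLY = {"record", "test"}
ALWAYS_AVAILABLE = {"rename", "download", "delete", "upload"}


def replay_action_allowed(action: str, mode: str) -> bool:
    """Recording and test playback only in manual; inventory tools in any mode."""
    if action in MANUAL_ONLY:
        return mode == "manual"
    return action in ALWAYS_AVAILABLE


def replay_inventory(g1_root: Path) -> List[str]:
    """Rescan the whole tree on every call, so new recordings show up without a restart."""
    return sorted(p.relative_to(g1_root).as_posix() for p in g1_root.rglob("*") if p.is_file())


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as d:
        root = Path(d) / "Data" / "G1"
        (root / "greetings").mkdir(parents=True)
        (root / "wave").write_text("...")
        before = replay_inventory(root)
        (root / "greetings" / "hello").write_text("...")  # a newly recorded replay
        return before, replay_inventory(root)
```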

Dashboard audio-prompt behavior:

  • operators can upload prerecorded WAV clips for each AI situation key
  • operators can delete or download existing clips
  • operators can record a prompt clip directly from text using the same Gemini replay path as Project/SanadVoice/gemini_voice/sanad_replay.py
  • operators can switch fixed AI situation speech between:
    • audio: recorded prompt clips first
    • gemini: Gemini speech for those same fixed situations
  • if a prompt clip is missing while audio_prompts.mode=audio, runtime falls back to Gemini text when audio_prompts.fallback_to_gemini=true
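The prompt-selection rule above reduces to a small decision over audio_prompts.mode, the presence of a clip, and audio_prompts.fallback_to_gemini. A minimal sketch; `resolve_prompt`, the situation key strings, and the `"silent"` result when fallback is disabled are hypothetical, since the document does not say what happens in that case.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional, Tuple


def resolve_prompt(
    key: str,
    mode: str,                     # audio_prompts.mode: "audio" | "gemini"
    fallback_to_gemini: bool,      # audio_prompts.fallback_to_gemini
    clips: Dict[str, Path],        # situation key -> recorded WAV clip
    gemini_texts: Dict[str, str],  # situation key -> text for Gemini to speak
) -> Tuple[str, Optional[object]]:
    if mode == "audio":
        clip = clips.get(key)
        if clip is not None and clip.exists():
            return "clip", clip
        if fallback_to_gemini:
            return "gemini", gemini_texts.get(key)
        return "silent", None  # behaviour without fallback is an assumption
    return "gemini", gemini_texts.get(key)


def demo():
    texts = {"greeting": "Hello! Would you like a photo?"}
    with tempfile.TemporaryDirectory() as d:
        clip = Path(d) / "greeting.wav"
        clips = {"greeting": clip}
        missing = resolve_prompt("greeting", "audio", True, clips, texts)
        clip.write_bytes(b"RIFF")
        present = resolve_prompt("greeting", "audio", True, clips, texts)
        no_fallback = resolve_prompt("thank_you", "audio", False, clips, texts)
        gemini = resolve_prompt("greeting", "gemini", False, clips, texts)
        return missing[0], present[0], no_fallback[0], gemini[0]
```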

Dashboard people-registry behavior:

  • sidebar shows enrolled guests with face + scene thumbnails
  • operators can upload a new face image to create or extend a guest profile
  • operators can attach additional photos to an existing guest profile
  • operators can download or delete one guest, or reset the whole registry

Core/direct_camera_service.py serves its own camera UI from external web assets under Web/ rather than embedding HTML/CSS/JS in Python.

Runtime State Files

  • Data/Settings/config.json
  • Data/Runtime/autonomous_state.json
  • Data/Runtime/runtime_health.json
  • Data/Runtime/error_counters.json
  • Data/Runtime/error_events.jsonl
  • Data/Runtime/upload_db.json
  • Data/Audio/
  • Data/Settings/audio_prompt_records.json
  • photos/people/
  • photos/Captures/
  • photos/samples/

These runtime JSON files are generated lazily. In a clean project tree, some of them will not exist until the corresponding component starts writing state.
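Because these files appear lazily, readers should treat a missing file as normal, and writers should create parent directories on first use. The sketch below shows one way to do that, with an atomic replace so the dashboard never reads a half-written JSON file and a JSON Lines append for event logs. `read_state`, `write_state`, `append_event`, and the `ws_connected` field are hypothetical; the document does not say how the project writes these files.

```python
import json
import os
import tempfile
from pathlib import Path


def read_state(path: Path, default: dict) -> dict:
    """A missing runtime file is normal in a clean tree: return a copy of the default."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return dict(default)


def write_state(path: Path, state: dict) -> None:
    """Create parent dirs on first write, then swap the file in atomically."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def append_event(path: Path, event: dict) -> None:
    """JSON Lines log (like error_events.jsonl): one object per line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(event) + "\n")


def demo():
    with tempfile.TemporaryDirectory() as d:
        runtime = Path(d) / "Data" / "Runtime"
        health = runtime / "runtime_health.json"
        first = read_state(health, {"ws_connected": False})
        write_state(health, {"ws_connected": True})
        append_event(runtime / "error_events.jsonl", {"component": "mic"})
        append_event(runtime / "error_events.jsonl", {"component": "ws"})
        lines = (runtime / "error_events.jsonl").read_text().splitlines()
        return first, read_state(health, {}), len(lines)
```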

Core Config Blocks

  • mode: runtime mode.
  • vision: backend/runtime, strict YOLO, context stream settings, hard lock, retake, framing thresholds, greeting replay, AI capture replay, and face-recognition controls.
  • watchdog: WS backoff, component restart delay, capture retry policy.
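This document refers to settings by dotted paths (vision.yolo_strict_required, watchdog.camera_capture_retry_count). A small helper like the hypothetical `get_setting` below resolves such a path against the loaded config.json, with a default for missing keys. The key names in `SAMPLE_CONFIG` appear in this document; the values are illustrative, not the project's defaults.

```python
from typing import Any

# Documented key names with illustrative values (not the project's defaults).
SAMPLE_CONFIG = {
    "mode": {"default_mode": "manual", "current_mode": "manual"},
    "vision": {"yolo_strict_required": True, "gemini_context_silent": True},
    "watchdog": {"camera_capture_retry_count": 2, "camera_capture_retry_delay_sec": 0.5},
}


def get_setting(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Look up 'block.key' paths such as 'watchdog.camera_capture_retry_count'."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```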

Notes

  • Capture choreography is unified across manual trigger, autonomous flow, and dashboard capture.
  • Hands/replay behavior during capture remains driven by replay files in Data/G1.
  • Replay recordings created from the dashboard are stored directly under Data/G1 and become selectable as active replays without restart.
  • Imported AI prompt recordings are stored under Data/Audio/ and indexed by Data/Settings/audio_prompt_records.json.
  • In audio prompt mode, AI detection, greeting, confirmation, countdown, refusal, retake, and thank-you situations use recorded clips first.
  • After a fixed prompt finishes, runtime returns to normal Gemini conversation flow automatically.