Current Runtime
Production runtime architecture (as implemented now):
photo_sanad.sh -> voice_sanad.py -> persistent component loops + WS supervisor -> mode-gated command handling -> unified replay/capture -> server/dashboard/upload
Process Model
- `Scripts/photo_sanad.sh` resets `mode.current_mode` to `mode.default_mode` on each launch.
- Default full runtime path:
  - startup mode is normally `manual`
  - `photo_sanad.sh` resolves the active PulseAudio sink/source, launches `Core/direct_camera_service.py` in the `teleimager` env, then starts `Gemini/voice_sanad.py` in the `gemini` env
  - `Core/direct_camera_service.py` is backend-first and serves external UI assets from `Web/direct_camera.html`, `Web/direct_camera.css`, and `Web/direct_camera.js`
  - the preferred RealSense camera is read from `Data/Settings/config.json -> camera.preferred_realsense_serial`; if it is absent, another detected camera is used
  - `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and replay/trigger services remain running in both `manual` and `ai`
  - `AUTONOMOUS_ENABLE` is auto-armed by default, so dashboard mode can switch from `manual` to `ai` without a restart
- Optional lean manual path (`MANUAL_LEAN_RUNTIME=1`):
  - skips the direct camera server and the heavy manual/AI runtime services
  - keeps Gemini + dashboard only
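The preferred-camera fallback can be sketched as a small selection function. This is a hypothetical illustration, not the code in `Core/direct_camera_service.py`. It assumes the detected cameras are already available as a list of serial strings and that `config.json` has been loaded into a dict. The fallback simply takes the first other detected camera.

```python
from typing import Optional


def select_camera_serial(config: dict, detected_serials: list[str]) -> Optional[str]:
    """Pick the RealSense serial to open (hypothetical helper).

    Uses camera.preferred_realsense_serial when that camera is detected;
    otherwise falls back to the first detected camera, or None if none exist.
    """
    preferred = config.get("camera", {}).get("preferred_realsense_serial")
    if preferred and preferred in detected_serials:
        return preferred
    # Assumed fallback policy: first detected camera in enumeration order.
    return detected_serials[0] if detected_serials else None
```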
- `Gemini/voice_sanad.py` is the single long-lived orchestrator process.
- Runtime logs are centralized under `Logs/`, with one stable file per component.
- `voice_sanad.py` starts these loops once and keeps them alive:
  - `capture_mic`
  - `receive_audio`
  - `play_audio`
  - `keepalive`
  - `Modes/Manual/trigger_loop.py` (full runtime path)
  - autonomous-mode supervisor (full runtime path)
  - `Modes/AI/autonomous_manager.py`, only while `AUTONOMOUS_ENABLE=1` and the runtime mode is `ai`
  - runtime health writer (`Data/Runtime/runtime_health.json`)
  - mode policy sync
- Gemini WebSocket is managed by a dedicated reconnect supervisor. WS reconnect does not restart other loops.
- In the normal full runtime path, mode switches change gating/state only; they do not tear down `voice_sanad.py`, the dashboard server, the direct camera server, or replay services.
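The dedicated WebSocket supervisor can be pictured as its own asyncio task that retries with exponential backoff. This is a minimal sketch. `connect_and_serve` is a hypothetical coroutine that returns or raises when the socket drops. The `base`/`cap` parameters stand in for the `watchdog` WS backoff settings, whose exact key names are not given here. Because only this task retries, the mic, speaker, and keepalive tasks running beside it are never cancelled.

```python
import asyncio
from typing import Awaitable, Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


async def ws_supervisor(
    connect_and_serve: Callable[[], Awaitable[None]],
    stop: asyncio.Event,
    base: float = 1.0,
    cap: float = 30.0,
) -> int:
    """Keep (re)opening the WS session until `stop` is set.

    Returns the number of sessions started (hypothetical interface).
    """
    failures = 0
    sessions = 0
    while not stop.is_set():
        sessions += 1
        try:
            await connect_and_serve()  # returns or raises when the socket drops
            failures = 0               # clean close: reset the backoff
        except Exception:
            failures += 1
        if stop.is_set():
            break
        # Wait out the backoff, but wake immediately if shutdown is requested.
        try:
            await asyncio.wait_for(stop.wait(), timeout=backoff_delay(failures, base, cap))
        except asyncio.TimeoutError:
            pass
    return sessions
```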
Mode Control
- Source of truth is `Data/Settings/config.json`: `mode.current_mode` in `manual|ai`
- The launcher resets `mode.current_mode` to `mode.default_mode` at process start.
- The API updates the mode live via `Server/photo_server.py`: `/api/set_mode`
- Voice command gating is enforced on user transcription events in `Gemini/gemini_voice.py`.
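The launcher reset and a live mode update both write the same key, so they can share one validated write path. This is a minimal sketch with hypothetical function names; only the config keys come from this document. The atomic replace (write a temp file in the same directory, then `os.replace`) means other readers never see a half-written `config.json`. That is a design assumption, not a documented property of the project.

```python
import json
import os
import tempfile

VALID_MODES = ("manual", "ai")


def _write_json_atomic(path: str, data: dict) -> None:
    # Write to a temp file next to the target, then atomically swap it in.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def write_current_mode(path: str, mode: str) -> dict:
    """Validate and persist mode.current_mode (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    with open(path, encoding="utf-8") as fh:
        cfg = json.load(fh)
    cfg.setdefault("mode", {})["current_mode"] = mode
    _write_json_atomic(path, cfg)
    return cfg


def reset_mode_to_default(path: str) -> dict:
    """Launcher behavior: copy mode.default_mode into mode.current_mode."""
    with open(path, encoding="utf-8") as fh:
        cfg = json.load(fh)
    # Assumption: fall back to "manual" if default_mode is missing.
    default = cfg.get("mode", {}).get("default_mode", "manual")
    return write_current_mode(path, default)
```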
Mode Semantics
| Mode | Voice photo commands (`request_photo`/`yes_photo`/`no_photo`) | Manual R2+X | Autonomous flow |
|---|---|---|---|
| `manual` | Off | On | Paused |
| `ai` | On | On | On only if `AUTONOMOUS_ENABLE=1` |
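The table can be read as a lookup from mode to permissions. This sketch encodes it directly. The names `MODE_POLICY`, `photo_command_allowed`, and `autonomous_active` are hypothetical, and `autonomous_enable` stands in for the `AUTONOMOUS_ENABLE` environment flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    voice_photo_commands: bool
    manual_r2_x: bool
    autonomous_allowed: bool  # still also requires AUTONOMOUS_ENABLE=1


# Mirrors the Mode Semantics table row by row.
MODE_POLICY = {
    "manual": ModePolicy(voice_photo_commands=False, manual_r2_x=True, autonomous_allowed=False),
    "ai": ModePolicy(voice_photo_commands=True, manual_r2_x=True, autonomous_allowed=True),
}

PHOTO_COMMANDS = {"request_photo", "yes_photo", "no_photo"}


def photo_command_allowed(mode: str, command: str) -> bool:
    """Gate a photo command matched from a user transcription event."""
    if command not in PHOTO_COMMANDS:
        raise ValueError(f"not a photo command: {command!r}")
    return MODE_POLICY[mode].voice_photo_commands


def autonomous_active(mode: str, autonomous_enable: bool) -> bool:
    return MODE_POLICY[mode].autonomous_allowed and autonomous_enable
```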
Additional mode rules:
- Gemini conversation can stay active in both modes when `gemini.mic_enabled=true`.
- In full runtime, `manual` still includes the direct camera server, replay/trigger, uploader, and dashboard capture services.
- In full runtime, switching dashboard mode from `manual` to `ai` does not restart the process.
- In full runtime, switching between `manual` and `ai` does not stop `voice_sanad.py`, the dashboard server, the direct camera server, or replay/trigger services.
- If `AUTONOMOUS_ENABLE=1`, the autonomous manager is armed and starts live when the runtime mode becomes `ai`.
- Switching back to `manual` pauses the autonomous flow again.
- In the optional `MANUAL_LEAN_RUNTIME=1` path, capture/replay/autonomous services are intentionally unavailable.
Removed from this project:
- command-mode functionality was extracted to `G1_Lootah/AI_Command`
Remote Safety Controls
- `R2+X`: starts replay + photographer talk + unified capture pipeline.
- `R2+L1`: global hard-cancel safety combo (active in runtime loops):
  - cancels pending capture
  - cancels the active replay path
  - resets the autonomous interaction session to `IDLE`
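The two combos can be sketched as a check over the set of currently held buttons. The button names, the `pressed` set, and the runtime-state dict keys are hypothetical, because the remote's input format is not described here. Checking `R2+L1` first is a design choice so that the safety cancel always wins when both combos are held.

```python
from enum import Enum
from typing import Optional


class ComboAction(Enum):
    START_CAPTURE = "start_capture"  # R2+X
    HARD_CANCEL = "hard_cancel"      # R2+L1


def match_combo(pressed: set[str]) -> Optional[ComboAction]:
    # Safety combo first, so it takes precedence over the capture combo.
    if {"R2", "L1"} <= pressed:
        return ComboAction.HARD_CANCEL
    if {"R2", "X"} <= pressed:
        return ComboAction.START_CAPTURE
    return None


def hard_cancel(state: dict) -> dict:
    """Apply the three R2+L1 effects to a hypothetical runtime-state dict."""
    state["pending_capture"] = None
    state["active_replay"] = None
    state["autonomous_state"] = "IDLE"
    return state
```

In a real input loop, you would typically act only on the transition into a combo (a rising edge) so that holding the buttons does not fire repeatedly.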
AI/Autonomous Runtime
State machine in `Modes/AI/autonomous_manager.py`:
IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE
Special blocked state:
`IDLE_BLOCKED` when strict YOLO readiness fails.
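Encoding the states as an explicit transition table makes illegal jumps fail loudly. This is a sketch only. The transition set is inferred from the sequence above: refusal returns to `IDLE`, a retake returns to `FRAMING`, and a hard cancel may reset any state to `IDLE`. The `idle_state` helper models the strict YOLO gate described under Vision Runtime. None of these names are taken from `autonomous_manager.py`.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


S = SessionState
# Assumed transition set, inferred from the documented sequence.
TRANSITIONS = {
    S.IDLE: {S.WAIT_CONFIRM, S.IDLE_BLOCKED},
    S.IDLE_BLOCKED: {S.IDLE},
    S.WAIT_CONFIRM: {S.FRAMING, S.IDLE},        # yes / no (refusal)
    S.FRAMING: {S.COUNTDOWN},
    S.COUNTDOWN: {S.RETAKE_CONFIRM, S.COMPLETE},
    S.RETAKE_CONFIRM: {S.FRAMING, S.COMPLETE},  # retake / keep photo
    S.COMPLETE: {S.IDLE},                       # after CTA + cooldown
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    # A hard cancel (R2+L1) may reset any state to IDLE.
    if target is S.IDLE or target in TRANSITIONS[current]:
        return target
    raise ValueError(f"illegal transition {current.name} -> {target.name}")


def idle_state(yolo_strict_required: bool, yolo_ready: bool) -> SessionState:
    """Strict production gate: block sessions if YOLO is required but not ready."""
    return S.IDLE_BLOCKED if yolo_strict_required and not yolo_ready else S.IDLE
```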
Behavior:
- Autonomous manager is supervised by runtime mode:
  - `manual` -> paused
  - `ai` -> active when `AUTONOMOUS_ENABLE=1`
- In full runtime, autonomous services can already be armed while still paused in `manual`, so mode switches take effect live.
- In full runtime, the direct camera server and replay infrastructure remain started while the autonomous manager is paused in `manual`.
- On stable intent, the manager opens the audio gate, triggers a short greeting-hand replay, and asks whether the visitor wants a photo.
- On stable single-person intent, the manager can identify a returning guest or enroll a new guest into `photos/people/`.
- A group-first greeting is used when a group is detected.
- Confirmation uses flag commands from the voice layer (`request_photo.flag`, `confirm_yes.flag`, `confirm_no.flag`).
- Hard target lock can pin one subject/group for the whole session.
- Framing checks: center, size, blur, exposure, headroom, eye-line.
- AI greeting replay is controlled by `vision.autonomous_greeting_replay_enabled` and `vision.autonomous_greeting_replay_file`.
- AI photo-time replay is controlled by `vision.autonomous_capture_replay_enabled`.
- When AI photo-time replay is enabled, autonomous capture uses the active replay file from `Data/Settings/config.json -> replay.active_file`, the same one used by manual `R2+X`.
- When a capture succeeds for an identified guest, the saved photo is attached to that guest's folder in `photos/people/`.
- After capture, a retake recommendation can move the flow to `RETAKE_CONFIRM` (maximum retakes set in config).
- On completion: CTA prompt and cooldown reset.
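The framing checks can be sketched as simple geometry on a subject bounding box, plus two precomputed image statistics. Every threshold and name here is a hypothetical placeholder for the framing thresholds in the `vision` config block. Blur and brightness are passed in as scores rather than computed from pixels, and eye-line is checked only if an eye position is supplied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FramingThresholds:
    # Hypothetical defaults; the real values live in the `vision` config block.
    max_center_offset: float = 0.15   # |center - 0.5| as a fraction of the frame
    min_height_ratio: float = 0.35    # subject box height / frame height
    max_height_ratio: float = 0.90
    min_headroom: float = 0.03        # gap above the box / frame height
    min_sharpness: float = 100.0      # e.g. a variance-of-Laplacian score
    min_luma: float = 60.0            # mean brightness on a 0-255 scale
    max_luma: float = 190.0
    eye_line: float = 1 / 3           # target eye height from the top
    eye_line_tolerance: float = 0.12


def framing_problems(
    box: Tuple[float, float, float, float],  # (x1, y1, x2, y2) in pixels
    frame_w: int,
    frame_h: int,
    sharpness: float,
    mean_luma: float,
    eye_y: Optional[float] = None,           # pixel row of the eyes, if known
    t: FramingThresholds = FramingThresholds(),
) -> List[str]:
    """Return the names of failed framing checks (empty list = ready)."""
    x1, y1, x2, y2 = box
    problems = []
    cx = (x1 + x2) / 2 / frame_w
    cy = (y1 + y2) / 2 / frame_h
    if max(abs(cx - 0.5), abs(cy - 0.5)) > t.max_center_offset:
        problems.append("center")
    if not t.min_height_ratio <= (y2 - y1) / frame_h <= t.max_height_ratio:
        problems.append("size")
    if y1 / frame_h < t.min_headroom:
        problems.append("headroom")
    if sharpness < t.min_sharpness:
        problems.append("blur")
    if not t.min_luma <= mean_luma <= t.max_luma:
        problems.append("exposure")
    if eye_y is not None and abs(eye_y / frame_h - t.eye_line) > t.eye_line_tolerance:
        problems.append("eye-line")
    return problems
```

Returning the list of failures, rather than a single boolean, lets the caller turn each failed check into a specific spoken instruction.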
Vision Runtime
`Modes/AI/vision_detector.py` provides:
- Backend selection: `normal|yolo`.
- YOLO runtime selection: `ultralytics|opencv`.
- Person/face detection and group clustering.
- Intent detection using depth-first logic with a bbox-area fallback.
- Target lock fields: `target_lock_active`, `target_lock_type`, `target_lock_id`, `target_switch_blocked_count`
- Camera/depth health fields: `camera_ok`, `depth_ok`, `camera_restarts`, `depth_restarts`
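"Depth-first logic with a bbox-area fallback" can be sketched as follows. The sketch trusts a valid depth reading for the person when one exists. Otherwise it falls back to how much of the frame the person's box covers. Intent counts as stable only after several consecutive engaged frames. The thresholds, the frame count, and both names are hypothetical, and computing `depth_m` from the depth image (for example, a median inside the box) is out of scope.

```python
from collections import deque
from typing import Optional


def is_engaged(depth_m: Optional[float], bbox_area_ratio: float,
               max_depth_m: float = 1.8, min_area_ratio: float = 0.08) -> bool:
    """Depth-first: trust depth when valid; otherwise use bbox area."""
    if depth_m is not None and depth_m > 0:
        return depth_m <= max_depth_m
    return bbox_area_ratio >= min_area_ratio


class StableIntent:
    """Report intent only after `frames` consecutive engaged frames."""

    def __init__(self, frames: int = 5):
        self._window = deque(maxlen=frames)

    def update(self, engaged: bool) -> bool:
        self._window.append(engaged)
        return len(self._window) == self._window.maxlen and all(self._window)
```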
Strict production gate:
- If `vision.yolo_strict_required=true` and YOLO readiness is not valid, the AI session is blocked in `IDLE_BLOCKED`.
Gemini Integration
`Gemini/gemini_voice.py`:
- Uses a WS attach/detach model: `attach_ws()`, `detach_ws()`, `is_ws_connected()`
- Live-safe sends: `send_text_prompt_live()`, `send_vision_context_live()`
- Command matching uses user transcription events, not model text.
- Continuous vision context is streamed from the autonomous manager.
- Context can be silent (`vision.gemini_context_silent=true`), and model audio is suppressed for context-only turns.
- Exposes a runtime health snapshot for the dashboard/API.
- In `manual`, Gemini conversation can remain available while AI photo flags stay disabled.
- Mic state is controlled live through `/api/mic` and `/api/set_mic`.
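Because matching runs on user transcription events rather than model text, the model's own speech cannot trigger a photo command. This sketch shows one way to turn a user transcript into the flag files named in the AI runtime section. The phrase patterns, the command-to-flag mapping, and the flag directory are all assumptions. Refusals are checked first so that "no, I don't want a photo" is not read as a request.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical phrase patterns, checked in order (first match wins).
COMMAND_PATTERNS = [
    ("no_photo", re.compile(r"\b(no|nope|not now)\b")),
    ("yes_photo", re.compile(r"\b(yes|yeah|sure|okay|ok)\b")),
    ("request_photo", re.compile(r"\b(take|want|can i get) (a |my )?(photo|picture)\b")),
]
# Assumed mapping from voice commands to the documented flag files.
FLAG_FILES = {
    "request_photo": "request_photo.flag",
    "yes_photo": "confirm_yes.flag",
    "no_photo": "confirm_no.flag",
}


def match_command(transcript: str) -> Optional[str]:
    text = transcript.lower()
    for command, pattern in COMMAND_PATTERNS:
        if pattern.search(text):
            return command
    return None


def on_user_transcription(transcript: str, mode: str, flag_dir: Path) -> Optional[Path]:
    """Touch a flag file for a matched photo command, only in `ai` mode."""
    command = match_command(transcript)
    if command is None or mode != "ai":
        return None  # photo commands are gated off in manual
    flag_dir.mkdir(parents=True, exist_ok=True)
    flag = flag_dir / FLAG_FILES[command]
    flag.touch()
    return flag
```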
Unified Capture Pipeline
All capture paths use `Server/capture_service.py`:
- Replay execution + trigger-marker callback capture.
- Timed fallback capture if the trigger marker is missing.
- Capture retries using watchdog settings: `watchdog.camera_capture_retry_count`, `watchdog.camera_capture_retry_delay_sec`
- The upload trigger flag is touched after a successful capture.
Replay integrity is validated at startup and fallback replay can be selected automatically.
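The trigger marker, timed fallback, and retry steps can be combined with a `threading.Event`. In this model, the replay thread sets the event when it reaches the trigger marker. The capture side waits with a timeout and captures anyway if the marker never arrives. `capture_fn`, the timeout value, and the flag path are hypothetical. The sketch also assumes `retry_count` means extra attempts after the first, which this document does not specify.

```python
import threading
import time
from pathlib import Path
from typing import Callable, Optional, Tuple


def capture_with_fallback(
    trigger: threading.Event,
    capture_fn: Callable[[], bytes],
    fallback_timeout_sec: float,
    retry_count: int,          # stands in for watchdog.camera_capture_retry_count
    retry_delay_sec: float,    # stands in for watchdog.camera_capture_retry_delay_sec
    upload_flag: Optional[Path] = None,
) -> Tuple[bytes, bool]:
    """Capture on the replay trigger marker, or after a timed fallback.

    Returns (image_bytes, triggered); `triggered` is False when the marker
    never arrived and the timed fallback fired instead.
    """
    triggered = trigger.wait(timeout=fallback_timeout_sec)
    last_error: Optional[Exception] = None
    for attempt in range(retry_count + 1):  # first try + retries
        try:
            image = capture_fn()
            if upload_flag is not None:
                upload_flag.touch()  # signal the uploader after success
            return image, triggered
        except Exception as exc:  # camera failure: retry the capture call only
            last_error = exc
            if attempt < retry_count:
                time.sleep(retry_delay_sec)
    raise RuntimeError(f"capture failed after {retry_count + 1} attempts") from last_error
```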
Component Recovery (Watchdog)
- WS failure: reconnect WS channel only.
- Mic failure: restart mic component only.
- Speaker failure: restart speaker component only.
- Detector frame starvation: recover detector camera/depth inputs only.
- Capture camera failure: retry capture call only.
- The process stays alive unless a fatal startup error occurs (for example, an empty Gemini API key).
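Component-scoped recovery can be modeled by running each component as its own asyncio task and recreating only the task that ends. This is a hypothetical sketch. The names are illustrative, and `restart_delay_sec` stands in for the `watchdog` component restart delay. Per-component failure counts are kept in the spirit of `Data/Runtime/error_counters.json`, but the sketch does not write that file.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Dict

ComponentFactory = Callable[[], Awaitable[None]]


async def run_components(
    factories: Dict[str, ComponentFactory],
    stop: asyncio.Event,
    restart_delay_sec: float = 1.0,
) -> Counter:
    """Run each component as its own task; restart only the one that ends."""
    errors: Counter = Counter()
    tasks = {name: asyncio.create_task(make()) for name, make in factories.items()}
    stop_task = asyncio.create_task(stop.wait())
    while not stop.is_set():
        done, _ = await asyncio.wait(
            {*tasks.values(), stop_task}, return_when=asyncio.FIRST_COMPLETED
        )
        for name, task in list(tasks.items()):
            if task not in done:
                continue  # healthy components are left untouched
            if task.cancelled() or task.exception() is not None:
                errors[name] += 1
            # Persistent loops should never return, so any exit triggers a restart.
            await asyncio.sleep(restart_delay_sec)
            tasks[name] = asyncio.create_task(factories[name]())
    for task in tasks.values():
        task.cancel()
    await asyncio.gather(*tasks.values(), return_exceptions=True)
    return errors
```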
Server and Dashboard
`Server/photo_server.py` + `Web/gallery.js` provide:
- Mode APIs: `/api/mode`, `/api/set_mode`, `/api/mode_policy`
- Mic APIs: `/api/mic`, `/api/set_mic`
- Detector/AI readiness APIs: `/api/detector_backend`, `/api/set_detector_backend`, `/api/ai_readiness`
- AI options APIs: `/api/ai_options`, `/api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...`
- Replay APIs: `/api/replays`, `/api/get_replay`, `/api/set_replay?name=...`, `/api/delete_replay?name=...`, `/api/rename_replay?old=...&new=...`, `/api/download_replay?name=...`, `/api/replay_record_status`, `/api/replay_record_start?name=...&seconds=...`, `/api/replay_test_status`, `/api/test_replay?name=...`, `/api/upload_replay`
- Runtime state APIs: `/api/autonomous_state`, `/api/runtime_health`
- Camera APIs: `/api/camera_health`, `/api/camera_sources`, `/api/set_camera_source?source=...`, `/api/set_camera_resolution?width=...&height=...&fps=...`, `/api/set_preferred_camera?serial=...`
- Photo APIs: `/api/capture`, `/api/photos`, `/api/delete`, `/api/reupload`, `/api/upload_now`, `/api/download_zip`
- Live preview: `/preview.mjpg`
  - preview is off by default and only runs when requested from the dashboard
  - the preview camera/OpenCV stack is loaded lazily when preview is requested
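The query-string endpoints can be driven from a script with only the standard library. In this sketch, `BASE_URL` is a hypothetical placeholder, since this document does not say where `Server/photo_server.py` listens. The sketch assumes plain GET requests, and it encodes Python booleans as the `0|1` values the endpoints expect.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # hypothetical; use the real dashboard address


def api_url(endpoint: str, **params) -> str:
    """Build an API URL, encoding bools as 0/1 (e.g. for /api/set_ai_options)."""
    encoded = {k: int(v) if isinstance(v, bool) else v for k, v in params.items()}
    query = urlencode(encoded)
    return f"{BASE_URL}{endpoint}" + (f"?{query}" if query else "")


def call_api(endpoint: str, **params) -> bytes:
    # Assumes a plain GET; requires a running server.
    with urlopen(api_url(endpoint, **params), timeout=5) as resp:
        return resp.read()
```

For example, `call_api("/api/set_ai_options", hard_target_lock_enabled=True, retake_prompt_enabled=False)` would request `/api/set_ai_options?hard_target_lock_enabled=1&retake_prompt_enabled=0`.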
Dashboard panels include mode controls, detector backend/readiness, AI options, autonomous state, runtime health, live camera preview, camera source switching, camera resolution changes, preferred RealSense serial persistence, active replay selection, replay inventory management, and replay recording controls.
Replay-management rules:
- replay inventory covers the full `Data/G1` tree
- replay recording is allowed only in `manual`
- replay test/play is allowed only in `manual`
- rename/download/delete/upload remain available from the dashboard inventory tools

Additional dashboard APIs:
- People APIs: `/api/people`, `/api/person_image?id=...&kind=face|scene`, `/api/download_person?id=...`, `/api/delete_person?id=...`, `/api/reset_people`, `/api/upload_person`
- Audio prompt APIs: `/api/audio_prompts`, `/api/set_audio_prompt_mode?mode=audio|gemini`, `/api/set_audio_prompt_fallback?enabled=0|1`, `/api/audio_prompt_record_status`, `/api/download_audio_prompt?key=...`, `/api/delete_audio_prompt?key=...`, `/api/upload_audio_prompt`, `/api/audio_prompt_record`
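The replay-management rules can be sketched as an inventory walk plus a mode gate. The sketch assumes replays are plain files anywhere under `Data/G1`. Because the file extension is not specified here, every regular file is listed. The function names are hypothetical. Record and test are refused outside `manual`, while the inventory tools are not mode-gated.

```python
from pathlib import Path

MANUAL_ONLY_ACTIONS = {"record", "test"}


def replay_inventory(root: Path) -> list[str]:
    """List every file under the replay tree as a sorted relative POSIX path."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def check_replay_action(action: str, mode: str) -> None:
    """Raise if the action is not allowed in the current runtime mode."""
    if action in MANUAL_ONLY_ACTIONS and mode != "manual":
        raise PermissionError(f"replay {action} is only allowed in manual mode")
```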
Dashboard audio-prompt behavior:
- operators can upload prerecorded WAV clips for each AI situation key
- operators can delete or download existing clips
- operators can record a prompt clip directly from text using the same Gemini replay path as `Project/SanadVoice/gemini_voice/sanad_replay.py`
- operators can switch fixed AI situation speech between:
  - `audio`: recorded prompt clips first
  - `gemini`: Gemini speech for those same fixed situations
- if a prompt clip is missing while `audio_prompts.mode=audio`, the runtime falls back to Gemini text when `audio_prompts.fallback_to_gemini=true`
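The prompt-selection rule can be sketched as a resolver. The `records` dict (situation key to clip path) stands in for `Data/Settings/audio_prompt_records.json`, whose schema is not described here. The situation keys and the return shape are hypothetical. The sketch also assumes that nothing plays when a clip is missing and fallback is disabled.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_prompt(
    key: str,
    mode: str,                 # audio_prompts.mode: "audio" | "gemini"
    fallback_to_gemini: bool,  # audio_prompts.fallback_to_gemini
    records: dict,             # situation key -> clip path (assumed schema)
    gemini_text: dict,         # situation key -> text for Gemini to speak
) -> Optional[Tuple[str, str]]:
    """Return ("clip", path) or ("gemini", text), or None if nothing should play."""
    if mode == "audio":
        clip = records.get(key)
        if clip and Path(clip).is_file():
            return ("clip", clip)
        if not fallback_to_gemini:
            return None  # assumed behavior: stay silent
    return ("gemini", gemini_text[key])
```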
Dashboard people-registry behavior:
- sidebar shows enrolled guests with face + scene thumbnails
- operators can upload a new face image to create or extend a guest profile
- operators can attach additional photos to an existing guest profile
- operators can download or delete one guest, or reset the whole registry
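This document does not specify how returning guests are identified, beyond the `face_recognition_enabled` and `face_recognition_threshold` options. One common approach is shown here purely as an assumption. It compares a face embedding against each enrolled guest's embeddings using cosine similarity, and it accepts the best match only if that match clears the threshold. How embeddings are produced is out of scope.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_guest(
    embedding: Vector,
    registry: Dict[str, List[Vector]],  # guest id -> enrolled embeddings
    threshold: float,                   # e.g. face_recognition_threshold
) -> Optional[Tuple[str, float]]:
    """Return (guest_id, score) for the best match at or above threshold, else None."""
    best: Optional[Tuple[str, float]] = None
    for guest_id, samples in registry.items():
        score = max((cosine_similarity(embedding, s) for s in samples), default=-1.0)
        if best is None or score > best[1]:
            best = (guest_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None  # caller may enroll a new guest instead
```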
`Core/direct_camera_service.py` serves its own camera UI from external web assets under `Web/` rather than embedding HTML/CSS/JS in Python.
Runtime State Files
- `Data/Settings/config.json`
- `Data/Runtime/autonomous_state.json`
- `Data/Runtime/runtime_health.json`
- `Data/Runtime/error_counters.json`
- `Data/Runtime/error_events.jsonl`
- `Data/Runtime/upload_db.json`
- `Data/Audio/`
- `Data/Settings/audio_prompt_records.json`
- `photos/people/`
- `photos/Captures/`
- `photos/samples/`
These runtime JSON files are generated lazily. In a clean project tree, some of them will not exist until the corresponding component starts writing state.
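Because these files may not exist yet, readers can treat a missing file as "no state yet" rather than as an error. This is a minimal sketch with hypothetical helper names. Also tolerating invalid JSON (for example, a write interrupted midway) is an added assumption.

```python
import json
from pathlib import Path
from typing import Any


def read_state(path: Path, default: Any = None) -> Any:
    """Read a runtime JSON file, returning `default` if absent or unreadable."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return default  # component has not written state yet
    except json.JSONDecodeError:
        return default  # assumed: treat a partial write as no state


def read_events(path: Path) -> list:
    """Read a JSON Lines file such as error_events.jsonl, skipping blank lines."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```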
Core Config Blocks
- `mode`: runtime mode.
- `vision`: backend/runtime, strict YOLO, context stream settings, hard lock, retake, framing thresholds, greeting replay, AI capture replay, and face-recognition controls.
- `watchdog`: WS backoff, component restart delay, capture retry policy.
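This document refers to settings by dotted paths such as `vision.yolo_strict_required` or `Data/Settings/config.json -> replay.active_file`. A small lookup helper makes that notation concrete. It is hypothetical and not taken from the codebase.

```python
from typing import Any, Mapping

_MISSING = object()


def get_setting(config: Mapping[str, Any], dotted_key: str, default: Any = _MISSING) -> Any:
    """Resolve a dotted key like 'watchdog.camera_capture_retry_count'."""
    node: Any = config
    for part in dotted_key.split("."):
        if isinstance(node, Mapping) and part in node:
            node = node[part]
        elif default is not _MISSING:
            return default
        else:
            raise KeyError(dotted_key)
    return node
```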
Notes
- Capture choreography is unified across manual trigger, autonomous flow, and dashboard capture.
- Hands/replay behavior during capture remains driven by replay files in `Data/G1`.
- Replay recordings created from the dashboard are stored directly under `Data/G1` and become selectable as active replays without a restart.
- Imported AI prompt recordings are stored under `Data/Audio/` and indexed by `Data/Settings/audio_prompt_records.json`.
- In `audio` prompt mode, AI detection, greeting, confirmation, countdown, refusal, retake, and thank-you situations use recorded clips first.
- After a fixed prompt finishes, the runtime automatically returns to the normal Gemini conversation flow.