kassam abf0fb5688 Initial project commit

2026-04-12 18:52:37 +04:00

Current Runtime

Production runtime architecture (as implemented now):

photo_sanad.sh -> voice_sanad.py -> persistent component loops + WS supervisor -> mode-gated command handling -> unified replay/capture -> server/dashboard/upload

Process Model

Scripts/photo_sanad.sh resets mode.current_mode to mode.default_mode on each launch.
Default full runtime path:
startup mode is normally manual
photo_sanad.sh resolves active PulseAudio sink/source, launches Core/direct_camera_service.py in the teleimager env, then starts Gemini/voice_sanad.py in the gemini env
Core/direct_camera_service.py is backend-first and serves external UI assets from Web/direct_camera.html, Web/direct_camera.css, and Web/direct_camera.js
preferred RealSense default is read from Data/Settings/config.json -> camera.preferred_realsense_serial and falls back to another detected camera if absent
in the default full runtime path, Gemini/voice_sanad.py, the dashboard server, the direct camera server, and replay/trigger services remain running in both manual and ai
in the full runtime path, AUTONOMOUS_ENABLE is auto-armed by default so dashboard mode can switch from manual to ai without restart
Optional lean manual path:
MANUAL_LEAN_RUNTIME=1
skips the direct camera server and the heavy manual/AI runtime services
keeps Gemini + dashboard only
Gemini/voice_sanad.py is the single long-lived orchestrator process.
Runtime logs are centralized under Logs/, with one stable file per component.
Gemini/voice_sanad.py starts these loops once and keeps them alive:
capture_mic
receive_audio
play_audio
keepalive
Modes/Manual/trigger_loop.py in the full runtime path
autonomous-mode supervisor in the full runtime path
Modes/AI/autonomous_manager.py only while AUTONOMOUS_ENABLE=1 and runtime mode is ai
Runtime health writer (Data/Runtime/runtime_health.json)
Mode policy sync
Gemini WebSocket is managed by a dedicated reconnect supervisor. WS reconnect does not restart other loops.
In the normal full runtime path, mode switches change gating/state only; they do not tear down voice_sanad.py, the dashboard server, the direct camera server, or replay services.
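
The split between persistent loops and the WS reconnect supervisor can be sketched as below. This is a minimal asyncio illustration under assumptions, not the code in Gemini/voice_sanad.py: ws_supervisor, run_runtime, the connect callable, and the backoff value are hypothetical names introduced here, and the use of asyncio itself is assumed.

```python
import asyncio


async def ws_supervisor(connect, backoff_sec=0.01, max_attempts=None):
    """Reconnect the WebSocket only; other loops are never touched here."""
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            await connect()  # hypothetical: returns or raises when the session ends
        except ConnectionError:
            pass  # swallow the failure and retry after the backoff
        await asyncio.sleep(backoff_sec)
    return attempts


async def run_runtime(loops, connect, max_attempts):
    """Start each persistent loop once, then supervise the WS separately."""
    tasks = [asyncio.create_task(start()) for start in loops]
    attempts = await ws_supervisor(connect, max_attempts=max_attempts)
    # The component loops should still be running after WS failures.
    alive = all(not task.done() for task in tasks)
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return attempts, alive
```

The point of the design is that a WS failure is handled inside ws_supervisor, so the tasks created for capture_mic, play_audio, and the other loops are never cancelled by a reconnect.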

Mode Control

Source of truth is Data/Settings/config.json:
mode.current_mode is one of manual|ai
launcher resets mode.current_mode to mode.default_mode at process start
API updates mode live via Server/photo_server.py:
/api/set_mode
Voice command gating is enforced on user transcription events in Gemini/gemini_voice.py.

Mode Semantics

| Mode | Voice photo commands (`request_photo/yes_photo/no_photo`) | Manual R2+X | Autonomous flow |
|---|---|---|---|
| `manual` | Off | On | Paused |
| `ai` | On | On | On only if `AUTONOMOUS_ENABLE=1` |

Additional mode rules:

Gemini conversation can stay active in both modes when gemini.mic_enabled=true.
In full runtime, manual still includes the direct camera server, replay/trigger, uploader, and dashboard capture services.
In full runtime, switching dashboard mode from manual to ai does not restart the process.
In full runtime, switching between manual and ai does not stop voice_sanad.py, the dashboard server, the direct camera server, or replay/trigger services.
If AUTONOMOUS_ENABLE=1, autonomous manager is armed and starts live when runtime mode becomes ai.
Switching back to manual pauses autonomous flow again.
In optional MANUAL_LEAN_RUNTIME=1, capture/replay/autonomous services are intentionally unavailable.
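
The mode table for the full runtime path can be expressed as a small policy function. This is a sketch only: ModePolicy and mode_policy are hypothetical names, and the lean runtime path is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    voice_photo_commands: bool  # request/confirm photo commands from voice
    manual_trigger: bool        # R2+X on the remote
    autonomous_flow: bool       # autonomous manager running (not paused)


def mode_policy(mode, autonomous_enable):
    """Return what is allowed in the given runtime mode (full runtime path)."""
    if mode == "manual":
        return ModePolicy(False, True, False)
    if mode == "ai":
        # Autonomous flow is live only when AUTONOMOUS_ENABLE=1.
        return ModePolicy(True, True, bool(autonomous_enable))
    raise ValueError(f"unsupported mode: {mode!r}")
```

Keeping the policy in one pure function makes a mode switch a state change only, which matches the rule that switching modes does not restart any service.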

Removed from this project:

command-mode functionality was extracted to G1_Lootah/AI_Command

Remote Safety Controls

R2+X: starts replay + photographer talk + unified capture pipeline.
R2+L1: global hard cancel safety combo (active in runtime loops):
cancels pending capture
cancels active replay path
resets autonomous interaction session to IDLE

AI/Autonomous Runtime

State machine in Modes/AI/autonomous_manager.py:

IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE

Special blocked state:

IDLE_BLOCKED when strict YOLO readiness fails.

Behavior:

Autonomous manager is supervised by runtime mode:
manual -> paused
ai -> active when AUTONOMOUS_ENABLE=1
In full runtime, autonomous services can already be armed while still paused in manual, so mode switches are live.
In full runtime, the direct camera server and replay infrastructure remain started while autonomous manager is paused in manual.
On stable intent, manager opens audio gate, triggers a short greeting-hand replay, and asks whether the visitor wants a photo.
On stable single-person intent, manager can identify a returning guest or enroll a new guest into photos/people/.
Group-first greeting is used when group is detected.
Confirmation uses flag commands from voice layer (request_photo.flag, confirm_yes.flag, confirm_no.flag).
Hard target lock can pin one subject/group through the session.
Framing checks: center, size, blur, exposure, headroom, eye-line.
AI greeting replay is controlled by vision.autonomous_greeting_replay_enabled and vision.autonomous_greeting_replay_file.
AI photo-time replay is controlled by vision.autonomous_capture_replay_enabled.
When AI photo-time replay is enabled, autonomous capture uses the active replay file from Data/Settings/config.json -> replay.active_file, same as manual R2+X.
When a capture succeeds for an identified guest, the saved photo is added to that guest's folder in photos/people/.
After capture, retake recommendation can move flow to RETAKE_CONFIRM (max retakes from config).
On completion: CTA prompt and cooldown reset.
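
The session flow above can be sketched as a transition table. The state names and the forward path come from this document; the Session class, the WAIT_CONFIRM -> IDLE edge on a declined photo, the RETAKE_CONFIRM -> FRAMING edge, and the IDLE_BLOCKED edges are assumptions, and this is not the implementation in Modes/AI/autonomous_manager.py.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


# Forward edges follow the documented flow; the rest are assumed (see lead-in).
ALLOWED = {
    State.IDLE: {State.WAIT_CONFIRM, State.IDLE_BLOCKED},
    State.IDLE_BLOCKED: {State.IDLE},
    State.WAIT_CONFIRM: {State.FRAMING, State.IDLE},
    State.FRAMING: {State.COUNTDOWN},
    State.COUNTDOWN: {State.RETAKE_CONFIRM, State.COMPLETE},
    State.RETAKE_CONFIRM: {State.FRAMING, State.COMPLETE},
    State.COMPLETE: {State.IDLE},
}


class Session:
    def __init__(self, max_retakes=1):
        self.state = State.IDLE
        self.max_retakes = max_retakes  # "max retakes from config"
        self.retakes = 0

    def advance(self, target):
        """Move to target if the edge is allowed and the retake budget permits."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        if self.state is State.RETAKE_CONFIRM and target is State.FRAMING:
            if self.retakes >= self.max_retakes:
                raise ValueError("retake limit reached")
            self.retakes += 1
        if target is State.IDLE:
            self.retakes = 0
        self.state = target
        return self.state

    def hard_cancel(self):
        """R2+L1: reset the session to IDLE from any state."""
        self.state = State.IDLE
        self.retakes = 0
        return self.state
```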

Vision Runtime

Modes/AI/vision_detector.py provides:

Backend selection normal|yolo.
YOLO runtime selection ultralytics|opencv.
Person/face detection and group clustering.
Intent detection using depth-first logic and bbox-area fallback.
Target lock fields:
target_lock_active, target_lock_type, target_lock_id, target_switch_blocked_count
Camera/depth health fields:
camera_ok, depth_ok, camera_restarts, depth_restarts
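
Depth-first intent detection with a bbox-area fallback can be illustrated as below. The function name and both thresholds are hypothetical values chosen for the example; the real thresholds used by Modes/AI/vision_detector.py are not stated in this document.

```python
def detect_intent(depth_m, bbox_area_ratio, max_depth_m=1.5, min_area_ratio=0.15):
    """Return True when a person appears close enough to show intent.

    depth_m: distance to the person in metres, or None/<=0 when depth is invalid.
    bbox_area_ratio: person bounding-box area divided by frame area (0..1).
    """
    if depth_m is not None and depth_m > 0:
        # Depth is trusted first whenever a valid reading exists.
        return depth_m <= max_depth_m
    # Fallback: a large bounding box is used as a proxy for closeness.
    return bbox_area_ratio >= min_area_ratio
```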

Strict production gate:

If vision.yolo_strict_required=true and YOLO readiness is not valid, AI session is blocked in IDLE_BLOCKED.

Gemini Integration

Gemini/gemini_voice.py:

Uses WS attach/detach model:
attach_ws()
detach_ws()
is_ws_connected()
Live-safe sends:
send_text_prompt_live()
send_vision_context_live()
Command matching uses user transcription events, not model text.
Continuous vision context is streamed from autonomous manager.
When vision.gemini_context_silent=true, context is sent silently: model audio is suppressed for context-only turns.
Exposes runtime health snapshot for dashboard/API.
In manual, Gemini conversation can remain available while AI photo flags stay disabled.
Mic state is controlled live through /api/mic and /api/set_mic.
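
Mode-gated command matching on user transcription can be sketched as follows. The flag file names come from this document; the phrase lists, function names, and flag directory are hypothetical, since the real trigger phrases in Gemini/gemini_voice.py are not listed here.

```python
import re
from pathlib import Path

# Hypothetical phrases; the flag names match the documented *.flag files.
COMMAND_PHRASES = {
    "request_photo": ("take a photo", "take my picture"),
    "confirm_yes": ("yes",),
    "confirm_no": ("no",),
}


def match_command(transcript):
    """Match whole-word phrases in a user transcription, or return None."""
    words = " ".join(re.findall(r"[a-z']+", transcript.lower()))
    padded = f" {words} "
    for command, phrases in COMMAND_PHRASES.items():
        if any(f" {phrase} " in padded for phrase in phrases):
            return command
    return None


def handle_user_transcription(transcript, mode, flag_dir):
    """Touch the matching flag file, but only while runtime mode is ai."""
    if mode != "ai":
        return None  # photo commands are gated off in manual
    command = match_command(transcript)
    if command is not None:
        Path(flag_dir, f"{command}.flag").touch()
    return command
```

Matching on padded whole words keeps "no" from firing on words such as "know". Because only user transcription is passed in, model speech can never trigger a command.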

Unified Capture Pipeline

All capture paths use Server/capture_service.py:

Replay execution + trigger marker callback capture.
Timed fallback capture if trigger marker is missing.
Capture retries using watchdog settings:
watchdog.camera_capture_retry_count
watchdog.camera_capture_retry_delay_sec
Upload trigger flag is touched after successful capture.

Replay integrity is validated at startup and fallback replay can be selected automatically.
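
The retry and timed-fallback behavior can be sketched as below. The two watchdog keys map to retry_count and retry_delay_sec; everything else is assumed for illustration: the function names, treating retry_count as retries after the first attempt, OSError as the camera failure type, and a threading.Event standing in for the trigger marker.

```python
import threading
import time


def capture_with_retry(capture, retry_count, retry_delay_sec, sleep=time.sleep):
    """Call capture() once, then retry up to retry_count more times."""
    last_error = None
    for attempt in range(retry_count + 1):
        try:
            return capture()
        except OSError as exc:  # assumed failure type for camera errors
            last_error = exc
            if attempt < retry_count:
                sleep(retry_delay_sec)
    raise RuntimeError("capture failed after retries") from last_error


def capture_trigger_source(marker: threading.Event, fallback_sec):
    """Wait for the replay trigger marker; fall back to a timed capture."""
    return "marker" if marker.wait(timeout=fallback_sec) else "timed_fallback"
```

Only the capture call is retried, which is consistent with the recovery rule that a capture camera failure does not restart any other component.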

Component Recovery (Watchdog)

WS failure: reconnect WS channel only.
Mic failure: restart mic component only.
Speaker failure: restart speaker component only.
Detector frame starvation: recover detector camera/depth inputs only.
Capture camera failure: retry capture call only.
The process stays alive unless a fatal startup error occurs (for example, an empty Gemini API key).

Server and Dashboard

Server/photo_server.py + Web/gallery.js provide:

Mode APIs:
/api/mode, /api/set_mode, /api/mode_policy
Mic APIs:
/api/mic, /api/set_mic
Detector/AI readiness APIs:
/api/detector_backend, /api/set_detector_backend
/api/ai_readiness
AI options APIs:
/api/ai_options
/api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...
Replay APIs:
/api/replays
/api/get_replay
/api/set_replay?name=...
/api/delete_replay?name=...
/api/rename_replay?old=...&new=...
/api/download_replay?name=...
/api/replay_record_status
/api/replay_record_start?name=...&seconds=...
/api/replay_test_status
/api/test_replay?name=...
/api/upload_replay
Runtime state APIs:
/api/autonomous_state
/api/runtime_health
Camera APIs:
/api/camera_health
/api/camera_sources
/api/set_camera_source?source=...
/api/set_camera_resolution?width=...&height=...&fps=...
/api/set_preferred_camera?serial=...
Photo APIs:
/api/capture, /api/photos, /api/delete, /api/reupload, /api/upload_now, /api/download_zip
Live preview:
/preview.mjpg
preview is off by default and only runs when requested from the dashboard
preview camera/OpenCV is loaded lazily when preview is requested

Dashboard panels include mode controls, detector backend/readiness, AI options, autonomous state, runtime health, live camera preview, camera source switching, camera resolution changes, preferred RealSense serial persistence, active replay selection, replay inventory management, and replay recording controls.
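
For scripted access, the query-string endpoints above can be built with a small helper. This sketch only constructs URLs and sends nothing. The base URL is a hypothetical placeholder because the server host and port are not stated here, and the replay name in the usage note is also made up.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"  # hypothetical host/port


def api_url(path, base_url=BASE_URL, **params):
    """Build a dashboard API URL; booleans are encoded as 0|1."""
    encoded = urlencode(
        {key: int(value) if isinstance(value, bool) else value
         for key, value in params.items()}
    )
    return f"{base_url}{path}?{encoded}" if encoded else f"{base_url}{path}"
```

For example, api_url("/api/set_ai_options", hard_target_lock_enabled=True, retake_prompt_enabled=False) yields a URL ending in /api/set_ai_options?hard_target_lock_enabled=1&retake_prompt_enabled=0, and api_url("/api/set_replay", name="wave hello.json") percent-encodes the name.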

Replay-management rules:

replay inventory covers the full Data/G1 tree
replay recording is allowed only in manual
replay test/play is allowed only in manual
rename/download/delete/upload remain available from the dashboard inventory tools

People APIs:
/api/people
/api/person_image?id=...&kind=face|scene
/api/download_person?id=...
/api/delete_person?id=...
/api/reset_people
/api/upload_person
Audio prompt APIs:
/api/audio_prompts
/api/set_audio_prompt_mode?mode=audio|gemini
/api/set_audio_prompt_fallback?enabled=0|1
/api/audio_prompt_record_status
/api/download_audio_prompt?key=...
/api/delete_audio_prompt?key=...
/api/upload_audio_prompt
/api/audio_prompt_record

Dashboard audio-prompt behavior:

operators can upload prerecorded WAV clips for each AI situation key
operators can delete or download existing clips
operators can record a prompt clip directly from text using the same Gemini replay path as Project/SanadVoice/gemini_voice/sanad_replay.py
operators can switch fixed AI situation speech between:
- audio: recorded prompt clips first
- gemini: Gemini speech for those same fixed situations
if a prompt clip is missing while audio_prompts.mode=audio, runtime falls back to Gemini text when audio_prompts.fallback_to_gemini=true
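
The prompt-source decision can be sketched as a pure function. The mode values and the fallback flag follow audio_prompts.mode and audio_prompts.fallback_to_gemini; the function name, the return labels, and returning None when no clip exists and fallback is disabled are assumptions.

```python
def resolve_prompt_source(mode, clip_exists, fallback_to_gemini):
    """Decide how a fixed AI situation prompt is spoken.

    Returns "clip", "gemini", or None (assumed: nothing is played).
    """
    if mode == "gemini":
        return "gemini"
    if mode != "audio":
        raise ValueError(f"unsupported audio prompt mode: {mode!r}")
    if clip_exists:
        return "clip"  # recorded prompt clips come first in audio mode
    return "gemini" if fallback_to_gemini else None
```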

Dashboard people-registry behavior:

sidebar shows enrolled guests with face + scene thumbnails
operators can upload a new face image to create or extend a guest profile
operators can attach additional photos to an existing guest profile
operators can download or delete one guest, or reset the whole registry

Core/direct_camera_service.py serves its own camera UI from external web assets under Web/ rather than embedding HTML/CSS/JS in Python.

Runtime State Files

Data/Settings/config.json
Data/Runtime/autonomous_state.json
Data/Runtime/runtime_health.json
Data/Runtime/error_counters.json
Data/Runtime/error_events.jsonl
Data/Runtime/upload_db.json
Data/Audio/
Data/Settings/audio_prompt_records.json
photos/people/
photos/Captures/
photos/samples/

These runtime JSON files are generated lazily. In a clean project tree, some of them will not exist until the corresponding component starts writing state.
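
Because these files appear lazily, any reader should tolerate a missing file. A minimal sketch follows; read_runtime_state is a hypothetical helper, and treating an unparsable file as "not ready yet" is an assumption about concurrent writers.

```python
import json
from pathlib import Path


def read_runtime_state(path, default=None):
    """Read a runtime JSON file, returning default if it is not usable yet."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return default  # the component has not written state yet
    except json.JSONDecodeError:
        return default  # assumed: file caught mid-write or still empty
```

A dashboard poller could call read_runtime_state("Data/Runtime/runtime_health.json", default={}) and render an empty panel until the health writer produces its first snapshot.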

Core Config Blocks

mode: runtime mode.
vision: backend/runtime, strict YOLO, context stream settings, hard lock, retake, framing thresholds, greeting replay, AI capture replay, and face-recognition controls.
watchdog: WS backoff, component restart delay, capture retry policy.

Notes

Capture choreography is unified across manual trigger, autonomous flow, and dashboard capture.
Hands/replay behavior during capture remains driven by replay files in Data/G1.
Replay recordings created from the dashboard are stored directly under Data/G1 and become selectable as active replays without restart.
Imported AI prompt recordings are stored under Data/Audio/ and indexed by Data/Settings/audio_prompt_records.json.
In audio prompt mode, AI detection, greeting, confirmation, countdown, refusal, retake, and thank-you situations use recorded clips first.
After a fixed prompt finishes, runtime returns to normal Gemini conversation flow automatically.
