# Current Runtime

Production runtime architecture (as implemented now):

`photo_sanad.sh -> voice_sanad.py -> persistent component loops + WS supervisor -> mode-gated command handling -> unified replay/capture -> server/dashboard/upload`

## Process Model

- `Scripts/photo_sanad.sh` resets `mode.current_mode` to `mode.default_mode` on each launch.
- Default full runtime path:
  - startup mode is normally `manual`
  - `photo_sanad.sh` resolves the active PulseAudio sink/source, launches `Core/direct_camera_service.py` in the `teleimager` env, then starts `Gemini/voice_sanad.py` in the gemini env
  - `Core/direct_camera_service.py` is backend-first and serves external UI assets from `Web/direct_camera.html`, `Web/direct_camera.css`, and `Web/direct_camera.js`
  - preferred RealSense default is read from `Data/Settings/config.json -> camera.preferred_realsense_serial` and falls back to another detected camera if absent
  - in the default full runtime path, `Gemini/voice_sanad.py`, the dashboard server, the direct camera server, and replay/trigger services remain running in both `manual` and `ai`
  - in the full runtime path, `AUTONOMOUS_ENABLE` is auto-armed by default so dashboard mode can switch from `manual` to `ai` without restart
- Optional lean manual path:
  - `MANUAL_LEAN_RUNTIME=1`
  - skips the direct camera server and the heavy manual/AI runtime services
  - keeps Gemini + dashboard only
- `Gemini/voice_sanad.py` is the single long-lived orchestrator process.
- Runtime logs are centralized under `Logs/`, with one stable file per component.
- `voice_sanad.py` starts these loops once and keeps them alive:
  - `capture_mic`
  - `receive_audio`
  - `play_audio`
  - `keepalive`
  - `Modes/Manual/trigger_loop.py` in the full runtime path
  - autonomous-mode supervisor in the full runtime path
  - `Modes/AI/autonomous_manager.py` only while `AUTONOMOUS_ENABLE=1` and runtime mode is `ai`
  - Runtime health writer (`Data/Runtime/runtime_health.json`)
  - Mode policy sync
- Gemini WebSocket is managed by a dedicated reconnect supervisor. WS reconnect does not restart other loops.
- In the normal full runtime path, mode switches change gating/state only; they do not tear down `voice_sanad.py`, the dashboard server, the direct camera server, or replay services.

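
The "start loops once, reconnect the WS separately" model can be sketched with `asyncio`. This is a minimal illustration, not the code in `voice_sanad.py`: `ws_supervisor`, `main`, the `connect` callable, and the backoff values are all hypothetical names and numbers chosen for the example.

```python
import asyncio


async def ws_supervisor(connect, base_delay=0.5, max_delay=8.0):
    """Keep one WebSocket session alive; reconnect with capped backoff."""
    delay = base_delay
    while True:
        try:
            await connect()      # hypothetical: returns or raises when the session ends
            delay = base_delay   # a clean session resets the backoff
        except ConnectionError:
            delay = min(delay * 2, max_delay)
        await asyncio.sleep(delay)


async def main(connect, loops, base_delay=0.5):
    # Component loops are started exactly once. The WS supervisor is just
    # another task, so a reconnect never cancels or restarts its siblings.
    tasks = [asyncio.create_task(loop()) for loop in loops]
    tasks.append(asyncio.create_task(ws_supervisor(connect, base_delay)))
    await asyncio.gather(*tasks)
```

Because the reconnect logic lives in its own task, a dropped connection only affects the code path that awaits `connect()`; the other loops keep running on the same event loop.
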
## Mode Control

- Source of truth is `Data/Settings/config.json`:
  - `mode.current_mode` in `manual|ai`
  - launcher resets `mode.current_mode` to `mode.default_mode` at process start
- API updates mode live via `Server/photo_server.py`:
  - `/api/set_mode`
- Voice command gating is enforced on user transcription events in `Gemini/gemini_voice.py`.

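
The launch-time reset is done by the shell launcher; the same logic is shown here in Python for illustration only. The function name `reset_mode_on_launch` is hypothetical, and the fallback to `manual` when `mode.default_mode` is missing is an assumption based on the documented default.

```python
import json
from pathlib import Path


def reset_mode_on_launch(config_path: Path) -> str:
    """Reset mode.current_mode to mode.default_mode and return the new mode."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    mode = config.setdefault("mode", {})
    # Assumption: fall back to "manual" if no default_mode is configured.
    mode["current_mode"] = mode.get("default_mode", "manual")
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return mode["current_mode"]
```
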
## Mode Semantics

| Mode | Voice photo commands (`request_photo/yes_photo/no_photo`) | Manual R2+X | Autonomous flow |
|---|---|---|---|
| `manual` | Off | On | Paused |
| `ai` | On | On | On only if `AUTONOMOUS_ENABLE=1` |

Additional mode rules:
- Gemini conversation can stay active in both modes when `gemini.mic_enabled=true`.
- In full runtime, `manual` still includes the direct camera server, replay/trigger, uploader, and dashboard capture services.
- In full runtime, switching dashboard mode from `manual` to `ai` does not restart the process.
- In full runtime, switching between `manual` and `ai` does not stop `voice_sanad.py`, the dashboard server, the direct camera server, or replay/trigger services.
- If `AUTONOMOUS_ENABLE=1`, autonomous manager is armed and starts live when runtime mode becomes `ai`.
- Switching back to `manual` pauses autonomous flow again.
- In optional `MANUAL_LEAN_RUNTIME=1`, capture/replay/autonomous services are intentionally unavailable.

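
The mode table can be read as a pure function from `(mode, AUTONOMOUS_ENABLE)` to a set of gates. The sketch below encodes only the full-runtime rows of the table; `ModePolicy` and `mode_policy` are hypothetical names, and the lean runtime path is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    voice_photo_commands: bool  # request_photo / yes_photo / no_photo
    manual_trigger: bool        # R2+X
    autonomous_flow: bool


def mode_policy(mode: str, autonomous_enable: bool) -> ModePolicy:
    """Return the gates for a runtime mode, following the table above."""
    if mode == "manual":
        return ModePolicy(False, True, False)
    if mode == "ai":
        # Autonomous flow runs in `ai` only when AUTONOMOUS_ENABLE=1.
        return ModePolicy(True, True, autonomous_enable)
    raise ValueError(f"unknown mode: {mode!r}")
```
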
Removed from this project:
- command-mode functionality was extracted to `G1_Lootah/AI_Command`

## Remote Safety Controls

- `R2+X`: starts replay + photographer talk + unified capture pipeline.
- `R2+L1`: global hard cancel safety combo (active in runtime loops):
  - cancels pending capture
  - cancels active replay path
  - resets autonomous interaction session to `IDLE`

## AI/Autonomous Runtime

State machine in `Modes/AI/autonomous_manager.py`:

`IDLE -> WAIT_CONFIRM -> FRAMING -> COUNTDOWN -> RETAKE_CONFIRM (optional) -> COMPLETE -> IDLE`

Special blocked state:

`IDLE_BLOCKED` when strict YOLO readiness fails.

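
As an illustration, the state chain above can be written as a transition table. This is a sketch under assumptions, not the code in `autonomous_manager.py`: it models only the forward path shown in the chain (declines, timeouts, and the path taken after an accepted retake are not documented here and are omitted), and the names `State`, `ALLOWED`, `entry_state`, and `advance` are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    IDLE_BLOCKED = "IDLE_BLOCKED"
    WAIT_CONFIRM = "WAIT_CONFIRM"
    FRAMING = "FRAMING"
    COUNTDOWN = "COUNTDOWN"
    RETAKE_CONFIRM = "RETAKE_CONFIRM"
    COMPLETE = "COMPLETE"


# Forward transitions taken directly from the documented chain.
ALLOWED = {
    State.IDLE: {State.WAIT_CONFIRM},
    State.WAIT_CONFIRM: {State.FRAMING},
    State.FRAMING: {State.COUNTDOWN},
    State.COUNTDOWN: {State.RETAKE_CONFIRM, State.COMPLETE},  # retake is optional
    State.RETAKE_CONFIRM: {State.COMPLETE},
    State.COMPLETE: {State.IDLE},
    State.IDLE_BLOCKED: set(),  # no session can start while blocked
}


def entry_state(strict_required: bool, yolo_ready: bool) -> State:
    """Pick the resting state: blocked if strict YOLO readiness fails."""
    return State.IDLE_BLOCKED if strict_required and not yolo_ready else State.IDLE


def advance(current: State, target: State) -> State:
    """Move to target if the chain allows it, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```
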
Behavior:

- Autonomous manager is supervised by runtime mode:
  - `manual` -> paused
  - `ai` -> active when `AUTONOMOUS_ENABLE=1`
- In full runtime, autonomous services can already be armed while still paused in `manual`, so mode switches are live.
- In full runtime, the direct camera server and replay infrastructure remain started while autonomous manager is paused in `manual`.
- On stable intent, the manager opens the audio gate, triggers a short greeting-hand replay, and asks whether the visitor wants a photo.
- On stable single-person intent, the manager can identify a returning guest or enroll a new guest into `photos/people/`.
- Group-first greeting is used when a group is detected.
- Confirmation uses flag commands from the voice layer (`request_photo.flag`, `confirm_yes.flag`, `confirm_no.flag`).
- Hard target lock can pin one subject/group through the session.
- Framing checks: center, size, blur, exposure, headroom, eye-line.
- AI greeting replay is controlled by `vision.autonomous_greeting_replay_enabled` and `vision.autonomous_greeting_replay_file`.
- AI photo-time replay is controlled by `vision.autonomous_capture_replay_enabled`.
- When AI photo-time replay is enabled, autonomous capture uses the active replay file from `Data/Settings/config.json -> replay.active_file`, same as manual `R2+X`.
- When a capture succeeds for an identified guest, the saved photo is attached to that guest's folder in `photos/people/`.
- After capture, a retake recommendation can move the flow to `RETAKE_CONFIRM` (max retakes from config).
- On completion: CTA prompt and cooldown reset.

## Vision Runtime

`Modes/AI/vision_detector.py` provides:

- Backend selection `normal|yolo`.
- YOLO runtime selection `ultralytics|opencv`.
- Person/face detection and group clustering.
- Intent detection using depth-first logic and bbox-area fallback.
- Target lock fields:
  - `target_lock_active`, `target_lock_type`, `target_lock_id`, `target_switch_blocked_count`
- Camera/depth health fields:
  - `camera_ok`, `depth_ok`, `camera_restarts`, `depth_restarts`

Strict production gate:

- If `vision.yolo_strict_required=true` and YOLO readiness is not valid, AI session is blocked in `IDLE_BLOCKED`.

## Gemini Integration

`Gemini/gemini_voice.py`:

- Uses WS attach/detach model:
  - `attach_ws()`
  - `detach_ws()`
  - `is_ws_connected()`
- Live-safe sends:
  - `send_text_prompt_live()`
  - `send_vision_context_live()`
- Command matching uses user transcription events, not model text.
- Continuous vision context is streamed from autonomous manager.
- Context can be silent (`vision.gemini_context_silent=true`) and model audio is suppressed for context-only turns.
- Exposes runtime health snapshot for dashboard/API.
- In `manual`, Gemini conversation can remain available while AI photo flags stay disabled.
- Mic state is controlled live through `/api/mic` and `/api/set_mic`.

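
The two gating rules above (commands come only from user transcription, and photo commands are dropped outside `ai`) can be sketched as a single filter. The event shape (`source` and `command` keys) and the function name `gate_command` are assumptions for illustration; the real event format in `gemini_voice.py` is not described in this document.

```python
from typing import Optional

PHOTO_COMMANDS = {"request_photo", "yes_photo", "no_photo"}


def gate_command(event: dict, mode: str) -> Optional[str]:
    """Return the command to act on, or None if the event must be ignored."""
    # Only user transcription events are eligible; model text never
    # triggers a command.
    if event.get("source") != "user_transcription":
        return None
    command = event.get("command")
    # Voice photo commands are active only in `ai` mode.
    if command in PHOTO_COMMANDS and mode != "ai":
        return None
    return command
```
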
## Unified Capture Pipeline

All capture paths use `Server/capture_service.py`:

- Replay execution + trigger marker callback capture.
- Timed fallback capture if trigger marker is missing.
- Capture retries using watchdog settings:
  - `watchdog.camera_capture_retry_count`
  - `watchdog.camera_capture_retry_delay_sec`
- Upload trigger flag is touched after successful capture.

Replay integrity is validated at startup and fallback replay can be selected automatically.

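
A retry loop driven by the two watchdog keys might look like the following. This is a sketch, not the implementation in `capture_service.py`: `capture_with_retry` and the `capture` callable are hypothetical, and treating `camera_capture_retry_count` as the number of attempts after the first one is an assumption.

```python
import time
from typing import Callable, Optional


def capture_with_retry(
    capture: Callable[[], Optional[bytes]],
    watchdog: dict,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call capture() until it yields a frame or retries are exhausted."""
    retries = int(watchdog.get("camera_capture_retry_count", 0))
    delay = float(watchdog.get("camera_capture_retry_delay_sec", 0.0))
    for attempt in range(retries + 1):
        frame = capture()
        if frame is not None:
            return frame
        if attempt < retries:
            sleep(delay)  # wait before the next attempt only
    return None
```

Only the capture call is retried; nothing else in the pipeline is restarted, which matches the component-level recovery policy described in this document.
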
## Component Recovery (Watchdog)

- WS failure: reconnect WS channel only.
- Mic failure: restart mic component only.
- Speaker failure: restart speaker component only.
- Detector frame starvation: recover detector camera/depth inputs only.
- Capture camera failure: retry capture call only.
- Process stays alive unless startup fatal occurs (for example empty Gemini API key).

## Server and Dashboard

`Server/photo_server.py` + `Web/gallery.js` provide:

- Mode APIs:
  - `/api/mode`, `/api/set_mode`, `/api/mode_policy`
- Mic APIs:
  - `/api/mic`, `/api/set_mic`
- Detector/AI readiness APIs:
  - `/api/detector_backend`, `/api/set_detector_backend`
  - `/api/ai_readiness`
- AI options APIs:
  - `/api/ai_options`
  - `/api/set_ai_options?hard_target_lock_enabled=0|1&retake_prompt_enabled=0|1&autonomous_greeting_replay_enabled=0|1&autonomous_greeting_replay_file=...&autonomous_capture_replay_enabled=0|1&face_recognition_enabled=0|1&face_recognition_threshold=...`
- Replay APIs:
  - `/api/replays`
  - `/api/get_replay`
  - `/api/set_replay?name=...`
  - `/api/delete_replay?name=...`
  - `/api/rename_replay?old=...&new=...`
  - `/api/download_replay?name=...`
  - `/api/replay_record_status`
  - `/api/replay_record_start?name=...&seconds=...`
  - `/api/replay_test_status`
  - `/api/test_replay?name=...`
  - `/api/upload_replay`
- Runtime state APIs:
  - `/api/autonomous_state`
  - `/api/runtime_health`
- Camera APIs:
  - `/api/camera_health`
  - `/api/camera_sources`
  - `/api/set_camera_source?source=...`
  - `/api/set_camera_resolution?width=...&height=...&fps=...`
  - `/api/set_preferred_camera?serial=...`
- Photo APIs:
  - `/api/capture`, `/api/photos`, `/api/delete`, `/api/reupload`, `/api/upload_now`, `/api/download_zip`
- Live preview:
  - `/preview.mjpg`
  - preview is off by default and only runs when requested from the dashboard
  - preview camera/OpenCV is loaded lazily when preview is requested

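
Several endpoints above take their arguments as query parameters. A small helper for building those URLs could look like this; `BASE_URL` (host and port), the helper name `api_url`, and the replay file name in the usage example are placeholders, since the server address is not stated in this document.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # hypothetical host/port


def api_url(path: str, **params) -> str:
    """Build a dashboard API URL with properly escaped query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


# Usage (parameter names come from the endpoint list above):
set_replay = api_url("/api/set_replay", name="my replay.json")
set_options = api_url(
    "/api/set_ai_options",
    hard_target_lock_enabled=1,
    retake_prompt_enabled=0,
)
```

Using `urlencode` rather than string concatenation keeps replay names with spaces or special characters from corrupting the query string.
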
Dashboard panels include mode controls, detector backend/readiness, AI options, autonomous state, runtime health, live camera preview, camera source switching, camera resolution changes, preferred RealSense serial persistence, active replay selection, replay inventory management, and replay recording controls.

Replay-management rules:
- replay inventory covers the full `Data/G1` tree
- replay recording is allowed only in `manual`
- replay test/play is allowed only in `manual`
- rename/download/delete/upload remain available from the dashboard inventory tools

Additional API groups:

- People APIs:
  - `/api/people`
  - `/api/person_image?id=...&kind=face|scene`
  - `/api/download_person?id=...`
  - `/api/delete_person?id=...`
  - `/api/reset_people`
  - `/api/upload_person`
- Audio prompt APIs:
  - `/api/audio_prompts`
  - `/api/set_audio_prompt_mode?mode=audio|gemini`
  - `/api/set_audio_prompt_fallback?enabled=0|1`
  - `/api/audio_prompt_record_status`
  - `/api/download_audio_prompt?key=...`
  - `/api/delete_audio_prompt?key=...`
  - `/api/upload_audio_prompt`
  - `/api/audio_prompt_record`

Dashboard audio-prompt behavior:
- operators can upload prerecorded WAV clips for each AI situation key
- operators can delete or download existing clips
- operators can record a prompt clip directly from text using the same Gemini replay path as `Project/SanadVoice/gemini_voice/sanad_replay.py`
- operators can switch fixed AI situation speech between:
  - `audio`: recorded prompt clips first
  - `gemini`: Gemini speech for those same fixed situations
- if a prompt clip is missing while `audio_prompts.mode=audio`, runtime falls back to Gemini text when `audio_prompts.fallback_to_gemini=true`

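
The prompt-source decision described above reduces to a small function. The sketch below is illustrative: `resolve_prompt_source` is a hypothetical name, and returning `"none"` when a clip is missing and fallback is disabled is an assumption, because this document does not say what the runtime does in that case.

```python
def resolve_prompt_source(mode: str, clip_exists: bool, fallback_to_gemini: bool) -> str:
    """Decide how a fixed AI situation prompt is spoken.

    mode is audio_prompts.mode ("audio" or "gemini").
    Returns "clip", "gemini", or "none".
    """
    if mode == "gemini":
        return "gemini"
    if clip_exists:
        return "clip"  # `audio` mode prefers recorded clips
    # Missing clip in `audio` mode: use Gemini only if fallback is enabled.
    return "gemini" if fallback_to_gemini else "none"
```
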
Dashboard people-registry behavior:
- sidebar shows enrolled guests with face + scene thumbnails
- operators can upload a new face image to create or extend a guest profile
- operators can attach additional photos to an existing guest profile
- operators can download or delete one guest, or reset the whole registry

`Core/direct_camera_service.py` serves its own camera UI from external web assets under `Web/` rather than embedding HTML/CSS/JS in Python.

## Runtime State Files

- `Data/Settings/config.json`
- `Data/Runtime/autonomous_state.json`
- `Data/Runtime/runtime_health.json`
- `Data/Runtime/error_counters.json`
- `Data/Runtime/error_events.jsonl`
- `Data/Runtime/upload_db.json`
- `Data/Audio/`
- `Data/Settings/audio_prompt_records.json`
- `photos/people/`
- `photos/Captures/`
- `photos/samples/`

These runtime JSON files are generated lazily. In a clean project tree, some of them will not exist until the corresponding component starts writing state.

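
Because the runtime JSON files appear lazily, any tool that reads them should tolerate a missing file. A minimal reader, assuming an empty dict is an acceptable default (the helper name `read_runtime_state` is hypothetical):

```python
import json
from pathlib import Path


def read_runtime_state(path: Path) -> dict:
    """Read a runtime JSON file, returning {} if it is absent or unreadable."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}  # component has not written state yet
    except json.JSONDecodeError:
        return {}  # assumption: treat a partially written file as empty
```

For example, `read_runtime_state(Path("Data/Runtime/runtime_health.json"))` returns `{}` in a clean tree instead of raising.
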
## Core Config Blocks

- `mode`: runtime mode.
- `vision`: backend/runtime, strict YOLO, context stream settings, hard lock, retake, framing thresholds, greeting replay, AI capture replay, and face-recognition controls.
- `watchdog`: WS backoff, component restart delay, capture retry policy.

## Notes

- Capture choreography is unified across manual trigger, autonomous flow, and dashboard capture.
- Hands/replay behavior during capture remains driven by replay files in `Data/G1`.
- Replay recordings created from the dashboard are stored directly under `Data/G1` and become selectable as active replays without restart.
- Imported AI prompt recordings are stored under `Data/Audio/` and indexed by `Data/Settings/audio_prompt_records.json`.
- In `audio` prompt mode, AI detection, greeting, confirmation, countdown, refusal, retake, and thank-you situations use recorded clips first.
- After a fixed prompt finishes, runtime returns to normal Gemini conversation flow automatically.