Document backlog priorities and add security audit report

2026-04-06 22:17:19 -04:00
parent 9fb6b480eb
commit 09cf230820
4 changed files with 412 additions and 29 deletions


@@ -35,7 +35,7 @@ The project uses `just` as the primary task runner.
 - `just build`: Performs a full production build (Frontend then Rust).
 - `just docker-build`: Builds the local Docker image.
 - **Verification:**
-- `just check`: Runs all checks (Rust fmt, clippy, check; Frontend typecheck, build).
+- `just release-check`: Runs all checks (Rust fmt, clippy, check; Frontend typecheck, build).
 - `just check-rust`: Rust-only verification.
 - **Testing:**
 - `just test`: Runs all Rust tests.


@@ -15,36 +15,183 @@ These are deliberate design decisions, not omissions. Do not add them.
 ## High Priority
-### Behavior-Preserving Refactor Pass
-- Keep the current product behavior exactly the same while improving internal structure
-- Refactor `web/src/components/JobManager.tsx` into smaller components and hooks without changing screens, filters, polling, SSE updates, or job actions
-- Centralize duplicated byte/time/reduction formatting logic into shared utilities and preserve current output formatting
-- Preserve the current realtime model, but make ownership clearer: job/config/system events via SSE, resource metrics via polling
-- Add regression coverage around planner decisions, watcher behavior, job lifecycle transitions, and decision explanation rendering before deeper refactors
-- Document the current planner heuristics and hardware fallback rules so future cleanup does not accidentally change behavior
-### Planning / Simulation Mode
-- Add a first-class simulation flow that answers what Alchemist would transcode, remux, or skip without mutating the library
-- Show estimated total bytes recoverable, action counts, top skip reasons, and per-file predicted actions
-- Support comparing current settings against alternative profiles, codec targets, or threshold snapshots
-- Reuse the scanner, analyzer, and planner, but stop before executor and promotion stages
-### Per-File Encode History
-- When a file has been processed more than once (retry, re-queue after settings change, manual re-run), show the full history of attempts in the job detail panel
-- Each attempt should show: date, outcome (completed/failed/skipped), encode stats if applicable (size before/after, codec, duration), and failure reason if failed
-- The data is already in the DB across `jobs`, `encode_stats`, and `job_failure_explanations` — this is primarily a UI feature
-- Useful for understanding why a file kept failing, or comparing quality before/after a settings change
-### E2E Test Coverage
-- Expand Playwright tests for more UI flows
-- Test job queue management scenarios
-- Test error states and recovery flows
-### AMD AV1 Validation
-- Validate and tune the existing AMD AV1 paths on real hardware
-- Cover Linux VAAPI and Windows AMF separately
-- Verify encoder selection, fallback behavior, and quality/performance defaults
-- Do not treat this as support-from-scratch: encoder wiring and hardware detection already exist
+Testing policy for this section:
+- Backend/unit/integration coverage and Playwright coverage are exit criteria for each item below.
+- Do not treat "more tests" as a standalone product track; attach the required coverage to the feature or refactor that needs it.
+### 1. Engine Lifecycle Controls
+#### Goal
+- Make engine lifecycle controls real, explicit, and operator-safe from the header/About surface.
+#### Scope
+- Redesign the About screen so it fits the current visual language.
+- Add a **Restart Engine** action that restarts the engine loop without killing the Alchemist process.
+- Add a **Shutdown Alchemist** action that cancels active jobs immediately and exits the process cleanly.
+- Define and surface the lifecycle states needed to make restart and shutdown understandable in the UI.
+#### Non-Goals
+- Do not re-exec the whole app process to implement restart.
+- Do not drain active jobs to completion on shutdown; shutdown means cancel and exit.
+#### Dependencies
+- Backend lifecycle endpoints and orchestration semantics for restart and shutdown.
+- Reliable event/state propagation so the UI can reflect transient lifecycle states without stale polling or SSE behavior.
+#### Acceptance Criteria
+- Restart tears down and reinitializes the engine loop while the binary stays alive.
+- Shutdown stops accepting new work, cancels active jobs, persists the right terminal states, and exits cleanly.
+- Job rows, logs, and toasts clearly distinguish pause, drain, restart, cancellation, and shutdown.
+- The About surface exposes restart and shutdown with confirmation and clear failure handling.
+#### Required Tests
+- Backend tests for restart/shutdown semantics and lifecycle state transitions.
+- Playwright coverage for About screen controls, confirmations, success states, and failure states.
+#### Solution
+- Add a dedicated engine lifecycle API instead of overloading pause/drain:
+- Add authenticated lifecycle routes for `restart engine` and `shutdown app`.
+- Keep restart scoped to the engine loop only; do not re-exec the binary.
+- Keep shutdown as cancel-all-and-exit; do not reuse drain semantics.
+- Introduce a server-owned shutdown trigger so HTTP-initiated shutdown uses the same shutdown path as Ctrl+C and SIGTERM:
+- Extend `RunServerArgs` and `AppState` with a shutdown signal sender.
+- Update `axum::serve(...).with_graceful_shutdown(...)` to also listen for an internal shutdown signal.
+- Add an explicit lifecycle transition guard:
+- Reject overlapping restart/shutdown requests while a lifecycle action is already in progress.
+- Surface lifecycle state through `/api/engine/status` so the UI can render restarting/shutting-down states cleanly.
+- Implement restart as an engine-loop reset, not a process restart:
+- Pause new intake.
+- Cancel active jobs immediately through the orchestrator.
+- Clear drain state and any temporary lifecycle flags.
+- Reinitialize the engine loop state needed to resume normal processing.
+- Resume only if the scheduler is not actively pausing the engine.
+- Implement shutdown as a process-level cancel-and-exit flow:
+- Pause intake.
+- Cancel all active jobs immediately.
+- Give cancellation and persistence a short bounded window to flush terminal state.
+- Trigger the internal shutdown signal so the server exits through the same top-level path already used for signals.
+- Split the backend work by file responsibility:
+- `src/media/processor.rs`: add restart/shutdown lifecycle methods and transient lifecycle state.
+- `src/server/mod.rs`: wire new lifecycle routes and internal shutdown signaling into `AppState` and server startup.
+- `src/server/jobs.rs` or a new dedicated engine/server lifecycle module: implement authenticated handlers for restart/shutdown.
+- `src/main.rs`: keep the top-level exit behavior but make sure HTTP-triggered shutdown lands in the same path as signal-triggered shutdown.
+- Update the UI in two passes:
+- Redesign `web/src/components/AboutDialog.tsx` to match the current visual system and include restart/shutdown actions plus confirmation UX.
+- Update `web/src/components/HeaderActions.tsx` and any engine-status consumers to understand the new lifecycle states.
+- Add coverage before shipping:
+- Backend tests for restart, shutdown, overlapping request rejection, and status payload transitions.
+- Playwright tests for About modal actions, confirmation dialogs, success flows, disabled/loading states, and failure toasts.
+### 2. Planner and Lifecycle Documentation
+#### Goal
+- Lock down current behavior before deeper refactors by documenting planner heuristics, hardware fallback rules, and engine lifecycle semantics.
+#### Scope
+- Document the current planner heuristics and stable skip/transcode/remux decision boundaries.
+- Document hardware fallback rules and vendor/backend selection semantics.
+- Document lifecycle semantics for pause, drain, restart, cancel, and shutdown.
+#### Non-Goals
+- No product behavior changes.
+- No speculative redesign of the planner or lifecycle model.
+#### Dependencies
+- Cross-check against the existing backend behavior and tests, not just intended behavior.
+#### Acceptance Criteria
+- Future cleanup work has a single documented source of truth for planner and lifecycle behavior.
+- The docs are specific enough to catch accidental behavior changes during refactors.
+#### Required Tests
+- Add or tighten assertions where documentation work uncovers missing coverage around planner decisions, hardware fallback, or lifecycle states.
+#### Solution
+### 3. Per-File Encode History
+#### Goal
+- Show a complete attempt history in the job detail panel for files that have been processed more than once.
+#### Scope
+- Group history by canonical file identity rather than path-only matching.
+- Show date, outcome, encode stats where applicable, and failure reason where applicable.
+- Make repeated retries, re-queues after settings changes, and manual reruns understandable at a glance.
+#### Non-Goals
+- Do not turn this into a general media-management timeline.
+- Do not rely on path-only grouping when a canonical identity is available.
+#### Dependencies
+- Query shaping across `jobs`, `encode_stats`, and `job_failure_explanations`.
+- A stable canonical file identity strategy that survives path changes better than naive path matching.
+#### Acceptance Criteria
+- Job detail shows prior attempts for the same canonical file identity with enough detail to explain repeated outcomes.
+- Operators can distinguish retry noise from truly separate processing attempts.
+#### Required Tests
+- Backend coverage for history lookup and canonical identity grouping.
+- UI coverage for rendering mixed completed/failed/skipped histories.
+#### Solution
+### 4. Behavior-Preserving Refactor Pass
+#### Goal
+- Improve internal structure without changing visible product behavior.
+#### Scope
+- Refactor `web/src/components/JobManager.tsx` into smaller components and hooks without changing screens, filters, polling, SSE updates, or job actions.
+- Centralize duplicated byte/time/reduction formatting logic into shared utilities while preserving current output formatting.
+- Preserve the current realtime model, but make ownership clearer: job/config/system events via SSE, resource metrics via polling.
+- Add regression coverage around planner decisions, watcher behavior, job lifecycle transitions, and decision explanation rendering before deeper refactors.
+#### Non-Goals
+- No new screens, filters, realtime behaviors, or job actions.
+- No opportunistic product changes hidden inside the refactor.
+#### Dependencies
+- Planner/lifecycle documentation and regression coverage should land before deeper structural work.
+#### Acceptance Criteria
+- Existing behavior, strings, filters, and action flows remain stable.
+- `JobManager` is decomposed enough that future feature work does not require editing a single monolithic file for unrelated changes.
+- Realtime ownership is easier to reason about and less likely to regress.
+#### Required Tests
+- Keep current backend and Playwright suites green.
+- Add targeted regression coverage before extracting behavior into hooks/components.
+#### Solution
+### 5. AMD AV1 Validation
+#### Goal
+- Validate and tune the existing AMD AV1 paths on real hardware.
+#### Scope
+- Cover Linux VAAPI and Windows AMF separately.
+- Verify encoder selection, fallback behavior, and quality/performance defaults.
+- Treat this as validation/tuning of existing wiring, not support-from-scratch.
+#### Non-Goals
+- Do not expand the stable support promise before validation is complete.
+- Do not invent a fake validation story without real hardware runs.
+#### Dependencies
+- Access to representative Linux VAAPI and Windows AMF hardware.
+- Repeatable manual verification notes and any scripted checks that can be automated.
+#### Acceptance Criteria
+- AMD AV1 is either validated with documented defaults and caveats, or explicitly left outside the supported matrix with clearer docs.
+- Linux and Windows results are documented separately.
+#### Required Tests
+- Scripted verification where possible, plus recorded manual validation runs on real hardware.
+#### Solution
 ---
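The lifecycle transition guard described under Engine Lifecycle Controls (reject overlapping restart/shutdown requests and surface the state through `/api/engine/status`) could be as small as one atomic state cell. This is a hypothetical sketch: `LifecycleGuard`, `LifecycleState`, and `LifecycleBusy` are illustrative names, not existing types in `src/media/processor.rs`.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical lifecycle states that a status endpoint could report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleState {
    Running = 0,
    Restarting = 1,
    ShuttingDown = 2,
}

impl LifecycleState {
    fn from_u8(v: u8) -> Self {
        match v {
            1 => LifecycleState::Restarting,
            2 => LifecycleState::ShuttingDown,
            _ => LifecycleState::Running,
        }
    }
}

/// Returned when another lifecycle action already holds the guard.
#[derive(Debug, PartialEq, Eq)]
pub struct LifecycleBusy(pub LifecycleState);

pub struct LifecycleGuard {
    state: AtomicU8,
}

impl LifecycleGuard {
    pub const fn new() -> Self {
        Self { state: AtomicU8::new(LifecycleState::Running as u8) }
    }

    /// Snapshot for the status payload.
    pub fn current(&self) -> LifecycleState {
        LifecycleState::from_u8(self.state.load(Ordering::Acquire))
    }

    /// Atomically move Running -> `target`; an overlapping request gets the state that won.
    pub fn begin(&self, target: LifecycleState) -> Result<(), LifecycleBusy> {
        debug_assert!(target != LifecycleState::Running, "begin() expects a transient state");
        self.state
            .compare_exchange(
                LifecycleState::Running as u8,
                target as u8,
                Ordering::AcqRel,
                Ordering::Acquire,
            )
            .map(|_| ())
            .map_err(|actual| LifecycleBusy(LifecycleState::from_u8(actual)))
    }

    /// A finished restart returns to Running; shutdown is terminal and is left untouched.
    pub fn finish_restart(&self) {
        let _ = self.state.compare_exchange(
            LifecycleState::Restarting as u8,
            LifecycleState::Running as u8,
            Ordering::AcqRel,
            Ordering::Acquire,
        );
    }
}
```

A restart handler would call `begin(LifecycleState::Restarting)`, answer with a conflict (for example HTTP 409) on `Err`, and call `finish_restart` once the engine loop is back up. Using a single compare-and-swap means two simultaneous requests cannot both observe `Running` and proceed.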
@@ -172,10 +319,14 @@ These are deliberate design decisions, not omissions. Do not add them.
 - Not a cloud encoding service; no public hosting assumed
 - No raw FFmpeg flag injection (see Out of Scope)
+#### Solution
 ### Library Intelligence
 - Expand recommendations beyond duplicate detection into remux-only opportunities, wasteful audio layouts, commentary/descriptive-track cleanup, and duplicate-ish title variants
 - Keep the feature focused on storage and library quality, not general media management
+#### Solution
 ### Auto-Priority Rules
 - Define rules that automatically assign queue priority based on file attributes
 - Rule conditions: file path pattern (glob), file age, file size, source watch folder
@@ -183,11 +334,15 @@ These are deliberate design decisions, not omissions. Do not add them.
 - Rules evaluated at enqueue time; manual priority overrides still win
 - Configured in Settings alongside other library behavior
+#### Solution
 ### Performance Optimizations
 - Profile scanner/analyzer hot paths before changing behavior
 - Only tune connection pooling after measuring database contention under load
 - Consider caching repeated FFprobe calls on identical files if profiling shows probe churn is material
+#### Solution
 ### Audio Normalization
 - Apply EBU R128 loudness normalization to audio streams during transcode
 - Target: -23 LUFS integrated, -1 dBTP true peak (broadcast standard)
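The Auto-Priority Rules item does not state an evaluation order in the visible lines. The sketch below assumes rules are checked in configured order and the first match wins, with manual priority taking precedence as the item requires. `PriorityRule`, `EnqueueCandidate`, and `resolve_priority` are hypothetical names, and the `*`-only matcher is a simplification: unlike shell globs, its `*` also matches `/`, so a real implementation would more likely use a glob library.

```rust
use std::time::Duration;

/// Minimal `*`-only wildcard match (greedy with backtracking); `*` may cross `/`.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    let mut star: Option<usize> = None; // index of the last '*' seen in the pattern
    let mut mark = 0usize; // text position where that '*' started absorbing characters
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last '*' absorb one more character.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Trailing '*' characters can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Hypothetical rule: every `Some` condition must hold for the rule to match.
pub struct PriorityRule {
    pub path_glob: Option<String>,
    pub min_age: Option<Duration>,
    pub min_size_bytes: Option<u64>,
    pub watch_folder: Option<String>,
    pub priority: i32,
}

/// Attributes of a file at enqueue time.
pub struct EnqueueCandidate<'a> {
    pub path: &'a str,
    pub age: Duration,
    pub size_bytes: u64,
    pub watch_folder: &'a str,
    pub manual_priority: Option<i32>,
}

impl PriorityRule {
    fn matches(&self, c: &EnqueueCandidate) -> bool {
        self.path_glob.as_deref().map_or(true, |g| wildcard_match(g, c.path))
            && self.min_age.map_or(true, |a| c.age >= a)
            && self.min_size_bytes.map_or(true, |s| c.size_bytes >= s)
            && self.watch_folder.as_deref().map_or(true, |w| w == c.watch_folder)
    }
}

/// Manual priority wins; otherwise the first matching rule applies, else `default`.
pub fn resolve_priority(rules: &[PriorityRule], c: &EnqueueCandidate, default: i32) -> i32 {
    if let Some(p) = c.manual_priority {
        return p;
    }
    rules.iter().find(|r| r.matches(c)).map_or(default, |r| r.priority)
}
```

First-match ordering keeps the Settings UI predictable (rules read top to bottom), at the cost of making rule order significant.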
@@ -198,9 +353,13 @@ These are deliberate design decisions, not omissions. Do not add them.
 the job detail panel alongside existing encode stats
 - Do not normalize if audio is being copied (copy mode bypasses this)
+#### Solution
 ### UI Improvements
 - Add keyboard shortcuts for common actions
+#### Solution
 ### Notification Improvements
 - **Granular event types** — current events are too coarse. Add:
 - `encode.started` — job moved from queued to encoding
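For the Audio Normalization item, FFmpeg's `loudnorm` filter accepts the EBU R128 targets directly (`I` for integrated loudness in LUFS, `TP` for maximum true peak in dBTP). A hypothetical helper that builds the audio filter arguments and honors the copy-mode bypass might look like this; the function name and its parameters are illustrative.

```rust
/// EBU R128 targets quoted in the backlog item.
const TARGET_INTEGRATED_LUFS: i32 = -23;
const TARGET_TRUE_PEAK_DBTP: i32 = -1;

/// Extra FFmpeg arguments for audio normalization.
/// `audio_codec` is the value that would be passed to `-c:a`.
pub fn audio_normalization_args(audio_codec: &str, normalize: bool) -> Vec<String> {
    // Copied audio is never decoded, so there is nothing to filter.
    if !normalize || audio_codec == "copy" {
        return Vec::new();
    }
    vec![
        "-af".to_string(),
        format!("loudnorm=I={}:TP={}", TARGET_INTEGRATED_LUFS, TARGET_TRUE_PEAK_DBTP),
    ]
}
```

This is single-pass `loudnorm`; FFmpeg also supports a two-pass workflow (measure first, then feed the measured values back) that tracks the targets more closely at the cost of an extra decode. FFmpeg refuses to combine filtering with stream copy, which is a second reason the copy path must skip this step.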
@@ -210,56 +369,82 @@ These are deliberate design decisions, not omissions. Do not add them.
 - `engine.idle` — queue drained, nothing left to process
 - `daily.summary` — opt-in digest of the day's activity
 - **Per-target event filtering** — each notification target should
-independently choose which events it receives. Currently all targets
+independently choose which events it receives. Currently, all targets
 get the same events. A Discord webhook might want everything; a
 phone webhook might only want failures.
 - **Richer payloads** — completed job notifications should include
 filename, input size, output size, space saved, and encode time.
-Currently the payload is minimal.
+Currently, the payload is minimal.
 - **Add Telegram integration** — bot token + chat ID, same event
 model as Discord. No new dependencies needed (reqwest already present).
+- **Improve Discord notifications** — add bot token support where it meaningfully improves delivery or richer messaging.
 - **Add email support** — SMTP with TLS. Lower priority than Telegram.
 Most self-hosters already have Discord or Telegram.
+#### Solution
 ---
 ## Low Priority
+### Planning / Simulation Mode
+- Not a current focus. If revisited, start with a single current-config dry-run before attempting comparison mode.
+- Add a first-class simulation flow that answers what Alchemist would transcode, remux, or skip without mutating the library.
+- Show estimated total bytes recoverable, action counts, top skip reasons, and per-file predicted actions.
+- Reuse the scanner, analyzer, and planner, but stop before executor and promotion stages.
+- Only add profile/codec/threshold comparison snapshots after the simple single-config flow proves useful.
+#### Solution
 ### API Token Authentication + API Documentation
 - Add support for static bearer tokens as an alternative to session cookies
 - Enables programmatic access from scripts, home automation (Home Assistant, n8n), and CLI tools without managing session state
 - Tokens generated and revoked from Settings; no expiry by default, revocable any time
 - Expand API documentation to cover all endpoints with request/response examples
+#### Solution
 ### Passthrough Mode
 - A toggle that keeps all watch folders and watcher active but prevents the planner from queuing new jobs
 - Different from Pause — Pause stops active encodes; Passthrough lets the system observe and index the library without touching anything
 - Useful when testing settings or onboarding a new library without triggering encodes immediately
+#### Solution
 ### Base URL / Subpath Configuration
 - Allow Alchemist to be served at a non-root path (e.g. `/alchemist/`) via `ALCHEMIST_BASE_URL`
 - Common self-hosting pattern for reverse proxy setups running multiple services on one domain
 - Low urgency — most users run Alchemist on a dedicated subdomain or port
+#### Solution
 ### Features from DESIGN_PHILOSOPHY.md
 - Add batch job templates
+#### Solution
 ### Code Quality
 - Increase test coverage for edge cases
 - Add property-based testing for codec parameter generation
 - Add fuzzing for FFprobe output parsing
+#### Solution
 ### Documentation
 - Add architecture diagrams
 - Add contributor guide with development setup
 - Video tutorials for common workflows
+#### Solution
 ### Distribution
 - Add Homebrew formula
 - Add AUR package
 - Add Flatpak/Snap packages
 - Improve Windows installer (WiX) with auto-updates
+#### Solution
 ---
 ## Completed (Recent)
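The Per-target event filtering item can keep today's behavior as the default by treating a missing subscription list as "all events". This is a hypothetical sketch: `SubscribedTarget` and `recipients` are illustrative names, and `encode.failed` is an assumed event name (only `encode.started`, `engine.idle`, and `daily.summary` are visible in the hunk above).

```rust
use std::collections::HashSet;

/// Hypothetical target subscription; `events: None` means "receive every event".
pub struct SubscribedTarget {
    pub name: String,
    pub events: Option<HashSet<String>>,
}

impl SubscribedTarget {
    /// Current behavior: the target gets everything.
    pub fn all(name: &str) -> Self {
        Self { name: name.to_string(), events: None }
    }

    /// Opt-in list, e.g. a phone webhook that only wants failures.
    pub fn only(name: &str, events: &[&str]) -> Self {
        Self {
            name: name.to_string(),
            events: Some(events.iter().map(|e| e.to_string()).collect()),
        }
    }

    pub fn wants(&self, event: &str) -> bool {
        self.events.as_ref().map_or(true, |set| set.contains(event))
    }
}

/// Names of the targets that should receive `event`, in configured order.
pub fn recipients<'a>(targets: &'a [SubscribedTarget], event: &str) -> Vec<&'a str> {
    targets
        .iter()
        .filter(|t| t.wants(event))
        .map(|t| t.name.as_str())
        .collect()
}
```

Making `None` mean "everything" lets existing configs load unchanged, so the filter is opt-in per target.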


@@ -0,0 +1,124 @@
# Security Best Practices Report
## Executive Summary
I found one critical security bug and one additional high-severity issue in the setup/bootstrap flow.
The critical problem is that first-run setup is remotely accessible without authentication while the server listens on `0.0.0.0`. A network-reachable attacker can win the initial setup race, create the first admin account, and take over the instance.
I did not find evidence of major client-side XSS sinks or obvious SQL injection paths during this audit. Most of the remaining concerns I saw were hardening-level issues rather than immediately exploitable major bugs.
## Critical Findings
### ALCH-SEC-001
- Severity: Critical
- Location:
  - `src/server/middleware.rs:80-86`
  - `src/server/wizard.rs:95-210`
  - `src/server/mod.rs:176-197`
  - `README.md:61-79`
- Impact: Any attacker who can reach the service before the legitimate operator completes setup can create the first admin account and fully compromise the instance.
#### Evidence
`auth_middleware` exempts the full `/api/setup` namespace from authentication:
- `src/server/middleware.rs:80-86`
`setup_complete_handler` only checks `setup_required` and then creates the user, session cookie, and persisted config:
- `src/server/wizard.rs:95-210`
The server binds to all interfaces by default:
- `src/server/mod.rs:176-197`
The documented Docker quick-start publishes port `3000` directly:
- `README.md:61-79`
#### Why This Is Exploitable
On a fresh install, or any run where `setup_required == true`, the application accepts unauthenticated requests to `/api/setup/complete`. Because the listener binds `0.0.0.0`, that endpoint is reachable from any network that can reach the host unless an external firewall or reverse proxy blocks it.
That lets a remote attacker:
1. POST their own username and password to `/api/setup/complete`
2. Receive the initial authenticated session cookie
3. Persist attacker-controlled configuration and start operating as the admin user
This is a full takeover of the instance during bootstrap that bypasses authentication entirely.
#### Recommended Fix
Require setup completion to come only from a trusted local origin during bootstrap, matching the stricter treatment already used for `/api/fs/*` during setup.
Minimal safe options:
1. Restrict `/api/setup/*` and `/api/settings/bundle` to loopback-only while `setup_required == true`.
2. Alternatively, require an explicit one-time bootstrap secret/token generated on startup and printed locally.
3. Consider binding to `127.0.0.1` by default until setup is complete, then allowing an explicit public bind only after bootstrap.
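Option 1 could take the shape of a small predicate that `auth_middleware` consults before its existing exemptions. The sketch below is hypothetical: `bootstrap_gate` and `Gate` are illustrative names, and it assumes the middleware can see the peer `IpAddr` and the `setup_required` flag. It models only the bootstrap rule; other anonymous routes (login, static assets, the existing `/api/fs/*` handling) are out of scope.

```rust
use std::net::IpAddr;

/// What the hypothetical bootstrap gate decides for one request.
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    /// Serve the request without a session.
    AllowAnonymous,
    /// Fall through to the normal session check.
    RequireAuth,
    /// Refuse outright (e.g. HTTP 403).
    Forbidden,
}

/// Routes the report identifies as anonymously reachable during setup.
fn is_bootstrap_path(path: &str) -> bool {
    path == "/api/setup" || path.starts_with("/api/setup/") || path == "/api/settings/bundle"
}

/// Loopback check that also accepts IPv4-mapped IPv6 (`::ffff:127.0.0.1`).
fn is_loopback_peer(peer: IpAddr) -> bool {
    match peer {
        IpAddr::V4(v4) => v4.is_loopback(),
        IpAddr::V6(v6) => {
            v6.is_loopback() || v6.to_ipv4_mapped().map_or(false, |v4| v4.is_loopback())
        }
    }
}

pub fn bootstrap_gate(path: &str, peer: IpAddr, setup_required: bool) -> Gate {
    // Outside setup mode, or for any other route, no bootstrap exemption applies.
    if !setup_required || !is_bootstrap_path(path) {
        return Gate::RequireAuth;
    }
    if is_loopback_peer(peer) {
        Gate::AllowAnonymous
    } else {
        Gate::Forbidden
    }
}
```

One caveat: behind a reverse proxy on the same host every request arrives from loopback, and inside a Docker container the host's browser typically connects from the bridge network rather than `127.0.0.1`. Deployments like those may need option 2 instead.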
#### Mitigation Until Fixed
- Do not expose the service to any network before setup is completed.
- Do not publish the container port directly on untrusted networks.
- Complete setup only through a local-only tunnel or host firewall rule.
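Recommended fix option 2 (a one-time bootstrap secret) needs two properties: the comparison should not reveal how many leading bytes matched, and the secret must stop working once setup succeeds. A hypothetical sketch under those assumptions follows; `BootstrapToken` is an illustrative name, and generating the secret (from a CSPRNG, for example via the `getrandom` crate) and printing it to the local console are left out.

```rust
/// Hypothetical single-use bootstrap secret.
pub struct BootstrapToken {
    secret: Option<Vec<u8>>,
}

impl BootstrapToken {
    pub fn new(secret: &str) -> Self {
        Self { secret: Some(secret.as_bytes().to_vec()) }
    }

    /// Returns true exactly once: on the first presentation of the correct secret.
    pub fn redeem(&mut self, presented: &str) -> bool {
        let ok = match &self.secret {
            Some(s) => constant_time_eq(s, presented.as_bytes()),
            None => false,
        };
        if ok {
            // Burn the secret so a replayed request cannot complete setup again.
            self.secret = None;
        }
        ok
    }
}

/// Compares every byte regardless of where the first mismatch is.
/// Length is not hidden, which is acceptable for fixed-length tokens.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}
```

In a server the token would sit behind a `Mutex` in shared state so `redeem` runs for one request at a time; a vetted helper such as the `subtle` crate's `ConstantTimeEq` is preferable to the hand-rolled loop in production.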
## High Findings
### ALCH-SEC-002
- Severity: High
- Location:
  - `src/server/middleware.rs:116-117`
  - `src/server/settings.rs:244-285`
  - `src/config.rs:366-390`
  - `src/main.rs:369-383`
  - `src/db.rs:2566-2571`
- Impact: During setup mode, an unauthenticated remote attacker can read and overwrite the full runtime configuration; after `--reset-auth`, this can expose existing notification endpoints/tokens and let the attacker reconfigure the instance before the operator reclaims it.
#### Evidence
While `setup_required == true`, `auth_middleware` explicitly allows `/api/settings/bundle` without authentication:
- `src/server/middleware.rs:116-117`
`get_settings_bundle_handler` returns the full `Config`, and `update_settings_bundle_handler` writes an attacker-supplied `Config` back to disk and runtime state:
- `src/server/settings.rs:244-285`
The config structure includes notification targets and optional `auth_token` fields:
- `src/config.rs:366-390`
`--reset-auth` only clears users and sessions, then re-enters setup mode:
- `src/main.rs:369-383`
- `src/db.rs:2566-2571`
#### Why This Is Exploitable
This endpoint is effectively a public config API whenever the app is in setup mode. On a brand-new install, this widens the bootstrap attack surface described in ALCH-SEC-001. On an existing deployment where an operator runs `--reset-auth`, the previous configuration remains on disk while authentication is removed, so a remote caller can:
1. GET `/api/settings/bundle` and read the current config
2. Learn configured paths, schedules, webhook targets, and any stored notification bearer tokens
3. PUT a replacement config before the legitimate operator finishes recovery
That creates both confidential-data exposure and unauthenticated remote reconfiguration during recovery/bootstrap windows.
#### Recommended Fix
Do not expose `/api/settings/bundle` anonymously.
Safer options:
1. Apply the same loopback-only setup restriction used for `/api/fs/*`.
2. Split bootstrap-safe fields from privileged configuration and expose only the minimal bootstrap payload anonymously.
3. Redact secret-bearing config fields such as notification tokens from any unauthenticated response path.
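Option 3 could be a redaction pass applied to each notification target before a config is serialized for an unauthenticated caller. The struct below is a hypothetical, simplified stand-in for the notification target in `src/config.rs`; the real field names may differ. Webhook URLs are reduced to scheme and host because services such as Discord embed the webhook token in the URL path.

```rust
/// Placeholder written in place of secrets.
const REDACTED: &str = "********";

/// Hypothetical, simplified stand-in for a notification target inside `Config`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct NotificationTarget {
    pub name: String,
    pub endpoint_url: String,
    pub auth_token: Option<String>,
}

/// Keep only scheme and host; paths and query strings often carry secrets.
fn origin_only(url: &str) -> String {
    match url.split_once("://") {
        Some((scheme, rest)) => {
            let authority = rest
                .split(|c: char| c == '/' || c == '?' || c == '#')
                .next()
                .unwrap_or("");
            // Drop any `user:password@` prefix.
            let host = authority.rsplit('@').next().unwrap_or("");
            format!("{}://{}/{}", scheme, host, REDACTED)
        }
        None => REDACTED.to_string(),
    }
}

impl NotificationTarget {
    /// A copy that is safe to return from an unauthenticated response path.
    pub fn redacted(&self) -> NotificationTarget {
        NotificationTarget {
            name: self.name.clone(),
            endpoint_url: origin_only(&self.endpoint_url),
            // Reveal whether a token is configured, never its value.
            auth_token: self.auth_token.as_ref().map(|_| REDACTED.to_string()),
        }
    }
}
```

Keeping `Some(REDACTED)` rather than `None` lets a setup UI show that a token exists without exposing it. A write path would then need to treat the placeholder as "keep the stored value" instead of saving it literally.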
## Notes
- I did not find a major DOM-XSS path in `web/src`; there were no `dangerouslySetInnerHTML`, `innerHTML`, `insertAdjacentHTML`, `eval`, or similar high-risk sinks in the audited code paths.
- I also did not see obvious raw SQL string interpolation issues; the database code I reviewed uses parameter binding.

stitch_findings.md

@@ -0,0 +1,74 @@
# Alchemist Project Audit & Findings
This document provides a comprehensive audit of the Alchemist media transcoding project (v0.3.0-rc.3), covering backend architecture, frontend design, database schema, and operational workflows.
---
## 1. Project Architecture & Pipeline
Alchemist implements a robust, asynchronous media transcoding pipeline managed by a central `Agent`. The pipeline follows a strictly ordered lifecycle:
1. **Scanner (`src/media/scanner.rs`):** Performs a high-speed traversal of watch folders. It uses `mtime_hash` (seconds + nanoseconds) to detect changes without full file analysis, efficiently handling re-scans and minimizing DB writes.
2. **Analyzer (`src/media/analyzer.rs`):** Executes `ffprobe` to extract normalized media metadata (codecs, bit depth, BPP, bitrate). Analysis results are used to populate the `DetailedEncodeStats` and `Decision` tables.
3. **Planner (`src/media/planner.rs`):** A complex decision engine that evaluates whether to **Skip**, **Remux**, or **Transcode** a file based on user profiles.
* *Finding:* The planning logic is heavily hardcoded with "magic thresholds" (e.g., bits-per-pixel thresholds). While effective, these could be exposed as "Advanced Settings" in the UI.
4. **Executor (`src/media/executor.rs`):** Orchestrates the `ffmpeg` process. It dynamically selects encoders (NVENC, VAAPI, QSV, ProRes, or CPU fallback) based on the target profile and host hardware capabilities detected in `src/system/hardware.rs`.
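The scanner's change detection in step 1 can be illustrated with a key built from the modification time's seconds and nanoseconds. This is a hypothetical sketch of the idea, not the code in `src/media/scanner.rs`: `mtime_key` and `needs_reanalysis` are illustrative names, and the key format is an assumption.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// "seconds.nanoseconds" since the Unix epoch; None for pre-epoch timestamps.
pub fn mtime_key(modified: SystemTime) -> Option<String> {
    let d = modified.duration_since(UNIX_EPOCH).ok()?;
    Some(format!("{}.{:09}", d.as_secs(), d.subsec_nanos()))
}

/// Re-analyze when nothing is stored, the time is unusable, or the key changed.
pub fn needs_reanalysis(stored: Option<&str>, modified: SystemTime) -> bool {
    match (stored, mtime_key(modified)) {
        (Some(old), Some(new)) => old != new,
        _ => true,
    }
}
```

In practice `modified` would come from `std::fs::metadata(path)?.modified()?`. Including nanoseconds catches rewrites that land within the same second, on filesystems that record sub-second timestamps.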
---
## 2. Backend & API Design (Rust/Axum)
* **Concurrency:** Utilizes `tokio` for async orchestration and `rayon` for CPU-intensive tasks (like file hashing or list processing). The scheduler supports multiple concurrency modes: `Background` (1 job), `Balanced` (capped), and `Throughput` (uncapped).
* **State Management:** The backend uses `broadcast` channels to separate high-volume events (Progress, Logs) from low-volume system events (Config updates). This prevents UI "flicker" and unnecessary re-renders in the frontend.
* **API Structure:**
* **RESTful endpoints** for jobs, settings, and stats.
* **SSE (`src/server/sse.rs`)** for real-time progress updates, ensuring a reactive UI without high-frequency polling.
* **Auth (`src/server/auth.rs`):** Implements JWT-based authentication with Argon2 hashing for the initial setup.
---
## 3. Database Schema (SQLite/SQLx)
* **Stability:** The project uses 16+ migrations, showing a mature evolution from a simple schema to a sophisticated job-tracking system.
* **Decision Logging:** The `decisions` and `job_failure_explanations` tables are a standout feature. They store the "why" behind every action as structured JSON, which is then humanized in the UI (e.g., explaining exactly why a file was skipped).
* **Data Integrity:** Foreign keys and WAL (Write-Ahead Logging) mode ensure database stability even during heavy concurrent I/O.
---
## 4. Frontend Design (Astro/React/Helios)
* **Stack:** Astro 5 provides a fast, static-first framework, with React 18 handling the complex stateful dashboards.
* **Design System ("Helios"):**
* *Identity:* A dark-themed, data-dense industrial aesthetic.
* *Findings:* While functional, the system suffers from "component bloat." `JobManager.tsx` (~2,000 lines) is a significant maintainability risk. It contains UI logic, filtering logic, and data transformation logic mixed together.
* **Data Visualization:** Uses `recharts` for historical trends and performance metrics.
* *Improvement:* The charts are currently static snapshots. Adding real-time interactivity (brushing, zooming) would improve the exploration of large datasets.
---
## 5. System & Hardware Integration
* **Hardware Discovery:** `src/system/hardware.rs` is extensive, detecting NVIDIA, Intel, AMD, and Apple Silicon capabilities. It correctly maps these to `ffmpeg` encoder flags.
* **FS Browser:** A custom filesystem browser (`src/system/fs_browser.rs`) allows for secure directory selection during setup, preventing path injection and ensuring platform-agnostic path handling.
---
## 6. Critical Areas for Improvement
### **Maintainability (High Priority)**
* **Decouple `JobManager.tsx`:** Refactor into functional hooks (`useJobs`, `useFilters`) and smaller, presentation-only components.
* **Standardize Formatters:** Move `formatBytes`, `formatTime`, and `formatReduction` into a centralized `lib/formatters.ts` to reduce code duplication across the Dashboard and Stats pages.
### **UX & Performance (Medium Priority)**
* **Polling vs. SSE:** Ensure all real-time metrics (like GPU temperature) are delivered via SSE rather than periodic polling to reduce backend load and improve UI responsiveness.
* **Interactive Decision Explanations:** The current skip reasons are helpful but static. Adding links to the relevant settings (e.g., "Change this threshold in Transcoding Settings") would close the loop for users.
### **Reliability (Low Priority)**
* **E2E Testing:** While Playwright tests exist, they focus on "reliability." Expanding these to cover complex "edge cases" (like network-attached storage disconnects during a scan) would improve long-term stability.
---
## 7. Stitch Recommendation
Use Stitch to generate **atomic component refinements** based on this audit.
* *Prompt Example:* "Refine the JobTable row to use iconic status indicators with tooltips for skip reasons, as outlined in the Alchemist Audit."
* *Prompt Example:* "Create a unified `Formatter` utility library in TypeScript that handles bytes, time, and percentage formatting for the Helios design system."