# Audit Findings Last updated: 2026-04-12 (second pass) --- ## P1 Issues --- ### [P1-1] Cancel during analysis can be overwritten by the pipeline **Status: RESOLVED** **Files:** - `src/server/jobs.rs:41–63` - `src/media/pipeline.rs:1178–1221` - `src/orchestrator.rs:84–90` **Severity:** P1 **Problem:** `request_job_cancel()` in `jobs.rs` immediately writes `Cancelled` to the DB for jobs in `Analyzing` or `Resuming` state. The pipeline used to have race windows where it could overwrite this state with `Skipped`, `Encoding`, or `Remuxing` if it reached a checkpoint after the cancel was issued but before it could be processed. **Fix:** Implemented `cancel_requested: Arc>>` in `Transcoder` (orchestrator). The `update_job_state` wrapper in `pipeline.rs` now checks this set before any DB write for `Encoding`, `Remuxing`, `Skipped`, and `Completed` states. Terminal states (Completed, Failed, Cancelled, Skipped) also trigger removal from the set. --- ### [P1-2] VideoToolbox quality controls are effectively ignored **Status: RESOLVED** **Files:** - `src/media/planner.rs:630–650` - `src/media/ffmpeg/videotoolbox.rs:25–54` - `src/config.rs:85–92` **Severity:** P1 **Problem:** The planner used to emit `RateControl::Cq` values that were incorrectly mapped for VideoToolbox, resulting in uncalibrated or inverted quality. **Fix:** Fixed the mapping in `videotoolbox.rs` to use `-q:v` (1-100, lower is better) and clamped the input range to 1-51 to match user expectations from x264/x265. Updated `QualityProfile` in `config.rs` to provide sane default values (24, 28, 32) for VideoToolbox quality. --- ## P2 Issues --- ### [P2-1] Convert does not reuse subtitle/container compatibility checks **Status: RESOLVED** **Files:** - `src/conversion.rs:372–380` - `src/media/planner.rs` **Severity:** P2 **Problem:** The conversion path was not validating subtitle/container compatibility, leading to FFmpeg runtime failures instead of early validation errors. **Fix:** Integrated `crate::media::planner::subtitle_copy_supported` into `src/conversion.rs:build_subtitle_plan`. The "copy" mode now returns an `AlchemistError::Config` if the combination is unsupported. --- ### [P2-2] Completed job metadata omitted at the API layer **Status: RESOLVED** **Files:** - `src/db.rs:254–263` - `src/media/pipeline.rs:599` - `src/server/jobs.rs:343` **Severity:** P2 **Problem:** Job details required a live re-probe of the input file to show metadata, which failed if the file was moved or deleted after completion. **Fix:** Added `input_metadata_json` column to the `jobs` table (migration `20260412000000_store_job_metadata.sql`). The pipeline now stores the metadata string immediately after analysis. `get_job_detail_handler` reads this stored value, ensuring metadata is always available even if the source file is missing. --- ### [P2-3] LAN-only setup exposed to reverse proxy misconfig **Status: RESOLVED** **Files:** - `src/config.rs` — `SystemConfig.trusted_proxies` - `src/server/mod.rs` — `AppState.trusted_proxies`, `AppState.setup_token` - `src/server/middleware.rs` — `is_trusted_peer`, `request_ip`, `auth_middleware` **Severity:** P2 **Problem:** The setup wizard gate trusts all private/loopback IPs for header forwarding. When running behind a misconfigured proxy that doesn't set headers, it falls back to the proxy's own IP (e.g. 127.0.0.1), making the setup endpoint accessible to external traffic. **Fix:** Added two independent security layers: 1. `trusted_proxies: Vec` to `SystemConfig`. When non-empty, only those exact IPs (plus loopback) are trusted for proxy header forwarding instead of all RFC-1918 ranges. Empty = previous behavior preserved. 2. `ALCHEMIST_SETUP_TOKEN` env var. When set, setup endpoints require `?token=` query param regardless of client IP. Token mode takes precedence over IP-based LAN check. --- ### [P2-4] N+1 DB update in batch cancel **Status: RESOLVED** **Files:** - `src/server/jobs.rs` — `batch_jobs_handler` **Severity:** P2 **Problem:** `batch_jobs_handler` for "cancel" action iterates over jobs and calls `request_job_cancel` which performs an individual `update_job_status` query per job. Cancelling a large number of jobs triggers N queries. **Fix:** Restructured the "cancel" branch in `batch_jobs_handler`. Orchestrator in-memory operations (add_cancel_request, cancel_job) still run per-job, but all DB status updates are batched into a single `db.batch_cancel_jobs(&ids)` call (which already existed at db.rs). Immediate-resolution jobs (Queued + successfully signalled Analyzing/Resuming) are collected and written in one UPDATE ... WHERE id IN (...) query. --- ### [P2-5] Missing archived filter in health and stats queries **Status: RESOLVED** **Files:** - `src/db.rs` — `get_aggregated_stats`, `get_job_stats`, `get_health_summary` **Severity:** P2 **Problem:** `get_health_summary` and `get_aggregated_stats` (total_jobs) do not include `AND archived = 0`. Archived (deleted) jobs are incorrectly included in library health metrics and total job counts. **Fix:** Added `AND archived = 0` to all three affected queries: `total_jobs` and `completed_jobs` subqueries in `get_aggregated_stats`, the `GROUP BY status` query in `get_job_stats`, and both subqueries in `get_health_summary`. Updated tests that were asserting the old (incorrect) behavior. --- ### [P2-6] Daily summary notifications bypass SSRF protections **Status: RESOLVED** **Files:** - `src/notifications.rs` — `build_safe_client()`, `send()`, `send_daily_summary_target()` **Severity:** P2 **Problem:** `send_daily_summary_target()` used `Client::new()` without any SSRF defences, while `send()` applied DNS timeout, private-IP blocking, no-redirect policy, and request timeout. **Fix:** Extracted all client-building logic into `build_safe_client(&self, target)` which applies the full SSRF defence stack. Both `send()` and `send_daily_summary_target()` now use this shared helper. --- ### [P2-7] Silent reprobe failure corrupts saved encode stats **Status: RESOLVED** **Files:** - `src/media/pipeline.rs` — `finalize_job()` duration reprobe **Severity:** P2 **Problem:** When a completed encode's metadata has `duration_secs <= 0.0`, the pipeline reprobes the output file to get the actual duration. If reprobe fails, the error was silently swallowed via `.ok()` and duration defaulted to 0.0, poisoning downstream stats. **Fix:** Replaced `.ok().and_then().unwrap_or(0.0)` chain with explicit `match` that logs the error via `tracing::warn!` and falls through to 0.0. Existing guards at the stats computation lines already handle `duration <= 0.0` correctly — operators now see *why* stats are zeroed. --- ## Technical Debt --- ### [TD-1] `db.rs` is a 3481-line monolith **Status: RESOLVED** **File:** `src/db/` (was `src/db.rs`) **Severity:** TD **Problem:** The database layer had grown to nearly 3500 lines. Every query, migration flag, and state enum was in one file, making navigation and maintenance difficult. **Fix:** Split into `src/db/` module with 8 submodules: `mod.rs` (Db struct, init, migrations, hash fns), `types.rs` (all type defs), `events.rs` (event enums + channels), `jobs.rs` (job CRUD/filtering/decisions), `stats.rs` (encode/aggregated/daily stats), `config.rs` (watch dirs/profiles/notifications/schedules/file settings/preferences), `conversion.rs` (ConversionJob CRUD), `system.rs` (auth/sessions/API tokens/logs/health). All tests moved alongside their methods. Public API unchanged — all types re-exported from `db/mod.rs`. --- ### [TD-2] `AlchemistEvent` legacy bridge is dead weight **Status: RESOLVED** **Files:** - `src/db.rs` — enum and From impls removed - `src/media/pipeline.rs`, `src/media/executor.rs`, `src/media/processor.rs` — legacy `tx` channel removed - `src/notifications.rs` — migrated to typed `EventChannels` (jobs + system) - `src/server/mod.rs`, `src/main.rs` — legacy channel removed from AppState/RunServerArgs **Severity:** TD **Problem:** `AlchemistEvent` was a legacy event type duplicated by `JobEvent`, `ConfigEvent`, and `SystemEvent`. All senders were emitting events on both channels. **Fix:** Migrated the notification system (the sole consumer) to subscribe to `EventChannels.jobs` and `EventChannels.system` directly. Added `SystemEvent::EngineIdle` variant. Removed `AlchemistEvent` enum, its `From` impls, the legacy `tx` broadcast channel from all structs, and the `pub use` from `lib.rs`. --- ### [TD-3] `pipeline.rs` legacy `AlchemistEvent::Progress` stub **Status: RESOLVED** **Files:** - `src/media/pipeline.rs:1228` **Severity:** TD **Problem:** The pipeline used to emit zeroed progress events that could overwrite real stats from the executor. **Fix:** Emission removed. A comment at line 1228-1229 confirms that `AlchemistEvent::Progress` is no longer emitted from the pipeline wrapper. --- ### [TD-4] Silent `.ok()` on pipeline decision and attempt DB writes **Status: RESOLVED** **Files:** - `src/media/pipeline.rs` — all `add_decision`, `insert_encode_attempt`, `upsert_job_failure_explanation`, and `add_log` call sites **Severity:** TD **Problem:** Decision records, encode attempt records, failure explanations, and error logs were written with `.ok()` or `let _ =`, silently discarding DB write failures. These records are the only audit trail of *why* a job was skipped/transcoded/failed. **Fix:** Replaced all `.ok()` / `let _ =` patterns on `add_decision`, `insert_encode_attempt`, `upsert_job_failure_explanation`, and `add_log` calls with `if let Err(e) = ... { tracing::warn!(...) }`. Pipeline still continues on failure, but operators now see the error. --- ### [TD-5] Correlated subquery for sort-by-size in job listing **Status: RESOLVED** **Files:** - `src/db.rs` — `get_jobs_filtered()` query **Severity:** TD **Problem:** Sorting jobs by file size used a correlated subquery in ORDER BY, executing one subquery per row and producing NULL for jobs without encode_stats. **Fix:** Added `LEFT JOIN encode_stats es ON es.job_id = j.id` to the base query. Sort column changed to `COALESCE(es.input_size_bytes, 0)`, ensuring jobs without stats sort as 0 (smallest) instead of NULL. --- ## Reliability Gaps --- ### [RG-1] No encode resume after crash or restart **Status: PARTIALLY RESOLVED** **Files:** - `src/main.rs:320` - `src/media/processor.rs:255` **Severity:** RG **Problem:** Interrupted encodes were left in `Encoding` state and orphaned temp files remained on disk. **Fix:** Implemented `db.reset_interrupted_jobs()` in `main.rs` which resets `Encoding`, `Remuxing`, `Resuming`, and `Analyzing` jobs to `Queued` on startup. Orphaned temp files are also detected and removed. Full bitstream-level resume (resuming from the middle of a file) is still missing. --- ### [RG-2] AMD VAAPI/AMF hardware paths unvalidated **Files:** - `src/media/ffmpeg/vaapi.rs` - `src/media/ffmpeg/amf.rs` **Severity:** RG **Problem:** Hardware paths for AMD (VAAPI on Linux, AMF on Windows) were implemented without real hardware validation. **Fix:** Verify exact flag compatibility on AMD hardware and add integration tests gated on GPU presence. --- ## UX Gaps --- ### [UX-1] Queued jobs show no position or estimated wait time **Status: RESOLVED** **Files:** - `src/db.rs` — `get_queue_position` - `src/server/jobs.rs` — `JobDetailResponse.queue_position` - `web/src/components/jobs/JobDetailModal.tsx` - `web/src/components/jobs/types.ts` — `JobDetail.queue_position` **Severity:** UX **Problem:** Queued jobs only show "Waiting" without indicating their position in the priority queue. **Fix:** Implemented `db.get_queue_position(job_id)` which counts jobs with higher priority or earlier `created_at` (matching the `priority DESC, created_at ASC` dequeue order). Added `queue_position: Option` to `JobDetailResponse` — populated only when `status == Queued`. Frontend shows `Queue position: #N` in the empty state card in `JobDetailModal`. --- ### [UX-2] No way to add a single file to the queue via the UI **Severity:** UX **Problem:** Jobs only enter the queue via full library scans. No manual "Enqueue path" exists in the UI. **Fix:** Add `POST /api/jobs/enqueue` and a "Add file" action in the `JobsToolbar`. --- ### [UX-3] Workers-blocked reason not surfaced for queued jobs **Severity:** UX **Problem:** Users cannot see why a job is stuck in Queued (paused, scheduled, or slots full). **Fix:** Add `GET /api/processor/status` and show the reason in the job detail. --- ## Feature Gaps --- ### [FG-4] Intelligence page content not actionable **Files:** - `web/src/components/LibraryIntelligence.tsx` **Severity:** FG **Problem:** Intelligence page is informational only; recommendations cannot be acted upon directly from the page. **Fix:** Add "Queue all" for remux opportunities and "Review" actions for duplicates. --- ## What To Fix Next 1. **[UX-2]** Single file enqueue — New feature. 2. **[UX-3]** Workers-blocked reason — New feature. 3. **[FG-4]** Intelligence page actions — New feature. 4. **[RG-2]** AMD VAAPI/AMF validation — Needs real hardware.