- P1: Fix cancel race in pipeline, fix VideoToolbox quality mapping - P2: SSRF protection, batch cancel N+1, archived filter fixes, metadata persistence, reverse proxy hardening, reprobe logging - TD: Remove AlchemistEvent legacy bridge, fix silent .ok() on DB writes, optimize sort-by-size query, split db.rs (3400 LOC) into 8 focused submodules under src/db/ - UX: Add queue position display for queued jobs - Docs: Update API docs, engine modes, library doctor, config ref - Plans: Add plans.md for remaining open items (UX-2/3, FG-4, RG-2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
13 KiB
Audit Findings
Last updated: 2026-04-12 (second pass)
P1 Issues
[P1-1] Cancel during analysis can be overwritten by the pipeline
Status: RESOLVED
Files:
src/server/jobs.rs:41–63src/media/pipeline.rs:1178–1221src/orchestrator.rs:84–90
Severity: P1
Problem:
request_job_cancel() in jobs.rs immediately writes Cancelled to the DB for jobs in Analyzing or Resuming state. The pipeline used to have race windows where it could overwrite this state with Skipped, Encoding, or Remuxing if it reached a checkpoint after the cancel was issued but before it could be processed.
Fix:
Implemented cancel_requested: Arc<tokio::sync::RwLock<HashSet<i64>>> in Transcoder (orchestrator). The update_job_state wrapper in pipeline.rs now checks this set before any DB write for Encoding, Remuxing, Skipped, and Completed states. Terminal states (Completed, Failed, Cancelled, Skipped) also trigger removal from the set.
[P1-2] VideoToolbox quality controls are effectively ignored
Status: RESOLVED
Files:
src/media/planner.rs:630–650src/media/ffmpeg/videotoolbox.rs:25–54src/config.rs:85–92
Severity: P1
Problem:
The planner used to emit RateControl::Cq values that were incorrectly mapped for VideoToolbox, resulting in uncalibrated or inverted quality.
Fix:
Fixed the mapping in videotoolbox.rs to use -q:v (1-100, lower is better) and clamped the input range to 1-51 to match user expectations from x264/x265. Updated QualityProfile in config.rs to provide sane default values (24, 28, 32) for VideoToolbox quality.
P2 Issues
[P2-1] Convert does not reuse subtitle/container compatibility checks
Status: RESOLVED
Files:
src/conversion.rs:372–380src/media/planner.rs
Severity: P2
Problem:
The conversion path was not validating subtitle/container compatibility, leading to FFmpeg runtime failures instead of early validation errors.
Fix:
Integrated crate::media::planner::subtitle_copy_supported into src/conversion.rs:build_subtitle_plan. The "copy" mode now returns an AlchemistError::Config if the combination is unsupported.
[P2-2] Completed job metadata omitted at the API layer
Status: RESOLVED
Files:
src/db.rs:254–263src/media/pipeline.rs:599src/server/jobs.rs:343
Severity: P2
Problem:
Job details required a live re-probe of the input file to show metadata, which failed if the file was moved or deleted after completion.
Fix:
Added input_metadata_json column to the jobs table (migration 20260412000000_store_job_metadata.sql). The pipeline now stores the metadata string immediately after analysis. get_job_detail_handler reads this stored value, ensuring metadata is always available even if the source file is missing.
[P2-3] LAN-only setup exposed to reverse proxy misconfig
Status: RESOLVED
Files:
src/config.rs—SystemConfig.trusted_proxiessrc/server/mod.rs—AppState.trusted_proxies,AppState.setup_tokensrc/server/middleware.rs—is_trusted_peer,request_ip,auth_middleware
Severity: P2
Problem:
The setup wizard gate trusts all private/loopback IPs for header forwarding. When running behind a misconfigured proxy that doesn't set headers, it falls back to the proxy's own IP (e.g. 127.0.0.1), making the setup endpoint accessible to external traffic.
Fix:
Added two independent security layers:
trusted_proxies: Vec<String>toSystemConfig. When non-empty, only those exact IPs (plus loopback) are trusted for proxy header forwarding instead of all RFC-1918 ranges. Empty = previous behavior preserved.ALCHEMIST_SETUP_TOKENenv var. When set, setup endpoints require?token=<value>query param regardless of client IP. Token mode takes precedence over IP-based LAN check.
[P2-4] N+1 DB update in batch cancel
Status: RESOLVED
Files:
src/server/jobs.rs—batch_jobs_handler
Severity: P2
Problem:
batch_jobs_handler for "cancel" action iterates over jobs and calls request_job_cancel which performs an individual update_job_status query per job. Cancelling a large number of jobs triggers N queries.
Fix:
Restructured the "cancel" branch in batch_jobs_handler. Orchestrator in-memory operations (add_cancel_request, cancel_job) still run per-job, but all DB status updates are batched into a single db.batch_cancel_jobs(&ids) call (which already existed at db.rs). Immediate-resolution jobs (Queued + successfully signalled Analyzing/Resuming) are collected and written in one UPDATE ... WHERE id IN (...) query.
[P2-5] Missing archived filter in health and stats queries
Status: RESOLVED
Files:
src/db.rs—get_aggregated_stats,get_job_stats,get_health_summary
Severity: P2
Problem:
get_health_summary and get_aggregated_stats (total_jobs) do not include AND archived = 0. Archived (deleted) jobs are incorrectly included in library health metrics and total job counts.
Fix:
Added AND archived = 0 to all three affected queries: total_jobs and completed_jobs subqueries in get_aggregated_stats, the GROUP BY status query in get_job_stats, and both subqueries in get_health_summary. Updated tests that were asserting the old (incorrect) behavior.
[P2-6] Daily summary notifications bypass SSRF protections
Status: RESOLVED
Files:
src/notifications.rs—build_safe_client(),send(),send_daily_summary_target()
Severity: P2
Problem:
send_daily_summary_target() used Client::new() without any SSRF defences, while send() applied DNS timeout, private-IP blocking, no-redirect policy, and request timeout.
Fix:
Extracted all client-building logic into build_safe_client(&self, target) which applies the full SSRF defence stack. Both send() and send_daily_summary_target() now use this shared helper.
[P2-7] Silent reprobe failure corrupts saved encode stats
Status: RESOLVED
Files:
src/media/pipeline.rs—finalize_job()duration reprobe
Severity: P2
Problem:
When a completed encode's metadata has duration_secs <= 0.0, the pipeline reprobes the output file to get the actual duration. If reprobe fails, the error was silently swallowed via .ok() and duration defaulted to 0.0, poisoning downstream stats.
Fix:
Replaced .ok().and_then().unwrap_or(0.0) chain with explicit match that logs the error via tracing::warn! and falls through to 0.0. Existing guards at the stats computation lines already handle duration <= 0.0 correctly — operators now see why stats are zeroed.
Technical Debt
[TD-1] db.rs is a 3481-line monolith
Status: RESOLVED
File: src/db/ (was src/db.rs)
Severity: TD
Problem:
The database layer had grown to nearly 3500 lines. Every query, migration flag, and state enum was in one file, making navigation and maintenance difficult.
Fix:
Split into src/db/ module with 8 submodules: mod.rs (Db struct, init, migrations, hash fns), types.rs (all type defs), events.rs (event enums + channels), jobs.rs (job CRUD/filtering/decisions), stats.rs (encode/aggregated/daily stats), config.rs (watch dirs/profiles/notifications/schedules/file settings/preferences), conversion.rs (ConversionJob CRUD), system.rs (auth/sessions/API tokens/logs/health). All tests moved alongside their methods. Public API unchanged — all types re-exported from db/mod.rs.
[TD-2] AlchemistEvent legacy bridge is dead weight
Status: RESOLVED
Files:
src/db.rs— enum and From impls removedsrc/media/pipeline.rs,src/media/executor.rs,src/media/processor.rs— legacytxchannel removedsrc/notifications.rs— migrated to typedEventChannels(jobs + system)src/server/mod.rs,src/main.rs— legacy channel removed from AppState/RunServerArgs
Severity: TD
Problem:
AlchemistEvent was a legacy event type duplicated by JobEvent, ConfigEvent, and SystemEvent. All senders were emitting events on both channels.
Fix:
Migrated the notification system (the sole consumer) to subscribe to EventChannels.jobs and EventChannels.system directly. Added SystemEvent::EngineIdle variant. Removed AlchemistEvent enum, its From impls, the legacy tx broadcast channel from all structs, and the pub use from lib.rs.
[TD-3] pipeline.rs legacy AlchemistEvent::Progress stub
Status: RESOLVED
Files:
src/media/pipeline.rs:1228
Severity: TD
Problem:
The pipeline used to emit zeroed progress events that could overwrite real stats from the executor.
Fix:
Emission removed. A comment at line 1228-1229 confirms that AlchemistEvent::Progress is no longer emitted from the pipeline wrapper.
[TD-4] Silent .ok() on pipeline decision and attempt DB writes
Status: RESOLVED
Files:
src/media/pipeline.rs— alladd_decision,insert_encode_attempt,upsert_job_failure_explanation, andadd_logcall sites
Severity: TD
Problem:
Decision records, encode attempt records, failure explanations, and error logs were written with .ok() or let _ =, silently discarding DB write failures. These records are the only audit trail of why a job was skipped/transcoded/failed.
Fix:
Replaced all .ok() / let _ = patterns on add_decision, insert_encode_attempt, upsert_job_failure_explanation, and add_log calls with if let Err(e) = ... { tracing::warn!(...) }. Pipeline still continues on failure, but operators now see the error.
[TD-5] Correlated subquery for sort-by-size in job listing
Status: RESOLVED
Files:
src/db.rs—get_jobs_filtered()query
Severity: TD
Problem:
Sorting jobs by file size used a correlated subquery in ORDER BY, executing one subquery per row and producing NULL for jobs without encode_stats.
Fix:
Added LEFT JOIN encode_stats es ON es.job_id = j.id to the base query. Sort column changed to COALESCE(es.input_size_bytes, 0), ensuring jobs without stats sort as 0 (smallest) instead of NULL.
Reliability Gaps
[RG-1] No encode resume after crash or restart
Status: PARTIALLY RESOLVED
Files:
src/main.rs:320src/media/processor.rs:255
Severity: RG
Problem:
Interrupted encodes were left in Encoding state and orphaned temp files remained on disk.
Fix:
Implemented db.reset_interrupted_jobs() in main.rs which resets Encoding, Remuxing, Resuming, and Analyzing jobs to Queued on startup. Orphaned temp files are also detected and removed. Full bitstream-level resume (resuming from the middle of a file) is still missing.
[RG-2] AMD VAAPI/AMF hardware paths unvalidated
Files:
src/media/ffmpeg/vaapi.rssrc/media/ffmpeg/amf.rs
Severity: RG
Problem:
Hardware paths for AMD (VAAPI on Linux, AMF on Windows) were implemented without real hardware validation.
Fix:
Verify exact flag compatibility on AMD hardware and add integration tests gated on GPU presence.
UX Gaps
[UX-1] Queued jobs show no position or estimated wait time
Status: RESOLVED
Files:
src/db.rs—get_queue_positionsrc/server/jobs.rs—JobDetailResponse.queue_positionweb/src/components/jobs/JobDetailModal.tsxweb/src/components/jobs/types.ts—JobDetail.queue_position
Severity: UX
Problem:
Queued jobs only show "Waiting" without indicating their position in the priority queue.
Fix:
Implemented db.get_queue_position(job_id) which counts jobs with higher priority or earlier created_at (matching the priority DESC, created_at ASC dequeue order). Added queue_position: Option<u32> to JobDetailResponse — populated only when status == Queued. Frontend shows Queue position: #N in the empty state card in JobDetailModal.
[UX-2] No way to add a single file to the queue via the UI
Severity: UX
Problem:
Jobs only enter the queue via full library scans. No manual "Enqueue path" exists in the UI.
Fix:
Add POST /api/jobs/enqueue and a "Add file" action in the JobsToolbar.
[UX-3] Workers-blocked reason not surfaced for queued jobs
Severity: UX
Problem:
Users cannot see why a job is stuck in Queued (paused, scheduled, or slots full).
Fix:
Add GET /api/processor/status and show the reason in the job detail.
Feature Gaps
[FG-4] Intelligence page content not actionable
Files:
web/src/components/LibraryIntelligence.tsx
Severity: FG
Problem:
Intelligence page is informational only; recommendations cannot be acted upon directly from the page.
Fix:
Add "Queue all" for remux opportunities and "Review" actions for duplicates.
What To Fix Next
- [UX-2] Single file enqueue — New feature.
- [UX-3] Workers-blocked reason — New feature.
- [FG-4] Intelligence page actions — New feature.
- [RG-2] AMD VAAPI/AMF validation — Needs real hardware.