Mirror of https://github.com/bybrooklyn/alchemist.git (synced 2026-04-18 01:43:34 -04:00)

Document backlog priorities and add security audit report

@@ -35,7 +35,7 @@ The project uses `just` as the primary task runner.
   - `just build`: Performs a full production build (Frontend then Rust).
   - `just docker-build`: Builds the local Docker image.
 - **Verification:**
-  - `just check`: Runs all checks (Rust fmt, clippy, check; Frontend typecheck, build).
+  - `just release-check`: Runs all checks (Rust fmt, clippy, check; Frontend typecheck, build).
   - `just check-rust`: Rust-only verification.
 - **Testing:**
   - `just test`: Runs all Rust tests.

backlog.md (241 lines changed)

@@ -15,36 +15,183 @@ These are deliberate design decisions, not omissions. Do not add them.

 ## High Priority

-### Behavior-Preserving Refactor Pass
-- Keep the current product behavior exactly the same while improving internal structure
-- Refactor `web/src/components/JobManager.tsx` into smaller components and hooks without changing screens, filters, polling, SSE updates, or job actions
-- Centralize duplicated byte/time/reduction formatting logic into shared utilities and preserve current output formatting
-- Preserve the current realtime model, but make ownership clearer: job/config/system events via SSE, resource metrics via polling
-- Add regression coverage around planner decisions, watcher behavior, job lifecycle transitions, and decision explanation rendering before deeper refactors
-- Document the current planner heuristics and hardware fallback rules so future cleanup does not accidentally change behavior
+Testing policy for this section:

-### Planning / Simulation Mode
-- Add a first-class simulation flow that answers what Alchemist would transcode, remux, or skip without mutating the library
-- Show estimated total bytes recoverable, action counts, top skip reasons, and per-file predicted actions
-- Support comparing current settings against alternative profiles, codec targets, or threshold snapshots
-- Reuse the scanner, analyzer, and planner, but stop before executor and promotion stages
+- Backend/unit/integration coverage and Playwright coverage are exit criteria for each item below.
+- Do not treat "more tests" as a standalone product track; attach the required coverage to the feature or refactor that needs it.

-### Per-File Encode History
-- When a file has been processed more than once (retry, re-queue after settings change, manual re-run), show the full history of attempts in the job detail panel
-- Each attempt should show: date, outcome (completed/failed/skipped), encode stats if applicable (size before/after, codec, duration), and failure reason if failed
-- The data is already in the DB across `jobs`, `encode_stats`, and `job_failure_explanations` — this is primarily a UI feature
-- Useful for understanding why a file kept failing, or comparing quality before/after a settings change
+### 1. Engine Lifecycle Controls

-### E2E Test Coverage
-- Expand Playwright tests for more UI flows
-- Test job queue management scenarios
-- Test error states and recovery flows
+#### Goal
+- Make engine lifecycle controls real, explicit, and operator-safe from the header/About surface.

-### AMD AV1 Validation
-- Validate and tune the existing AMD AV1 paths on real hardware
-- Cover Linux VAAPI and Windows AMF separately
-- Verify encoder selection, fallback behavior, and quality/performance defaults
-- Do not treat this as support-from-scratch: encoder wiring and hardware detection already exist
+#### Scope
+- Redesign the About screen so it fits the current visual language.
+- Add a **Restart Engine** action that restarts the engine loop without killing the Alchemist process.
+- Add a **Shutdown Alchemist** action that cancels active jobs immediately and exits the process cleanly.
+- Define and surface the lifecycle states needed to make restart and shutdown understandable in the UI.

+#### Non-Goals
+- Do not re-exec the whole app process to implement restart.
+- Do not drain active jobs to completion on shutdown; shutdown means cancel and exit.
+
+#### Dependencies
+- Backend lifecycle endpoints and orchestration semantics for restart and shutdown.
+- Reliable event/state propagation so the UI can reflect transient lifecycle states without stale polling or SSE behavior.
+
+#### Acceptance Criteria
+- Restart tears down and reinitializes the engine loop while the binary stays alive.
+- Shutdown stops accepting new work, cancels active jobs, persists the right terminal states, and exits cleanly.
+- Job rows, logs, and toasts clearly distinguish pause, drain, restart, cancellation, and shutdown.
+- The About surface exposes restart and shutdown with confirmation and clear failure handling.
+
+#### Required Tests
+- Backend tests for restart/shutdown semantics and lifecycle state transitions.
+- Playwright coverage for About screen controls, confirmations, success states, and failure states.
+
+#### Solution
+- Add a dedicated engine lifecycle API instead of overloading pause/drain:
+  - Add authenticated lifecycle routes for `restart engine` and `shutdown app`.
+  - Keep restart scoped to the engine loop only; do not re-exec the binary.
+  - Keep shutdown as cancel-all-and-exit; do not reuse drain semantics.
+- Introduce a server-owned shutdown trigger so HTTP-initiated shutdown uses the same shutdown path as Ctrl+C and SIGTERM:
+  - Extend `RunServerArgs` and `AppState` with a shutdown signal sender.
+  - Update `axum::serve(...).with_graceful_shutdown(...)` to also listen for an internal shutdown signal.
+- Add an explicit lifecycle transition guard:
+  - Reject overlapping restart/shutdown requests while a lifecycle action is already in progress.
+  - Surface lifecycle state through `/api/engine/status` so the UI can render restarting/shutting-down states cleanly.
+- Implement restart as an engine-loop reset, not a process restart:
+  - Pause new intake.
+  - Cancel active jobs immediately through the orchestrator.
+  - Clear drain state and any temporary lifecycle flags.
+  - Reinitialize the engine loop state needed to resume normal processing.
+  - Resume only if the scheduler is not actively pausing the engine.
+- Implement shutdown as a process-level cancel-and-exit flow:
+  - Pause intake.
+  - Cancel all active jobs immediately.
+  - Give cancellation and persistence a short bounded window to flush terminal state.
+  - Trigger the internal shutdown signal so the server exits through the same top-level path already used for signals.
+- Split the backend work by file responsibility:
+  - `src/media/processor.rs`: add restart/shutdown lifecycle methods and transient lifecycle state.
+  - `src/server/mod.rs`: wire new lifecycle routes and internal shutdown signaling into `AppState` and server startup.
+  - `src/server/jobs.rs` or a new dedicated engine/server lifecycle module: implement authenticated handlers for restart/shutdown.
+  - `src/main.rs`: keep the top-level exit behavior but make sure HTTP-triggered shutdown lands in the same path as signal-triggered shutdown.
+- Update the UI in two passes:
+  - Redesign `web/src/components/AboutDialog.tsx` to match the current visual system and include restart/shutdown actions plus confirmation UX.
+  - Update `web/src/components/HeaderActions.tsx` and any engine-status consumers to understand the new lifecycle states.
+- Add coverage before shipping:
+  - Backend tests for restart, shutdown, overlapping request rejection, and status payload transitions.
+  - Playwright tests for About modal actions, confirmation dialogs, success flows, disabled/loading states, and failure toasts.
+
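The Solution list above asks for an "explicit lifecycle transition guard", and that guard can be as small as a single atomic state word. What follows is a hypothetical, std-only sketch. `Lifecycle`, `LifecycleAction`, and `LifecycleGuard` are invented names, not types from `src/media/processor.rs`. Exposing `current()` through `/api/engine/status` is left to the real handlers.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical engine lifecycle states surfaced to the UI.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lifecycle {
    Running,
    Restarting,
    ShuttingDown,
}

/// Operator-requested lifecycle actions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleAction {
    Restart,
    Shutdown,
}

impl Lifecycle {
    fn encode(self) -> u8 {
        match self {
            Lifecycle::Running => 0,
            Lifecycle::Restarting => 1,
            Lifecycle::ShuttingDown => 2,
        }
    }

    fn decode(raw: u8) -> Self {
        match raw {
            1 => Lifecycle::Restarting,
            2 => Lifecycle::ShuttingDown,
            _ => Lifecycle::Running,
        }
    }
}

/// Allows at most one restart/shutdown to be in flight at a time.
pub struct LifecycleGuard {
    state: AtomicU8,
}

impl LifecycleGuard {
    pub fn new() -> Self {
        Self { state: AtomicU8::new(Lifecycle::Running.encode()) }
    }

    /// Current state, e.g. for an engine status payload.
    pub fn current(&self) -> Lifecycle {
        Lifecycle::decode(self.state.load(Ordering::SeqCst))
    }

    /// Atomically moves Running -> Restarting/ShuttingDown. If another action
    /// already owns the transition, returns that state so the caller can
    /// reject the overlapping request.
    pub fn begin(&self, action: LifecycleAction) -> Result<(), Lifecycle> {
        let target = match action {
            LifecycleAction::Restart => Lifecycle::Restarting,
            LifecycleAction::Shutdown => Lifecycle::ShuttingDown,
        };
        self.state
            .compare_exchange(
                Lifecycle::Running.encode(),
                target.encode(),
                Ordering::SeqCst,
                Ordering::SeqCst,
            )
            .map(|_| ())
            .map_err(Lifecycle::decode)
    }

    /// Ends a restart by returning to Running. Shutdown is terminal, so this
    /// does nothing unless a restart is actually in progress.
    pub fn finish_restart(&self) {
        let _ = self.state.compare_exchange(
            Lifecycle::Restarting.encode(),
            Lifecycle::Running.encode(),
            Ordering::SeqCst,
            Ordering::SeqCst,
        );
    }
}
```

A handler would most likely map `Err(state)` to an HTTP conflict response that names the in-progress action. The backlog does not specify a status code.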
|
+### 2. Planner and Lifecycle Documentation
+
+#### Goal
+- Lock down current behavior before deeper refactors by documenting planner heuristics, hardware fallback rules, and engine lifecycle semantics.
+
+#### Scope
+- Document the current planner heuristics and stable skip/transcode/remux decision boundaries.
+- Document hardware fallback rules and vendor/backend selection semantics.
+- Document lifecycle semantics for pause, drain, restart, cancel, and shutdown.
+
+#### Non-Goals
+- No product behavior changes.
+- No speculative redesign of the planner or lifecycle model.
+
+#### Dependencies
+- Cross-check against the existing backend behavior and tests, not just intended behavior.
+
+#### Acceptance Criteria
+- Future cleanup work has a single documented source of truth for planner and lifecycle behavior.
+- The docs are specific enough to catch accidental behavior changes during refactors.
+
+#### Required Tests
+- Add or tighten assertions where documentation work uncovers missing coverage around planner decisions, hardware fallback, or lifecycle states.
+
+#### Solution
+
+### 3. Per-File Encode History
+
+#### Goal
+- Show a complete attempt history in the job detail panel for files that have been processed more than once.
+
+#### Scope
+- Group history by canonical file identity rather than path-only matching.
+- Show date, outcome, encode stats where applicable, and failure reason where applicable.
+- Make repeated retries, re-queues after settings changes, and manual reruns understandable at a glance.
+
+#### Non-Goals
+- Do not turn this into a general media-management timeline.
+- Do not rely on path-only grouping when a canonical identity is available.
+
+#### Dependencies
+- Query shaping across `jobs`, `encode_stats`, and `job_failure_explanations`.
+- A stable canonical file identity strategy that survives path changes better than naive path matching.
+
+#### Acceptance Criteria
+- Job detail shows prior attempts for the same canonical file identity with enough detail to explain repeated outcomes.
+- Operators can distinguish retry noise from truly separate processing attempts.
+
+#### Required Tests
+- Backend coverage for history lookup and canonical identity grouping.
+- UI coverage for rendering mixed completed/failed/skipped histories.
+
+#### Solution
+
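For Per-File Encode History, the core operation is grouping attempts by canonical identity instead of by path. The backlog leaves the identity strategy open, so this sketch treats it as an opaque key (for example, a content fingerprint). `Attempt`, `Outcome`, and `group_history` are hypothetical names. In practice the rows would come from a query across `jobs`, `encode_stats`, and `job_failure_explanations`.

```rust
use std::collections::BTreeMap;

/// Hypothetical outcome of one processing attempt.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Completed { bytes_before: u64, bytes_after: u64 },
    Failed { reason: String },
    Skipped,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Attempt {
    /// Canonical identity key (assumed, e.g. a content fingerprint), not the path.
    pub identity: String,
    /// Path at the time of the attempt; may differ between attempts after a rename.
    pub path: String,
    /// Unix timestamp (seconds) when the attempt started.
    pub started_at: i64,
    pub outcome: Outcome,
}

/// Groups attempts by canonical identity and orders each group oldest-first,
/// so a renamed file still shows one continuous history.
pub fn group_history(attempts: Vec<Attempt>) -> BTreeMap<String, Vec<Attempt>> {
    let mut groups: BTreeMap<String, Vec<Attempt>> = BTreeMap::new();
    for attempt in attempts {
        groups.entry(attempt.identity.clone()).or_default().push(attempt);
    }
    for history in groups.values_mut() {
        // Stable sort: attempts with equal timestamps keep their query order.
        history.sort_by_key(|a| a.started_at);
    }
    groups
}
```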
|
+### 4. Behavior-Preserving Refactor Pass
+
+#### Goal
+- Improve internal structure without changing visible product behavior.
+
+#### Scope
+- Refactor `web/src/components/JobManager.tsx` into smaller components and hooks without changing screens, filters, polling, SSE updates, or job actions.
+- Centralize duplicated byte/time/reduction formatting logic into shared utilities while preserving current output formatting.
+- Preserve the current realtime model, but make ownership clearer: job/config/system events via SSE, resource metrics via polling.
+- Add regression coverage around planner decisions, watcher behavior, job lifecycle transitions, and decision explanation rendering before deeper refactors.
+
+#### Non-Goals
+- No new screens, filters, realtime behaviors, or job actions.
+- No opportunistic product changes hidden inside the refactor.
+
+#### Dependencies
+- Planner/lifecycle documentation and regression coverage should land before deeper structural work.
+
+#### Acceptance Criteria
+- Existing behavior, strings, filters, and action flows remain stable.
+- `JobManager` is decomposed enough that future feature work does not require editing a single monolithic file for unrelated changes.
+- Realtime ownership is easier to reason about and less likely to regress.
+
+#### Required Tests
+- Keep current backend and Playwright suites green.
+- Add targeted regression coverage before extracting behavior into hooks/components.
+
+#### Solution
+
+### 5. AMD AV1 Validation
+
+#### Goal
+- Validate and tune the existing AMD AV1 paths on real hardware.
+
+#### Scope
+- Cover Linux VAAPI and Windows AMF separately.
+- Verify encoder selection, fallback behavior, and quality/performance defaults.
+- Treat this as validation/tuning of existing wiring, not support-from-scratch.
+
+#### Non-Goals
+- Do not expand the stable support promise before validation is complete.
+- Do not invent a fake validation story without real hardware runs.
+
+#### Dependencies
+- Access to representative Linux VAAPI and Windows AMF hardware.
+- Repeatable manual verification notes and any scripted checks that can be automated.
+
+#### Acceptance Criteria
+- AMD AV1 is either validated with documented defaults and caveats, or explicitly left outside the supported matrix with clearer docs.
+- Linux and Windows results are documented separately.
+
+#### Required Tests
+- Scripted verification where possible, plus recorded manual validation runs on real hardware.
+
+#### Solution
+
 ---

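For AMD AV1 Validation, one thing worth pinning down in a scripted check is the per-platform encoder preference order. This sketch is hypothetical. `av1_vaapi`, `av1_amf`, and `libsvtav1` are real FFmpeg encoder names. The ordering, the choice of SVT-AV1 as CPU fallback, and the function names are assumptions, not Alchemist's current selection logic.

```rust
/// Platforms the validation item covers separately.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    Linux,
    Windows,
}

/// Preferred AV1 encoders for an AMD GPU, best first, ending in a CPU fallback.
pub fn amd_av1_candidates(platform: Platform) -> &'static [&'static str] {
    match platform {
        Platform::Linux => &["av1_vaapi", "libsvtav1"],
        Platform::Windows => &["av1_amf", "libsvtav1"],
    }
}

/// Picks the first preferred encoder that the local FFmpeg build reports
/// (e.g. as parsed from `ffmpeg -encoders`). Returns None if none is usable.
pub fn select_av1_encoder(platform: Platform, available: &[&str]) -> Option<&'static str> {
    amd_av1_candidates(platform)
        .iter()
        .copied()
        .find(|name| available.contains(name))
}
```

FFmpeg lists every compiled-in encoder whether or not the GPU and driver can actually open it. A real check would therefore probably need a short trial encode too. That gap is what the real-hardware runs are meant to cover.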
@@ -172,10 +319,14 @@ These are deliberate design decisions, not omissions. Do not add them.
 - Not a cloud encoding service; no public hosting assumed
 - No raw FFmpeg flag injection (see Out of Scope)

+#### Solution
+
 ### Library Intelligence
 - Expand recommendations beyond duplicate detection into remux-only opportunities, wasteful audio layouts, commentary/descriptive-track cleanup, and duplicate-ish title variants
 - Keep the feature focused on storage and library quality, not general media management

+#### Solution
+
 ### Auto-Priority Rules
 - Define rules that automatically assign queue priority based on file attributes
 - Rule conditions: file path pattern (glob), file age, file size, source watch folder
@@ -183,11 +334,15 @@ These are deliberate design decisions, not omissions. Do not add them.
 - Rules evaluated at enqueue time; manual priority overrides still win
 - Configured in Settings alongside other library behavior

+#### Solution
+
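The Auto-Priority Rules item above lists its rule conditions in one hunk and its evaluation semantics in the next. Together they might reduce to the sketch below. All names are hypothetical, and "first matching rule wins" is an assumption the backlog does not state. The `*`-only matcher is a stand-in for a real glob library; unlike most globs, its `*` also crosses `/`.

```rust
use std::time::Duration;

/// Minimal wildcard matcher: `*` matches any run of characters (including `/`);
/// every other character must match exactly.
pub fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Last `*` seen (pattern index) and the text index it currently absorbs up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*`s can match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

/// One auto-priority rule; a `None` condition always matches.
pub struct PriorityRule {
    pub path_glob: Option<String>,
    pub min_age: Option<Duration>,
    pub min_size_bytes: Option<u64>,
    pub watch_folder: Option<String>,
    pub priority: i32,
}

/// What is known about a file at enqueue time.
pub struct Candidate<'a> {
    pub path: &'a str,
    pub age: Duration,
    pub size_bytes: u64,
    pub watch_folder: &'a str,
}

impl PriorityRule {
    fn matches(&self, c: &Candidate<'_>) -> bool {
        self.path_glob.as_deref().map_or(true, |g| wildcard_match(g, c.path))
            && self.min_age.map_or(true, |age| c.age >= age)
            && self.min_size_bytes.map_or(true, |size| c.size_bytes >= size)
            && self.watch_folder.as_deref().map_or(true, |w| w == c.watch_folder)
    }
}

/// Evaluated once at enqueue time: a manual priority always wins; otherwise
/// the first matching rule applies; otherwise `default`.
pub fn enqueue_priority(
    rules: &[PriorityRule],
    c: &Candidate<'_>,
    manual: Option<i32>,
    default: i32,
) -> i32 {
    if let Some(p) = manual {
        return p;
    }
    rules.iter().find(|r| r.matches(c)).map_or(default, |r| r.priority)
}
```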
 ### Performance Optimizations
 - Profile scanner/analyzer hot paths before changing behavior
 - Only tune connection pooling after measuring database contention under load
 - Consider caching repeated FFprobe calls on identical files if profiling shows probe churn is material

+#### Solution
+
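If profiling does show probe churn, the cache suggested in Performance Optimizations could be keyed on cheap file metadata. This sketch assumes a file counts as "identical" when its path, size, and modification time all match. That is a policy choice the backlog does not make. `ProbeKey` and `ProbeCache` are hypothetical names, and the closure stands in for an actual `ffprobe` call. The hit/miss counters exist so the cache can justify itself with numbers, in line with the item's measure-first stance.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Cheap identity for "same file, unchanged since the last probe".
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ProbeKey {
    pub path: PathBuf,
    pub size: u64,
    pub mtime_nanos: u128,
}

/// Memoizes probe results. Failures are not cached, so a transient error is retried.
pub struct ProbeCache<T> {
    entries: HashMap<ProbeKey, T>,
    pub hits: u64,
    pub misses: u64,
}

impl<T: Clone> ProbeCache<T> {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Returns the cached result for `key`, or runs `probe` on a miss.
    pub fn get_or_probe<E>(
        &mut self,
        key: ProbeKey,
        probe: impl FnOnce(&ProbeKey) -> Result<T, E>,
    ) -> Result<T, E> {
        if let Some(hit) = self.entries.get(&key) {
            self.hits += 1;
            return Ok(hit.clone());
        }
        self.misses += 1;
        let value = probe(&key)?;
        self.entries.insert(key, value.clone());
        Ok(value)
    }
}
```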
 ### Audio Normalization
 - Apply EBU R128 loudness normalization to audio streams during transcode
 - Target: -23 LUFS integrated, -1 dBTP true peak (broadcast standard)
@@ -198,9 +353,13 @@ These are deliberate design decisions, not omissions. Do not add them.
   the job detail panel alongside existing encode stats
 - Do not normalize if audio is being copied (copy mode bypasses this)

+#### Solution
+
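For Audio Normalization, FFmpeg's `loudnorm` filter implements EBU R128 and takes the integrated (`I`) and true-peak (`TP`) targets directly. The sketch below shows one way a planner might append it, assuming a hypothetical `AudioMode` flag. It uses single-pass (dynamic) `loudnorm`. The filter also supports a more accurate two-pass mode driven by measured values, but that would need an extra analysis run.

```rust
/// How audio is handled for one output (hypothetical planner flag).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioMode {
    Copy,
    Transcode,
}

/// EBU R128 broadcast targets from the backlog item.
pub const TARGET_INTEGRATED_LUFS: f64 = -23.0;
pub const TARGET_TRUE_PEAK_DBTP: f64 = -1.0;

/// Extra FFmpeg arguments for loudness normalization. Copy mode gets none,
/// because FFmpeg cannot run filters on a stream it is copying.
pub fn loudnorm_args(mode: AudioMode, enabled: bool) -> Vec<String> {
    if !enabled || mode == AudioMode::Copy {
        return Vec::new();
    }
    vec![
        "-af".to_string(),
        // f64 Display drops a trailing ".0", giving "loudnorm=I=-23:TP=-1".
        format!("loudnorm=I={}:TP={}", TARGET_INTEGRATED_LUFS, TARGET_TRUE_PEAK_DBTP),
    ]
}
```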
 ### UI Improvements
 - Add keyboard shortcuts for common actions

+#### Solution
+
 ### Notification Improvements
 - **Granular event types** — current events are too coarse. Add:
   - `encode.started` — job moved from queued to encoding
@@ -210,56 +369,82 @@ These are deliberate design decisions, not omissions. Do not add them.
   - `engine.idle` — queue drained, nothing left to process
   - `daily.summary` — opt-in digest of the day's activity
 - **Per-target event filtering** — each notification target should
-  independently choose which events it receives. Currently all targets
+  independently choose which events it receives. Currently, all targets
   get the same events. A Discord webhook might want everything; a
   phone webhook might only want failures.
 - **Richer payloads** — completed job notifications should include
   filename, input size, output size, space saved, and encode time.
-  Currently the payload is minimal.
+  Currently, the payload is minimal.
 - **Add Telegram integration** — bot token + chat ID, same event
   model as Discord. No new dependencies needed (reqwest already present).
+- **Improve Discord notifications** — add bot token support where it meaningfully improves delivery or richer messaging.
 - **Add email support** — SMTP with TLS. Lower priority than Telegram.
   Most self-hosters already have Discord or Telegram.

+#### Solution
+
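The Notification Improvements item calls for per-target event filtering and a richer completed-job payload; both could look roughly like the sketch below. It is hypothetical. This excerpt names only `encode.started`, `engine.idle`, and `daily.summary`, so the completed/failed variants are placeholder names. A target with no filter keeps today's "receive everything" behavior, and space saved is clamped at zero in case an encode grows the file.

```rust
use std::collections::HashSet;

/// Notification events; `JobCompleted` and `JobFailed` are placeholder names.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Event {
    EncodeStarted,
    JobCompleted,
    JobFailed,
    EngineIdle,
    DailySummary,
}

pub struct Target {
    pub name: String,
    /// `None` keeps the current behavior: the target receives every event.
    pub events: Option<HashSet<Event>>,
}

impl Target {
    pub fn wants(&self, event: Event) -> bool {
        self.events.as_ref().map_or(true, |set| set.contains(&event))
    }
}

/// Names of the targets that should receive `event`.
pub fn recipients(targets: &[Target], event: Event) -> Vec<&str> {
    targets.iter().filter(|t| t.wants(event)).map(|t| t.name.as_str()).collect()
}

/// Fields the backlog lists for completed-job notifications.
pub struct CompletedPayload {
    pub filename: String,
    pub input_bytes: u64,
    pub output_bytes: u64,
    pub encode_secs: u64,
}

impl CompletedPayload {
    /// Space saved, clamped at zero if the output grew.
    pub fn space_saved(&self) -> u64 {
        self.input_bytes.saturating_sub(self.output_bytes)
    }
}
```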
 ---

 ## Low Priority

+### Planning / Simulation Mode
+- Not a current focus. If revisited, start with a single current-config dry-run before attempting comparison mode.
+- Add a first-class simulation flow that answers what Alchemist would transcode, remux, or skip without mutating the library.
+- Show estimated total bytes recoverable, action counts, top skip reasons, and per-file predicted actions.
+- Reuse the scanner, analyzer, and planner, but stop before executor and promotion stages.
+- Only add profile/codec/threshold comparison snapshots after the simple single-config flow proves useful.
+
+#### Solution
+
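If Planning / Simulation Mode is revisited, the single current-config dry-run mostly reduces to aggregating planner output without handing anything to the executor. The sketch below works under that assumption. `Predicted` and `SimulationSummary` are hypothetical stand-ins for whatever the planner actually returns.

```rust
use std::collections::HashMap;

/// Hypothetical per-file prediction from scanner + analyzer + planner.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Predicted {
    Transcode { est_bytes_saved: u64 },
    Remux,
    Skip { reason: String },
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct SimulationSummary {
    pub transcode: usize,
    pub remux: usize,
    pub skip: usize,
    pub bytes_recoverable: u64,
    /// Most common skip reasons, highest count first.
    pub top_skip_reasons: Vec<(String, usize)>,
}

pub fn summarize(predictions: &[Predicted], top_n: usize) -> SimulationSummary {
    let mut summary = SimulationSummary::default();
    let mut reasons: HashMap<&str, usize> = HashMap::new();
    for p in predictions {
        match p {
            Predicted::Transcode { est_bytes_saved } => {
                summary.transcode += 1;
                summary.bytes_recoverable += *est_bytes_saved;
            }
            Predicted::Remux => summary.remux += 1,
            Predicted::Skip { reason } => {
                summary.skip += 1;
                *reasons.entry(reason.as_str()).or_insert(0) += 1;
            }
        }
    }
    let mut ranked: Vec<(String, usize)> =
        reasons.into_iter().map(|(r, n)| (r.to_string(), n)).collect();
    // Highest count first; ties broken alphabetically so output is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(top_n);
    summary.top_skip_reasons = ranked;
    summary
}
```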
 ### API Token Authentication + API Documentation
 - Add support for static bearer tokens as an alternative to session cookies
 - Enables programmatic access from scripts, home automation (Home Assistant, n8n), and CLI tools without managing session state
 - Tokens generated and revoked from Settings; no expiry by default, revocable any time
 - Expand API documentation to cover all endpoints with request/response examples

+#### Solution
+
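For API Token Authentication, the core of a static bearer-token check is small. Below is a std-only sketch with hypothetical function names. HTTP authentication schemes are case-insensitive, so `Bearer` is matched that way. The comparison avoids returning early on the first differing byte. Storing only a hash of each token, rather than the token itself, would be a sensible addition but is not shown.

```rust
/// Extracts `<token>` from an `Authorization: Bearer <token>` header value.
pub fn parse_bearer(header_value: &str) -> Option<&str> {
    let (scheme, token) = header_value.trim().split_once(' ')?;
    // Authentication schemes are case-insensitive in HTTP.
    if !scheme.eq_ignore_ascii_case("Bearer") {
        return None;
    }
    let token = token.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}

/// Compares two byte strings without stopping at the first mismatch, so the
/// time taken does not reveal how much of a guessed token was correct.
/// Length differences still return early; fixed-length tokens avoid that leak.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Accepts the request if the presented token matches any non-revoked token.
pub fn token_is_valid(header_value: &str, active_tokens: &[&str]) -> bool {
    match parse_bearer(header_value) {
        Some(presented) => active_tokens
            .iter()
            .any(|known| constant_time_eq(presented.as_bytes(), known.as_bytes())),
        None => false,
    }
}
```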
 ### Passthrough Mode
 - A toggle that keeps all watch folders and watcher active but prevents the planner from queuing new jobs
 - Different from Pause — Pause stops active encodes; Passthrough lets the system observe and index the library without touching anything
 - Useful when testing settings or onboarding a new library without triggering encodes immediately

+#### Solution
+
 ### Base URL / Subpath Configuration
 - Allow Alchemist to be served at a non-root path (e.g. `/alchemist/`) via `ALCHEMIST_BASE_URL`
 - Common self-hosting pattern for reverse proxy setups running multiple services on one domain
 - Low urgency — most users run Alchemist on a dedicated subdomain or port

+#### Solution
+
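For Base URL / Subpath Configuration, two pieces of logic tend to go wrong: normalizing the configured prefix, and refusing look-alike paths such as `/alchemistx`. The sketch below uses hypothetical helper names. It assumes the value is a path prefix only, not a full URL. Wiring it into routing and asset URLs is left open.

```rust
/// Normalizes a configured prefix such as "alchemist", "/alchemist/", or "" into
/// either "" (served at the root) or "/alchemist" (leading slash, no trailing slash).
pub fn normalize_base_path(raw: &str) -> String {
    let trimmed = raw.trim().trim_matches('/');
    if trimmed.is_empty() {
        String::new()
    } else {
        format!("/{}", trimmed)
    }
}

/// Maps an incoming request path to the app-relative path, or None if the
/// request is outside the prefix ("/alchemistx" must not match "/alchemist").
pub fn strip_base<'a>(base: &str, path: &'a str) -> Option<&'a str> {
    if base.is_empty() {
        return Some(path);
    }
    let rest = path.strip_prefix(base)?;
    if rest.is_empty() {
        Some("/")
    } else if rest.starts_with('/') {
        Some(rest)
    } else {
        None
    }
}
```

At startup, this might be called as `normalize_base_path(&std::env::var("ALCHEMIST_BASE_URL").unwrap_or_default())`, so an unset variable means "serve at the root".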
 ### Features from DESIGN_PHILOSOPHY.md
 - Add batch job templates

+#### Solution
+
 ### Code Quality
 - Increase test coverage for edge cases
 - Add property-based testing for codec parameter generation
 - Add fuzzing for FFprobe output parsing

+#### Solution
+
 ### Documentation
 - Add architecture diagrams
 - Add contributor guide with development setup
 - Video tutorials for common workflows

+#### Solution
+
 ### Distribution
 - Add Homebrew formula
 - Add AUR package
 - Add Flatpak/Snap packages
 - Improve Windows installer (WiX) with auto-updates

+#### Solution
+
 ---

 ## Completed (Recent)

security_best_practices_report.md (new file, 124 lines)

# Security Best Practices Report

## Executive Summary

I found one critical security bug and one additional high-severity issue in the setup/bootstrap flow.

The critical problem is that first-run setup is remotely accessible without authentication while the server listens on `0.0.0.0`. A network-reachable attacker can win the initial setup race, create the first admin account, and take over the instance.

I did not find evidence of major client-side XSS sinks or obvious SQL injection paths during this audit. Most of the remaining concerns I saw were hardening-level issues rather than immediately exploitable bugs.

## Critical Findings

### ALCH-SEC-001

- Severity: Critical
- Location:
  - `src/server/middleware.rs:80-86`
  - `src/server/wizard.rs:95-210`
  - `src/server/mod.rs:176-197`
  - `README.md:61-79`
- Impact: Any attacker who can reach the service before the legitimate operator completes setup can create the first admin account and fully compromise the instance.

#### Evidence

`auth_middleware` exempts the full `/api/setup` namespace from authentication:

- `src/server/middleware.rs:80-86`

`setup_complete_handler` only checks `setup_required` and then creates the user, session cookie, and persisted config:

- `src/server/wizard.rs:95-210`

The server binds to all interfaces by default:

- `src/server/mod.rs:176-197`

The documented Docker quick-start publishes port `3000` directly:

- `README.md:61-79`

#### Why This Is Exploitable

On a fresh install, or on any run where `setup_required == true`, the application accepts unauthenticated requests to `/api/setup/complete`. Because the listener binds `0.0.0.0`, that endpoint is reachable from any network that can reach the host unless an external firewall or reverse proxy blocks it.

That lets a remote attacker:

1. POST their own username and password to `/api/setup/complete`
2. Receive the initial authenticated session cookie
3. Persist attacker-controlled configuration and start operating as the admin user

This is a full authentication bypass that hands the attacker control of the instance during bootstrap.

#### Recommended Fix

Accept setup completion only from a trusted local origin during bootstrap, matching the stricter treatment already used for `/api/fs/*` during setup.

Minimal safe options:

1. Restrict `/api/setup/*` and `/api/settings/bundle` to loopback-only while `setup_required == true`.
2. Alternatively, require an explicit one-time bootstrap secret/token generated on startup and printed locally.
3. Consider binding to `127.0.0.1` by default until setup is complete, then allowing an explicit public bind only after bootstrap.
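Option 1 above (loopback-only setup routes while `setup_required == true`) could be enforced with a check like the following. It is a sketch, not the project's middleware, and the function names are hypothetical. The peer address must come from the socket (for example axum's `ConnectInfo<SocketAddr>`), never from a client-controlled header. IPv4-mapped loopback (`::ffff:127.0.0.1`) is handled explicitly because `Ipv6Addr::is_loopback` only matches `::1`.

```rust
use std::net::{IpAddr, SocketAddr};

/// True for 127.0.0.0/8, ::1, and IPv4-mapped loopback such as ::ffff:127.0.0.1.
fn is_loopback(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_loopback(),
        IpAddr::V6(v6) => {
            v6.is_loopback() || v6.to_ipv4_mapped().map_or(false, |v4| v4.is_loopback())
        }
    }
}

/// Routes this report says must not be reachable anonymously from the network during setup.
fn is_bootstrap_route(path: &str) -> bool {
    path == "/api/setup" || path.starts_with("/api/setup/") || path == "/api/settings/bundle"
}

/// Bootstrap gate intended to run before normal authentication.
/// Returns false when the request must be rejected outright.
pub fn bootstrap_guard_allows(setup_required: bool, path: &str, peer: SocketAddr) -> bool {
    if !setup_required || !is_bootstrap_route(path) {
        // Outside bootstrap, or not a setup route: defer to the normal auth middleware.
        return true;
    }
    is_loopback(peer.ip())
}
```

Behind a reverse proxy on the same host, every request arrives from loopback, so this check alone would not protect that layout. Option 2 (a one-time bootstrap token) covers that case.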
|
#### Mitigation Until Fixed

- Do not expose the service to any network before setup is completed.
- Do not publish the container port directly on untrusted networks.
- Complete setup only through a local-only tunnel or host firewall rule.

## High Findings

### ALCH-SEC-002

- Severity: High
- Location:
  - `src/server/middleware.rs:116-117`
  - `src/server/settings.rs:244-285`
  - `src/config.rs:366-390`
  - `src/main.rs:369-383`
  - `src/db.rs:2566-2571`
- Impact: During setup mode, an unauthenticated remote attacker can read and overwrite the full runtime configuration. After `--reset-auth`, this can expose existing notification endpoints/tokens and let the attacker reconfigure the instance before the operator reclaims it.

#### Evidence

While `setup_required == true`, `auth_middleware` explicitly allows `/api/settings/bundle` without authentication:

- `src/server/middleware.rs:116-117`

`get_settings_bundle_handler` returns the full `Config`, and `update_settings_bundle_handler` writes an attacker-supplied `Config` back to disk and runtime state:

- `src/server/settings.rs:244-285`

The config structure includes notification targets and optional `auth_token` fields:

- `src/config.rs:366-390`

`--reset-auth` only clears users and sessions, then re-enters setup mode:

- `src/main.rs:369-383`
- `src/db.rs:2566-2571`

#### Why This Is Exploitable

This endpoint is effectively a public config API whenever the app is in setup mode. On a brand-new install, it widens the same bootstrap attack surface as ALCH-SEC-001. On an existing deployment where an operator runs `--reset-auth`, the previous configuration remains on disk while authentication is removed, so a remote caller can:

1. GET `/api/settings/bundle` and read the current config
2. Learn configured paths, schedules, webhook targets, and any stored notification bearer tokens
3. PUT a replacement config before the legitimate operator finishes recovery

That creates both confidential-data exposure and unauthenticated remote reconfiguration during recovery/bootstrap windows.

#### Recommended Fix

Do not expose `/api/settings/bundle` anonymously.

Safer options:

1. Apply the same loopback-only setup restriction used for `/api/fs/*`.
2. Split bootstrap-safe fields from privileged configuration and expose only the minimal bootstrap payload anonymously.
3. Redact secret-bearing config fields such as notification tokens from any unauthenticated response path.
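Option 3 (redacting secret-bearing fields) can be sketched as a copy-and-mask step on the way out. The structs below are hypothetical, trimmed-down stand-ins for the types in `src/config.rs`. The report describes those types only as holding notification targets with optional `auth_token` fields. Tokens are masked rather than dropped, so a client can still see that one is set. Webhook URLs can also embed secrets (Discord webhook URLs do), so a real version may need to mask those as well.

```rust
/// Hypothetical, trimmed-down stand-ins for the config types.
#[derive(Debug, Clone, PartialEq)]
pub struct NotificationTarget {
    pub name: String,
    pub endpoint_url: String,
    pub auth_token: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub watch_dirs: Vec<String>,
    pub notification_targets: Vec<NotificationTarget>,
}

/// Placeholder shown instead of a real secret.
pub const REDACTED: &str = "********";

/// Copy of `config` that is safe to return on an unauthenticated path.
/// Tokens are masked, not dropped, so a client can see that one is configured.
pub fn redact_for_anonymous(config: &Config) -> Config {
    let mut out = config.clone();
    for target in &mut out.notification_targets {
        if target.auth_token.is_some() {
            target.auth_token = Some(REDACTED.to_string());
        }
    }
    out
}
```

A matching write path would need to treat the mask value as "keep the existing token". Otherwise, saving a redacted config back would overwrite real tokens with asterisks.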
|
## Notes

- I did not find a major DOM-XSS path in `web/src`; there were no `dangerouslySetInnerHTML`, `innerHTML`, `insertAdjacentHTML`, `eval`, or similar high-risk sinks in the audited code paths.
- I also did not see obvious raw SQL string interpolation issues; the database code I reviewed uses parameter binding.

74
stitch_findings.md
Normal file
@@ -0,0 +1,74 @@
# Alchemist Project Audit & Findings

This document provides a comprehensive audit of the Alchemist media transcoding project (v0.3.0-rc.3), covering backend architecture, frontend design, database schema, and operational workflows.

---

## 1. Project Architecture & Pipeline
Alchemist implements a robust, asynchronous media transcoding pipeline managed by a central `Agent`. The pipeline follows a strictly ordered lifecycle:

1. **Scanner (`src/media/scanner.rs`):** Traverses watch folders quickly. It uses `mtime_hash` (modification time in seconds plus nanoseconds) to detect changed files without re-analyzing them, which keeps re-scans cheap and minimizes DB writes.
2. **Analyzer (`src/media/analyzer.rs`):** Executes `ffprobe` to extract normalized media metadata (codecs, bit depth, BPP, bitrate). Analysis results are used to populate the `DetailedEncodeStats` and `Decision` tables.
3. **Planner (`src/media/planner.rs`):** A complex decision engine that evaluates whether to **Skip**, **Remux**, or **Transcode** a file based on user profiles.
   * *Finding:* The planning logic relies heavily on hardcoded "magic thresholds" (e.g., bits-per-pixel thresholds). They work, but they could be exposed as "Advanced Settings" in the UI.
4. **Executor (`src/media/executor.rs`):** Orchestrates the `ffmpeg` process. It dynamically selects encoders (NVENC, VAAPI, QSV, ProRes, or CPU fallback) based on the target profile and host hardware capabilities detected in `src/system/hardware.rs`.
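The kind of threshold the planner *Finding* refers to can be sketched as follows. Bits per pixel is the video bitrate divided by pixels per second (width × height × frame rate), so a 1920×1080, 24 fps stream at 8 Mbit/s works out to about 0.16. Everything here is hypothetical, including the type names, the 0.05 default, and the exact Skip/Remux/Transcode rules; it is not the logic in `src/media/planner.rs`. Holding the threshold in a `PlannerThresholds` struct is one way it could be surfaced as an "Advanced Setting".

```rust
/// Outcome of planning one file (hypothetical mirror of Skip / Remux / Transcode).
#[derive(Debug, PartialEq)]
pub enum Decision {
    Skip(&'static str),
    Remux,
    Transcode,
}

/// Minimal probe result; field names are illustrative.
pub struct VideoInfo {
    pub codec: String,
    pub container: String,
    pub width: u32,
    pub height: u32,
    pub fps: f64,
    pub bitrate_bps: u64,
}

/// Tunables that are currently "magic numbers"; 0.05 is an assumed default.
pub struct PlannerThresholds {
    pub min_bpp: f64,
}

impl Default for PlannerThresholds {
    fn default() -> Self {
        Self { min_bpp: 0.05 }
    }
}

/// Bits per pixel = bitrate / (width * height * fps); None without usable geometry.
pub fn bits_per_pixel(v: &VideoInfo) -> Option<f64> {
    let pixels_per_second = f64::from(v.width) * f64::from(v.height) * v.fps;
    if pixels_per_second > 0.0 {
        Some(v.bitrate_bps as f64 / pixels_per_second)
    } else {
        None
    }
}

pub fn plan(v: &VideoInfo, target_codec: &str, target_container: &str, t: &PlannerThresholds) -> Decision {
    if v.codec == target_codec {
        // Already the right codec: at most the container needs changing.
        return if v.container == target_container {
            Decision::Skip("already in target codec and container")
        } else {
            Decision::Remux
        };
    }
    match bits_per_pixel(v) {
        // A low-bitrate source has little left to save and would lose quality.
        Some(bpp) if bpp < t.min_bpp => Decision::Skip("bits per pixel below threshold"),
        Some(_) => Decision::Transcode,
        None => Decision::Skip("missing resolution or frame rate"),
    }
}
```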
---

## 2. Backend & API Design (Rust/Axum)
* **Concurrency:** Uses `tokio` for async orchestration and `rayon` for CPU-intensive tasks (like file hashing or list processing). The scheduler supports three concurrency modes: `Background` (1 job), `Balanced` (capped), and `Throughput` (uncapped).
* **State Management:** The backend uses `broadcast` channels to keep high-volume events (progress, logs) separate from low-volume system events (config updates), so frequent progress traffic does not cause UI "flicker" or unnecessary re-renders in the frontend.
* **API Structure:**
    * **RESTful endpoints** for jobs, settings, and stats.
    * **SSE (`src/server/sse.rs`)** for real-time progress updates, keeping the UI reactive without high-frequency polling.
* **Auth (`src/server/auth.rs`):** Implements JWT-based authentication, with Argon2 password hashing for the account created during initial setup.
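A sketch of how the three concurrency modes could translate into a job limit. The audit only says `Balanced` is capped; the half-the-CPUs rule and the `BALANCED_MAX_JOBS` value of 4 are assumptions for illustration, and `None` stands for "no cap" in `Throughput` mode. None of these names come from the Alchemist scheduler.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConcurrencyMode {
    Background,
    Balanced,
    Throughput,
}

/// Hypothetical upper bound for Balanced mode; the real cap is not stated in the audit.
pub const BALANCED_MAX_JOBS: usize = 4;

/// Maximum concurrent jobs for a mode, or None when the mode imposes no cap.
pub fn job_limit(mode: ConcurrencyMode, cpus: usize) -> Option<usize> {
    match mode {
        ConcurrencyMode::Background => Some(1),
        // Assumed rule: half the CPUs, at least 1, never above the fixed cap.
        ConcurrencyMode::Balanced => Some((cpus / 2).clamp(1, BALANCED_MAX_JOBS)),
        ConcurrencyMode::Throughput => None,
    }
}

/// Admission check the scheduler could run before starting another job.
pub fn can_start_job(mode: ConcurrencyMode, cpus: usize, running: usize) -> bool {
    job_limit(mode, cpus).map_or(true, |limit| running < limit)
}
```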
---

## 3. Database Schema (SQLite/SQLx)
* **Stability:** The project has 16+ migrations, reflecting a steady evolution from a simple schema to a sophisticated job-tracking system.
* **Decision Logging:** The `decisions` and `job_failure_explanations` tables are a standout feature. They store the "why" behind every action as structured JSON, which the UI then turns into human-readable text (e.g., explaining exactly why a file was skipped).
* **Data Integrity:** Foreign-key constraints enforce referential integrity, and WAL (Write-Ahead Logging) mode lets readers proceed while a write is in progress, which helps the database hold up under heavy concurrent I/O.
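A minimal sketch of the "structured reason, humanized later" pattern described under Decision Logging. The variants, codes, and wording are hypothetical and do not reflect the actual schema of the `decisions` table; in Alchemist the humanizing happens in the UI, and Rust is used here only to keep the example self-contained. The stable `code()` is what would go into the stored JSON, so display text can change without a migration.

```rust
/// Hypothetical structured skip reason, as it might be serialized into a decision record.
#[derive(Debug, Clone, PartialEq)]
pub enum SkipReason {
    AlreadyTargetCodec { codec: String },
    BitrateBelowThreshold { bpp: f64, threshold: f64 },
}

impl SkipReason {
    /// Stable, machine-readable key for the stored JSON; not shown to users directly.
    pub fn code(&self) -> &'static str {
        match self {
            SkipReason::AlreadyTargetCodec { .. } => "already_target_codec",
            SkipReason::BitrateBelowThreshold { .. } => "bitrate_below_threshold",
        }
    }

    /// Display text derived from the structured fields at render time.
    pub fn humanize(&self) -> String {
        match self {
            SkipReason::AlreadyTargetCodec { codec } => {
                format!("Skipped: the video is already encoded as {codec}.")
            }
            SkipReason::BitrateBelowThreshold { bpp, threshold } => format!(
                "Skipped: {bpp:.3} bits per pixel is below the {threshold:.3} threshold, so re-encoding would save little."
            ),
        }
    }
}
```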
---

## 4. Frontend Design (Astro/React/Helios)
* **Stack:** Astro 5 provides a fast, static-first framework, and React 18 handles the complex stateful dashboards.
* **Design System ("Helios"):**
    * *Identity:* A dark-themed, data-dense industrial aesthetic.
    * *Findings:* While functional, the system suffers from "component bloat." `JobManager.tsx` (~2,000 lines) is a significant maintainability risk because it mixes UI, filtering, and data-transformation logic in one component.
* **Data Visualization:** Uses `recharts` for historical trends and performance metrics.
    * *Improvement:* The charts are currently static snapshots. Adding interactivity such as brushing and zooming would make large datasets easier to explore.
---

## 5. System & Hardware Integration
* **Hardware Discovery:** `src/system/hardware.rs` is extensive, detecting NVIDIA, Intel, AMD, and Apple Silicon capabilities. It correctly maps these to `ffmpeg` encoder flags.
* **FS Browser:** A custom filesystem browser (`src/system/fs_browser.rs`) allows for secure directory selection during setup, preventing path injection and ensuring platform-agnostic path handling.
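One common way to get path-injection protection like the FS Browser's is to resolve a client-supplied path component by component under an allowed root and reject anything that could climb out of it. This is a hypothetical sketch (`resolve_under_root` is not a function from `src/system/fs_browser.rs`). It is purely lexical; a production check would also canonicalize the result (for example with `std::fs::canonicalize`) to catch symlinks that point outside the root.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve a client-supplied relative path under `root`, returning None if any
/// component could escape it. Lexical only: symlinks are not followed.
pub fn resolve_under_root(root: &Path, requested: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(requested).components() {
        match component {
            // Ordinary names are appended one at a time.
            Component::Normal(part) => resolved.push(part),
            // A leading "." changes nothing.
            Component::CurDir => {}
            // "..", a leading "/", or a Windows drive prefix could leave `root`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}
```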
---

## 6. Critical Areas for Improvement

### **Maintainability (High Priority)**
* **Decouple `JobManager.tsx`:** Refactor into custom hooks (`useJobs`, `useFilters`) and smaller, presentation-only components.
* **Standardize Formatters:** Move `formatBytes`, `formatTime`, and `formatReduction` into a centralized `lib/formatters.ts` to reduce code duplication across the Dashboard and Stats pages.

### **UX & Performance (Medium Priority)**
* **Polling vs. SSE:** Ensure all real-time metrics (like GPU temperature) are delivered via SSE rather than periodic polling to reduce backend load and improve UI responsiveness.
* **Interactive Decision Explanations:** The current skip reasons are helpful but static. Adding links to the relevant settings (e.g., "Change this threshold in Transcoding Settings") would close the loop for users.
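For reference, an SSE message is plain text: an optional `event:` field, one `data:` line per line of payload, and a terminating blank line that tells the browser's `EventSource` to dispatch it. The sketch below formats such a frame for a GPU metrics event; the event name `resource_metrics` and the JSON field names are assumptions, not Alchemist's wire format. The payload contains only numbers, so no JSON string escaping is needed.

```rust
/// Format one Server-Sent Events message. `event` must not contain newlines;
/// each line of `data` becomes its own `data:` field.
pub fn sse_frame(event: &str, data: &str) -> String {
    let mut frame = format!("event: {event}\n");
    for line in data.split('\n') {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    // The blank line ends the message so the client dispatches it.
    frame.push('\n');
    frame
}

/// Hypothetical resource-metrics event; event and field names are assumptions.
pub fn resource_metrics_frame(gpu_temp_c: f32, gpu_util_pct: u8) -> String {
    let payload = format!("{{\"gpu_temp_c\":{gpu_temp_c:.1},\"gpu_util_pct\":{gpu_util_pct}}}");
    sse_frame("resource_metrics", &payload)
}
```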
### **Reliability (Low Priority)**

* **E2E Testing:** Playwright tests exist, but they focus on "reliability" scenarios. Expanding them to cover edge cases, such as a network-attached storage disconnect during a scan, would improve long-term stability.
---

## 7. Stitch Recommendation

Use Stitch to generate **atomic component refinements** based on this audit.

* *Prompt Example:* "Refine the JobTable row to use iconic status indicators with tooltips for skip reasons, as outlined in the Alchemist Audit."
* *Prompt Example:* "Create a unified `Formatter` utility library in TypeScript that handles bytes, time, and percentage formatting for the Helios design system."