TURFI

Documentation Search

Search only within Turfi documentation pages.

Video Inventory & Media Lifecycle

Governed physical video inventory, storage paths, lifecycle states, database invariants, duplicate handling, orphan detection, upload API contracts, and operational jobs.

IMPLEMENTED

This document is internal platform documentation for engineers. It defines the governed physical video inventory (public.videos), its relationship to Supabase Storage, and the non-negotiable rules enforced in the database and at the application API boundary.

If implementation work contradicts this document, the implementation is wrong until the document is formally revised.

Shared status legend: [docs/_shared/status-legend.md](./_shared/status-legend.md)

A. System overview

Why this system exists

Turfi stores large binary video files in object storage and stores authoritative metadata and lineage in PostgreSQL. Without a strict inventory layer, the platform experiences:

Duplicate files: the same bytes uploaded multiple times under different paths, with no canonical identity.
Partial uploads: database rows claiming a file exists when the object is missing, truncated, or never finalized.
Missing relationships: derived outputs (moments, processed renditions) detached from the source they were generated from.
Storage versus database drift: objects present in the bucket with no row, or rows pointing at paths that do not exist.

The video inventory system exists to eliminate drift by making the database the system of record for inventory metadata (which rows exist, paths, lifecycle)—distinct from the platform-wide consensus model for match facts in Platform Philosophy and Scope. For physical files, the inventory layer defines:

what files are allowed to exist,
where they live,
what lifecycle state they are in,
how they relate to each other.

Source of truth (inventory layer)

The database (public.videos and related registry rows) is the source of truth for inventory identity, lifecycle, lineage, and the canonical storage_path string.
Object storage is a dumb byte layer. Storage holds opaque files at paths that the application writes only through governed flows. Storage never “decides” lifecycle, duplicates, or relationships.

Problems this system solves (assertively)

Duplicate files: duplicate candidates are detected using metadata and checksum (SHA-256 hex) as the canonical duplicate key for completed and ready inventory. Rows may reference a prior row via duplicate_of_video_id when the operator accepts duplicate linkage.
Partial uploads: source rows begin in uploading. They must not transition to completed without a valid checksum and finalized metadata. Session finalization abandons competing rows when one source completes (see invariants).
Missing relationships: derived rows must set source_video_id to a valid source row. Database triggers enforce group consistency with the parent source.
Drift: orphan detection records storage paths that exist without a matching videos.storage_path. Purge operations are explicit and never silent.

B. Core concepts

1. Source video

A source row represents the original upload for a governed path family under /videos/{source_video_id}/….
The video_type lookup value must be source.
source_video_id must be NULL on a source row.
For a given upload_session_id, at most one source row must reach completed (enforced by a partial unique index on upload session for completed sources).
The source row’s id must match the folder UUID in storage_path for governed layout (see storage section).

2. Derived videos

Derived rows include moments and processed renditions (and any future derived kinds modeled under the same video_type contract).
source_video_id must reference the source row that anchors the /videos/{source_video_id}/ prefix.
video_group_id must equal the parent source’s video_group_id (enforced by triggers after denormalization).
Derived rows must not claim is_primary = true.

3. Video group

video_group_id ties all related inventory rows (source, duplicates in the same logical group, derived outputs) to one group root.
Exactly one row in a group must have is_primary = true (partial unique index on video_group_id where is_primary).
The primary row must be a source row (video_type = source) and duplicate_of_video_id must be NULL (enforced in trg_videos_99_strict_validate).

4. Upload session

upload_session_id groups all rows created during one logical upload attempt chain (for example: duplicate candidates created alongside the primary attempt).
Only one valid completed source must exist per session: the partial unique index videos_one_completed_source_per_upload_session enforces this at the database level.
When a source transitions to completed, the trg_videos_after_complete_finalize_session trigger must set all other rows sharing the same upload_session_id to abandoned (or equivalent terminal non-success states per migration) and must clear is_primary on those abandoned rows. This prevents ghost duplicate sources inside the same session.

5. Video lifecycle states (`video_status` lookup)

These keys are defined in lookup_values under lookup type video_status. Application code must treat unknown statuses as invalid.

Status	Meaning
`uploading`	Row exists; bytes are expected or in flight. Governed source uploads must be `uploading` before `completed`.
`completed`	Source upload is finalized in inventory terms: file is committed for that row’s `storage_path`, with required checksum for this status.
`processing`	Derived (or pipeline) work is active. This status is not terminal. Workers must transition rows out of `processing` to `ready` or `processing_failed`.
`ready`	Derived output is usable. Checksum is required for `ready` (same rule as `completed`).
`failed`	Upload or validation failed for a row (terminal failure distinct from processing failure semantics; exact usage is workflow-specific but must remain terminal unless a new row supersedes it).
`abandoned`	Superseded by session finalization or operator cleanup; must not be treated as active inventory.
`processing_failed`	Processing job failed or timed out. `processing_completed_at` must be set when entering this state. Rows must not remain stuck in `processing` indefinitely.

Rule: completed and ready always require a non-empty checksum (database trigger enforcement).

C. Database model (`public.videos`)

The authoritative schema evolves through additive SQL migrations. This section documents the intent of key fields; engineers must read the latest migrations for exact constraints.

Identity and classification

id (uuid): Primary key. For source rows, this UUID must appear as the directory name in governed storage_path.
video_type_id: FK to lookup_values for lookup type video_type. Must be source, moment, processed, or future allowed keys only as defined by migrations.
status_id: FK to lookup_values for lookup type video_status. Must always reflect the true lifecycle state.

Lineage and grouping

source_video_id: NULL for sources; non-NULL for derived rows, pointing at the source row.
video_group_id: Logical group. Must match the group of the parent source and duplicate targets (enforced by triggers).
is_primary: At most one true per video_group_id. Must be true only for the canonical source in the group, never for duplicates linked via duplicate_of_video_id.
duplicate_of_video_id: Optional link to an earlier inventory row when this row is a duplicate of existing media. Must be NULL when is_primary = true.

Upload session and replacement

upload_session_id: Correlates rows from one upload attempt. Must obey one completed source per session (index).
replaced_by_video_id: Optional chain when a derived output is superseded by a newer inventory row. Use only as governed by product workflows; never as a substitute for source_video_id lineage rules.

Physical file facts

storage_path: Canonical storage key for the object (for example videos/{uuid}/source.mp4). Must be unique across all rows (unique index). Must match governed patterns when the strict check constraint is active in the environment.
checksum: SHA-256 digest, lowercase hex, 64 characters. Required whenever status is completed or ready. This is the canonical duplicate-detection key for finalized files.

Processing telemetry

processing_job_id: Optional reference to a processing job record (when modeled). Must be treated as diagnostic; lifecycle must still be reflected in status_id and timestamps.
processing_started_at, processing_completed_at: Must be set by the workflows that own processing transitions. processing_completed_at must be set when entering ready, processing_failed, or any terminal processing outcome that stops work.

Versioning

derivation_generation: Integer version for regeneration of derived outputs within a group. Must increment only according to defined workflows.

Optional future-facing column

camera_angle_id: Optional FK into lookup_values for multi-angle capture. Not enforced by business rules yet; exists so schema and admin surfaces can grow without another breaking migration.

Supporting tables and views

video_storage_orphans: Registry of bucket + storage_path pairs found in storage scans without a matching videos.storage_path. Must flow through detected → confirmed → purged operations; never delete storage based only on a scan without the governed purge path.
v_videos_source_inventory: Read model of valid source inventory for admin listing: must include only sources whose status is in the completed/ready family and whose checksum is present (definition in migration).
v_video_processing_queue: Operational view for monitoring sources in uploading, processing, or processing_failed, including derived moment counts and timing fields as defined in migration.

D. Invariants (critical)

These rules always hold in a correct deployment. They are enforced by database constraints and triggers and by API guards where applicable. Violations are bugs, not edge cases.

Every governed storage file maps to exactly one videos row for paths under the governed videos/ layout used by inventory. Never write governed paths without a matching row and storage_path equality.
Every videos row maps to exactly one storage object for that row’s storage_path in normal operation. Drift is handled only via orphan detection and explicit purge.
Only one primary video per group: is_primary = true appears at most once per video_group_id.
Primary video must be a source: is_primary = true implies video_type = source and duplicate_of_video_id IS NULL (trigger).
Derived videos must reference a source: non-source types must have source_video_id set to a source row (enforced by trg_videos_enforce_type_and_parent).
Only one completed source per upload session (partial unique index).
No row remains in processing forever: workers must call failure or success transitions; use processing_failed with processing_completed_at on failure or timeout.
completed and ready require checksum: enforced in trg_videos_99_strict_validate.
Storage paths must follow the governed pattern when the strict regex check constraint is enabled in the environment (skipped only during legacy backfill; must be enabled after cleanup).
Group consistency: if source_video_id is set, video_group_id must match the parent source’s video_group_id. If duplicate_of_video_id is set, video_group_id must match the duplicate target’s video_group_id (trigger).
Session finalization: when a source reaches completed, all other rows with the same upload_session_id must be terminalized as abandoned (per migration) and must not remain competing sources.

E. Storage structure

Governed layout (inventory videos)

These paths are the only governed shapes for rows whose storage_path lives under the videos/ prefix:

/videos/{source_video_id}/source.mp4
/videos/{source_video_id}/moments/{video_id}.mp4
/videos/{source_video_id}/processed/{video_id}.mp4

Notes:

{source_video_id} must equal the source row UUID for that folder.
{video_id} in moments and processed must equal the derived row’s id.
The storage_path stored in videos does not include a leading slash in application usage in all call sites; normalize comparisons accordingly. The logical layout remains as above.

Why this layout exists

Stable routing: all outputs for a match or capture bundle live under one UUID prefix.
Operational clarity: administrators, jobs, and support tooling list by source_video_id.
Authorization: API guards parse paths and must reject writes that do not match a row’s id, storage_path, and allowed status.

What breaks when violated

API upload authorization fails: assertAuthorizedVideoBucketUpload rejects the operation.
Database inserts/updates fail: check constraints (when enabled) and triggers reject inconsistent storage_path values.
Monitoring views lie: paths that do not match conventions will not align with admin tooling expectations.

Highlights and non-inventory paths

Paths under highlights/ follow separate product rules. The upload API allows highlights/ writes on the video bucket without a videos row check. Do not treat highlights/ objects as governed videos inventory unless explicitly linked by product logic.

F. Upload flow (source)

This is the required sequence for governed source uploads through the application:

Create the videos row with video_type = source, status = uploading, storage_path = videos/{new_uuid}/source.mp4, upload_session_id set, is_primary set according to duplicate policy, and duplicate_of_video_id set only when explicitly creating a duplicate candidate row.
Request a signed upload URL via POST /api/media/upload-url with bucket and path. The route must pass assertAuthorizedVideoBucketUpload: the row must exist, storage_path must match exactly, status must be uploading for source paths.
Upload bytes using the signed resumable flow (client uses tus with the returned token). Alternative: POST /api/media/upload with the same bucket and path query parameters; the same authorization gate applies.
Compute SHA-256 of the local file bytes on the client before or after upload (client must compute the same hash the server will persist).
Mark completed only through completeSourceVideoUpload, passing checksum as 64-char lowercase hex. The server rejects invalid checksum format. The database rejects missing checksum on completed.
Session finalization runs in the database: competing rows in the same upload_session_id are abandoned automatically when the source hits completed.

Rejection conditions (non-exhaustive)

Missing or mismatched storage_path for the UUID in the path.
status not uploading for source upload.
Invalid checksum format or missing checksum at completed.
Attempted write to videos/ paths without passing server API authorization (direct client writes that bypass the API must be blocked by storage policies in production; the application must route through the API for governed inventory).

G. Processing flow

When processing starts

Derived rows (for example moments) are created with status = processing (or uploading first if the workflow stages bytes separately—must still end in ready or processing_failed with checksum rules satisfied for ready).
processing_started_at must be set when entering processing if the workflow uses that column.

Worker responsibilities

Workers must poll or subscribe to job sources as designed, update processing_job_id if used, and must transition status:
to ready with checksum set (same SHA-256 requirement) and processing_completed_at, or
to processing_failed with processing_completed_at on failure or timeout.
Workers must call markVideoProcessingFailed (authenticated) or markVideoProcessingFailedServiceRole (no user session) instead of leaving rows in processing.

Moment creation

createMomentDerivedVideo inserts a derived row with source_video_id, storage_path, and processing state as implemented. Must preserve video_group_id consistency (trigger-maintained).

Terminal outcomes

processing → ready: requires checksum in the same update path as ready.
processing → processing_failed: must set processing_completed_at.

H. Duplicate handling

Session behavior

Multiple rows may exist in one upload_session_id during duplicate candidate creation. Only one source must reach completed per session (index + finalization trigger).

Detection logic

Checksum on completed and ready rows is the canonical duplicate key for identical bytes.
Additional heuristics (filename, size) may exist for candidate surfacing; must not replace checksum as the authoritative identity test for finalized media.

`duplicate_of_video_id`

When populated, indicates this row is a duplicate of an earlier inventory record. Must respect group rules: is_primary must be false for duplicates as enforced by validation triggers.

I. Orphan management

What an orphan is

An orphan is an object in storage under the scanned prefix (for example videos/) whose path is not exactly equal to any videos.storage_path value.

Detection job

Service role scanning lists storage recursively and compares to the database set.
GET /api/cron/video-storage-orphans with Authorization: Bearer ${CRON_SECRET} persists orphans into video_storage_orphans with status = detected using persistDetectedVideoStorageOrphansServiceRole.

Confirm / purge workflow

Operators review video_storage_orphans, confirm rows intended for deletion, then purge storage through the governed API that requires confirmed state before removal.
Safety rule: Never delete storage objects based solely on a local guess; must use the confirm/purge pipeline so audit fields are populated.

J. API contracts

Upload endpoints (Next.js routes)

POST /api/media/upload-url: JSON body { bucket, path }. Must be authenticated (session cookie). For the configured video bucket, videos/ paths must pass assertAuthorizedVideoBucketUpload. highlights/ paths must pass without a videos row check.
POST /api/media/upload: query params bucket, path; body is raw bytes. Same authorization as upload-url.

Status and inventory functions (server modules)

startSourceVideoUpload: creates uploading source row and governed storage_path.
completeSourceVideoUpload: requires checksum (SHA-256 hex); sets completed.
failSourceVideoUpload: fails upload attempt.
setVideoStatusReady: sets ready, requires checksum in the update.
markVideoProcessingFailed / markVideoProcessingFailedServiceRole: sets processing_failed and processing_completed_at.
createMomentDerivedVideo: creates derived inventory rows tied to a completed or ready source.

Failure handling

API routes must return explicit HTTP errors when authorization fails (400/403/500 as implemented). Never swallow authorization failures.

Checksum requirements

Client and server must use 64-character lowercase hex SHA-256.
Invalid format is rejected before database update in assertValidInventoryChecksumHex.

K. Operational jobs

Cleanup job

GET /api/cron/video-inventory-cleanup: Authorization: Bearer ${CRON_SECRET} when CRON_SECRET is set. Invokes fn_video_cleanup_stale_inventory via runVideoInventoryCleanupRpc. Must run on a schedule in each environment that requires stale inventory hygiene.

Orphan detection job

GET /api/cron/video-storage-orphans: same CRON_SECRET scheme. Persists orphan paths to video_storage_orphans.

Scheduling expectations

Cleanup: hourly or daily depending on volume; must run in production.
Orphans: daily or weekly; must run in production so drift is detected without manual action.

L. Common failure scenarios

Partial upload

Row remains uploading until failSourceVideoUpload, operator cleanup RPC, or timeout policies delete stale rows. Must not mark completed without a full file and checksum.

Duplicate upload

Session creates multiple candidates; one completes; others become abandoned via finalization. Must rely on checksum for post-hoc duplicate identification.

Failed processing

Transition to processing_failed with processing_completed_at. Never leave processing silent.

Orphan files

Appear in video_storage_orphans. Must be confirmed, then purged through the governed purge path.

M. Future extensions

These items are planned or partially modeled; must not bypass current invariants when implemented:

Multi-angle video: camera_angle_id exists for main / sideline / tactical / goalie lookups; enforcement is intentionally future-facing.
AI detection pipelines: must create inventory rows before writing outputs; must attach lineage to source_video_id.
Analytics: must read from videos and views, never infer ownership from storage alone.
Monetization: must treat governed videos rows as the billing/entitlement anchor for physical media, not raw URLs alone.

N. Video diagnostics (admin operational layer)

Why main inventory stays narrow

The main source inventory (v_videos_source_inventory) must list only source rows with completed or ready, a non-empty checksum, and the view definition in migrations. Derived rows, abandoned rows, uploading rows, and duplicate-linked rows must not appear there.
Storage can still hold additional objects under the same governed prefix: partial uploads, superseded attempts, orphan bytes, or derived outputs not yet linked the way operators expect.

Product principle

Governed inventory remains intentionally clean for operators and downstream features.
Video diagnostics (/admin/media-operations/video-diagnostics) is the admin-only bridge between:
inventory truth in PostgreSQL, and
raw storage reality in the bucket.

What diagnostics exposes

Search across videos by id, source_video_id, upload_session_id, storage_path, original_filename, status, type, and filters for processing failure, abandonment, duplicate linkage, and orphan presence under the governed folder.
Detail for one row:
Canonical summary and a plain-language explanation of main inventory visibility (same rules as explainMainInventoryVisibility / v_videos_source_inventory).
Related DB rows collected by video_group_id, upload_session_id, source_video_id, duplicate_of_video_id, and replaced_by_video_id (strongest key: group).
Storage artifacts under videos/{source_folder_id}/ via service-role recursive listing, tagged as mapped, related, orphan, suspected_duplicate, or unknown_artifact by comparing videos.storage_path, video_storage_orphans, and heuristics.
Orphan registry rows matching the folder prefix.
Duplicate checksum groups among related rows.

Admin actions (server-guarded)

Diagnostics actions must respect database triggers and invariants. They must not bypass assertAuthorizedVideoBucketUpload for normal uploads; they exist to repair or classify inventory after human review.

Mark as duplicate of a canonical source (duplicate_of_video_id, is_primary = false, group inherited from target).
Set group primary on a source row with duplicate_of_video_id null (clears other primaries in the group, then sets target).
Reclassify video_type to source / moment / processed with valid source_video_id for derived types (database rejects invalid parentage).
Mark terminal status: failed, abandoned, processing_failed (with processing_completed_at where applicable).
Attach orphan path: insert a new videos row for a storage_path, then remove the matching video_storage_orphans row (checksum rules apply for completed / ready).
Confirm and purge orphans must use the existing confirm → purge pipeline (no raw deletes without confirmation).

Why storage can still contain multiple files

Upload sessions can create multiple rows before finalization; only one completed source must survive per session (index + finalization trigger).
Derived and orphan objects must be visible in diagnostics even when they must not pollute the main inventory grid.

Video Instances Manager (global `videos` grid)

Route: /admin/media-operations/video-instances
Purpose: show every row in public.videos with filters (type, status, primary/duplicate flags, group, session, checksum, orphan suspects, processing_failed, abandoned) and row actions that defer deep repairs to Video diagnostics where appropriate.
Inventory vs instances: the Videos admin surface (game_videos) stays clean and product-facing. v_videos_source_inventory lists only valid source rows with finalized checksums. Instances exists because operators still need a complete view of duplicate candidates, derived outputs, abandoned session rows, and failures — without merging that noise into the default registry.
Instances vs diagnostics: Instances answers “what exists in the database across all rows?”. Diagnostics answers “what is true for this videos.id, storage folder, and related rows?” and runs safe, reviewed repairs. Use Open diagnostics from Instances (or the processing console) for per-video control.

Video processing console

Route: /admin/media-operations/video-processing
Purpose: operational dashboard backed by v_video_processing_queue: videos in uploading / processing, processing_failed rows, and a recently completed list of source videos in ready / completed. Refreshes on a short poll. Row actions include retry processing, mark failed, and open diagnostics on the source_video_id.

Cross-links

Videos toolbar links to Video instances and Processing console. When game_videos.inventory_video_id is set, the row menu includes Open video diagnostics to the linked inventory row.
Instances and Processing link rows to /admin/media-operations/video-diagnostics/{videos.id} so diagnostics remains the shared deep inspection layer.

Turfi Platform Documentation