Turfi Platform Documentation
Official Turfi documentation portal for users, admins, and developers.
Video Inventory & Media Lifecycle
Governed physical video inventory, storage paths, lifecycle states, database invariants, duplicate handling, orphan detection, upload API contracts, and operational jobs.
IMPLEMENTED
This document is internal platform documentation for engineers. It defines the governed physical video inventory (public.videos), its relationship to Supabase Storage, and the non-negotiable rules enforced in the database and at the application API boundary.
If implementation work contradicts this document, the implementation is wrong until the document is formally revised.
Shared status legend: [docs/_shared/status-legend.md](./_shared/status-legend.md)
A. System overview
Why this system exists
Turfi stores large binary video files in object storage and stores authoritative metadata and lineage in PostgreSQL. Without a strict inventory layer, the platform experiences:
- Duplicate files: the same bytes uploaded multiple times under different paths, with no canonical identity.
- Partial uploads: database rows claiming a file exists when the object is missing, truncated, or never finalized.
- Missing relationships: derived outputs (moments, processed renditions) detached from the source they were generated from.
- Storage versus database drift: objects present in the bucket with no row, or rows pointing at paths that do not exist.
The video inventory system exists to eliminate drift by making the database the system of record for inventory metadata (which rows exist, paths, lifecycle). This is distinct from the platform-wide consensus model for match facts described in Platform Philosophy and Scope. For physical files, the inventory layer defines:
- what files are allowed to exist,
- where they live,
- what lifecycle state they are in,
- how they relate to each other.
Source of truth (inventory layer)
- The database (public.videos and related registry rows) is the source of truth for inventory identity, lifecycle, lineage, and the canonical storage_path string.
- Object storage is a dumb byte layer. Storage holds opaque files at paths that the application writes only through governed flows. Storage never “decides” lifecycle, duplicates, or relationships.
Problems this system solves (assertively)
- Duplicate files: duplicate candidates are detected using metadata, with the checksum (SHA-256 hex) as the canonical duplicate key for completed and ready inventory. Rows may reference a prior row via duplicate_of_video_id when the operator accepts duplicate linkage.
- Partial uploads: source rows begin in uploading. They must not transition to completed without a valid checksum and finalized metadata. Session finalization abandons competing rows when one source completes (see invariants).
- Missing relationships: derived rows must set source_video_id to a valid source row. Database triggers enforce group consistency with the parent source.
- Drift: orphan detection records storage paths that exist without a matching videos.storage_path. Purge operations are explicit and never silent.
B. Core concepts
1. Source video
- A source row represents the original upload for a governed path family under /videos/{source_video_id}/….
- The video_type lookup value must be source.
- source_video_id must be NULL on a source row.
- For a given upload_session_id, at most one source row may reach completed (enforced by a partial unique index on upload session for completed sources).
- The source row’s id must match the folder UUID in storage_path for governed layout (see storage section).
2. Derived videos
- Derived rows include moments and processed renditions (and any future derived kinds modeled under the same video_type contract).
- source_video_id must reference the source row that anchors the /videos/{source_video_id}/ prefix.
- video_group_id must equal the parent source’s video_group_id (enforced by triggers after denormalization).
- Derived rows must not claim is_primary = true.
3. Video group
- video_group_id ties all related inventory rows (source, duplicates in the same logical group, derived outputs) to one group root.
- Exactly one row in a group must have is_primary = true (partial unique index on video_group_id where is_primary).
- The primary row must be a source row (video_type = source) and duplicate_of_video_id must be NULL (enforced in trg_videos_99_strict_validate).
4. Upload session
- upload_session_id groups all rows created during one logical upload attempt chain (for example: duplicate candidates created alongside the primary attempt).
- Only one valid completed source must exist per session: the partial unique index videos_one_completed_source_per_upload_session enforces this at the database level.
- When a source transitions to completed, the trg_videos_after_complete_finalize_session trigger must set all other rows sharing the same upload_session_id to abandoned (or equivalent terminal non-success states per migration) and must clear is_primary on those abandoned rows. This prevents ghost duplicate sources inside the same session.
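The session finalization rule can be illustrated with a small in-memory sketch. This is not the trigger itself: the SessionRow shape and the finalizeUploadSession function are hypothetical names introduced here, and the sketch always uses abandoned as the terminal state, whereas the migration may define equivalent alternatives.

```typescript
// Hypothetical in-memory row; the real rows live in public.videos.
interface SessionRow {
  id: string;
  uploadSessionId: string;
  status: string;
  isPrimary: boolean;
}

// Returns a new array in which `completedId` is completed and every other row
// in the same upload session is abandoned with is_primary cleared.
function finalizeUploadSession(rows: SessionRow[], completedId: string): SessionRow[] {
  const winner = rows.filter((r) => r.id === completedId)[0];
  if (!winner) throw new Error("unknown row " + completedId);
  return rows.map((r) => {
    if (r.id === completedId) return { ...r, status: "completed" };
    // Rows from other sessions are left untouched.
    if (r.uploadSessionId !== winner.uploadSessionId) return r;
    return { ...r, status: "abandoned", isPrimary: false };
  });
}
```

Rows that belong to other upload sessions pass through unchanged, which mirrors the per-session scope of the rule.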
5. Video lifecycle states (video_status lookup)
These keys are defined in lookup_values under lookup type video_status. Application code must treat unknown statuses as invalid.
| Status | Meaning |
|---|---|
| uploading | Row exists; bytes are expected or in flight. Governed source uploads must be uploading before completed. |
| completed | Source upload is finalized in inventory terms: file is committed for that row’s storage_path, with required checksum for this status. |
| processing | Derived (or pipeline) work is active. This status is not terminal. Workers must transition rows out of processing to ready or processing_failed. |
| ready | Derived output is usable. Checksum is required for ready (same rule as completed). |
| failed | Upload or validation failed for a row (terminal failure distinct from processing failure semantics; exact usage is workflow-specific but must remain terminal unless a new row supersedes it). |
| abandoned | Superseded by session finalization or operator cleanup; must not be treated as active inventory. |
| processing_failed | Processing job failed or timed out. processing_completed_at must be set when entering this state. Rows must not remain stuck in processing indefinitely. |
Rule: completed and ready always require a non-empty checksum (database trigger enforcement).
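Application code can mirror the status table with a closed set of keys so that unknown statuses are rejected early. The sketch below is illustrative only: parseVideoStatus and statusRequiresChecksum are hypothetical helper names, and the real keys are read from lookup_values rather than hard-coded.

```typescript
// Status keys copied from the table above; in the real system they live in lookup_values.
const VIDEO_STATUSES = [
  "uploading",
  "completed",
  "processing",
  "ready",
  "failed",
  "abandoned",
  "processing_failed",
] as const;

type VideoStatus = (typeof VIDEO_STATUSES)[number];

// Hypothetical helper: narrows an arbitrary string to a known status, or returns null.
function parseVideoStatus(value: string): VideoStatus | null {
  for (let i = 0; i < VIDEO_STATUSES.length; i++) {
    if (VIDEO_STATUSES[i] === value) return VIDEO_STATUSES[i];
  }
  return null;
}

// completed and ready are the only statuses that require a non-empty checksum.
function statusRequiresChecksum(status: VideoStatus): boolean {
  return status === "completed" || status === "ready";
}
```

A caller would treat a null result from parseVideoStatus as an invalid row rather than falling back to a default status.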
C. Database model (public.videos)
The authoritative schema evolves through additive SQL migrations. This section documents the intent of key fields; engineers must read the latest migrations for exact constraints.
Identity and classification
- id (uuid): Primary key. For source rows, this UUID must appear as the directory name in governed storage_path.
- video_type_id: FK to lookup_values for lookup type video_type. Must be source, moment, processed, or future allowed keys only as defined by migrations.
- status_id: FK to lookup_values for lookup type video_status. Must always reflect the true lifecycle state.
Lineage and grouping
- source_video_id: NULL for sources; non-NULL for derived rows, pointing at the source row.
- video_group_id: Logical group. Must match the group of the parent source and duplicate targets (enforced by triggers).
- is_primary: At most one true per video_group_id. Must be true only for the canonical source in the group, never for duplicates linked via duplicate_of_video_id.
- duplicate_of_video_id: Optional link to an earlier inventory row when this row is a duplicate of existing media. Must be NULL when is_primary = true.
Upload session and replacement
- upload_session_id: Correlates rows from one upload attempt. Must obey one completed source per session (index).
- replaced_by_video_id: Optional chain when a derived output is superseded by a newer inventory row. Use only as governed by product workflows; never as a substitute for source_video_id lineage rules.
Physical file facts
- storage_path: Canonical storage key for the object (for example videos/{uuid}/source.mp4). Must be unique across all rows (unique index). Must match governed patterns when the strict check constraint is active in the environment.
- checksum: SHA-256 digest, lowercase hex, 64 characters. Required whenever status is completed or ready. This is the canonical duplicate-detection key for finalized files.
Processing telemetry
- processing_job_id: Optional reference to a processing job record (when modeled). Must be treated as diagnostic; lifecycle must still be reflected in status_id and timestamps.
- processing_started_at, processing_completed_at: Must be set by the workflows that own processing transitions. processing_completed_at must be set when entering ready, processing_failed, or any terminal processing outcome that stops work.
Versioning
- derivation_generation: Integer version for regeneration of derived outputs within a group. Must increment only according to defined workflows.
Optional future-facing column
- camera_angle_id: Optional FK into lookup_values for multi-angle capture. Not enforced by business rules yet; exists so schema and admin surfaces can grow without another breaking migration.
Supporting tables and views
- video_storage_orphans: Registry of bucket + storage_path pairs found in storage scans without a matching videos.storage_path. Must flow through detected → confirmed → purged operations; never delete storage based only on a scan without the governed purge path.
- v_videos_source_inventory: Read model of valid source inventory for admin listing: must include only sources whose status is in the completed/ready family and whose checksum is present (definition in migration).
- v_video_processing_queue: Operational view for monitoring sources in uploading, processing, or processing_failed, including derived moment counts and timing fields as defined in migration.
D. Invariants (critical)
These rules always hold in a correct deployment. They are enforced by database constraints and triggers and by API guards where applicable. Violations are bugs, not edge cases.
- Every governed storage file maps to exactly one videos row for paths under the governed videos/ layout used by inventory. Never write governed paths without a matching row and storage_path equality.
- Every videos row maps to exactly one storage object for that row’s storage_path in normal operation. Drift is handled only via orphan detection and explicit purge.
- Only one primary video per group: is_primary = true appears at most once per video_group_id.
- Primary video must be a source: is_primary = true implies video_type = source and duplicate_of_video_id IS NULL (trigger).
- Derived videos must reference a source: non-source types must have source_video_id set to a source row (enforced by trg_videos_enforce_type_and_parent).
- Only one completed source per upload session (partial unique index).
- No row remains in processing forever: workers must call failure or success transitions; use processing_failed with processing_completed_at on failure or timeout.
- completed and ready require checksum: enforced in trg_videos_99_strict_validate.
- Storage paths must follow the governed pattern when the strict regex check constraint is enabled in the environment (skipped only during legacy backfill; must be enabled after cleanup).
- Group consistency: if source_video_id is set, video_group_id must match the parent source’s video_group_id. If duplicate_of_video_id is set, video_group_id must match the duplicate target’s video_group_id (trigger).
- Session finalization: when a source reaches completed, all other rows with the same upload_session_id must be terminalized as abandoned (per migration) and must not remain competing sources.
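Several of these invariants can be expressed as a pure check over a set of rows, which may be useful in tests or diagnostics tooling. The following is a minimal sketch, not the database enforcement: InventoryRow and findInvariantViolations are hypothetical names, the camelCase fields stand in for the corresponding columns, and only a subset of the invariants is covered.

```typescript
// Hypothetical in-memory shape; camelCase fields mirror the public.videos columns.
interface InventoryRow {
  id: string;
  videoType: "source" | "moment" | "processed";
  status: string;
  sourceVideoId: string | null;
  videoGroupId: string;
  isPrimary: boolean;
  duplicateOfVideoId: string | null;
  uploadSessionId: string | null;
  checksum: string | null;
}

// Returns human-readable violations; an empty array means the sample rows are consistent.
function findInvariantViolations(rows: InventoryRow[]): string[] {
  const violations: string[] = [];
  const byId: Record<string, InventoryRow> = {};
  const primaries: Record<string, number> = {};
  const completedSources: Record<string, number> = {};
  for (const row of rows) byId[row.id] = row;

  for (const row of rows) {
    if (row.isPrimary) {
      primaries[row.videoGroupId] = (primaries[row.videoGroupId] || 0) + 1;
      if (row.videoType !== "source" || row.duplicateOfVideoId !== null) {
        violations.push(row.id + ": primary must be a non-duplicate source");
      }
    }
    if (row.videoType === "source") {
      if (row.sourceVideoId !== null) {
        violations.push(row.id + ": source must not set source_video_id");
      }
      if (row.status === "completed" && row.uploadSessionId !== null) {
        completedSources[row.uploadSessionId] = (completedSources[row.uploadSessionId] || 0) + 1;
      }
    } else {
      const parentRow: InventoryRow | undefined =
        row.sourceVideoId !== null ? byId[row.sourceVideoId] : undefined;
      if (!parentRow || parentRow.videoType !== "source") {
        violations.push(row.id + ": derived row must reference a source");
      } else if (parentRow.videoGroupId !== row.videoGroupId) {
        violations.push(row.id + ": video_group_id must match the parent source");
      }
    }
    if ((row.status === "completed" || row.status === "ready") && !row.checksum) {
      violations.push(row.id + ": checksum required for " + row.status);
    }
  }
  for (const group of Object.keys(primaries)) {
    if (primaries[group] > 1) violations.push("group " + group + ": more than one primary");
  }
  for (const session of Object.keys(completedSources)) {
    if (completedSources[session] > 1) {
      violations.push("session " + session + ": more than one completed source");
    }
  }
  return violations;
}
```

The database constraints and triggers remain the enforcement point; a check like this only helps surface inconsistent fixtures or exported snapshots.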
E. Storage structure
Governed layout (inventory videos)
These paths are the only governed shapes for rows whose storage_path lives under the videos/ prefix:
/videos/{source_video_id}/source.mp4
/videos/{source_video_id}/moments/{video_id}.mp4
/videos/{source_video_id}/processed/{video_id}.mp4
Notes:
- {source_video_id} must equal the source row UUID for that folder.
- {video_id} in moments and processed must equal the derived row’s id.
- The storage_path stored in videos does not include a leading slash in application usage in all call sites; normalize comparisons accordingly. The logical layout remains as above.
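The three governed shapes and the leading-slash normalization can be sketched as a small parser. The names below (GovernedPath, normalizeStoragePath, parseGovernedPath) are hypothetical, and the lowercase UUID pattern is an assumption of this sketch; the authoritative pattern is the check constraint defined in migrations.

```typescript
type GovernedPath =
  | { kind: "source"; sourceVideoId: string }
  | { kind: "moment" | "processed"; sourceVideoId: string; videoId: string };

// Assumption: UUIDs are lowercase and hyphenated; the real constraint may differ.
const UUID = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";
const SOURCE_RE = new RegExp("^videos/(" + UUID + ")/source\\.mp4$");
const DERIVED_RE = new RegExp(
  "^videos/(" + UUID + ")/(moments|processed)/(" + UUID + ")\\.mp4$"
);

// Strips leading slashes so "/videos/..." and "videos/..." compare equal.
function normalizeStoragePath(path: string): string {
  return path.replace(/^\/+/, "");
}

// Returns the parsed shape, or null when the path is not a governed videos/ path.
function parseGovernedPath(path: string): GovernedPath | null {
  const normalized = normalizeStoragePath(path);
  const source = SOURCE_RE.exec(normalized);
  if (source) return { kind: "source", sourceVideoId: source[1] };
  const derived = DERIVED_RE.exec(normalized);
  if (derived) {
    return {
      kind: derived[2] === "moments" ? "moment" : "processed",
      sourceVideoId: derived[1],
      videoId: derived[3],
    };
  }
  return null;
}
```

A guard built on this would then compare sourceVideoId and videoId against the row being written, as described in the notes above.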
Why this layout exists
- Stable routing: all outputs for a match or capture bundle live under one UUID prefix.
- Operational clarity: administrators, jobs, and support tooling list by source_video_id.
- Authorization: API guards parse paths and must reject writes that do not match a row’s id, storage_path, and allowed status.
What breaks when violated
- API upload authorization fails: assertAuthorizedVideoBucketUpload rejects the operation.
- Database inserts/updates fail: check constraints (when enabled) and triggers reject inconsistent storage_path values.
- Monitoring views lie: paths that do not match conventions will not align with admin tooling expectations.
Highlights and non-inventory paths
Paths under highlights/ follow separate product rules. The upload API allows highlights/ writes on the video bucket without a videos row check. Do not treat highlights/ objects as governed videos inventory unless explicitly linked by product logic.
F. Upload flow (source)
This is the required sequence for governed source uploads through the application:
- Create the videos row with video_type = source, status = uploading, storage_path = videos/{new_uuid}/source.mp4, upload_session_id set, is_primary set according to duplicate policy, and duplicate_of_video_id set only when explicitly creating a duplicate candidate row.
- Request a signed upload URL via POST /api/media/upload-url with bucket and path. The route must pass assertAuthorizedVideoBucketUpload: the row must exist, storage_path must match exactly, status must be uploading for source paths.
- Upload bytes using the signed resumable flow (client uses tus with the returned token). Alternative: POST /api/media/upload with the same bucket and path query parameters; the same authorization gate applies.
- Compute SHA-256 of the local file bytes on the client before or after upload (client must compute the same hash the server will persist).
- Mark completed only through completeSourceVideoUpload, passing checksum as 64-char lowercase hex. The server rejects invalid checksum format. The database rejects missing checksum on completed.
- Session finalization runs in the database: competing rows in the same upload_session_id are abandoned automatically when the source hits completed.
Rejection conditions (non-exhaustive)
- Missing or mismatched storage_path for the UUID in the path.
- status not uploading for source upload.
- Invalid checksum format or missing checksum at completed.
- Attempted write to videos/ paths without passing server API authorization (direct client writes that bypass the API must be blocked by storage policies in production; the application must route through the API for governed inventory).
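The authorization gate described in this section can be sketched as a pure decision function. This is not the implementation of assertAuthorizedVideoBucketUpload: decideVideoBucketUpload, UploadRow, and the findRowByPath callback are hypothetical, status rules for derived paths are left out because this document does not list them, and rejecting paths outside videos/ and highlights/ is an assumption of the sketch.

```typescript
interface UploadRow {
  id: string;
  storagePath: string;
  status: string;
}

type UploadDecision = { allowed: true } | { allowed: false; reason: string };

// `findRowByPath` stands in for a database lookup on videos.storage_path.
function decideVideoBucketUpload(
  path: string,
  findRowByPath: (storagePath: string) => UploadRow | null
): UploadDecision {
  const normalized = path.replace(/^\/+/, "");
  // highlights/ writes are allowed without a videos row check.
  if (normalized.indexOf("highlights/") === 0) return { allowed: true };
  // Assumption: anything outside videos/ and highlights/ is rejected.
  if (normalized.indexOf("videos/") !== 0) {
    return { allowed: false, reason: "path is outside the known prefixes" };
  }
  const row = findRowByPath(normalized);
  if (row === null) {
    return { allowed: false, reason: "no videos row has this storage_path" };
  }
  const sourceMatch = /^videos\/([^/]+)\/source\.mp4$/.exec(normalized);
  if (sourceMatch) {
    if (sourceMatch[1] !== row.id) {
      return { allowed: false, reason: "folder UUID does not match row id" };
    }
    if (row.status !== "uploading") {
      return { allowed: false, reason: "source row is not uploading" };
    }
  }
  return { allowed: true };
}
```

Returning a reason string alongside the rejection makes it straightforward for a route handler to surface an explicit error instead of swallowing the failure.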
G. Processing flow
When processing starts
- Derived rows (for example moments) are created with status = processing (or uploading first if the workflow stages bytes separately; the row must still end in ready or processing_failed with checksum rules satisfied for ready).
- processing_started_at must be set when entering processing if the workflow uses that column.
Worker responsibilities
- Workers must poll or subscribe to job sources as designed, update processing_job_id if used, and must transition status:
  - to ready with checksum set (same SHA-256 requirement) and processing_completed_at, or
  - to processing_failed with processing_completed_at on failure or timeout.
- Workers must call markVideoProcessingFailed (authenticated) or markVideoProcessingFailedServiceRole (no user session) instead of leaving rows in processing.
Moment creation
- createMomentDerivedVideo inserts a derived row with source_video_id, storage_path, and processing state as implemented. Must preserve video_group_id consistency (trigger-maintained).
Terminal outcomes
- processing → ready: requires checksum in the same update path as ready.
- processing → processing_failed: must set processing_completed_at.
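The two terminal outcomes can be modeled as a single transition function so that neither path can skip processing_completed_at. This is a sketch under assumptions: ProcessingRow, ProcessingOutcome, and finishProcessing are hypothetical names, and the real transitions go through setVideoStatusReady and markVideoProcessingFailed.

```typescript
interface ProcessingRow {
  id: string;
  status: string;
  checksum: string | null;
  processingCompletedAt: string | null;
}

type ProcessingOutcome =
  | { kind: "ready"; checksum: string }
  | { kind: "processing_failed" };

// Moves a row out of processing; both outcomes stamp processing_completed_at.
function finishProcessing(
  row: ProcessingRow,
  outcome: ProcessingOutcome,
  now: Date
): ProcessingRow {
  if (row.status !== "processing") {
    throw new Error("row " + row.id + " is not in processing");
  }
  const completedAt = now.toISOString();
  if (outcome.kind === "ready") {
    // ready requires a 64-character lowercase hex SHA-256 in the same update.
    if (!/^[0-9a-f]{64}$/.test(outcome.checksum)) {
      throw new Error("ready requires a valid SHA-256 checksum");
    }
    return {
      ...row,
      status: "ready",
      checksum: outcome.checksum,
      processingCompletedAt: completedAt,
    };
  }
  return { ...row, status: "processing_failed", processingCompletedAt: completedAt };
}
```

Because the checksum travels inside the ready outcome, a caller cannot request ready without supplying one.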
H. Duplicate handling
Session behavior
- Multiple rows may exist in one upload_session_id during duplicate candidate creation. Only one source must reach completed per session (index + finalization trigger).
Detection logic
- Checksum on completed and ready rows is the canonical duplicate key for identical bytes.
- Additional heuristics (filename, size) may exist for candidate surfacing; must not replace checksum as the authoritative identity test for finalized media.
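Checksum-based detection amounts to grouping finalized rows by checksum and keeping the groups with more than one member. A minimal sketch follows; ChecksumRow and findDuplicateGroups are hypothetical names, and the real detection may run in SQL rather than application code.

```typescript
interface ChecksumRow {
  id: string;
  status: string;
  checksum: string | null;
}

// Groups finalized rows by checksum and keeps only checksums shared by two or more rows.
function findDuplicateGroups(rows: ChecksumRow[]): Record<string, string[]> {
  const byChecksum: Record<string, string[]> = {};
  for (const row of rows) {
    const finalized = row.status === "completed" || row.status === "ready";
    // Rows that are not finalized, or have no checksum, are not duplicate keys.
    if (!finalized || !row.checksum) continue;
    if (!byChecksum[row.checksum]) byChecksum[row.checksum] = [];
    byChecksum[row.checksum].push(row.id);
  }
  const duplicates: Record<string, string[]> = {};
  for (const checksum of Object.keys(byChecksum)) {
    if (byChecksum[checksum].length > 1) duplicates[checksum] = byChecksum[checksum];
  }
  return duplicates;
}
```

Filename or size heuristics could feed a separate candidate list, but they would not change the output of this grouping.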
duplicate_of_video_id
- When populated, indicates this row is a duplicate of an earlier inventory record. Must respect group rules: is_primary must be false for duplicates as enforced by validation triggers.
I. Orphan management
What an orphan is
- An orphan is an object in storage under the scanned prefix (for example videos/) whose path is not exactly equal to any videos.storage_path value.
Detection job
- Service role scanning lists storage recursively and compares to the database set.
- GET /api/cron/video-storage-orphans with Authorization: Bearer ${CRON_SECRET} persists orphans into video_storage_orphans with status = detected using persistDetectedVideoStorageOrphansServiceRole.
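The comparison step of the detection job is a set difference between listed storage paths and known videos.storage_path values. The sketch below assumes both lists are already loaded into memory; findOrphanPaths and normalizeForCompare are hypothetical names, and the real job persists its results through persistDetectedVideoStorageOrphansServiceRole.

```typescript
// Strips leading slashes so both sides use the same form before comparison.
function normalizeForCompare(path: string): string {
  return path.replace(/^\/+/, "");
}

// Returns storage paths that have no exactly matching videos.storage_path.
function findOrphanPaths(storagePaths: string[], dbStoragePaths: string[]): string[] {
  const known: Record<string, true> = {};
  for (const p of dbStoragePaths) known[normalizeForCompare(p)] = true;
  const orphans: string[] = [];
  for (const p of storagePaths) {
    const normalized = normalizeForCompare(p);
    if (known[normalized] !== true) orphans.push(normalized);
  }
  return orphans;
}
```

Matching is exact after normalization, so a file that differs only in case or extension is still reported as an orphan.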
Confirm / purge workflow
- Operators review video_storage_orphans, confirm rows intended for deletion, then purge storage through the governed API that requires confirmed state before removal.
- Safety rule: Never delete storage objects based solely on a local guess; must use the confirm/purge pipeline so audit fields are populated.
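The detected → confirmed → purged progression can be sketched as a small state machine in which deletion is only reachable from the confirmed state. OrphanRecord, confirmOrphan, purgeOrphan, and the deleteObject callback are hypothetical names; audit fields are omitted from this sketch.

```typescript
type OrphanStatus = "detected" | "confirmed" | "purged";

interface OrphanRecord {
  bucket: string;
  storagePath: string;
  status: OrphanStatus;
}

// An operator confirms a detected orphan after review.
function confirmOrphan(orphan: OrphanRecord): OrphanRecord {
  if (orphan.status !== "detected") {
    throw new Error("only detected orphans can be confirmed");
  }
  return { ...orphan, status: "confirmed" };
}

// `deleteObject` is a hypothetical callback standing in for the storage removal call.
function purgeOrphan(
  orphan: OrphanRecord,
  deleteObject: (bucket: string, path: string) => void
): OrphanRecord {
  // The state check runs before any deletion, so unconfirmed objects are never removed.
  if (orphan.status !== "confirmed") {
    throw new Error("purge requires confirmed state");
  }
  deleteObject(orphan.bucket, orphan.storagePath);
  return { ...orphan, status: "purged" };
}
```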
J. API contracts
Upload endpoints (Next.js routes)
- POST /api/media/upload-url: JSON body { bucket, path }. Must be authenticated (session cookie). For the configured video bucket, videos/ paths must pass assertAuthorizedVideoBucketUpload. highlights/ paths are allowed without a videos row check.
- POST /api/media/upload: query params bucket, path; body is raw bytes. Same authorization as upload-url.
Status and inventory functions (server modules)
- startSourceVideoUpload: creates uploading source row and governed storage_path.
- completeSourceVideoUpload: requires checksum (SHA-256 hex); sets completed.
- failSourceVideoUpload: fails upload attempt.
- setVideoStatusReady: sets ready, requires checksum in the update.
- markVideoProcessingFailed / markVideoProcessingFailedServiceRole: sets processing_failed and processing_completed_at.
- createMomentDerivedVideo: creates derived inventory rows tied to a completed or ready source.
Failure handling
- API routes must return explicit HTTP errors when authorization fails (400/403/500 as implemented). Never swallow authorization failures.
Checksum requirements
- Client and server must use 64-character lowercase hex SHA-256.
- Invalid format is rejected before database update in assertValidInventoryChecksumHex.
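The format rule (64 characters, lowercase hex) is simple enough to show as a regular expression. The helpers below are a hypothetical stand-in, not the body of assertValidInventoryChecksumHex, whose exact behavior and error messages may differ.

```typescript
// Matches exactly 64 lowercase hexadecimal characters (a SHA-256 digest in hex).
const SHA256_HEX_PATTERN = /^[0-9a-f]{64}$/;

// Hypothetical predicate form of the format check.
function isValidChecksumHex(value: string | null | undefined): boolean {
  return typeof value === "string" && SHA256_HEX_PATTERN.test(value);
}

// Hypothetical asserting form: returns the checksum or throws before any database update.
function assertChecksumHex(value: string | null | undefined): string {
  if (typeof value !== "string" || !SHA256_HEX_PATTERN.test(value)) {
    throw new Error("checksum must be 64-character lowercase hex SHA-256");
  }
  return value;
}
```

Uppercase digests are rejected rather than lowercased, which keeps the stored value byte-for-byte comparable as a duplicate key.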
K. Operational jobs
Cleanup job
- GET /api/cron/video-inventory-cleanup: Authorization: Bearer ${CRON_SECRET} when CRON_SECRET is set. Invokes fn_video_cleanup_stale_inventory via runVideoInventoryCleanupRpc. Must run on a schedule in each environment that requires stale inventory hygiene.
Orphan detection job
- GET /api/cron/video-storage-orphans: same CRON_SECRET scheme. Persists orphan paths to video_storage_orphans.
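Both cron routes share the same bearer-token scheme, which can be sketched as a single predicate. isAuthorizedCronRequest is a hypothetical name. This document does not state what happens when CRON_SECRET is unset, so the sketch rejects in that case as a conservative assumption.

```typescript
// Returns true only when the Authorization header carries the expected bearer token.
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  cronSecret: string | undefined
): boolean {
  // Assumption of this sketch: an unset or empty secret fails closed.
  if (!cronSecret) return false;
  return authorizationHeader === "Bearer " + cronSecret;
}
```

A route handler would read the header from the incoming request and the secret from the environment, then return an explicit error status when the predicate is false.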
Scheduling expectations
- Cleanup: hourly or daily depending on volume; must run in production.
- Orphans: daily or weekly; must run in production so drift is detected without manual action.
L. Common failure scenarios
Partial upload
- Row remains uploading until failSourceVideoUpload is called, or until the operator cleanup RPC or timeout policies delete the stale row. Must not mark completed without a full file and checksum.
Duplicate upload
- Session creates multiple candidates; one completes; others become abandoned via finalization. Must rely on checksum for post-hoc duplicate identification.
Failed processing
- Transition to processing_failed with processing_completed_at. Never leave a row silently in processing.
Orphan files
- Appear in video_storage_orphans. Must be confirmed, then purged through the governed purge path.
M. Future extensions
These items are planned or partially modeled; must not bypass current invariants when implemented:
- Multi-angle video: camera_angle_id exists for main/sideline/tactical/goalie lookups; enforcement is intentionally future-facing.
- AI detection pipelines: must create inventory rows before writing outputs; must attach lineage to source_video_id.
- Analytics: must read from videos and views, never infer ownership from storage alone.
- Monetization: must treat governed videos rows as the billing/entitlement anchor for physical media, not raw URLs alone.
N. Video diagnostics (admin operational layer)
Why main inventory stays narrow
- The main source inventory (v_videos_source_inventory) must list only source rows that are completed or ready and carry a non-empty checksum, as defined by the view in migrations. Derived rows, abandoned rows, uploading rows, and duplicate-linked rows must not appear there.
- Storage can still hold additional objects under the same governed prefix: partial uploads, superseded attempts, orphan bytes, or derived outputs not yet linked the way operators expect.
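The visibility rules for the main inventory can be restated as a function that lists the reasons a row is excluded. This sketch is based only on the rules stated in this section; VisibilityRow and explainVisibility are hypothetical names, and the authoritative definition is the view in migrations.

```typescript
interface VisibilityRow {
  videoType: string;
  status: string;
  checksum: string | null;
  duplicateOfVideoId: string | null;
}

// Returns the reasons a row is excluded from the main inventory; empty means visible.
function explainVisibility(row: VisibilityRow): string[] {
  const reasons: string[] = [];
  if (row.videoType !== "source") reasons.push("not a source row");
  if (row.status !== "completed" && row.status !== "ready") {
    reasons.push("status is " + row.status);
  }
  if (!row.checksum) reasons.push("checksum is missing");
  if (row.duplicateOfVideoId !== null) reasons.push("linked as a duplicate");
  return reasons;
}
```

Collecting every reason, instead of stopping at the first, gives operators a complete plain-language explanation for a single row.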
Product principle
- Governed inventory remains intentionally clean for operators and downstream features.
- Video diagnostics (/admin/media-operations/video-diagnostics) is the admin-only bridge between:
  - inventory truth in PostgreSQL, and
  - raw storage reality in the bucket.
What diagnostics exposes
- Search across videos by id, source_video_id, upload_session_id, storage_path, original_filename, status, type, and filters for processing failure, abandonment, duplicate linkage, and orphan presence under the governed folder.
- Detail for one row:
  - Canonical summary and a plain-language explanation of main inventory visibility (same rules as explainMainInventoryVisibility / v_videos_source_inventory).
  - Related DB rows collected by video_group_id, upload_session_id, source_video_id, duplicate_of_video_id, and replaced_by_video_id (strongest key: group).
  - Storage artifacts under videos/{source_folder_id}/ via service-role recursive listing, tagged as mapped, related, orphan, suspected_duplicate, or unknown_artifact by comparing videos.storage_path, video_storage_orphans, and heuristics.
  - Orphan registry rows matching the folder prefix.
  - Duplicate checksum groups among related rows.
Admin actions (server-guarded)
Diagnostics actions must respect database triggers and invariants. They must not bypass assertAuthorizedVideoBucketUpload for normal uploads; they exist to repair or classify inventory after human review.
- Mark as duplicate of a canonical source (duplicate_of_video_id, is_primary = false, group inherited from target).
- Set group primary on a source row with duplicate_of_video_id null (clears other primaries in the group, then sets target).
- Reclassify video_type to source / moment / processed with valid source_video_id for derived types (database rejects invalid parentage).
- Mark terminal status: failed, abandoned, processing_failed (with processing_completed_at where applicable).
- Attach orphan path: insert a new videos row for a storage_path, then remove the matching video_storage_orphans row (checksum rules apply for completed / ready).
- Confirm and purge orphans must use the existing confirm → purge pipeline (no raw deletes without confirmation).
Why storage can still contain multiple files
- Upload sessions can create multiple rows before finalization; only one completed source must survive per session (index + finalization trigger).
- Derived and orphan objects must be visible in diagnostics even when they must not pollute the main inventory grid.
Video Instances Manager (global videos grid)
- Route: /admin/media-operations/video-instances
- Purpose: show every row in public.videos with filters (type, status, primary/duplicate flags, group, session, checksum, orphan suspects, processing_failed, abandoned) and row actions that defer deep repairs to Video diagnostics where appropriate.
- Inventory vs instances: the Videos admin surface (game_videos) stays clean and product-facing. v_videos_source_inventory lists only valid source rows with finalized checksums. Instances exists because operators still need a complete view of duplicate candidates, derived outputs, abandoned session rows, and failures, without merging that noise into the default registry.
- Instances vs diagnostics: Instances answers “what exists in the database across all rows?”. Diagnostics answers “what is true for this videos.id, storage folder, and related rows?” and runs safe, reviewed repairs. Use Open diagnostics from Instances (or the processing console) for per-video control.
Video processing console
- Route: /admin/media-operations/video-processing
- Purpose: operational dashboard backed by v_video_processing_queue: videos in uploading / processing, processing_failed rows, and a recently completed list of source videos in ready / completed. Refreshes on a short poll. Row actions include retry processing, mark failed, and open diagnostics on the source_video_id.
Cross-links
- Videos toolbar links to Video instances and Processing console. When game_videos.inventory_video_id is set, the row menu includes Open video diagnostics to the linked inventory row.
- Instances and Processing link rows to /admin/media-operations/video-diagnostics/{videos.id} so diagnostics remains the shared deep inspection layer.
Related documentation
- [docs/media-match-intelligence.md](./media-match-intelligence.md)
- [docs/video-ingestion-stats-mapping-system.md](./video-ingestion-stats-mapping-system.md)
- [docs/admin-data-operations.md](./admin-data-operations.md)
Canonical UI path: /support/docs/video-inventory-media-lifecycle
Admin UI: /admin/media-operations/video-diagnostics