Media Scan Plugins

How a plugin participates in the post-ingestion media scan pipeline, returns chapters or other typed outputs, and uses the shared fingerprint cache to short-circuit expensive re-analysis.

The media scan pipeline runs after ingestion and lets a plugin analyse the contents of a media file. It is the right place for any work that:

needs the actual decoded bytes of a file (not just metadata).
produces durable enrichment (intro/credits chapters, face detections, perceptual hashes, scene fingerprints).
benefits from caching an expensive intermediate result so a re-scan can skip the expensive part.

The first plugin built on top of this pipeline is the bundled aviato-intro-credits plugin, which fingerprints audio across episodes of a TV season and writes intro/credits chapters. The protocol and on-disk fingerprint cache are deliberately generic so plugins for face and object detection on photos, scene change hashes on long videos, perceptual deduplication, and similar follow-on work can reuse exactly the same scaffolding.

When to use this versus a hook

Both subsystems extend the ingestion pipeline. They serve different needs.

Situation	Use
You want to enrich item metadata (a title, an external id, a year)	a hook on `pipeline.index.afterProcess`
You want to read the actual decoded media (audio or pixels) and produce derived data	a media scan plugin
Your work takes seconds or minutes per file and you want it cached so re-scans are cheap	a media scan plugin
Your work runs once per file and is sub-second	a hook

Media scans run on a separate, long-timeout, single-concurrency queue so heavy ffmpeg or model work cannot back up the rest of the ingestion pipeline.

Anatomy of a media scan job

Every media scan job runs three tasks against a single library item:

Prepare: the server resolves the local path of the primary media file (and, for TV, of every sibling episode in the same season), then attaches any cached fingerprints it already has for those files. The plugin sees a list of MediaScanFileInput records.
Plugin: the server calls one of the plugin's RPC methods (mediaScan.scanSingle or mediaScan.scanBatch) and parses the response through a Zod schema. A malformed payload fails the task cleanly without poisoning the database.
Persist: the server writes any returned fingerprints to the media_file_scan_fingerprints table and any returned chapters to the chapters table.

Each task is a separate row in pipeline_tasks, so per-step progress and timing are visible in the admin job inspector.

Declaring the capability

Add media-scan to the capabilities array in plugin.json. The manifest's mediaTypes field gates which library types your plugin applies to: a plugin without a mediaTypes entry runs against every library; a plugin that lists ["movies", "tv"] is invoked only for movie and TV libraries.

{
  "id": "aviato-intro-credits",
  "name": "Intro & Credits Detection",
  "version": "1.0.0",
  "description": "...",
  "author": "Aviato",
  "license": "MIT",
  "engine": "bun",
  "entry": "src/index.ts",
  "aviato": { "minVersion": "0.1.0" },

  "capabilities": ["media-scan"],
  "mediaTypes": ["movies", "tv"]
}

The server discovers all plugins with the media-scan capability at startup and indexes them. Whenever a media-scan job runs, the scheduler calls every eligible plugin in turn. Multiple plugins coexist: the chapter detector and a future face-detection plugin both declare media-scan and both run on the same job, each producing fingerprints under its own pluginId namespace and chapters under whatever roles it owns.
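The gating rule for capabilities and mediaTypes can be sketched as a small predicate. This is an illustration only, not the server's code: PluginManifest is trimmed to the manifest fields discussed here, isEligible is a hypothetical name, and the 'photos' library type in the usage note is assumed.

```typescript
// Trimmed manifest shape: only the fields discussed in this section.
interface PluginManifest {
  id: string
  capabilities: string[]
  mediaTypes?: string[]
}

// Hypothetical helper: true when the plugin should be considered for a
// library of the given type.
function isEligible (manifest: PluginManifest, libraryType: string): boolean {
  if (!manifest.capabilities.includes('media-scan')) return false
  // No mediaTypes entry means the plugin runs against every library.
  if (manifest.mediaTypes === undefined) return true
  return manifest.mediaTypes.includes(libraryType)
}

const introCredits: PluginManifest = {
  id: 'aviato-intro-credits',
  capabilities: ['media-scan'],
  mediaTypes: ['movies', 'tv'],
}
```

With this manifest, isEligible(introCredits, 'tv') is true and isEligible(introCredits, 'photos') is false, while a plugin that omits mediaTypes is eligible for both.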

Per-library opt-in

A media scan plugin participates in a library's pipeline only when the per-library toggle is on. This is the same pipelinePlugins map every pipeline plugin uses (see Configuration). A library record stores:

{
  "pipelinePlugins": {
    "your-plugin-id": { "enabled": true }
  }
}

The default is enabled. Setting enabled: false prevents the scheduler from queueing scans for that library. The executor re-checks this flag at run time, so a library admin can toggle mid-queue without races.
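A minimal sketch of the default-on lookup, assuming that a missing map, a missing plugin entry, and a missing enabled flag all count as enabled. LibraryRecord is trimmed to the one field shown above, and isPluginEnabled is a hypothetical name rather than a server function.

```typescript
// Trimmed library record: only the pipelinePlugins map is modelled.
interface LibraryRecord {
  pipelinePlugins?: Record<string, { enabled?: boolean }>
}

// Hypothetical helper: resolve the per-library toggle for one plugin.
function isPluginEnabled (library: LibraryRecord, pluginId: string): boolean {
  // Anything not explicitly set falls back to the default, which is enabled.
  return library.pipelinePlugins?.[pluginId]?.enabled ?? true
}
```

Only an explicit enabled: false switches the plugin off for that library.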

Wire protocol

These are the types the server and plugin agree on. The canonical copies live in packages/server/src/libraries/media-scan/protocol.ts (server) and plugins/intro-credits/src/schemas.ts (reference plugin).

File input

Every scan call passes an array of files (batch) or a single file:

interface MediaScanFileInput {
  fileId: string
  itemId: string
  /** Local filesystem path (already resolved by the server). */
  path: string
  /** Duration in seconds, from the ingestion probe. */
  duration: number
  /** Server-supplied cache, one entry per (type, algorithmVersion). */
  cachedFingerprints: Array<{
    type: string
    algorithmVersion: string
    fingerprint?: string | null
    metadata?: Record<string, unknown> | null
  }>
}

The plugin should consume cachedFingerprints opportunistically: each entry whose algorithmVersion matches the plugin's current version is reusable as is. Mismatched or missing entries trigger re-fingerprinting, and the new fingerprints must be returned to the server in the response so they get persisted.

Response shape

interface MediaScanResponse {
  /** New fingerprints to cache. The server upserts on
   *  (fileId, pluginId, type) and stamps `algorithmVersion`. */
  fingerprints: Array<{
    fileId: string
    type: string
    algorithmVersion: string
    fingerprint?: string | null
    metadata?: Record<string, unknown> | null
  }>

  /** Chapter rows the server should persist. */
  chapters: Array<{
    fileId: string
    role: 'intro' | 'credits' | 'chapter' | 'scene'
    startTime: number
    endTime: number
    title?: string | null
    metadata?: Record<string, unknown> | null
  }>

  /** Per-file diagnostics for files the plugin chose not to write
   *  output for. Surfaces in logs and the admin task inspector. */
  skipped: Array<{
    fileId: string
    reason: string
    message?: string
  }>
}

The buckets are independent. A plugin that produces only fingerprints (and writes no chapters) leaves chapters: []. A plugin that writes only chapters and has no caching needs leaves fingerprints: []. New buckets (faces, objects, scene markers) will be added in future server versions; the Zod parser tolerates buckets it does not yet recognise.
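Because the buckets are independent, a plugin can start from an empty response and fill in only the buckets it uses. The sketch below redeclares the response types locally so it stands alone; emptyResponse is a hypothetical helper, and the file id and fingerprint text are placeholder values.

```typescript
type Meta = Record<string, unknown> | null

interface MediaScanResponse {
  fingerprints: Array<{ fileId: string, type: string, algorithmVersion: string, fingerprint?: string | null, metadata?: Meta }>
  chapters: Array<{ fileId: string, role: 'intro' | 'credits' | 'chapter' | 'scene', startTime: number, endTime: number, title?: string | null, metadata?: Meta }>
  skipped: Array<{ fileId: string, reason: string, message?: string }>
}

// Hypothetical helper: every bucket present, every bucket empty.
function emptyResponse (): MediaScanResponse {
  return { fingerprints: [], chapters: [], skipped: [] }
}

// A fingerprint-only result: chapters and skipped stay as empty arrays.
const fingerprintOnly: MediaScanResponse = {
  ...emptyResponse(),
  fingerprints: [
    { fileId: 'file-1', type: 'intro', algorithmVersion: '1', fingerprint: 'AAAA' },
  ],
}
```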

Handler interface

The SDK exposes a MediaScanHandlers interface; pass it under the media-scan key when calling createPlugin:

import type { MediaScanHandlers } from '@aviato/plugin-sdk'
import { createPlugin } from '@aviato/plugin-sdk'

const ALGORITHM_VERSION = '1'

const handlers: MediaScanHandlers = {
  algorithmVersion: async () => ({ algorithmVersion: ALGORITHM_VERSION }),

  scanSingle: async ({ file, hints }) => {
    // analyse a single file (movie, photo, isolated episode)
    return analyseOne(file, hints)
  },

  scanBatch: async ({ files, groupKey, hints }) => {
    // analyse a group of related files (typically a TV season)
    return analyseGroup(files, groupKey, hints)
  },
}

createPlugin({ 'media-scan': handlers })

The SDK wires three RPC methods automatically:

mediaScan.algorithmVersion lets the server ask, before launching a scan, whether the plugin's current version matches the cached rows. Bumping the version invalidates all cached fingerprints for that plugin.
mediaScan.scanBatch is called for TV episodes that share a season (server-side sibling resolution) and other multi-file groups.
mediaScan.scanSingle is called for everything else (movies, photos, isolated TV episodes).

A plugin that supports only one mode may stub the other to throw or return an empty response. The server treats per-plugin failures as isolated: one buggy plugin does not stop the others' work from being persisted in the same job.
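As a sketch of the "return an empty response" option, a batch-only plugin could stub scanSingle like this. The types are trimmed local copies, and the reason string 'single-mode-unsupported' is a hypothetical value, not one the server defines.

```typescript
interface FileInput { fileId: string, itemId: string, path: string, duration: number }

interface MediaScanResponse {
  fingerprints: unknown[]
  chapters: unknown[]
  skipped: Array<{ fileId: string, reason: string, message?: string }>
}

// Builds an otherwise empty response that records why the file was skipped.
function skipUnsupported (file: FileInput): MediaScanResponse {
  return {
    fingerprints: [],
    chapters: [],
    skipped: [{
      fileId: file.fileId,
      reason: 'single-mode-unsupported', // hypothetical reason string
      message: 'this plugin only analyses multi-file groups',
    }],
  }
}

// Stubbed handler for the mode the plugin does not implement.
const scanSingle = async ({ file }: { file: FileInput }): Promise<MediaScanResponse> =>
  skipUnsupported(file)
```

Reporting the file under skipped, rather than throwing, keeps the reason visible in the logs and the admin task inspector.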

The fingerprint cache

Each row in media_file_scan_fingerprints belongs to one (fileId, pluginId, type) triple. This means:

Multiple types per file are independent rows. An aviato-intro-credits row for type: 'intro' and another for type: 'credits' coexist without coupling.
Multiple plugins per file are independent rows. A future face detection plugin can write (fileId, 'aviato-face-detect', 'face-vector') rows alongside the chapter detector's rows.
The metadata column is a JSON sidecar. Use it for anything you need to remember alongside the opaque blob (timing windows, model identifiers, internal confidence numbers). Do not put large arrays there; the column is intended for kilobytes, not megabytes.

The server does not interpret what fingerprint contains; it treats the value as an opaque text field. Most plugins use base64 of a binary buffer; some write JSON; the only constraint is that it round-trips through the cache verbatim.
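For the common base64 case, one possible pair of helpers for a Uint32Array fingerprint is sketched below. It assumes a runtime with the Node-compatible Buffer global (Bun provides one) and stores the bytes in the platform's native byte order. The name base64ToUint32Array matches the helper used in the cache lookup example, but this is not necessarily how the reference plugin implements it; uint32ArrayToBase64 is a hypothetical counterpart.

```typescript
// Encode the raw bytes of a Uint32Array as base64 text.
function uint32ArrayToBase64 (values: Uint32Array): string {
  return Buffer.from(values.buffer, values.byteOffset, values.byteLength).toString('base64')
}

// Decode base64 text back into a Uint32Array.
function base64ToUint32Array (text: string): Uint32Array {
  const bytes = Buffer.from(text, 'base64')
  if (bytes.byteLength % 4 !== 0) {
    throw new Error('fingerprint length is not a multiple of 4 bytes')
  }
  // Copy into a fresh buffer so the result is 4-byte aligned.
  const out = new Uint32Array(bytes.byteLength / 4)
  new Uint8Array(out.buffer).set(bytes)
  return out
}
```

The copy in the decoder matters because a decoded Buffer may sit at an unaligned offset inside a shared pool, in which case constructing a Uint32Array view directly over it would throw.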

Cache lookup pattern

The server supplies cached entries; the plugin decides whether to reuse them. The reference implementation in plugins/intro-credits/src/seasonAnalyzer.ts follows this pattern:

function findCached (file: MediaScanFileInput, type: string) {
  const entry = file.cachedFingerprints.find(c => c.type === type)
  if (!entry) return null
  if (entry.algorithmVersion !== ALGORITHM_VERSION) return null
  return entry.fingerprint ?? null
}

const cached = findCached(file, 'intro')
const fp = cached
  ? base64ToUint32Array(cached)
  : await fingerprintWindow(file.path, ...)

If a fresh fingerprint was computed, return it in response.fingerprints so the server can write it back.
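The write-back step can be folded into the lookup so a cache miss is never forgotten. The sketch below is self-contained and uses trimmed local types; resolveFingerprint and its compute callback are hypothetical names, and the callback stands in for whatever fingerprinting routine the plugin actually runs.

```typescript
const ALGORITHM_VERSION = '1'

interface CachedFingerprint { type: string, algorithmVersion: string, fingerprint?: string | null }
interface FileInput { fileId: string, path: string, cachedFingerprints: CachedFingerprint[] }
interface FingerprintOut { fileId: string, type: string, algorithmVersion: string, fingerprint: string }

function findCached (file: FileInput, type: string): string | null {
  const entry = file.cachedFingerprints.find(c => c.type === type)
  if (!entry || entry.algorithmVersion !== ALGORITHM_VERSION) return null
  return entry.fingerprint ?? null
}

// Returns the fingerprint text for (file, type). On a cache miss the freshly
// computed value is also appended to `out`, which becomes response.fingerprints.
async function resolveFingerprint (
  file: FileInput,
  type: string,
  compute: (path: string) => Promise<string>, // hypothetical fingerprinting callback
  out: FingerprintOut[],
): Promise<string> {
  const cached = findCached(file, type)
  if (cached) return cached
  const fresh = await compute(file.path)
  out.push({ fileId: file.fileId, type, algorithmVersion: ALGORITHM_VERSION, fingerprint: fresh })
  return fresh
}
```

A handler would create one out array per call, pass it to every resolveFingerprint invocation, and return it as the fingerprints bucket.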

Algorithm versioning

algorithmVersion is the plugin's contract with itself: bump it whenever a code change makes existing cached fingerprints unusable. Examples that warrant a bump:

Changing the audio extraction window length or sample rate.
Switching the matcher to a different similarity score.
Changing the binary layout of the cached blob.

Examples that do not warrant a bump:

Tightening minimum or maximum duration thresholds.
Tweaking comparison sensitivity that still consumes the same fingerprint shape.

The server stamps each cache row with its algorithmVersion, so a bump silently invalidates rows written under the old version. They are overwritten the next time the file is scanned; no eager deletion is required.

Concurrency and timeouts

The media scan job type is registered with globalDefault: 1, perItem: 1. Only one scan runs across the entire server at a time. This is deliberate: ffmpeg and model inference saturate a CPU core per process, and queueing many fingerprint jobs in parallel produces no speedup on a typical host while starving the rest of the ingestion pipeline.

The plugin call timeout is 15 minutes. A 24-episode season fingerprinted from cold typically completes in 5 to 10 minutes; the ceiling exists to recover from a hung ffmpeg subprocess without trapping the worker indefinitely.

If your plugin needs more than 15 minutes for a single call, split the work yourself: have the plugin process N files at a time internally and return early. The server will queue the next item.
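One way to read this advice is to cap how many files a single call analyses and report the remainder as skipped. The sketch below assumes exactly that; splitWork, the cap, and the 'deferred' reason string are all hypothetical, and how the remaining files get picked up again depends on server scheduling that is not covered here.

```typescript
interface FileRef { fileId: string }
interface Skipped { fileId: string, reason: string, message?: string }

// Split a batch into the files to analyse now and skip entries for the rest.
function splitWork<T extends FileRef> (
  files: T[],
  maxPerCall: number,
): { now: T[], skipped: Skipped[] } {
  const now = files.slice(0, maxPerCall)
  const skipped = files.slice(maxPerCall).map(f => ({
    fileId: f.fileId,
    reason: 'deferred', // hypothetical reason string
    message: `over the per-call cap of ${maxPerCall} files`,
  }))
  return { now, skipped }
}
```

Files that already have a usable cached fingerprint cost almost nothing, so a real plugin would probably apply the cap only to files that need fresh analysis.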

Designing your fingerprint type

Two design rules carry over from the bundled plugin and apply to any new media scan plugin:

One type per row, one row per type. Do not pack two logical results (an intro fingerprint and a credits fingerprint, or a face vector and an object embedding) into the same row. The composite key (fileId, pluginId, type) exists so each can be invalidated and rewritten independently.
Fingerprints are private to your plugin. The server stores them; only your plugin reads them. There is no shared interop format. If you need other code to consume your output, write it to chapters (or to a future typed bucket) instead.
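The first rule is easier to see with a toy model of the composite key. The in-memory store below is an illustration of upsert-by-triple semantics, not the server's storage code; FingerprintStore and rowKey are hypothetical names.

```typescript
interface FingerprintRow {
  fileId: string
  pluginId: string
  type: string
  algorithmVersion: string
  fingerprint: string | null
}

// Serialise the (fileId, pluginId, type) triple into a single map key.
function rowKey (fileId: string, pluginId: string, type: string): string {
  return JSON.stringify([fileId, pluginId, type])
}

// Toy in-memory model: one row per triple, replaced on upsert.
class FingerprintStore {
  private rows = new Map<string, FingerprintRow>()

  upsert (row: FingerprintRow): void {
    this.rows.set(rowKey(row.fileId, row.pluginId, row.type), row)
  }

  get (fileId: string, pluginId: string, type: string): FingerprintRow | undefined {
    return this.rows.get(rowKey(fileId, pluginId, type))
  }

  get size (): number {
    return this.rows.size
  }
}
```

Rewriting the 'intro' row for a file leaves its 'credits' row untouched, which is exactly what is lost when both results are packed into one row.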

Diagnostics

While a scan is running its task rows show up in pipeline_tasks:

prepare (system task, owner __system:media-scan)
plugin (one row per dispatched plugin, owned by that plugin's id; if three plugins are eligible, three plugin rows appear, each timed and tracked independently)
persist (system task)

The deltas returned from each task body are recorded as task metadata, so the admin job inspector can show file counts, fingerprint counts, and skipped reasons without parsing logs.
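To make the idea of task metadata concrete, here is a sketch of how a response could be reduced to counts. The summary field names and the summarise function are hypothetical; the actual metadata the server records may be shaped differently.

```typescript
interface ScanResult {
  fingerprints: Array<{ fileId: string }>
  chapters: Array<{ fileId: string }>
  skipped: Array<{ fileId: string, reason: string }>
}

interface ScanSummary {
  filesWithOutput: number
  fingerprintCount: number
  chapterCount: number
  skippedByReason: Record<string, number>
}

// Reduce a scan response to the counts an inspector could display.
function summarise (res: ScanResult): ScanSummary {
  const skippedByReason: Record<string, number> = {}
  for (const s of res.skipped) {
    skippedByReason[s.reason] = (skippedByReason[s.reason] ?? 0) + 1
  }
  // Count each file once, however many rows it produced.
  const files = new Set<string>()
  for (const f of res.fingerprints) files.add(f.fileId)
  for (const c of res.chapters) files.add(c.fileId)
  return {
    filesWithOutput: files.size,
    fingerprintCount: res.fingerprints.length,
    chapterCount: res.chapters.length,
    skippedByReason,
  }
}
```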
