Media Scan Plugins
How a plugin participates in the post-ingestion media scan pipeline, returns chapters or other typed outputs, and uses the shared fingerprint cache to short-circuit expensive re-analysis.
The media scan pipeline runs after ingestion and lets a plugin analyse the contents of a media file. It is the right place for any work that:
- needs the actual decoded bytes of a file (not just metadata).
- produces durable enrichment (intro/credits chapters, face detections, perceptual hashes, scene fingerprints).
- benefits from caching an expensive intermediate result so a re-scan can skip the expensive part.
The first plugin built on top of this pipeline is the bundled
aviato-intro-credits plugin, which fingerprints audio across
episodes of a TV season and writes intro/credits chapters. The
protocol and on-disk fingerprint cache are deliberately generic so
plugins for face and object detection on photos, scene change
hashes on long videos, perceptual deduplication, and similar
follow-on work can reuse exactly the same scaffolding.
When to use this versus a hook
Both subsystems extend the ingestion pipeline. They serve different needs.
| Situation | Use |
|---|---|
| You want to enrich item metadata (a title, an external id, a year) | a hook on pipeline.index.afterProcess |
| You want to read the actual decoded media (audio or pixels) and produce derived data | a media scan plugin |
| Your work takes seconds or minutes per file and you want it cached so re-scans are cheap | a media scan plugin |
| Your work runs once per file and is sub-second | a hook |
Media scans run on a separate, long-timeout, single-concurrency queue so heavy ffmpeg or model work cannot back up the rest of the ingestion pipeline.
Anatomy of a media scan job
Every media scan job runs three tasks against a single library item:
- Prepare: the server resolves the local path of the primary
media file (and, for TV, of every sibling episode in the same
season), then attaches any cached fingerprints it already has
for those files. The plugin sees a list of
MediaScanFileInput records.
- Plugin: the server calls one of the plugin's RPC methods
(mediaScan.scanSingle or mediaScan.scanBatch) and parses the
response through a Zod schema. A malformed payload fails the task
cleanly without poisoning the database.
- Persist: the server writes any returned fingerprints to
the media_file_scan_fingerprints table and any returned chapters
to the chapters table.
Each task is a separate row in pipeline_tasks, so per-step
progress and timing are visible in the admin job inspector.
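The guarantee that a malformed payload never reaches the database can be sketched roughly as below. This is a dependency-free stand-in, not the server's code: the real server parses with a Zod schema, and `parseResponse`, `runPluginTask`, and the assumption that all three buckets must be present as arrays are hypothetical choices made for illustration.

```typescript
// Hypothetical stand-in for the server's Zod parsing step.
// Only the three buckets documented on this page are checked;
// unknown extra keys are ignored rather than rejected.
interface ParsedResponse {
  fingerprints: unknown[]
  chapters: unknown[]
  skipped: unknown[]
}

type TaskResult =
  | { ok: true; response: ParsedResponse }
  | { ok: false; error: string }

function parseResponse (payload: unknown): TaskResult {
  if (typeof payload !== 'object' || payload === null) {
    return { ok: false, error: 'payload is not an object' }
  }
  const record = payload as Record<string, unknown>
  for (const bucket of ['fingerprints', 'chapters', 'skipped']) {
    if (!Array.isArray(record[bucket])) {
      return { ok: false, error: `bucket "${bucket}" is missing or not an array` }
    }
  }
  return {
    ok: true,
    response: {
      fingerprints: record.fingerprints as unknown[],
      chapters: record.chapters as unknown[],
      skipped: record.skipped as unknown[],
    },
  }
}

// Persist only runs when parsing succeeded, so a malformed payload
// fails the task without touching the database.
function runPluginTask (
  payload: unknown,
  persist: (response: ParsedResponse) => void
): TaskResult {
  const result = parseResponse(payload)
  if (result.ok) persist(result.response)
  return result
}
```

Keeping validation and persistence as separate steps is what lets a failure in the plugin task leave earlier cache rows untouched.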
Declaring the capability
Add media-scan to the capabilities array in plugin.json. The
manifest's mediaTypes field gates which library types your plugin
applies to: a plugin without a mediaTypes entry runs against
every library; a plugin that lists ["movies", "tv"] is invoked
only for movie and TV libraries.
{
  "id": "aviato-intro-credits",
  "name": "Intro & Credits Detection",
  "version": "1.0.0",
  "description": "...",
  "author": "Aviato",
  "license": "MIT",
  "engine": "bun",
  "entry": "src/index.ts",
  "aviato": { "minVersion": "0.1.0" },
  "capabilities": ["media-scan"],
  "mediaTypes": ["movies", "tv"]
}
The server discovers all plugins with the media-scan capability
at startup, indexes them, and the scheduler calls every eligible
plugin in turn whenever a media-scan job runs. Multiple plugins
coexist: the chapter detector and a future face-detection plugin
both declare media-scan and both run on the same job, each
producing fingerprints under their own pluginId namespace and
chapters under whatever roles they own.
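The mediaTypes gating rule can be sketched as a small predicate. This is an illustration only: `PluginManifest`, `isEligible`, and `eligiblePlugins` are hypothetical names, and the server's actual discovery code may differ.

```typescript
// Hypothetical subset of the manifest fields relevant to gating.
interface PluginManifest {
  id: string
  capabilities: string[]
  mediaTypes?: string[]
}

function isEligible (manifest: PluginManifest, libraryType: string): boolean {
  // Plugins without the capability never take part in media scans.
  if (!manifest.capabilities.includes('media-scan')) return false
  // No mediaTypes entry means the plugin runs against every library.
  if (manifest.mediaTypes === undefined) return true
  return manifest.mediaTypes.includes(libraryType)
}

// Returns the ids of all plugins that would be called for one library type.
function eligiblePlugins (manifests: PluginManifest[], libraryType: string): string[] {
  return manifests.filter(m => isEligible(m, libraryType)).map(m => m.id)
}
```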
Per-library opt-in
A media scan plugin participates in a library's pipeline only when
the per-library toggle is on. This is the same pipelinePlugins
map every pipeline plugin uses (see
Configuration). A library record stores:
{
  "pipelinePlugins": {
    "your-plugin-id": { "enabled": true }
  }
}
The default is enabled. Setting enabled: false prevents the
scheduler from queueing scans for that library. The executor
re-checks this flag at run time, so a library admin can toggle
mid-queue without races.
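A minimal sketch of the toggle check follows. It assumes that a missing pipelinePlugins map, a missing plugin entry, and a missing enabled flag all fall back to the default; `LibraryRecord` and `isPluginEnabled` are hypothetical names.

```typescript
// Hypothetical subset of a library record.
interface LibraryRecord {
  pipelinePlugins?: Record<string, { enabled?: boolean } | undefined>
}

function isPluginEnabled (library: LibraryRecord, pluginId: string): boolean {
  // Anything short of an explicit `enabled: false` counts as enabled.
  return library.pipelinePlugins?.[pluginId]?.enabled ?? true
}
```

Calling the same predicate both when queueing and again at execution time is one way to get the re-check behaviour described above.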
Wire protocol
These are the types the server and plugin agree on. The canonical
copies live in
packages/server/src/libraries/media-scan/protocol.ts (server)
and plugins/intro-credits/src/schemas.ts (reference plugin).
File input
Every scan call passes an array of files (batch) or a single file:
interface MediaScanFileInput {
  fileId: string
  itemId: string
  /** Local filesystem path (already resolved by the server). */
  path: string
  /** Duration in seconds, from the ingestion probe. */
  duration: number
  /** Server-supplied cache, one entry per (type, algorithmVersion). */
  cachedFingerprints: Array<{
    type: string
    algorithmVersion: string
    fingerprint?: string | null
    metadata?: Record<string, unknown> | null
  }>
}
The plugin should consume cachedFingerprints opportunistically:
each entry whose algorithmVersion matches the plugin's current
version is reusable as is. Mismatched or missing entries trigger
re-fingerprinting, and the new fingerprints must be returned to the
server in the response so they get persisted.
Response shape
interface MediaScanResponse {
  /** New fingerprints to cache. The server upserts on
   * (fileId, pluginId, type) and stamps `algorithmVersion`. */
  fingerprints: Array<{
    fileId: string
    type: string
    algorithmVersion: string
    fingerprint?: string | null
    metadata?: Record<string, unknown> | null
  }>
  /** Chapter rows the server should persist. */
  chapters: Array<{
    fileId: string
    role: 'intro' | 'credits' | 'chapter' | 'scene'
    startTime: number
    endTime: number
    title?: string | null
    metadata?: Record<string, unknown> | null
  }>
  /** Per-file diagnostics for files the plugin chose not to write
   * output for. Surfaces in logs and the admin task inspector. */
  skipped: Array<{
    fileId: string
    reason: string
    message?: string
  }>
}
The buckets are independent. A plugin that produces only
fingerprints (and writes no chapters) leaves chapters: []. A
plugin that writes only chapters and has no caching needs leaves
fingerprints: []. New buckets (faces, objects, scene markers)
will be added in future server versions; the Zod parser tolerates
buckets it does not yet recognise.
Handler interface
The SDK exposes a MediaScanHandlers interface; pass it under the
media-scan key when calling createPlugin:
import type { MediaScanHandlers } from '@aviato/plugin-sdk'
import { createPlugin } from '@aviato/plugin-sdk'
const ALGORITHM_VERSION = '1'
const handlers: MediaScanHandlers = {
  algorithmVersion: async () => ({ algorithmVersion: ALGORITHM_VERSION }),
  scanSingle: async ({ file, hints }) => {
    // analyse a single file (movie, photo, isolated episode)
    return analyseOne(file, hints)
  },
  scanBatch: async ({ files, groupKey, hints }) => {
    // analyse a group of related files (typically a TV season)
    return analyseGroup(files, groupKey, hints)
  },
}
createPlugin({ 'media-scan': handlers })
The SDK wires three RPC methods automatically:
- mediaScan.algorithmVersion lets the server ask, before launching a scan, whether the plugin's current version matches the cached rows. Bumping the version invalidates all cached fingerprints for that plugin.
- mediaScan.scanBatch is called for TV episodes that share a season (server-side sibling resolution) and other multi-file groups.
- mediaScan.scanSingle is called for everything else (movies, photos, isolated TV episodes).
A plugin that supports only one mode may stub the other to throw or return an empty response. The server treats per-plugin failures as isolated: one buggy plugin does not stop the others' work from being persisted in the same job.
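A stub for the unsupported mode might look like the sketch below. The types are local, simplified copies of the response shape, and the `unsupported-mode` reason string is a hypothetical value, not one the server defines.

```typescript
// Simplified local copies of the response types, for illustration only.
interface FileRef { fileId: string }
interface SkippedEntry { fileId: string; reason: string; message?: string }
interface ScanResponse {
  fingerprints: unknown[]
  chapters: unknown[]
  skipped: SkippedEntry[]
}

// Batch-only plugin: the single-file mode does no work and says why,
// so the skip reason is visible in logs and the task inspector.
function scanSingleStub ({ file }: { file: FileRef }): ScanResponse {
  return {
    fingerprints: [],
    chapters: [],
    skipped: [{
      fileId: file.fileId,
      reason: 'unsupported-mode', // hypothetical reason string
      message: 'this plugin only analyses batches',
    }],
  }
}

// Inside the handlers object this could be wired as:
//   scanSingle: async (args) => scanSingleStub(args)
```

Returning an empty response with a skipped entry, rather than throwing, keeps the plugin task green while still recording why nothing was written.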
The fingerprint cache
Each row in media_file_scan_fingerprints belongs to one
(fileId, pluginId, type) triple. This means:
- Multiple types per file are independent rows. An
aviato-intro-credits row for type: 'intro' and another for
type: 'credits' coexist without coupling.
- Multiple plugins per file are independent rows. A future face
detection plugin can write
(fileId, 'aviato-face-detect', 'face-vector') rows alongside the
chapter detector's rows.
- The metadata column is a JSON sidecar. Use it for anything you
need to remember alongside the opaque blob (timing windows, model
identifiers, internal confidence numbers). Do not put large arrays
there; the column is intended for kilobytes, not megabytes.
The server is uninterested in what fingerprint actually contains.
It is an opaque text field. Most plugins use base64 of a binary
buffer; some write JSON; the only constraint is that it round-trips
through the cache verbatim.
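As an illustration of the base64 approach, the following sketch round-trips a Uint32Array through text. It is not the reference plugin's implementation: the function bodies are assumptions, and the byte order follows the host platform, so the sketch assumes the cache is written and read on the same architecture.

```typescript
// Encode the raw bytes of a Uint32Array as base64 text.
function uint32ArrayToBase64 (values: Uint32Array): string {
  const bytes = new Uint8Array(values.buffer, values.byteOffset, values.byteLength)
  let binary = ''
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i])
  return btoa(binary)
}

// Decode base64 text back into a Uint32Array.
// Throws when the decoded byte count is not a multiple of 4,
// which would indicate a truncated or foreign blob.
function base64ToUint32Array (text: string): Uint32Array {
  const binary = atob(text)
  if (binary.length % 4 !== 0) {
    throw new Error('fingerprint length is not a multiple of 4 bytes')
  }
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
  return new Uint32Array(bytes.buffer)
}
```

If the binary layout ever changes (for example, switching to a fixed byte order), that is exactly the kind of change that calls for a new algorithmVersion.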
Cache lookup pattern
The server supplies cached entries; the plugin decides whether to
reuse them. The reference implementation in
plugins/intro-credits/src/seasonAnalyzer.ts follows this pattern:
function findCached (file: MediaScanFileInput, type: string) {
  const entry = file.cachedFingerprints.find(c => c.type === type)
  if (!entry) return null
  if (entry.algorithmVersion !== ALGORITHM_VERSION) return null
  return entry.fingerprint ?? null
}
const cached = findCached(file, 'intro')
const fp = cached
  ? base64ToUint32Array(cached)
  : await fingerprintWindow(file.path, ...)
If a fresh fingerprint was computed, return it in
response.fingerprints so the server can write it back.
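Putting lookup and write-back together, a plugin might collect the rows it needs to return roughly as follows. This is a sketch under assumptions: `resolveFingerprints` and the injected `compute` callback are hypothetical, and `compute` is synchronous here only to keep the example short (real fingerprinting would be asynchronous).

```typescript
// Simplified local copies of the protocol types, for illustration only.
interface CachedFingerprint {
  type: string
  algorithmVersion: string
  fingerprint?: string | null
}
interface FileInput {
  fileId: string
  path: string
  cachedFingerprints: CachedFingerprint[]
}
interface FingerprintRow {
  fileId: string
  type: string
  algorithmVersion: string
  fingerprint: string
}

const ALGORITHM_VERSION = '1'

function findCached (file: FileInput, type: string): string | null {
  const entry = file.cachedFingerprints.find(c => c.type === type)
  if (!entry) return null
  if (entry.algorithmVersion !== ALGORITHM_VERSION) return null
  return entry.fingerprint ?? null
}

// Reuses cache hits and records every freshly computed fingerprint
// so it can be returned in response.fingerprints.
function resolveFingerprints (
  files: FileInput[],
  type: string,
  compute: (path: string) => string // hypothetical stand-in for the expensive step
): { byFile: Record<string, string>; fresh: FingerprintRow[] } {
  const byFile: Record<string, string> = {}
  const fresh: FingerprintRow[] = []
  for (const file of files) {
    const cached = findCached(file, type)
    if (cached !== null) {
      byFile[file.fileId] = cached
      continue
    }
    const fingerprint = compute(file.path)
    byFile[file.fileId] = fingerprint
    fresh.push({ fileId: file.fileId, type, algorithmVersion: ALGORITHM_VERSION, fingerprint })
  }
  return { byFile, fresh }
}
```

Only the `fresh` rows go back to the server; cache hits are already persisted and do not need to be re-sent.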
Algorithm versioning
algorithmVersion is the plugin's contract with itself: bump it
whenever a code change makes existing cached fingerprints unusable.
Examples that warrant a bump:
- Changing the audio extraction window length or sample rate.
- Switching the matcher to a different similarity score.
- Changing the binary layout of the cached blob.
Examples that do not warrant a bump:
- Tightening minimum or maximum duration thresholds.
- Tweaking comparison sensitivity that still consumes the same fingerprint shape.
The server stamps every cache row with its algorithmVersion, so a
bump silently invalidates rows written under the old version. They
are overwritten the next time the file is scanned; no eager
deletion is required.
Concurrency and timeouts
The media scan job type is registered with
globalDefault: 1, perItem: 1. Only one scan runs across the
entire server at a time. This is deliberate: ffmpeg and model
inference saturate a CPU core per process, and queueing many
fingerprint jobs in parallel produces no speedup on a typical
host while starving the rest of the ingestion pipeline.
The plugin call timeout is 15 minutes. A 24-episode season fingerprinted from cold typically completes in 5 to 10 minutes; the ceiling exists to recover from a hung ffmpeg subprocess without trapping the worker indefinitely.
If your plugin needs more than 15 minutes for a single call, split the work yourself: have the plugin process N files at a time internally and return early. The server will queue the next item.
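One way to bound the work done in a single call is to cap the number of cold files per call and report the rest as skipped. The sketch below is an assumption-laden illustration: `planCall`, `MAX_COLD_FILES_PER_CALL`, and the `deferred` reason string are hypothetical, and it assumes a later scan revisits the deferred files, at which point the files already fingerprinted are served from the cache.

```typescript
// Simplified local copies of the protocol types, for illustration only.
interface FileInput {
  fileId: string
  cachedFingerprints: Array<{ type: string; algorithmVersion: string }>
}
interface Skipped { fileId: string; reason: string; message?: string }

const ALGORITHM_VERSION = '1'
const MAX_COLD_FILES_PER_CALL = 8 // assumption: tune so a call stays under the timeout

function hasFreshFingerprint (file: FileInput, type: string): boolean {
  return file.cachedFingerprints.some(
    c => c.type === type && c.algorithmVersion === ALGORITHM_VERSION
  )
}

// Splits a batch into files to fingerprint now and files left for later.
// Files with a current cached fingerprint cost nothing and are not counted.
function planCall (
  files: FileInput[],
  type: string,
  limit: number = MAX_COLD_FILES_PER_CALL
): { toFingerprint: FileInput[]; skipped: Skipped[] } {
  const toFingerprint: FileInput[] = []
  const skipped: Skipped[] = []
  for (const file of files) {
    if (hasFreshFingerprint(file, type)) continue
    if (toFingerprint.length < limit) {
      toFingerprint.push(file)
    } else {
      skipped.push({
        fileId: file.fileId,
        reason: 'deferred', // hypothetical reason string
        message: 'left for a later call',
      })
    }
  }
  return { toFingerprint, skipped }
}
```

Because fresh fingerprints are returned and cached after each call, every subsequent call has fewer cold files to process.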
Designing your fingerprint type
Two design rules carry over from the bundled plugin and apply to any new media scan plugin:
- One type per row, one row per type. Do not pack two
logical results (an intro fingerprint and a credits
fingerprint, or a face vector and an object embedding) into the
same row. The composite key (fileId, pluginId, type) exists so
each can be invalidated and rewritten independently.
- Fingerprints are private to your plugin. The server stores them;
only your plugin reads them. There is no shared interop format. If
you need other code to consume your output, write it to chapters
(or to a future typed bucket) instead.
Diagnostics
While a scan is running its task rows show up in pipeline_tasks:
- prepare (system task, owner __system:media-scan)
- plugin (one row per dispatched plugin, owned by that plugin's id; if three plugins are eligible, three plugin rows appear, each timed and tracked independently)
- persist (system task)
The deltas returned from each task body are recorded as task metadata, so the admin job inspector can show file counts, fingerprint counts, and skipped reasons without parsing logs.