AI for
Stories that Matter
A collaboration — February 2026
The Foundation
Knowledge Graph +
Vector Store
Turn 20 years of transcripts and metadata into a
searchable, interconnected, intelligent library
that powers everything we build next.
Transcripts → Clean + Enrich → Knowledge Graph → Vector Search → AI Products
topics · people · locations · themes · emotional tone · narrative structure · key quotes · techniques · contributors · summaries
How It Works
The Pipeline
Step 1
Parse Transcripts
Extract subtitles from the proprietary XML format. Convert frame-based timecodes to standard timestamps. Clean into chunked, searchable text.
MySQL → Python parser
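Since the transcript XML schema hasn't been shared yet, here is a minimal sketch of what the parser could look like. The `<subtitle start end>` element layout, the `HH:MM:SS:FF` timecode format, and the 29.97 fps frame rate are all placeholder assumptions to swap for the real format:

```python
import xml.etree.ElementTree as ET

FPS = 29.97  # assumed NTSC frame rate; replace with the library's actual rate

def frames_to_seconds(tc: str, fps: float = FPS) -> float:
    """Convert a frame-based 'HH:MM:SS:FF' timecode to seconds."""
    h, m, s, f = (int(p) for p in tc.split(":"))
    return h * 3600 + m * 60 + s + f / fps

def parse_transcript(xml_text: str) -> list:
    """Parse a hypothetical <subtitle start="..." end="...">text</subtitle>
    layout into cue dicts with timestamps in seconds."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("subtitle"):
        cues.append({
            "start": frames_to_seconds(node.get("start")),
            "end": frames_to_seconds(node.get("end")),
            "text": " ".join(node.itertext()).strip(),
        })
    return cues

def chunk_cues(cues: list, max_chars: int = 500) -> list:
    """Merge consecutive cues into searchable chunks, keeping the
    start/end timestamps so results can jump into the video."""
    chunks, buf, start = [], [], None
    for cue in cues:
        if start is None:
            start = cue["start"]
        buf.append(cue["text"])
        if sum(len(t) for t in buf) >= max_chars:
            chunks.append({"start": start, "end": cue["end"],
                           "text": " ".join(buf)})
            buf, start = [], None
    if buf:
        chunks.append({"start": start, "end": cues[-1]["end"],
                       "text": " ".join(buf)})
    return chunks
```

Once a sample transcript is in hand, only `frames_to_seconds` and the element names should need adjusting.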
Step 2
Enrich with AI
Run each transcript through a small LLM to extract topics, people, locations, themes, tone, narrative structure, key quotes, and summaries.
OpenRouter API
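The enrichment call would hit OpenRouter's OpenAI-compatible chat endpoint. A sketch, with the model slug and the exact field list as assumptions to tune later:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "openai/gpt-4o-mini"  # assumed slug; any small OpenRouter model works

FIELDS = ("topics", "people", "locations", "themes", "emotional_tone",
          "narrative_structure", "key_quotes", "summary")

def build_messages(transcript: str) -> list:
    """System + user messages asking the model for strict-JSON enrichment."""
    keys = ", ".join(FIELDS)
    return [
        {"role": "system",
         "content": "You extract documentary metadata. Return only a JSON "
                    f"object with these keys: {keys}."},
        {"role": "user", "content": transcript},
    ]

def enrich(transcript: str, api_key: str) -> dict:
    """Call OpenRouter and parse the JSON enrichment (network call)."""
    payload = json.dumps({"model": MODEL,
                          "messages": build_messages(transcript)}).encode()
    req = urllib.request.Request(
        OPENROUTER_URL, data=payload,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```

Long transcripts would be enriched per-chunk and merged; the prompt above is the simplest whole-transcript version.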
Step 3
Store & Connect
Write enriched data into a relational schema. Stories linked to entities via junction tables — forming the knowledge graph.
Supabase Postgres
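The junction-table idea can be sketched as below. Table and column names are placeholders, and the helper flattens one story's enrichment into rows for the `story_entities` junction table:

```python
# Assumed Postgres schema for the knowledge graph (names are placeholders):
SCHEMA_DDL = """
create table stories (id bigint primary key, title text, summary text);
create table entities (id bigserial primary key, kind text, name text,
                       unique (kind, name));
create table story_entities (
    story_id  bigint references stories(id),
    entity_id bigint references entities(id),
    primary key (story_id, entity_id)
);
"""

# Map LLM enrichment fields to entity kinds in the graph.
KINDS = {"topics": "topic", "people": "person",
         "locations": "location", "themes": "theme"}

def entity_rows(story_id: int, enrichment: dict) -> list:
    """Flatten one story's enrichment dict into (story_id, kind, name)
    tuples, ready to upsert into entities + story_entities."""
    rows = []
    for field, kind in KINDS.items():
        for name in enrichment.get(field, []):
            rows.append((story_id, kind, name.strip()))
    return rows
```

Because entities are unique on `(kind, name)`, two stories tagged with the same person or theme share one entity row, which is what makes the graph traversable in both directions.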
Step 4
Vectorize
Embed transcript chunks and summaries using open-source models. Store vectors alongside source text for semantic similarity search.
HuggingFace BGE → pgvector
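A sketch of the embedding step, assuming `BAAI/bge-small-en-v1.5` (384 dimensions) and a `chunks` table; the sentence-transformers import is kept inside the function since it's a heavyweight optional dependency:

```python
# Assumed pgvector setup for the chunks table (384 dims = bge-small-en-v1.5).
PGVECTOR_DDL = """
create extension if not exists vector;
alter table chunks add column if not exists embedding vector(384);
create index on chunks using hnsw (embedding vector_cosine_ops);
"""

def embed_chunks(texts: list) -> list:
    """Embed chunk texts with BAAI/bge-small-en-v1.5 (requires the
    sentence-transformers package; normalized for cosine similarity)."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("BAAI/bge-small-en-v1.5")
    return model.encode(texts, normalize_embeddings=True).tolist()

def to_pgvector(vec: list) -> str:
    """Serialize a vector as a pgvector text literal, e.g. '[0.25,0.5]'."""
    return "[" + ",".join(repr(x) for x in vec) + "]"
```

Normalizing embeddings means cosine distance and inner product agree, so either pgvector operator can back the index.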
Step 5
Build RAG Pipeline
Query → embed → retrieve similar chunks + structured filters → generate grounded answers with citations and timestamps.
Hybrid search + OpenRouter
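The retrieve-then-generate loop above can be sketched as a hybrid query plus a grounded prompt. The SQL uses psycopg-style parameters and the placeholder table names from the schema sketch; the entity filter is optional so a plain semantic query still works:

```python
# Vector ranking with an optional knowledge-graph filter (assumed schema).
HYBRID_SQL = """
select c.story_id, c.start_ts, c.text
from chunks c
where %(kind)s is null or exists (
      select 1
      from story_entities se
      join entities e on e.id = se.entity_id
      where se.story_id = c.story_id
        and e.kind = %(kind)s
        and (%(name)s is null or e.name = %(name)s))
order by c.embedding <=> %(qvec)s::vector
limit %(k)s;
"""

def answer(question, retrieve, generate, k=8):
    """Query -> retrieve similar chunks -> generate a grounded answer.
    `retrieve` and `generate` are injected so the pipeline is testable
    without a database or an LLM."""
    hits = retrieve(question, k)
    context = "\n".join(
        f"[{h['story_id']} @ {h['start_ts']}s] {h['text']}" for h in hits)
    prompt = ("Answer using only the excerpts below. "
              "Cite sources as [story @ time].\n\n"
              f"{context}\n\nQuestion: {question}")
    return generate(prompt)
```

Because each excerpt carries its story id and timestamp into the prompt, the model can cite clips the interface can deep-link to.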
Step 6
Ship the Demo
Deploy a working interface where you can ask questions, explore connections, and find clips across the entire library.
HuggingFace Space + custom domain
Your Existing Stack
MySQL stays.
Supabase adds.
Nothing changes about your platform. Supabase is an AI layer alongside MySQL — not a replacement.
MySQL 5.6 — Stays As-Is
- Platform & content management
- Video metadata & library
- User authentication & accounts
- All existing functionality
Supabase — New AI Layer
- Vector search & semantic queries
- Knowledge graph relationships
- Embeddings & hybrid search
- RAG pipeline for AI answers
Architecture
How They Work Together
A one-time export seeds the AI layer. A lightweight sync function keeps it current. Zero changes to your existing platform.
Source of Truth
MySQL
- Stories & projects
- Video metadata
- Transcripts (XML)
- Users & permissions
→
Sync Function
cron job or webhook — watches for updates
AI Layer
Supabase
- Enriched metadata
- Vector embeddings
- Knowledge graph
- Search & RAG
Key: No migration. No downtime. No changes to the PHP codebase.
What It Unlocks
Possibilities
For Audiences
- “Stories about families affected by displacement”
Semantic search across the full library. Understands themes, not just keywords.
- “The moment the rhino keeper talks about loss”
Clip-level results with exact timestamps. Jump straight into any video.
- “How does MediaStorm structure a character-driven doc?”
Workshop companion that teaches craft through real examples from 350+ productions.
For Clients & As a Service
- “We want this for our own video archive”
The same AI enrichment pipeline, offered as a service to Platform licensees. This demo is the proof of concept.
- “Find all our conservation content for the annual report”
Clients search their own enriched libraries — instant portfolio discovery for pitches, grants, and impact reports.
- “Every video we upload should be auto-enriched”
New uploads get instant AI metadata — topics, people, locations, summaries. Hours of manual tagging, gone.
To Get Started
What I Need
Data Access
- Read-only MySQL access or database dump
- Sample transcript data (to build the XML parser)
- Data sensitivity or rights restrictions to be aware of
- Priority audience: Channel, Clients, or Training?
API Keys & Services
- OpenRouter API key (LLM enrichment & RAG)
- Supabase project (admin access, or I host and bill annually)
- Vimeo API token (for thumbnails & re-transcription if needed)
Filled items are blockers; open items improve output quality.
Estimated cost: OpenRouter ~$5–15 for full library enrichment (one-time) · Supabase free tier for the prototype, ~$25/mo in production · HF Space free, + $5/mo for persistent storage
The Plan
3-Day Build
Day 1
Extract & Enrich
Parse XML transcripts into clean timestamped text. Run LLM enrichment for topics, people, locations, themes, tone, structure, quotes. Build Supabase schema.
Day 2
Vectorize & Connect
Embed all text with HuggingFace models into pgvector. Build hybrid search with structured filters. Construct knowledge graph relationships. Wire up RAG pipeline.
Day 3
Demo & Ship
Build demo interface on HuggingFace Space. Conversational search, story explorer, clip finder. Test with real queries. Deploy with custom domain.
Future bonus: Fine-tune a “MediaStorm Voice” model on the cleaned corpus — a model that writes in 20 years of award-winning editorial style.
Looking Ahead
MediaStorm Narrative MCP
An MCP connector that plugs into ChatGPT or Claude. Users get feedback and improve their stories using the MediaStorm archive as a reference.
“How have great docs structured stories about loss?”
Narrative patterns and structural analysis from real productions.
“My second act feels flat — what works here?”
Coaching on structure, pacing, tone, and transitions from the archive.
Benefits
- Exclusive to MediaStorm members & subscribers
- Innovation & visibility in the AI storytelling space
- A model Platform clients can replicate for their own archives
A Request
Case Study &
Demo Permission
I’d like to use this project as a case study and live demo when consulting with other media organizations on AI archive enrichment.
MediaStorm’s name and brand would be credited as the flagship implementation. No proprietary content would be exposed — only the enriched metadata, search capabilities, and pipeline architecture.
Let's build something
awesome.