AI for
Stories that Matter
A collaboration — February 2026
The Foundation
Knowledge Graph +
Vector Store
Turn 20 years of transcripts and metadata into a
searchable, interconnected, intelligent library
that powers everything we build next.
Transcripts → Clean + Enrich → Knowledge Graph → Vector Search → AI Products
topics · people · locations · themes · emotional tone · narrative structure · key quotes · techniques · contributors · summaries
How It Works
The Pipeline
Step 1
Parse Transcripts
Extract subtitles from the proprietary XML format. Convert frame-based timecodes to standard timestamps. Clean into chunked, searchable text.
MySQL → Python parser
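Since the transcript XML schema hasn't been shared yet, here is a minimal sketch of what the parser could look like. The `<subtitle start end>` element layout, the `HH:MM:SS:FF` timecode format, and the 29.97 fps frame rate are all placeholder assumptions to swap for the real format:

```python
import xml.etree.ElementTree as ET

FPS = 29.97  # assumed NTSC frame rate; replace with the library's actual rate

def frames_to_seconds(tc: str, fps: float = FPS) -> float:
    """Convert a frame-based 'HH:MM:SS:FF' timecode to seconds."""
    h, m, s, f = (int(p) for p in tc.split(":"))
    return h * 3600 + m * 60 + s + f / fps

def parse_transcript(xml_text: str) -> list:
    """Parse a hypothetical <subtitle start="..." end="...">text</subtitle>
    layout into cue dicts with timestamps in seconds."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("subtitle"):
        cues.append({
            "start": frames_to_seconds(node.get("start")),
            "end": frames_to_seconds(node.get("end")),
            "text": " ".join(node.itertext()).strip(),
        })
    return cues

def chunk_cues(cues: list, max_chars: int = 500) -> list:
    """Merge consecutive cues into searchable chunks, keeping the
    start/end timestamps so results can jump into the video."""
    chunks, buf, start = [], [], None
    for cue in cues:
        if start is None:
            start = cue["start"]
        buf.append(cue["text"])
        if sum(len(t) for t in buf) >= max_chars:
            chunks.append({"start": start, "end": cue["end"],
                           "text": " ".join(buf)})
            buf, start = [], None
    if buf:
        chunks.append({"start": start, "end": cues[-1]["end"],
                       "text": " ".join(buf)})
    return chunks
```

Once a sample transcript is in hand, only `frames_to_seconds` and the element names should need adjusting.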
Step 2
Enrich with AI
Run each transcript through a small LLM to extract topics, people, locations, themes, tone, narrative structure, key quotes, and summaries.
OpenRouter API
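The enrichment call would hit OpenRouter's OpenAI-compatible chat endpoint. A sketch, with the model slug and the exact field list as assumptions to tune later:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "openai/gpt-4o-mini"  # assumed slug; any small OpenRouter model works

FIELDS = ("topics", "people", "locations", "themes", "emotional_tone",
          "narrative_structure", "key_quotes", "summary")

def build_messages(transcript: str) -> list:
    """System + user messages asking the model for strict-JSON enrichment."""
    keys = ", ".join(FIELDS)
    return [
        {"role": "system",
         "content": "You extract documentary metadata. Return only a JSON "
                    f"object with these keys: {keys}."},
        {"role": "user", "content": transcript},
    ]

def enrich(transcript: str, api_key: str) -> dict:
    """Call OpenRouter and parse the JSON enrichment (network call)."""
    payload = json.dumps({"model": MODEL,
                          "messages": build_messages(transcript)}).encode()
    req = urllib.request.Request(
        OPENROUTER_URL, data=payload,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```

Long transcripts would be enriched per-chunk and merged; the prompt above is the simplest whole-transcript version.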
Step 3
Store & Connect
Write enriched data into a relational schema. Stories linked to entities via junction tables — forming the knowledge graph.
Supabase Postgres
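The junction-table idea can be sketched as below. Table and column names are placeholders, and the helper flattens one story's enrichment into rows for the `story_entities` junction table:

```python
# Assumed Postgres schema for the knowledge graph (names are placeholders):
SCHEMA_DDL = """
create table stories (id bigint primary key, title text, summary text);
create table entities (id bigserial primary key, kind text, name text,
                       unique (kind, name));
create table story_entities (
    story_id  bigint references stories(id),
    entity_id bigint references entities(id),
    primary key (story_id, entity_id)
);
"""

# Map LLM enrichment fields to entity kinds in the graph.
KINDS = {"topics": "topic", "people": "person",
         "locations": "location", "themes": "theme"}

def entity_rows(story_id: int, enrichment: dict) -> list:
    """Flatten one story's enrichment dict into (story_id, kind, name)
    tuples, ready to upsert into entities + story_entities."""
    rows = []
    for field, kind in KINDS.items():
        for name in enrichment.get(field, []):
            rows.append((story_id, kind, name.strip()))
    return rows
```

Because entities are unique on `(kind, name)`, two stories tagged with the same person or theme share one entity row, which is what makes the graph traversable in both directions.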
Step 4
Vectorize
Embed transcript chunks and summaries using open-source models. Store vectors alongside source text for semantic similarity search.
HuggingFace BGE → pgvector
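A sketch of the embedding step, assuming `BAAI/bge-small-en-v1.5` (384 dimensions) and a `chunks` table; the sentence-transformers import is kept inside the function since it's a heavyweight optional dependency:

```python
# Assumed pgvector setup for the chunks table (384 dims = bge-small-en-v1.5).
PGVECTOR_DDL = """
create extension if not exists vector;
alter table chunks add column if not exists embedding vector(384);
create index on chunks using hnsw (embedding vector_cosine_ops);
"""

def embed_chunks(texts: list) -> list:
    """Embed chunk texts with BAAI/bge-small-en-v1.5 (requires the
    sentence-transformers package; normalized for cosine similarity)."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("BAAI/bge-small-en-v1.5")
    return model.encode(texts, normalize_embeddings=True).tolist()

def to_pgvector(vec: list) -> str:
    """Serialize a vector as a pgvector text literal, e.g. '[0.25,0.5]'."""
    return "[" + ",".join(repr(x) for x in vec) + "]"
```

Normalizing embeddings means cosine distance and inner product agree, so either pgvector operator can back the index.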
Step 5
Build RAG Pipeline
Query → embed → retrieve similar chunks + structured filters → generate grounded answers with citations and timestamps.
Hybrid search + OpenRouter
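The retrieve-then-generate loop above can be sketched as a hybrid query plus a grounded prompt. The SQL uses psycopg-style parameters and the placeholder table names from the schema sketch; the entity filter is optional so a plain semantic query still works:

```python
# Vector ranking with an optional knowledge-graph filter (assumed schema).
HYBRID_SQL = """
select c.story_id, c.start_ts, c.text
from chunks c
where %(kind)s is null or exists (
      select 1
      from story_entities se
      join entities e on e.id = se.entity_id
      where se.story_id = c.story_id
        and e.kind = %(kind)s
        and (%(name)s is null or e.name = %(name)s))
order by c.embedding <=> %(qvec)s::vector
limit %(k)s;
"""

def answer(question, retrieve, generate, k=8):
    """Query -> retrieve similar chunks -> generate a grounded answer.
    `retrieve` and `generate` are injected so the pipeline is testable
    without a database or an LLM."""
    hits = retrieve(question, k)
    context = "\n".join(
        f"[{h['story_id']} @ {h['start_ts']}s] {h['text']}" for h in hits)
    prompt = ("Answer using only the excerpts below. "
              "Cite sources as [story @ time].\n\n"
              f"{context}\n\nQuestion: {question}")
    return generate(prompt)
```

Because each excerpt carries its story id and timestamp into the prompt, the model can cite clips the interface can deep-link to.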
Step 6
Ship the Demo
Deploy a working interface where you can ask questions, explore connections, and find clips across the entire library.
HuggingFace Space + custom domain
Your Existing Stack
MySQL stays.
Supabase adds.
Nothing changes about your platform. Supabase is an AI layer alongside MySQL — not a replacement.
MySQL 5.6 — Stays As-Is
- Platform & content management
- Video metadata & library
- User authentication & accounts
- All existing functionality
Supabase — New AI Layer
- Vector search & semantic queries
- Knowledge graph relationships
- Embeddings & hybrid search
- RAG pipeline for AI answers
Architecture
How They Work Together
A one-time export seeds the AI layer. A lightweight sync function keeps it current. Zero changes to your existing platform.
Source of Truth
MySQL
- Stories & projects
- Video metadata
- Transcripts (XML)
- Users & permissions
→
Sync Function
cron job or webhook — watches for updates
AI Layer
Supabase
- Enriched metadata
- Vector embeddings
- Knowledge graph
- Search & RAG
Key: No migration. No downtime. No changes to the PHP codebase.
What It Unlocks
Possibilities
For Audiences
- “Stories about families affected by displacement”
Semantic search across the full library. Understands themes, not just keywords.
- “The moment the rhino keeper talks about loss”
Clip-level results with exact timestamps. Jump straight into any video.
- “How does MediaStorm structure a character-driven doc?”
Workshop companion that teaches craft through real examples from 350+ productions.
For Clients & As a Service
- “We want this for our own video archive”
The same AI enrichment pipeline, offered as a service to Platform licensees. This demo is the proof of concept.
- “Find all our conservation content for the annual report”
Clients search their own enriched libraries — instant portfolio discovery for pitches, grants, and impact reports.
- “Every video we upload should be auto-enriched”
New uploads get instant AI metadata — topics, people, locations, summaries. Hours of manual tagging, gone.
To Get Started
What I Need
Data Access
- Read-only MySQL access or database dump
- Sample transcript data (to build the XML parser)
- Data sensitivity or rights restrictions to be aware of
- Priority audience: Channel, Clients, or Training?
API Keys & Services
- OpenRouter API key (LLM enrichment & RAG)
- Supabase project (admin access, or I host and bill annually)
- Vimeo API token (for thumbnails & re-transcription if needed)
Filled items are blockers; open items improve output quality.
Estimated cost: OpenRouter ~$5–15 for full library enrichment (one-time) · Supabase free tier for the prototype, ~$25/mo in production · HF Space free, + $5/mo for persistent storage
The Plan
3-Day Build
Day 1
Extract & Enrich
Parse XML transcripts into clean timestamped text. Run LLM enrichment for topics, people, locations, themes, tone, structure, quotes. Build Supabase schema.
Day 2
Vectorize & Connect
Embed all text with HuggingFace models into pgvector. Build hybrid search with structured filters. Construct knowledge graph relationships. Wire up RAG pipeline.
Day 3
Demo & Ship
Build demo interface on HuggingFace Space. Conversational search, story explorer, clip finder. Test with real queries. Deploy with custom domain.
Future bonus: Fine-tune a “MediaStorm Voice” model on the cleaned corpus — a model that writes in 20 years of award-winning editorial style.
Looking Ahead
MediaStorm Narrative MCP
An MCP connector that plugs into ChatGPT or Claude. Users get feedback and improve their stories using the MediaStorm archive as a reference.
“How have great docs structured stories about loss?”
Narrative patterns and structural analysis from real productions.
“My second act feels flat — what works here?”
Coaching on structure, pacing, tone, and transitions from the archive.
Benefits
- Exclusive to MediaStorm members & subscribers
- Innovation & visibility in the AI storytelling space
- A model Platform clients can replicate for their own archives
A Request
Case Study &
Demo Permission
I’d like to use this project as a case study and live demo when consulting with other media organizations on AI archive enrichment.
MediaStorm’s name and brand would be credited as the flagship implementation. No proprietary content would be exposed — only the enriched metadata, search capabilities, and pipeline architecture.
Let's build something
awesome.