Building a content pipeline in public: month one

Started: March 26, 2026. This is the month one retrospective. Raw numbers, honest failures, and the stuff that actually worked.

689 commits
143 modules
52K lines of code
2,062 videos produced
73 uploaded to YouTube
$52 total API cost

The system runs on a $6/month VPS. Total infrastructure cost for month one: $58. For that, the system produced 2,062 candidate videos, ran every one of them through a 7-agent quality panel, and uploaded the 73 that passed every gate.

What went right

Extraction quality

The core extraction pipeline -- the part that watches a VOD and identifies the interesting segments -- works better than expected. On roleplay content, we're hitting an 80% RP ratio on arc detection. That means 8 out of 10 extracted segments contain actual storyline content, not filler. For context, manual editors typically work at ~90% accuracy but take 4-6 hours per stream. We process a stream in under 30 minutes.
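As a rough sketch of what "identify the interesting segments" means mechanically: score fixed transcript windows, then merge adjacent high-scoring windows into candidate arcs. The `score_window` callable, the threshold, and the gap tolerance below are illustrative assumptions, not the pipeline's real interface.

```python
# Illustrative sketch of arc detection: score transcript windows, then merge
# adjacent high-scoring windows into candidate "arc" segments.
# score_window() is a stand-in for the model call; thresholds are invented.

def extract_arcs(windows, score_window, threshold=0.7, max_gap=1):
    """windows: list of (start_sec, end_sec, text) tuples in VOD order."""
    arcs, current, gap = [], None, 0
    for start, end, text in windows:
        if score_window(text) >= threshold:
            # extend the open arc, or start a new one
            current = (current[0], end) if current else (start, end)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:          # too much filler in a row: close the arc
                arcs.append(current)
                current, gap = None, 0
    if current:
        arcs.append(current)
    return arcs
```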

The review system

Seven AI agents reviewing every video independently before upload. This is the feature that makes the whole thing viable. Without it, we'd be uploading 2,062 videos of wildly varying quality. With it, we upload 73 that all clear a quality bar. The kill rate -- roughly 96% of candidates never ship -- is a feature. Every killed video is a video that would have hurt the channel.
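To make the gate concrete, here's a minimal sketch of the "must clear every gate" logic. Only the Title Accuracy and Distinctiveness agents are named in this post; the other agent names and the `review()` interface are invented for the example.

```python
# Illustrative sketch: a candidate uploads only if every reviewer passes it.
# Only "title_accuracy" and "distinctiveness" are named in the post; the rest
# of the agent names and the review() callable are assumptions.

REVIEW_AGENTS = [
    "title_accuracy", "distinctiveness",
    "pacing", "hook_strength", "audio_quality",
    "thumbnail_fit", "policy_safety",
]

def passes_panel(candidate, review) -> bool:
    """review(agent_name, candidate) -> (passed: bool, reason: str)."""
    for agent in REVIEW_AGENTS:
        passed, reason = review(agent, candidate)
        if not passed:
            print(f"killed by {agent}: {reason}")   # one veto kills the upload
            return False
    return True
```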

Automation depth

26 cron jobs running 24/7. VOD detection, download, extraction, assembly, review, upload, Discord notifications, health checks, analytics collection, cleanup. The system runs without human intervention. The creator gets a Discord digest with the day's reviewed videos and clicks "publish" on the ones they like. That's the entire human touchpoint.
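The human touchpoint boils down to one notification a day. A minimal sketch of the digest step, assuming a standard Discord webhook; the URL and the video fields are placeholders, not the real config.

```python
# Minimal sketch of the daily Discord digest via webhook.
# DISCORD_WEBHOOK_URL and the video dict fields are placeholders.

import requests

DISCORD_WEBHOOK_URL = "https://discord.com/api/webhooks/..."  # placeholder

def send_digest(approved_videos):
    """approved_videos: list of dicts with 'title' and 'score' keys (assumed)."""
    lines = [f"- {v['title']} (score {v['score']})" for v in approved_videos]
    content = "Today's reviewed videos, ready to publish:\n" + "\n".join(lines)
    requests.post(DISCORD_WEBHOOK_URL, json={"content": content}, timeout=10)
```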

What went wrong

Title hallucination

The single worst failure mode. The AI reads a transcript, finds a brief mention of something dramatic, and writes a title about it. But the actual video content is about something completely different. We caught a video titled "Cop's CI just exposed the criminal character" where the actual content was the streamer talking about Walmart. The Title Accuracy agent catches most of these now, but the first week was rough. We killed dozens of videos that would have been misleading clickbait.
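The fix, roughly, is to make a separate model judge the title against the transcript of the final cut rather than the full VOD. This is a hedged sketch of that kind of check, not the agent's actual prompt or interface.

```python
# Hedged sketch of a title-accuracy gate: a separate model is asked whether
# the title is supported by the transcript of the final cut.
# ask_model(prompt) -> str is a stand-in, not the real reviewer interface.

def title_is_accurate(title: str, final_cut_transcript: str, ask_model) -> bool:
    prompt = (
        "Does the transcript below actually support this video title, "
        "or is the title about something only briefly mentioned?\n"
        f"TITLE: {title}\n"
        f"TRANSCRIPT:\n{final_cut_transcript}\n"
        "Answer exactly SUPPORTED or MISLEADING."
    )
    return ask_model(prompt).strip().upper().startswith("SUPPORTED")
```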

Shorts flooding

Early config: extract up to 25 Shorts per VOD. That was way too many. A 6-hour stream might have 5-8 genuinely good Short moments, not 25. We were extracting filler -- mid-sentence clips, awkward transitions, reaction shots with no context. Dialed it back to 8-12 per VOD and quality jumped immediately. The lesson: volume is not value.
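Concretely, the change was a tighter cap plus selecting by score instead of filling a quota. A sketch under assumed field names and thresholds:

```python
# Sketch of the revised Shorts selection: hard cap plus a score floor,
# so a quiet VOD yields few Shorts instead of padded filler.
# Field names and thresholds are illustrative, not the real config.

MAX_SHORTS_PER_VOD = 10    # was 25 in the early config; now in the 8-12 range
MIN_MOMENT_SCORE = 0.75    # illustrative floor: weak moments just don't ship

def select_shorts(candidate_moments):
    """candidate_moments: list of dicts with a 'score' key, one per clip."""
    strong = [m for m in candidate_moments if m["score"] >= MIN_MOMENT_SCORE]
    strong.sort(key=lambda m: m["score"], reverse=True)
    return strong[:MAX_SHORTS_PER_VOD]
```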

The scoping bug

Week two. A variable scoping bug in the pipeline caused the extraction module to process the same VOD segment repeatedly, generating duplicate videos. The review system caught most of them (Distinctiveness agent flagged the duplicates), but a few slipped through before we found the root cause. Three identical videos uploaded in a row. Not a great look.
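The real bug isn't worth reproducing here, but the class of mistake is familiar: a variable that should be rebound inside the loop keeps its value from a previous iteration. A minimal Python illustration of the shape of it (not our actual code):

```python
# Illustration of the failure class, not the actual bug: the segment variable
# is bound once before the loop, so every iteration processes the same segment.

def render_video(segment):
    """Stand-in for the real extract/assemble step."""
    return f"video for {segment}"

def process_vod_buggy(vod_segments):
    segment = vod_segments[0]            # bound once, before the loop
    results = []
    for _ in vod_segments:
        # BUG: 'segment' is never rebound, so segment 0 is rendered every time
        results.append(render_video(segment))
    return results

def process_vod_fixed(vod_segments):
    return [render_video(segment) for segment in vod_segments]  # rebound each pass
```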

Auth endpoint blocked by middleware

We built a magic link authentication system for the web dashboard. Worked perfectly in development. Deployed to production and the auth endpoint returned 403 for every request. Spent four hours debugging before realizing our security middleware was blocking the endpoint because requests to it arrived without a session token -- which is the whole point of a login endpoint. A one-line middleware exclusion fixed it.
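The fix is boring once you see it: the login route has to be exempt from the session check, because a request to it cannot have a session yet. A Flask-flavored sketch of the shape of the fix; our actual stack and route names differ.

```python
# Flask-style sketch (not our exact stack): exempt the login endpoint from the
# session check, since requests to it cannot carry a session token yet.

from flask import Flask, abort, request, session

app = Flask(__name__)
app.secret_key = "dev-only"               # placeholder

PUBLIC_ENDPOINTS = {"magic_link_login"}   # the one-line exclusion

@app.before_request
def require_session():
    if request.endpoint in PUBLIC_ENDPOINTS:
        return                            # login requests pass through
    if not session.get("user_id"):
        abort(403)

@app.route("/auth/magic-link", methods=["POST"], endpoint="magic_link_login")
def magic_link_login():
    ...                                   # issue or verify the magic-link token
    return {"ok": True}
```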

Biggest lesson

The system that produces content and the system that judges content must be separate.

This sounds obvious but the temptation is strong. You have an AI that writes titles -- why not have it also evaluate whether the titles are good? Because it approves its own mistakes. It wrote "Cop's CI just exposed [character name]" and when asked "is this title accurate?" it said yes, because it used the same reasoning that produced the title in the first place.

Separation of concerns isn't just a software architecture principle. It's a quality control principle. The reviewer can't be the creator. Different models, different prompts, different context windows. Our review agents never see the generation prompts. They evaluate the output cold, like a viewer would.
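In code, that separation is mostly about what the reviewer is allowed to see. A tiny sketch of the idea; the field names are illustrative, not our schema.

```python
# Sketch of reviewer isolation: the review input is built only from what a
# viewer would see, never from the prompt that generated it. Names illustrative.

def build_review_input(video):
    # Deliberately excluded: generation prompts, extraction scores, model notes.
    return {
        "title": video["title"],
        "description": video["description"],
        "transcript": video["final_cut_transcript"],
    }
```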

This is also why we use an ensemble of 7 agents instead of 1. A single reviewer develops blind spots that correlate with the generator's blind spots. Seven independent reviewers with different evaluation criteria are much harder to fool.

What's next

Month one: 689 commits, $58 total cost, 2,062 videos produced, 73 uploaded. The system works. Month two is about making it better.

An autonomous content pipeline, built in public. Follow along or try it yourself.
