Started: March 26, 2026. This is the month one retrospective. Raw numbers, honest failures, and the stuff that actually worked.
The system runs on a $6/month VPS. Total infrastructure cost for month one: $58. For that, the system produced 2,062 candidate videos, ran every one of them through a 7-agent quality panel, and uploaded the 73 that passed every gate.
The core extraction pipeline -- the part that watches a VOD and identifies the interesting segments -- works better than expected. On roleplay content, we're hitting an 80% RP ratio on arc detection. That means 8 out of 10 extracted segments contain actual storyline content, not filler. For context, manual editors typically work at ~90% accuracy but take 4-6 hours per stream. We process a stream in under 30 minutes.
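The RP ratio above is just a fraction over labeled segments. A minimal sketch -- the segment schema here (an `is_storyline` flag) is hypothetical, not the pipeline's actual data model:

```python
def rp_ratio(segments):
    """Fraction of extracted segments labeled as storyline (RP) content.

    `segments` is a list of dicts with a boolean "is_storyline" flag --
    an assumed shape for illustration only.
    """
    if not segments:
        return 0.0
    return sum(1 for s in segments if s["is_storyline"]) / len(segments)

# 8 of 10 extracted segments contain storyline content -> 0.8
segments = [{"is_storyline": i < 8} for i in range(10)]
print(rp_ratio(segments))  # 0.8
```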
Seven AI agents reviewing every video independently before upload. This is the feature that makes the whole thing viable. Without it, we'd be uploading 2,062 videos of wildly varying quality. With it, we upload 73 that all clear a quality bar. The 29% kill rate is a feature. Every killed video is a video that would have hurt the channel.
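The gate logic itself is simple: one veto kills the video. A sketch, assuming each agent returns a pass/fail verdict (the `Verdict` shape and the failure reason shown are illustrative, not the system's real schema):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    agent: str
    passed: bool
    reason: str = ""

def passes_panel(verdicts):
    """Upload only if every agent passes the video -- a single veto kills it."""
    return all(v.passed for v in verdicts)

verdicts = [
    Verdict("title_accuracy", True),
    Verdict("distinctiveness", True),
    Verdict("pacing", False, "dead air in the first 10 seconds"),  # hypothetical reason
]
print(passes_panel(verdicts))  # False -- one veto is enough
```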
26 cron jobs running 24/7. VOD detection, download, extraction, assembly, review, upload, Discord notifications, health checks, analytics collection, cleanup. The system runs without human intervention. The creator gets a Discord digest with the day's reviewed videos and clicks "publish" on the ones they like. That's the entire human touchpoint.
The single worst failure mode: the AI reads a transcript, finds a brief mention of something dramatic, and writes a title about it -- but the actual video content is about something completely different. We caught a video titled "Cop's CI just exposed the criminal character" where the actual content was the streamer talking about Walmart. The Title Accuracy agent catches most of these now, but the first week was rough. We killed dozens of videos that would have been misleading clickbait.
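The real Title Accuracy agent is an LLM judgment, but the failure class can be illustrated with a crude keyword-coverage heuristic: if the title's key terms barely appear in the transcript, the title was probably built on a passing mention. Everything here (function name, threshold) is a hypothetical sketch, not the actual agent:

```python
def title_matches_content(title, transcript_lines, min_coverage=0.05):
    """Flag titles built on a one-off mention: require the title's key terms
    to appear in at least `min_coverage` of transcript lines."""
    terms = {w.lower() for w in title.split() if len(w) > 3}
    if not terms or not transcript_lines:
        return False
    hits = sum(1 for line in transcript_lines
               if any(t in line.lower() for t in terms))
    return hits / len(transcript_lines) >= min_coverage

title = "Cop's CI just exposed the criminal character"
# 20 lines about Walmart, one passing mention of the dramatic event
transcript = ["we were talking about Walmart again"] * 20 + ["and then the CI exposed him"]
print(title_matches_content(title, transcript))  # False -- title fails the check
```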
Early config: extract up to 25 Shorts per VOD. That was way too many. A 6-hour stream might have 5-8 genuinely good Short moments, not 25. We were extracting filler -- mid-sentence clips, awkward transitions, reaction shots with no context. Dialed it back to 8-12 per VOD and quality jumped immediately. The lesson: volume is not value.
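A sketch of the dialed-back cap, assuming a simple dict-based config -- the real config format and key names aren't shown in the post:

```python
# Hypothetical extraction config illustrating the change described above.
EXTRACTION_CONFIG = {
    "max_shorts_per_vod": 12,  # was 25 -- a 6-hour stream has 5-8 good moments, not 25
    "min_shorts_per_vod": 8,
    "min_segment_score": 0.7,  # assumed quality threshold, not from the post
}
```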
Week two. A variable scoping bug in the pipeline caused the extraction module to process the same VOD segment repeatedly, generating duplicate videos. The review system caught most of them (Distinctiveness agent flagged the duplicates), but a few slipped through before we found the root cause. Three identical videos uploaded in a row. Not a great look.
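The post doesn't show the actual bug, but Python's late-binding closures are a classic way a scoping bug produces "same segment processed repeatedly" behavior -- every deferred job captures the loop variable itself, not its value at definition time:

```python
segments = ["seg_a", "seg_b", "seg_c"]

# Buggy: all three lambdas share one `seg` cell, which ends up at the last value.
jobs_buggy = [lambda: f"processing {seg}" for seg in segments]
print([job() for job in jobs_buggy])
# ['processing seg_c', 'processing seg_c', 'processing seg_c']

# Fix: bind the current value at definition time with a default argument.
jobs_fixed = [lambda seg=seg: f"processing {seg}" for seg in segments]
print([job() for job in jobs_fixed])
# ['processing seg_a', 'processing seg_b', 'processing seg_c']
```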
We built a magic link authentication system for the web dashboard. Worked perfectly in development. Deployed to production and the auth endpoint returned 403 for every request. Spent four hours debugging before realizing our security middleware was blocking the endpoint because it didn't have a session token -- which is the whole point of a login endpoint. A one-line middleware exclusion fixed it.
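A minimal sketch of that one-line fix, using a toy middleware -- the framework, path, and request shape here are all hypothetical:

```python
# The login endpoint can't carry a session token yet, so it must be exempt.
EXEMPT_PATHS = {"/auth/magic-link"}  # hypothetical path

def session_middleware(request, next_handler):
    if request["path"] in EXEMPT_PATHS:  # the one-line exclusion
        return next_handler(request)
    if not request.get("session_token"):
        return {"status": 403}
    return next_handler(request)

handler = lambda req: {"status": 200}
print(session_middleware({"path": "/auth/magic-link"}, handler))  # {'status': 200}
print(session_middleware({"path": "/dashboard"}, handler))        # {'status': 403}
```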
The system that produces content and the system that judges content must be separate.
This sounds obvious but the temptation is strong. You have an AI that writes titles -- why not have it also evaluate whether the titles are good? Because it approves its own mistakes. It wrote "Cop's CI just exposed [character name]" and when asked "is this title accurate?" it said yes, because it used the same reasoning that produced the title in the first place.
Separation of concerns isn't just a software architecture principle. It's a quality control principle. The reviewer can't be the creator. Different models, different prompts, different context windows. Our review agents never see the generation prompts. They evaluate the output cold, like a viewer would.
Month one: 689 commits, $58 total cost, 2,062 videos produced, 73 uploaded. The system works. Month two is about making it better.
An autonomous content pipeline, built in public. Follow along or try it yourself.
Try the free demo