How 7 AI agents decide if your video is good enough

Every video our pipeline produces goes through a review panel before it can be uploaded to YouTube. Not one reviewer -- seven. They run independently, don't see each other's scores, and vote on whether the video ships or dies. Majority rules.

This is the system that kills ~29% of everything we produce. Here's exactly how each agent works, what it catches, and why we built it this way.

1. Brand Guardian

Question it answers: Does this video represent the creator accurately?

Brand Guardian ingests the creator's channel description, recent upload history, and a brand brief that defines their content identity. Then it watches the candidate video and asks: would this creator be proud to have this on their channel?
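
A minimal sketch of how that context could be assembled. The field names, prompt wording, and CreatorProfile structure here are illustrative, not our production prompts:

```python
from dataclasses import dataclass

@dataclass
class CreatorProfile:
    channel_description: str   # pulled from the channel's About page
    recent_titles: list[str]   # last ~20 uploads, for tone and subject matter
    brand_brief: str           # human-written definition of the content identity

def build_brand_guardian_prompt(profile: CreatorProfile, transcript: str) -> str:
    """Assemble the context Brand Guardian reviews the candidate video against."""
    recent = "\n".join(profile.recent_titles)
    return (
        "You are reviewing a candidate video for this creator's channel.\n\n"
        f"Channel description:\n{profile.channel_description}\n\n"
        f"Recent upload titles:\n{recent}\n\n"
        f"Brand brief:\n{profile.brand_brief}\n\n"
        f"Candidate video transcript:\n{transcript}\n\n"
        "Would this creator be proud to have this on their channel? "
        "Score 1-10 and explain your reasoning in one sentence."
    )
```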

This agent catches the most dangerous failure mode -- content that's technically fine but doesn't match who the creator is. A roleplay streamer known for elaborate crime stories doesn't want a highlight reel of them complaining about queue times. A variety streamer known for chill vibes doesn't want a clip that makes them look angry.

Here's what that looks like in practice:

Brand Guardian: 1/10. "Completely misrepresentative. Creator's channel is character-driven crime RP. This video contains zero character content -- just OOC chatter about fast food preferences."

That kill saved the channel from a video that would confuse every subscriber who clicked on it.

2. First Impression

Question it answers: Would you click this? Would you stay past 30 seconds?

First Impression evaluates the video from the perspective of a new viewer seeing it in their feed. It looks at the title, the opening seconds, and the thumbnail potential. If the first 30 seconds don't establish what the video is about, if there's no hook, if the opening is dead air or loading screens -- it flags the video.
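
One signal behind the dead-air check is cheap to compute before the agent ever watches anything: how long until someone actually says something. A hedged sketch, assuming the transcript arrives as (start_seconds, text) segments (that segment format is an assumption, not our exact schema):

```python
def seconds_until_first_speech(transcript_segments: list[tuple[float, str]]) -> float:
    """Return how far into the video the first spoken words appear.

    transcript_segments: (start_time_in_seconds, text) pairs, assumed sorted.
    A large value means the opening is dead air or a loading screen --
    exactly the kind of weak hook First Impression flags.
    """
    for start, text in transcript_segments:
        if text.strip():
            return start
    return float("inf")  # no speech at all -- worst case

# Example: nothing happens until 22 seconds in
segments = [(0.0, ""), (22.4, "okay chat, you will NOT believe what just happened")]
assert seconds_until_first_speech(segments) == 22.4
```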

This agent is calibrated against YouTube click-through and audience retention data. It knows that a video with a weak opening loses 40-60% of viewers in the first 15 seconds, and those viewers never come back.

3. Audio Clarity

Question it answers: Is the audio clean enough to watch?

Audio Clarity analyzes the audio track for problems that make content unwatchable: muted sections from DMCA strikes, sudden volume drops, background noise that drowns out speech, clipping from the streamer yelling, and sections where the creator is simply inaudible.

This sounds basic but it catches a lot. Livestreams are not studio recordings. The streamer's mic peaks when they get excited. Their Discord call bleeds into the audio. The game audio spikes during action sequences. A 10-minute video with 90 seconds of garbled audio in the middle will lose every viewer who hits that section.
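
A hedged sketch of the kind of mechanical checks that can back this agent up -- finding long muted stretches and measuring clipping from raw samples. It assumes mono float samples normalized to [-1, 1]; the thresholds are illustrative starting points, not our tuned values:

```python
import numpy as np

def find_muted_sections(samples: np.ndarray, sample_rate: int,
                        window_s: float = 0.5, silence_rms: float = 0.005,
                        min_len_s: float = 5.0) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans where the track is effectively silent.

    Long silent spans usually mean a DMCA mute or a dropped audio source.
    """
    window = int(window_s * sample_rate)
    spans: list[tuple[float, float]] = []
    span_start = None
    for i in range(0, len(samples) - window + 1, window):
        rms = float(np.sqrt(np.mean(samples[i:i + window] ** 2)))
        t = i / sample_rate
        if rms < silence_rms:
            if span_start is None:
                span_start = t
        else:
            if span_start is not None and t - span_start >= min_len_s:
                spans.append((span_start, t))
            span_start = None
    end_t = len(samples) / sample_rate
    if span_start is not None and end_t - span_start >= min_len_s:
        spans.append((span_start, end_t))
    return spans

def clipping_ratio(samples: np.ndarray, ceiling: float = 0.99) -> float:
    """Fraction of samples at or above the ceiling -- a rough proxy for mic clipping."""
    return float(np.mean(np.abs(samples) >= ceiling))
```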

4. Pacing

Question it answers: Does the video maintain momentum from start to finish?

Pacing maps the energy curve of the video. It detects dead air (silence longer than 3 seconds), repetitive segments (the same gameplay loop with no narrative progression), awkward cuts that break flow, and sections where the pace drops so low that a viewer would reach for the scroll button.

The target is a consistent or escalating energy curve. The worst pattern Pacing catches is the "valley" -- a video that opens strong, dies in the middle, and picks up again at the end. By the time the ending payoff arrives, most viewers are already gone.
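
A hedged sketch of one way to approximate that energy curve from the transcript alone -- words spoken per minute as a crude proxy for on-screen energy, plus a simple valley check. The bucketing and thresholds are illustrative assumptions:

```python
def energy_curve(transcript_segments: list[tuple[float, float, str]],
                 video_length_s: float, bucket_s: float = 60.0) -> list[float]:
    """Words spoken per time bucket. transcript_segments: (start_s, end_s, text)."""
    n_buckets = max(1, int(video_length_s // bucket_s) + 1)
    curve = [0.0] * n_buckets
    for start, _end, text in transcript_segments:
        curve[min(int(start // bucket_s), n_buckets - 1)] += len(text.split())
    return curve

def has_valley(curve: list[float], dip_ratio: float = 0.4) -> bool:
    """True for the strong-open, dead-middle, late-payoff shape: the middle
    third drops well below the opening third while the final third recovers."""
    third = max(1, len(curve) // 3)
    opening = sum(curve[:third]) / third
    middle = sum(curve[third:2 * third]) / third
    ending = sum(curve[2 * third:]) / max(1, len(curve) - 2 * third)
    return opening > 0 and middle < dip_ratio * opening and ending > middle
```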

5. Title Accuracy

Question it answers: Does the title match what actually happens in the video?

This is the agent that exists because AI writes bad titles. Not always -- most of the time the titles are solid. But when they're bad, they're catastrophically bad. The AI reads a transcript, latches onto a brief mention of something dramatic, and writes a title about it. The actual video is about something completely different.

Title: "Cop's CI just exposed the criminal character as biggest drug dealer" -- Actual content: streamer chatting about Walmart dates and sauce packets. Title Accuracy: 1/10.

That's not intentional clickbait. It's worse -- it's accidental clickbait. The AI genuinely thought the title matched the content because the topic was mentioned for 8 seconds in a 12-minute video. Title Accuracy catches this by comparing the title's claims against the full transcript and visual content.
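
A hedged sketch of the cheapest version of that comparison: what fraction of the transcript actually touches the title's key terms. The production agent reasons over the full transcript and visuals with an LLM; the keyword-overlap version below is just the intuition:

```python
import re

STOPWORDS = {"the", "a", "an", "as", "just", "of", "to", "in", "his", "her", "their"}

def title_coverage(title: str, transcript_segments: list[str]) -> float:
    """Fraction of transcript segments that mention at least one key term from the title.

    A title whose key terms show up in ~1% of segments is probably accidental
    clickbait: the topic was mentioned once, not what the video is about.
    """
    key_terms = {w for w in re.findall(r"[a-z']+", title.lower()) if w not in STOPWORDS}
    if not key_terms or not transcript_segments:
        return 0.0
    hits = sum(1 for seg in transcript_segments
               if key_terms & set(re.findall(r"[a-z']+", seg.lower())))
    return hits / len(transcript_segments)
```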

6. Completion Predictor

Question it answers: Will viewers watch to the end?

Completion Predictor evaluates narrative arc and payoff. It asks: does this video build toward something? Is there a resolution? Or does it just... stop?

Livestream clips often have this problem. The extraction algorithm finds an interesting 10-minute segment, but the interesting part is in the first 3 minutes and the remaining 7 minutes are aftermath with no new information. Completion Predictor identifies these asymmetric videos where front-loading the good stuff means the back half is dead weight.

It also catches videos with no arc at all -- a flat sequence of events with no buildup and no payoff. These are the videos that get 30% average view duration on YouTube. Technically "content," practically unwatchable.
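
A hedged sketch of the asymmetry check, assuming each segment already carries an interest score from an earlier pipeline stage (that upstream scoring stage is an assumption here):

```python
def front_loading_ratio(interest_scores: list[float]) -> float:
    """Share of total interest that lands in the first third of the video.

    interest_scores: one score per fixed-length segment, in order.
    A ratio near 1.0 means the good stuff is front-loaded and the back
    half is aftermath -- the shape Completion Predictor penalizes.
    """
    total = sum(interest_scores)
    if total == 0:
        return 0.0
    third = max(1, len(interest_scores) // 3)
    return sum(interest_scores[:third]) / total
```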

7. Distinctiveness

Question it answers: Is this video different enough from recent uploads?

Distinctiveness compares the candidate video against the last 10-20 uploads on the channel. It catches near-duplicates (two clips from the same stream that cover overlapping moments), thematic repetition (three videos in a row about the same in-game event), and visual sameness (five consecutive Shorts that all look identical).

This agent protects against the most common failure of automated content systems: flooding the channel with variations of the same thing. When you process a 10-hour stream, the pipeline might extract 30+ candidate clips. Without Distinctiveness, 8 of those clips might be slightly different angles on the same 20-minute incident. Your subscribers don't want to see the same event eight times.
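
A hedged sketch of the near-duplicate check using embedding similarity against recent uploads. The embed callable stands in for whatever text-embedding model you use -- it is not a specific library API -- and the 0.85 threshold is an illustrative starting point:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_too_similar(candidate_summary: str,
                   recent_summaries: list[str],
                   embed,                      # callable: str -> np.ndarray (assumed)
                   threshold: float = 0.85) -> bool:
    """Flag the candidate if its summary embeds too close to any of the
    last 10-20 uploads' summaries."""
    cand = embed(candidate_summary)
    return any(cosine(cand, embed(prev)) >= threshold for prev in recent_summaries)
```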

How scoring works

Each agent scores the video independently on a 1-10 scale. The scores are collected and evaluated against two thresholds: a majority of agents have to approve the video, and no single agent's score can fall below a hard floor.

The agents don't negotiate. They don't see each other's scores. There's no "well, Brand Guardian gave it a 2 but everyone else liked it" compromise. A 2 from Brand Guardian kills the video regardless of what the other six agents think. This is intentional -- the failure modes are asymmetric. A video that's well-paced but off-brand still damages the channel.
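
A minimal sketch of the verdict logic those two rules imply -- a per-agent veto floor plus a majority vote. The exact cutoffs are illustrative, not our production numbers:

```python
PASS_SCORE = 7      # an agent "approves" at or above this (illustrative)
KILL_FLOOR = 3      # any single score below this kills the video outright (illustrative)

def verdict(scores: dict[str, int]) -> str:
    """Decide ship-or-kill from the seven independent scores.

    Two thresholds, no negotiation:
      1. Any agent scoring below KILL_FLOOR vetoes the video.
      2. Otherwise, a majority of agents must approve.
    """
    if any(s < KILL_FLOOR for s in scores.values()):
        return "KILL"
    approvals = sum(1 for s in scores.values() if s >= PASS_SCORE)
    return "SHIP" if approvals > len(scores) / 2 else "KILL"

# Hypothetical scores: a single 1/10 from Brand Guardian vetoes the video
# regardless of what the other six agents think.
scores = {"brand_guardian": 1, "first_impression": 8, "audio_clarity": 9,
          "pacing": 7, "title_accuracy": 8, "completion": 7, "distinctiveness": 8}
assert verdict(scores) == "KILL"
```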

We review killed videos weekly to calibrate the system. Sometimes the agents are too aggressive -- they kill a video that would have been fine. We accept that. The cost of killing a good video is one missed upload. The cost of publishing a bad video is measurable channel damage that suppresses the next five good uploads. We tune for caution.

Why 7 agents instead of 1

A single reviewer with a single prompt could check all seven dimensions. We tried that first. The problem: a single model optimizes for internal consistency. It gives a video a 7 on title accuracy because it already decided the video was "pretty good" based on pacing. The scores correlate with each other instead of measuring independent dimensions.

Seven independent agents with separate prompts, separate contexts, and no shared state produce scores that actually disagree with each other. A video can score 9 on pacing and 2 on brand accuracy. That disagreement is information. A single reviewer would have averaged those into a 6 and let the video through.
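
A hedged sketch of what "independent" means structurally: each agent gets its own system prompt and a fresh context, and nothing is shared between calls. The call_llm function is a stand-in for whatever chat-completion client you use, and the prompts are abbreviated:

```python
AGENT_PROMPTS = {
    "brand_guardian":   "Score 1-10: does this video represent the creator accurately? ...",
    "first_impression": "Score 1-10: would a new viewer click and stay past 30 seconds? ...",
    "audio_clarity":    "Score 1-10: is the audio clean enough to watch? ...",
    "pacing":           "Score 1-10: does the video hold momentum start to finish? ...",
    "title_accuracy":   "Score 1-10: does the title match what actually happens? ...",
    "completion":       "Score 1-10: will viewers watch to the end? ...",
    "distinctiveness":  "Score 1-10: is this different enough from recent uploads? ...",
}

def review(video_context: str, call_llm) -> dict[str, int]:
    """Run all seven agents with no shared state; each call sees only its own prompt.

    Assumes each agent is prompted to reply with a bare integer score.
    """
    return {
        name: int(call_llm(system_prompt=prompt, user_content=video_context))
        for name, prompt in AGENT_PROMPTS.items()
    }
```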

The ensemble approach is more expensive -- 7x the API calls per video. At our scale, that's a few extra dollars per day. The channel protection it provides is worth orders of magnitude more than the cost.

7 agents reviewing every video. Quality gates that protect your channel.

Try the free demo