Automated Video Production¶

Video is among the highest-engagement content formats on most major platforms. Producing quality video at scale, however, requires a pipeline that handles scripting, rendering, post-production, and distribution without human intervention at every step.

This guide covers building an automated video production system with Hermes: tools, pipeline design, script generation strategies, rendering workflows, and platform-specific optimization.

The Production Pipeline¶

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Topic   │───▶│  Script  │───▶│  Media   │───▶│  Post-   │───▶│  Publish │
│ Research │    │ Writing  │    │ Creation │    │ Production│    │ & Track  │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘

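Each stage can be automated, but some benefit from a human-in-the-loop gate at a quality checkpoint, such as reviewing a script before rendering or keeping an upload private until someone has checked it.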

Stage 1: Topic Research¶

Before writing a script, you need to know what to create. The research stage analyzes what's working and identifies opportunities.

Data Sources¶

YouTube Analytics: Identify your channel's best-performing topics by watch time, retention, and click-through rate
Competitor Analysis: Track competitor channels for topic gaps and emerging trends
Search Trends: Monitor what your audience is actively searching for
Community Feedback: Analyze comments and engagement for direct audience signals

Hermes Implementation¶

import json

def research_video_topics(channel_id: str, competitors: list[str]) -> list[dict]:
    """Research video topics using Hermes tools and knowledge base."""

    # Analyze own channel performance
    own_analytics = hermes_tools.get_youtube_analytics(
        channel_id=channel_id,
        dimensions=["video"],
        metrics=["views", "watchTime", "subscribersGained"]
    )

    # Collect the 10 most recent videos from each competitor
    competitor_topics = []
    for competitor in competitors:
        videos = hermes_tools.get_youtube_channel_videos(
            handle=competitor,
            max_results=10
        )
        competitor_topics.extend(videos)

    # Synthesize recommendations; ask for JSON so the result can be parsed into a list
    recommendations = hermes_llm.complete(
        messages=[
            {"role": "system", "content": """Analyze channel performance and competitor data.
            Identify 5 video topic opportunities. For each: topic, estimated performance,
            rationale based on data, suggested format (tutorial, review, commentary, etc.).
            Respond with a JSON array of objects and no surrounding text."""},
            {"role": "user", "content": f"""
            Own performance: {json.dumps(own_analytics)}
            Competitor topics: {json.dumps(competitor_topics)}
            """}
        ],
        tier="premium"
    )

    # Assumes complete() returns the model's text; raises ValueError if it is not valid JSON
    return json.loads(recommendations)

Scheduling Research¶

# hermes/cron/video_topic_research.yaml
name: video_topic_research
schedule: "0 9 * * 1,4"  # Monday and Thursday
task: video.research.topics
timeout: 600
notify_on: ["failure"]
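To sanity-check the schedule, the cron expression can be expanded by hand. The sketch below is not part of Hermes: `next_research_runs` is a hypothetical helper that assumes standard cron day-of-week numbering (1 = Monday, 4 = Thursday) and lists the next few run times.

```python
from datetime import datetime, timedelta

# Day-of-week field of "0 9 * * 1,4" (cron: 0 = Sunday, 1 = Monday, ...)
CRON_DAYS = {1, 4}  # Monday, Thursday
RUN_HOUR = 9

def next_research_runs(after: datetime, count: int = 4) -> list[datetime]:
    """Return the next `count` run times strictly after `after`."""
    runs = []
    candidate = after.replace(hour=RUN_HOUR, minute=0, second=0, microsecond=0)
    while len(runs) < count:
        # isoweekday(): Monday = 1 ... Sunday = 7, which matches cron for days 1-6
        if candidate > after and candidate.isoweekday() in CRON_DAYS:
            runs.append(candidate)
        candidate += timedelta(days=1)
    return runs
```

Calling it with a Monday morning timestamp after 09:00 should yield the following Thursday first, then alternate between Mondays and Thursdays.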

Stage 2: Script Generation¶

The script is the foundation. A well-structured script makes the rest of the pipeline predictable.

Script Structure Template¶

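Every video script should follow a proven structure. The template below is timed for a six-minute video; scale the sections proportionally for other lengths: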

# HOOK (0:00-0:15)
[Attention-grabbing opening that states the problem or promise]

# CONTEXT (0:15-0:45)
[Why this topic matters now. Establish credibility.]

# MAIN CONTENT (0:45-5:00)
[Point 1] - [Explanation] - [Example] - [Transition]
[Point 2] - [Explanation] - [Example] - [Transition]
[Point 3] - [Explanation] - [Example] - [Transition]

# RECAP (5:00-5:30)
[Summarize the 3 key takeaways]

# CALL TO ACTION (5:30-6:00)
[What the viewer should do next — subscribe, comment, visit link]
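The timings in the template add up to six minutes (15 + 30 + 255 + 30 + 30 seconds). For other target lengths, one option is to scale each section proportionally. The helper below is a minimal sketch of that idea; the name `scale_script_budget` and the choice to let the main section absorb rounding error are assumptions, not part of Hermes.

```python
# Section lengths from the six-minute template, in seconds
TEMPLATE_SECONDS = {"hook": 15, "context": 30, "main": 255, "recap": 30, "cta": 30}

def scale_script_budget(target_seconds: int) -> dict[str, int]:
    """Scale the template proportionally; 'main' absorbs any rounding error."""
    total = sum(TEMPLATE_SECONDS.values())  # 360 seconds
    budget = {
        name: round(seconds * target_seconds / total)
        for name, seconds in TEMPLATE_SECONDS.items()
        if name != "main"
    }
    budget["main"] = target_seconds - sum(budget.values())
    return budget
```

The resulting budget can be interpolated into the script prompt so that the section timings always sum to the requested duration.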

Script Generation with Reflexion¶

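Use a Reflexion-style pass (generate, evaluate, revise) to check script quality before rendering. The example below runs one evaluation and, if it fails, one revision: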

def generate_video_script(topic: str, target_duration_seconds: int = 360) -> str:
    """Generate a video script with quality refinement."""

    script = hermes_llm.complete(
        messages=[
            {"role": "system", "content": f"""Write a video script following this structure:
            HOOK (15 sec): Attention-grabbing opening
            CONTEXT (30 sec): Why this matters
            MAIN (remaining time, about 70%): 3 key points with examples
            RECAP (30 sec): Summarize takeaways
            CTA (30 sec): Clear call to action

            Target duration: {target_duration_seconds} seconds (~{target_duration_seconds // 60} minutes)
            Style: Conversational, engaging, no jargon without explanation.
            Include visual notes [in brackets] for the editor.

            Topic: {topic}"""}
        ],
        tier="standard"
    )

    # Reflexion: Evaluate script quality
    evaluation = hermes_llm.complete(
        messages=[
            {"role": "system", "content": """Evaluate this video script:
            1. Does the hook grab attention in the first 5 seconds?
            2. Is the structure clear (hook → context → points → recap → CTA)?
            3. Are there 3 distinct, actionable points?
            4. Is the language conversational and engaging?
            5. Would this hold viewer attention for the full duration?

            Return PASS or FAIL with specific feedback for each criterion."""},
            {"role": "user", "content": script}
        ],
        tier="lightweight"
    )

    if "FAIL" in evaluation:
        # Refine with feedback
        script = hermes_llm.complete(
            messages=[
                {"role": "system", "content": "Improve this script based on feedback. Keep the same topic and structure."},
                {"role": "user", "content": f"Script: {script}\nFeedback: {evaluation}"}
            ],
            tier="standard"
        )

    return script
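The function above revises at most once and never re-checks the revision. A bounded loop is one way to extend it. The sketch below assumes the evaluator and reviser are passed in as plain callables (for example, thin wrappers around the two `hermes_llm.complete` calls); `refine_until_pass` and `max_rounds` are hypothetical names.

```python
from typing import Callable

def refine_until_pass(
    draft: str,
    evaluate: Callable[[str], str],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Evaluate and revise until the feedback contains no FAIL or rounds run out."""
    for _ in range(max_rounds):
        feedback = evaluate(draft)
        if "FAIL" not in feedback:
            return draft, True
        draft = revise(draft, feedback)
    # Check the last revision so the caller knows whether it passed
    return draft, "FAIL" not in evaluate(draft)
```

Returning the pass flag alongside the script lets the pipeline route a script that still fails to a human reviewer instead of rendering it.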

Platform-Specific Scripts¶

Different platforms demand different script styles:

Platform	Duration	Style	Hook Strategy
YouTube	5-20 min	Educational, in-depth	Problem statement or bold claim
TikTok	15-60 sec	Fast-paced, trend-driven	Immediate visual hook
Instagram Reels	30-90 sec	Visual, lifestyle	Pattern interrupt
LinkedIn Video	1-3 min	Professional, insight-driven	Data point or counterintuitive take
YouTube Shorts	15-60 sec	Quick tip or reaction	Text overlay with bold statement
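The duration column can be encoded as data so the pipeline rejects a cut that does not fit its target platform. This is a sketch: the dictionary keys and the `fits_platform` helper are illustrative names, and the ranges are copied from the table above rather than from any platform's official limits.

```python
# Duration ranges in seconds, taken from the table above
PLATFORM_DURATIONS = {
    "youtube": (300, 1200),
    "tiktok": (15, 60),
    "instagram_reels": (30, 90),
    "linkedin": (60, 180),
    "youtube_shorts": (15, 60),
}

def fits_platform(platform: str, duration_seconds: float) -> bool:
    """Return True if the clip length falls inside the platform's range."""
    low, high = PLATFORM_DURATIONS[platform]
    return low <= duration_seconds <= high
```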

Stage 3: Media Creation / Avatar Rendering¶

Option A: HeyGen Avatar Videos¶

For talking-head style content without filming:

def create_avatar_video(script: str, output_path: str) -> str:
    """Generate an AI avatar video using HeyGen."""

    # Prepare the script for HeyGen
    segments = parse_script_to_segments(script)  # Split by visual cues

    # Generate avatar video
    video = heygen_client.create_video(
        avatar_id="default_professional",
        voice_id="natural_male_1",
        background="#1a1a2e",  # Dark tech background
        script=segments,
        resolution="1080p",
        output_format="mp4"
    )

    # Download and store
    video_url = video.download(output_path)
    return video_url

Best practices for avatar videos:

- Keep avatar segments under 90 seconds; viewers tire of a static talking head.
- Insert B-roll or screen recordings between avatar segments (2-3 inserts per video).
- Use captions. A large share of social video is watched with the sound off (85% is the commonly cited figure).
- Test voice and avatar combinations to learn which your audience prefers.
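The `parse_script_to_segments` helper used in the HeyGen example is not shown in this guide. One possible shape, assuming the script marks visual cues with [bracketed notes] as the script prompt requests, is sketched below. The segment format, the 150 words-per-minute pace, and `overlong_segments` are all assumptions; the structure HeyGen actually expects may differ.

```python
import re

WORDS_PER_MINUTE = 150  # assumed narration pace; tune for the chosen voice

def parse_script_to_segments(script: str) -> list[dict]:
    """Split a script at [visual notes] and estimate each segment's spoken length."""
    # A capture group makes re.split keep the bracketed notes at odd positions
    parts = re.split(r"\[([^\]]*)\]", script)
    segments = []
    visual = None
    for index, part in enumerate(parts):
        if index % 2 == 1:  # a captured visual note applies to the text that follows
            visual = part.strip()
            continue
        text = part.strip()
        if text:
            seconds = len(text.split()) / WORDS_PER_MINUTE * 60
            segments.append({"visual": visual, "text": text, "seconds": round(seconds, 1)})
    return segments

def overlong_segments(segments: list[dict], limit_seconds: float = 90) -> list[dict]:
    """Return segments whose estimated length exceeds the 90-second guideline."""
    return [segment for segment in segments if segment["seconds"] > limit_seconds]
```

Any segment returned by `overlong_segments` is a candidate for splitting or for a B-roll insert.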

Option B: Remotion Programmatic Video¶

For data-driven, code-generated videos:

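// Remotion composition for data visualization videos
import React from 'react';
import { useCurrentFrame, useVideoConfig } from 'remotion';
// AnimatedChart is a project component, not part of Remotion; the path is illustrative
import { AnimatedChart } from './AnimatedChart';

// Assumed shape of the input data
type ChartData = { title: string; values: number[]; insight: string };

export const DataVideo: React.FC<{data: ChartData}> = ({data}) => {
    const frame = useCurrentFrame();
    const {fps} = useVideoConfig();
    const seconds = frame / fps;  // elapsed time drives every animation below

    return (
        <div style={{background: '#0f0f23', color: 'white', padding: 40}}>
            {/* Title fades in over the first second */}
            <h1 style={{opacity: Math.min(seconds, 1)}}>{data.title}</h1>
            {/* Chart animates over the first three seconds */}
            <AnimatedChart
                data={data.values}
                progress={Math.min(seconds / 3, 1)}
            />
            {/* Insight fades in between seconds 2 and 3 */}
            <p style={{opacity: Math.max(0, Math.min(seconds - 2, 1))}}>
                {data.insight}
            </p>
        </div>
    );
};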

Remotion excels for: data visualizations, text animations, screen recordings with overlays, and any video where content is parameterized.
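Because the content is parameterized, the same composition can be rendered once per data set by passing input props on the command line. The sketch below builds an `npx remotion render` command from Python. The entry point `src/index.ts` and the composition ID are assumptions about the project layout, and `remotion_render_command` is a hypothetical helper.

```python
import json

def remotion_render_command(
    composition_id: str,
    props: dict,
    output_path: str,
    entry_point: str = "src/index.ts",  # assumed project entry point
) -> list[str]:
    """Build an `npx remotion render` command that passes `props` as input props."""
    return [
        "npx", "remotion", "render",
        entry_point, composition_id, output_path,
        f"--props={json.dumps(props)}",  # input props as a JSON string
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with the JSON payload.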

Option C: FFmpeg Assembly¶

For compositing existing assets:

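# Assemble intro + main content + outro with 1-second crossfades.
# Each xfade offset is the running length of the video so far minus the fade;
# 4 and 304 assume a 5-second intro and a 301-second main clip.
# xfade needs matching size, frame rate, and pixel format, so normalize each input.
# The audio crossfades assume every clip has an audio track.
ffmpeg \
  -i intro.mp4 \
  -i content.mp4 \
  -i outro.mp4 \
  -filter_complex "
    [0:v]scale=1920:1080,setdar=16/9,fps=30,format=yuv420p[v0];
    [1:v]scale=1920:1080,setdar=16/9,fps=30,format=yuv420p[v1];
    [2:v]scale=1920:1080,setdar=16/9,fps=30,format=yuv420p[v2];
    [v0][v1]xfade=transition=fade:duration=1:offset=4[f0];
    [f0][v2]xfade=transition=fade:duration=1:offset=304[f1];
    [0:a][1:a]acrossfade=d=1[a0];
    [a0][2:a]acrossfade=d=1[a1]
  " \
  -map "[f1]" -map "[a1]" \
  -c:v libx264 -preset medium -crf 23 \
  -c:a aac -b:a 128k \
  final_video.mp4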
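The `offset` values passed to `xfade` depend on the clip lengths, so hard-coding them breaks as soon as a clip changes. Each offset is the running length of the output so far minus the fade duration. A small helper can derive them; `xfade_offsets` is an illustrative name and the clip durations are assumed to be known already (for example, from `ffprobe`).

```python
def xfade_offsets(durations: list[float], fade: float = 1.0) -> list[float]:
    """Return the xfade offset for each join between consecutive clips."""
    offsets = []
    running = 0.0
    for duration in durations[:-1]:
        # Each crossfade overlaps two clips by `fade` seconds
        running += duration - fade
        offsets.append(running)
    return offsets
```

For a 5-second intro, a 301-second main clip, and any outro, this gives offsets of 4 and 304 seconds, matching the command above.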

Stage 4: Post-Production¶

Automated Caption Generation¶

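import subprocess

def generate_captions(video_path: str) -> str:
    """Generate and burn captions into the video."""

    # Extract audio and transcribe
    transcript = hermes_tools.transcribe_audio(video_path)

    # Generate SRT subtitle content with timing and write it next to the video
    srt_content = format_as_srt(transcript)
    srt_path = video_path.replace(".mp4", ".srt")
    with open(srt_path, "w", encoding="utf-8") as srt_file:
        srt_file.write(srt_content)

    # Burn captions into video using FFmpeg.
    # Paths containing ':' or quotes must be escaped for the subtitles filter.
    captioned_path = video_path.replace(".mp4", "_captioned.mp4")
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-vf", f"subtitles={srt_path}:force_style='FontSize=24,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,BorderStyle=3'",
        "-c:a", "copy",
        captioned_path
    ], check=True)  # raise if FFmpeg exits with an error

    return captioned_path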

Thumbnail Generation¶

Thumbnails drive click-through rate. Generate several variants in different styles so they can be A/B tested:

def generate_thumbnails(video_title: str, script: str) -> list[str]:
    """Generate multiple thumbnail options for A/B testing."""

    thumbnails = []

    for style in ["bold_text", "face_reaction", "comparison", "question"]:
        thumbnail = hermes_tools.generate_image(
            prompt=f"""YouTube thumbnail in {style} style.
            Title: {video_title}
            Key visual: {script[:200]}
            Requirements: High contrast, readable text, 1280x720, compelling.
            No clickbait. Professional but eye-catching.""",
            dimensions="1280x720"
        )
        thumbnails.append(thumbnail)

    return thumbnails

Quality Checklist¶

Automated post-production checks:

[ ] Audio levels normalized (-14 LUFS for YouTube)
[ ] Captions present and synchronized
[ ] No black frames at start or end
[ ] Resolution matches target (1080p or 4K)
[ ] File size optimized (<2GB for YouTube, <500MB for social)
[ ] End screen elements positioned correctly
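Some of these checks can be read directly from `ffprobe` metadata. The sketch below evaluates resolution, file size, and the presence of audio from the JSON produced by `ffprobe -v error -show_format -show_streams -of json video.mp4`. The function name and default thresholds are assumptions; loudness, black-frame, and caption checks need separate FFmpeg analysis passes and are not covered here.

```python
def check_probe(
    probe: dict,
    target_height: int = 1080,
    max_bytes: int = 2 * 1024**3,  # 2 GB, the YouTube target from the checklist
) -> dict[str, bool]:
    """Evaluate resolution, file size, and audio presence from ffprobe JSON."""
    streams = probe.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    return {
        "has_video": video is not None,
        "resolution_ok": video is not None and video.get("height") == target_height,
        # ffprobe reports format.size as a string containing the byte count
        "size_ok": int(probe.get("format", {}).get("size", 0)) <= max_bytes,
        "has_audio": any(s.get("codec_type") == "audio" for s in streams),
    }
```

A pipeline could block publishing whenever any value in the returned dictionary is false.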

Stage 5: Distribution¶

Multi-Platform Publishing¶

# hermes/content/video_distribution.yaml
videos:
  - platform: youtube
    title_template: "{topic}: {hook_summary}"
    description_template: |
      {full_description}

      🔗 Links mentioned:
      {links}

      ⏱ Timestamps:
      {timestamps}

      #hashtag1 #hashtag2 #hashtag3
    tags: [tag1, tag2, tag3, tag4, tag5]
    category: "Science & Technology"
    privacy: "private"  # Start private, review, then public
    schedule: "optimal_time"  # Based on audience analytics

  - platform: tiktok
    clip_duration: 60  # Extract best 60 seconds
    caption: "{hook} — full video on YouTube 🔗"
    hashtags: [fyp, viral_topic, niche_hashtag]
    schedule: "+2h"  # 2 hours after YouTube

  - platform: instagram_reels
    clip_duration: 90
    caption: "{topic} — watch the full breakdown 👆 link in bio"
    schedule: "+4h"
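The relative `schedule` values (`"+2h"`, `"+4h"`) need to be resolved against the YouTube publish time. How Hermes does this internally is not shown here; the sketch below is one way to interpret them. The minute and day units are an assumption beyond the hour offsets used in the config, and `resolve_schedule` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

UNIT_NAMES = {"m": "minutes", "h": "hours", "d": "days"}

def resolve_schedule(schedule: str, anchor: datetime) -> datetime:
    """Turn a relative schedule such as '+2h' into an absolute time after `anchor`."""
    match = re.fullmatch(r"\+(\d+)([mhd])", schedule)
    if match is None:
        # Values like "optimal_time" need a different resolver
        raise ValueError(f"Unsupported schedule value: {schedule!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return anchor + timedelta(**{UNIT_NAMES[unit]: amount})
```

With a YouTube publish time of 09:00, `"+2h"` resolves to 11:00 for TikTok and `"+4h"` to 13:00 for Instagram Reels.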

Cross-Platform Content Strategy¶

Don't upload the same video everywhere. Adapt:

YouTube: Full video (5-20 min). The canonical version.
TikTok: Best 60 seconds extracted. Hook-first, fast-paced edit.
Instagram Reels: Best 90 seconds. More polished visual style.
YouTube Shorts: 60-second vertical clip with YouTube-native CTAs.
LinkedIn: 2-3 minute professional cut. Business-focused framing.
Twitter/X: 2-minute clip with thread context.
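Short-form cuts can be produced from the canonical video with FFmpeg. The sketch below builds a command that cuts a clip and center-crops it to a 9:16 vertical frame. Choosing the start time (the "best" 60 seconds) is left to the caller, and the function name, output size, and encoder settings are assumptions.

```python
def vertical_clip_command(
    source: str,
    output: str,
    start_seconds: float,
    duration_seconds: float = 60,
) -> list[str]:
    """Build an FFmpeg command that cuts a clip and center-crops it to 9:16."""
    return [
        "ffmpeg",
        "-ss", str(start_seconds),    # seek to the clip start before decoding
        "-i", source,
        "-t", str(duration_seconds),  # clip length in seconds
        # Center-crop a 9:16 window from the frame, then scale to 1080x1920
        "-vf", "crop=ih*9/16:ih,scale=1080:1920",
        "-c:v", "libx264", "-crf", "23",
        "-c:a", "aac",
        output,
    ]
```

A center crop works for talking-head footage; screen recordings usually need a manually chosen crop region or a redesigned vertical layout.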

Scheduling Optimization¶

Post times matter. Analyze your audience data:

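def get_optimal_post_time(platform: str, channel_id: str) -> str:
    """Determine best posting time based on historical engagement data."""

    # Only YouTube analytics are wired up here; other platforms need their own data source
    if platform != "youtube":
        raise NotImplementedError(f"No analytics source configured for {platform}")

    analytics = hermes_tools.get_youtube_analytics(
        channel_id=channel_id,
        dimensions=["dayOfWeek", "hour"],
        metrics=["views", "averageViewDuration"]
    )

    # Find the day-hour combination with the highest engagement,
    # scored as views multiplied by average view duration
    best_slot = max(analytics, key=lambda x: x["views"] * x["averageViewDuration"])

    return f"{best_slot['dayOfWeek']} at {best_slot['hour']}:00"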

Performance Tracking¶

Automate post-publish monitoring:

import time

def track_video_performance(video_id: str, platforms: list[str]):
    """Track video performance for the first 72 hours."""

    # Check once per hour for the first 72 hours.
    # This loop blocks for three days; in production, prefer an hourly scheduled task.
    for hour in range(1, 73):
        time.sleep(3600)  # Wait so that check N runs N hours after publishing

        metrics = {}

        if "youtube" in platforms:
            yt_data = hermes_tools.get_youtube_video_analytics(video_id)
            metrics["youtube"] = {
                "views": yt_data["views"],
                "watch_time": yt_data["estimatedMinutesWatched"],
                "retention": yt_data["averageViewDuration"],
                "ctr": yt_data.get("clickThroughRate", 0)
            }

        # Alert on anomalies (only when YouTube metrics were collected)
        if hour == 24 and "youtube" in metrics and metrics["youtube"]["ctr"] < 0.04:
            hermes_tools.send_alert(
                channel="content-ops",
                message=f"⚠️ Video {video_id} CTR below 4% at 24h. Consider thumbnail change."
            )
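A long-running loop like this ties up a worker for three days and loses its place if the process restarts. An alternative is a single-pass check that an hourly scheduled task calls with the hours elapsed since publishing. The sketch below takes the metrics fetcher and the alert sender as parameters so it can be tested without network access; the function and constant names are hypothetical, and CTR is assumed to be a fraction (0.04 = 4%).

```python
from typing import Callable

CTR_ALERT_HOUR = 24
CTR_THRESHOLD = 0.04  # 4%, expressed as a fraction

def check_video_once(
    video_id: str,
    hours_since_publish: int,
    fetch_metrics: Callable[[str], dict],
    send_alert: Callable[[str], None],
) -> dict:
    """Run one monitoring pass; intended to be called by an hourly scheduled task."""
    data = fetch_metrics(video_id)
    ctr = data.get("clickThroughRate", 0)
    if hours_since_publish == CTR_ALERT_HOUR and ctr < CTR_THRESHOLD:
        send_alert(f"Video {video_id} CTR below 4% at 24h. Consider thumbnail change.")
    # Return a snapshot the caller can store for trend analysis
    return {"hour": hours_since_publish, "views": data.get("views", 0), "ctr": ctr}
```

Storing each returned snapshot also gives the research stage a per-hour performance history to draw on.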

Tools Summary¶

Tool	Purpose	Cost Model
HeyGen	AI avatar video generation	Per-minute credits
Remotion	Programmatic React-based video	Self-hosted (compute only)
FFmpeg	Video processing, captions, assembly	Free, self-hosted
YouTube API	Publishing, analytics	Free (with quota limits)
TikTok API	Publishing, analytics	Free (with approval)
Instagram Graph API	Reels publishing	Free
OpenAI Whisper	Audio transcription	Pay-per-minute
FAL / DALL-E	Thumbnail generation	Pay-per-image

Common Issues¶

Avatar uncanny valley: Mix avatar segments with screen recordings, B-roll, or graphics. Pure avatar for 10 minutes loses viewers.

Script too long for rendering: Split long scripts into segments, and always render a 30-second test segment first. Catching a bad voice or avatar choice early saves hours of rendering.

Platform rejects video: Check codec, resolution, and duration requirements before uploading. YouTube accepts almost anything; TikTok and Reels are stricter.

Caption timing drift: Transcribe audio first, then align captions to the transcript timing rather than predicting timing from script alone.
