TL;DR: Built a pipeline that downloads viral TikToks, analyzes them frame-by-frame with Gemini, and regenerates them with custom branding using AI video models. Currently at ~$2-4 per video depending on length.
I run content for a few brands. The playbook everyone uses: find viral content, recreate it manually with your spin. Works, but brutal to scale.
Manual recreation meant one 30-second video = 4-6 hours of work. It didn't scale.
Built a CLI pipeline in TypeScript. Four stages:
1. **Ingest** — Download the TikTok via the ScrapeCreators API, then calculate segment boundaries (I break videos into 2-4 second chunks for generation).
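The boundary math is simple; here's a minimal sketch (my reconstruction, not the actual pipeline code) that splits a duration into equal chunks near a 3-second target, which keeps every segment inside the 2-4s range for typical clip lengths:

```typescript
// Split a video's duration into roughly equal segments near targetLen seconds.
// Returns [start, end] pairs in seconds.
function segmentBoundaries(durationSec: number, targetLen = 3): [number, number][] {
  const count = Math.max(1, Math.round(durationSec / targetLen));
  const len = durationSec / count; // actual per-segment length
  const segments: [number, number][] = [];
  for (let i = 0; i < count; i++) {
    segments.push([i * len, (i + 1) * len]);
  }
  return segments;
}

// e.g. a 9-second clip yields three 3-second segments:
// [[0, 3], [3, 6], [6, 9]]
```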
2. **Analyze** — Feed the full video to Gemini 2.5 Flash and extract three things:
- `vibe_check` — overall energy, pacing, hook structure
- `character_bible` — who's in it, their role, visual style
- `segment_analysis` — per-segment descriptions, camera angles, motion

This is where the magic happens. Gemini's video understanding is genuinely impressive.
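For reference, the analysis object might be typed like this. The three top-level keys are from the post; the nested fields are my guesses at a plausible shape, not the actual schema:

```typescript
// Hypothetical shape of the Gemini analysis output.
// Top-level keys match the post; inner fields are assumed.
interface VideoAnalysis {
  vibe_check: {
    energy: string;         // e.g. "high-energy, fast cuts"
    pacing: string;
    hook_structure: string;
  };
  character_bible: {
    name: string;           // who's in it
    role: string;
    visual_style: string;
  }[];
  segment_analysis: {
    start: number;          // seconds
    end: number;
    description: string;
    camera_angle: string;
    motion: string;
  }[];
}
```

Having Gemini return structured JSON like this (rather than free text) is what lets the synthesize stage consume it programmatically.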
3. **Synthesize** — For each segment, generate a new clip from its description with an AI video model.
Two modes: parallel (faster) or chained (pass last frame to next segment for continuity).
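The two modes can be sketched like this. `generateClip` is a stand-in for whatever video-model API call the pipeline actually makes; the point is the control flow, parallel fan-out versus a sequential chain threading each clip's last frame into the next:

```typescript
type Clip = { video: string; lastFrame: string };

// Generate clips for all segment prompts, either all at once (parallel)
// or sequentially, seeding each segment with the previous clip's last frame.
async function synthesize(
  prompts: string[],
  generateClip: (prompt: string, seedFrame?: string) => Promise<Clip>,
  mode: "parallel" | "chained",
): Promise<Clip[]> {
  if (mode === "parallel") {
    // Fastest: every segment generates independently, no continuity.
    return Promise.all(prompts.map((p) => generateClip(p)));
  }
  // Chained: each segment waits on the previous one's last frame.
  const clips: Clip[] = [];
  let seed: string | undefined;
  for (const p of prompts) {
    const clip = await generateClip(p, seed);
    clips.push(clip);
    seed = clip.lastFrame;
  }
  return clips;
}
```

The trade-off is exactly as stated: parallel finishes in one model-call's latency, chained takes N calls back-to-back but keeps characters and lighting visually consistent across cuts.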
4. **Stitch** — FFmpeg merges the clips and adds music/narration. Output: your "twinned" video.
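One way to do the merge is FFmpeg's concat demuxer. This is a sketch of an argument builder under my own assumptions (paths and audio handling are placeholders, not the post's exact flags):

```typescript
// Build ffmpeg args: concatenate clips listed in a concat file, mux in a
// music track, copy the video streams without re-encoding.
// The concat list file contains lines like:
//   file 'clip0.mp4'
//   file 'clip1.mp4'
function buildStitchArgs(concatList: string, musicPath: string, out: string): string[] {
  return [
    "-f", "concat", "-safe", "0", "-i", concatList, // input 0: clip list
    "-i", musicPath,                                // input 1: background music
    "-map", "0:v", "-map", "1:a",                   // video from clips, audio from music
    "-c:v", "copy", "-shortest",                    // no re-encode; stop at shorter stream
    out,
  ];
}
```

Copying the video streams (`-c:v copy`) keeps stitching near-instant, but it only works when every generated clip shares the same codec, resolution, and timebase; otherwise you'd re-encode instead.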
Per 30-second video (3 segments), costs land in that ~$2-4 range. Hailuo is cheaper (~$1.80 total) but slower. Wan is experimental.
Considering adding trend discovery (scraping trending sounds/formats) and batch processing. Right now it's one video at a time.
Happy to share more about specific parts. The Gemini prompt for vibe extraction took forever to get right — can share that if useful.