Inspiration

Long-form content such as YouTube videos is valuable, but adapting it for short-form social media (TikTok, Reels, Shorts) is costly and time-consuming. We aimed to fully automate the editing and marketing work, so creators can multiply their viral reach without manually finding cut points and writing engaging captions.

What it does

CLIPER is a command-line interface (CLI) tool that automatically transforms long YouTube videos into short, viral-ready clips.

Smart Analysis: It transcribes the video with WhisperX, then uses semantic analysis (ClipsAI's TextTiling) to detect cut points where the topic shifts.

Content Generation: It automatically generates embedded subtitles and writes viral captions, the latter with Gemini 2.0 orchestrated by LangGraph.
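Burned-in subtitles are commonly produced by writing the timestamped transcript to an SRT file, which FFmpeg's `subtitles` filter can then render onto the video. The following is a minimal sketch that assumes transcript segments arrive as `(start_seconds, end_seconds, text)` tuples. The helpers `srt_timestamp` and `segments_to_srt` are hypothetical and are not CLIPER's code:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments, clip_start: float = 0.0) -> str:
    """Render (start, end, text) segments as numbered SRT cues (hypothetical helper).

    Times are shifted by clip_start so the clip's first frame is 00:00:00,000.
    Assumes every segment lies inside the clip, so no shifted time is negative.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start - clip_start)} --> {srt_timestamp(end - clip_start)}\n"
            f"{text.strip()}\n"
        )
    # SRT separates consecutive cues with a blank line.
    return "\n".join(cues)
```

Shifting by `clip_start` matters because each short clip is exported as its own file, and its subtitles must start counting from zero.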

Social-Media-Ready Export: It exports each clip in a 9:16 aspect ratio with embedded subtitles, ready to publish immediately on TikTok, Reels, or Shorts.

How we built it

Video Handling: We used yt-dlp to download videos and FFmpeg for video processing and the 9:16 export.
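The writeup does not say how frames are reframed for 9:16. The sketch below assumes the simplest option, a centered crop scaled to 1080x1920 with subtitles burned in. It only builds the FFmpeg argument list. The `crop`, `scale` and `subtitles` filters and the flags used are standard FFmpeg, but `center_crop_9x16`, `build_export_command` and the chosen codecs are hypothetical illustrations rather than CLIPER's actual command:

```python
def center_crop_9x16(width: int, height: int) -> tuple:
    """Largest centered 9:16 crop as (w, h, x, y), with even sizes as libx264/yuv420p expects."""
    if width * 16 > height * 9:
        # Source is wider than 9:16 (e.g. 16:9): keep the full height, trim the sides.
        crop_h = height - height % 2
        crop_w = (height * 9 // 16) // 2 * 2
    else:
        # Source is already 9:16 or taller: keep the full width, trim top and bottom.
        crop_w = width - width % 2
        crop_h = (width * 16 // 9) // 2 * 2
    return crop_w, crop_h, (width - crop_w) // 2, (height - crop_h) // 2


def build_export_command(src: str, out: str, start: float, end: float,
                         width: int, height: int, srt_path: str) -> list:
    """Hypothetical helper: FFmpeg arguments for one vertical clip with burned-in subtitles."""
    crop_w, crop_h, x, y = center_crop_9x16(width, height)
    video_filter = (
        f"crop={crop_w}:{crop_h}:{x}:{y},"
        "scale=1080:1920,"
        f"subtitles={srt_path}"  # SRT timings are assumed to be relative to the clip start
    )
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}", "-i", src,  # input-side seek: timestamps restart near zero
        "-t", f"{end - start:.3f}",        # clip duration in seconds
        "-vf", video_filter,
        "-c:v", "libx264", "-c:a", "aac",
        out,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Subtitle paths that contain `:` or `\` need escaping inside filter syntax. A center crop is cheap, but it can cut off speakers who sit off-center.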

Transcription (Local AI): We integrated WhisperX for highly accurate, timestamped transcription. Crucially, it runs locally, so no external transcription API is needed.
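WhisperX's forced-alignment step attaches start and end times to individual words. Those word timings suit the short, fast-moving subtitles common on vertical video. The following is a minimal sketch that assumes a list of word dicts with `word`, `start` and `end` keys, which mirrors WhisperX's aligned word entries. WhisperX can leave some tokens, such as numerals, without timings, and this sketch does not handle that case. `group_words` and its default limits are hypothetical:

```python
def _to_cue(words):
    """Collapse a run of timed words into one (start, end, text) cue."""
    return (words[0]["start"], words[-1]["end"], " ".join(w["word"].strip() for w in words))


def group_words(words, max_words=3, max_gap=0.6):
    """Group word-level timestamps into short subtitle cues (hypothetical helper).

    words: dicts with "word", "start", "end" (seconds), in time order, all timed.
    A new cue starts when the current one holds max_words words or the
    speaker pauses longer than max_gap seconds.
    """
    cues, current = [], []
    for word in words:
        if current and (len(current) >= max_words or word["start"] - current[-1]["end"] > max_gap):
            cues.append(_to_cue(current))
            current = []
        current.append(word)
    if current:
        cues.append(_to_cue(current))
    return cues
```

Breaking cues at pauses as well as at a word limit keeps each cue from spanning silence, so the text on screen disappears when the speaker stops.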

Clipping Intelligence: ClipsAI's TextTiling algorithm detects the best sections based on semantic shifts in the conversation.
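TextTiling measures similarity between adjacent stretches of text and places boundaries at the deepest dips. ClipsAI's implementation is not reproduced here. This sketch is a minimal lexical version of the classic algorithm, using word-count vectors, cosine similarity, depth scores and a cutoff of mean minus half a standard deviation. As an added simplification, only local minima of the similarity curve are accepted as boundaries. All function names are hypothetical:

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def _bag(sentences):
    """Lower-cased word counts for a window of sentences."""
    return Counter(re.findall(r"[a-z']+", " ".join(sentences).lower()))


def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def _depth(sims, i):
    """Climb to the nearest peak on each side of gap i and add up both drops."""
    left = right = sims[i]
    j = i - 1
    while j >= 0 and sims[j] >= left:
        left, j = sims[j], j - 1
    j = i + 1
    while j < len(sims) and sims[j] >= right:
        right, j = sims[j], j + 1
    return (left - sims[i]) + (right - sims[i])


def texttiling_boundaries(sentences, window=2):
    """Return indices g such that a topic boundary falls just before sentences[g]."""
    gaps = range(1, len(sentences))
    # Similarity between the `window` sentences before and after each gap.
    sims = [
        _cosine(_bag(sentences[max(0, g - window):g]), _bag(sentences[g:g + window]))
        for g in gaps
    ]
    if not sims:
        return []
    depths = [_depth(sims, i) for i in range(len(sims))]
    cutoff = mean(depths) - pstdev(depths) / 2
    boundaries = []
    for i, g in enumerate(gaps):
        # Simplification: only local minima of the similarity curve qualify.
        is_valley = (i == 0 or sims[i - 1] > sims[i]) and (
            i == len(sims) - 1 or sims[i + 1] > sims[i]
        )
        if is_valley and depths[i] > max(cutoff, 0.0):
            boundaries.append(g)
    return boundaries
```

In a clipping pipeline, each returned index would map back to a timestamp through the transcript segment that starts there. Larger windows smooth out noise on real, longer transcripts.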

Advanced Copy Generation: Viral caption writing uses LangGraph to orchestrate an AI workflow with Google Gemini 2.0, leveraging its reasoning capabilities.
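LangGraph models a workflow as nodes that read and update a shared state. Edges connect the nodes, and some edges are conditional. The writeup does not describe CLIPER's actual graph, so the sketch below shows a hypothetical write, review and retry loop in plain Python that mirrors that shape. `llm` stands in for a Gemini call, and the node names, state keys and review rules are invented for illustration. In LangGraph itself, such nodes would be registered on a `StateGraph` with `add_node` and routed with `add_conditional_edges`:

```python
from typing import Callable, TypedDict


class CaptionState(TypedDict):
    transcript: str
    caption: str
    feedback: str
    attempts: int


def write_node(state: CaptionState, llm: Callable[[str], str]) -> CaptionState:
    """Ask the model (a stand-in for Gemini) for a caption, including any review feedback."""
    prompt = f"Write a short viral caption for this clip:\n{state['transcript']}"
    if state["feedback"]:
        prompt += f"\nFix these issues: {state['feedback']}"
    return {**state, "caption": llm(prompt).strip(), "attempts": state["attempts"] + 1}


def review_node(state: CaptionState) -> CaptionState:
    """Rule-based check; these rules are illustrative assumptions, not CLIPER's."""
    problems = []
    if len(state["caption"]) > 150:
        problems.append("keep it under 150 characters")
    if "#" not in state["caption"]:
        problems.append("add at least one hashtag")
    return {**state, "feedback": "; ".join(problems)}


def run_caption_graph(transcript: str, llm: Callable[[str], str], max_attempts: int = 3) -> CaptionState:
    """Loop write -> review until the review passes or the retry budget runs out."""
    state: CaptionState = {"transcript": transcript, "caption": "", "feedback": "", "attempts": 0}
    while True:
        state = review_node(write_node(state, llm))
        # Conditional edge: finish on a clean review or when attempts are exhausted.
        if not state["feedback"] or state["attempts"] >= max_attempts:
            return state
```

Injecting `llm` as a callable keeps the control flow testable without network access, because a stub function can replace the real model.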

Challenges we ran into

Performance Optimization: Getting the AI models (especially WhisperX) to run efficiently, which led us to optimize the code specifically for Apple Silicon (M1/M2/M3).

API and Dependency Integration: Keeping dependencies such as Python 3.9+, FFmpeg, and WhisperX's specific requirements working together across operating systems (macOS and Linux).
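Dependency problems like these are easier to diagnose when the CLI checks its environment before doing any work. The hypothetical preflight check below covers the two requirements named above: Python 3.9+ and FFmpeg on the `PATH`. It does not cover WhisperX's own requirements, such as its PyTorch build:

```python
import shutil
import sys


def preflight(min_python=(3, 9), tools=("ffmpeg",)):
    """Hypothetical check: return a list of problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < min_python:
        needed = ".".join(map(str, min_python))
        problems.append(f"Python {needed}+ required, found {sys.version.split()[0]}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems
```

Calling it at startup and exiting with the collected messages produces one clear error. Without the check, the run would fail midway through a long download or transcription.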

Cut Accuracy: Refining the TextTiling algorithm to ensure cuts didn't awkwardly chop a sentence or idea mid-flow.
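The writeup does not describe how the cuts were refined. One simple technique is to move each proposed cut to the nearest transcript segment that ends with sentence punctuation. It is shown here as a hypothetical post-processing step, not what CLIPER or ClipsAI actually does. `snap_to_sentence_end` and its `max_shift` default are assumptions:

```python
def snap_to_sentence_end(cut_time, segments, max_shift=5.0):
    """Move a proposed cut to the nearest segment end that closes a sentence (hypothetical).

    segments: (start, end, text) tuples from the transcript, in seconds.
    Returns cut_time unchanged if no sentence end lies within max_shift seconds.
    """
    candidates = [
        end
        for _, end, text in segments
        if text.rstrip().endswith((".", "!", "?")) and abs(end - cut_time) <= max_shift
    ]
    return min(candidates, key=lambda end: abs(end - cut_time)) if candidates else cut_time
```

The `max_shift` limit stops a cut from drifting far from the topic shift that produced it when a speaker talks for a long time without ending a sentence.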

Accomplishments that we're proud of

Local Integration: We cut costs substantially by running video transcription, a costly but essential task, 100% locally with WhisperX, so transcription has zero cost per use.

Marketing Automation: The integration of LangGraph and Gemini to generate viral captions represents marketing automation, not just editing.

What we learned

We learned that the key to automating viral content lies in orchestrating specialized AI models (one for transcription, one for cutting, one for copy), and that performance depends on optimizing for the local hardware stack.

What's next for CLIPER

Cross-Platform Support: Expanding and validating full compatibility with Windows environments.

Visual Detection: Incorporating visual analysis (face detection, high-energy moments) to further improve cut-point accuracy, complementing the semantic analysis.
