Viral Content System

I wanted to understand what makes content go viral. After seeing some posts get millions of views while similar ones barely got noticed, I set out to build a system that could analyze patterns, predict potential, and generate optimized content.

The goal was to create an autonomous pipeline that could:

Analyze trends across platforms
Extract signals that predict virality
Generate content optimized for engagement
Learn and improve over time

What I Learned

Multi-platform data ingestion: Handling rate limits, retries, and API differences across YouTube, Instagram, TikTok, and Reddit
Feature engineering: Extracting meaningful signals from text, video, audio, and engagement metrics
ML model design: Building scoring models that combine multiple signals to predict viral potential
System architecture: Designing a modular pipeline with observability, recovery, and data lineage
Reinforcement learning: Using RL agents to learn from performance and improve generation strategies

Mathematically, the viral potential score is a weighted sum of four features:

V = αE + βT + γS + δU

Here V is the overall virality score, E measures engagement (such as likes, comments, shares, and watch time), T reflects how closely the content aligns with current trends, S captures audience sentiment and emotional response, and U represents velocity, or how quickly the content is gaining views and interactions. The coefficients α, β, γ, and δ are weighting factors, set from historical performance data, that determine how much each component contributes to the final score, allowing the model to prioritize the signals that matter most.
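
As a rough illustration of the weighted sum V = αE + βT + γS + δU, here is a minimal Python sketch. The `Weights` class, the `viral_score` function, the weight values, and the assumption that each feature is already normalized to the range 0 to 1 are all hypothetical, chosen only for the example; they are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical coefficients for the four score components."""
    alpha: float  # weight on engagement (E)
    beta: float   # weight on trend alignment (T)
    gamma: float  # weight on sentiment (S)
    delta: float  # weight on velocity (U)


def viral_score(e: float, t: float, s: float, u: float, w: Weights) -> float:
    """Compute V = alpha*E + beta*T + gamma*S + delta*U.

    Assumes each feature was normalized to [0, 1] upstream.
    """
    return w.alpha * e + w.beta * t + w.gamma * s + w.delta * u


# Illustrative values only; real weights would be fit to historical data.
weights = Weights(alpha=0.4, beta=0.3, gamma=0.1, delta=0.2)
score = viral_score(e=0.8, t=0.6, s=0.5, u=0.9, w=weights)
print(f"viral potential score: {score:.3f}")
```

If the weights sum to 1 and the features lie in [0, 1], the score also stays in [0, 1], which makes scores comparable across posts.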
How I Built It

The system was built in phases:

Ingestion layer: Started with YouTube API integration, then expanded to other platforms with unified interfaces
Feature extraction: Built a dependency graph system to compute features efficiently and track lineage
Scoring engine: Trained ML models on historical viral content data to identify patterns
Generation system: Integrated LLMs and RL agents to create optimized content
Infrastructure: Added observability (Prometheus/Grafana), multiple persistence backends, and automatic recovery

The architecture follows a pipeline pattern: Ingestion → Feature Extraction → Scoring → Generation → Posting, with each component being independently testable and scalable.

Challenges Faced

API rate limiting: Implemented exponential backoff, request queuing, and multi-key rotation
Data quality: Built validation layers and data cleaning pipelines to handle inconsistent platform data
Feature computation: Designed dependency graphs to avoid redundant calculations and enable incremental updates
Model accuracy: Iterated on feature selection and model architectures to improve viral potential prediction
System reliability: Added health checks, automatic recovery, and comprehensive error handling for production use
Multi-modal analysis: Extracting meaningful features from video, audio, and text required specialized processing pipelines

The biggest challenge was balancing automation with safety: ensuring the system generates high-quality, appropriate content while maintaining account health across platforms.
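
The exponential backoff mentioned under API rate limiting can be sketched as follows. This is a minimal, synchronous illustration and not the project's actual implementation: `RateLimitError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the random "full jitter" on each delay is an added assumption that the write-up does not mention.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a platform client raises when it is rate limited."""


def backoff_delays(base: float = 1.0, cap: float = 60.0, retries: int = 5) -> List[float]:
    """Upper bound on the wait before each retry: base * 2**attempt, capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    for delay in backoff_delays(base, cap, retries):
        try:
            return fn()
        except RateLimitError:
            # Full jitter (an assumption): wait a random time in [0, delay]
            # so that parallel workers do not retry in lockstep.
            sleep(random.uniform(0, delay))
    # Final attempt after all retries; any error now propagates to the caller.
    return fn()
```

The `sleep` parameter is injectable so that the retry logic can be tested without real waiting. In an async pipeline, the same structure would apply with `asyncio.sleep` and an awaited call in place of the blocking ones.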

Built With

aiohttp
amazon-web-services
apscheduler
async
black
celery
click
cloud
docker
fastapi
flake8
google
grafana
gunicorn
httpx
ipython
kubernetes
logging:
loguru
matplotlib
monitoring
mypy
other:
prometheus
pytest
rich
s3
sdk
sentry
task
tqdm
uvicorn
yaml

Updates

Alan Siemaszko started this project — Mar 01, 2026 04:57 PM EST
