🚀 Inspiration
Over the past year, Mani and Bhanu have been diving deep into one of AI's toughest challenges: video understanding. Despite progress in foundation models, reliably interpreting videos, especially in real-world, dynamic settings, remains unsolved. Our direction was shaped by:
- The Twelve Labs blog on Context Engineering, which argued that the next leap won’t come from bigger models, but richer, adaptive context and self-healing memory.
- A discussion at the All-In Summit 2025 where Mark Cuban and Tucker Carlson debated the future of video AI, reinforcing the need for systems that are both context-aware and user-aware.

These ideas led us to build mem[v], the context and memory layer for multimodal agents.
🧠 What It Does
mem[v] creates a persistent memory graph from video content, extracting:
- Episodic context (what happened)
- Temporal context (when and in what order)
- Semantic context (relationships and meaning)

Instead of re-processing videos repeatedly, AI agents query this memory layer instantly, enabling real-time insights at 40x the speed and 1% of the cost. Process once. Remember everything. Query instantly. It also integrates external business documents, like brand guidelines, product specs, and campaign briefs, into a unified graph, turning raw video data into actionable business intelligence.
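The three context types above can be pictured as nodes and typed edges in one graph. Here is a minimal in-memory sketch of that idea; the class names, `kind` values, and relation labels are our illustrative assumptions, not the mem[v] schema or API:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    """One unit of video memory (hypothetical schema, not the mem[v] API)."""
    node_id: str
    kind: str       # "event" (episodic), "segment" (temporal), "concept" (semantic)
    payload: dict

@dataclass
class MemoryGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src, relation, dst) triples

    def add(self, node: MemoryNode) -> None:
        self.nodes[node.node_id] = node

    def relate(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id: str, relation: str) -> list:
        """Follow outgoing edges of one relation type from a node."""
        return [d for s, r, d in self.edges if s == node_id and r == relation]

g = MemoryGraph()
# Episodic context: what happened, with timestamps in the payload
g.add(MemoryNode("evt1", "event", {"label": "product shown", "t": 12.4}))
g.add(MemoryNode("evt2", "event", {"label": "competitor mention", "t": 8.1}))
# Temporal context: ordering between events
g.relate("evt2", "precedes", "evt1")
# Semantic context: what an event refers to
g.relate("evt1", "mentions", "ProductX")
```

In a production version the same triples would live in a Postgres graph schema (as in our Neon setup) rather than Python lists.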
🛠️ How We Built It
Tech Stack:
- Video Understanding: Twelve Labs (Pegasus + Marengo)
- Reasoning: OpenAI GPT-4
- Context Graph: Neon Postgres (Graph schema)
- Query Layer: Redis cache + GPT-powered logic
- Frontend: Next.js
- Auth: Clerk

We built intelligent chunking, stateful context tracking, and custom prompt pipelines to overcome limitations in API context length and the lack of multi-turn capabilities.
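To give a flavor of the chunking step: a long video's time-stamped segments have to be grouped into windows that fit the API's context limit, with a little overlap so events straddling a boundary are not lost. This is a simplified sketch; the window size, overlap, and segment shape are assumptions for illustration:

```python
def chunk_segments(segments, max_seconds=300.0, overlap=15.0):
    """Greedily group (start, end, text) segments into chunks spanning at
    most max_seconds, carrying the last `overlap` seconds of each chunk
    into the next so cross-boundary events keep their context.
    Illustrative only; parameter values are assumptions."""
    chunks, current, chunk_start = [], [], None
    for seg in segments:
        if chunk_start is None:
            chunk_start = seg[0]
        if current and seg[1] - chunk_start > max_seconds:
            chunks.append(current)
            # carry forward segments ending within `overlap` seconds of the cut
            carried = [s for s in current if s[1] >= seg[0] - overlap]
            current = list(carried)
            chunk_start = current[0][0] if current else seg[0]
        current.append(seg)
    if current:
        chunks.append(current)
    return chunks

segments = [(0, 100, "intro"), (100, 200, "demo"), (200, 310, "outro")]
chunks = chunk_segments(segments)
```

With a 300-second budget and 15-second overlap, the three segments above split into two chunks, and the "demo" segment is repeated at the head of the second chunk as boundary context.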
⚔️ Challenges We Faced
- No multi-turn chat support in Twelve Labs → Built our own context manager
- Rate limiting & unclear errors → Upgraded mid-hackathon to pay-as-you-go
- Limited video context length → Engineered smart chunking strategies
- No fine-tuning options → Relied on prompt engineering for domain-specific graphs
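The context manager mentioned in the first bullet can be sketched as a thin wrapper that emulates multi-turn chat over a single-turn endpoint by replaying recent Q/A pairs in each prompt. `ask_video` below stands in for any one-shot video-QA call; the prompt format and `max_turns` value are our assumptions:

```python
class ContextManager:
    """Emulate multi-turn chat over a single-turn video-QA endpoint by
    prepending a rolling window of prior turns to each prompt.
    Sketch only: `ask_video` is any callable taking a prompt string and
    returning an answer string."""

    def __init__(self, ask_video, max_turns=6):
        self.ask_video = ask_video
        self.history = []          # list of (question, answer) pairs
        self.max_turns = max_turns

    def ask(self, question: str) -> str:
        # Replay only the most recent turns to stay within context limits.
        recent = self.history[-self.max_turns:]
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in recent)
        prompt = (f"Previous conversation:\n{context}\n\n" if context else "")
        prompt += f"Q: {question}"
        answer = self.ask_video(prompt)
        self.history.append((question, answer))
        return answer

prompts = []
def fake_endpoint(prompt):
    prompts.append(prompt)
    return "answer"

cm = ContextManager(fake_endpoint)
cm.ask("What product appears first?")
cm.ask("When does it reappear?")
```

The second prompt now carries the first question and its answer, so follow-up questions resolve correctly even though the underlying endpoint is stateless.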
✅ Accomplishments

🔧 Technical Wins
- Built a memory layer on top of Twelve Labs
  - From one-time API calls → persistent, queryable memory
- Integrated external business context
  - PDFs, decks, catalogs, and performance data into a multimodal graph
- 40x speed improvement
  - From 30s+ video queries → <100ms with Redis + graph
- Graph-based video reasoning
  - "Find moments where Product X appears after a competitor mention and aligns with brand guidelines (section 3.2)"
- First working prototype in 24 hours
  - Processed 20+ ad videos
  - Ingested 5+ docs
  - Created 500+ graph nodes and 2K+ relationships
- Tackled the $80B ad-waste problem
  - Reuses video memory across campaigns, teams, and platforms
- Built a "single source of truth" for video intelligence
  - Unifies video content with business knowledge
- Context as infrastructure
  - Democratizing memory + context for all video AI applications
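The 30s+ → <100ms speedup above comes from a cache-aside pattern: repeated natural-language queries are served from a cache keyed on the normalized query, and only misses pay for the slow graph + LLM path. A plain dict stands in for Redis so the sketch is self-contained; in production you would swap in `redis.Redis().get`/`set` with a TTL:

```python
import hashlib

class QueryLayer:
    """Cache-aside query layer in front of the graph (sketch).
    `run_graph_query` is any callable for the slow graph + LLM path."""

    def __init__(self, run_graph_query):
        self.cache = {}                       # dict stands in for Redis here
        self.run_graph_query = run_graph_query

    def query(self, text: str) -> str:
        # Key on a hash of the normalized query so trivially different
        # phrasings of the same question hit the same cache entry.
        key = hashlib.sha256(text.strip().lower().encode()).hexdigest()
        if key in self.cache:
            return self.cache[key]            # fast path: cache hit
        result = self.run_graph_query(text)   # slow path: graph + GPT reasoning
        self.cache[key] = result
        return result

calls = []
def slow_graph_query(q):
    calls.append(q)
    return f"result for: {q}"

ql = QueryLayer(slow_graph_query)
first = ql.query("Find Product X moments")
second = ql.query("  find product x moments ")   # normalizes to the same key
```

Both calls return the same cached result while the expensive graph path runs only once; the hash-based key is our illustrative choice, not a claim about the production key scheme.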
🔍 Why It Matters
- We amplify, not compete with, Twelve Labs: just as Pinecone powers OpenAI, we power Twelve Labs outputs
- Closed the context gap: a bridge between raw video understanding and institutional knowledge
- Unlocked real-world scalability: 40x faster and 100x cheaper means deployable at scale
- Built what the industry theorized: the first working prototype of context-engineered video memory
- Immediate revenue path: the ad industry needs this now, with massive ROI and immediate need
- Multimodal data lake: videos, documents, and structured data, all queryable via natural language
📚 What We Learned
- Context beats model size
- Memory compounds
- Graphs + vectors = 🔥
- Most AI failures = context failures, not model limitations
- Foundation models need infrastructure to become usable
🚧 What’s Next
- Launching SDKs: memvai on pip + npm (already registered)
- Collaborating with Twelve Labs to become a selected customer for fine-tuning
- Onboarding 5–10 design partners in advertising
- Proving 40%+ CPM improvements in real-world campaigns
- Building privacy-preserving federated memory for cross-customer learning
- Expanding into fashion, e-learning, and media AI

Long-term, mem[v] becomes the universal memory layer for multimodal AI.
🌐 The Bigger Picture
Twelve Labs democratized video understanding models. We're making video memory and business context usable. Together, we're building the infrastructure for next-gen AI agents, where video understanding meets institutional memory and insights become truly actionable.
Built With
- clerk
- javascript
- neon
- openai
- redis
- vectordb
- vercel
