DeepRecall

🔥 Inspiration

As college students, we often sit through hours of lectures, online courses, and study sessions, only to struggle when trying to find key concepts later. Rewatching long videos or skimming through notes isn’t always an efficient option. We wanted to build a tool that could automatically transcribe, summarize, and search lecture recordings, making it easier to review material, find important topics instantly, and save time studying. Whether you're preparing for exams, catching up on missed classes, or revisiting complex topics, DeepRecall ensures you never miss a crucial moment in your learning journey. 🚀

🎥 What It Does

DeepRecall is an AI-powered tool that processes videos and provides:

Transcription of speech from videos into text while preserving timestamps.
Summarization of content, generating both a short summary and a detailed summary.
Semantic search that lets users find relevant moments in videos based on queries.
Highlight extraction based on keywords to identify important sections.
Fast caching of transcripts and summaries, so repeated requests don’t require reprocessing.
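
As a rough illustration of keyword-based highlight extraction over a timestamped transcript, here is a minimal sketch. It assumes each transcript segment is a dictionary with "start", "end", and "text" fields; the function names `find_highlights` and `format_timestamp` and the sample segments are hypothetical, not taken from the DeepRecall codebase.

```python
from typing import Any, Dict, List

# Assumed segment shape: {"start": float, "end": float, "text": str}
Segment = Dict[str, Any]


def find_highlights(segments: List[Segment], keywords: List[str]) -> List[Segment]:
    """Return segments whose text mentions any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for seg in segments:
        text = seg["text"].lower()
        matched = [k for k in lowered if k in text]
        if matched:
            # Keep the timestamps and record which keywords matched.
            hits.append({**seg, "keywords": matched})
    return hits


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS for display next to a highlight."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Made-up sample data for illustration only.
segments = [
    {"start": 0.0, "end": 12.5, "text": "Welcome back, today we cover recursion."},
    {"start": 12.5, "end": 95.0, "text": "First, some logistics about the exam."},
    {"start": 95.0, "end": 140.0, "text": "A recursive function needs a base case."},
]
highlights = find_highlights(segments, ["recursion", "base case"])
for h in highlights:
    print(format_timestamp(h["start"]), h["text"])
```

Plain substring matching like this misses paraphrases, which is where the semantic search feature comes in.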

🛠 How We Built It

Backend (Flask-based RESTful API):

Flask & Flask-CORS for API development.
OpenAI Whisper for speech-to-text transcription.
GPT-4 for summarization.
Sentence Transformers for semantic search with cosine similarity.
Redis caching to store transcripts & summaries.
FFmpeg for extracting audio from video files.
PyTorch for handling embeddings and search.
CUDA for GPU-accelerated processing when a GPU is available.
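
The audio extraction step can be sketched as a thin wrapper around the FFmpeg command line. This is an assumption-laden example: the exact flags DeepRecall uses are not documented here, so the mono 16 kHz WAV settings, the file names, and the helper names `build_ffmpeg_command` and `extract_audio` are illustrative choices only.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    """Build an FFmpeg command that extracts mono 16 kHz audio (assumed settings)."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video file
        "-vn",             # drop the video stream, keep audio only
        "-ac", "1",        # downmix to a single (mono) channel
        "-ar", "16000",    # resample to 16 kHz
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


# Hypothetical file names; only the command is built here, FFmpeg is not run.
command = build_ffmpeg_command("lecture.mp4", "lecture.wav")
print(" ".join(command))
```

Extracting audio first keeps the transcription input much smaller than the original video, which matters for long lectures.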

Frontend (React + Vite + Tailwind CSS):

React + Vite for a fast frontend setup with optimized build times.
Tailwind CSS as the main styling tool for the UI.
ShadCN/UI for styled, customizable UI components.
Radix UI for low-level UI primitives.
Theme Switching (Dark/Light Mode) to enhance accessibility.
Toast Notifications for real-time user feedback on uploads, searches, and errors.
AI tools were used in the development of this project.

⚠️ Challenges We Ran Into

Handling Large Video Files – Processing lengthy videos efficiently required optimizing audio extraction with FFmpeg and caching with Redis.
Summarization Token Limits – GPT-4 has a token limit, so we had to split transcripts into chunks before summarizing.
Embedding Storage – Initially, embeddings weren’t cached, causing slow searches. Storing them reduced response time significantly.
Frontend-Backend Integration – Managing API requests, file uploads, and large data responses without making the UI lag.
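
One way to work around the summarization token limit is to group transcript sentences into chunks under a size budget, summarize each chunk, and then summarize the combined partial summaries. The sketch below assumes word count as a rough stand-in for tokens, and the two-pass combination step is an assumption as well. `chunk_transcript`, `summarize_long`, and the stand-in summarizer are hypothetical names, and no real model is called.

```python
from typing import Callable, List


def chunk_transcript(sentences: List[str], max_words: int) -> List[str]:
    """Group sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(sentences: List[str],
                   summarize: Callable[[str], str],
                   max_words: int = 1500) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    partials = [summarize(chunk) for chunk in chunk_transcript(sentences, max_words)]
    return summarize(" ".join(partials))


def first_sentence(text: str) -> str:
    """Stand-in for a model call: keeps only the first sentence."""
    return text.split(".")[0] + "."


sentences = [
    "Recursion solves a problem by reducing it.",
    "Every recursive function needs a base case.",
    "Without one the call stack overflows.",
]
chunks = chunk_transcript(sentences, max_words=14)
summary = summarize_long(sentences, first_sentence, max_words=14)
print(len(chunks), summary)
```

Breaking on sentence boundaries rather than at a fixed character offset avoids cutting a thought in half, at the cost of slightly uneven chunk sizes.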

🏆 Accomplishments That We're Proud Of

End-to-end AI-powered pipeline that efficiently transcribes, summarizes, and searches video content.
Semantic search implementation that understands contextual meaning rather than just matching keywords.
Optimized caching strategy with Redis to reduce processing time and enhance performance.
Accurate transcript extraction with Whisper, preserving timestamps for precise search results.
Parallel processing with PyTorch & CUDA (if GPU available) to accelerate embedding generation and search.
6 minutes to process, summarize, and display an 80-minute video, then just a few seconds when the same video is requested again.
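
The ranking step behind the semantic search can be sketched with plain cosine similarity. In DeepRecall the vectors come from Sentence Transformers. Here they are tiny hand-written lists so the example runs with no dependencies, and `cosine_similarity` and `top_matches` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec: Sequence[float],
                segment_vecs: List[Sequence[float]],
                k: int = 2) -> List[Tuple[int, float]]:
    """Return (segment index, score) pairs for the k most similar segments."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(segment_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
segment_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query_vec = [1.0, 0.0, 0.0]
results = top_matches(query_vec, segment_vecs, k=2)
print(results)
```

Each returned index maps back to a transcript segment, so its timestamp can be used to jump to the matching moment in the video.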

📚 What We Learned

Handling large datasets efficiently – Processing long lecture recordings and transcripts required us to break them into manageable chunks while maintaining accuracy.
Designing for speed and scalability – To make search and transcription fast, we had to optimize data storage, indexing, and retrieval.
Balancing detail with conciseness – Summarizing long transcripts required finding the right balance between capturing key points and keeping summaries brief.
The importance of caching – Storing processed data reduced redundant computations and significantly improved performance.
Enhancing search accuracy – Simply matching words wasn’t enough; understanding context and meaning was crucial for finding relevant results.
Optimizing frontend API requests – Managing states without accidentally sending thousands of requests was a surprisingly major challenge.
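
The caching lesson can be illustrated with a small get-or-compute helper. This sketch makes several assumptions: a plain dictionary stands in for Redis, results are stored as JSON strings, and the cache key is derived from a SHA-256 hash of the video bytes. The key scheme and the names `cache_key` and `get_or_compute` are hypothetical; how DeepRecall actually builds its keys is not stated here.

```python
import hashlib
import json
from typing import Callable, Dict, List


def cache_key(video_bytes: bytes, kind: str) -> str:
    """Derive a key from the file contents so renamed copies still hit the cache."""
    digest = hashlib.sha256(video_bytes).hexdigest()
    return f"deeprecall:{kind}:{digest}"


def get_or_compute(store: Dict[str, str], key: str,
                   compute: Callable[[], dict]) -> dict:
    """Return the cached value for key, computing and storing it on a miss."""
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)
    result = compute()
    store[key] = json.dumps(result)
    return result


store: Dict[str, str] = {}  # in-memory stand-in for a Redis instance
calls: List[int] = []       # records how often the expensive step runs


def fake_transcribe() -> dict:
    """Stand-in for the expensive transcription step."""
    calls.append(1)
    return {"text": "hello"}


key = cache_key(b"video bytes", "transcript")
first = get_or_compute(store, key, fake_transcribe)
second = get_or_compute(store, key, fake_transcribe)
print(len(calls), first == second)
```

The second lookup skips the expensive step entirely, which is the same effect that makes a repeated request for an already-processed video return in seconds.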

🚀 What's Next for DeepRecall?

Real-time transcription & search – Enable live video captioning and search while the video plays.
Cloud deployment – Deploy on AWS/GCP for scalability and multi-user support.
Multi-language support – Extend DeepRecall’s capabilities to transcribe and summarize in multiple languages.
User authentication & history tracking – Allow users to store and retrieve previously processed videos.