Inspiration
When users watch a video on TikTok, they are often captivated by the content, which can spark an interest in related products. However, finding these products within the TikTok Shop can be cumbersome due to the lack of a personalized, streamlined recommendation system that bridges the gap between video content and relevant product suggestions.
What it does
- Extract key frames and video transcripts with Phi-3-Vision and OpenAI Whisper
- Process visual and textual data with our AI system, utilizing Llama3 for precise keyword generation
- Display relevant products on TikTok videos to enhance discoverability
- Redirect users from keywords to product pages
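As a rough illustration of the keyword-generation step, the sketch below combines frame descriptions and a transcript into one step-by-step prompt and parses the model's reply. It is not the project's actual prompt: the function names `build_keyword_prompt` and `parse_keywords`, the prompt wording, and the convention that the model ends its reply with a JSON array are all assumptions made for this example. The call to Llama3 itself is left out.

```python
import json


def build_keyword_prompt(frame_descriptions: list[str], transcript: str,
                         max_keywords: int = 5) -> str:
    """Combine visual and textual context into one step-by-step prompt.

    The wording is illustrative only (an assumption, not the project's prompt).
    """
    frames = "\n".join(f"- {d}" for d in frame_descriptions)
    return (
        "You are helping match a short video to products.\n"
        f"Frame descriptions:\n{frames}\n"
        f"Transcript:\n{transcript}\n"
        "Think step by step about which physical products appear or are mentioned.\n"
        f"Then, on the last line, output a JSON array of at most {max_keywords} "
        "short product keywords and nothing else."
    )


def parse_keywords(llm_output: str, max_keywords: int = 5) -> list[str]:
    """Read the JSON array from the last non-empty line; return [] if malformed."""
    lines = [ln.strip() for ln in llm_output.splitlines() if ln.strip()]
    if not lines:
        return []
    try:
        parsed = json.loads(lines[-1])
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    seen: set[str] = set()
    keywords: list[str] = []
    for item in parsed:
        if isinstance(item, str):
            kw = item.strip().lower()
            # Drop empty strings and case-insensitive duplicates.
            if kw and kw not in seen:
                seen.add(kw)
                keywords.append(kw)
    return keywords[:max_keywords]
```

Returning an empty list on malformed output keeps a bad model reply from surfacing unrelated products; a caller could retry with a stricter prompt instead.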
How we built it
- Multi-modal LLM (Phi-3-Vision): Generates descriptions and interprets videos by analysing extracted video frames.
- Video frames: FFmpeg extracts a fixed number of frames from each video
- Audio transcript: OpenAI Whisper large model
- Llama3: Keyword generation using Chain of Thought (CoT) and multi-prompt techniques
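One way to extract a fixed number of frames with FFmpeg is to sample evenly spaced timestamps and grab a single frame at each. The sketch below assumes the video duration is already known (for example from `ffprobe`) and that `ffmpeg` is on the PATH. The helper names, the midpoint sampling strategy, and the output file pattern are illustrative assumptions, not the project's actual code.

```python
import subprocess
from pathlib import Path


def frame_timestamps(duration_s: float, n_frames: int) -> list[float]:
    """Return n_frames timestamps at the midpoints of equal-length segments."""
    if duration_s <= 0 or n_frames <= 0:
        raise ValueError("duration_s and n_frames must be positive")
    step = duration_s / n_frames
    return [round(step * (i + 0.5), 3) for i in range(n_frames)]


def ffmpeg_frame_command(video: str, timestamp_s: float, out_path: str) -> list[str]:
    """Build an ffmpeg command that writes one frame taken at timestamp_s."""
    return [
        "ffmpeg", "-y",               # overwrite existing output files
        "-ss", f"{timestamp_s:.3f}",  # seek before decoding the input
        "-i", video,
        "-frames:v", "1",             # write exactly one video frame
        out_path,
    ]


def extract_frames(video: str, duration_s: float, n_frames: int,
                   out_dir: str) -> list[str]:
    """Run ffmpeg once per timestamp and return the written image paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, ts in enumerate(frame_timestamps(duration_s, n_frames)):
        out_path = str(Path(out_dir) / f"frame_{i:03d}.jpg")
        subprocess.run(ffmpeg_frame_command(video, ts, out_path), check=True)
        outputs.append(out_path)
    return outputs
```

Sampling segment midpoints avoids the first and last instants of a clip, which are often black or transitional frames. A smaller frame count also reduces the number of images the vision model has to process.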
Challenges we ran into
- Lack of Compute Power: We ran our LLM locally on consumer-grade hardware, resulting in long processing times
- Prompt Engineering: Difficulty in tuning prompts so that the LLM avoids hallucinations and extracts relevant keywords
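Beyond prompt wording, one possible guard against hallucinated keywords is a post-processing check that keeps only keywords whose words actually occur in the frame descriptions or transcript. The sketch below is a hypothetical illustration of that idea, not something the project is known to use. The function name `filter_grounded_keywords` and the `min_overlap` threshold are assumptions.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_grounded_keywords(keywords: list[str], source_texts: list[str],
                             min_overlap: float = 0.5) -> list[str]:
    """Keep keywords whose words mostly appear in the source texts.

    min_overlap is the fraction of a keyword's words that must be found in
    the frame descriptions or transcript (hypothetical default of 0.5).
    """
    vocabulary: set[str] = set()
    for text in source_texts:
        vocabulary |= _tokens(text)
    kept = []
    for kw in keywords:
        words = _tokens(kw)
        if not words:
            continue
        overlap = len(words & vocabulary) / len(words)
        if overlap >= min_overlap:
            kept.append(kw)
    return kept
```

Exact word matching is deliberately strict: it will also reject valid synonyms (for example "cup" when the source says "mug"), so the threshold trades recall for fewer unsupported suggestions.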
What we learned
- Prompt Engineering
- Event Messaging Architecture
- Video Streaming
What's next for VidScan
- Scrape Comment Sections: Obtain product keywords from user comments to enhance the accuracy of product recognition and recommendation
- User Feedback Loop: Incorporate user feedback to further refine and personalize keyword recommendations.
- Improved Prompt Interpretation: Develop prompt interpretation mechanisms to reduce misinterpretation of video content, ensuring more accurate and relevant product suggestions.
Built With
- docker
- express.js
- fastapi
- kafka
- llama3
- minio
- mongodb
- phi3vision
- react