🎬 Video Analysis Agent

AI-powered Video Analysis Agent with LLM integration and RAG capabilities for comprehensive video content analysis.

Features

Frame-by-Frame Analysis: Extracts and analyzes video frames using OpenAI's Vision API
Object Detection: Identifies objects, people, and scenes in video content
Emotion Recognition: Detects emotions and expressions from video frames
Text Extraction: OCR capabilities to extract text from video frames
RAG Integration: Build knowledge base from video analysis for Q&A
Sentiment Analysis: Overall sentiment analysis of video content
Interactive UI: Streamlit-based web interface for easy interaction

Requirements

Python 3.11+
OpenAI API Key
Video files (mp4, avi, mov, mkv)

Installation

Clone this repository:

git clone <your-repo-url>
cd video-analysis-agent

Install dependencies:
```
pip install -e .
```
Set up your OpenAI API key:
- Get an API key from OpenAI Platform
- Enter it in the Streamlit interface when prompted

Usage

Run the Streamlit application:

streamlit run main.py --server.address=0.0.0.0 --server.port=5000

Open your browser and navigate to the application URL
Enter your OpenAI API key in the sidebar
Upload a video file (supported formats: mp4, avi, mov, mkv)
Click "Analyze Video" to start the analysis
Explore the results in different tabs:
- Summary: Overall video summary and key topics
- Frame Analysis: Detailed frame-by-frame breakdown
- Insights: Key insights extracted from the video
- Q&A: Ask questions about the video content
- Metrics: Analysis statistics and metrics

Architecture

The Video Analysis Agent consists of several key components:

VideoAnalysisAgent: Main class handling video processing and analysis
Frame Extraction: Extracts frames at regular intervals from video
Vision Analysis: Uses OpenAI GPT-4 Vision for frame analysis
RAG System: Builds searchable knowledge base from analysis results
Streamlit UI: User-friendly web interface

API Usage

The agent can also be used programmatically:

from main import VideoAnalysisAgent

# Initialize agent
agent = VideoAnalysisAgent(api_key="your-openai-api-key")

# Analyze video
result = await agent.analyze_video("path/to/video.mp4", "video_id")

# Query video content
answer = agent.query_video_content("What objects are visible in the video?")

Configuration

Key configuration options:

Frame Interval: Adjust frame extraction interval (default: 30 seconds)
Analysis Depth: Control detail level of frame analysis
RAG Parameters: Customize knowledge base chunking and retrieval

Limitations

Analysis time depends on video length and OpenAI API response times
Large videos may require significant processing time
Requires stable internet connection for OpenAI API calls
API usage costs depend on video length and analysis depth

Contributing

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions, please open a GitHub issue or contact the development team.

Built With

python

Updates

Ruptanu De started this project — Aug 04, 2025 03:33 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.