Inspiration

I participate in Code & Conquer hackathon, for which I have to create something creative. I already working in computer vision and have knowledge of classification and detection models. So I that's why I integrate Yolo model with GPT-4o.

What it does

It actually take image and did analysis based on objects in it. We provide prompt to GPT-4o in context of emergency image.

How we built it

We utilize Yolo 11 nano from ultralytics which is the latest and light weight for object detection in image. Based on the objects in image and already defined prompt in code, it is provided to GPT-4o, then it generate content to user.

Challenges we ran into

The challenge I face, while developing this project is how to get the contextual information from image.

Accomplishments that we're proud of

The accomplishment I am proud of is that we have created a multimodal application, and combine yolo and gpt for our task. This is just the start of this application, we'll make it to next level in the future.

What we learned

It was a productive time while developing this project, we utilize python library flask for web application, computer vision model yolo and llm model gpt-4o.

What's next for Emergency Scence Analyzer

Next, we are planning to make it real time and bring it on economical hardware and will integrate some techniques to get contextual information from frame also.

Built With

Share this project:

Updates