BackSeat_Driver

Peek at some object tracking

Inspiration

Our inspiration came from a variety of methods for detecting traffic signs, cars, and scenarios on the road. Our main tools are BLIP, CLIP, a Roboflow model trained on LISA data, and a YOLOv8n car detection model, which together detect scenarios, cars, and traffic signs. We had also seen the power of LLMs like ChatGPT, so we used ChatGPT to reason over what these models detect.

The project name comes from the fact that the model responds to traffic-related scenarios like an arrogant backseat driver, handing out advice. Hopefully it proves helpful!

What it does

Our project uses pre-trained models to extract traffic-related data from videos and turns each scenario into a text description, which we then feed to the ChatGPT API to answer a related multiple-choice question. The videos and multiple-choice questions were provided by Tesla.
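The step of turning model outputs into text and pairing that text with a question could look roughly like the sketch below. This is a hypothetical illustration, not the project's actual code: the per-frame dictionary keys (`caption`, `signs`, `objects`), the function names `build_scenario_text` and `build_prompt`, and the prompt wording are all assumptions, and the ChatGPT API call itself is left out.

```python
from collections import Counter


def build_scenario_text(frames):
    """Summarize per-frame model outputs as one line of text per frame.

    `frames` is a list of dicts with hypothetical keys:
      "caption": str, e.g. a BLIP caption of the frame
      "signs":   list[str], e.g. labels from a LISA-trained sign detector
      "objects": list[str], e.g. YOLOv8n class names (one entry per box)
    """
    lines = []
    for i, frame in enumerate(frames, start=1):
        parts = [f"Frame {i}: {frame.get('caption', '').strip()}"]
        if frame.get("signs"):
            parts.append("signs: " + ", ".join(frame["signs"]))
        if frame.get("objects"):
            # Count detections per class, listed alphabetically by label.
            counts = sorted(Counter(frame["objects"]).items())
            parts.append("objects: " + ", ".join(f"{n} {label}" for label, n in counts))
        lines.append("; ".join(parts))
    return "\n".join(lines)


def build_prompt(scenario, question, choices):
    """Pair the scenario text with a lettered multiple-choice question."""
    options = "\n".join(f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(choices))
    return (
        "You are a backseat driver reviewing dashcam footage.\n"
        f"Scene description:\n{scenario}\n\n"
        f"Question: {question}\n{options}\n"
        "Reply with the letter of the best answer only."
    )
```

The string returned by `build_prompt` would then be sent as the user message to the ChatGPT API. Keeping one line per frame preserves the order of events, which matters for questions about movement.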

How we built it

We built our project on NVIDIA VMs to get the extra cloud compute needed to run models like YOLO and BLIP. We used BLIP, CLIP, a LISA-based model, and a YOLOv8n image model to extract what is present and what is happening in a driving scenario. Each model extracts something different: a description of the scene, specific traffic signs, detected objects, or object movement. To harness each model's strengths, we sample frames from each video (approximately 8 frames per 5-second video) and run these models on every sampled frame. We then aggregate the results per video and feed them, together with the multiple-choice question, into the ChatGPT API, prompting it to choose an answer.
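The frame sampling described above can be sketched as choosing evenly spaced frame indices. This is an assumption about how the roughly 8 frames per video were picked: the function name `sample_frame_indices` and the "middle of each segment" strategy are hypothetical.

```python
def sample_frame_indices(total_frames, num_samples=8):
    """Pick up to `num_samples` evenly spaced frame indices from a video.

    The clip is split into `num_samples` equal segments and the middle frame
    of each segment is chosen, so the samples span the whole video.
    """
    if total_frames <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # Example: a 5-second clip at an assumed 30 fps has 150 frames.
    print(sample_frame_indices(150))  # [9, 28, 46, 65, 84, 103, 121, 140]
```

With OpenCV, `total_frames` could come from `int(cap.get(cv2.CAP_PROP_FRAME_COUNT))`, and each sampled frame can be read by calling `cap.set(cv2.CAP_PROP_POS_FRAMES, idx)` followed by `cap.read()`. Sampling a fixed number of frames keeps the per-video model cost constant regardless of clip length.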
Finally, the answers are converted into a CSV file, which we use for our submission.
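Pulling the chosen letter out of each reply and writing the CSV might look like the sketch below. The function names, the `ABCD` choice set, and the `id`/`answer` column names are placeholders, not the actual submission format, which the writeup does not specify.

```python
import csv
import io
import re


def extract_choice(reply, valid="ABCD"):
    """Return the first standalone answer letter in an LLM reply, or None.

    Relies on the prompt asking for the letter only; a reply that opens
    with the article "A" would be misread as choice A.
    """
    match = re.search(rf"\b([{valid}])\b", reply)
    return match.group(1) if match else None


def to_submission_csv(answers):
    """Render (video_id, answer) pairs as CSV text with a header row.

    The column names "id" and "answer" are placeholders for whatever the
    submission format actually requires.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "answer"])
    for video_id, answer in answers:
        writer.writerow([video_id, answer])
    return buffer.getvalue()
```

To save the result, write the returned text to a file opened with `open("submission.csv", "w", newline="")`, as the `csv` module documentation recommends for CSV output.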

Challenges we ran into

So many dependency issues, too many installations and conflicts!

Training models is time-consuming! We opted for pre-trained models to save time, but training our own would have allowed for on-device models paired with the NVIDIA VMs, and thus faster compute.