Inspiration
Our inspiration came from the range of methods available for detecting traffic signs, cars, and scenarios on the road. We rely mainly on BLIP, CLIP, a RoboFlow model using LISA data, and a YOLOv8n model to detect scenarios, traffic signs, and cars. We had also seen the power of LLMs like ChatGPT, so we used it to answer questions about each scene.
The project name was inspired by the fact that the model responds to traffic-related scenarios by giving advice, like an arrogant backseat driver. Hopefully it proves helpful!
What it does
Our project uses pre-trained models to extract traffic-related data from videos and builds a text description of each scenario, which we then feed to the ChatGPT API to answer a related question. The videos and multiple-choice questions were given to us by Tesla.
How we built it
We built our project on NVIDIA VMs, which gave us extra compute power in the cloud to run models like YOLO and BLIP. We used BLIP, CLIP, a LISA-based model, and a YOLOv8n image model to extract what exists and what is happening in a driving scenario. Each model extracts something different: a description of the scene, specific traffic signs, detected objects, or object movement. We harness each model's strengths by sampling frames from each video (approximately 8 frames per 5-second video) and running the models on each frame. Afterwards we aggregate this data per video and feed it into ChatGPT's API. We also give it a multiple-choice question, and with that data and our prompting it produces an answer.
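The sampling and aggregation steps described above could look roughly like the sketch below. It is not our exact code: the function names, the per-frame dictionary keys ("caption", "signs", "objects"), and the prompt wording are all hypothetical, and the model calls themselves are left out so the example runs with only the standard library.

```python
from collections import Counter


def sample_frame_indices(total_frames, num_samples=8):
    """Pick evenly spaced frame indices across a clip (centre of each segment)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step + step / 2) for i in range(num_samples)]


def aggregate_frames(per_frame):
    """Merge per-frame model outputs into one summary per video.

    per_frame is a list of dicts with hypothetical keys:
    'caption' (str), 'signs' (list of labels), 'objects' (list of labels).
    """
    captions = []
    signs = Counter()
    objects = Counter()
    for frame in per_frame:
        caption = frame.get("caption")
        # Keep each distinct caption once, in first-seen order.
        if caption and caption not in captions:
            captions.append(caption)
        signs.update(frame.get("signs", []))
        objects.update(frame.get("objects", []))
    return {"captions": captions, "signs": dict(signs), "objects": dict(objects)}


def build_prompt(summary, question, choices):
    """Turn the summary and a multiple-choice question into prompt text."""
    lines = ["Scene summary from sampled video frames:"]
    for caption in summary["captions"]:
        lines.append(f"- Caption: {caption}")
    for label, count in sorted(summary["signs"].items()):
        lines.append(f"- Traffic sign '{label}' detected {count} time(s)")
    for label, count in sorted(summary["objects"].items()):
        lines.append(f"- Object '{label}' detected {count} time(s)")
    lines.append("")
    lines.append(f"Question: {question}")
    for letter, text in sorted(choices.items()):
        lines.append(f"{letter}. {text}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)
```

For a 5-second clip at an assumed 30 frames per second, sample_frame_indices(150) gives 8 indices spread across the clip. The string returned by build_prompt is what would be sent as the user message to the chat API.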
The answer is then converted into a CSV file, which we use to submit.
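A minimal sketch of that last step, assuming the model replies in free text and the submission needs one row per question. The column names ("id", "answer"), the A to D choice range, and both function names are assumptions for illustration, not the actual submission format.

```python
import csv
import re


def extract_choice(reply, valid="ABCD"):
    """Return the first standalone capital choice letter in the reply, or None.

    This is a simple heuristic: a reply that starts with the article "A"
    would be read as choice A, so prompting for a single letter helps.
    """
    match = re.search(rf"\b([{valid}])\b", reply)
    return match.group(1) if match else None


def write_submission(answers, fileobj):
    """Write (question_id, answer_letter) pairs as CSV rows to an open file."""
    writer = csv.writer(fileobj)
    writer.writerow(["id", "answer"])  # hypothetical column names
    for question_id, letter in answers:
        writer.writerow([question_id, letter])
```

In use, the file would be opened with open("submission.csv", "w", newline="") and passed to write_submission along with the collected answers.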
Challenges we ran into
So many dependency issues, too many installations and conflicts!
Training models is time-consuming! We opted to use pre-trained ones to save time, but training them ourselves would have allowed for on-device models paired with NVIDIA VMs, and thus faster compute.
Accomplishments that we're proud of
We were able to build a VLM while making use of several models! We also learned to use new tools like the ChatGPT and RoboFlow APIs and NVIDIA VMs!
What we learned
We learned to use new tools like the ChatGPT and RoboFlow APIs and NVIDIA VMs!
What's next for BackSeat_Driver
We hope to move many of the models we currently call via API onto the actual device and tune them with more data.