Inspiration
We are fascinated by agents and wanted to design one for this hackathon. Several ideas came to mind; we kept this one because we liked the idea of building a simulation for our agent.
What it does
The agent, powered by Mistral 7B, takes control of the car in our simulation and decides which action to take next. It drives the car as it sees fit while respecting the highway code (it does not drive on the sidewalk and obeys traffic signs, red/green lights, etc.).
How we built it
We first created a simulation in which a car can be controlled in a 2D environment. The model perceives the environment through a system that translates the current situation into a matrix. Each value in the matrix corresponds to an element of the scene (for example: 1 = road, 0 = sidewalk, 2 = car, 3 = pedestrian, etc.). However, using the model as is was not enough for it to understand how to use the environment. That's why we dove into finetuning.
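To make the matrix idea concrete, here is a minimal sketch of how such a grid could be serialized into a text prompt and how a reply could be mapped back to an action. The cell codes come from the description above; the function names, the prompt wording, and the action vocabulary (`forward`, `left`, `right`, `stop`) are assumptions made for illustration, not the project's actual code.

```python
# Cell codes taken from the description above; everything else is hypothetical.
SIDEWALK, ROAD, CAR, PEDESTRIAN = 0, 1, 2, 3

# Assumed action vocabulary; the project's real action set is not documented here.
ACTIONS = ("forward", "left", "right", "stop")


def grid_to_text(grid):
    """Serialize a 2D grid of integer cell codes, one row per line."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def build_prompt(grid):
    """Wrap the serialized grid in a short instruction for the model."""
    return (
        "Legend: 0 = sidewalk, 1 = road, 2 = car, 3 = pedestrian\n"
        "Grid:\n"
        f"{grid_to_text(grid)}\n"
        f"Choose the next action from: {', '.join(ACTIONS)}\n"
        "Action:"
    )


def parse_action(model_output, default="stop"):
    """Return the first known action found in the reply, else a safe default."""
    for word in model_output.lower().split():
        cleaned = word.strip(".,!:;\"'")
        if cleaned in ACTIONS:
            return cleaned
    return default


# A tiny 3x4 scene: the car (2) is on the road with a pedestrian (3) ahead.
example_grid = [
    [0, 1, 1, 0],
    [0, 1, 3, 0],
    [0, 2, 1, 0],
]
example_prompt = build_prompt(example_grid)
```

Falling back to `stop` when the reply contains no recognizable action is a design choice in this sketch: an unparseable answer should not move the car.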
Challenges we ran into
Because Mistral alone could not understand our environment, we decided we had to train it. However, our problem is very specific, so no existing data was available to train the model. To overcome this, we designed a tool that lets us play the simulation ourselves and saves the situations we encounter in the correct data format. Thanks to this, we were able to generate a large amount of data, and we finetuned Mistral on it using LoRA.
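A recording tool of this kind can be sketched as follows. This is only an illustration under stated assumptions: the `prompt`/`completion` JSON Lines layout, the `DemonstrationRecorder` class, and the action names are hypothetical, since the write-up does not specify the actual data format.

```python
import io
import json


def make_record(grid, action):
    """Pair a serialized grid with the action the human player chose."""
    prompt = "\n".join(" ".join(str(cell) for cell in row) for row in grid)
    # Hypothetical schema: one prompt/completion pair per demonstration step.
    return {"prompt": prompt, "completion": action}


class DemonstrationRecorder:
    """Append one JSON object per line to any writable text stream."""

    def __init__(self, stream):
        self.stream = stream
        self.count = 0

    def record(self, grid, action):
        self.stream.write(json.dumps(make_record(grid, action)) + "\n")
        self.count += 1


# An in-memory buffer stands in for a dataset file opened in append mode.
buffer = io.StringIO()
recorder = DemonstrationRecorder(buffer)
recorder.record([[0, 1, 0], [0, 2, 0]], "forward")  # road ahead is clear
recorder.record([[0, 3, 0], [0, 2, 0]], "stop")     # pedestrian ahead
lines = buffer.getvalue().splitlines()
```

Each line of the resulting file is an independent training example, so recordings from several play sessions can simply be concatenated before finetuning.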
Accomplishments that we're proud of
We are very proud to have created an end-to-end project, from the simulation, through the training of the model, to its inference on the simulation.
What we learned
We learned that it is possible to tackle a reinforcement-learning-style control problem using only LLM training concepts (here, finetuning on recorded play), without necessarily using RL tools such as policies.
What's next for MistralDriver
The next step is to make the simulation more complex so that the environment is closer to reality and to the real highway code.
GitHub link
Built With
- mistral