Project Story
About the project
The inspiration for PixtralSlides came from a real-life challenge: we have to present a scientific paper at a conference next year, so we turned to LLMs to help create the presentation. The results were underwhelming. The content was fine, but the formatting and layout left a lot to be desired. One major issue was that the LLM couldn't "see" how the slides it generated actually looked.
This is where PixTral came into play! As a multimodal model, PixTral can look at the rendered slides and give feedback on their layout, formatting, and overall presentation. With this ability, we built PixtralSlides, which takes a LaTeX-generated presentation, runs it through PixTral for critique, and then iteratively improves the slides based on that feedback.
What we learned
We learned a lot about working with multimodal models: dealing with their API responses, handling image conversions, debugging for hours, and the importance of teamwork. This project gave us deeper insight into the complexities of handling multimodal inputs and of integrating text-based LLMs with visual feedback models like PixTral.
How we built it
We first take either LaTeX files or a PDF of the paper, parse the content, and extract the images. Next, in a first LLM pass, we ask the model to write a scientific presentation in LaTeX based on the paper. Once we have the initial presentation, we compile it and convert the slides into images. These images are then fed, in batches, into the PixTral model for critique. Finally, we feed that critique back into an LLM, which improves the LaTeX presentation code accordingly. The compile, critique, and revise steps can be repeated to keep improving the slides.
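The loop described above can be sketched in a few lines of Python. This is only an illustrative outline, not our actual code: the names `refine_slides`, `batched`, `draft`, `render`, `critique`, and `revise`, as well as the `rounds` and `batch_size` defaults, are all hypothetical. The four callables stand in for the LLM drafting call, the LaTeX compile and image conversion step, the PixTral critique call, and the LLM revision call, so no real API is assumed here.

```python
from typing import Callable, List, Sequence


def batched(items: Sequence, size: int) -> List[Sequence]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def refine_slides(
    paper_text: str,
    draft: Callable[[str], str],                 # hypothetical: paper text -> LaTeX slides
    render: Callable[[str], List[bytes]],        # hypothetical: LaTeX -> one image per slide
    critique: Callable[[Sequence[bytes]], str],  # hypothetical: image batch -> feedback text
    revise: Callable[[str, str], str],           # hypothetical: (LaTeX, feedback) -> new LaTeX
    rounds: int = 2,                             # assumed default, not from the project
    batch_size: int = 4,                         # assumed default, not from the project
) -> str:
    """Draft slides once, then repeat render -> critique -> revise."""
    latex = draft(paper_text)
    for _ in range(rounds):
        images = render(latex)
        # Critique the slide images batch by batch, then merge the feedback.
        notes = [critique(batch) for batch in batched(images, batch_size)]
        latex = revise(latex, "\n".join(notes))
    return latex
```

Passing the model calls in as plain functions keeps the loop easy to test with stubs and makes it simple to swap one model for another.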
Challenges we faced
We faced several challenges. LaTeX compilation errors were a major hurdle, especially when iterating through multiple revisions. We also ran into API limitations and occasional response delays. Structuring the pipeline around the limited capabilities of the models was difficult. However, through perseverance and teamwork, we were able to build a functioning pipeline that iteratively improves the presentation quality.