Co-Author

Inspiration

Control-Net was an exciting advancement in the field, but it focused on a single modality. The world is largely multi-modal, and there is no critic that governs Control-Net or co-pilots in general. So we wanted to build a governed Control-Net (solely via prompting for now) for the multi-modal world. We thought the best way to show this is through literature!

What it does

We built a visual novel/poem book in which authors can generate content from images (or text) and change the story at any point in the book based on the rest of the book. This is inspired by the latest coding co-pilots, which perform best when they ingest the entire codebase. All changes are monitored and evaluated by the critic so that only the best changes are kept.
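As a rough illustration of the "propose a change, let the critic decide" flow described above, here is a minimal sketch. It is not our actual code: the `Page` class, the `revise_page` function, and the `generate` and `critic` callables are all hypothetical names, and in practice the two callables would wrap prompts sent to a multi-modal model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    text: str
    image_path: Optional[str] = None  # optional illustration for the page


# Hypothetical signatures; both receive the whole book as context.
Generator = Callable[[List[Page], int, str], str]  # (book, page index, instruction) -> new text
Critic = Callable[[List[Page], int, str], float]   # (book, page index, candidate text) -> score


def revise_page(book: List[Page], index: int, instruction: str,
                generate: Generator, critic: Critic) -> bool:
    """Propose a rewrite of one page and keep it only if the critic prefers it."""
    current = book[index].text
    candidate = generate(book, index, instruction)
    # Score the old and new text against the same whole-book context.
    if critic(book, index, candidate) > critic(book, index, current):
        book[index] = Page(text=candidate, image_path=book[index].image_path)
        return True
    return False
```

Passing the full `book` to both callables mirrors the idea of ingesting the entire codebase: neither the generator nor the critic sees a page in isolation.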

How we built it

We started with basic Python scripts to work out the prompts and their use cases, then built an interactive UI around them.

Challenges we ran into

Maintaining context while dynamically changing any part of a book was difficult from a prompting perspective, because the critic was easily confused. We had to design the critic's prompt carefully to get fair and sensible evaluations.
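One way to reduce that kind of confusion is to label every page explicitly and mark the one being edited. The sketch below shows the idea for text only (images are left out); the function name `build_critic_prompt` and the prompt wording are assumptions made for illustration, not the prompt we shipped.

```python
from typing import List


def build_critic_prompt(pages: List[str], index: int, candidate: str) -> str:
    """Assemble one prompt that shows the critic the whole book and the proposed change."""
    if not 0 <= index < len(pages):
        raise IndexError("index must refer to an existing page")
    lines = ["You are reviewing a proposed edit to one page of a book.", ""]
    for i, text in enumerate(pages):
        # Mark exactly one page so the critic knows what is being replaced.
        marker = " (PAGE UNDER REVISION)" if i == index else ""
        lines.append(f"[Page {i + 1}{marker}]")
        lines.append(text)
        lines.append("")
    lines.append("[Proposed replacement]")
    lines.append(candidate)
    lines.append("")
    lines.append("Judge only whether the replacement fits the other pages better than the current text.")
    lines.append("Answer with ACCEPT or REJECT on the first line, then one sentence of reasoning.")
    return "\n".join(lines)
```

Asking for a fixed first-line verdict keeps the critic's answer easy to parse, and restricting the question to "does it fit the other pages" keeps the evaluation anchored to the rest of the book.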

Accomplishments that we're proud of

Multi-modal critics for multi-modal co-pilots are a very new concept (https://arxiv.org/abs/2410.02712). We are quite happy with our use case and its integration with Pixtral.

What we learned

The nuances of prompting text models differ considerably from those of prompting multi-modal models. Most of our time went into adapting to this (and experimenting with Mistral's suite).

What's next for Co-Author

Building a better UI for ease of use for actual users.
Integrating more sophisticated prompts to maximize context understanding.
Fine-tuning models for our use case, especially the critic.

Built With

mistral
python
streamlit

Updates

Soumya Snigdha Kundu started this project — Oct 06, 2024 07:23 AM EDT
