Sequencing data is complex, but the information needed to make sense of it already exists.

Researchers run sequencing experiments every day, and much of that information goes stale because parsing it is too time-consuming. LLMs make it possible to actually understand that information without nearly the same effort.

What it does

miROR is a Mistral base model fine-tuned on ~10,000 PubMed papers to understand microRNA (miRNA) data. It can take a series of miRNA molecules as a string and give you a window into what is happening in that organism.
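As a sketch of that interface (the helper name and prompt wording below are illustrative assumptions, not miROR's actual code), a query might bundle a list of miRNA identifiers into a single prompt string before sending it to the model:

```python
def build_mirna_prompt(mirnas: list[str], organism: str) -> str:
    """Assemble one prompt string from a list of miRNA identifiers.

    The instruction wording is a placeholder; miROR's real prompt
    template may differ.
    """
    joined = ", ".join(mirnas)
    return (
        f"The following miRNAs were detected in {organism}: {joined}. "
        "Summarize the biological processes these miRNAs are known to regulate."
    )

prompt = build_mirna_prompt(["hsa-miR-21-5p", "hsa-miR-155-5p"], "Homo sapiens")
```

The resulting string is what gets passed to the fine-tuned model as a user message.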

How we built it

  1. Crawled miRNA databases to gather 3,000 relevant miRNAs
  2. Crawled PubMed to gather articles about those RNAs
  3. Used those articles to fine-tune the Mistral base model
  4. Created a benchmark of 20 questions and compared outputs across the fine-tuned model, mistral-large-latest, gpt-3.5-turbo, and gpt-4-1106-preview. At the time of writing, the fine-tuned model outperformed the base Mistral model, mistral-large-latest, and gpt-3.5-turbo when assessed on recall.
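The recall comparison in step 4 can be sketched as follows. The scoring scheme here is an assumption (term-level recall: the fraction of expected key phrases that appear in each model's answer), shown with toy data rather than the actual 20-question benchmark:

```python
def term_recall(answer: str, expected_terms: list[str]) -> float:
    """Fraction of expected key terms found in the model's answer
    (case-insensitive substring match)."""
    text = answer.lower()
    hits = sum(1 for term in expected_terms if term.lower() in text)
    return hits / len(expected_terms)

# Toy benchmark entry: expected key terms for one question,
# plus two hypothetical model answers.
expected = ["miR-21", "apoptosis", "tumor suppressor"]
answers = {
    "fine-tuned": "miR-21 suppresses apoptosis by silencing tumor suppressor genes.",
    "baseline": "miR-21 is a microRNA involved in cancer.",
}
scores = {model: term_recall(text, expected) for model, text in answers.items()}
```

Averaging such per-question scores across the benchmark gives one recall number per model to compare.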

Challenges we ran into

  • We had trouble connecting to a GPU, which set us back several hours, and we made the critical mistake of not fine-tuning on the instruct model from the start.
  • Because of these issues, the model does not yet output tokens as reliably as needed; however, it does a good job answering questions and recalling the right content based on the prompt.

Accomplishments that we're proud of

  • Trained a model successfully: validation loss and training loss behaved as expected, and the model recalled information well without any RAG.

What we learned

  • Prompt engineering is critical when trying to run benchmarks.
  • Accurate benchmarking takes a significant amount of time and effort.

What's next for miROR

  • Clean up the code and get it operational, deploy it internally at my company, and then swap it in for gpt-3.5-turbo, which we are currently using.
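One way to make that swap painless (the interface below is a sketch under assumed names, not our production code) is to hide both backends behind a common completion interface, so gpt-3.5-turbo can be replaced by the fine-tuned miROR model with a one-line change at the construction site:

```python
from typing import Protocol


class ChatModel(Protocol):
    """Minimal interface both backends implement."""

    def complete(self, prompt: str) -> str: ...


class StubMirorModel:
    """Placeholder for a client wrapping the fine-tuned miROR endpoint."""

    def complete(self, prompt: str) -> str:
        return f"[miROR] answer to: {prompt}"


class StubGpt35Model:
    """Placeholder for a client wrapping gpt-3.5-turbo."""

    def complete(self, prompt: str) -> str:
        return f"[gpt-3.5-turbo] answer to: {prompt}"


def answer_question(model: ChatModel, question: str) -> str:
    # Call sites depend only on the ChatModel protocol, so swapping
    # backends only touches the line where the model is constructed.
    return model.complete(question)

reply = answer_question(StubMirorModel(), "What does hsa-miR-21-5p regulate?")
```

With this in place, switching the company-internal deployment from gpt-3.5-turbo to miROR means changing only which class is instantiated.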
