Inspiration
Recently, language models such as ChatGPT have shown that they can pass medical examinations such as the USMLE. Some LLM companies have begun launching models for health purposes without considering the impact of bias. At the same time, large language models offer new opportunities for understanding how bias in medical data plays out in applied settings.
What it does
This program compiles USMLE Step 3 case questions and alters demographic details such as race, gender, and disability. The models then test each other to check whether they still answer correctly across these demographic variations.
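As a rough illustration of the demographic-swap step, here is a minimal sketch in Python. The template text, the attribute values, and the `make_variants` helper are all hypothetical placeholders for illustration, not the project's actual data or code.

```python
from itertools import product

# Hypothetical case template; the placeholders mark the details that get varied.
CASE_TEMPLATE = (
    "A 54-year-old {race} {gender}{disability} presents with chest pain "
    "radiating to the left arm. What is the most appropriate next step?"
)

# Illustrative values only, not the project's actual lists.
DEMOGRAPHICS = {
    "race": ["Black", "white", "Asian"],
    "gender": ["man", "woman"],
    "disability": ["", " who uses a wheelchair"],
}


def make_variants(template, demographics):
    """Return one question per combination of demographic values."""
    keys = list(demographics)
    variants = []
    for values in product(*(demographics[k] for k in keys)):
        attrs = dict(zip(keys, values))
        variants.append({"attrs": attrs, "question": template.format(**attrs)})
    return variants


variants = make_variants(CASE_TEMPLATE, DEMOGRAPHICS)
```

Every variant keeps the clinical content identical, so any change in a model's answer can be attributed to the demographic wording alone.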
How we built it
I started from an Inspect base in Python and added the Gemini, Claude, and OpenAI APIs. I also used prompt engineering with Gemini to build the question database.
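One way to keep several providers interchangeable is to treat each model as a plain function from question to answer letter and then compare accuracy per demographic group. The sketch below assumes that setup; `accuracy_by_group` and `stub_model` are hypothetical names, the stub stands in for a real API client, and none of this is the Inspect API or the original (lost) code.

```python
from collections import defaultdict


def accuracy_by_group(models, items, attribute):
    """Compute each model's accuracy per value of one demographic attribute.

    models: dict of name -> callable(question) -> answer letter
    items: list of dicts with "question", "answer", and "attrs" keys
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for name, ask in models.items():
        for item in items:
            key = (name, item["attrs"][attribute])
            total[key] += 1
            if ask(item["question"]).strip().upper() == item["answer"]:
                correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Hypothetical stub in place of a real API call; deliberately biased for the demo.
def stub_model(question):
    return "B" if "woman" in question else "A"


items = [
    {"question": "A 54-year-old man ...", "answer": "A", "attrs": {"gender": "man"}},
    {"question": "A 54-year-old woman ...", "answer": "A", "attrs": {"gender": "woman"}},
]
results = accuracy_by_group({"stub": stub_model}, items, "gender")
```

A gap between groups for the same model, on questions that differ only in demographic wording, is the kind of signal this evaluation is meant to surface.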
Challenges we ran into
My computer broke and deleted my code.
Accomplishments that we're proud of
Not crying.
What we learned
Always commit before closing VS Code.
What's next for US-ML-Eval
Nobel Peace Prize
Built With
- chatgpt
- claude
- gemini
