Inspiration
We were inspired by another hackathon we participated in, which focused on using generative AI for sustainability and the circular economy.
What it does
Chemini revolutionizes the evaluation of innovative ideas by harnessing the power of generative AI. A three-in-one tool, it acts as an idea validator, filter, and moonshot finder. It automates the screening process, assigning scores based on user-defined metrics and weights, which makes evaluations fair and efficient. Notably, Chemini flags both suboptimal ideas, saving valuable evaluator time, and ambitious moonshots, offering a comprehensive assessment. Its innovation lies in its adaptability: users can customize the evaluation criteria so they stay relevant to their specific needs. Chemini represents a significant step in augmenting human judgment in the dynamic realm of idea evaluation.
How we built it
We used gemini-pro as our main engine and relied on prompt engineering to write a prompt that guides the LLM to perform the desired task. We developed the prompt iteratively, following prompt-engineering guidelines. The final prompt contains instructions about the role the LLM should play, the input format, the metrics and their descriptions, the steps to follow to score and flag the provided idea, and the format of the output. We noticed that the model gets fooled by words such as "revolutionary" and "cutting edge" and would flag some shallow ideas as "moonshots"; to remediate this, we added an intermediate step in which the LLM rewrites the provided solution in neutral language before scoring the idea.
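To illustrate the shape of such a prompt, here is a minimal sketch of how it might be assembled before being sent to gemini-pro. The metric names, descriptions, and exact wording below are illustrative placeholders, not the prompt we actually used:

```python
# Illustrative metrics; real users would supply their own names and descriptions.
METRICS = {
    "feasibility": "How realistic is the idea given current technology?",
    "impact": "How large is the potential sustainability benefit?",
}

def build_prompt(idea: str, metrics: dict[str, str]) -> str:
    """Assemble a scoring prompt with a role, metrics, steps, and output format.

    Step 1 is the neutral-language rewrite that keeps hype words from
    inflating the score.
    """
    metric_lines = "\n".join(f"- {name}: {desc}" for name, desc in metrics.items())
    return (
        "You are an expert evaluator of innovation ideas.\n"
        "Metrics to score (0-10 each):\n"
        f"{metric_lines}\n"
        "Steps:\n"
        "1. Rewrite the idea below in neutral language, removing hype words\n"
        "   such as 'revolutionary' or 'cutting edge'.\n"
        "2. Score the neutral version on each metric, with a short rationale.\n"
        "3. Flag the idea as 'lazy', 'normal', or 'moonshot'.\n"
        "Return the result as JSON with keys: neutral_idea, scores, flag.\n"
        f"Idea: {idea}\n"
    )

prompt = build_prompt("A revolutionary app for recycling.", METRICS)
```

The resulting string would then be passed to the model as a single request; keeping the neutralization as an explicit numbered step inside one prompt avoids a second API call.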
Challenges we ran into
At first, the model was unable to carry out even simple math calculations, so we used the LLM only to score the individual metrics and computed the overall score in code. Additionally, the LLM's results were sometimes inconsistent: the same input could produce different outputs. The scoring was also very generous at first, leading to useless scores. We solved these issues by iteratively improving our prompt.
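The split described above (LLM scores each metric, code aggregates) can be sketched as a simple weighted average; the function name and example numbers here are illustrative, not taken from our codebase:

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-metric LLM scores into one overall score.

    The arithmetic is done in code, since the model proved unreliable
    at even simple calculations.
    """
    total_weight = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_weight

# Example: feasibility weighted twice as heavily as impact.
overall_score({"feasibility": 8, "impact": 6}, {"feasibility": 2, "impact": 1})
# → (8*2 + 6*1) / 3 ≈ 7.33
```

Keeping aggregation deterministic also means the same metric scores always yield the same overall score, which helps with the consistency problem.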
Accomplishments that we're proud of
We are proud of how the final product turned out and believe it has potential. It is a three-in-one tool: it screens and evaluates ideas, giving a score for each metric with a clear rationale; it weeds out lazy ideas, saving time for the human evaluator; and it surfaces the most interesting and ambitious ideas by flagging them as moonshots. It is also highly customizable: the human evaluator can choose whichever metrics matter to them, along with their descriptions and levels of importance.
What we learned
We learned a lot about prompt engineering and the power/limitation of generative AI tools.
What's next for Chemini
Further development would focus on refining the user interface, expanding the range of customizable metrics, and optimizing the generative AI for more nuanced, context-aware explanations. We would also explore generating a checklist for each metric for the LLM to score, so that the scoring becomes more stable, and investigate other scoring schemes such as an AHP-based approach and SWOT analysis.