Inspiration
The main inspiration for our project came from our varied experiences with AI agents. It feels like every month there is a new "best" AI model, and the ones available to us are usually hit or miss. To address this, we wanted to create a live benchmark for LLMs that tests multiple practical skills, including coding, structural design, and working within set parameters. That way, we could get a clear and unbiased representation, agreed upon by the community, of how each model compares to the others.
What it does
For each call to the website, a prompt to build a certain LEGO structure is randomly selected and given to two LLMs chosen at random from a list. These LLMs return code that "builds" the structure; our backend executes that code, stores the result, and passes it back to the website as an interactive 3D image. Users are encouraged to vote for whichever response best mirrors the intent of the prompt. As more users submit votes, a running tally is kept for each model and stored in a leaderboard that users can view.
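As a rough illustration of the matchup and voting flow described above, here is a minimal Python sketch. The model names, prompts, and function names (`pick_matchup`, `record_vote`, `leaderboard`) are placeholders we made up for this example, and the in-memory `Counter` only stands in for wherever the real tally is stored.

```python
import random
from collections import Counter

# Hypothetical lists: the real models and prompts are not named in this writeup.
MODELS = ["model-a", "model-b", "model-c", "model-d"]
PROMPTS = ["Build a small house", "Build a bridge", "Build a tower"]


def pick_matchup(rng: random.Random) -> tuple[str, str, str]:
    """Pick one prompt and two distinct models at random."""
    prompt = rng.choice(PROMPTS)
    # sample() draws without replacement, so the two models always differ.
    model_1, model_2 = rng.sample(MODELS, 2)
    return prompt, model_1, model_2


def record_vote(tally: Counter, winner: str) -> None:
    """Add one vote for the model the user preferred."""
    tally[winner] += 1


def leaderboard(tally: Counter) -> list[tuple[str, int]]:
    """Return (model, votes) pairs, highest vote count first."""
    return tally.most_common()


if __name__ == "__main__":
    rng = random.Random()
    tally = Counter()
    prompt, first, second = pick_matchup(rng)
    print(f"Prompt: {prompt!r} -> {first} vs {second}")
    record_vote(tally, first)  # pretend the user preferred the first build
    print(leaderboard(tally))
```

A plain vote count is the simplest reading of "running tally"; a rating scheme that accounts for which models were paired would be a different design.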
How we built it
We built this program using a microservices architecture with three components. The frontend component was responsible for hosting the website and providing an interactive environment for users. The frontend would send requests to our second component, the database, through an API call containing the intended prompt and the names of the two LLMs to run. The database would then call a Snowflake model API to generate the LEGO build as an LDraw file, store and cache the results in Cloudflare, and record how users voted for each model. Our third component, the Snowflake model, would take a prompt as input and use the specified AI model to create code, which it would then execute and return to the database as an LDraw file.
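To make the database component's role more concrete, here is a small Python sketch of a cache-then-generate step and of what a simple LDraw result might look like. It rests on several assumptions: a plain `dict` stands in for the Cloudflare cache, the `generate` callable stands in for the Snowflake model API call, and `stack_of_bricks` and `get_build` are names invented for this example. The LDraw details used (line type 1 places a part, `-y` points up, a standard brick is 24 LDraw units tall, `3001.dat` is the 2 x 4 brick) come from the public LDraw file format, not from our code.

```python
from typing import Callable

# Identity rotation matrix for an LDraw type 1 line:
# "1 <colour> x y z a b c d e f g h i <file>"
IDENTITY = "1 0 0 0 1 0 0 0 1"


def stack_of_bricks(count: int, colour: int = 4, part: str = "3001.dat") -> str:
    """Return LDraw text for `count` bricks stacked on top of each other."""
    lines = ["0 Hypothetical stacked-brick example"]  # type 0 line = comment/meta
    for i in range(count):
        # -y is up in LDraw, so each brick sits 24 units above the previous one.
        lines.append(f"1 {colour} 0 {-24 * i} 0 {IDENTITY} {part}")
    return "\n".join(lines) + "\n"


def get_build(
    prompt: str,
    model: str,
    cache: dict[tuple[str, str], str],
    generate: Callable[[str, str], str],
) -> str:
    """Return cached LDraw text for (prompt, model), generating it on a miss."""
    key = (prompt, model)
    if key not in cache:
        # Assumption: in the real system this would be the Snowflake API call.
        cache[key] = generate(prompt, model)
    return cache[key]


if __name__ == "__main__":
    cache: dict[tuple[str, str], str] = {}
    fake_model = lambda prompt, model: stack_of_bricks(3)
    print(get_build("Build a tower", "model-a", cache, fake_model))
```

Caching on the (prompt, model) pair means a repeated matchup does not trigger another model call; whether the real service keys its cache this way is an assumption.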
Challenges we ran into
The single biggest challenge that we ran into was loading the 3D LEGO environment inside a hosted website. When we ran it on localhost it worked, but whenever we tried to host it, the page would show everything except the 3D objects. We ran into this issue because we had written the frontend in raw JavaScript, whereas Cloudflare expected a specific framework (e.g., React or Node). We eventually diagnosed this error by testing on localhost and concluded that the best way to resolve it would be to convert the raw JavaScript to Next.js so that it was simpler for Cloudflare to process. After more trial and error with the conversion, this eventually worked and our website hosted as expected.
What we learned
One of the most important takeaways that we got from this project was learning not just how microservices work, but how to actually implement them together when working as a team. We had all heard about microservices in class, but this was the first time that all of us had to implement multiple different pieces of functionality in such a large-scale program.
