Inspiration

We compared Google with ChatGPT and wanted to combine the positive aspects of both: energy efficiency and intelligence.

What it does

A chat interface that combines a large and a small LLM into one energy-efficient system. By default the small LLM runs, but if a generated token exceeds a certain perplexity threshold, that token is deleted and replaced with a token generated by the large LLM. Generation then switches back to the small LLM.
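The switching rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Model` interface, `hybrid_generate`, the toy models, and the threshold value are all hypothetical, and per-token perplexity is assumed here to mean 1 / p for the chosen token (equivalently exp(-log p)).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a "model" maps the tokens so far to a
# probability distribution over the next token.
Model = Callable[[List[str]], Dict[str, float]]


def greedy_pick(dist: Dict[str, float]) -> Tuple[str, float]:
    """Return the most likely token and its probability."""
    token = max(dist, key=dist.get)
    return token, dist[token]


def token_perplexity(prob: float) -> float:
    """Per-token perplexity, assumed to be 1 / p, i.e. exp(-log p)."""
    return 1.0 / prob


def hybrid_generate(
    small: Model,
    large: Model,
    prompt: List[str],
    max_tokens: int,
    threshold: float,
    eos: str = "<eos>",
) -> List[Tuple[str, str]]:
    """Generate tokens, tagging each with the model that produced it."""
    tokens = list(prompt)
    output: List[Tuple[str, str]] = []
    for _ in range(max_tokens):
        # The small model proposes a token by default.
        token, prob = greedy_pick(small(tokens))
        source = "small"
        if token_perplexity(prob) > threshold:
            # Too uncertain: discard the proposal and ask the large model.
            token, _ = greedy_pick(large(tokens))
            source = "large"
        if token == eos:
            break
        tokens.append(token)
        output.append((token, source))
        # The next iteration starts with the small model again.
    return output


# Toy stand-ins for the two LLMs (illustrative only).
def toy_small(tokens: List[str]) -> Dict[str, float]:
    if tokens[-1] == "the":
        return {"cat": 0.9, "dog": 0.1}
    return {"sat": 0.3, "ran": 0.3, "<eos>": 0.4}  # unsure everywhere else


def toy_large(tokens: List[str]) -> Dict[str, float]:
    if tokens[-1] == "cat":
        return {"sat": 0.8, "ran": 0.2}
    return {"<eos>": 1.0}


result = hybrid_generate(toy_small, toy_large, ["the"], max_tokens=5, threshold=2.0)
print(result)
```

Tagging each token with its source is one simple way to let the interface show which parts of a reply came from which model. In this sketch the large model is only called on the tokens where the small model is uncertain, which is where the energy saving would come from.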

How we built it

We built it with Python, HTML, CSS, and JavaScript. We worked with Qwen3 4B 2507 and an Ollama model.

Challenges we ran into

Image support and online search were challenging.

Accomplishments that we're proud of

We are happy that we were able to show which text was generated by the large LLM and which by the small LLM.

What we learned

We learned more about the tokenization process and how its

What's next for fih

Next, we will add more models so that more large and small LLMs are available.
