Inspiration: We noticed that most users treat AI as free and infinite, yet every query carries a real, tangible cost (electricity, water, cooling) that remains invisible to them. As AI usage scales to billions of daily queries, that invisible waste compounds into a significant and growing environmental burden we felt compelled to address.

What it does: EcoPrompt is a split-screen AI chat application that reduces the environmental cost of every query through three mechanisms: semantic deduplication (serving cached responses for similar questions via vector search), model right-sizing (routing simple prompts to smaller models and complex ones to larger ones), and a real-time sustainability dashboard that shows exactly how much energy and CO₂ was avoided, in measurable physical units like kWh and grams of CO₂.

How we built it: We built EcoPrompt as a Next.js application using Amazon Bedrock and Titan Embeddings to vectorize incoming prompts and JavaScript to perform a cosine similarity search against previously cached queries. Model inference runs through Bedrock's Claude Haiku and Sonnet models, with a heuristic complexity classifier routing each prompt to the appropriate tier.
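The two core pieces described above can be sketched in plain JavaScript. The function and variable names here are illustrative, not EcoPrompt's actual code, and the router rule is a placeholder heuristic; in the real app the embeddings come from Titan via Bedrock rather than being supplied directly:

```javascript
// Similarity threshold from the writeup: scores at or above this count as a hit.
const SIMILARITY_THRESHOLD = 0.92;

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear scan over cached { embedding, response } entries; returns the
// cached response on a semantic match, or null on a cache miss.
function findCachedResponse(queryEmbedding, cache) {
  let best = null, bestScore = -Infinity;
  for (const entry of cache) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score > bestScore) { bestScore = score; best = entry; }
  }
  return best !== null && bestScore >= SIMILARITY_THRESHOLD ? best.response : null;
}

// Placeholder complexity classifier: long or code-heavy prompts route to
// the larger model tier, everything else to the smaller one.
function chooseModelTier(prompt) {
  const complex = prompt.length > 400 || /```|analyze|prove|step by step/i.test(prompt);
  return complex ? "claude-sonnet" : "claude-haiku";
}
```

On a cache miss, the prompt would be sent to the tier chosen by the classifier and the new (embedding, response) pair appended to the cache.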

Challenges: The trickiest challenge was getting the embedding-overhead math right: every query pays the vectorization cost regardless of whether it's a cache hit, so we had to carefully tune the similarity threshold (0.92) to balance false positives against genuine semantic matches without inflating the very infrastructure footprint we were trying to reduce. We also had to come to terms with the fact that the system only delivers net CO₂ savings at sufficient query volume and repetitiveness, and we built that nuance into the dashboard rather than overstating the impact.
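That break-even trade-off can be sketched numerically. The per-call energy figures below are hypothetical placeholders for illustration, not measurements from EcoPrompt:

```javascript
// All energy figures are illustrative placeholders, not measured values.
const E_EMBED = 0.05; // Wh paid on EVERY query to embed the prompt
const E_LLM   = 1.0;  // Wh avoided whenever a cache hit skips a full LLM call

// Expected net energy saved per query at a given cache-hit rate:
// every query pays E_EMBED, and a fraction hitRate avoids E_LLM.
function netSavingsPerQuery(hitRate) {
  return hitRate * E_LLM - E_EMBED;
}

// The system only saves energy once the hit rate exceeds E_EMBED / E_LLM.
const breakEvenHitRate = E_EMBED / E_LLM; // here, 5% of queries must be hits
```

Below the break-even hit rate the embedding overhead outweighs the avoided inference, which is why the dashboard reports net rather than gross savings.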

Accomplishments: We're proud that EcoPrompt makes AI sustainability tangible rather than theoretical: users can watch the CO₂ counter update in real time and see exactly how many LLM calls were avoided. We're also proud that the architecture achieves its efficiency goals without restricting what users can ask or degrading response quality!

What we learned: We learned that sustainability in AI is fundamentally an engineering problem, not a behavioral one. We also learned that dot-product search is mathematically equivalent to cosine similarity when the embeddings are already normalized (as Titan's outputs are), which let us skip the norm computations for a small efficiency gain with no loss of accuracy.
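The equivalence is easy to verify: cosine similarity divides the dot product by the two vector norms, so once both vectors have unit norm the division becomes a no-op. A minimal check (standalone sketch, not EcoPrompt code):

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Full cosine similarity: dot product divided by both norms.
function cosine(a, b) {
  const norm = (v) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}

// Scale a vector to unit length.
function normalize(v) {
  const n = Math.sqrt(dot(v, v));
  return v.map((x) => x / n);
}

// After normalization the plain dot product matches cosine similarity
// (up to floating-point error), so the norms never need recomputing.
const a = normalize([3, 4]); // → [0.6, 0.8]
const b = normalize([4, 3]); // → [0.8, 0.6]
// dot(a, b) ≈ cosine(a, b) ≈ 0.96
```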

What's next for EcoPrompt: We want to support additional LLM models to broaden the range of responses users can receive. Longer term, we'd love to open-source EcoPrompt as a middleware layer that any organization can drop in front of its existing AI infrastructure.
