Inspiration

If you want to find a piece of information, what do you do? You search it up online, scroll through articles, and find it, or you query one of the modern LLM platforms. But what if you couldn't even see your computer screen? What if you were significantly visually impaired?

For the visually impaired, finding information the usual ways is very difficult. The internet is filled with text, which can take hours to get through with a TTS tool, and finding one specific piece of information by having everything on the screen read aloud is a nightmare. Asking an LLM is no easier: it requires comfortable keyboard use, and it means sitting through long responses that may be inaccurate or unhelpful.

What it does

Caecus, Latin for 'blind', is a single-key, auditory platform for finding any information or research you need, without ever needing to see what is on your screen. This web app is the definition of simplicity in visuals, audio, and accessibility alike. All you do is ask your question; Caecus processes everything and gives you a reliable answer, copied to your clipboard.

Caecus requires only two things: the enter/return key on your keyboard, and your voice. Every step is coordinated so that a visually impaired person can find any specific information they want with a single button and their voice.

Click to start. Speak. Caecus transcribes what you said, then processes your question. It scrapes the web so you don't have to, using a built-in tool to gather reliable sources rather than relying on a subjective response from an LLM. It queries each result it finds, extracting the information relevant to your question. It then feeds this extracted information into Google's open-source Gemma model (note: Gemma, not Gemini, the flagship LLM). Gemma is used only to distill the already-vetted material further and to compose a simple 2-3 sentence answer to your query, based entirely on the research Caecus just performed. The user is notified and the result is copied to the clipboard for further use.
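The relevance-extraction step above can be sketched roughly like this. This is an illustration, not the actual Caecus code: scoring sentences by word overlap with the question is an assumption, and `extract_relevant` is a hypothetical helper name.

```python
import re


def score_sentence(question: str, sentence: str) -> int:
    """Score a scraped sentence by how many question words it shares."""
    q_words = set(re.findall(r"[a-z']+", question.lower()))
    s_words = set(re.findall(r"[a-z']+", sentence.lower()))
    return len(q_words & s_words)


def extract_relevant(question: str, pages: list[str], top_n: int = 5) -> list[str]:
    """Pull the sentences most relevant to the question out of scraped pages.

    The returned snippets are what would then be handed to Gemma to
    compose the final 2-3 sentence answer.
    """
    sentences = []
    for page in pages:
        # Naive sentence split on terminal punctuation followed by whitespace.
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", page) if s.strip())
    ranked = sorted(sentences, key=lambda s: score_sentence(question, s), reverse=True)
    return ranked[:top_n]
```

A real pipeline would add deduplication and source attribution, but the shape is the same: scrape, rank by relevance, keep only the top snippets, summarize.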

How we built it

Caecus uses a Flask backend with a minimalistic frontend built in vanilla HTML/CSS/JS. Voice input is captured using the Web Speech API, which sends user queries to the backend. We use Google's open-source Gemma model, accessed via OpenRouter, for natural language understanding and summarization of reliably collected information (as opposed to a response generated entirely by the LLM itself). Text-to-speech is handled through ElevenLabs. The entire interface is navigated by voice and a single key alone, providing a streamlined, screen-free experience for blind or visually impaired users.
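Under the hood, the call to Gemma via OpenRouter is an OpenAI-style chat-completions request. A minimal sketch of how such a request could be assembled is below; the model slug, prompt wording, and function name are illustrative assumptions, not the exact values Caecus uses, and the request is built but deliberately not sent.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_summary_request(question: str, snippets: list[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenRouter request asking Gemma to
    condense the scraped snippets into a short spoken answer."""
    prompt = (
        "Answer the question in 2-3 sentences using only these sources:\n"
        + "\n".join(f"- {s}" for s in snippets)
        + f"\nQuestion: {question}"
    )
    payload = {
        "model": "google/gemma-2-9b-it",  # assumed model slug
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` (or any HTTP client) and reading `choices[0].message.content` from the JSON response would yield the answer text that gets spoken and copied to the clipboard.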

Challenges we ran into

One major challenge was making voice interaction truly seamless: handling speech recognition, summarization, and TTS without lag or confusion. We also hit API limits during development and had to test multiple platforms for each part of Caecus. Ensuring everything was accessible with just a keyboard and voice required careful UI and logic design, so that a person who could not see the screen could still navigate smoothly.

Accomplishments that we're proud of

We're proud that Caecus actually works end-to-end: speak a query, get a summarized answer, and use it anywhere, all with a single key. Despite time constraints and limited resources, we built something that could be genuinely useful for visually impaired users. It completely changes what it takes to find specific, often minute, pieces of information that were previously quite inaccessible to someone who could not see the screen.

What we learned

We learned how to integrate multiple services into a smooth pipeline tailored to the target audience. We also gained a better appreciation for accessibility design, and for how small UX choices can make or break usability for someone who cannot use platforms the way they are generally used. Building accessible platforms should be the norm, rather than a hard-to-find side feature that takes longer to use than the regular methods.

What's next for Caecus

We first want to refine the platform further, since it was built in a limited time period. We also want to add context memory for follow-up questions, support for other languages, and offline capabilities. Another interesting feature is a gesture-recognition system for triggering and sending messages, giving users another way to communicate with Caecus that would not even require their voice. We are also exploring open-source TTS options to make the project more sustainable and cost-effective.
