Inspiration
We've grown up in a world surrounded by technology. Throughout school, we've constantly interacted with a variety of interfaces, from Canvas to YouTube. To us, a gear icon means settings and a magnifying glass shows us where to search. But alternative text? What's that?
While building our UW Software Engineering Career Club website, we made an effort to create a friendly interface for our 1,000+ members. We tossed in cool features and embeds, effectively tailoring the website to ourselves. Unintentionally, we overlooked the fact that much of our design can't be accessed by visually impaired users: in some ways, our website is littered with small, low-contrast text, icons, and unlabeled images. While we were willing to commit to meeting WCAG 2 standards, we wondered if there was a way to make all websites more accessible without laborious hours of restyling.
Screen readers and accessibility considerations are often overlooked in website design to save money and time. And although the field has made progress, assistive technology has yet to integrate recent advances in rapid content summarization. That's why we built HearSay.
What it does
What we do is simple: visit a website and click our tab to hear a soothing voice guide you through the page. From there, all you have to do to navigate is talk.
How it works
We utilize Selenium to visit, parse, and interact with web pages; Llama 3 for quick, accurate summarization; and Coqui-AI TTS with ElevenLabs to generate sociable, non-robotic speech. Moreover, we run everything locally and for free using LM Studio. Packaged as a browser extension, HearSay is easy to set up and works seamlessly with most browsers.
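The summarization step can be sketched roughly like this: LM Studio serves the local model behind an OpenAI-compatible API (by default on `localhost:1234`), so summarizing a page is one HTTP call. The model name, prompt wording, and temperature below are illustrative assumptions, not our exact settings.

```python
# Minimal sketch of asking a locally hosted Llama 3 model (via LM Studio's
# OpenAI-compatible endpoint) to summarize page text for spoken playback.
import json
import urllib.request

LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_payload(page_text: str, model: str = "llama-3-8b-instruct") -> dict:
    """Build the chat-completion request body for the local endpoint."""
    return {
        "model": model,  # whichever model is loaded in LM Studio
        "messages": [
            {
                "role": "system",
                "content": (
                    "Summarize this web page for a screen-reader user "
                    "in two or three plain, friendly sentences."
                ),
            },
            {"role": "user", "content": page_text},
        ],
        "temperature": 0.3,  # keep summaries consistent rather than creative
    }

def summarize(page_text: str) -> str:
    """POST the payload to LM Studio and return the model's summary."""
    req = urllib.request.Request(
        LM_STUDIO_URL,
        data=json.dumps(build_payload(page_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because LM Studio mirrors the OpenAI API shape, the same code works whether the endpoint is local or hosted, which made swapping models during testing painless.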
Challenges we ran into
We ran into many challenges getting everything to run locally. The first was choosing a model with a context window large enough to handle big chunks of text and huge HTML files, yet not so taxing that it ran slowly. We also hit library issues: typical web-scraping tools like wget or Beautiful Soup don't work on JavaScript-generated websites like our club's.
Accomplishments that we're proud of
We were able to run our model locally and expose it as a local API endpoint to summarize web pages. Moving past stereotypically robotic TTS was a small addition with a significant impact on the user experience, and we're proud of leveraging AI for a positive, impactful use.
What we learned
- UI/UX accessibility is often taken for granted.
- Developing software that harnesses generative AI can be hard to get right, because its biggest strength, creativity, comes with its biggest weakness, inconsistency.
- Models that run locally can be astonishingly powerful and are amazing for testing!
What's next for HearSay
We're looking into better parsing and text-summarization models with greater accuracy, and into pairing them with more deterministic scraping methods, letting us play to generative AI's strengths while making up for its weaknesses. Then, as always: Deploy, Ship, Grow.