Inspiration
I had this idea a while ago after seeing AI summarize the contents of websites, and I wondered whether that could be leveraged for accessibility.
What it does
It uses AI to suggest five actions that can be performed on the current page. It also provides a reader that strips cruft (e.g. ads and images) from the webpage and reads the content aloud. Finally, it gives the user some context about where they are and what the page is (e.g. "You are on a search results page on YouTube for the term 'writing a C program'").
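The reader's cruft removal can be thought of as a filtered tree walk. This is a minimal sketch under stated assumptions, not the project's implementation: it uses a hypothetical plain-object node shape instead of the live DOM so it runs outside a browser, and the list of "cruft" tags and the ad class-name pattern are illustrative guesses.

```javascript
// Hypothetical node shape: { tag, className?, text?, children? }.
// Plain objects stand in for the live DOM so this runs outside a browser.
const CRUFT_TAGS = new Set(["img", "picture", "video", "iframe", "script", "style", "aside", "nav"]);
// Illustrative guess at ad-like class names; real pages vary widely.
const AD_CLASS = /\b(ad|ads|advert|sponsored)\b/i;

function isCruft(node) {
  return CRUFT_TAGS.has(node.tag) || AD_CLASS.test(node.className ?? "");
}

// Depth-first walk: skip whole cruft subtrees and collect the remaining
// text in document order so it can be handed to text-to-speech.
function readableText(node, out = []) {
  if (isCruft(node)) return out;
  const text = node.text?.trim();
  if (text) out.push(text);
  for (const child of node.children ?? []) readableText(child, out);
  return out;
}

// Usage: a tiny page with a heading, an image, an ad, and a paragraph.
const page = {
  tag: "main",
  children: [
    { tag: "h1", text: "Writing a C program" },
    { tag: "img" },
    { tag: "div", className: "ad-banner", text: "Buy now!" },
    { tag: "p", text: "Start with a main function." },
  ],
};
console.log(readableText(page).join(" "));
// → "Writing a C program Start with a main function."
```

In the extension itself the same idea would apply to the real DOM, and a word-boundary pattern is used for class names so that something like `header` is not mistaken for an ad.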
How we built it
We used the Chrome extension developer guide to get the initial template, along with some AI prompts to generate boilerplate code. We used GitHub to share our code and committed directly to the main branch. Each of us worked on our own part of the project, such as the reader or the text-to-speech functionality.
Challenges we ran into
Many of the issues were meta-level rather than technical; for example, we had trouble with SSH keys early on. We also ran into merge conflicts, but eventually we worked through them.
Accomplishments that we're proud of
Overall, the project was a success. For example, I was able to navigate to a YouTube video with my eyes closed and the screen narrator on.
Being able to use AI to provide not only the names of the actions but also the code to perform them was another big step. Getting this working initially took a lot of effort: we had to feed Gemini enough annotated information about the page for it to say that a given action maps to focusing a particular HTML element or visiting a particular URL.
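The annotate, prompt, and validate loop described above might look roughly like the following. This is a hypothetical sketch, not the project's code: the element shape, the `ws-N` id scheme, the prompt wording, and the JSON reply format (`name`, `type` of `"focus"` or `"visit"`, `target`) are all assumptions, and the actual Gemini API call is left out.

```javascript
// Hypothetical element shape: { tag, label, href? }, e.g. gathered by a
// content script from links, buttons, and inputs on the page.
function annotateElements(elements) {
  // Give each element a short, stable id the model can refer back to.
  return elements.map((el, i) => ({ id: `ws-${i}`, tag: el.tag, label: el.label, href: el.href ?? null }));
}

// Assumed prompt wording; the project's real prompt is not shown here.
function buildPrompt(title, annotated) {
  const rows = annotated.map((a) => [a.id, a.tag, a.label, ...(a.href ? [a.href] : [])].join("\t"));
  return [
    `Page title: ${title}`,
    "Interactive elements (id, tag, label, href):",
    ...rows,
    'Reply with a JSON array of up to 5 actions: {"name", "type": "focus" | "visit", "target"}.',
    'For "focus", target is an element id; for "visit", target is a URL.',
  ].join("\n");
}

function isHttpUrl(s) {
  try {
    const u = new URL(s);
    return u.protocol === "http:" || u.protocol === "https:";
  } catch {
    return false;
  }
}

// Validate the model's reply so only well-formed actions reach the user.
function parseActions(reply, annotated) {
  const ids = new Set(annotated.map((a) => a.id));
  // Models often wrap JSON in a Markdown code fence; strip it if present.
  const body = reply.trim().replace(/^`{3}(?:json)?\s*|\s*`{3}$/g, "");
  let raw;
  try {
    raw = JSON.parse(body);
  } catch {
    return [];
  }
  if (!Array.isArray(raw)) return [];
  return raw
    .filter((a) => a && typeof a.name === "string")
    .filter((a) => (a.type === "focus" && ids.has(a.target)) || (a.type === "visit" && isHttpUrl(a.target)))
    .slice(0, 5);
}
```

In a browser, the annotation step would also need to tag the real elements (for example with a `data-` attribute) so that a "focus" action can call `element.focus()` on the match and a "visit" action can set `location.href`. Rejecting unknown ids and non-HTTP URLs keeps a malformed or hallucinated reply from doing anything unexpected.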
What we learned
- How to develop a Chrome extension
- How to use AI within projects and call the correct APIs
- Prompt engineering
- Interacting with ElevenLabs
What's next for AI Websuggest
Getting the project into a state where it can be open-sourced would be great. Currently, API keys are embedded directly in the project; we'd like users to be able to set up their own keys once.
Having visually impaired users try the extension and give feedback would also be helpful. We built a lot of TTS functionality, but we might be able to drop some of it once we know what a screen reader already reads aloud.
A lot more refinement is likely needed; for example, we don't have any tests yet. It would also be good to make features toggleable, so that, say, TTS could be turned off when a user doesn't have an API key for it.
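Making features toggleable could be as simple as resolving the user's switches against the API keys they have actually provided. This sketch is hypothetical: the feature names, the defaults, and the mapping from features to Gemini or ElevenLabs keys are assumptions, not the project's code.

```javascript
// Hypothetical feature names and defaults, for illustration only.
const DEFAULT_SETTINGS = { suggestions: true, reader: true, tts: true, context: true };

// Assumption: which API key each feature depends on (null = none needed).
const REQUIRED_KEY = { suggestions: "gemini", reader: null, tts: "elevenlabs", context: "gemini" };

// A feature is enabled only if the user has it switched on AND any key it
// needs is present, so TTS turns itself off when there is no ElevenLabs key.
function resolveFeatures(userSettings = {}, apiKeys = {}) {
  const settings = { ...DEFAULT_SETTINGS, ...userSettings };
  const result = {};
  for (const feature of Object.keys(DEFAULT_SETTINGS)) {
    const key = REQUIRED_KEY[feature];
    const hasKey = key === null || Boolean(apiKeys[key]);
    result[feature] = Boolean(settings[feature]) && hasKey;
  }
  return result;
}

// Usage: defaults kept, but only a Gemini key supplied, so tts resolves to false.
console.log(resolveFeatures({}, { gemini: "YOUR_GEMINI_KEY" }));
```

In the extension, the settings and keys could be loaded once with `chrome.storage.local.get` and passed to this function before any feature runs, which would also avoid embedding keys in the source.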
Finally, publishing on the Chrome Web Store would make the project available for others to use.
Built With
- chrome
- elevenlabs
- gemini
- html
- javascript