Inspiration
The current methods for accessing the web for persons with visual impairments are inefficient and often inaccessible. Automated braille keyboards are used by only a small fraction of people, and screen reader technology has not progressed much.
What it does
So my goal was to create a bot that can navigate any website and talk with the user at every step. The bot controls browser navigation based on the user's requests, provides answers in a clear, understandable way, and includes more information should the user ask for it.
How we built it
In short, it uses a combination of large language models to summarize websites, process user input, and handle browser navigation.
OpenAI Whisper is used to transcribe the audio, and OpenAI instruction-following models answer questions and generate commands and page descriptions.
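As a hedged sketch of how the transcription and generation steps might be wired together: the `complete` callable below stands in for the actual OpenAI model call (e.g. a wrapper around the chat completions API), injected so the pipeline can be exercised without network access. The function names and system prompt are illustrative, not NavBot's actual code.

```python
# Sketch of the speech-in / text-out step. `complete` stands in for the
# real model call; the prompt wording here is an assumption.

def build_description_prompt(page_title, visible_text):
    """Assemble the messages an instruct model would receive when the
    user asks for a description of the current page."""
    system = ("You are NavBot, a voice assistant for blind users. "
              "Describe the page briefly and clearly.")
    user = f"Page title: {page_title}\n\nVisible text:\n{visible_text[:2000]}"
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

def describe_page(page_title, visible_text, complete):
    # `complete` is the injected model call, so this step can be tested
    # with a stub instead of a live API.
    return complete(build_description_prompt(page_title, visible_text))
```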
The process can be complicated to explain because of the multithreaded nature of the program:
- NavBot is continuously outputting to the user what it is doing, responding to the user's questions and sometimes waiting for the user to respond.
- Meanwhile, the user's input is continuously parsed. If it is a request to change settings, it goes down that pipeline. Otherwise, it is classified as a question, an objective, or unknown.
If it is an objective, NavBot issues commands to the browser to achieve it. If it is a question, NavBot answers it.
In both cases, it uses an LLM provided by OpenAI to generate the text.
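The routing step above can be sketched as follows. The keyword heuristic in `classify()` is a stand-in for the LLM classifier (the real system asks a model to pick the category); the keywords and handler names are assumptions for illustration.

```python
# Stand-in classifier for the settings / question / objective / unknown
# routing described above. A real run would ask an LLM instead.

def classify(utterance):
    """Return 'settings', 'question', 'objective', or 'unknown'."""
    text = utterance.lower().strip()
    if text.startswith(("set ", "change ")):
        return "settings"
    if text.endswith("?") or text.startswith(("what", "who", "where", "how", "why")):
        return "question"
    if text.startswith(("go ", "open ", "click ", "find ", "search ")):
        return "objective"
    return "unknown"

def route(utterance, handlers):
    # `handlers` maps each category to its pipeline entry point.
    return handlers[classify(utterance)](utterance)
```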
Challenges we ran into
The output of the LLMs can be unpredictable. It was hard to manage the output and to prompt-engineer the models to perform as expected.
The other incredibly hard challenge was managing the user flow. For example, the user should be able to interrupt the bot when it has a long queue of text to speak.
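One way to make queued output interruptible, sketched under assumed names: the output thread drains a queue item by item and checks an event flag between items, so a user interrupt can clear the pending text. This is a minimal illustration, not NavBot's actual threading code.

```python
# Interruptible speech queue: the output thread calls drain(), the
# input thread calls interrupt() when the user starts talking.
import queue
import threading

class SpeechQueue:
    def __init__(self):
        self._items = queue.Queue()
        self._interrupted = threading.Event()

    def say(self, sentence):
        self._items.put(sentence)

    def interrupt(self):
        # Discard everything the bot was still going to say.
        self._interrupted.set()
        while not self._items.empty():
            try:
                self._items.get_nowait()
            except queue.Empty:
                break

    def drain(self, speak):
        # `speak` is the text-to-speech callable; checking the flag
        # between items is what lets an interrupt take effect mid-queue.
        spoken = []
        while not self._items.empty() and not self._interrupted.is_set():
            spoken.append(speak(self._items.get_nowait()))
        self._interrupted.clear()
        return spoken
```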
Many websites do not follow accessibility guidelines, which makes it very difficult to parse the HTML content and figure out what each element does.
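This is the kind of heuristic fallback that missing accessibility markup forces, sketched with the standard-library HTML parser: when an element has no `aria-label`, guess its purpose from whatever other attributes the markup happens to carry. The attribute priority order here is an assumption, not NavBot's exact logic.

```python
# Guess a spoken label for interactive elements, preferring explicit
# accessibility metadata and falling back to incidental attributes.
from html.parser import HTMLParser

class ElementLabeler(HTMLParser):
    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "button", "input"):
            return
        a = dict(attrs)
        label = (a.get("aria-label") or a.get("title")
                 or a.get("placeholder") or a.get("value")
                 or a.get("name") or a.get("id") or "unlabeled")
        self.labels.append((tag, label))

labeler = ElementLabeler()
labeler.feed('<button aria-label="Search">Go</button>'
             '<input id="q" placeholder="Type a query">'
             '<a href="/x"></a>')
```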
Accomplishments that we're proud of
- Handling settings changes, with each change validated before it is applied
- Developing the web crawler to accept input from the LLMs
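One plausible shape for the crawler's LLM interface, as a sketch: prompt the model to emit one command per line in a fixed grammar, then parse strictly and drop anything malformed rather than passing it to the browser. The command grammar here is illustrative, not the project's actual one.

```python
# Strictly parse LLM output into (verb, argument) browser actions;
# lines that don't match the grammar are skipped, not executed.
import re

COMMAND_RE = re.compile(r'^(CLICK|TYPE|GOTO|SCROLL)\s+"([^"]*)"$')

def parse_commands(llm_output):
    actions = []
    for line in llm_output.strip().splitlines():
        match = COMMAND_RE.match(line.strip())
        if match:
            actions.append((match.group(1), match.group(2)))
    return actions
```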
What we learned
There are standard patterns in AI engineering: information verification and text generation follow patterns that can be reused across programs.
Prompt engineering is essential for getting a model to behave as intended.
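One reusable verification pattern of that kind, sketched with a stubbed model call: constrain the model to a fixed set of answers, validate the reply, and retry before falling back. The `ask` callable and parameter names are assumptions for illustration.

```python
# Constrain-validate-retry: a generic pattern for taming unpredictable
# model output. `ask` stands in for the real model call.

def constrained_query(ask, prompt, allowed, retries=2, fallback="unknown"):
    """Re-ask until the model's answer is one of `allowed`, then give up."""
    for _ in range(retries + 1):
        answer = ask(prompt).strip().lower()
        if answer in allowed:
            return answer
    return fallback
```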
What's next for Navbot
- Expanding to general web accessibility
- Fine tuning base models for better results
- User testing
- Providing more mediums of input/output
- Integrating a "Be my eyes" feature to describe images.