GitHub:
Slides
What it does
LaVague provides a voice interface for surfing the internet by turning natural-language commands into browser interactions.
How we built it
- We listen to a continuous audio stream
- When the audio level rises above a threshold, we send the audio to a Whisper model to transcribe the voice command
- We then send the current HTML page plus the voice command to LlamaIndex to build a prompt
- The prompt is enhanced with few-shot learning and chain-of-thought; prompt engineering is automated with DSPy
- We send the prompt to a model hosted by Fireworks
- We receive code to execute in the Selenium-controlled browser
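The first step above, deciding when the microphone input is worth transcribing, can be sketched as a simple energy gate. This is an illustrative sketch, not the project's actual code; the `threshold` value and function names are assumptions:

```python
import math

def rms(samples):
    """Root-mean-square energy of a chunk of PCM samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def should_transcribe(samples, threshold=0.05):
    """Gate: only forward the audio chunk to the Whisper model when it is loud enough."""
    return rms(samples) > threshold

# Near-silence stays below the threshold; speech-like audio crosses it.
silence = [0.001] * 1600
speech = [0.3 * math.sin(i / 10) for i in range(1600)]
```

In the real pipeline the chunks that pass the gate would be handed to Whisper for transcription, and the resulting text flows into prompt construction.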
Challenges we ran into
The retriever handled HTML poorly: chunks were not split along meaningful boundaries, which made it hard to get good generation.
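The chunking problem can be illustrated with a naive fixed-size splitter (a hypothetical stand-in, not the retriever we used), which happily cuts an HTML tag in half, so the chunks the model retrieves contain broken markup:

```python
def naive_chunks(html, size):
    """Split HTML into fixed-size chunks, ignoring tag boundaries."""
    return [html[i:i + size] for i in range(0, len(html), size)]

html = "<button id='search'>Search</button><input name='q'>"
chunks = naive_chunks(html, 20)
# chunks[1] ends mid-tag ("...<inpu"), so the retriever sees invalid HTML.
```

A splitter aware of element boundaries would keep each tag intact, which is what a good HTML retriever needs.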
Accomplishments that we're proud of
It's cool, it works, and it's non-trivial.
What's next for LaVague
We're thinking of open-sourcing it so the community can improve it and help build the future of internet interaction together.