GitHub:
Slides
What it does
LaVague provides a voice interface for surfing the internet by turning natural-language commands into browser interactions.
How we built it
- We listen to a continuous audio stream
- When the audio level rises above a threshold, we send the audio to a Whisper model to transcribe the voice command
- We then send the current HTML page plus the voice command to LlamaIndex to build a prompt
- The prompt is enhanced with few-shot learning and chain-of-thought; prompt engineering is automated with DSPy
- We send the prompt to a model hosted by Fireworks
- We receive code to execute in the Selenium-controlled browser
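The first step above, deciding when the microphone input is worth transcribing, can be sketched as a simple energy gate. This is an illustrative sketch, not the project's actual code; the `threshold` value and function names are assumptions:

```python
import math

def rms(samples):
    """Root-mean-square energy of a chunk of PCM samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def should_transcribe(samples, threshold=0.05):
    """Gate: only forward the audio chunk to the Whisper model when it is loud enough."""
    return rms(samples) > threshold

# Near-silence stays below the threshold; speech-like audio crosses it.
silence = [0.001] * 1600
speech = [0.3 * math.sin(i / 10) for i in range(1600)]
```

In the real pipeline the chunks that pass the gate would be handed to Whisper for transcription, and the resulting text flows into prompt construction.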
Challenges we ran into
The retriever handled HTML poorly: chunks were not split along meaningful boundaries, which made it hard to get good generation.
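The chunking problem can be illustrated with a naive fixed-size splitter (a hypothetical stand-in, not the retriever we used), which happily cuts an HTML tag in half, so the chunks the model retrieves contain broken markup:

```python
def naive_chunks(html, size):
    """Split HTML into fixed-size chunks, ignoring tag boundaries."""
    return [html[i:i + size] for i in range(0, len(html), size)]

html = "<button id='search'>Search</button><input name='q'>"
chunks = naive_chunks(html, 20)
# chunks[1] ends mid-tag ("...<inpu"), so the retriever sees invalid HTML.
```

A splitter aware of element boundaries would keep each tag intact, which is what a good HTML retriever needs.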
Accomplishments that we're proud of
It's cool, it works, and it's non-trivial.
What's next for LaVague
We're thinking of open-sourcing it so the community can improve it and help build the future of internet interaction together.