Inspiration

We wanted an AI that doesn’t just talk about webpages but can actually do the work on them.

What it does

MatePI Copilot lives in Chrome’s side panel. You chat, and it can:

  • Read the page: understand the current tab’s content, summarize it, or find details
  • Click elements: buttons, links, checkboxes, icons
  • Type for you: fill forms, search boxes, write text
  • Go to a URL: open a specific website
  • Go back: return to the previous page

All actions are shown step by step, so you always know what it’s doing.

How we built it

  • Chrome Extension (side panel + content scripts) to execute actions on webpages
  • React-based chat UI for the conversation experience
  • Gemini/OpenAI Function Calling to translate natural language into browser actions (read / click / type / navigate / back)
  • Local, on-device settings (provider / model / API key)
  • ElevenLabs voice for hands-free spoken input and output
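As an illustration of the function-calling step, here is a minimal sketch of how the five actions could be declared as tools and how a model's tool call could be turned into a typed browser action. The tool names (read_page, click_element, type_text, go_to_url, go_back), the selector/text/url parameters, and the parseToolCall helper are assumptions made for this example, not MatePI's actual code. The declarations use the name / description / JSON-Schema parameters shape that function-calling APIs generally expect.

```typescript
// Hypothetical action model: the project lists read / click / type / navigate / back.
type BrowserAction =
  | { kind: "read" }
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "navigate"; url: string }
  | { kind: "back" };

interface ToolSpec {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: "string"; description: string }>;
    required: string[];
  };
}

// Tool declarations to send to the model (names and parameters are assumed).
const tools: ToolSpec[] = [
  { name: "read_page", description: "Read the current tab's content.",
    parameters: { type: "object", properties: {}, required: [] } },
  { name: "click_element", description: "Click a button, link, checkbox, or icon.",
    parameters: { type: "object",
      properties: { selector: { type: "string", description: "CSS selector of the target" } },
      required: ["selector"] } },
  { name: "type_text", description: "Type text into a form field or search box.",
    parameters: { type: "object",
      properties: {
        selector: { type: "string", description: "CSS selector of the input" },
        text: { type: "string", description: "Text to enter" } },
      required: ["selector", "text"] } },
  { name: "go_to_url", description: "Open a specific website.",
    parameters: { type: "object",
      properties: { url: { type: "string", description: "Absolute http(s) URL" } },
      required: ["url"] } },
  { name: "go_back", description: "Return to the previous page.",
    parameters: { type: "object", properties: {}, required: [] } },
];

// Validate a tool call from the model and convert it into a BrowserAction.
// Throws on unknown tools, malformed JSON, or missing/invalid arguments.
function parseToolCall(name: string, argsJson: string): BrowserAction {
  const spec = tools.find((t) => t.name === name);
  if (!spec) throw new Error(`Unknown tool: ${name}`);
  const args = JSON.parse(argsJson || "{}") as Record<string, unknown>;
  for (const key of spec.parameters.required) {
    if (typeof args[key] !== "string") {
      throw new Error(`Missing argument "${key}" for ${name}`);
    }
  }
  switch (name) {
    case "read_page":
      return { kind: "read" };
    case "click_element":
      return { kind: "click", selector: args.selector as string };
    case "type_text":
      return { kind: "type", selector: args.selector as string, text: args.text as string };
    case "go_to_url": {
      const url = args.url as string;
      if (!/^https?:\/\//.test(url)) throw new Error(`Refusing non-http URL: ${url}`);
      return { kind: "navigate", url };
    }
    case "go_back":
      return { kind: "back" };
    default:
      throw new Error(`Unhandled tool: ${name}`);
  }
}
```

Validating arguments before anything touches the page is one way to keep each call predictable: a malformed call fails loudly in the side panel instead of silently doing nothing in the content script.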

Challenges we ran into

  • Integrating function tools effectively — deciding when the AI should reason versus act, and ensuring each function call is reliable and predictable.
  • Using voice permissions responsibly — balancing hands-free interaction with user trust, privacy concerns, and browser permission limitations.
  • Choosing what to showcase — making clear trade-offs under limited time and scope to highlight MatePI’s core value.

Accomplishments that we’re proud of

  • Smooth, real-time streaming chat
  • Reliable multi-step actions (read → type → click)
  • Every action is visible and inspectable
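One way to make multi-step actions both reliable and inspectable is to run them through a small step log. The sketch below is an assumption about how that could work, not a description of MatePI's internals: Step, Executor, and runSteps are hypothetical names, and the executor is synchronous for brevity, whereas a real extension would await a message round-trip to the content script.

```typescript
type StepStatus = "pending" | "done" | "failed";

interface Step {
  index: number;                 // 1-based position in the plan
  tool: string;                  // e.g. "read_page" (hypothetical tool name)
  args: Record<string, string>;
  status: StepStatus;
  detail: string;                // result summary or error message
}

// Stand-in for "perform this action on the page"; returns a short result string.
type Executor = (tool: string, args: Record<string, string>) => string;

// Run the planned actions in order, reporting every status change through
// onUpdate so the UI can render it, and stop at the first failure.
function runSteps(
  plan: Array<{ tool: string; args: Record<string, string> }>,
  execute: Executor,
  onUpdate: (step: Step) => void = () => {}
): Step[] {
  const log: Step[] = [];
  for (let i = 0; i < plan.length; i++) {
    const { tool, args } = plan[i];
    const step: Step = { index: i + 1, tool, args, status: "pending", detail: "" };
    log.push(step);
    onUpdate({ ...step });       // show the step before it runs
    try {
      step.detail = execute(tool, args);
      step.status = "done";
    } catch (err) {
      step.detail = err instanceof Error ? err.message : String(err);
      step.status = "failed";
    }
    onUpdate({ ...step });       // show the outcome
    if (step.status === "failed") break; // later steps depend on earlier ones
  }
  return log;
}
```

Stopping at the first failed step reflects the assumption that a chain like read → type → click is only meaningful if every earlier step succeeded; the returned log keeps the failure visible rather than hiding it.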

What we learned

  • How to communicate effectively as a team — aligning on goals, responsibilities, and decisions to move fast without confusion.
  • How to use function calls in practice — understanding when and how to connect AI reasoning with real, executable actions.
  • How to deliver a complete product — taking an idea from concept to a working, end-to-end project within real constraints.

What’s next for MatePI

  • Stronger website interaction
    By adding scroll, wait, and hover capabilities, MatePI can handle dynamically loaded content and dropdown-based navigation more reliably.

  • Visual and spatial understanding
    With native visual context (seeing rendered pages instead of raw HTML), MatePI can better understand layouts, charts, and spatial relationships on real websites.

  • Agent Skills support
    The current function tools tend to produce hallucinations when the context grows large. Introducing agent skills can both reduce costs and minimize hallucinations.
