1. Inspiration

Most AI systems today are limited to text-based interaction. I wanted to build an AI that feels more alive, something closer to a real assistant like Jarvis. CENTURI was created as an exploration assistant that listens to users, analyzes images captured through the camera, and responds using Gemini AI.

  1. What it does

CENTURI is an AI assistant that can:

- Listen to user voice commands
- Understand questions using Gemini AI
- Respond with synthesized speech
- Analyze images captured from the camera
- Provide explanations of objects or scenes

This creates a more natural human-AI interaction where users can talk to the AI and show objects to it.

  1. How we built it

CENTURI was built using Python and Google's Gemini AI models. The system integrates multiple technologies to create a multimodal AI experience:

- Gemini AI for reasoning and vision
- Faster-Whisper for speech recognition
- gTTS for voice responses
- Streamlit for the interactive web interface
- OpenCV for camera capture

These components work together to allow the AI to hear, see, and speak.
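
One way the stages could fit together for a single conversational turn is sketched below. This is a minimal illustration, not CENTURI's actual code: the `Assistant` class, the `handle_turn` method, and the `fake_*` functions are hypothetical names. Each stage is passed in as a plain callable, on the assumption that Faster-Whisper, OpenCV, Gemini, and gTTS would each be wrapped behind one of them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Assistant:
    """Hypothetical wiring of the hear -> see -> think -> speak stages."""

    transcribe: Callable[[str], str]                   # audio path -> text (e.g. Faster-Whisper)
    capture_frame: Callable[[], bytes]                 # -> encoded camera frame (e.g. OpenCV)
    ask_model: Callable[[str, Optional[bytes]], str]   # prompt, optional image -> answer (e.g. Gemini)
    synthesize: Callable[[str], str]                   # answer text -> audio path (e.g. gTTS)

    def handle_turn(self, audio_path: str, use_camera: bool = False) -> dict:
        """Run one voice turn and return the transcript, answer, and reply audio path."""
        question = self.transcribe(audio_path)
        # Only grab a camera frame when the user wants to show something.
        image = self.capture_frame() if use_camera else None
        answer = self.ask_model(question, image)
        return {
            "question": question,
            "answer": answer,
            "reply_audio": self.synthesize(answer),
        }


# Stand-in stages so the sketch runs without any third-party package installed.
def fake_transcribe(audio_path: str) -> str:
    return "what is this object"


def fake_capture_frame() -> bytes:
    return b"fake-jpeg-bytes"


def fake_ask_model(prompt: str, image: Optional[bytes]) -> str:
    source = "the camera image" if image is not None else "text only"
    return f"Answer to '{prompt}' based on {source}."


def fake_synthesize(text: str) -> str:
    return "reply.mp3"


assistant = Assistant(fake_transcribe, fake_capture_frame, fake_ask_model, fake_synthesize)
result = assistant.handle_turn("question.wav", use_camera=True)
print(result["answer"])
```

Keeping the stages behind callables like this would also let each one be tested or swapped out on its own, for example replacing the speech stage without touching the vision path.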

  1. Challenges we ran into

The biggest challenge was integrating multiple AI components in real time. Managing speech recognition, camera input, and AI responses while staying within API limits required careful debugging and system design. Another challenge was making sure the system ran smoothly in a local environment while preparing it for the demo presentation.
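
A common way to cope with API limits is to retry a rate-limited request with exponential backoff. The sketch below shows that general pattern; it is not taken from CENTURI's code. `RateLimitError`, `call_with_backoff`, and `flaky_request` are hypothetical names, and a real client would raise its own exception type on a quota hit.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises on a quota hit."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on RateLimitError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay * 2^attempt seconds, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo: a request that is rate-limited twice and then succeeds.
calls = {"count": 0}


def flaky_request() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("quota exceeded")
    return "ok"


waits = []  # record the delays instead of actually sleeping
print(call_with_backoff(flaky_request, sleep=waits.append))
```

Passing `sleep` as a parameter keeps the demo instant; in an interactive app the default `time.sleep` would apply, and the delays would need to stay short enough that the voice reply still feels responsive.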

  1. Accomplishments that we're proud of

We successfully created a multimodal AI system that can interact with users using voice and vision rather than only text. CENTURI demonstrates how AI can move beyond static chat interfaces toward more immersive experiences.

  1. What we learned

This project helped us learn how to build multimodal AI systems that combine voice, vision, and large language models. We also learned how to integrate different AI tools and frameworks into a single interactive application.

  1. What's next for CENTURI

Future improvements include adding real-time voice conversations, improving visual recognition, deploying the system to the cloud, and expanding CENTURI into a fully autonomous AI assistant capable of performing tasks across applications.
