June – Your Personal AI Agent

Inspiration

I believe phone innovation has peaked, and the next big shift is smart glasses. They won't replace phones, but they will become the most powerful accessory we carry. Their true potential is to serve as a seamless personal assistant, and that's exactly what I'm building with June.

What it does

Instead of tapping through screens or typing commands, you simply speak naturally, and June takes care of the rest.

| Feature | Description |
|---|---|
| Email | Search, summarize, and answer questions from your Gmail inbox |
| Calendar | Check availability, schedule, or reschedule meetings instantly |
| Contacts & Calls | Dial numbers or call people directly from your phonebook |
| Voice Conversations | Whisper converts your speech to text, GPT-OSS processes intent, and TTS replies naturally in real time |
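The voice loop can be sketched as three stages chained by the main server. The function names below are illustrative stand-ins, not June's actual API:

```python
# Minimal sketch of June's voice turn: speech -> text -> intent -> spoken reply.
# Each function stands in for one microservice (Whisper, GPT-OSS, TTS).

def transcribe(audio: bytes) -> str:
    """Stand-in for the Whisper server: audio in, text out."""
    return "what's on my calendar today"

def generate_reply(text: str) -> str:
    """Stand-in for the GPT-OSS server: user text in, assistant text out."""
    return f"Here's what I found for: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for the TTS server: text in, audio out."""
    return text.encode("utf-8")

def voice_turn(audio: bytes) -> bytes:
    """One round trip through the pipeline, as the main server orchestrates it."""
    user_text = transcribe(audio)
    reply_text = generate_reply(user_text)
    return synthesize(reply_text)
```

In the real system each stage is an HTTP call to a separate server, but the data flow is the same.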

How we built it

We built June using a microservices architecture with specialized servers handling different aspects of the voice AI pipeline.

Architecture Design

```mermaid
graph TD
    A[Android App] --> B[Main Server :5005]
    B --> C[Whisper Server :5004]
    B --> D[GPT-OSS Server :5003]
    B --> E[TTS Server :5002]
    B --> F[MCP Server :8080]

    C --> G[OpenAI Whisper Model]
    D --> H[Groq API + GPT-OSS 20B]
    E --> I[Text-to-Speech Engine]
    F --> J[Google APIs]

    J --> K[Gmail]
    J --> L[Calendar]
    J --> M[Contacts]
    J --> N[YouTube]
```
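The port layout above can be captured in a small service registry. The names and the `url` helper are our illustration, not code from the project:

```python
# Illustrative registry mirroring the diagram's ports; keys are our own naming.
SERVICES = {
    "main":    {"port": 5005, "role": "orchestrates the pipeline"},
    "whisper": {"port": 5004, "role": "speech-to-text"},
    "gpt_oss": {"port": 5003, "role": "intent + response generation"},
    "tts":     {"port": 5002, "role": "text-to-speech"},
    "mcp":     {"port": 8080, "role": "Google API tools (Gmail, Calendar, Contacts)"},
}

def url(name: str, path: str = "/") -> str:
    """Build a local URL for a service, e.g. url('whisper', '/transcribe')."""
    return f"http://localhost:{SERVICES[name]['port']}{path}"
```

Centralizing ports like this is one way to avoid the port-conflict issues described later.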

Technical Stack

  • FastAPI → High-performance async web framework for microservices
  • Python 3.11+ → Core programming language with async/await support
  • GPT-OSS 20B → Large language model via Groq's optimized API
  • OpenAI Whisper → Speech-to-text with high accuracy
  • Custom TTS Engine → Text-to-speech synthesis for natural responses
  • Google OAuth2 → Secure authentication for Gmail, Calendar, Contacts

Development Process

  1. Foundation Setup → Created virtual environment and installed dependencies
  2. Model Migration → Replaced Claude AI with GPT-OSS 20B (via Groq API) for cost efficiency
  3. Microservices Design → Built 5 specialized servers (ports 5002–5005, 8080)
  4. Voice Pipeline → Integrated Whisper → GPT-OSS → TTS for seamless voice interactions
  5. Google Integration → Implemented OAuth2 for Gmail, Calendar, Contacts
  6. Mobile App → Built Android client for voice input & real-time communication

Challenges we ran into

  • [x] Model Size & Compute Limitations → GPT-OSS 20B (40–80GB) too large to run locally
  • [x] Real-time Performance → Voice interactions required sub-second latency
  • [x] Port Conflicts → Multiple Python servers competing for the same ports
  • [x] Google API Authentication → OAuth2 flow with token refresh across services was tricky
  • [x] Intent Classification Accuracy → Determining user intent from speech was inconsistent
  • [x] System Reliability → External API failures risked breaking the pipeline
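The reliability fix for the last point can be sketched as a retry-then-fallback wrapper. This is a minimal illustration of the pattern, not June's actual error-handling code:

```python
import time

def call_with_fallback(primary, fallback, retries=2, delay=0.0):
    """Try the primary service a few times; if it keeps failing,
    return the fallback answer instead of breaking the pipeline."""
    for _ in range(retries):
        try:
            return primary()
        except Exception:
            time.sleep(delay)  # brief pause before retrying
    return fallback()

# Example: the primary keeps timing out, so the fallback answer is used.
def flaky():
    raise TimeoutError("external API unavailable")

reply = call_with_fallback(flaky, lambda: "Sorry, please try that again.")
```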

Accomplishments that we're proud of

Technical Skills

  • Built a scalable microservices architecture
  • Mastered API integrations & OAuth2 authentication flows
  • Optimized for real-time, low-latency processing
  • Leveraged Groq's inference API for cost-effective large-scale AI

AI/ML Insights

  • Learned trade-offs between local vs. cloud LLM deployment
  • Built a complete voice pipeline: Whisper → GPT-OSS → TTS
  • Improved intent recognition with hybrid rule-based + AI methods
  • Practiced advanced prompt engineering
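The hybrid rule-based + AI intent recognition mentioned above can be sketched as cheap keyword rules that catch the obvious cases, with everything else deferred to the model. The keyword lists here are our assumption:

```python
# Hypothetical first-pass intent router: keyword rules before the LLM.
RULES = {
    "email":    ("inbox", "email", "gmail"),
    "calendar": ("meeting", "schedule", "calendar"),
    "call":     ("call", "dial", "phone"),
}

def classify(utterance: str) -> str:
    """Return a rule-matched intent, or 'llm' to defer to GPT-OSS."""
    text = utterance.lower()
    for intent, keywords in RULES.items():
        if any(k in text for k in keywords):
            return intent
    return "llm"  # ambiguous requests fall through to the model
```

Rules resolve in microseconds, so only genuinely ambiguous speech pays the latency cost of a model call.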

Development Practices

  • Implemented error handling & fallback strategies
  • Applied performance modeling for latency optimization
  • Built testing strategies for distributed systems
  • Wrote comprehensive documentation for a complex architecture

What we learned

Key Takeaways:

  • How to build real-time voice processing pipelines
  • Seamless Google services integration (Gmail, Calendar, Contacts)
  • Practical intent recognition techniques
  • Scalable microservices design for AI workloads
  • Delivering sub-second response times with async programming
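The sub-second point comes down to running independent backend lookups concurrently rather than sequentially. A minimal sketch with simulated API latency (the function names are ours):

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    """Stand-in for an external API call (Gmail, Calendar, ...)."""
    await asyncio.sleep(delay)
    return name

async def gather_context():
    # Two 100 ms lookups run concurrently, so the total is ~100 ms, not 200 ms.
    return await asyncio.gather(fetch("gmail", 0.1), fetch("calendar", 0.1))

start = time.perf_counter()
results = asyncio.run(gather_context())
elapsed = time.perf_counter() - start
```

The same idea, applied across June's microservice calls, is what keeps the voice round trip under a second.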

What's next for June

Immediate Roadmap

  • [ ] Multi-language support for global users
  • [ ] Custom wake words for personalized activation
  • [ ] Local model deployment for privacy & control
  • [ ] Conversation memory for context-aware multi-turn interactions

Advanced Features

  • [ ] Voice cloning → Personalized TTS voices from user samples
  • [ ] Advanced integrations → Slack, Notion, GitHub, and more
  • [ ] Smart home control → IoT and automation integration
  • [ ] Meeting assistant → Live transcription, summaries, action items

Built with: Python, FastAPI, OpenAI Whisper, Groq API, Google APIs, Android, TTS, OAuth2

Fun fact: The average response time is <1 second from voice input to TTS output! ⚡


Made with ❤️ during the hackathon
