June – Your Personal AI Agent
Inspiration
I believe phones have peaked in terms of innovation, and the next big shift is smart glasses. They won't replace phones, but they will become the most powerful accessory we carry. Their true potential lies in serving as a seamless personal assistant, and that's exactly what I'm building with June.
What it does
Instead of tapping through screens or typing commands, you simply speak naturally, and June takes care of the rest.
| Feature | Description |
|---|---|
| Gmail | Search, summarize, and answer questions from your inbox |
| Calendar | Check availability, schedule, or reschedule meetings instantly |
| Contacts & Calls | Dial numbers or call people directly from your phonebook |
| Voice Conversations | Whisper converts your speech to text, GPT-OSS processes intent, and TTS replies back naturally in real time |
How we built it
We built June using a microservices architecture with specialized servers handling different aspects of the voice AI pipeline.
Architecture Design
```mermaid
graph TD
    A[Android App] --> B[Main Server :5005]
    B --> C[Whisper Server :5004]
    B --> D[GPT-OSS Server :5003]
    B --> E[TTS Server :5002]
    B --> F[MCP Server :8080]
    C --> G[OpenAI Whisper Model]
    D --> H[Groq API + GPT-OSS 20B]
    E --> I[Text-to-Speech Engine]
    F --> J[Google APIs]
    J --> K[Gmail]
    J --> L[Calendar]
    J --> M[Contacts]
    J --> N[YouTube]
```
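The main server's job in the diagram above is mostly routing: deciding which downstream service handles a given request. A minimal sketch of that routing table, using the ports from the diagram (the intent names and `route` helper are illustrative, not June's actual code):

```python
# Hypothetical routing table for the Main Server (:5005): maps a
# classified intent to the downstream microservice that handles it.
# Ports match the architecture diagram above.
SERVICES = {
    "transcribe": "http://localhost:5004",  # Whisper server
    "chat":       "http://localhost:5003",  # GPT-OSS server
    "speak":      "http://localhost:5002",  # TTS server
    "tools":      "http://localhost:8080",  # MCP server (Google APIs)
}

def route(intent: str) -> str:
    """Return the base URL of the service that should handle `intent`."""
    try:
        return SERVICES[intent]
    except KeyError:
        raise ValueError(f"No service registered for intent {intent!r}")
```

In the real app each route would be an async HTTP call from the FastAPI main server; the table just makes the fan-out explicit.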
Technical Stack
- FastAPI → High-performance async web framework for microservices
- Python 3.11+ → Core programming language with async/await support
- GPT-OSS 20B → Large language model via Groq's optimized API
- OpenAI Whisper → Speech-to-text with high accuracy
- Custom TTS Engine → Text-to-speech synthesis for natural responses
- Google OAuth2 → Secure authentication for Gmail, Calendar, Contacts
Development Process
- Foundation Setup → Created virtual environment and installed dependencies
- Model Migration → Replaced Claude AI with GPT-OSS 20B (via Groq API) for cost efficiency
- Microservices Design → Built 5 specialized servers (ports 5002–5005, 8080)
- Voice Pipeline → Integrated Whisper → GPT-OSS → TTS for seamless voice interactions
- Google Integration → Implemented OAuth2 for Gmail, Calendar, Contacts
- Mobile App → Built Android client for voice input & real-time communication
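The voice pipeline step above (Whisper → GPT-OSS → TTS) can be sketched as three awaited stages. The stage functions here are stubs standing in for HTTP calls to the real servers, so the flow is runnable; the names and canned strings are illustrative only:

```python
import asyncio

# Sketch of the sequential voice pipeline: each stage stands in for an
# HTTP call to its microservice (Whisper :5004, GPT-OSS :5003, TTS :5002).

async def transcribe(audio: bytes) -> str:
    # Stub for the Whisper server: audio in, text out.
    return "what is on my calendar today"

async def think(text: str) -> str:
    # Stub for the GPT-OSS server: user text in, reply text out.
    return f"Reply to: {text}"

async def synthesize(text: str) -> bytes:
    # Stub for the TTS server: reply text in, audio bytes out.
    return text.encode()

async def handle_utterance(audio: bytes) -> bytes:
    text = await transcribe(audio)
    reply = await think(text)
    return await synthesize(reply)

audio_out = asyncio.run(handle_utterance(b"\x00\x01"))
```

Because every stage is async, the main server can keep handling other requests while one utterance is in flight.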
Challenges we ran into
- [x] Model Size & Compute Limitations → GPT-OSS 20B (40–80GB) too large to run locally
- [x] Real-time Performance → Voice interactions required sub-second latency
- [x] Port Conflicts → Multiple Python servers competing for the same ports
- [x] Google API Authentication → OAuth2 flow with token refresh across services was tricky
- [x] Intent Classification Accuracy → Determining user intent from speech was inconsistent
- [x] System Reliability → External API failures risked breaking the pipeline
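For the port-conflict issue above, a simple pre-flight check helps: try to bind each port before starting its server and release it immediately. This is a sketch of that idea, not the exact check we shipped:

```python
import socket

# Pre-flight check for port conflicts: attempt to bind, then release.
# A failed bind means another process already owns the port.

def port_free(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# The five ports used by June's servers (from the architecture above).
free_ports = [p for p in (5002, 5003, 5004, 5005, 8080) if port_free(p)]
```

Running this before launch turns a confusing mid-startup crash into an actionable "port already in use" report.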
Accomplishments that we're proud of
Technical Skills
- Built a scalable microservices architecture
- Mastered API integrations & OAuth2 authentication flows
- Optimized for real-time, low-latency processing
- Leveraged Groq's inference API for cost-effective large-scale AI
AI/ML Insights
- Learned trade-offs between local vs. cloud LLM deployment
- Built a complete voice pipeline: Whisper → GPT-OSS → TTS
- Improved intent recognition with hybrid rule-based + AI methods
- Practiced advanced prompt engineering
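The hybrid rule-based + AI approach above can be sketched as: cheap regex rules catch unambiguous commands first, and only ambiguous utterances fall through to the LLM. The rules and the `llm_classify` stub below are illustrative, not June's production patterns:

```python
import re

# Hybrid intent recognition sketch: rules first, LLM as fallback.

RULES = [
    (re.compile(r"\b(call|dial)\b", re.I), "call"),
    (re.compile(r"\b(email|inbox|gmail)\b", re.I), "email"),
    (re.compile(r"\b(meeting|calendar|schedule)\b", re.I), "calendar"),
]

def llm_classify(text: str) -> str:
    # Placeholder for a GPT-OSS call; the real system queries the model.
    return "chat"

def classify(text: str) -> str:
    for pattern, intent in RULES:
        if pattern.search(text):
            return intent
    return llm_classify(text)
```

The rule layer keeps latency low for common commands and makes behavior predictable; the LLM handles the long tail.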
Development Practices
- Implemented error handling & fallback strategies
- Applied performance modeling for latency optimization
- Built testing strategies for distributed systems
- Wrote comprehensive documentation for a complex architecture
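The error handling & fallback strategy above boils down to: retry a flaky external call a few times, then return a canned response so the voice pipeline never goes silent. A minimal sketch (the helper name and retry counts are illustrative):

```python
import time

# Retry-then-fallback wrapper for unreliable external APIs.

def with_fallback(fn, fallback, retries=3, delay=0.0):
    """Call fn up to `retries` times; return `fallback` if all fail."""
    for _attempt in range(retries):
        try:
            return fn()
        except Exception:
            time.sleep(delay)
    return fallback

# Demo: an upstream call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream API down")
    return "ok"

result = with_fallback(flaky, fallback="Sorry, please try again later.")
```

For a voice assistant, the fallback string itself gets sent to TTS, so the user always hears *something* even when Groq or a Google API is down.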
What we learned
Key Takeaways:
- How to build real-time voice processing pipelines
- Seamless Google services integration (Gmail, Calendar, Contacts)
- Practical intent recognition techniques
- Scalable microservices design for AI workloads
- Delivering sub-second response times with async programming
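The async takeaway above is easiest to see with independent downstream calls (say, a Calendar lookup and a Contacts lookup): run concurrently, total latency is roughly the slowest call, not the sum. A runnable toy illustration with stubbed delays:

```python
import asyncio
import time

# Two independent "API calls" stubbed with sleeps; asyncio.gather
# runs them concurrently, so total time ≈ max(delays), not sum(delays).

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(fetch("calendar", 0.05), fetch("contacts", 0.05))
    return time.perf_counter() - start

elapsed = asyncio.run(main())  # ≈ 0.05 s, not 0.10 s
```

This is the pattern that keeps the whole voice round trip under a second even when a turn needs several Google API calls.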
What's next for June
Immediate Roadmap
- [ ] Multi-language support for global users
- [ ] Custom wake words for personalized activation
- [ ] Local model deployment for privacy & control
- [ ] Conversation memory for context-aware multi-turn interactions
Advanced Features
- [ ] Voice cloning → Personalized TTS voices from user samples
- [ ] Advanced integrations → Slack, Notion, GitHub, and more
- [ ] Smart home control → IoT and automation integration
- [ ] Meeting assistant → Live transcription, summaries, action items
Built with: Python, FastAPI, OpenAI Whisper, Groq API, Google APIs, Android, TTS, OAuth2
Fun fact: The average response time is <1 second from voice input to TTS output! ⚡
Made with ❤️ during the hackathon
Built With
- android
- fastapi
- gmail-api
- google-oauth2
- google-calendar
- google-cloud
- google-contacts
- gpt-oss
- groq
- java
- kotlin
- openai-whisper
- python
- windows