June – Your Personal AI Agent

Inspiration

I believe phone innovation has peaked, and the next big shift is smart glasses. They won't replace phones, but they will become the most powerful accessory we carry. Their true potential is to serve as a seamless personal assistant, and that's exactly what I'm building with June.

What it does

Instead of tapping through screens or typing commands, you simply speak naturally, and June takes care of the rest.

| Feature | Description |
|---|---|
| Email | Search, summarize, and answer questions from your Gmail inbox |
| Calendar | Check availability, schedule, or reschedule meetings instantly |
| Contacts & Calls | Dial numbers or call people directly from your phonebook |
| Voice Conversations | Whisper converts your speech to text, GPT-OSS processes intent, and TTS replies naturally in real time |
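The voice loop can be sketched as three stages chained by the main server. The function names below are illustrative stand-ins, not June's actual API:

```python
# Minimal sketch of June's voice turn: speech -> text -> intent -> spoken reply.
# Each function stands in for one microservice (Whisper, GPT-OSS, TTS).

def transcribe(audio: bytes) -> str:
    """Stand-in for the Whisper server: audio in, text out."""
    return "what's on my calendar today"

def generate_reply(text: str) -> str:
    """Stand-in for the GPT-OSS server: user text in, assistant text out."""
    return f"Here's what I found for: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for the TTS server: text in, audio out."""
    return text.encode("utf-8")

def voice_turn(audio: bytes) -> bytes:
    """One round trip through the pipeline, as the main server orchestrates it."""
    user_text = transcribe(audio)
    reply_text = generate_reply(user_text)
    return synthesize(reply_text)
```

In the real system each stage is an HTTP call to a separate server, but the data flow is the same.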

How we built it

We built June using a microservices architecture with specialized servers handling different aspects of the voice AI pipeline.

Architecture Design

```mermaid
graph TD
    A[Android App] --> B[Main Server :5005]
    B --> C[Whisper Server :5004]
    B --> D[GPT-OSS Server :5003]
    B --> E[TTS Server :5002]
    B --> F[MCP Server :8080]

    C --> G[OpenAI Whisper Model]
    D --> H[Groq API + GPT-OSS 20B]
    E --> I[Text-to-Speech Engine]
    F --> J[Google APIs]

    J --> K[Gmail]
    J --> L[Calendar]
    J --> M[Contacts]
    J --> N[YouTube]
```
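The port layout above can be captured in a small service registry. The names and the `url` helper are our illustration, not code from the project:

```python
# Illustrative registry mirroring the diagram's ports; keys are our own naming.
SERVICES = {
    "main":    {"port": 5005, "role": "orchestrates the pipeline"},
    "whisper": {"port": 5004, "role": "speech-to-text"},
    "gpt_oss": {"port": 5003, "role": "intent + response generation"},
    "tts":     {"port": 5002, "role": "text-to-speech"},
    "mcp":     {"port": 8080, "role": "Google API tools (Gmail, Calendar, Contacts)"},
}

def url(name: str, path: str = "/") -> str:
    """Build a local URL for a service, e.g. url('whisper', '/transcribe')."""
    return f"http://localhost:{SERVICES[name]['port']}{path}"
```

Centralizing ports like this is one way to avoid the port-conflict issues described later.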

Technical Stack

  • FastAPI → High-performance async web framework for microservices
  • Python 3.11+ → Core programming language with async/await support
  • GPT-OSS 20B → Large language model via Groq's optimized API
  • OpenAI Whisper → Speech-to-text with high accuracy
  • Custom TTS Engine → Text-to-speech synthesis for natural responses
  • Google OAuth2 → Secure authentication for Gmail, Calendar, Contacts

Development Process

  1. Foundation Setup → Created virtual environment and installed dependencies
  2. Model Migration → Replaced Claude AI with GPT-OSS 20B (via Groq API) for cost efficiency
  3. Microservices Design → Built 5 specialized servers (ports 5002–5005, 8080)
  4. Voice Pipeline → Integrated Whisper → GPT-OSS → TTS for seamless voice interactions
  5. Google Integration → Implemented OAuth2 for Gmail, Calendar, Contacts
  6. Mobile App → Built Android client for voice input & real-time communication

Challenges we ran into

  • [x] Model Size & Compute Limitations → GPT-OSS 20B (40–80GB) too large to run locally
  • [x] Real-time Performance → Voice interactions required sub-second latency
  • [x] Port Conflicts → Multiple Python servers competing for the same ports
  • [x] Google API Authentication → OAuth2 flow with token refresh across services was tricky
  • [x] Intent Classification Accuracy → Determining user intent from speech was inconsistent
  • [x] System Reliability → External API failures risked breaking the pipeline
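The reliability fix for the last point can be sketched as a retry-then-fallback wrapper. This is a minimal illustration of the pattern, not June's actual error-handling code:

```python
import time

def call_with_fallback(primary, fallback, retries=2, delay=0.0):
    """Try the primary service a few times; if it keeps failing,
    return the fallback answer instead of breaking the pipeline."""
    for _ in range(retries):
        try:
            return primary()
        except Exception:
            time.sleep(delay)  # brief pause before retrying
    return fallback()

# Example: the primary keeps timing out, so the fallback answer is used.
def flaky():
    raise TimeoutError("external API unavailable")

reply = call_with_fallback(flaky, lambda: "Sorry, please try that again.")
```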

Accomplishments that we're proud of

Technical Skills

  • Built a scalable microservices architecture
  • Mastered API integrations & OAuth2 authentication flows
  • Optimized for real-time, low-latency processing
  • Leveraged Groq's inference API for cost-effective large-scale AI

AI/ML Insights

  • Learned trade-offs between local vs. cloud LLM deployment
  • Built a complete voice pipeline: Whisper → GPT-OSS → TTS
  • Improved intent recognition with hybrid rule-based + AI methods
  • Practiced advanced prompt engineering
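The hybrid rule-based + AI intent recognition mentioned above can be sketched as cheap keyword rules that catch the obvious cases, with everything else deferred to the model. The keyword lists here are our assumption:

```python
# Hypothetical first-pass intent router: keyword rules before the LLM.
RULES = {
    "email":    ("inbox", "email", "gmail"),
    "calendar": ("meeting", "schedule", "calendar"),
    "call":     ("call", "dial", "phone"),
}

def classify(utterance: str) -> str:
    """Return a rule-matched intent, or 'llm' to defer to GPT-OSS."""
    text = utterance.lower()
    for intent, keywords in RULES.items():
        if any(k in text for k in keywords):
            return intent
    return "llm"  # ambiguous requests fall through to the model
```

Rules resolve in microseconds, so only genuinely ambiguous speech pays the latency cost of a model call.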

Development Practices

  • Implemented error handling & fallback strategies
  • Applied performance modeling for latency optimization
  • Built testing strategies for distributed systems
  • Wrote comprehensive documentation for a complex architecture

What we learned

Key Takeaways:

  • How to build real-time voice processing pipelines
  • Seamless Google services integration (Gmail, Calendar, Contacts)
  • Practical intent recognition techniques
  • Scalable microservices design for AI workloads
  • Delivering sub-second response times with async programming
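The sub-second point comes down to running independent backend lookups concurrently rather than sequentially. A minimal sketch with simulated API latency (the function names are ours):

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    """Stand-in for an external API call (Gmail, Calendar, ...)."""
    await asyncio.sleep(delay)
    return name

async def gather_context():
    # Two 100 ms lookups run concurrently, so the total is ~100 ms, not 200 ms.
    return await asyncio.gather(fetch("gmail", 0.1), fetch("calendar", 0.1))

start = time.perf_counter()
results = asyncio.run(gather_context())
elapsed = time.perf_counter() - start
```

The same idea, applied across June's microservice calls, is what keeps the voice round trip under a second.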

What's next for June

Immediate Roadmap

  • [ ] Multi-language support for global users
  • [ ] Custom wake words for personalized activation
  • [ ] Local model deployment for privacy & control
  • [ ] Conversation memory for context-aware multi-turn interactions

Advanced Features

  • [ ] Voice cloning → Personalized TTS voices from user samples
  • [ ] Advanced integrations → Slack, Notion, GitHub, and more
  • [ ] Smart home control → IoT and automation integration
  • [ ] Meeting assistant → Live transcription, summaries, action items

Built with: Python, FastAPI, OpenAI Whisper, Groq API, Google APIs, Android, TTS, OAuth2

Fun fact: The average response time is <1 second from voice input to TTS output! ⚡


Made with ❤️ during the hackathon
