Inspiration
Books are our gateway to deep knowledge and lasting wisdom. Yet for many, reading feels intimidating, complex, or even boring, especially in an age of constant distraction from social media. Even so, reading is a skill everyone should master now more than ever. I love reading books, but every time a question comes up, I have to type it into Google or ask Siri, and by the time I've tracked down the answer, I've lost the thread of what I was reading and learning. I started thinking about how to make reading easier for people like me who want to read and learn without that friction. So the question I asked myself is this: how can technology (especially AI) inspire us to engage with books more comfortably, spark curiosity, and help us truly absorb knowledge?
What it does
I decided to build a mobile application that works as a voice assistant and guides me while I am reading books. The idea I had in mind was clear: a voice assistant that listens to my questions while I am reading and answers them out loud. It should also keep track of the questions and answers so I can come back to them later whenever I want. The questions a reader has generally fall into a few types: word meanings (vocabulary the reader does not understand), questions about context (what the author means by a certain statement), or other curious literary questions (about the book, the author, themes, etc.). A clear history therefore helps readers see what they learned in each reading session. The assistant is called "Athena" after the Greek goddess of wisdom. Here's what Athena told me about her namesake from within my application: "Athena is the Greek goddess of wisdom, war, and crafts. She is often depicted with an owl as her sacred animal, symbolizing her association with wisdom and knowledge. In mythology, Athena is the daughter of Zeus and Metis, and is known for her intelligence, strength, and strategic thinking. She played a key role in the Trojan War and was often consulted by heroes and mortals seeking guidance and advice." Perfect name, indeed.
How I built it
I built Athena as an iOS application, using the following tools and technologies:
- Language: Swift
- UI Framework: SwiftUI (fast prototyping and declarative UI)
- IDE: Xcode
- LLM: Groq + Llama model
- API: DictionaryAPI
- Package Manager: Swift Package Manager (SPM)
- Database: Firebase Firestore
Here are the different services I wrote for Athena and the corresponding key frameworks I used:
- SpeechService: Speech (Apple's Speech framework) + AVFoundation (for audio session and recording)
- TTSService: AVFoundation (AVSpeechSynthesizer for text-to-speech with the voice of com.apple.ttsbundle.Samantha-compact)
- NLPService: pure Swift (custom classification logic, delegating to DictionaryService or LLMService)
- LLMService: Foundation (for URLSession networking) + Groq API (chat completion with llama-3.1-8b-instant; see the sketch below). I got the idea to use Groq from one of the training sessions over the weekend, thank you :)
- DictionaryService: Foundation (for URLSession and Codable) + DictionaryAPI
- FirebaseService: FirebaseFirestore (Firestore database) + Foundation (JSON encode/decode)
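To give a flavor of how these services talk to external APIs, here is a trimmed-down sketch of the LLMService call to Groq. This is illustrative rather than my exact code: the type names are simplified, error handling is minimal, and it assumes Groq's OpenAI-compatible chat-completions endpoint.

```swift
import Foundation

// Simplified request/response shapes for Groq's OpenAI-compatible API.
struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

struct ChatResponse: Codable {
    struct Choice: Codable {
        struct Message: Codable { let content: String }
        let message: Message
    }
    let choices: [Choice]
}

// Ask the LLM a reading question and return a spoken-ready answer.
func askLLM(_ question: String, apiKey: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: "llama-3.1-8b-instant",
                    messages: [ChatMessage(role: "user", content: question)])
    )
    let (data, _) = try await URLSession.shared.data(for: request)
    let response = try JSONDecoder().decode(ChatResponse.self, from: data)
    return response.choices.first?.message.content ?? "No answer found."
}
```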
Local-First Design:
- Immediate save & history: Every Q&A entry is saved to UserDefaults locally, so users never lose progress if they go offline.
- Sync to Firebase later: When online, entries are uploaded to the cloud for cross-device continuity (sketched below).
- Always-on features offline: TTS (text-to-speech), local transcripts, and saved-history browsing all work without internet.
- Dictionary definitions and LLM responses require internet, since they rely on external APIs.
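Here is a simplified sketch of that save-then-sync flow. The type and collection names are illustrative (my actual FirebaseService differs a bit), but the shape is the same: write to UserDefaults first, then push unsynced entries to Firestore once the network is available.

```swift
import Foundation
import FirebaseFirestore

// One saved question/answer pair; `synced` tracks whether it reached the cloud.
struct QAEntry: Codable {
    let id: String
    let question: String
    let answer: String
    let category: String   // "Words" or "General"
    var synced: Bool
}

final class HistoryStore {
    private let key = "athena.history"

    // Save immediately to UserDefaults so nothing is lost offline.
    func saveLocally(_ entry: QAEntry) {
        var all = load()
        all.append(entry)
        persist(all)
    }

    func load() -> [QAEntry] {
        guard let data = UserDefaults.standard.data(forKey: key),
              let all = try? JSONDecoder().decode([QAEntry].self, from: data)
        else { return [] }
        return all
    }

    // Later, when online, push any unsynced entries to Firestore.
    func syncPending(to db: Firestore) {
        var all = load()
        for index in all.indices where !all[index].synced {
            let entry = all[index]
            db.collection("history").document(entry.id).setData([
                "question": entry.question,
                "answer": entry.answer,
                "category": entry.category
            ])
            all[index].synced = true
        }
        persist(all)
    }

    private func persist(_ entries: [QAEntry]) {
        if let data = try? JSONEncoder().encode(entries) {
            UserDefaults.standard.set(data, forKey: key)
        }
    }
}
```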
Challenges
Some challenges were simple to resolve, while others took much longer. Here are some of the challenges I encountered while building Athena:
- Audio session management: AVAudioSession created conflicts between speech recognition and text-to-speech (TTS). Getting the mic to deactivate cleanly after listening and then re-activate for playback required precise timing and careful juggling of session categories (see the sketch after this list).
- State synchronization in SwiftUI: Managing the conversation flow with SwiftUI bindings (@Published) was tricky when I tried to track partial vs. final transcripts of reader speech. I also had to prevent duplicate processing and handle interruptions (like the reader asking a new question while Athena was still speaking). This made me think carefully about state management and concurrency on the main thread.
- API integration and error handling: Two APIs were integrated: DictionaryAPI for word lookups and Groq's Llama-based LLM for open-ended questions. Handling decoding errors, unexpected JSON formats, and "no result found" responses were minor challenges while building the application.
- Wake word integration: I wanted readers to be able to call Athena by name and have her start listening automatically (Siri-like behavior), but that meant handling raw audio buffers, converting them into the required Int16 format, and debugging why Athena wasn't triggering reliably. I also had to design where wake-word detection stops and Apple's Speech API takes over, making sure the hybrid pipeline runs smoothly without conflicts. Unfortunately I was not able to implement this in time, so I have excluded the feature for now.
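To make the audio session problem concrete, here is a stripped-down sketch of the hand-off between listening and speaking. It is not my exact code (the real flow involves delegates and proper error handling), but it shows the category juggling that fixed the conflict: stop the recognition engine, deactivate the session, reconfigure for playback, then speak.

```swift
import AVFoundation

final class AudioCoordinator {
    let audioEngine = AVAudioEngine()
    let synthesizer = AVSpeechSynthesizer()

    // Called when the reader finishes a question and Athena should answer.
    func stopListeningAndSpeak(_ text: String) {
        // 1. Tear down recognition cleanly before touching the session.
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)

        let session = AVAudioSession.sharedInstance()
        try? session.setActive(false, options: .notifyOthersOnDeactivation)

        // 2. Reconfigure for playback, then speak the answer.
        try? session.setCategory(.playback, mode: .spokenAudio)
        try? session.setActive(true)

        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(
            identifier: "com.apple.ttsbundle.Samantha-compact")
        synthesizer.speak(utterance)
    }

    // Called when Athena is done speaking and should listen again.
    func startListening() throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .measurement, options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)
        // ... install a tap on audioEngine.inputNode and feed the buffers
        // to an SFSpeechAudioBufferRecognitionRequest here ...
    }
}
```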
Accomplishments
I am happy that I was able to tackle most challenges I encountered while building the prototype for Athena:
- Built a complete voice-first assistant in SwiftUI within the hackathon timeline, integrating speech recognition, natural language processing, and text-to-speech functionality.
- Designed modular services so features are decoupled, reusable and extensible.
- Implemented local-first storage strategy so Q&As are stored instantly on-device, ensuring offline resilience, then synced to Firebase for cloud persistence.
- Integrated multiple layers of NLP and AI, routing queries intelligently to either precise dictionary definitions or conversational answers (see the sketch after this list).
- Designed a polished UI with SwiftUI: chat-style conversation bubbles, a history view with categories (Words, General), and a clean navigation bar.
- Future-proofed for wake word detection (Porcupine) and richer AI integration.
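As a taste of that routing layer, here is a simplified version of the idea behind NLPService. My real classification logic is more involved; this sketch only shows the core heuristic of sending dictionary-style phrasings to DictionaryService and everything else to the LLM.

```swift
import Foundation

enum QueryRoute {
    case dictionary(word: String)   // handled by DictionaryService
    case llm(question: String)      // handled by LLMService
}

func route(_ transcript: String) -> QueryRoute {
    let lowered = transcript.lowercased()
    // Phrasings like "what does ___ mean" or "define ___" signal a word lookup.
    let lookupCues = ["what does ", "define ", "meaning of "]
    for cue in lookupCues where lowered.contains(cue) {
        // Take the last meaningful word as the lookup target.
        let words = lowered
            .components(separatedBy: CharacterSet.alphanumerics.inverted)
            .filter { !$0.isEmpty && $0 != "mean" }
        if let word = words.last {
            return .dictionary(word: word)
        }
    }
    return .llm(question: transcript)
}

// Example: route("What does ephemeral mean?") -> .dictionary(word: "ephemeral")
// Example: route("Why did the author set this in Prague?") -> .llm(...)
```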
What I learned
I learned how to architect a modular voice assistant by separating concerns into independent services for speech, NLP (AI), storage, and APIs. I gained hands-on experience balancing local-first storage with cloud sync, ensuring both offline reliability and online scalability. Most importantly, I discovered how to polish an AI project end-to-end, from functional pipelines to branding, error handling, and demo readiness. Through this, I saw firsthand how AI-powered voice assistance can make exploring books and knowledge more natural, engaging, and accessible.
What's next for Athena - A Reading Assistant
- Wake Word Activation: Enabling a hands-free “Hey Athena” experience. This is tricky because iOS doesn’t expose Siri’s private APIs, so implementing it requires creative workarounds and careful optimization of always-on listening.
- User Authentication: Adding sign-up/login to let users personalize their history, sync across devices, and securely access saved knowledge.
- Contextual Memory: Remember what the user asked earlier and carry that context forward, so conversations feel more natural and human-like.
- Advanced History Management: Smarter categorization, tagging, and search so users can quickly revisit definitions, phrases, or previous conversations.
- Multi-Modal Chat: Support not only voice but also text-based chat, giving users flexibility in how they interact with Athena.
- Integrate Agentic AI: Solve personalized task-based problems for readers.
- Smarter Recommendations: Suggest books, phrases, or concepts related to what the user is learning, turning Athena into a personalized knowledge coach.
