AuraArchive: Transforming Conversations into Knowledge
Inspiration
In today's information-driven world, knowledge is often captured in audio: lectures, technical discussions, meetings, and voice notes. Audio is rich in insight but hard to work with. Unlike text, a recording cannot easily be searched, referenced, or published.
AuraArchive was built to solve this gap. The idea was simple: transform raw audio discussions into structured, searchable, and publish-ready technical articles. Instead of forcing users to manually extract insights from long recordings, AuraArchive automates the entire transformation process while preserving the context and intent of the conversation.
The goal was not just transcription but meaningful knowledge synthesis.
How We Built It
AuraArchive is designed as a cloud-native, AI-driven processing pipeline that converts unstructured audio into structured blog content while ensuring scalability and seamless mobile accessibility.
Backend Core (Processing Engine)
FastAPI – The Processing Backbone
FastAPI powers the backend due to its high-performance asynchronous architecture. Audio uploads trigger background AI processing tasks using event-driven workflows, ensuring the API remains responsive while heavy AI reasoning occurs independently.
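The non-blocking upload pattern described here can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the project's FastAPI code: `handle_upload`, `process_audio`, and the in-memory `STATUS` table are hypothetical names, and the short sleep stands in for the model call.

```python
import asyncio
import uuid

# In-memory status table; a stand-in for the real database (assumption).
STATUS = {}


async def process_audio(upload_id, audio):
    """Simulate the slow AI step, then mark the draft ready for review."""
    await asyncio.sleep(0.01)  # placeholder for the model call
    STATUS[upload_id] = "REVIEW_PENDING"


async def handle_upload(audio):
    """Register the upload, schedule processing, and return immediately."""
    upload_id = str(uuid.uuid4())
    STATUS[upload_id] = "PROCESSING"
    # create_task schedules the work without waiting for it to finish.
    task = asyncio.create_task(process_audio(upload_id, audio))
    return upload_id, task


async def main():
    upload_id, task = await handle_upload(b"fake-audio-bytes")
    print(STATUS[upload_id])  # the handler returned before processing ran
    await task
    print(STATUS[upload_id])  # updated once the background task completes


if __name__ == "__main__":
    asyncio.run(main())
```

The handler hands back an identifier right away, and the client can poll for status using that identifier while the heavy work runs separately.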
Google Gemini 2.5 Flash – The Intelligence Layer
Gemini processes raw audio files directly and generates structured JSON outputs containing:
- Title
- Summary
- Full Blog Content (Markdown)
- External Reference Links
This direct multimodal processing avoids a traditional multi-stage pipeline, such as transcription followed by a separate text-summarization step, and so preserves more of the conversation's context.
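One way to represent the structured output listed above is a small typed record. The sketch below is illustrative only: the key names `title`, `summary`, `content_markdown`, and `reference_links` are assumptions, since the writeup names the fields but not their JSON keys.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlogDraft:
    """Typed view of the model's JSON output (key names are assumed)."""
    title: str
    summary: str
    content_markdown: str
    reference_links: List[str] = field(default_factory=list)


def parse_draft(raw_json):
    """Parse a JSON string into a BlogDraft.

    Raises ValueError on malformed JSON and KeyError on a missing
    required field; reference links are treated as optional.
    """
    data = json.loads(raw_json)
    return BlogDraft(
        title=data["title"],
        summary=data["summary"],
        content_markdown=data["content_markdown"],
        reference_links=list(data.get("reference_links", [])),
    )
```

Parsing into a typed record early means later stages (storage, review, export) can rely on attribute access rather than re-checking dictionary keys.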
Qdrant Cloud – The Semantic Storage Layer
Qdrant stores the generated blog data and leaves room for embedding-based semantic search and content retrieval as the archive grows.
Mobile Application (User Experience Layer)
The Android application was developed using modern mobile architecture principles:
- Language: Kotlin
- UI: Jetpack Compose with Material 3
- Architecture: MVVM
- Dependency Injection: Dagger Hilt
- Networking: Retrofit + OkHttp
- Media & Rendering: Coil + Native PDF Export Engine
The app allows users to upload audio, monitor AI processing status, review generated drafts, and export content seamlessly to Google Docs or PDF.
Key System Workflow
Audio Upload → AI Processing → Draft Review → Publish → Public Feed
Each uploaded discussion moves through a lifecycle:
- PROCESSING: AI generates a structured blog draft
- REVIEW_PENDING: Content awaits administrative approval
- PUBLISHED: Article becomes publicly accessible
This lifecycle ensures content quality while maintaining automation.
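The lifecycle above can be modeled as a small state machine. This is a sketch, assuming only forward transitions exist (the writeup does not say whether a rejected draft can move backward); `Status`, `ALLOWED`, and `advance` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    PROCESSING = "PROCESSING"
    REVIEW_PENDING = "REVIEW_PENDING"
    PUBLISHED = "PUBLISHED"


# Allowed transitions (assumption: forward-only); anything else is rejected.
ALLOWED = {
    Status.PROCESSING: {Status.REVIEW_PENDING},
    Status.REVIEW_PENDING: {Status.PUBLISHED},
    Status.PUBLISHED: set(),
}


def advance(current, target):
    """Return the new status, or raise ValueError for an illegal transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Keeping the transition table in one place means a draft cannot skip review, for example by jumping straight from PROCESSING to PUBLISHED.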
Challenges Faced
1. Distributed ID Synchronization
A major issue occurred due to inconsistent ID generation across system layers. The upload service generated one identifier while the database created another, causing frontend polling failures.
The solution involved establishing the upload identifier as the single system-wide source of truth, ensuring consistent tracking across mobile, backend, and database layers.
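The fix can be illustrated with a storage layer that never mints its own key. This is a minimal sketch with an in-memory stand-in for the database; `DraftStore` and `create_upload` are hypothetical names, and the use of UUIDs is an assumption.

```python
import uuid


class DraftStore:
    """In-memory stand-in for the database layer (assumption)."""

    def __init__(self):
        self._rows = {}

    def insert(self, upload_id, status):
        # The store requires the caller's ID instead of generating one.
        if upload_id in self._rows:
            raise KeyError(f"duplicate upload_id: {upload_id}")
        self._rows[upload_id] = {"id": upload_id, "status": status}

    def get(self, upload_id):
        return self._rows[upload_id]


def create_upload(store):
    """Generate the identifier exactly once and reuse it everywhere."""
    upload_id = str(uuid.uuid4())
    store.insert(upload_id, "PROCESSING")
    return upload_id  # the same value the client will poll with
```

Because the identifier returned to the client is also the storage key, a status lookup by that identifier always finds the matching record.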
2. Enforcing Structured AI Output
Large language models tend to produce free-form text, which conflicted with our requirement for strict JSON formatting.
To resolve this, we implemented a dedicated sanitization layer that:
- Validates AI responses
- Removes formatting inconsistencies
- Applies fallback content when fields are missing
- Ensures schema-safe JSON before persistence
This dramatically improved reliability and downstream stability.
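The steps in this list can be sketched as a single function. This is an illustration under stated assumptions rather than the project's actual layer: the key names, the fallback values in `FALLBACKS`, and the choice to strip a Markdown code fence around the JSON are all hypothetical.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that may wrap the JSON

# Fallback values per field (assumed names and defaults).
FALLBACKS = {
    "title": "Untitled Discussion",
    "summary": "",
    "content_markdown": "",
    "reference_links": [],
}


def strip_fence(text):
    """Remove a surrounding code fence and optional language tag."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE) and len(text) >= 2 * len(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_line, _, rest = text.partition("\n")
        # Drop an opening line that is empty or just a tag such as "json".
        if not first_line.strip() or first_line.strip().isalpha():
            text = rest
    return text.strip()


def sanitize(raw):
    """Return a dict with every expected key and a value of the expected type."""
    try:
        data = json.loads(strip_fence(raw))
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    clean = {}
    for key, default in FALLBACKS.items():
        value = data.get(key)
        if isinstance(value, type(default)):
            clean[key] = value
        else:
            # Missing or wrongly typed field: use a copy of the fallback.
            clean[key] = list(default) if isinstance(default, list) else default
    if not clean["title"].strip():
        clean["title"] = FALLBACKS["title"]
    clean["reference_links"] = [x for x in clean["reference_links"] if isinstance(x, str)]
    return clean
```

Whatever the model returns, the result has the same four keys with predictable types, so the persistence code never has to handle a partial record.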
3. Designing Responsive AI UX
AI processing introduces unavoidable latency. Rather than hiding it, AuraArchive embraces transparent state transitions. Real-time status indicators guide users through processing stages, improving trust and perceived responsiveness.
What We Learned
Multimodal AI Enhances Context Understanding
Processing raw audio directly allows the AI model to capture tone, emphasis, and conversational nuance, producing richer and more accurate written content.
Lifecycle-Driven Architecture Improves Reliability
Treating stored content as a state machine simplified workflow management and reduced edge-case failures across distributed services.
User Experience Extends Beyond Interface Design
Clear feedback loops and workflow visibility proved as important as performance optimizations in building user confidence.
AuraArchive demonstrates how multimodal AI can transform conversational knowledge into structured, shareable, and searchable content—bridging the gap between spoken discussions and documented intelligence.