AuraArchive: Transforming Conversations into Knowledge

Inspiration

In today’s information-driven world, knowledge is often captured in audio—lectures, technical discussions, meetings, and voice notes. While audio is rich in insight, it lacks accessibility and structure. Unlike text, spoken conversations are difficult to search, reference, or publish efficiently.

AuraArchive was built to close this gap. The idea was simple: transform raw audio discussions into structured, searchable, and publish-ready technical articles. Instead of forcing users to manually extract insights from long recordings, AuraArchive automates the entire transformation process while preserving the context and intent of the conversation.

The goal was not just transcription but meaningful knowledge synthesis.

How We Built It

AuraArchive is designed as a cloud-native, AI-driven processing pipeline that converts unstructured audio into structured blog content while ensuring scalability and seamless mobile accessibility.

Backend Core (Processing Engine)

FastAPI – The Processing Backbone

FastAPI powers the backend because of its high-performance asynchronous architecture. Each audio upload triggers a background AI processing task, so the API stays responsive while the heavy AI work runs independently of the request.

Google Gemini 2.5 Flash – The Intelligence Layer

Gemini processes raw audio files directly and generates structured JSON outputs containing:

Title
Summary
Full Blog Content (Markdown)
External Reference Links

This direct multimodal processing avoids traditional multi-stage pipelines and preserves contextual meaning.

Qdrant Cloud – The Semantic Storage Layer

Qdrant stores the generated blog data and leaves room for embedding-based semantic search and content retrieval as the archive grows.
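To illustrate what embedding-based retrieval means here, the sketch below ranks stored articles by cosine similarity to a query vector. It is a pure-Python stand-in, not the Qdrant client API; the three-dimensional vectors and article titles are made up for the example, and a real deployment would use model-generated embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: list[float], points: list[dict], limit: int = 2) -> list[str]:
    """Return the titles of the `limit` points most similar to `query`."""
    ranked = sorted(points, key=lambda p: cosine(query, p["vector"]), reverse=True)
    return [p["payload"]["title"] for p in ranked[:limit]]


# Toy data (hypothetical): each point pairs a vector with a blog payload.
POINTS = [
    {"vector": [1.0, 0.0, 0.0], "payload": {"title": "Kotlin coroutines"}},
    {"vector": [0.0, 1.0, 0.0], "payload": {"title": "Vector databases"}},
    {"vector": [0.9, 0.1, 0.0], "payload": {"title": "Jetpack Compose state"}},
]
```

Calling search([1.0, 0.0, 0.0], POINTS) ranks the two Kotlin-adjacent toy articles ahead of the unrelated one.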

Mobile Application (User Experience Layer)

The Android application was developed using modern mobile architecture principles:

Language: Kotlin
UI: Jetpack Compose with Material 3
Architecture: MVVM
Dependency Injection: Dagger Hilt
Networking: Retrofit + OkHttp
Media & Rendering: Coil + Native PDF Export Engine

The app allows users to upload audio, monitor AI processing status, review generated drafts, and export content seamlessly to Google Docs or PDF.

Key System Workflow

Audio Upload → AI Processing → Draft Review → Publish → Public Feed

Each uploaded discussion moves through a lifecycle:

PROCESSING: AI generates structured blog draft
REVIEW_PENDING: Content awaits administrative approval
PUBLISHED: Article becomes publicly accessible

This lifecycle ensures content quality while maintaining automation.
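The lifecycle above can be sketched as a small state machine. The three status names come from the list above; the transition table assumes a strictly forward flow with no rejection path, which the write-up does not specify, and advance is a hypothetical helper.

```python
from enum import Enum


class Status(Enum):
    PROCESSING = "PROCESSING"
    REVIEW_PENDING = "REVIEW_PENDING"
    PUBLISHED = "PUBLISHED"


# Allowed transitions (assumption: forward-only, no rejection or rollback).
ALLOWED = {
    Status.PROCESSING: {Status.REVIEW_PENDING},
    Status.REVIEW_PENDING: {Status.PUBLISHED},
    Status.PUBLISHED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return `target` if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal jumps in one place means, for example, that a draft cannot be published while it is still being processed.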

Challenges Faced

1. Distributed ID Synchronization

A major issue occurred due to inconsistent ID generation across system layers. The upload service generated one identifier while the database created another, causing frontend polling failures.

The solution involved establishing the upload identifier as the single system-wide source of truth, ensuring consistent tracking across mobile, backend, and database layers.

2. Enforcing Structured AI Output

Large language models do not reliably return strictly formatted output: a response can arrive wrapped in extra prose or Markdown, or with fields missing. This conflicted with our requirement for strict JSON formatting.

To resolve this, we implemented a dedicated sanitization layer that:

Validates AI responses
Removes formatting inconsistencies
Applies fallback content when fields are missing
Ensures schema-safe JSON before persistence

This made persistence more reliable and kept malformed responses from reaching downstream components.
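The steps listed above can be sketched roughly as follows. This is an illustrative sanitizer, not the project's actual layer; the field names (title, summary, content, references) and the fallback values are assumptions based on the output fields described earlier.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes add

# Assumed schema: field name -> fallback value (also fixes the expected type).
FALLBACKS = {
    "title": "Untitled discussion",
    "summary": "",
    "content": "",
    "references": [],
}


def strip_fences(raw: str) -> str:
    """Remove a surrounding Markdown code fence, if present."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag.
        text = text.split("\n", 1)[1] if "\n" in text else ""
    if text.endswith(FENCE):
        text = text[: -len(FENCE)]
    return text.strip()


def sanitize(raw: str) -> dict:
    """Parse a model response and return a dict that matches the schema."""
    try:
        data = json.loads(strip_fences(raw))
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    clean = {}
    for field, fallback in FALLBACKS.items():
        value = data.get(field)
        if isinstance(value, type(fallback)) and value:
            clean[field] = value
        else:
            # Missing, empty, or wrongly typed: use a copy of the fallback.
            clean[field] = list(fallback) if isinstance(fallback, list) else fallback
    return clean
```

Whatever the model returns, the result always has the same four keys with the expected types, so it can be serialized and stored without further checks.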

3. Designing Responsive AI UX

AI processing introduces unavoidable latency. Rather than hiding it, AuraArchive embraces transparent state transitions. Real-time status indicators guide users through processing stages, improving trust and perceived responsiveness.

What We Learned

Multimodal AI Enhances Context Understanding

Processing raw audio directly allows the AI model to capture tone, emphasis, and conversational nuance, producing richer and more accurate written content.

Lifecycle-Driven Architecture Improves Reliability

Treating stored content as a state machine simplified workflow management and reduced edge-case failures across distributed services.

User Experience Extends Beyond Interface Design

Clear feedback loops and workflow visibility proved as important as performance optimizations in building user confidence.

AuraArchive demonstrates how multimodal AI can transform conversational knowledge into structured, shareable, and searchable content—bridging the gap between spoken discussions and documented intelligence.

Built With

gemini
kotlin
python
qdrant

Updates

Dhairya Pandya started this project — Feb 08, 2026 10:04 AM EST
