AudioX11AI

🚀 Inspiration

In an increasingly multitasking world, we often don't have time to read long documents, watch full videos, or sit through lengthy audio recordings. We wanted to build something that gives people their time back by letting them listen to any content, summarize it, and translate it, all in a few clicks. With natural-sounding AI voices and offline-friendly summarization, Audix11AI was built to make information consumption hands-free, multilingual, and accessible.

🧠 What it does

Audix11AI is an AI-powered assistant that:

Reads aloud PDF, Word, and text scripts in natural ElevenLabs voices.

Transcribes audio and video files to clean readable text.

Summarizes long documents or transcripts without external AI APIs, using the offline sumy library.

Translates both text and spoken voice across multiple languages.

Offers premium features via RevenueCat, including pro voices, unlimited file uploads, and multi-language audio.

🏗️ How we built it

Frontend: Built using React.js for a responsive UI and smooth user interaction.

Backend: Powered by Python with Flask/FastAPI for file processing and routing.

Text Parsing: Used pdfplumber, python-docx, and raw text input handling.

TTS: Integrated with ElevenLabs API for generating realistic voice narration.

Transcription: Used ffmpeg to extract audio and the SpeechRecognition library (with Whisper) for local speech-to-text.

Summarization: Used sumy with the LexRank method to generate extractive summaries offline.

Translation: Integrated googletrans for multilingual support.

Premium Handling: Integrated RevenueCat to manage premium limits, voice access, and paywall features.
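
The transcription step listed above extracts audio with ffmpeg before speech-to-text. Here is a minimal sketch of that extraction, assuming the ffmpeg binary is available on PATH. The function names, the 16 kHz mono WAV settings, and the "_audio.wav" naming are illustrative assumptions, not details taken from the project.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: str, target: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg argv that extracts a mono WAV track from a media file."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the target if it already exists
        "-i", source,              # input audio or video file
        "-vn",                     # drop any video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        target,
    ]


def extract_audio(source: str, sample_rate: int = 16000) -> Path:
    """Run ffmpeg and return the path of the extracted WAV file (hypothetical helper)."""
    src = Path(source)
    # Use a distinct name so a .wav input is never overwritten in place.
    target = src.with_name(src.stem + "_audio.wav")
    subprocess.run(build_ffmpeg_command(str(src), str(target), sample_rate), check=True)
    return target
```

Passing the command as an argument list rather than a shell string avoids quoting problems with uploaded file names that contain spaces.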

🧩 Challenges we ran into

We ran out of tokens, so we could not integrate the ElevenLabs API with the project.

Handling large document files and cleaning the extracted text consistently.

Maintaining language compatibility across voice, translation, and transcription workflows.

Implementing summarization locally without relying on costly APIs.

Ensuring audio syncing with long-form narration for natural playback.

Learning to set up RevenueCat properly across platforms for monetization.
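
Two of the challenges above, cleaning extracted text and narrating long documents, can be approached with plain string handling. The sketch below is written under assumptions: `clean_text`, `chunk_text`, and the 2500-character default are hypothetical, since the project's actual cleaning rules and narration limits are not described here.

```python
import re


def clean_text(raw: str) -> str:
    """Normalize text extracted from PDF or DOCX files (heuristic rules)."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break; this may also merge
    # genuine compounds that happened to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"[ \t]+", " ", text)            # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)           # trim spaces around newlines
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)   # a single newline becomes a space
    text = re.sub(r"\n{2,}", "\n\n", text)         # keep one blank line between paragraphs
    return text.strip()


def chunk_text(text: str, max_chars: int = 2500) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence  # an over-long sentence becomes its own chunk
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries keeps each narrated segment from starting or ending mid-sentence, which is one way to keep long-form playback sounding natural when segments are generated separately.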

🏅 Accomplishments that we're proud of

A fully working prototype that reads, summarizes, transcribes, and translates across formats.

Offline summarization using Sumy that doesn't rely on expensive or rate-limited APIs.

Seamless integration with ElevenLabs for lifelike multilingual TTS.

Flexible file handling across PDF, DOCX, MP3, MP4, and raw text input.

Monetization-ready architecture using RevenueCat.
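
Handling PDF, DOCX, MP3, MP4, and raw text in one tool implies some form of routing by file type. One possible shape for that routing is sketched below; the `ROUTES` table, the stage names, and `route_file` are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Mapping from file extension to a pipeline stage name (stage names are illustrative).
ROUTES = {
    ".pdf": "parse_pdf",     # e.g. handled with pdfplumber
    ".docx": "parse_docx",   # e.g. handled with python-docx
    ".txt": "read_text",
    ".mp3": "transcribe",
    ".mp4": "transcribe",    # audio is extracted first, then transcribed
}


def route_file(filename: str) -> str:
    """Return the stage that should handle a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return ROUTES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

For example, `route_file("Report.PDF")` returns `"parse_pdf"`, and an unknown extension raises `ValueError` so the backend can report a clear error to the user.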

📚 What we learned

How to combine multiple tools and Python libraries (such as sumy, ffmpeg, and the ElevenLabs API) for real-world media applications.

Handling multimodal input (text, audio, video) and syncing outputs efficiently.

Strategies for building offline-friendly AI tools using local summarization methods.

How to integrate and manage subscription-based features using RevenueCat.

UX insights around voice playback, translation toggles, and language preferences.
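
The local summarization approach mentioned above can be sketched with sumy's LexRank summarizer. This assumes sumy, its numpy dependency, and the NLTK tokenizer data are installed. `pick_sentence_count` and `summarize` are hypothetical helper names, and the 20% ratio is an arbitrary choice rather than the project's setting.

```python
def pick_sentence_count(total_sentences: int, ratio: float = 0.2, minimum: int = 1) -> int:
    """Choose how many sentences to keep: a fraction of the input, at least `minimum`."""
    return max(minimum, min(total_sentences, round(total_sentences * ratio)))


def summarize(text: str, language: str = "english", ratio: float = 0.2) -> str:
    """Return an extractive LexRank summary of `text` (requires sumy and NLTK data)."""
    # Imported lazily so this module still loads where sumy is not installed.
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lex_rank import LexRankSummarizer

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    count = pick_sentence_count(len(parser.document.sentences), ratio)
    summarizer = LexRankSummarizer()
    # The summarizer returns the selected sentences in document order.
    return " ".join(str(sentence) for sentence in summarizer(parser.document, count))
```

Because LexRank is extractive, the summary consists of sentences copied from the input, so no external API call is needed at any point.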

🔮 What's next for Audix11AI

Add real-time summarization while listening (live TTS + summary sync).

Enable voice cloning or custom voice input for personalization.

Add cloud storage integration (Google Drive, Dropbox) for quick file access.

Release a mobile version using Flutter with full cross-platform support.

Offer custom summaries (bullet points, question-generation, etc.).

Introduce offline mode with Whisper and ElevenLabs Lite (via caching).

Updates

SHREYAS Nikam started this project — Jun 30, 2025 03:04 PM EDT
