🌟 Inspiration

Learning a new language is often hindered by the lack of real-time practice and the cost of tutors. We wanted to build an immersive, accessible, and intelligent companion that leverages the multimodal power of Google Gemini. Our goal was to create more than just a translator: we built a "self-healing" AI tutor that keeps responding even when an individual model is rate-limited or unavailable.

🤖 What it does

Arvion Lingua AI is a comprehensive Telegram-based ecosystem for language and programming education. Key features include:

  • Gemini Voice Chat: Full voice-to-voice interaction using Gemini for transcription and gTTS for response synthesis.
  • Smart Model Auto-Ranking: A dynamic system that queries the API for the models available to the configured key and ranks them by capability (from Gemini 3.1 Pro to Flash variants).
  • Multimodal Learning: Image-to-text (OCR) translation and analysis.
  • Adaptive Curriculum: 13 natural languages and 10 programming languages with tiered levels (A1-C2 / Beginner-Advanced).
  • Interactive Quizzes: AI-generated challenges with real-time feedback and streak tracking.

🛠 How we built it

The project is built on a high-performance asynchronous architecture:

  • Backend: Python 3.8+ with aiogram 3.5.0 for rapid Telegram Bot API interaction.
  • AI Orchestration: A custom GeminiService that manages model lifecycle, context-aware history, and multimodal inputs.
  • Data Persistence: aiosqlite for asynchronous database operations, so database calls do not block the event loop.
  • Model Selection Logic: We implemented a ranking algorithm that prefers the highest-tier model available to the API key.
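
As an illustration of the model selection logic, here is a minimal sketch of one way such a ranking could work. It is not the project's actual algorithm: the tier weights, the name pattern, the choice to rank by tier before version, and the helper names rank_key and rank_models are all assumptions made for this example, as are the model names other than Gemini 3.1 Pro.

```python
import re
from typing import List, Tuple

# Hypothetical tier weights (assumed for illustration): higher is more capable.
TIER_WEIGHTS = {"pro": 3, "flash": 2, "flash-lite": 1}

# Assumed naming scheme: "gemini-<version>-<tier>", e.g. "gemini-3.1-pro".
# "flash-lite" is listed before "flash" so the longer tier name wins.
_NAME_RE = re.compile(r"gemini-(\d+(?:\.\d+)*)-(pro|flash-lite|flash)")


def rank_key(model_name: str) -> Tuple[int, Tuple[int, ...]]:
    """Return (tier weight, version tuple); unrecognized names sort last."""
    match = _NAME_RE.search(model_name)
    if match is None:
        return (0, (0,))
    version, tier = match.groups()
    version_parts = tuple(int(part) for part in version.split("."))
    return (TIER_WEIGHTS[tier], version_parts)


def rank_models(available: List[str]) -> List[str]:
    """Order model names from most to least preferred."""
    return sorted(available, key=rank_key, reverse=True)


if __name__ == "__main__":
    # Illustrative names only; a real list would come from the API.
    print(rank_models(["gemini-2.5-flash", "gemini-3.1-pro", "gemini-2.5-flash-lite"]))
```

Ranking by tier first means any Pro model is preferred over any Flash model, with the version number used only to break ties within a tier. Swapping the two elements of the key would prefer newer generations instead.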

🛡 Challenges we faced

The primary challenge was keeping the bot responsive despite API quota limits or model unavailability. We solved this by developing an Automatic Fallback mechanism. If a high-tier model (like Gemini 3.1 Pro) returns a 429 (Too Many Requests) or 500 (Internal Server Error) response, the system immediately and transparently switches to the next best available model in the ranked queue.
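
The fallback idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the bot's actual code: ModelCallError, generate_with_fallback, and the call_model callable are hypothetical names, and the real GeminiService may detect errors and switch models differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Status codes that trigger a switch to the next model (from the write-up).
RETRYABLE_STATUS = {429, 500}


class ModelCallError(Exception):
    """Hypothetical error carrying the HTTP status of a failed model call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"model call failed with status {status}")
        self.status = status


async def generate_with_fallback(
    prompt: str,
    ranked_models: Sequence[str],
    call_model: Callable[[str, str], Awaitable[str]],
) -> str:
    """Try each model in ranked order; move on only for 429/500 errors."""
    last_error: Optional[ModelCallError] = None
    for model in ranked_models:
        try:
            return await call_model(model, prompt)
        except ModelCallError as err:
            if err.status not in RETRYABLE_STATUS:
                raise  # other errors are not masked by the fallback
            last_error = err
    raise RuntimeError("all models in the ranked queue failed") from last_error


async def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real API call: the top-tier model is rate-limited."""
    if model == "gemini-3.1-pro":
        raise ModelCallError(429)
    return f"[{model}] reply to: {prompt}"


if __name__ == "__main__":
    models = ["gemini-3.1-pro", "gemini-2.5-flash"]  # illustrative names
    print(asyncio.run(generate_with_fallback("Hola", models, fake_call)))
```

Passing the model call in as a parameter keeps the fallback loop independent of any particular client library, which also makes it easy to exercise with a stub like fake_call.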

🏆 Accomplishments that we're proud of

  • Self-Healing Architecture: The /models command provides real-time transparency into how the bot manages model priorities and system health.
  • Multimodal Integration: We combined voice transcription, text generation, and image recognition into a single fluid user experience.

📚 What we learned

We gained deep insights into multimodal AI orchestration and the importance of building "resilient AI" that can handle API instability through intelligent fallback logic.
