Building Doc-Chat: Breaking Language Barriers in Document Interaction
The Inspiration
Our journey to create Doc-Chat began when we witnessed how language barriers prevent many people from accessing important documents. After seeing family members struggle with English medical documents, we envisioned a system that would allow anyone to have a natural conversation with documents in their preferred language.
What We Learned
Building Doc-Chat expanded our expertise in:
- Large Language Models and Retrieval-Augmented Generation (RAG)
- Audio processing and real-time speech recognition
- Multilingual translation systems
- Voice synthesis and natural language generation
- Frontend development with Streamlit
How We Built It
We designed Doc-Chat with modularity in mind, separating the system into key components:
- Document processing with intelligent chunking and embedding
- Speech recognition and synthesis using Whisper and ElevenLabs
- Language translation through NLLB-200
- RAG-based response generation with Mistral AI
- User interface built with Streamlit for seamless interaction
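To make the modular layout above concrete, here is a minimal sketch of how the stages might be wired together. It is not our actual code: the class name, the field names, and the order of the translation steps are assumptions for illustration. Each stage is a plain callable, so a Whisper, NLLB-200, Mistral, or ElevenLabs wrapper could be plugged in or replaced by a stub for testing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DocChatPipeline:
    """Hypothetical wiring of the components; every name here is illustrative."""

    transcribe: Callable[[bytes], str]          # audio -> text in the user's language
    translate: Callable[[str, str, str], str]   # (text, source_lang, target_lang) -> text
    retrieve: Callable[[str], List[str]]        # question -> relevant document chunks
    generate: Callable[[str, List[str]], str]   # (question, chunks) -> answer text
    synthesize: Callable[[str], bytes]          # text -> audio

    def answer(self, audio: bytes, user_lang: str, doc_lang: str = "en") -> bytes:
        # 1. Speech to text in the user's language.
        question = self.transcribe(audio)
        # 2. Translate the question into the document's language (assumed English).
        question_doc = self.translate(question, user_lang, doc_lang)
        # 3. Retrieve supporting chunks and generate a grounded answer.
        chunks = self.retrieve(question_doc)
        answer_doc = self.generate(question_doc, chunks)
        # 4. Translate the answer back and turn it into speech.
        answer_user = self.translate(answer_doc, doc_lang, user_lang)
        return self.synthesize(answer_user)
```

Because the stages only share simple types (bytes, strings, lists of strings), any one of them can be swapped without touching the others.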
Challenges We Faced
1. Performance Optimization
Initially, the system had noticeable delays between user input and response. We implemented streaming audio processing and optimized buffer sizes to achieve near real-time interaction.
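As a rough illustration of the buffering idea, the sketch below regroups incoming audio packets of arbitrary size into fixed-size buffers that can be handed to a recognizer as soon as each one fills up. The function name `rebuffer` and the `buffer_size` parameter are hypothetical; the buffer size that actually works well depends on the audio format and the speech model.

```python
from typing import Iterable, Iterator


def rebuffer(stream: Iterable[bytes], buffer_size: int) -> Iterator[bytes]:
    """Regroup arbitrarily sized audio packets into fixed-size buffers."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    pending = bytearray()
    for packet in stream:
        pending.extend(packet)
        # Emit full buffers as soon as enough audio has arrived.
        while len(pending) >= buffer_size:
            yield bytes(pending[:buffer_size])
            del pending[:buffer_size]
    if pending:
        # Flush whatever is left when the stream ends.
        yield bytes(pending)
```

Smaller buffers reduce the wait before the first result but give the recognizer less context per call, so the size is a trade-off to tune rather than a fixed value.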
2. Language Accuracy
Maintaining accuracy across multiple languages was challenging. We solved this by implementing robust language detection and creating a custom preprocessing pipeline for technical terms.
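One common way to preprocess technical terms is to mask them with placeholders before translation and restore them afterwards, so the translator cannot alter them. The sketch below shows that general technique under stated assumptions; it is not necessarily the pipeline we shipped, and the function names and placeholder format are hypothetical. It also assumes the translation model passes the placeholders through unchanged, which would need to be verified for any real model.

```python
import re
from typing import Callable, Dict, Iterable, Tuple


def protect_terms(text: str, terms: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each known technical term with a placeholder token."""
    mapping: Dict[str, str] = {}
    # Longest terms first, so a longer phrase is masked before any shorter term inside it.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        placeholder = f"__TERM{i}__"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text, count = pattern.subn(placeholder, text)
        if count:
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text: str, mapping: Dict[str, str]) -> str:
    """Put the terms back, using the spelling from the term list."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def translate_preserving_terms(
    text: str, terms: Iterable[str], translate: Callable[[str], str]
) -> str:
    """Mask terms, run the supplied translate function, then restore the terms."""
    masked, mapping = protect_terms(text, terms)
    return restore_terms(translate(masked), mapping)
```

Here `translate` stands in for whatever translation call is used, for example a wrapper around NLLB-200.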
3. Resource Management
Large documents initially strained system resources. We addressed this by chunking documents efficiently and loading them lazily, so that only the parts needed at a given moment are processed.
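A minimal sketch of chunking with lazy evaluation follows. The generator produces overlapping chunks one at a time instead of building the full list up front. The function name and the default `chunk_size` and `overlap` values are assumptions for illustration, not the settings we used.

```python
from typing import Iterator


def iter_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> Iterator[str]:
    """Lazily yield overlapping character chunks of a document."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + chunk_size]
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
```

Because it is a generator, each chunk can be embedded and stored before the next one is created, which keeps the number of chunks held in memory small. The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks.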
Team Details
Shwetha Krishnamurthy
- MBA Student and experienced Product Manager
- Responsible for RAG Setup and Streamlit Deployment
Rohan Srivastava
- MBA Student and experienced Product Manager
- Responsible for Vector Database, Speech Translation, and Bot Development
What's Next
We're excited to continue developing Doc-Chat with plans to:
- Add real-time document highlighting
- Support more document formats
- Enhance the conversation memory system
- Improve technical terminology handling
Through building Doc-Chat, we've learned that breaking down language barriers in information access is not just possible, but essential for making knowledge truly accessible to everyone.
PS: We tried our best to deploy it to Streamlit Community Cloud, but we kept running into deployment issues and gave up in the interest of time.
Built With
- elevenlabs
- fal.ai
- huggingface
- langchain
- openai
- python