Inspiration
In an age where automation is revolutionizing industries, we noticed a common bottleneck in offices, institutes, and event spaces: the traditional human receptionist. Delays in identification, language limitations, and inconsistent service can leave a poor first impression. We imagined a futuristic AI-powered receptionist — one that’s always alert, fluent in multiple languages, capable of recognizing faces, understanding voices, and handling queries in real time. Thus, the Multimodal Smart Receptionist was born.
What it does
- Detects and recognizes faces using advanced facial recognition (128D facial embeddings).
- Authenticates and stores new visitors securely in a database.
- Understands and responds to human speech in both English and Hindi.
- Answers queries, performs data operations, and supports multimodal interaction (voice, text, vision).
- Offers weather and news updates, manages tasks, and provides personalized interaction.
- Ensures security by preventing spoofing or impersonation. ## How we built it Frontend: Built using HTML, CSS, JavaScript with Flask templating to ensure smooth user experience and real-time interaction. Backend (Python): Powered by OpenCV, dlib for face recognition, scikit-learn for SVM classification, pandas and numpy for database management. Speech: Implemented real-time speech recognition (Google Speech API) and text-to-speech (pyttsx3). Multilingual NLP: Integrated translation and understanding in Hindi and English. Database: CSV/JSON files for efficient data storage and updates. AI Integration: Used LLaMA-2 for knowledge-based interaction. ## Challenges we ran into
- Ensuring accurate face recognition across different lighting, poses, and expressions.
- Maintaining real-time speech interaction with low latency.
- Handling dual-language support and seamless transitions between English and Hindi.
- Synchronizing UI and backend for dynamic updates and smooth animations.
- Dealing with Windows performance issues, especially on lower-end devices. ## Accomplishments that we're proud of
- Achieved 98.69% face recognition accuracy across varied backgrounds and age groups.
- Enabled a fully functional dual-language interaction model.
- Integrated real-time multimodal communication with a smooth and stylish UI.
- Built a deployable, scalable prototype that can be used as a receptionist, smart door lock, attendance system, and more. ## What we learned
- Deepened our knowledge of facial embeddings and vector space classification.
- Understood the importance of human-centered UI/UX design in AI applications.
- Mastered real-time integration of voice, vision, and language technologies.
- Gained experience with AI model deployment and optimization on edge devices. ## What's next for A Multimodal Smart Receptionist
- Add OTP/email-based authentication and multi-factor security for high-security zones.
- GUI-based control panel for admins to manage logs, faces, and analytics.
- Integration with smart home/office systems (IoT).
- Add support for regional Indian languages and gesture recognition.
- Cloud-based face storage and syncing to make it truly plug-and-play.
- Deploy on web & mobile platforms for broader accessibility.
Built With
- bootstrap
- built-with-python
- css
- csv/json-based
- dlib
- flask
- google-speech-api
- google-translate-api
- html
- javascript
- llama-2
- newsapi
- numpy
- opencv
- openweathermap-api
- pandas
- pyttsx3
- scikit-learn
- speechrecognition
Log in or sign up for Devpost to join the conversation.