A Multimodal Smart Receptionist

Inspiration

In an age where automation is revolutionizing industries, we noticed a common bottleneck in offices, institutes, and event spaces: the traditional human receptionist. Delays in identification, language limitations, and inconsistent service can leave a poor first impression. We imagined a futuristic AI-powered receptionist — one that’s always alert, fluent in multiple languages, capable of recognizing faces, understanding voices, and handling queries in real time. Thus, the Multimodal Smart Receptionist was born.

What it does

Detects and recognizes faces using advanced facial recognition (128D facial embeddings).
Authenticates and stores new visitors securely in a database.
Understands and responds to human speech in both English and Hindi.
Answers queries, performs data operations, and supports multimodal interaction (voice, text, vision).
Offers weather and news updates, manages tasks, and provides personalized interaction.
Ensures security by preventing spoofing or impersonation.

How we built it

Frontend: Built using HTML, CSS, and JavaScript with Flask templating to ensure a smooth user experience and real-time interaction.
Backend (Python): Powered by OpenCV and dlib for face recognition, scikit-learn for SVM classification, and pandas and numpy for database management.
Speech: Implemented real-time speech recognition (Google Speech API) and text-to-speech (pyttsx3).
Multilingual NLP: Integrated translation and understanding in Hindi and English.
Database: CSV/JSON files for efficient data storage and updates.
AI Integration: Used LLaMA-2 for knowledge-based interaction.

Challenges we ran into

Ensuring accurate face recognition across different lighting, poses, and expressions.
Maintaining real-time speech interaction with low latency.
Handling dual-language support and seamless transitions between English and Hindi.
Synchronizing UI and backend for dynamic updates and smooth animations.
Dealing with Windows performance issues, especially on lower-end devices.

Accomplishments that we're proud of

Achieved 98.69% face recognition accuracy across varied backgrounds and age groups.
Enabled a fully functional dual-language interaction model.
Integrated real-time multimodal communication with a smooth and stylish UI.
Built a deployable, scalable prototype that can be used as a receptionist, smart door lock, attendance system, and more.

What we learned

Deepened our knowledge of facial embeddings and vector space classification.
Understood the importance of human-centered UI/UX design in AI applications.
Mastered real-time integration of voice, vision, and language technologies.
Gained experience with AI model deployment and optimization on edge devices.

What's next for A Multimodal Smart Receptionist

Add OTP/email-based authentication and multi-factor security for high-security zones.
Build a GUI-based control panel for admins to manage logs, faces, and analytics.
Integrate with smart home/office systems (IoT).
Add support for regional Indian languages and gesture recognition.
Move face storage and syncing to the cloud to make it truly plug-and-play.
Deploy on web & mobile platforms for broader accessibility.
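
The OTP-based authentication on this list is not built yet, so the snippet below is only a rough sketch of how such a check might work, not the project's code. It assumes a short numeric code that expires after a few minutes and can be used once. The `OtpStore` class, its method names, and the `visitor_id` key are all hypothetical, and delivery of the code by email is left out.

```python
import hashlib
import hmac
import secrets
import time


class OtpStore:
    """In-memory one-time-passcode store (hypothetical helper, not from the project)."""

    def __init__(self, ttl_seconds=300, digits=6):
        self.ttl_seconds = ttl_seconds
        self.digits = digits
        # visitor_id -> (SHA-256 hex digest of the code, expiry timestamp)
        self._pending = {}

    def issue(self, visitor_id, now=None):
        """Create a fresh code for a visitor and return it for delivery (e.g. by email)."""
        now = time.time() if now is None else now
        # secrets gives cryptographically strong randomness; zero-pad to a fixed width.
        code = f"{secrets.randbelow(10 ** self.digits):0{self.digits}d}"
        digest = hashlib.sha256(code.encode()).hexdigest()
        self._pending[visitor_id] = (digest, now + self.ttl_seconds)
        return code

    def verify(self, visitor_id, code, now=None):
        """Return True only for a matching, unexpired code; each code works once."""
        now = time.time() if now is None else now
        entry = self._pending.get(visitor_id)
        if entry is None:
            return False
        digest, expires_at = entry
        if now > expires_at:
            del self._pending[visitor_id]  # expired codes are discarded
            return False
        candidate = hashlib.sha256(code.encode()).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(candidate, digest):
            del self._pending[visitor_id]  # single use
            return True
        return False
```

In a flow like this, the receptionist would call `issue()` after a face match in a high-security zone, send the code to the visitor, and grant access only if `verify()` returns `True`. A real deployment would also need a limit on failed attempts and persistent storage, both of which this sketch leaves out.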

Built With

bootstrap
built-with-python
css
csv/json-based
dlib
flask
google-speech-api
google-translate-api
html
javascript
llama-2
newsapi
numpy
opencv
openweathermap-api
pandas
pyttsx3
scikit-learn
speechrecognition

Updates

Akriti Jha started this project — Apr 08, 2025 02:26 PM EDT
