Inspiration
Living in a country as historically rich as Morocco, we noticed a profound disconnect in modern tourism. People often stand before majestic structures—ancient medina walls, towering minarets, or detailed mosaics—without truly understanding what they are looking at. Generic object detection apps might simply label a monument as a "tower" or "building," while standard search engines require you to already know the name of the place to learn about it. We wanted to solve this "context gap." We were inspired by the idea of valorizing cultural heritage by making it accessible, interactive, and educational. We wanted to build a digital companion that acts less like a search bar and more like a knowledgeable local guide who is always with you.
What it does
Atlas Vision is an intelligent cultural guide that bridges the gap between sight and understanding.

- Instant Recognition: Users simply upload or snap a photo of a monument.
- Hybrid Intelligence: The system uses Computer Vision to identify the specific landmark (e.g., "Hassan II Mosque" or "Bab Bou Jeloud") with high precision.
- Interactive Storytelling: It doesn't just give you a name; it opens a chat interface powered by Generative AI. Users can ask questions like "When was this built?", "What do the mosaics signify?", or "Tell me a secret about this place," and receive accurate, context-aware answers.
How we built it
We architected a Hybrid AI System composed of two distinct phases: Perception and Cognition.

- The Perception Layer (Vision): We used Azure Custom Vision to train a specialized model. We curated a dataset of specific Moroccan landmarks to ensure the model could distinguish between locally similar architectural styles, which generic models often fail to do.
- The Cognition Layer (Intelligence): We built a backend using FastAPI (Python) that acts as the orchestrator. Once the vision model identifies the monument, that metadata is passed to a multi-LLM pipeline utilizing Google Gemini and Mistral AI.
- The Full Stack: The frontend uses React.js (Vite) and Tailwind CSS for a responsive, mobile-first experience. The application is containerized with Docker and hosted on Hugging Face Spaces, while the frontend is deployed on Netlify.
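The two-phase flow can be sketched in plain Python. This is a minimal illustration, not the actual implementation: the function names are hypothetical, and the Azure Custom Vision and Gemini/Mistral calls are stubbed out so only the control flow is visible.

```python
import asyncio

# Hypothetical sketch of the Perception -> Cognition pipeline.
# In the real system, classify_landmark would call Azure Custom Vision
# and ask_llm would call the Gemini/Mistral pipeline; both are stubbed here.

async def classify_landmark(image_bytes: bytes) -> str:
    """Stub for the vision model: returns the identified landmark tag."""
    await asyncio.sleep(0)  # stands in for the network round-trip
    return "Hassan II Mosque"

async def ask_llm(landmark: str, question: str) -> str:
    """Stub for the LLM pipeline: the landmark grounds the answer."""
    await asyncio.sleep(0)
    return f"[About {landmark}] ..."

async def handle_request(image_bytes: bytes, question: str) -> dict:
    landmark = await classify_landmark(image_bytes)  # Perception phase
    answer = await ask_llm(landmark, question)       # Cognition phase
    return {"landmark": landmark, "answer": answer}

result = asyncio.run(handle_request(b"...", "When was this built?"))
```

In the deployed app, a FastAPI endpoint plays the role of `handle_request`, which is what lets the backend serve both phases behind a single upload-and-chat API.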
Challenges we ran into
- The Hallucination Risk: Generative AI can sometimes "hallucinate" facts. If a user uploaded a photo of the Koutoubia, we couldn't risk the AI describing the Eiffel Tower. We solved this by using the Vision model's output as a hard constraint—grounding the LLM's context before it is allowed to generate a response.
- Data Scarcity: Finding high-quality, labeled images for specific niche heritage sites was difficult. We had to manually curate and label our own dataset to ensure the Azure model achieved high confidence.
- Integration Latency: Chaining a Vision API, a backend, and an LLM can create lag. We had to optimize our FastAPI architecture to handle requests asynchronously to keep the UI snappy.
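One way to turn the vision label into a hard constraint is to bake it into the LLM's system prompt. The sketch below is illustrative only — the prompt wording, confidence threshold, and function name are assumptions, not the project's actual code — but it shows the grounding idea: the model is told exactly which monument it is discussing, and low-confidence identifications are refused rather than guessed.

```python
# Hypothetical sketch of prompt grounding. The threshold value and
# prompt text are illustrative assumptions, not the production values.

def build_system_prompt(landmark: str, confidence: float,
                        threshold: float = 0.7) -> str:
    if confidence < threshold:
        # Below the threshold, refuse to guess rather than risk describing
        # the wrong monument (the hallucination scenario above).
        return ("You are a guide to Moroccan heritage. The monument could not "
                "be identified with confidence; say so and ask the user for "
                "a clearer photo instead of guessing.")
    # High confidence: constrain the conversation to this landmark only.
    return (f"You are a historian guiding a visitor standing in front of "
            f"{landmark}. Answer only about {landmark}; if asked about "
            f"anything else, politely redirect the conversation.")

prompt = build_system_prompt("Koutoubia Mosque", 0.93)
```

The same idea extends naturally: the vision model's confidence score decides whether the LLM is allowed to speak as an expert at all.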
Accomplishments that we're proud of
- Successful Hybrid Integration: We are proud of successfully merging a deterministic model (Computer Vision) with a probabilistic model (LLM) to create a tool that is both accurate and conversational.
- Promoting Local Heritage: Creating a functional tool that specifically highlights Moroccan culture feels like a meaningful contribution to digital preservation.
- Full Pipeline Deployment: Going from a local Python script to a fully Dockerized, cloud-hosted application accessible via a live URL was a major technical milestone for the team.
What we learned
- AI Orchestration: We learned that the true power of AI often lies in the pipeline connecting different models rather than in a single model itself.
- Prompt Engineering: We gained significant experience in structuring system prompts to ensure the LLM acts as a "historian" rather than a generic chatbot.
- Containerization: We deepened our understanding of Docker and cloud environments (Hugging Face/Netlify) to manage production-level deployments.
What's next for ATLAS VISION
- Voice Interaction: Adding Text-to-Speech (TTS) and Speech-to-Text (STT) so users can talk to the guide via audio, making the experience hands-free while walking.
- Augmented Reality (AR): Implementing an AR view where users can point their camera at a monument and see historical overlays or restoration views of ruins in real time.
- Expansion: Expanding the dataset beyond Morocco to cover heritage sites across North Africa and the Mediterranean.
- Mobile App: Porting the React frontend to React Native for a native mobile experience with offline capabilities.