About the Project: VisionTriage AI
🧠 Inspiration
Our project was inspired by a major bottleneck in local governance: manual report triage. City staff spend valuable time interpreting vague photos and text reports (e.g., "Big crack on my street"), which delays resolution and wastes resources. Combining visual evidence (the photo) with descriptive text (the context) is a natural fit for a multimodal AI. Our goal was to build a system that automates this interpretation and delivers a prioritized work order the moment a citizen clicks submit.
🛠️ How We Built It
We built the application under a strict time constraint, using a FastAPI (Python) backend and a React frontend.
AI Triage Engine (Gemini API Core): The core logic lives in a protected FastAPI endpoint. We used the Gemini 2.5 Flash model with a hardcoded JSON Schema to constrain the AI to standardized output, classifying each report into fields such as severity and department. This removes the manual sorting step.
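As a rough illustration of the schema-constrained output described above, here is a minimal sketch using only the standard library. The write-up names only the `severity` and `department` fields; the enum values, the `summary` field, and the `parse_triage` helper are assumptions made for the example, not the project's actual schema. With the google-genai SDK, a schema like this would typically be supplied through the generation config's `response_schema` together with `response_mime_type="application/json"`; the check below is a defensive second pass on the backend.

```python
import json

# Hypothetical schema: only `severity` and `department` come from the write-up.
# The enum values and the `summary` field are illustrative assumptions.
TRIAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "severity": {"type": "string",
                     "enum": ["low", "medium", "high", "critical"]},
        "department": {"type": "string",
                       "enum": ["roads", "sanitation", "utilities", "parks"]},
        "summary": {"type": "string"},
    },
    "required": ["severity", "department", "summary"],
}


def parse_triage(raw_json: str, schema: dict = TRIAGE_SCHEMA) -> dict:
    """Parse the model's JSON reply and reject anything outside the schema."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Every required field must be present.
    for field in schema["required"]:
        if field not in data:
            raise ValueError(f"missing field: {field}")
    # Every known field must be a string and, if constrained, an allowed value.
    for field, spec in schema["properties"].items():
        if field not in data:
            continue
        value = data[field]
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a string")
        allowed = spec.get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{field} has unexpected value: {value!r}")
    return data
```

Validating on the server as well means a malformed or off-schema reply is caught before it reaches the admin dashboard.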
Multimodal Handling: The React frontend uses FormData to send the image file and the report text to the backend as a single multipart request. The FastAPI server reads the uploaded file and constructs the final multimodal prompt for Gemini.
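The prompt-construction step can be sketched as follows. This is not the project's code: the function name `build_triage_request`, the allowed MIME types, the size limit, and the prompt wording are all assumptions. The returned dictionary mirrors the general shape of a Gemini REST request body (a text part plus a base64-encoded `inline_data` part); an SDK-based backend would build equivalent parts through the SDK instead.

```python
import base64

# Assumed upload limits; the write-up does not state any.
ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/webp"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def build_triage_request(image_bytes: bytes, mime_type: str,
                         description: str) -> dict:
    """Combine the uploaded photo and report text into one multimodal request."""
    if mime_type not in ALLOWED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    if not image_bytes or len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image is empty or too large")
    text = description.strip()
    if not text:
        raise ValueError("report text is required")

    # Hypothetical instruction text; the real prompt is not given in the write-up.
    prompt = (
        "Classify this citizen report by severity and department.\n\n"
        "Report text: " + text
    )
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # Binary image data travels as base64 text inside JSON.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }
```

Rejecting unexpected file types and empty text before calling the model avoids spending an API call on a request that cannot be triaged.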
Role-Based Access Control (RBAC): We implemented token-based authentication in FastAPI to strictly control access:
Users can Report Issues (/report).
Admins can View the Dashboard (/admin) and use the Chatbot.
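The access rules above can be sketched as a small token-to-role lookup. This is a standard-library illustration under stated assumptions, not the project's implementation: the in-memory `_sessions` store, the `issue_token` and `authorize` helpers, and the `/admin/chat` path for the chatbot are hypothetical. In FastAPI, a check like this would usually sit in a dependency that raises an HTTP error instead of returning a status code.

```python
import secrets
from typing import Optional

# Route-to-role map. `/report` and `/admin` come from the write-up;
# `/admin/chat` is a hypothetical path for the chatbot.
ROUTE_ROLES = {
    "/report": {"user"},
    "/admin": {"admin"},
    "/admin/chat": {"admin"},
}

# Hypothetical in-memory session store: token -> role.
_sessions: dict = {}


def issue_token(role: str) -> str:
    """Create an unguessable token for a logged-in user with the given role."""
    if role not in {"user", "admin"}:
        raise ValueError(f"unknown role: {role}")
    token = secrets.token_urlsafe(32)
    _sessions[token] = role
    return token


def authorize(token: Optional[str], path: str) -> int:
    """Return an HTTP-style status code for the request."""
    role = _sessions.get(token)
    if role is None:
        return 401  # missing or unknown token
    allowed = ROUTE_ROLES.get(path)
    if allowed is None or role not in allowed:
        return 403  # authenticated, but not permitted on this route
    return 200
```

For example, a token issued with `issue_token("user")` is accepted on `/report` but receives 403 on `/admin`.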
Analytical Chatbot: On the Admin Dashboard, the AI Chatbot runs a secondary Gemini API query, analyzing the list of previous reports as context to answer administrative questions (e.g., "Which department has the most critical issues?").
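One way the chatbot's context could be assembled is sketched below. The helper names, the prompt wording, and the report fields are assumptions for illustration. `build_admin_prompt` serializes the stored reports into the text that would be sent to Gemini alongside the admin's question, and `critical_counts` computes the same kind of answer locally, which can serve as a sanity check on the model's reply.

```python
import json
from collections import Counter


def build_admin_prompt(reports: list, question: str) -> str:
    """Embed previous reports as JSON context ahead of the admin's question."""
    context = json.dumps(reports, indent=2)
    return (
        "You are an assistant for city administrators. "
        "Answer using only the reports below.\n\n"
        f"Reports:\n{context}\n\n"
        f"Question: {question.strip()}"
    )


def critical_counts(reports: list) -> Counter:
    """Count critical reports per department (local cross-check)."""
    return Counter(
        r["department"] for r in reports if r["severity"] == "critical"
    )


# Hypothetical sample data in the shape the triage step might produce.
sample_reports = [
    {"severity": "critical", "department": "roads"},
    {"severity": "low", "department": "parks"},
    {"severity": "critical", "department": "roads"},
    {"severity": "critical", "department": "utilities"},
]
```

Because the whole report list is sent as context, a long history would eventually need trimming or summarizing to stay within the model's input limit.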
Built With
- AI / Machine Learning: Gemini API (gemini-2.5-flash), Google GenAI SDK (Python)
- Backend & API: FastAPI (Python), Uvicorn (ASGI server)
- Frontend & UI: React, react-router-dom, axios
- Styling: custom CSS
- Authentication: custom (token-based session management, RBAC)