Inspiration
AidLens was inspired by the challenges faced by NGOs, volunteers, and students who often deal with critical information shared through text messages, scanned documents, PDFs, images, or even voice communication. In high-pressure social impact scenarios, unclear or complex information can cause confusion, delays, or unsafe decisions. I wanted to build a single, accessible tool that helps users understand information in any form and act responsibly with confidence.
What it does
AidLens is a Gemini-powered multimodal assistant that helps users understand complex information from text, images, PDFs, and voice input. Users can type, upload documents or images, or speak directly into the app using a microphone. Based on the selected role (NGO worker, volunteer, or student), AidLens generates a structured response with a clear summary, key points, recommended next steps, potential risks, and one clarifying question when needed. This enables faster and safer decision-making in real-world social good scenarios.
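The structured response described above could be represented as a small data type. The following is a minimal sketch, not the project's actual code: the class name AidLensResponse, the field names, and the assumption that the model returns JSON are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AidLensResponse:
    """Hypothetical container for the five parts of a structured answer."""
    summary: str
    key_points: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)
    # Only present when the model decides a follow-up question is needed.
    clarifying_question: Optional[str] = None


def parse_response(raw_json: str) -> AidLensResponse:
    """Parse a JSON string (assumed model output) into an AidLensResponse.

    Missing list fields default to empty lists, and an empty or missing
    clarifying question becomes None.
    """
    data = json.loads(raw_json)
    return AidLensResponse(
        summary=str(data.get("summary", "")).strip(),
        key_points=[str(p) for p in data.get("key_points", [])],
        next_steps=[str(s) for s in data.get("next_steps", [])],
        risks=[str(r) for r in data.get("risks", [])],
        clarifying_question=data.get("clarifying_question") or None,
    )
```

Keeping the clarifying question optional mirrors the "when needed" behavior: the interface can show that section only when the field is not None.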
How we built it
AidLens is built on the Gemini API, which provides the reasoning and multimodal capabilities the app relies on. The application is developed with Streamlit for rapid prototyping and public deployment. Gemini processes text, images, PDF-extracted content, and voice transcriptions within a unified workflow. Carefully designed prompts keep outputs role-based, structured, and responsible, and safety considerations are included for sensitive topics such as health or legal guidance.
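To illustrate the role-based prompting idea, here is a minimal sketch of how a prompt might be assembled before it is sent to the model. Everything in it is an assumption made for illustration: the role keys, the wording of the instructions, the keyword list used to detect sensitive topics, and the function name build_prompt are hypothetical, and the actual Gemini API call is deliberately left out. Images would be passed to the model separately and are not covered here.

```python
from typing import Dict

# Hypothetical per-role guidance; the real prompts are not shown in the write-up.
ROLE_GUIDANCE: Dict[str, str] = {
    "ngo_worker": "The reader is an NGO worker who needs operational detail.",
    "volunteer": "The reader is a volunteer who needs simple, concrete instructions.",
    "student": "The reader is a student who needs plain-language explanations.",
}

# Textual input sources that end up as plain text before prompting.
SOURCE_LABELS: Dict[str, str] = {
    "text": "typed text",
    "pdf": "text extracted from a PDF",
    "voice": "a voice transcription",
}

# Crude, assumed keyword check for health or legal content.
SENSITIVE_KEYWORDS = ("medication", "diagnosis", "symptom", "legal", "court", "lawsuit")

SAFETY_NOTE = (
    "This input touches on health or legal matters. Do not give definitive "
    "advice; recommend consulting a qualified professional."
)


def build_prompt(role: str, content: str, source: str) -> str:
    """Combine role guidance, source type, output format, and input text."""
    if role not in ROLE_GUIDANCE:
        raise ValueError(f"Unknown role: {role}")
    if source not in SOURCE_LABELS:
        raise ValueError(f"Unknown source: {source}")

    parts = [
        ROLE_GUIDANCE[role],
        f"The input below is {SOURCE_LABELS[source]}.",
        "Respond with a summary, key points, recommended next steps, "
        "potential risks, and at most one clarifying question.",
    ]
    # Add the safety instruction only when a sensitive keyword appears.
    if any(keyword in content.lower() for keyword in SENSITIVE_KEYWORDS):
        parts.append(SAFETY_NOTE)
    parts.append("Input:\n" + content.strip())
    return "\n\n".join(parts)
```

Because every input type is reduced to text plus a source label, the same prompt template serves typed text, document extracts, and voice transcriptions, which is one way to keep the output format consistent across them.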
Challenges we ran into
A key challenge was handling multiple input formats while keeping outputs consistent and clear. Designing prompts that worked equally well for typed text, document extracts, and voice transcriptions required several iterations. Another challenge was ensuring that responses remained safe, understandable, and actionable without giving misleading advice, especially in sensitive social impact contexts.
Accomplishments that we're proud of
I am proud of building a fully functional, end-to-end multimodal application within a limited timeframe. AidLens successfully integrates text, image, PDF, and voice inputs into a single Gemini-powered workflow. The project demonstrates how advanced AI models can be applied beyond basic chat interfaces to create practical, real-world tools for social good.
What we learned
This project taught me the importance of prompt design, input handling, and responsible AI usage when working with advanced multimodal models like Gemini. I learned how voice and document understanding can significantly improve accessibility and how AI can be designed to support clarity and ethical decision-making in social impact applications.
What's next for AidLens AI for Social Good
Next, I plan to expand AidLens with broader multilingual support, text-to-speech output, and deeper customization for specific NGO workflows such as healthcare triage, education assistance, and disaster response. I also aim to improve document understanding for large and handwritten PDFs to further enhance accessibility and impact.