AidLens AI for Social Good

Inspiration

AidLens was inspired by the challenges faced by NGOs, volunteers, and students who often deal with critical information shared through text messages, scanned documents, PDFs, images, or even voice communication. In high-pressure social impact scenarios, unclear or complex information can cause confusion, delays, or unsafe decisions. I wanted to build a single, accessible tool that helps users understand information in any form and act responsibly with confidence.

What it does

AidLens is a Gemini-powered multimodal assistant that helps users understand complex information from text, images, PDFs, and voice input. Users can type, upload documents or images, or speak directly into the app using a microphone. Based on the selected role (NGO worker, volunteer, or student), AidLens generates a structured response with a clear summary, key points, recommended next steps, potential risks, and one clarifying question when needed. This enables faster and safer decision-making in real-world social good scenarios.
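
The role-based, structured response described above could be sketched roughly as follows. This is a minimal illustration, not the actual AidLens prompt: the role guidance strings, the heading names, and the helpers `build_prompt` and `parse_sections` are all hypothetical, and the call to Gemini itself is left out.

```python
# Hypothetical role guidance; the real AidLens prompts are not published.
ROLE_GUIDANCE = {
    "NGO worker": "Focus on operational decisions and beneficiary safety.",
    "Volunteer": "Use plain language and concrete, low-risk actions.",
    "Student": "Explain terms and background briefly.",
}

# Assumed section headings, mirroring the response parts listed in the text.
SECTIONS = ["Summary", "Key Points", "Next Steps", "Risks", "Clarifying Question"]


def build_prompt(role: str, content: str) -> str:
    """Build a prompt that asks the model for a fixed set of headings."""
    if role not in ROLE_GUIDANCE:
        raise ValueError(f"Unknown role: {role!r}")
    headings = "\n".join(f"{name}:" for name in SECTIONS)
    return (
        f"You are assisting a {role}. {ROLE_GUIDANCE[role]}\n"
        "Respond using exactly these headings, in this order:\n"
        f"{headings}\n"
        "Ask one clarifying question only if the input is ambiguous; "
        "otherwise write 'None'.\n\n"
        f"Input:\n{content.strip()}"
    )


def parse_sections(reply: str) -> dict[str, str]:
    """Split a model reply into sections keyed by the assumed headings."""
    parsed = {name: "" for name in SECTIONS}
    current = None
    for line in reply.splitlines():
        stripped = line.strip()
        matched = next(
            (n for n in SECTIONS if stripped.lower().startswith(n.lower() + ":")),
            None,
        )
        if matched:
            # A heading line starts a new section; keep any text after the colon.
            current = matched
            parsed[current] = stripped[len(matched) + 1:].strip()
        elif current and stripped:
            # Continuation lines (for example bullet points) join the open section.
            parsed[current] = (parsed[current] + "\n" + stripped).strip()
    return parsed
```

Parsing the reply back into named sections makes it easy to render each part separately in the interface and to notice when the model skipped a heading.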

How we built it

AidLens is built on the Gemini API to leverage its advanced reasoning and multimodal capabilities. The application is developed with Streamlit for rapid prototyping and public deployment. Gemini processes text, images, PDF-extracted content, and voice transcriptions within a unified workflow. Carefully designed prompts keep outputs role-based, structured, and responsible, and safety considerations are included for sensitive topics such as health or legal guidance.
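
One way the unified workflow for text-like inputs might look is sketched below, assuming PDFs are reduced to text with pdfplumber (listed under Built With) and that typed text, PDF extracts, and voice transcriptions are merged into one labelled block before prompting. The function names, labels, and the `max_pages` and `max_chars` limits are illustrative assumptions, not the project's actual code. Images are not covered here, since they would be passed to the model separately.

```python
from typing import Optional


def extract_pdf_text(path: str, max_pages: int = 10) -> str:
    """Extract text from the first pages of a PDF using pdfplumber."""
    import pdfplumber  # imported lazily so the rest of the sketch runs without it

    with pdfplumber.open(path) as pdf:
        pages = pdf.pages[:max_pages]
        # extract_text() can return nothing for pages without a text layer.
        return "\n".join((page.extract_text() or "") for page in pages).strip()


def fuse_inputs(
    typed: Optional[str] = None,
    pdf_text: Optional[str] = None,
    transcript: Optional[str] = None,
    max_chars: int = 4000,
) -> str:
    """Merge whichever text inputs are present into one labelled block."""
    labelled = [
        ("Typed text", typed),
        ("PDF extract", pdf_text),
        ("Voice transcript", transcript),
    ]
    blocks = []
    for label, value in labelled:
        text = (value or "").strip()
        if text:
            # Truncate each source so one long document cannot crowd out the rest.
            blocks.append(f"[{label}]\n{text[:max_chars]}")
    if not blocks:
        raise ValueError("At least one non-empty input is required.")
    return "\n\n".join(blocks)
```

Labelling each source lets a single prompt template serve every input type while still telling the model where each piece of content came from.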

Challenges we ran into

A key challenge was handling multiple input formats while maintaining consistent and clear outputs. Designing prompts that worked equally well for typed text, document extracts, and voice transcriptions required several iterations. Another challenge was ensuring that responses remained safe, understandable, and actionable without providing misleading advice, especially for sensitive social impact contexts.
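
As a rough illustration of the safety concern described above, a simple keyword check could flag health or legal content and attach a cautionary note to the response. This is only a sketch under stated assumptions: the term lists, the caution wording, and the helpers `detect_topics` and `add_cautions` are hypothetical, and the project may well handle this purely through prompt instructions instead.

```python
import re

# Hypothetical keyword lists; a real deployment would need broader coverage.
SENSITIVE_TERMS = {
    "health": ["medication", "dosage", "symptom", "diagnosis", "injury"],
    "legal": ["lawsuit", "visa", "asylum", "contract", "custody"],
}

# Assumed caution wording, not taken from the AidLens application.
CAUTIONS = {
    "health": "General information only, not medical advice. Consult a qualified health professional.",
    "legal": "General information only, not legal advice. Consult a qualified legal adviser.",
}


def detect_topics(text: str) -> list[str]:
    """Return the sensitive topics whose keywords appear in the text."""
    lowered = text.lower()
    found = []
    for topic, terms in SENSITIVE_TERMS.items():
        # Match whole words, allowing a simple plural "s".
        if any(re.search(rf"\b{re.escape(term)}s?\b", lowered) for term in terms):
            found.append(topic)
    return found


def add_cautions(reply: str, source_text: str) -> str:
    """Append a caution block to the reply when the source looks sensitive."""
    notes = [CAUTIONS[topic] for topic in detect_topics(source_text)]
    if not notes:
        return reply
    bullet_list = "\n".join(f"- {note}" for note in notes)
    return reply.rstrip() + "\n\nCaution:\n" + bullet_list
```

Keyword matching is deliberately conservative and easy to audit, but it will miss paraphrased or multilingual content, so it works best as a backstop alongside prompt-level safety instructions.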

Accomplishments that we're proud of

I am proud of building a fully functional, end-to-end multimodal application within a limited timeframe. AidLens successfully integrates text, image, PDF, and voice inputs into a single Gemini-powered workflow. The project demonstrates how advanced AI models can be applied beyond basic chat interfaces to create practical, real-world tools for social good.

What we learned

This project taught me the importance of prompt design, input handling, and responsible AI usage when working with advanced multimodal models like Gemini. I learned how voice and document understanding can significantly improve accessibility and how AI can be designed to support clarity and ethical decision-making in social impact applications.

What's next for AidLens AI for Social Good

Next, I plan to expand AidLens with broader multilingual support, text-to-speech output, and deeper customization for specific NGO workflows such as healthcare triage, education assistance, and disaster response. I also aim to improve document understanding for large and handwritten PDFs to further enhance accessibility and impact.

Built With

gemini3api
github
googleaistudio
pdfplumber
pillow
python
streamlit
streamlit-audiorec

Submitted to

Gemini 3 Hackathon

Created by

I independently designed and developed AidLens as an end-to-end multimodal AI application. My contribution includes ideation, system design, prompt engineering, and full-stack implementation using the Gemini API and Streamlit. I implemented support for text, image, PDF, and voice inputs (including microphone-based input), handled data preprocessing and input fusion, and ensured structured, role-based, and safety-aware outputs. I also deployed the application publicly, created the demo video and documentation, and managed the complete submission workflow.

Tanya Garg

Updates

Tanya Garg started this project — Feb 09, 2026 11:54 AM EST
