Inspiration

AidLens was inspired by real challenges faced in social impact environments where critical information is often shared through long texts, scanned documents, PDFs, or spoken instructions. NGOs, volunteers, and students frequently struggle to interpret complex or unclear information quickly, which can lead to delays, confusion, or incorrect decisions. We wanted to build a tool that could simplify understanding across multiple formats and help people act responsibly and confidently in real-world situations.

What it does

AidLens is a multimodal AI assistant powered by the Gemini API that helps users understand complex information from text, images, PDFs, and voice input. Users can type, upload documents, or speak directly into the system. Based on the selected role (NGO worker, volunteer, or student), the app generates structured outputs including a concise summary, key points, recommended next steps, possible risks, and a clarifying question when needed. The goal is to transform raw information into clear, actionable guidance.
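
To make this concrete, here is a simplified sketch of how a role-aware prompt can be assembled. The role names match the app, but the guidance text, the `ROLE_GUIDANCE` mapping, and the function name are illustrative, not our exact production prompt:

```python
# Illustrative sketch of role-aware prompt assembly; the guidance strings
# and names here are examples, not the exact AidLens prompt.
ROLE_GUIDANCE = {
    "NGO worker": "Emphasize operational next steps and compliance risks.",
    "Volunteer": "Use plain language and focus on immediate, safe actions.",
    "Student": "Explain terms simply and point out what to study further.",
}

def build_prompt(role: str, content: str) -> str:
    """Wrap raw input in role context plus a fixed output structure."""
    return (
        f"You are assisting a {role}. {ROLE_GUIDANCE[role]}\n"
        "Respond using exactly these headings: Summary, Key Points, "
        "Next Steps, Possible Risks, and a Clarifying Question "
        "(only if something is ambiguous).\n\n"
        f"Input to analyze:\n{content}"
    )
```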

How we built it

AidLens was built using Python and Streamlit for rapid development and deployment. The Gemini API is used as the core reasoning engine, enabling multimodal processing across different input types. PDF content is extracted and combined with text inputs, images are processed for contextual understanding, and voice input is transcribed before analysis. Carefully designed prompts guide Gemini to produce structured, role-specific, and safety-aware outputs.
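
A condensed sketch of this pipeline is shown below. It is simplified from the actual app; the model name and the helper functions are illustrative assumptions, and voice input would be transcribed to text upstream before joining the same request:

```python
# Minimal sketch of the multimodal pipeline (simplified illustration, not
# the exact AidLens source). Assumes the google-generativeai client,
# pdfplumber, and Pillow; "gemini-1.5-flash" and the helper names are
# our assumptions.
import io

import google.generativeai as genai
import pdfplumber
from PIL import Image

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

def extract_pdf_text(pdf_bytes: bytes) -> str:
    """Pull plain text out of an uploaded PDF, page by page."""
    with pdfplumber.open(io.BytesIO(pdf_bytes)) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

def analyze(prompt: str, pdf_bytes: bytes | None = None,
            image_bytes: bytes | None = None) -> str:
    """Combine the prompt, optional PDF text, and an optional image
    into a single multimodal Gemini request."""
    parts: list = [prompt]
    if pdf_bytes:
        parts.append("Document contents:\n" + extract_pdf_text(pdf_bytes))
    if image_bytes:
        parts.append(Image.open(io.BytesIO(image_bytes)))
    return model.generate_content(parts).text
```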

Challenges we ran into

One major challenge was ensuring consistent and structured outputs regardless of input format. Designing prompts that worked reliably across text, documents, and voice transcripts required multiple iterations and testing. Another challenge was maintaining clarity and safety while handling sensitive topics such as health or legal information. We also had to optimize performance to keep responses fast while processing multimodal data.
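
One pattern that helped with consistency is validating the response structure and retrying with a stricter reminder when required headings are missing. The snippet below illustrates the idea rather than reproducing our exact code:

```python
# Illustration of structural validation with retry; the heading list and
# retry loop are a sketch of the approach, not the exact AidLens code.
REQUIRED_HEADINGS = ("Summary", "Key Points", "Next Steps", "Possible Risks")

def is_well_formed(text: str) -> bool:
    """Check that every required section heading appears in the reply."""
    return all(heading in text for heading in REQUIRED_HEADINGS)

def generate_structured(model, prompt: str, max_attempts: int = 3) -> str:
    """Retry with an added reminder until the output passes the check."""
    reply = ""
    for _ in range(max_attempts):
        reply = model.generate_content(prompt).text
        if is_well_formed(reply):
            break
        prompt += "\n\nReminder: include every required heading exactly once."
    return reply
```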

Accomplishments that we're proud of

We built a fully functional, end-to-end multimodal AI system within a limited timeframe. AidLens demonstrates advanced reasoning, structured output generation, and real-world applicability. Unlike basic chatbots, it integrates multiple input modalities into a single intelligent workflow. We are especially proud that the system provides practical decision support rather than just information.

What we learned

This project taught us how powerful multimodal AI can be when applied thoughtfully. We learned that prompt engineering is crucial for reliability and clarity, and that user-centric design greatly improves accessibility. We also gained deeper insight into how advanced AI models like Gemini can be used responsibly to assist decision-making in real-world scenarios.

What's next for AidLens – AI for Social Good

We plan to expand AidLens with multilingual support, text-to-speech output, and deeper customization for domain-specific workflows such as healthcare triage, education assistance, and disaster response coordination. Future versions will also improve document understanding for large or handwritten PDFs and add intelligent automation features to further enhance real-world impact.

Built With

  • gemini-api
  • github
  • pdfplumber
  • pillow
  • prompt-engineering
  • python
  • streamlit
  • streamlit-audiorec
  • streamlit-cloud