Inspiration
We noticed how frustrating it can be for many immigrants to make sense of lengthy, complex insurance forms. We wanted to create a tool that uses NLP and AI to summarize each section of a form, making it easier for users to understand what is required and how to fill out the document properly. Our goal is to make paperwork less daunting and more approachable.
What it does
DocuDane uses Google Cloud's Vision API to extract text from insurance forms and Vertex AI to generate concise, easy-to-understand summaries for each form box or section. The tool breaks down the content and identifies key fields, explaining each one in plain language. Each block of text is numbered on the form so users can easily match a summary to its corresponding section.
How we built it
We built DocuDane by combining two technologies: the Google Cloud Vision API, which extracts text from image-based forms, and Google Vertex AI's generative models, which summarize and simplify the content. Using OpenCV in Python, we visually highlight form sections and pair each summary with its corresponding box on the form, providing an intuitive user experience.
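As a rough illustration of the numbering and pairing step described above, here is a minimal sketch in plain Python. It is not DocuDane's actual code: the `Block` class, `number_blocks`, `summarize_form`, and the `row_tolerance` parameter are hypothetical names, the bounding boxes are assumed to come from the OCR step, and the `summarize` callable is a stand-in for whatever call is made to the generative model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Block:
    """One OCR text block (hypothetical structure): text plus an (x, y, width, height) box in pixels."""
    text: str
    box: Tuple[int, int, int, int]


def number_blocks(blocks: List[Block], row_tolerance: int = 20) -> List[Tuple[int, Block]]:
    """Assign 1-based numbers in rough reading order: top to bottom, then left to right.

    Top edges are grouped into bands of `row_tolerance` pixels. This is a
    simplification: two blocks just either side of a band boundary land in
    different rows.
    """
    ordered = sorted(blocks, key=lambda b: (b.box[1] // row_tolerance, b.box[0]))
    return list(enumerate(ordered, start=1))


def summarize_form(blocks: List[Block], summarize: Callable[[str], str]) -> Dict[int, str]:
    """Return {block number: summary}. `summarize` stands in for the model call."""
    return {number: summarize(block.text) for number, block in number_blocks(blocks)}


if __name__ == "__main__":
    demo_blocks = [
        Block("Policy number", (300, 42, 120, 30)),
        Block("Full legal name", (10, 40, 200, 30)),
        Block("Date of birth", (10, 100, 200, 30)),
    ]
    # A trivial placeholder summarizer, used only to show the data flow.
    for number, summary in summarize_form(demo_blocks, lambda t: f"Enter your {t.lower()}.").items():
        print(number, summary)
```

The same numbers could then be drawn onto the form image next to each box, so that the label on the image and the key in the summary dictionary always agree.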
Challenges we ran into
One of the major challenges was dealing with sensitive or complex text, such as personal information or ambiguous terms that AI models might misinterpret. Another challenge was ensuring that the summaries were both concise and informative, without losing important details required for filling out the forms accurately. The project is not yet perfect: it still occasionally parses erroneous text from the form images and mistakes it for a valid form field. The biggest challenge, though, was managing the project's scope, since we initially set out to build a cross-platform mobile app that used the device's camera.
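One possible way to reduce the stray OCR text mentioned above is a simple heuristic filter applied before summarization. The sketch below is an assumption on our part, not something the current build does: the function names and the thresholds `min_letters` and `min_clean_ratio` are hypothetical and would need tuning against real forms.

```python
from typing import List


def looks_like_field(text: str, min_letters: int = 3, min_clean_ratio: float = 0.6) -> bool:
    """Heuristic check: keep text that has enough letters and is mostly letters, digits, or spaces."""
    stripped = text.strip()
    if not stripped:
        return False
    letters = sum(ch.isalpha() for ch in stripped)
    clean = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    return letters >= min_letters and clean / len(stripped) >= min_clean_ratio


def filter_blocks(texts: List[str]) -> List[str]:
    """Drop OCR strings that are unlikely to be real form fields."""
    return [t for t in texts if looks_like_field(t)]


if __name__ == "__main__":
    # Ruled lines and specks often come back from OCR as short runs of punctuation.
    print(filter_blocks(["Date of birth", "|| _ .", "x", "Policy no. 12345", "   "]))
```

A filter like this trades recall for precision: very short but legitimate labels such as "ID" would be dropped with the defaults shown, so the thresholds are a judgment call.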
Accomplishments that we're proud of
We successfully integrated advanced AI models to solve a real-world problem in a way that’s practical and easy to use. We’re proud of how we were able to handle complex insurance forms and make the information significantly more digestible for the average user. The tool also provides a seamless user experience by visually guiding users through each form block with clear labeling and summaries.
What we learned
We learned a great deal about how to effectively leverage AI for natural language processing and document parsing. It was also an eye-opener in understanding how to manage sensitive text content and ensure that safety filters are respected while providing valuable, understandable information to users. We also learned the importance of knowing when to lower expectations and scale down to manageable goals and features.
What's next for DocuDane
Next, we plan to expand DocuDane to handle a wider variety of forms, including legal and medical documents, and to add multi-language support. We also aim to improve the AI's ability to summarize more nuanced form sections and to explore ways of integrating user feedback directly into the summaries, making the experience more personalized and accurate. Finally, we would like to port DocuDane to mobile with a cross-platform framework such as Kivy, React Native, or Flutter to reach more users.
Built With
- ai
- gemini
- google-cloud
- google-vertex
- natural-language-processing
- opencv
- python