THEME: GENERATIVE AI
Inspiration
A digital helpdesk is crucial for any website, but my teammate and I found that government websites are not very user-friendly when you are searching for a particular piece of information. This gave us the idea of building a chatbot interface that instantly retrieves information from government-released documents, websites, etc., cutting down the time otherwise spent browsing manually. Lodging a complaint or grievance is also not straightforward due to poor administration.
What it does
Poocho! is a vibrant chatbot powered by open LLMs. It stores the content of government websites in a database and instantly retrieves the information a user asks for. An additional feature is a grievance redressal mechanism: users can ask how to lodge a complaint with a particular organization and receive the necessary course of action.
How we built it
We used Embedchain's OpenSourceApp, which relies on an open-source embedding model and LLM: all-MiniLM-L6-v2 from the Sentence Transformers library for embeddings and GPT4All as the LLM. Embedchain abstracts the entire process of loading a dataset, chunking it, creating embeddings, and storing them in a vector database. Supported data types: YouTube video, PDF file, web page, sitemap, Doc file, code documentation website, and Notion.
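To make that abstraction concrete, here is a minimal, standard-library-only sketch of what a load, chunk, embed, store, query pipeline does internally. It is not Embedchain's code or API: the word-count `embed` function is a toy stand-in for all-MiniLM-L6-v2, `InMemoryVectorStore` is a hypothetical stand-in for a vector database such as ChromaDB, and the chunk size and overlap defaults are arbitrary assumptions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term counts of lowercase word tokens.
    A real pipeline would call a sentence-embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    assert 0 <= overlap < size, "overlap must be smaller than the window size"
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop before a final window that would contain only already-covered words.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: a list of (embedding, chunk, source) rows."""

    def __init__(self):
        self.rows = []

    def add(self, text: str, source: str) -> int:
        """Chunk a document, embed each chunk, and store it with its source URL."""
        pieces = chunk(text)
        for piece in pieces:
            self.rows.append((embed(piece), piece, source))
        return len(pieces)

    def query(self, question: str, k: int = 3):
        """Return the k (chunk, source) pairs most similar to the question."""
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(piece, source) for _, piece, source in ranked[:k]]


store = InMemoryVectorStore()
store.add("Citizens can apply for a birth certificate online through the portal.", "https://example.org/apply")
store.add("The ministry publishes annual reports and press releases.", "https://example.org/reports")
print(store.query("how do I apply for a certificate", k=1))
```

Swapping the toy `embed` for a real sentence-embedding model and the list for a vector database changes the quality and scale of retrieval, not the shape of the pipeline.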
Process Flow:
1) Use the website's sitemap.xml to collect all of its URLs.
2) Download the HTML of each URL and extract only the text.
3) Split each page's content into a number of documents (chunks) using LangChain.
4) Embed each document using the Hugging Face Transformers API.
5) Create a vector store of these embeddings using ChromaDB.
6) When a question is asked, query the store for the most relevant documents and send them as context to the open-source LLM to generate an answer.
7) Return the answer together with the source documents it was based on.
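The sitemap, text-extraction, and answer-with-sources steps of this flow can be sketched with the Python standard library alone. This is an illustrative sketch, not the project's code: the download itself (for example with `urllib.request.urlopen`) is left out so the functions work on strings, `TextExtractor` is a simple stand-in for whatever HTML-to-text tool was actually used, and the prompt wording in `build_prompt` is an assumption.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Collect every <loc> URL from a sitemap.xml document.
    For a sitemap index, these are child sitemap URLs that must be fetched in turn."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


class TextExtractor(HTMLParser):
    """Keep visible text, skipping the contents of <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of open script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Pack retrieved (chunk, source_url) pairs into an LLM prompt, numbered so the
    answer can cite them and the sources can be shown alongside it."""
    context = "\n".join(f"[{i}] {text} (source: {url})" for i, (text, url) in enumerate(hits, 1))
    return (
        "Answer the question using only the context below and cite sources by number.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```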
Challenges we ran into
1) Training the LLMs was difficult and time-consuming.
2) Website sitemaps were inaccessible, and web scraping was complex.
Accomplishments that we're proud of
The process flow outlined above combines language models, content extraction, and document embeddings into a comprehensive and efficient Q&A platform. We successfully built a URL-based chatbot: we enter a government website, and its content is embedded and indexed for subsequent question answering. Our demo, MeitY bot, is named after the Ministry of Electronics and Information Technology (MeitY) and was built from the sitemap of the meity.gov.in website.
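A URL-based bot first has to locate the sitemap for whatever site the user enters and then keep the crawl on that site. A minimal sketch, assuming the conventional root location `/sitemap.xml`; because real sites may put it elsewhere, a helper also reads `Sitemap:` lines from `robots.txt`, which the sitemaps protocol allows. The helper names are hypothetical; `RobotFileParser.site_maps()` is the standard-library call (Python 3.8+).

```python
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


def sitemap_url(site_url: str) -> str:
    """Guess the sitemap location (assumption: the conventional /sitemap.xml at the site root)."""
    return urljoin(site_url, "/sitemap.xml")


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return any sitemap URLs declared with 'Sitemap:' lines in a robots.txt body."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.site_maps() or []  # site_maps() returns None when none are declared


def same_site(urls: list[str], site_url: str) -> list[str]:
    """Keep only URLs on the same host as the entered site (www and non-www count as different)."""
    host = urlparse(site_url).netloc
    return [u for u in urls if urlparse(u).netloc == host]


print(sitemap_url("https://meity.gov.in/content/some-page"))
```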
What we learned
We learned about various LLMs and how to build an open-source chatbot for websites.
What's next for Poochho! - All in One Govbot
An efficient and effective grievance-handling procedure demonstrates an administration's accountability, responsiveness, and user-friendliness. We therefore aim to extend this web app so that it can understand and process complaints, assign them to the relevant department, and provide citizens with a unique complaint number. Citizens should receive real-time updates on the status of their complaint and be able to hold one-on-one conversations throughout the grievance lifecycle.
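The planned complaint workflow (routing to a department, a unique complaint number, status updates) could start from a data structure like the one below. Everything in it is hypothetical: the keyword table, department names, and `GRV-` number format are placeholders, and simple keyword matching stands in for the LLM-based understanding of complaints described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

# Hypothetical routing table; a real system might classify complaints with the LLM instead.
ROUTES = {
    "electricity": "Power Department",
    "water": "Water Supply Department",
    "passport": "Passport Office",
}
DEFAULT_DEPARTMENT = "General Grievance Cell"


@dataclass
class Complaint:
    number: str
    text: str
    department: str
    history: list = field(default_factory=list)  # (UTC timestamp, status) pairs


class GrievanceDesk:
    def __init__(self, prefix: str = "GRV"):
        self._ids = count(1)  # sequential counter behind the unique complaint numbers
        self.prefix = prefix
        self.complaints: dict[str, Complaint] = {}

    def lodge(self, text: str) -> Complaint:
        """Route a complaint by keyword, assign a unique number, and mark it registered."""
        lowered = text.lower()
        dept = next((d for kw, d in ROUTES.items() if kw in lowered), DEFAULT_DEPARTMENT)
        number = f"{self.prefix}-{next(self._ids):06d}"
        complaint = Complaint(number, text, dept)
        self.complaints[number] = complaint
        self.update(number, "Registered")
        return complaint

    def update(self, number: str, status: str):
        """Append a status change; a real app would also push it to the citizen."""
        entry = (datetime.now(timezone.utc).isoformat(), status)
        self.complaints[number].history.append(entry)
        return entry

    def status(self, number: str) -> str:
        return self.complaints[number].history[-1][1]


desk = GrievanceDesk()
c = desk.lodge("No water supply in our area for three days")
print(c.number, c.department, desk.status(c.number))
```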
Built With
- chromadb
- embedchain
- langchain
- llama2
- openai
- snowflake
- streamlit