THEME: GENERATIVE AI

Inspiration

A digital helpdesk is crucial for any website, but my teammate and I found that government websites are not very user-friendly when it comes to searching for specific information. That gave us the idea of building a chatbot interface that instantly retrieves information from government-released documents, websites, and similar sources, saving the time otherwise spent on manual browsing. Lodging a complaint or grievance is also not straightforward due to poor administration.

What it does

Poocho! is a chatbot powered by open LLMs. It stores the content of government websites in a database and instantly retrieves the information a user asks for. An additional feature is a grievance redressal mechanism: users can ask about, and receive, the course of action needed to lodge a complaint with a particular organization.

How we built it

We used Embedchain's OpenSourceApp, which relies on open-source embedding and LLM models: all-MiniLM-L6-v2 from the Sentence Transformers library as the embedding model and gpt4all as the LLM. Embedchain abstracts the entire process of loading a dataset, chunking it, creating embeddings, and storing them in a vector database. Supported data types: YouTube video, PDF file, web page, sitemap, Doc file, code documentation website loader, and Notion.
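To make the steps that Embedchain abstracts more concrete, here is a minimal, standard-library-only sketch of chunking, embedding, storing, and querying. It is not Embedchain's API and not how we call it in the project: `chunk_text`, `embed`, and `TinyVectorStore` are hypothetical names, and the word-count "embedding" is only a toy stand-in for all-MiniLM-L6-v2.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping character windows (illustrative sizes)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database (assumption, not ChromaDB)."""

    def __init__(self):
        self._rows = []  # list of (vector, chunk text, source label)

    def add(self, text, source):
        self._rows.append((embed(text), text, source))

    def query(self, question, k=2):
        """Return the k stored chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self._rows, key=lambda r: cosine(q, r[0]), reverse=True)
        return [(text, source) for _, text, source in ranked[:k]]
```

In the real pipeline, a sentence-embedding model replaces `embed` and a persistent vector store replaces the in-memory list; the overall flow of chunk, embed, store, and rank by similarity stays the same.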

Process Flow:
1) Use the website's sitemap.xml to collect all of its URLs.
2) Download the HTML of each URL and extract only the text.
3) Split each page's content into a number of documents using Langchain.
4) Embed each document using the Huggingface Transformers API.
5) Create a vector store of these embeddings using ChromaDB.
6) When a question is asked, query for the most relevant documents and send them as context to the open-source LLM to generate an answer.
7) Return the answer together with the documents that were used as its sources.
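The first two steps and the final prompt assembly can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not our exact code: `urls_from_sitemap`, `html_to_text`, and `build_prompt` are hypothetical helper names, the prompt wording is made up, and downloading, embedding, and the LLM call are left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemaps.org protocol for <urlset>/<url>/<loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap.xml document."""
    root = ET.fromstring(xml_text.strip())
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def html_to_text(html):
    """Strip tags from an HTML page and return its text content."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_prompt(question, retrieved):
    """Assemble an LLM prompt from (text, source_url) pairs.

    Returns the prompt and the list of source URLs to show with the answer.
    """
    context = "\n\n".join(f"[{i}] {text}" for i, (text, _) in enumerate(retrieved, 1))
    sources = [url for _, url in retrieved]
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, sources
```

Returning the source URLs alongside the prompt keeps the last step of the flow simple: whatever the LLM answers, the interface can list the pages the context came from.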

Challenges we ran into

1) Training the LLM models was difficult and time-consuming.
2) Sitemaps of websites were inaccessible, and web scraping was complex.

Accomplishments that we're proud of

The process flow outlined above combines language models, content extraction, and document embeddings into a comprehensive and efficient Q&A platform. With it, we successfully built a URL-based chatbot: we enter a government website's URL, and the model is trained on that site for question answering. We named our first instance MeitY bot, after the Ministry of Electronics and IT, because it was trained on the sitemap of the meity.gov.in website.

What we learned

We learned about various LLMs and how to build an open-source chatbot for websites.

What's next for Poochho! - All in One Govbot

An efficient and effective procedure for addressing grievances demonstrates an administration's accountability, responsiveness, and user-friendliness. We therefore aim to extend this web app so that it can understand and process complaints effectively, assign them to the relevant department, and provide citizens with a unique complaint number. Citizens should also receive real-time updates on the status of their complaint, enabling one-on-one conversations throughout the grievance lifecycle.

Built With

  • chromadb
  • embedchain
  • langchain
  • llama2
  • openai
  • snowflake
  • streamlit