Inspiration
With the proliferation of digital content, there is a growing risk of encountering misinformation, biased narratives, and potentially dangerous propaganda. To tackle this, we developed a question-answering pipeline grounded in a curated document corpus, so that the information it disseminates is accurate and trustworthy.
What it does
SecureGPT is a chatbot that combines semantic search with generative AI to answer queries. Built on a corpus of 1,500 documents of roughly 300 words each, it links contextual data across documents, enabling verification of facts and claims.
How we built it
We used HuggingFace Transformers to power our question-answering pipeline, with a front-end interface built in Streamlit. Named Entity Recognition (NER) is handled by spaCy, supplemented by a DistilBERT model fine-tuned on our corpus.
How the application fits the problem statement
The training corpus consists of unstructured news articles on terrorism and related keywords, web-scraped from the past year. The first-stage model is a feature extractor built on spaCy's tokenizer and NER, which classifies entities and lets us build a knowledge graph. The second stage handles question answering: a DistilBERT model fine-tuned on the knowledge graph produced by the first stage.
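The first stage described above can be sketched as follows. This is a minimal illustration, not our production code: a naive capitalized-word heuristic stands in for spaCy's NER model, and the "knowledge graph" is just a dictionary linking each entity to its source documents and co-mentioned entities.

```python
import re
from collections import defaultdict

def extract_entities(text):
    """Toy stand-in for the first-stage NER model (spaCy in the real
    pipeline): treat runs of capitalized words as candidate entities."""
    return re.findall(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b", text)

def build_knowledge_graph(documents):
    """Link each entity to the documents it appears in and to entities
    co-occurring in the same document -- a minimal node -> neighbours map."""
    graph = defaultdict(set)
    for doc_id, text in documents.items():
        entities = set(extract_entities(text))
        for ent in entities:
            graph[ent].add(doc_id)               # entity -> source document
            graph[ent].update(entities - {ent})  # entity -> co-mentions
    return graph

# Hypothetical mini-corpus for illustration
docs = {
    "doc1": "Alice Smith met Bob Jones in London.",
    "doc2": "Bob Jones later travelled to Paris.",
}
kg = build_knowledge_graph(docs)
```

In the real pipeline the co-occurrence edges are what let the QA stage link context across documents: an entity seen in one article can be traced to related mentions elsewhere.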
Integrating Semantic Search and Generative AI

Recognizing the limitations of standalone generative AI models in handling large documents, we incorporated semantic search to improve accuracy and efficiency. Our process involves:

* Document Embeddings: Transforming internal documents into vector embeddings.
* Semantic Search: Using these embeddings, we run semantic similarity searches to identify the document segments most relevant to a query.
This method ensures a robust response mechanism, minimizing AI hallucinations and inaccuracies common in fine-tuned generative models.
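The retrieval step above can be sketched in a few lines. This is a simplified illustration, not our actual implementation: a bag-of-words count vector stands in for the learned embeddings, and cosine similarity ranks the segments.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; the real pipeline would use learned
    dense embeddings from a transformer model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query, segments):
    """Rank document segments by similarity to the query and return the
    best match -- the context handed to the generative QA model."""
    q = embed(query)
    return max(segments, key=lambda s: cosine(q, embed(s)))

# Hypothetical segments for illustration
segments = [
    "The report covers attacks in the region last year.",
    "Funding sources for the group remain unclear.",
    "Local elections were held in the spring.",
]
best = semantic_search("who funds the group", segments)
```

Only the retrieved segment, rather than the full corpus, is passed to the generative model, which is what keeps its answers grounded.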
Challenges we ran into
Integrating the NER model with our knowledge graph proved challenging; incorporating semantic search over embeddings was what ultimately brought performance up significantly.
Accomplishments that we're proud of
Our system can respond to inquiries within the trained contexts, demonstrating the effectiveness of combining multiple AI technologies to improve information reliability.
What we learned
The project deepened our understanding of tools like HuggingFace and the practical application of NER models in real-world scenarios.
What's next for SecureGPT
Future enhancements will focus on expanding our bot's capabilities to actively crawl the web, identifying and flagging potentially dangerous websites. This proactive approach will further our goal of safeguarding information integrity across digital platforms.
Built With
- huggingface
- streamlit