Inspiration

Security analysts must stay current with vast knowledge bases and keep up with the latest trends and methods. Accessing and analyzing all this information can become tedious and time-consuming, and delays can be costly during an incident.

With this in mind, we, as cybersecurity professionals, built this custom tool for easy access to up-to-date information on the cybersecurity landscape.

What it does

Chatbot

The tool can be used as a chatbot or as report-generation software. The chatbot serves primarily as a QA system for quickly retrieving information.

Report Generation

Its main feature is rapid report generation, especially during incidents. The knowledge graph can be queried for malware or for common mitigation techniques to apply against the observed attacks.

How we built it

Knowledge Graph

We used MITRE as our preliminary data source. It contains information on different entity types, including:

  • Attack patterns
  • Threat groups
  • Malware
  • Mitigations

Using the above entities, we constructed a knowledge graph whose nodes represent the entities and whose edges connect related nodes. We refined the graph so that node types and edge types are encoded in the edges. This ensures that each knowledge triplet carries complete information when sent as context to the LLM.

For example:

  • Stuxnet is a malware, and Code signing is an attack pattern it uses.
  • The nodes here are Stuxnet and Code signing.
  • The edge here is typed as malware uses attack-pattern.
  • So the complete knowledge triplet returned is: Stuxnet malware uses attack-pattern Code signing.

We put substantial analysis into building the knowledge graph, since the power of our tool lies in its data source.
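The typed-triplet encoding above can be sketched with NetworkX. This is a minimal illustration, assuming node types are stored as a `node_type` attribute and edge types as a `relation` attribute (our own naming for this sketch, not a fixed schema):

```python
import networkx as nx

# Build a small typed knowledge graph: node and edge types are stored as
# attributes so every retrieved triplet carries complete type information.
G = nx.MultiDiGraph()
G.add_node("Stuxnet", node_type="malware")
G.add_node("Code signing", node_type="attack-pattern")
G.add_edge("Stuxnet", "Code signing", relation="uses")

def triplets(graph):
    """Render each edge as a fully typed knowledge triplet string."""
    out = []
    for src, dst, data in graph.edges(data=True):
        out.append(
            f"{src} {graph.nodes[src]['node_type']} "
            f"{data['relation']} "
            f"{graph.nodes[dst]['node_type']} {dst}"
        )
    return out
```

Here `triplets(G)` yields the single string `Stuxnet malware uses attack-pattern Code signing`, which is exactly the form sent to the LLM as context.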

We built the graph using NetworkX and ported it to Neo4j for scalability.
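One way to port a typed NetworkX graph into Neo4j is to emit Cypher `MERGE` statements and run them through the official `neo4j` Python driver. The label and relationship naming below is our own illustrative convention:

```python
import networkx as nx

def to_cypher(graph):
    """Emit Cypher MERGE statements that recreate the NetworkX graph in Neo4j.
    Labels come from node_type attributes; relationship names from relation."""
    stmts = []
    for node, data in graph.nodes(data=True):
        label = data["node_type"].replace("-", "_")  # Neo4j labels can't contain '-'
        stmts.append(f"MERGE (:{label} {{name: '{node}'}})")
    for src, dst, data in graph.edges(data=True):
        rel = data["relation"].upper()
        stmts.append(
            f"MATCH (a {{name: '{src}'}}), (b {{name: '{dst}'}}) "
            f"MERGE (a)-[:{rel}]->(b)"
        )
    return stmts

G = nx.MultiDiGraph()
G.add_node("Stuxnet", node_type="malware")
G.add_node("Code signing", node_type="attack-pattern")
G.add_edge("Stuxnet", "Code signing", relation="uses")
stmts = to_cypher(G)
# Each statement can then be executed via session.run() with the neo4j driver.
```

In production, parameterized queries (rather than string interpolation) would be the safer choice; this sketch just shows the shape of the translation.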

LLM

For the LLM layer, we initially started with LangChain's GraphQAChain module. However, due to our complex use case, we designed custom wrappers to query the graph efficiently. Furthermore, since one of our intended outcomes was to make the tool product-agnostic, we provided custom LLM wrappers to integrate with different kinds of LLMs. Our tool can now be used with OpenLLM, LangChain LLMs, or other in-house LLM stores.
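A product-agnostic wrapper boils down to a small common interface that each backend implements. The class and method names below are illustrative, not the project's actual API:

```python
from abc import ABC, abstractmethod

class LLMWrapper(ABC):
    """Minimal provider-agnostic interface. Concrete subclasses would call
    OpenAI, a LangChain LLM, an OpenLLM server, or an in-house model."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...

class EchoLLM(LLMWrapper):
    """Stand-in backend used here only so the sketch runs without a real model."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"

def answer(llm: LLMWrapper, question: str, context: str) -> str:
    """The pipeline depends only on the interface, never on a specific vendor."""
    return llm.complete(f"Context: {context}\nQuestion: {question}")
```

Swapping providers then means swapping one subclass, with no change to the rest of the pipeline.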

Pipeline

  1. The user sends a query prompt.
  2. Chat history is used to refactor the prompt.
  3. Relevant entities are extracted from the prompt.
  4. The knowledge graph is smart-searched for relevant nodes and edges (knowledge triplets).
  5. The triplets are used as context to answer the question.
  6. Chat history is updated with the question, and the response is sent back to the user.
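The steps above can be sketched as a single function wired together from pluggable components. The function and parameter names are ours, and the history-folding logic is deliberately simplistic:

```python
def run_pipeline(query, history, extract_entities, search_graph, llm):
    """One pass through the six pipeline steps (illustrative sketch)."""
    # 2. Fold recent chat history into the prompt so follow-ups keep context.
    prompt = " ".join(history[-2:] + [query]) if history else query
    # 3. Extract the entities mentioned in the prompt.
    entities = extract_entities(prompt)
    # 4. Retrieve matching knowledge triplets from the graph.
    triplets = search_graph(entities)
    # 5. Answer the question using the triplets as context.
    answer = llm(f"Context: {'; '.join(triplets)}\nQuestion: {query}")
    # 6. Record the turn before returning the response.
    history.append(query)
    history.append(answer)
    return answer
```

In the real pipeline, `extract_entities` and `llm` are themselves LLM calls and `search_graph` is the knowledge-graph query; here any callables can be plugged in, which also makes the flow easy to test.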

Challenges we ran into

LLM

One of the major challenges was finding a good enough LLM. We tested our pipeline with OpenAI models and open models such as LLaMA, MPT, and Dolly. However, quality responses were obtained only with OpenAI, and with LLaMA to a certain extent.

Prompt engineering

Efficient prompt engineering was key to making the pipeline work. Our pipeline makes three LLM requests per query, so the response from each request had to be accurate enough to support the subsequent requests. After an adequate number of trials, we narrowed down on specific prompt templates.
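To make the chaining concrete, here are two illustrative templates in the spirit of the ones we converged on (the exact wording of our production templates differs): the first stage's output must be parseable by the graph search, and the last stage is constrained to the retrieved triplets.

```python
# Illustrative prompt templates (not the exact production ones).
# Stage 1: entity extraction -- its output feeds the knowledge-graph search.
ENTITY_PROMPT = (
    "Extract the cybersecurity entities (malware, attack patterns, threat "
    "groups, mitigations) mentioned in the question, one per line.\n"
    "Question: {question}\nEntities:"
)

# Final stage: question answering, constrained to the retrieved triplets.
QA_PROMPT = (
    "Answer the question using only the knowledge triplets below.\n"
    "Triplets: {triplets}\nQuestion: {question}\nAnswer:"
)

prompt = ENTITY_PROMPT.format(question="What attack patterns does Stuxnet use?")
```

Keeping each stage's output format rigid is what lets an error-free response flow into the next request.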

Knowledge graph query

The knowledge graph is a powerful tool for querying useful information, but it can also retrieve unnecessary information. For example, a given node may be connected to both a malware and a threat group; if the user's query is about the malware, the LLM is fed irrelevant threat-group information. Furthermore, we often ran into token limits when querying more than two levels deep, since the KG extracts all the information along the way.

To solve this problem, we crafted custom graph algorithms that work like a reward-based system: they traverse only the edges and nodes that the user's query actually asks about. This information comes from the entity-extraction request sent to the LLM. For example, the question "What are the attack patterns used by Stuxnet?" yields both the edge type attack patterns and the entity Stuxnet.
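A simplified sketch of this filtered traversal, assuming the edge-attribute convention used elsewhere in this writeup (relation types stored under `relation`); the second edge below is a made-up example, added only to show irrelevant edges being skipped:

```python
import networkx as nx

def smart_search(graph, entity, wanted_relations, depth=2):
    """Traverse up to `depth` hops from `entity`, following only edges whose
    relation type the user asked about (simplified sketch of our approach)."""
    triplets, frontier, seen = [], [entity], {entity}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for _, dst, data in graph.out_edges(node, data=True):
                if data["relation"] in wanted_relations:
                    triplets.append((node, data["relation"], dst))
                    if dst not in seen:
                        seen.add(dst)
                        next_frontier.append(dst)
        frontier = next_frontier
    return triplets

G = nx.MultiDiGraph()
G.add_edge("Stuxnet", "Code signing", relation="uses")
# Hypothetical edge, present only to demonstrate filtering:
G.add_edge("Stuxnet", "Some Threat Group", relation="attributed-to")

# Only the "uses" edge reaches the LLM context; the other edge is skipped.
result = smart_search(G, "Stuxnet", {"uses"})
```

Because unwanted edges are pruned at every hop, the context stays within token limits even for multi-level queries.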

Accomplishments that we're proud of

  • Our tool efficiently retrieves the information relevant to a given user query.
  • Efficient retrieval of up to two or three levels of the graph, querying only useful information. This is essential to keep extra information from corrupting the context for the LLM response.
  • Platform agnostic, with no training overhead.
  • Quick retrieval of similarity indices across entities for analysts.

What we learned

  • LLM deployment.
  • RAG applications
  • Knowledge Graphs

What's next for Security Copilot

  • Improve on the Knowledge Graph to include other data sources
  • Entity extraction still works best when entity names are capitalized; improve the entity-extraction prompt.
  • Further testing of other open-source LLMs.
