Anomaly detection in graphs is a critical problem for finding suspicious behavior in innumerable systems, such as intrusion detection, fake ratings, and financial fraud. This has been a well-researched problem with the majority of the proposed approaches focusing on static graphs. However, many real-world graphs are dynamic in nature, and methods based on static connections may miss temporal characteristics of the graphs and anomalies.
Among the methods focusing on dynamic graphs, most of them have edges aggregated into graph snapshots. However, to minimize the effect of malicious activities and start recovery as soon as possible, we need to detect anomalies in real-time or near real-time i.e. to identify whether an incoming edge is anomalous or not, as soon as we receive it. In addition, since the number of vertices can increase as we process the stream of edges, we need an algorithm which uses constant memory in the graph size.
Moreover, fraudulent or anomalous events in many applications occur in microclusters or suddenly arriving groups of suspiciously similar edges e.g. denial of service attacks in network traffic data and lockstep behavior. However, existing methods which process edge streams in an online manner aim to detect individually surprising edges, not microclusters, and can thus miss large amounts of suspicious activity.
What it does
MIDAS, short for Microcluster-Based Detector of Anomalies in Edge Streams, detects microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, in edge streams, using constant time and memory. MIDAS uses unsupervised learning to detect anomalies in a streaming manner in real-time and has become a new baseline. It was designed keeping in mind the way recent sophisticated attacks occur. MIDAS can be used to detect intrusions, Denial of Service (DoS), Distributed Denial of Service (DDoS) attacks, financial fraud and fake ratings. In addition, by using a principled hypothesis testing framework, MIDAS provides theoretical bounds on the false positive probability, which earlier methods do not provide. Also, MIDAS is up to 55% more accurate while being up to 929 times faster than state of the art approaches.
- Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
- Detects Microcluster Anomalies (suddenly arriving groups of suspiciously similar edges e.g. DoS attack)
- Theoretical Guarantees on False Positive Probability
- Constant Memory (independent of graph size)
- Constant Update Time (real-time anomaly detection to minimize harm)
- Up to 55% more accurate and 929 times faster than the state of the art approaches
How I built it
MIDAS finds anomalous edges from a dynamic graph in a streaming manner. It combines a chi-squared goodness-of-fit test with the Count-Min-Sketch (CMS) streaming data structures to get an anomaly score for each edge. MIDAS also incorporates temporal and spatial relations to achieve better performance.
Challenges I ran into
Trying to detect anomalies/intrusions in real-time as soon as the edge/communication arrives. This has to be done in constant memory.
- 450+ stars on the Github Repository.
- MIDAS was featured as a top story on KDnuggets and Hacker News.
- MIDAS was covered by AIhub, ACM TechNews, Towards Data Science, Towards AI, Analytics India Magazine, MarkTechPost, insideBIGDATA, MarTech, AI Trends
- MIDAS implementations in C++, Golang, Python, Java, Ruby, Rust, R and Julia.
- Multiple cybersecurity and e-commerce firms want to fine tune MIDAS for their requirements.
What I learned
Streaming Anomaly Detection, cybersecurity, threat detection
What's next for MIDAS
- Deploy within Azure Sentinel
- Distributed version
- Hardware implementation
- Extend to multi-attributed graph and record setting
- Apply in Knowledge Graphs