Inspiration
Financial fraud detection usually runs in batch mode. Transactions get analyzed hours or days after they happen, and by then the money is gone. With instant payments becoming the norm, I wanted to see if I could build something that catches fraud the moment a transaction comes through. Not tomorrow, not in an hour, but right now.
What it does
SentinelStream watches a live stream of financial transactions and flags fraud in real time. It catches five specific patterns:
• Velocity attacks: someone hammering a stolen card with rapid transactions
• Location anomalies: a card used in New York, then in Tokyo 10 minutes later
• Card-not-present clusters: a sudden burst of online purchases across different merchants
• Unusual amounts: a transaction way outside a user's normal range
• New device patterns: a transaction from a device the account has never used before
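The location-anomaly case comes down to an "impossible travel" check: divide the great-circle distance between two consecutive transactions by the time between them. This is a sketch under assumptions. It assumes each transaction carries latitude/longitude (the generator described here records only a country, so coordinates would need a lookup such as country centroids), and the 1,000 km/h threshold is a hypothetical value, not one taken from SentinelStream.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000  # hypothetical threshold, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, curr):
    """Speed needed to get from prev to curr; inf if the timestamps coincide."""
    km = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    hours = (curr["ts"] - prev["ts"]).total_seconds() / 3600
    return km / hours if hours > 0 else math.inf


# Hypothetical transactions: New York, then Tokyo 10 minutes later
nyc = {"lat": 40.7128, "lon": -74.0060, "ts": datetime(2024, 1, 1, 12, 0)}
tokyo = {"lat": 35.6762, "lon": 139.6503, "ts": datetime(2024, 1, 1, 12, 10)}
speed = implied_speed_kmh(nyc, tokyo)
print(f"{speed:,.0f} km/h", speed > MAX_PLAUSIBLE_KMH)
```

Roughly 10,800 km in a sixth of an hour works out to tens of thousands of km/h, so the pair is flagged no matter where the threshold is set within reason.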
Each transaction gets a fraud score, a type classification, and a confidence rating. The dashboard shows a live feed with fraud highlighted in red, plus charts for fraud type breakdown, detection timeline, geographic distribution, and top flagged accounts.
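The fraud type breakdown and top flagged accounts panels are simple aggregations over the scored stream. Here is a minimal pandas sketch. The column names (`account_id`, `fraud_type`, `fraud_score`) and the 0.5 flagging threshold are hypothetical, and the Plotly rendering is left out.

```python
import pandas as pd

# Hypothetical scored-stream output: one row per transaction
scored = pd.DataFrame({
    "account_id": ["A1", "A2", "A1", "A3", "A1", "A2"],
    "fraud_type": ["velocity", None, "velocity", "amount", "location", "device"],
    "fraud_score": [0.97, 0.05, 0.91, 0.88, 0.76, 0.81],
})

FLAG_THRESHOLD = 0.5  # hypothetical cut-off for "flagged"
flagged = scored[scored["fraud_score"] >= FLAG_THRESHOLD]

# Fraud type breakdown (feeds a pie or bar chart)
type_counts = flagged["fraud_type"].value_counts()

# Top flagged accounts (feeds a leaderboard table)
top_accounts = flagged["account_id"].value_counts().head(3)

print(type_counts.to_dict())
print(top_accounts.to_dict())
```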
How we built it
I started with the data layer. I built a synthetic transaction generator that produces 50K realistic transactions, about 3% of which are injected fraud. Each transaction has an account ID, timestamp, amount, merchant category, country, device type, and card-present flag. The fraud patterns are seeded with specific logic; for example, a velocity attack creates 5+ transactions from the same account within 60 seconds.
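Here is a minimal sketch of how velocity attacks could be injected into a synthetic stream with NumPy and pandas. The column names, the lognormal amount distribution, the 200-account pool, and the small "card testing" amounts are all assumptions for illustration. Only the 5-transactions-in-60-seconds rule and the roughly 3% fraud rate come from the description above.

```python
import numpy as np
import pandas as pd


def generate_transactions(n_legit=1000, n_attacks=6, burst_size=5, seed=0):
    """Sketch of a synthetic stream with injected velocity attacks.

    Legit transactions are spread over one day; each attack is a burst of
    `burst_size` transactions from a single account inside a 60-second window.
    """
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2024-01-01")
    accounts = [f"ACC{i:04d}" for i in range(200)]  # hypothetical account pool

    legit = pd.DataFrame({
        "account_id": rng.choice(accounts, n_legit),
        "timestamp": start + pd.to_timedelta(rng.uniform(0, 86_400, n_legit), unit="s"),
        "amount": rng.lognormal(mean=3.5, sigma=0.8, size=n_legit).round(2),
        "is_fraud": 0,
    })

    bursts = []
    for _ in range(n_attacks):
        account = str(rng.choice(accounts))
        t0 = start + pd.Timedelta(seconds=float(rng.uniform(0, 86_400 - 60)))
        offsets = np.sort(rng.uniform(0, 60, burst_size))  # all within 60 s of t0
        bursts.append(pd.DataFrame({
            "account_id": account,
            "timestamp": t0 + pd.to_timedelta(offsets, unit="s"),
            "amount": rng.uniform(1, 20, burst_size).round(2),  # small "card testing" amounts
            "is_fraud": 1,
        }))

    df = pd.concat([legit, *bursts], ignore_index=True)
    return df.sort_values("timestamp", ignore_index=True)  # chronological order for streaming


df = generate_transactions()
print(len(df), df["is_fraud"].mean())
```

With the defaults, 30 of 1,030 rows are fraud (about 2.9%). Calling it with `n_legit=48_500, n_attacks=300` would give 50K rows at exactly 3%.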
For features, I computed 16 per transaction, including rolling-window counts and amounts at 1-minute, 5-minute, and 1-hour intervals; the amount's deviation from the user's average; a country-change flag; the distance between consecutive transactions; new-merchant and new-device flags; and time of day and day of week.
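The per-account features (amount deviation, new-device and new-merchant flags, country change, time of day, day of week) can all be kept as running state, so past transactions never need to be rescanned. This sketch rests on a few assumptions. The field names are hypothetical, and deviation is expressed as a z-score using Welford's running mean and variance. The key rule is ordering: features are computed before the current transaction is folded into the profile.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccountProfile:
    """Running per-account state, updated only after a transaction is featurized."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations (Welford's algorithm)
    devices: set = field(default_factory=set)
    merchants: set = field(default_factory=set)
    last_country: Optional[str] = None

    def features(self, txn):
        std = math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
        return {
            "amount_zscore": (txn["amount"] - self.mean) / std if std > 0 else 0.0,
            "new_device": int(txn["device"] not in self.devices),
            "new_merchant": int(txn["merchant"] not in self.merchants),
            "country_change": int(self.last_country is not None
                                  and txn["country"] != self.last_country),
            "hour": txn["timestamp"].hour,
            "day_of_week": txn["timestamp"].weekday(),  # Monday == 0
        }

    def update(self, txn):
        self.n += 1
        delta = txn["amount"] - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (txn["amount"] - self.mean)
        self.devices.add(txn["device"])
        self.merchants.add(txn["merchant"])
        self.last_country = txn["country"]


profiles = defaultdict(AccountProfile)


def featurize(txn):
    """Hypothetical helper: read features from past state, then update it."""
    profile = profiles[txn["account_id"]]
    feats = profile.features(txn)  # uses past transactions only
    profile.update(txn)            # the current transaction joins the history afterwards
    return feats
```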
The ML pipeline uses an ensemble approach. A Gradient Boosting Classifier handles known fraud patterns it was trained on. An Isolation Forest catches novel anomalies the classifier missed. Both contribute to the final fraud score.
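Here is one way the two models could be blended into a single score, sketched with scikit-learn. The 0.7/0.3 weighting, fitting the Isolation Forest on legitimate rows only, and min-max rescaling of its anomaly score are all hypothetical choices; the write-up doesn't say how SentinelStream weights the two. Note that `IsolationForest.score_samples` returns lower values for more anomalous points, hence the sign flip.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, IsolationForest


class FraudEnsemble:
    """Sketch: blend a supervised fraud probability with an unsupervised anomaly score."""

    def __init__(self, w_supervised=0.7, random_state=0):
        self.w = w_supervised  # hypothetical weight on the classifier
        self.gbc = GradientBoostingClassifier(random_state=random_state)
        self.iso = IsolationForest(random_state=random_state)

    def fit(self, X, y):
        self.gbc.fit(X, y)
        self.iso.fit(X[y == 0])  # learn what "normal" looks like from legit rows only
        # Negate so that higher = more anomalous; keep the training range for rescaling
        s = -self.iso.score_samples(X)
        self.s_min, self.s_max = s.min(), s.max()
        return self

    def fraud_score(self, X):
        p_fraud = self.gbc.predict_proba(X)[:, 1]  # column 1 = class "fraud"
        anomaly = (-self.iso.score_samples(X) - self.s_min) / (self.s_max - self.s_min)
        anomaly = np.clip(anomaly, 0.0, 1.0)  # unseen points may fall outside the training range
        return self.w * p_fraud + (1 - self.w) * anomaly
```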
The stream simulator reads transactions in chronological order, maintains per-account sliding windows, and scores each transaction against the model in sequence.
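A per-account sliding window buffer can be sketched with `collections.deque`, assuming transactions arrive in chronological order with timestamps in seconds. The class name and feature names are hypothetical. The important detail is the order inside the loop: features are read from the buffer first, and only then is the current transaction added, so a transaction never counts toward its own window.

```python
from collections import defaultdict, deque

WINDOWS = {"1m": 60, "5m": 300, "1h": 3600}  # window lengths in seconds


class SlidingWindowBuffer:
    """Per-account buffer of (timestamp_seconds, amount) for past transactions only."""

    def __init__(self, horizon=max(WINDOWS.values())):
        self.horizon = horizon
        self.events = defaultdict(deque)

    def features(self, account_id, ts):
        """Counts and amount sums in each window, strictly before ts."""
        q = self.events[account_id]
        # Evict anything that has fallen out of the longest window
        while q and q[0][0] <= ts - self.horizon:
            q.popleft()
        feats = {}
        for name, seconds in WINDOWS.items():
            recent = [amt for t, amt in q if t > ts - seconds]
            feats[f"count_{name}"] = len(recent)
            feats[f"sum_{name}"] = sum(recent)
        return feats

    def add(self, account_id, ts, amount):
        self.events[account_id].append((ts, amount))


# Hypothetical stream: (account_id, timestamp_seconds, amount)
stream = [("A", 0, 10.0), ("A", 20, 5.0), ("A", 45, 7.5), ("A", 400, 3.0)]
buf = SlidingWindowBuffer()
results = []
for acct, ts, amt in stream:
    results.append(buf.features(acct, ts))  # score against the past first...
    buf.add(acct, ts, amt)                  # ...then record the current transaction
```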
The dashboard is built with Streamlit using a dark theme, with Plotly for the charts and speed controls that run the stream anywhere from 1x to 50x.
Challenges we ran into
The biggest problem was lookahead bias. When computing rolling features like "how many transactions did this account make in the last 5 minutes," you naturally want to use the full dataset in pandas. But in stream mode you can't look ahead. I had to build a custom sliding window buffer that only considers past transactions, then make sure the training features matched exactly what the stream would produce.
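The lookahead trap is easy to reproduce in pandas. In this hypothetical example, account A's first transaction gets a "user average" of 110 under the naive group mean, because that mean already includes a 300 transaction that hasn't happened yet. The past-only version uses an expanding mean shifted by one row, so each row sees only earlier transactions from the same account.

```python
import pandas as pd

# Hypothetical transactions, assumed already sorted by timestamp
df = pd.DataFrame({
    "account_id": ["A", "A", "A", "B", "B"],
    "amount": [10.0, 20.0, 300.0, 50.0, 70.0],
})

# Leaky: each row's "average" includes its own amount and every future one
df["avg_leaky"] = df.groupby("account_id")["amount"].transform("mean")

# Stream-safe: average of strictly earlier transactions for the same account
df["avg_past"] = (
    df.groupby("account_id")["amount"]
      .transform(lambda s: s.expanding().mean().shift(1))
)
print(df)
```

The same reasoning applies to rolling counts: a training feature is only valid if the stream could have computed it at that moment.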
Class imbalance was the second issue. With 97% legitimate and only 3% fraudulent transactions, a naive model can predict "legit" for everything and still report 97% accuracy while catching nothing. I handled this by combining supervised (GBC with class weights) with unsupervised (Isolation Forest) detection. The ensemble catches both known patterns and novel anomalies.
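Here is a small sketch of both points on made-up data with the same 97/3 split. A majority-class baseline hits 97% accuracy with zero fraud recall. Scikit-learn's `GradientBoostingClassifier` has no `class_weight` parameter, but balanced class weights can be passed per sample through `fit(sample_weight=...)`. This covers only the supervised half of the ensemble.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)
n_legit, n_fraud = 970, 30  # 97% / 3%, synthetic features
X = np.vstack([rng.normal(0, 1, (n_legit, 3)), rng.normal(1.5, 1, (n_fraud, 3))])
y = np.array([0] * n_legit + [1] * n_fraud)

# Always predicting "legit" looks great on accuracy and catches no fraud
naive = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = naive.predict(X)
print(accuracy_score(y, pred), recall_score(y, pred))  # 0.97 0.0

# "balanced" weights: n_samples / (n_classes * class_count), so each fraud row
# counts about 32x as much as a legit row during training
weights = compute_sample_weight(class_weight="balanced", y=y)
gbc = GradientBoostingClassifier(random_state=0).fit(X, y, sample_weight=weights)
```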
Real-time rendering in Streamlit was also surprisingly painful. Streamlit wasn't built for live updates. I ended up using st.empty() placeholders and running the stream on a background thread to keep the dashboard responsive.
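The background-thread pattern can be sketched with only the standard library. A worker thread scores transactions and pushes results onto a `queue.Queue`, and the main script thread drains the queue and redraws. UI calls stay on the main thread because Streamlit generally expects its elements to be written from the script's own thread. The function names and the 0.1-second base interval are hypothetical. The `speed` argument mirrors the 1x to 50x control by shortening the pause between transactions.

```python
import queue
import threading
import time

BASE_INTERVAL_S = 0.1  # hypothetical delay between transactions at 1x speed


def stream_worker(transactions, score_fn, out_q, speed, stop_event):
    """Background thread: score transactions in order and hand results to the UI."""
    for txn in transactions:
        if stop_event.is_set():
            break
        out_q.put({**txn, "fraud_score": score_fn(txn)})
        time.sleep(BASE_INTERVAL_S / speed)  # 50x speed -> 50x shorter pause


def drain(out_q, max_items=100):
    """Main (script) thread: collect whatever results are ready without blocking."""
    items = []
    while len(items) < max_items:
        try:
            items.append(out_q.get_nowait())
        except queue.Empty:
            break
    return items


# Hypothetical run with a toy scoring rule
out_q = queue.Queue()
stop = threading.Event()
txns = [{"id": i, "amount": a} for i, a in enumerate([12.0, 900.0, 15.0])]
worker = threading.Thread(
    target=stream_worker,
    args=(txns, lambda t: float(t["amount"] > 500), out_q, 50, stop),
    daemon=True,
)
worker.start()
worker.join()
batch = drain(out_q)
```

In the dashboard, the main loop would call `drain()` on each refresh and write the batch into an `st.empty()` placeholder instead of joining the worker.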
Accomplishments that we're proud of
• The dashboard actually looks professional, not like a typical hackathon demo
• High precision across all five fraud types, not just the obvious ones
• 15 passing tests with decent coverage
• Completely self-contained: no API keys, no external services. Just pip install and run
• The speed controls make demos really impactful. Judges can watch hundreds of transactions get processed in seconds
What we learned
Stream-level inference is completely different from batch processing. Feature computation, model serving, and state management all need to work sequentially. You can't just throw a trained model at a stream and expect it to work.
The Isolation Forest plus Gradient Boosting combo worked better than I expected. Supervised models learn patterns but can't generalize to new fraud types. Unsupervised anomaly detectors catch the unexpected but produce more false positives. Together they balance each other out nicely.
What's next for SentinelStream
• Hook it up to real data via Kafka or RabbitMQ instead of synthetic transactions
• Add an interactive map for geographic fraud visualization
• Build per-account behavior profiles that learn what "normal" looks like over time
• Add configurable alert thresholds and notification channels
• Expand from 5 to 15+ fraud patterns, including account takeover and synthetic identity fraud
Built With
- python-3.11
- scikit-learn (GradientBoostingClassifier, IsolationForest)
- streamlit (interactive dashboard)
- plotly (real-time charts)
- pandas
- numpy (data processing)
- pytest