In our day-to-day life, technology has taken on a rapid pace. Every detail is available on our gadgets, just one click away. Technology is a boon, but it has some disadvantages too. Why not use our technology to rectify certain things? With this thought, an idea struck my mind: using streamable data to detect fraudulent activity and to build a prevention mechanism around it. Many systems have been built over the past decade to prevent fraudulent activity, but none of them uses real-time data.
What it does
It uses streamable data for fraud detection. "Streamable data" here refers to real-time management of data entering a banking system at a given rate and volume (e.g., BKFS, ACBS). Data is partially processed and audited before entering the in-memory system. Previously, transactional data was processed in batch and analyzed over several hours, a day, or even a week, depending on the data involved, the system design, and compliance requirements. Timely access to operational data is critical for a fraud-detection application. Real-time streaming lets banks take the integrated customer experience to a new level, but traditional batch processing will still find a place: the banking system should be architected on the assumption that both will work together, since they address different but complementary components of data processing.
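To make the streaming-versus-batch distinction concrete, here is a minimal sketch of per-event fraud checking: each transaction is scored the moment it arrives, instead of waiting for an end-of-day batch job. The field names, threshold, and rule are assumptions made up for illustration, not part of our actual model.

```python
# Hypothetical per-event fraud check (streaming style).
# Threshold and transaction fields are illustrative assumptions.
FRAUD_THRESHOLD = 10_000  # flag unusually large transfers (assumed rule)

def score_event(txn):
    """Return True if a single transaction looks suspicious."""
    return txn["amount"] > FRAUD_THRESHOLD

def process_stream(events):
    """Process events one by one, emitting alerts immediately
    rather than accumulating them for a nightly batch run."""
    alerts = []
    for txn in events:
        if score_event(txn):
            alerts.append(txn["id"])  # in practice: push to an alert queue
    return alerts

stream = [
    {"id": "t1", "amount": 250},
    {"id": "t2", "amount": 15_000},
    {"id": "t3", "amount": 40},
]
print(process_stream(stream))  # → ['t2']
```

A batch system would run the same `score_event` logic, but only after the day's transactions had been collected, which is exactly the delay streaming removes.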
How I built it
Our team designed a streaming data architecture that consumes data from different sources and passes it to a stream message broker (we used Kafka here). The message broker moves the data into streaming engines (Azure Event Hub and Azure HDInsight), and the streamed data is then stored in ADLS Gen2. This streamed data is consumed by a machine learning model to analyze fraudulent activity.
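The flow above can be sketched end to end. In this hypothetical illustration the real components (Kafka, Azure Event Hub, HDInsight, ADLS Gen2) are replaced with in-memory stand-ins so the source → broker → streaming engine → storage → model path is easy to follow; none of the names or rules below come from our actual deployment.

```python
# Hypothetical stand-ins for the pipeline stages described above.
broker = []  # stands in for a Kafka topic
lake = []    # stands in for ADLS Gen2 storage

def produce(source_events):
    """Source systems publish raw events to the message broker."""
    broker.extend(source_events)

def stream_engine():
    """Streaming-engine stand-in (Event Hub / HDInsight role):
    partially audit each event, then land it in storage."""
    while broker:
        event = broker.pop(0)
        if "amount" in event and "id" in event:  # the partial audit step
            lake.append(event)

def score(event):
    """Toy fraud model reading from storage (threshold is made up)."""
    return event["amount"] > 10_000

produce([{"id": "t1", "amount": 500}, {"id": "t2", "amount": 25_000}])
stream_engine()
flags = [e["id"] for e in lake if score(e)]
print(flags)  # → ['t2']
```

In the real architecture each stage is a separate managed service, but the responsibilities divide the same way: the broker decouples producers from consumers, the engine audits and routes, storage persists, and the model scores.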
Challenges I ran into
We did not use Kafka-based storage, as it costs roughly ten times as much as ordinary cloud storage. Our storage had to live in the cloud in any case, and cloud storage is already expensive, so we needed to keep that cost down. High latency also makes real-time analysis difficult.
Accomplishments that I'm proud of
We came up with multiple designs for the streaming data architecture and finally settled on one. We kept the implementation cost to a minimum by refraining from Kafka storage and using cloud storage instead.
What I learned
How to enable real-time analysis by building a streaming data warehouse rather than an age-old traditional data warehouse.
What's next for Streaming Datawarehouse
Automation of the data plumbing could be implemented in the future, along with a move from table modeling to schema-less development.