Inspiration
Banking data from disparate systems is critical for decision-making: visualization, data analytics, and predictive analytics all depend on it to support quick financial decisions. Making that data available in the right format at the right time is therefore essential.
What it does
The solution addresses this problem by automating the ingestion process: parameters passed through APIs automatically generate the required configurations. Once the configurations are generated, the data can be moved from different sources into the target data lake (e.g. Hive).
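A minimal sketch of what calling such a configuration-generation API could look like (the endpoint, parameter names, and response shape here are assumptions for illustration, not the project's actual API):

```python
# Hypothetical call to the configuration-generation API: pass source/target
# parameters and persist the returned ingestion config for the pipeline.
import json
import requests

params = {
    "source_type": "kafka",                       # e.g. kafka, file, jdbc
    "source_location": "broker:9092/transactions",
    "target_type": "hive",
    "target_table": "datalake.transactions",
}

# Assumed internal service that inspects the source and emits a config.
resp = requests.post("http://ingest-api.internal/v1/configs", json=params, timeout=30)
resp.raise_for_status()

config = resp.json()
with open("ingestion_config.json", "w") as f:
    json.dump(config, f, indent=2)
```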
How we built it
The ingestion framework automatically detects the structure of the incoming data and generates the corresponding configuration, reducing the manual work needed to produce streamlined data. Streaming ingestion is handled with Spark Streaming and Kafka, driven by SQL configurations.
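An illustrative sketch of the streaming leg of such a pipeline, assuming a Kafka topic, a schema the framework would normally infer, and a Hive-compatible landing path (all names here are placeholders, not the project's actual configuration):

```python
# Read a Kafka topic with Spark Structured Streaming, apply a schema that the
# framework would have generated automatically, and append parsed rows to the
# data lake as Hive-compatible Parquet files.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (SparkSession.builder
         .appName("auto-ingest")
         .enableHiveSupport()
         .getOrCreate())

# In the framework this schema comes from structure detection; it is
# hard-coded here only to keep the example self-contained.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "transactions")
       .load())

parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("r"))
          .select("r.*"))

query = (parsed.writeStream
         .format("parquet")                       # Hive-compatible files
         .option("path", "/datalake/transactions")
         .option("checkpointLocation", "/chk/transactions")
         .outputMode("append")
         .start())
query.awaitTermination()
```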
Challenges we ran into
Real-time data arrives at a very high rate, and the sources vary widely: sensors, cloud-based applications, files, and more.
Accomplishments that we're proud of
Creation of an automated pipeline that handles real-time data ingestion effectively and efficiently.
What we learned
What business data is required for visualization and analytics.
What's next for Automation of Data Ingestion Pipeline for Data Lake
We will complete the end-to-end flow and add support for multiple sources and multiple targets, including cloud data ingestion with various data formats (JSON, YAML, XML, etc.). This framework can expedite the cloud migration journey with minimal development effort.
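A rough sketch of the planned multi-format direction (a hypothetical helper, not part of the current framework): normalize a payload into records based on its declared format so one ingestion config can cover JSON, YAML, and XML sources.

```python
# Parse a payload into a list of record dicts regardless of source format,
# so downstream ingestion logic stays format-agnostic.
import json
import xml.etree.ElementTree as ET

import yaml  # PyYAML


def parse_payload(payload: str, fmt: str) -> list[dict]:
    """Return records as dicts for JSON, YAML, or XML input."""
    if fmt == "json":
        data = json.loads(payload)
    elif fmt == "yaml":
        data = yaml.safe_load(payload)
    elif fmt == "xml":
        root = ET.fromstring(payload)
        data = [{child.tag: child.text for child in rec} for rec in root]
    else:
        raise ValueError(f"Unsupported format: {fmt}")
    return data if isinstance(data, list) else [data]


print(parse_payload('[{"account_id": "A1", "amount": 10.5}]', "json"))
```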