The inspiration for the project was to design a model that could detect fake loan entries hidden among a set of real loan entries. Our group was also eager to design a dashboard for viewing these statistics: many similar services are good at identifying outliers in data but are unfriendly to the user. We wanted businesses to be able to see and understand fake data immediately, because it's important to recognize it quickly.
What it does
Our project handles both back-end and front-end tasks. On the back-end, it uses Python libraries like Pandas to parse input data from CSV files. It then builds histograms and linear regression models that detect outliers in the given input, and passes the results to the front-end, which displays the histogram and presents the outliers to the user for an easy experience.
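To make the back-end step concrete, here is a minimal sketch of one way such a pipeline could look. It is not our exact code: the sample data, the column names (`income`, `loan_amount`), the helper `find_outliers`, and the 2-standard-deviation threshold are all assumptions chosen for illustration.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical loan data; the column names and values are made up.
CSV_TEXT = """income,loan_amount
30000,9000
40000,12100
50000,14900
60000,18050
70000,21000
80000,23900
55000,90000
"""


def find_outliers(df, x_col, y_col, threshold=2.0):
    """Return rows whose regression residual is more than `threshold`
    standard deviations away from the mean residual (assumed rule)."""
    X = df[[x_col]].to_numpy()
    y = df[y_col].to_numpy()
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)
    z_scores = (residuals - residuals.mean()) / residuals.std()
    return df[np.abs(z_scores) > threshold]


# Parse the CSV text into a DataFrame, as the back-end does with uploads.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Histogram of loan amounts: bin counts and edges for the front-end to draw.
counts, bin_edges = np.histogram(df["loan_amount"], bins=5)

# Rows that sit far from the fitted income -> loan_amount line.
outliers = find_outliers(df, "income", "loan_amount")
print(outliers)
```

A z-score threshold on residuals is a crude rule, especially on small samples, because the outlier itself inflates the standard deviation. It is used here only because it is short to write down.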
How we built it
We built the back-end of this application in Python. We used Pandas to store data efficiently in DataFrames, and NumPy and scikit-learn for statistical analysis. For the website, we wrote the pages in HTML/CSS and used Flask and Django on the server side to handle events on the website and interaction with other parts of the code. This involved taking a CSV file from the user, parsing it into a string, running our back-end model, and displaying the results to the user.
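The upload flow described above could be sketched roughly as follows. This is a simplified illustration using Flask only. The route name `/upload`, the form field name `file`, and the JSON response shape are assumptions, and the model call is replaced by a placeholder that just reports the parsed shape.

```python
import io

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/upload", methods=["POST"])  # hypothetical route name
def upload():
    uploaded = request.files.get("file")  # hypothetical form field name
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="no CSV file uploaded"), 400

    # Read the uploaded bytes into a string, then parse it into a DataFrame.
    text = uploaded.read().decode("utf-8")
    df = pd.read_csv(io.StringIO(text))

    # Placeholder for the back-end model: report only the row count and columns.
    return jsonify(rows=len(df), columns=list(df.columns))
```

In a real deployment the placeholder would be replaced by the outlier model, and the response would carry the histogram data and flagged rows for the page to render.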
Challenges we ran into
We ran into many front-end and back-end issues, but they ultimately helped us learn. On the front-end, the biggest problem was getting Django to work with the browser to bring this experience to the user. On the back-end, we found Keras difficult to work with at the start of the process, so we had to switch frameworks midway.
Accomplishments that we're proud of
One accomplishment was bringing both sides of the development process together: connecting a UI to a back-end was a painful but rewarding experience. Implementing cool machine learning models that could actually find fake data was also really exciting.
What we learned
One of our biggest lessons was to use libraries more effectively to tackle the problem at hand. We started by creating a machine learning model with Keras in Python, which turned out to be a poor fit for what we needed. After much help from the mentors, we experimented with other libraries that made it easier to implement linear regression, for example.
What's next for Financial Outlier Detection System (FODS)
Eventually, we aim to use more sophisticated statistical tools to analyze the data. For example, a random forest could be used to identify key characteristics of the data, helping us choose our linear regression models before building them. Another cool idea is to search for linearly dependent columns in the data. Finding them would help detect outliers and quickly eliminate trivial or useless variables in new data.
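As a rough sketch of the linearly dependent columns idea (not something we have built yet), one could add columns one at a time and flag any column that does not raise the rank of the matrix. The function name `dependent_columns`, the relative tolerance, and the sample loan columns below are assumptions for illustration.

```python
import numpy as np


def dependent_columns(matrix, rel_tol=1e-10):
    """Return indices of columns that are linear combinations of earlier columns."""
    matrix = np.asarray(matrix, dtype=float)
    kept = []       # indices of columns that are independent so far
    dependent = []  # indices of columns explained by the kept ones
    for j in range(matrix.shape[1]):
        candidate = matrix[:, kept + [j]]
        # Numerical rank: count singular values above a relative cutoff.
        s = np.linalg.svd(candidate, compute_uv=False)
        rank = int((s > rel_tol * s.max()).sum())
        if rank == len(kept) + 1:
            kept.append(j)
        else:
            dependent.append(j)
    return dependent


# Hypothetical columns: principal, interest, total (= principal + interest), term.
loans = np.array([
    [1000, 50, 1050, 12],
    [2000, 80, 2080, 24],
    [3000, 200, 3200, 12],
    [4000, 90, 4090, 36],
])
print(dependent_columns(loans))  # column 2 is the sum of columns 0 and 1
```

A row where a normally dependent column (such as the total) stops matching the others would be a natural candidate for a fake entry, and a column that is always dependent can be dropped before fitting a model.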