How we used MongoDB: We used MongoDB to build a NoSQL organizational data warehouse for our project. We ran an extract, transform, load (ETL) process on a comma-separated-value (CSV) file sourced from Charlottesville PD Crime Data resources, using several Python libraries, including pandas (for data manipulation) and PyMongo (to connect to MongoDB). The original CSV file is converted to a readable table and uploaded to an Atlas server we created. We then split it into condensed, topical dimension tables for location, incident type, and time. We transform each of these and break them down further into narrower columns for easy querying. The location dimension table is formed by merging the Charlottesville PD data with .txt/.csv files that we generated by repeatedly running coordinate generation and cluster matching algorithms on our address data. Finally, we create a star schema with fact_crime as the central fact table for easy reading.
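As a rough illustration of the dimension and fact split described above, here is a minimal pandas sketch. The sample rows, the column names (IncidentID, Offense, BlockNumber, StreetName, DateReported), the surrogate key names, and the crime_warehouse database name are all assumptions made for the example, not the real CSV header or our actual code. The load_to_mongo helper is hypothetical too, and is only defined here, not called, since it needs PyMongo and a reachable server.

```python
import pandas as pd

# Hypothetical sample rows; the real CSV has different and more columns.
raw = pd.DataFrame({
    "IncidentID": [1, 2, 3, 4],
    "Offense": ["Larceny", "Vandalism", "Larceny", "Assault"],
    "BlockNumber": ["100", "200", "100", "300"],
    "StreetName": ["MAIN ST", "MARKET ST", "MAIN ST", "RIDGE ST"],
    "DateReported": ["2023-01-05 14:30", "2023-01-05 09:10",
                     "2023-02-11 22:45", "2023-02-12 01:05"],
})

# Transform: split the timestamp into separate columns for easy querying.
reported = pd.to_datetime(raw["DateReported"])
raw["Year"] = reported.dt.year
raw["Month"] = reported.dt.month
raw["Day"] = reported.dt.day
raw["Hour"] = reported.dt.hour


def build_dimension(df, columns, key_name):
    """Return the distinct combinations of `columns` plus a surrogate integer key."""
    dim = df[columns].drop_duplicates().reset_index(drop=True)
    dim.insert(0, key_name, range(1, len(dim) + 1))
    return dim


location_cols = ["BlockNumber", "StreetName"]
incident_cols = ["Offense"]
time_cols = ["Year", "Month", "Day", "Hour"]

dim_location = build_dimension(raw, location_cols, "location_id")
dim_incident_type = build_dimension(raw, incident_cols, "incident_type_id")
dim_time = build_dimension(raw, time_cols, "time_id")

# The fact table keeps one row per incident and only the keys into each dimension.
fact_crime = (
    raw.merge(dim_location, on=location_cols)
       .merge(dim_incident_type, on=incident_cols)
       .merge(dim_time, on=time_cols)
       [["IncidentID", "location_id", "incident_type_id", "time_id"]]
)


def load_to_mongo(tables, uri, db_name="crime_warehouse"):
    """Write each table to its own collection (hypothetical helper, not called here)."""
    from pymongo import MongoClient  # imported lazily so the transform runs without it

    db = MongoClient(uri)[db_name]
    for name, table in tables.items():
        db[name].delete_many({})  # replace any previous load of this table
        db[name].insert_many(table.to_dict("records"))
```

With a connection string in hand, the tables could then be loaded with something like load_to_mongo({"dim_location": dim_location, "fact_crime": fact_crime}, uri). Keeping only keys in fact_crime means a query joins out to a dimension only when it needs the descriptive columns.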
Note that we are using real-world data, which contains NaN values and invalid street and block names.
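One possible way to flag such rows before building the location table is sketched below. This is an assumption about how the cleanup could be done, not a description of our pipeline: the BlockNumber and StreetName column names, the sample values, and the address_valid flag are all made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical address rows showing the kinds of problems real data can contain.
addresses = pd.DataFrame({
    "BlockNumber": ["100", np.nan, "abc", "2500"],
    "StreetName": ["MAIN ST", "MARKET ST", None, "  "],
})


def clean_addresses(df):
    """Return a copy with unparseable values set to NaN and a validity flag added."""
    out = df.copy()
    # Block numbers that are not numeric become NaN instead of raising an error.
    out["BlockNumber"] = pd.to_numeric(out["BlockNumber"], errors="coerce")
    # Trim whitespace and treat empty street names as missing.
    street = out["StreetName"].str.strip()
    out["StreetName"] = street.where(street.str.len() > 0)
    # A row is usable for geocoding only if both parts of the address survived.
    out["address_valid"] = out["BlockNumber"].notna() & out["StreetName"].notna()
    return out


cleaned = clean_addresses(addresses)
```

Flagging rows instead of dropping them keeps the incident counts intact, so invalid addresses can still be counted in the fact table even if they cannot be matched to coordinates.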