Inspiration

Every business owner, whether a mechanic moving to Rancho Santa Margarita or a pet lover who wants to open a pet store, faces the high-stakes decision of where to set up shop. We want to prevent businesses from failing simply because they chose the wrong spot.

What it does

BUSI takes a user's business idea and calculates a viability score Y for each latitude/longitude point in a target area using weighted parameters. It then uses the Overpass API to search for commercially zoned areas within 50 miles of the residential targets and find the best physical location.
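As a rough illustration of the Overpass step, the sketch below builds an Overpass QL query for commercial land use within 50 miles of a point. The `landuse=commercial` tag, the function name, and the sample coordinates (an approximate point in Rancho Santa Margarita) are assumptions for illustration, not necessarily what BUSI uses.

```python
MILES_TO_METERS = 1609.344  # Overpass "around" radii are given in meters


def build_overpass_query(lat, lon, radius_miles=50.0):
    """Return an Overpass QL query for commercial land use near (lat, lon).

    Assumption: commercial zoning is approximated by the OpenStreetMap
    tag landuse=commercial.
    """
    radius_m = int(radius_miles * MILES_TO_METERS)
    return (
        "[out:json][timeout:60];\n"
        "(\n"
        f'  way["landuse"="commercial"](around:{radius_m},{lat},{lon});\n'
        f'  relation["landuse"="commercial"](around:{radius_m},{lat},{lon});\n'
        ");\n"
        "out center;"
    )


# Hypothetical residential target (approximate coordinates).
query = build_overpass_query(33.6409, -117.6031)
print(query)
```

The resulting string could be sent as the `data` field of a POST request to a public Overpass endpoint such as `https://overpass-api.de/api/interpreter`; `out center;` asks for one representative coordinate per area, which is convenient for distance calculations.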

Data Cleaning

We cleaned the Melissa ConsumerData dataset, reducing 44 columns down to the 10 most important parameters, and stored it in an SQLite3 database. We also cleaned two separate Orange County datasets containing detailed ZIP-code-level data and housing statistics, reducing 418 columns down to 40 by removing percentage-based columns and keeping the raw counts instead.
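A minimal sketch of this kind of cleaning, using only the Python standard library. The column names, the `pct_` prefix convention, the table name, and the toy rows are all made up for illustration; the real datasets have different schemas.

```python
import csv
import io
import sqlite3

# Toy stand-in for a raw CSV export (hypothetical columns and values).
RAW_CSV = """zip_code,households,pct_owner_occupied,home_sales
00001,1000,50.0,10
00002,,40.0,20
00003,3000,60.0,30
"""


def clean_rows(reader):
    """Drop percentage columns and skip rows with missing values."""
    keep = [c for c in reader.fieldnames if not c.startswith("pct_")]
    for row in reader:
        if any(row[c] in ("", None) for c in keep):
            continue  # filter out rows containing nulls
        yield {c: row[c] for c in keep}


def store(rows, conn, table="zip_stats"):
    """Create a table from the cleaned rows and return the row count."""
    rows = list(rows)
    if not rows:
        return 0
    cols = list(rows[0])
    conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # a file path would persist the data
stored = store(clean_rows(csv.DictReader(io.StringIO(RAW_CSV))), conn)
columns = [d[0] for d in conn.execute("SELECT * FROM zip_stats").description]
print(stored, columns)
```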

How we built it

Our prediction model, which is the core of our project, relies on the following formula, a modified logistic regression model:

$$ Y = \sum_{i=0}^{n} \left(a_{i,0} + a_{i,1}\right) \cdot y_i $$

$$ y_i = \sum_{j=0}^{k} x_j $$

Y is the viability score, y_i are the individual parameters, and x_j are the categorical indicators that each parameter is converted into. For example, for the parameter y_i = Income, x_0 is the base case, x_1 = (income > $100k annually), and x_2 = (income > $200k annually). The weight term (a_{i,0} + a_{i,1}) combines standardized weights with supplementary LLM-generated weights to correlate parameters like income or hobbies with business needs.
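The formula can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than BUSI's actual code: the base case x_0 is assumed to always count as 1, each further x_j is 1 when the value exceeds the j-th threshold, and the parameter names, thresholds, and weights are invented.

```python
def categorize(value, thresholds):
    """Return [x_0, ..., x_k] for one parameter.

    Assumption: x_0 (the base case) is always 1, and x_j is 1 when
    value exceeds the j-th threshold, otherwise 0.
    """
    return [1] + [1 if value > t else 0 for t in thresholds]


def viability_score(features, thresholds, weights):
    """Compute Y = sum_i (a_i0 + a_i1) * y_i, with y_i = sum_j x_j."""
    total = 0.0
    for name, value in features.items():
        y_i = sum(categorize(value, thresholds[name]))
        a_i0, a_i1 = weights[name]  # the two weights for this parameter
        total += (a_i0 + a_i1) * y_i
    return total


# Hypothetical household record, thresholds, and weights.
features = {"income": 150_000, "dog_owner": 1}
thresholds = {"income": [100_000, 200_000], "dog_owner": [0]}
weights = {"income": (0.5, 0.25), "dog_owner": (1.0, 0.5)}

score = viability_score(features, thresholds, weights)
print(score)  # 0.75 * 2 + 1.5 * 2 = 4.5
```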

We then pick the 5 most viable latitude/longitude locations, use the Overpass API to find the commercial zoning areas near each one, and use the Haversine formula to determine which commercial zone is closest.
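The selection and distance steps might look like the sketch below. The function names, the tuple layouts, and the sample points are hypothetical; only the Haversine formula itself is standard.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def top_candidates(scored, n=5):
    """Return the n highest-scoring (lat, lon, score) tuples."""
    return sorted(scored, key=lambda s: s[2], reverse=True)[:n]


def nearest_zone(point, zones):
    """Return the (lat, lon) zone center closest to point."""
    return min(
        zones,
        key=lambda z: haversine_km(point[0], point[1], z[0], z[1]),
    )


# Made-up scored locations and commercial zone centers.
scored = [(33.60, -117.60, 4.5), (33.70, -117.70, 2.0), (33.65, -117.65, 3.0)]
zones = [(33.61, -117.61), (33.90, -117.90)]

best = top_candidates(scored, n=2)
closest = nearest_zone(best[0][:2], zones)
print(best, closest)
```

Taking the minimum over all zones is linear in the number of zones per candidate, which is fine for a handful of candidates; a spatial index would only matter at much larger scale.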

Challenges we ran into

Developing our custom prediction model was hard: to build our ranking algorithm, we had to convert many quantitative variables into qualitative ones. Data cleaning was another major hurdle, requiring us to filter out nulls and identify useful parameters from massive datasets. We also had to double-check specific parameters like "dogOwner" within the Melissa dataset to ensure accuracy. Additionally, the data contained multiple duplicate latitude and longitude coordinates.

Accomplishments that we're proud of

We are proud of our model's accuracy; in our mechanic case study, the coordinates we predicted were right next to existing car-related stores. We successfully utilized an API that isolates parameters and weights them based on importance.

What we learned

We learned how to use logistic regression to create a custom prediction model and how to clean very large datasets. We also learned how to use the Haversine formula for distance calculations and the Overpass API for zoning lookups.

What's next for BUSI

We plan to scale BUSI to cover the entire Orange County area, moving from specific coordinates to broader ZIP code analysis. Our cleaned Orange County datasets can be merged into the main prediction model, which is built on the Melissa dataset.

Built With

  • gemini
  • melissa-consumer-data
  • oc-housing-detailed-dataset
  • orange-county-home-prices-latest-dataset
  • python