Real Estate Platypuses

Inspiration

We selected the Melissa consumer and property assessment datasets because they offered a rare opportunity to examine the relationship between personal lifestyle characteristics and real housing decisions. Rather than relying on obvious predictors like income alone, we wanted to explore whether seemingly unrelated consumer attributes like pet ownership, outdoor interests, and charitable giving carry meaningful signal about the type of home a person gravitates toward. By combining consumer profile data with verified property assessment records, we could move beyond speculation and test these relationships against real matched consumer-property pairs, ultimately building a model that predicts which properties a consumer is most likely to choose based on who they are, not just what they earn.

What it does

A user enters characteristics about themselves and their household, such as household size or how much they like gardening. The model then runs a nearest-neighbor (KNN) lookup against known consumer-property pairs to estimate which features the user is likely to prefer in a house, such as lot size or number of bedrooms. We also used Omni Analytics to visualize the relationships between variables.
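To make the nearest-neighbor idea concrete, here is a minimal sketch using only the Python standard library. It is not our actual model: the function name `knn_recommend`, the two consumer features, the property attributes, and all sample values are hypothetical and chosen only for illustration.

```python
import math

def knn_recommend(known_pairs, query, k=3):
    """Average the property attributes of the k consumers closest to `query`."""
    # Rank known consumers by Euclidean distance to the query profile.
    ranked = sorted(known_pairs, key=lambda pair: math.dist(pair[0], query))
    neighbors = ranked[:k]
    # Average each property attribute across the chosen neighbors.
    keys = neighbors[0][1].keys()
    return {key: sum(p[1][key] for p in neighbors) / len(neighbors) for key in keys}

# Hypothetical (consumer features, property attributes) pairs.
# Features: [household_size, gardening_interest on a 0-1 scale]
known_pairs = [
    ([2, 0.1], {"bedrooms": 2, "lot_sqft": 3000}),
    ([4, 0.9], {"bedrooms": 4, "lot_sqft": 8000}),
    ([5, 0.8], {"bedrooms": 5, "lot_sqft": 9000}),
    ([1, 0.0], {"bedrooms": 1, "lot_sqft": 1500}),
]

suggestion = knn_recommend(known_pairs, query=[4, 1.0], k=2)
print(suggestion)
```

In this toy example the two closest profiles are the four- and five-person households, so the suggestion is the average of their property attributes.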

How we built it

Our workflow was split across two primary tools. We used Python in VS Code for all data processing and modeling, which included cleaning and encoding both the consumer and property datasets and building our KNN recommendation model. The scripts ran sequentially: data cleaning first, then feature preparation, then model training and validation, and finally generation of the property recommendations. For visualization and presentation of our findings, we used Omni Analytics, which let us connect our cleaned data outputs and build simple dashboards to explore the consumer-to-property relationships and model results in a more accessible, visual format.
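The encoding step matters for KNN because the model compares consumers by distance, so categorical fields need a numeric form and numeric fields need comparable scales. The sketch below shows one common way to do this, with one-hot encoding and min-max scaling, in plain Python. The field names `household_size` and `pet` and the sample rows are assumptions for illustration, not the actual columns of the Melissa datasets.

```python
def min_max_scale(values):
    """Rescale numeric values to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no distance information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Turn a categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded

def encode_consumers(rows):
    """Build one numeric feature vector per consumer row."""
    sizes = min_max_scale([r["household_size"] for r in rows])
    categories, pets = one_hot([r["pet"] for r in rows])
    return categories, [[s] + p for s, p in zip(sizes, pets)]

# Hypothetical consumer rows.
rows = [
    {"household_size": 1, "pet": "none"},
    {"household_size": 3, "pet": "dog"},
    {"household_size": 5, "pet": "cat"},
]

categories, features = encode_consumers(rows)
for row in features:
    print(row)
```

Each output vector starts with the scaled household size, followed by one indicator per pet category in alphabetical order.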

Challenges we ran into

One of our earliest roadblocks was discovering that the consumer and property datasets did not align as cleanly as we expected. The addresses between the two files did not directly match, which created significant confusion around how to structure our training data and define what a "known" consumer-property pair actually meant. We also came in with ambitions for a broader geographic analysis, hoping to identify which regions or counties had the highest concentration of properties suited to a given consumer profile. However, we quickly realized the data was entirely centered within Rancho Santa Margarita, which required us to rethink our location-based question and narrow our scope accordingly. Finally, our initial modeling approach using Random Forests across multiple property targets proved too computationally intensive to run efficiently, pushing us to rethink our architecture and ultimately land on a KNN-based approach that balanced speed with predictive quality.
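To illustrate the kind of address mismatch described above, here is a minimal sketch of normalizing address strings before joining two tables. This is one generic approach, not a record of how the pairs were actually built; the abbreviation table, the `address` and `id` field names, and the sample addresses are all hypothetical.

```python
import re

# Hypothetical, deliberately small abbreviation table.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "drive": "dr", "road": "rd", "lane": "ln"}

def normalize_address(raw):
    """Lowercase, strip punctuation, and abbreviate common street suffixes."""
    text = raw.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

def match_pairs(consumers, properties):
    """Return (consumer, property) pairs whose normalized addresses agree."""
    by_address = {normalize_address(p["address"]): p for p in properties}
    pairs = []
    for c in consumers:
        prop = by_address.get(normalize_address(c["address"]))
        if prop is not None:
            pairs.append((c, prop))
    return pairs

# Hypothetical records from the two files.
consumers = [{"id": "c1", "address": "12 Oak Street"}, {"id": "c2", "address": "9 Elm Rd."}]
properties = [{"id": "p1", "address": "12 OAK ST"}, {"id": "p2", "address": "40 Pine Ave"}]

pairs = match_pairs(consumers, properties)
print([(c["id"], p["id"]) for c, p in pairs])
```

Exact matching on normalized strings is strict by design: anything that still fails to match is left out of the training pairs rather than guessed at.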

Accomplishments that we're proud of

This was our very first datathon, which made every milestone feel rewarding. Despite running into significant technical difficulties along the way, we managed to figure out Omni Analytics within a really tight timeframe, which was honestly not easy given that none of us had used it before. Beyond the tools, this project pushed us to get a lot more comfortable working with large, messy real-world datasets and gave us a much better intuition for machine learning models than we had going in.

What we learned

We learned a lot about building a recommendation model with nearest-neighbor methods. We also learned how to use Omni Analytics to create informative visuals and how to customize them so they look appealing.

What's next for Real Estate Platypuses

In the future, we may revisit this project to refine our model or add more visualizations that better display the relationships between consumer characteristics and the types of houses those consumers may be interested in.