Oasis – Small Business Survival Intelligence

Inspiration

Chicago has lost tens of thousands of small businesses over the past decade. Many were failing not for lack of effort, but because owners lacked access to the early warning signals hidden in data. Business license renewals, health inspections, neighborhood income shifts, and customer activity all carry signals, but they live in separate systems and are not accessible in a usable form.

As a team, we wanted to build something that could bring these signals together and make them actionable. The goal was to give small business owners the same level of risk insight that large firms use, but in a simple, accessible, and multilingual format.


What We Built

Oasis is a small business survival platform that predicts closure risk and explains the underlying factors.

We built a system that:

  • Aggregates data from multiple sources (business licenses, census data, Google reviews, etc.)
  • Trains a machine learning model to predict 12-month closure risk
  • Visualizes results on an interactive map with clear risk levels
  • Provides an AI-generated explanation and action plan for each business

Each business has:

  • A survival score
  • The top contributing risk factors (via SHAP)
  • A generated diagnosis and recommended actions

We also designed the system to be usable in real-world settings:

  • Multilingual support (English, Hindi, Tamil, Mandarin, Spanish)
  • Voice input and audio responses
  • Downloadable PDF reports for offline use
  • Corridor-level summaries for planners and community organizations

How We Built It

Data and Pipeline

We combined six different data sources into a unified dataset. This included:

  • Business license data from the Chicago Data Portal
  • Google Places data for ratings and engagement
  • Census data for socioeconomic context

One of the biggest challenges early on was that the license data did not include coordinates. We built a geocoding pipeline that queried additional endpoints and matched records in batches, which narrowed our dataset from around 1,000 records to ~570 usable, mappable entries.
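A minimal sketch of that batch-and-checkpoint idea. The record shapes, the `license_id`/`address` keys, and the pluggable `geocode_fn` are illustrative stand-ins, not our actual endpoints:

```python
import json
import os

def geocode_batch(records, geocode_fn, checkpoint_path="geocode_checkpoint.json",
                  batch_size=50):
    """Geocode records in batches, checkpointing after each batch so a crash
    or rate limit does not lose completed work."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for i in range(0, len(records), batch_size):
        # Skip anything the checkpoint already covers
        batch = [r for r in records[i:i + batch_size] if r["license_id"] not in done]
        for rec in batch:
            coords = geocode_fn(rec["address"])  # may return None on failure
            if coords is not None:
                done[rec["license_id"]] = coords
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    # Keep only records we could actually place on the map
    return [dict(r, lat=done[r["license_id"]][0], lon=done[r["license_id"]][1])
            for r in records if r["license_id"] in done]

# Demo with a stub geocoder that "fails" on unknown addresses
fake_index = {"1 N State St": (41.882, -87.628)}
records = [{"license_id": "a1", "address": "1 N State St"},
           {"license_id": "b2", "address": "unknown"}]
mapped = geocode_batch(records, fake_index.get, checkpoint_path="demo_ckpt.json")
```

The un-geocodable record is dropped rather than guessed, which is exactly how our dataset shrank from ~1,000 to ~570 mappable entries.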

We engineered features across three levels:

  • Business-level (age, renewal patterns, independent vs. chain status)
  • Customer-level (ratings, review velocity)
  • Neighborhood-level (income, rent, growth trends)
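A toy illustration of how the three feature levels come together, with hypothetical column names rather than our real schema. Business- and customer-level features are derived per row; neighborhood-level features arrive via a join on census tract:

```python
import pandas as pd

# Hypothetical rows; the real data comes from the sources listed above
businesses = pd.DataFrame({
    "license_id": ["a1", "b2"],
    "issue_year": [2015, 2021],
    "is_chain": [0, 1],
    "tract": ["0101", "0202"],
    "avg_rating": [4.2, 3.1],        # customer-level input
    "reviews_last_90d": [18, 2],
})
tracts = pd.DataFrame({
    "tract": ["0101", "0202"],
    "median_income": [62000, 41000],  # neighborhood-level context
    "median_rent": [1450, 1100],
})

features = businesses.merge(tracts, on="tract", how="left")
features["business_age"] = 2024 - features["issue_year"]         # business-level
features["review_velocity"] = features["reviews_last_90d"] / 90  # customer-level
```

The left join keeps every business even when tract data is missing, which matters when census coverage is patchy.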

Modeling

We trained an XGBoost binary classifier to predict whether a business would close within 12 months.

  • Used class weighting to handle imbalance
  • Tuned hyperparameters with randomized search and cross-validation
  • Evaluated on a held-out test set

We also integrated SHAP to generate per-business explanations. This allowed us to surface the top three factors driving each prediction, making the model outputs interpretable rather than just numerical.

We explicitly removed features that could cause data leakage, such as renewal delays and nearby closures, to ensure the model remained realistic.


Backend and AI Layer

The backend was built with FastAPI. Instead of running predictions per request, we computed all predictions upfront and stored them in memory for fast lookup.

For the AI assistant:

  • We constructed structured prompts using model outputs and SHAP explanations
  • Used Gemini as the primary model and Groq as a fallback
  • Enforced strict JSON outputs for consistency
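A sketch of the fallback-plus-strict-JSON pattern, with stub callables standing in for the real Gemini and Groq clients (the schema keys here are illustrative):

```python
import json

def ask_with_fallback(prompt, primary, fallback, retries=1):
    """Try the primary model, fall back on failure, and insist on valid JSON.

    `primary` and `fallback` are callables taking a prompt string and
    returning a text response -- stand-ins for the real model clients.
    """
    for model in (primary, fallback):
        for _ in range(retries + 1):
            try:
                data = json.loads(model(prompt))
                # Enforce the schema the frontend expects
                if {"diagnosis", "actions"} <= data.keys():
                    return data
            except (json.JSONDecodeError, RuntimeError):
                continue
    raise RuntimeError("All models failed to return valid JSON")

# Simulated: the primary "model" errors out, the fallback returns valid JSON
def flaky_primary(prompt):
    raise RuntimeError("rate limited")

def stable_fallback(prompt):
    return '{"diagnosis": "Review volume is falling.", "actions": ["Respond to reviews"]}'

plan = ask_with_fallback("Explain risk for business a1", flaky_primary, stable_fallback)
```

Validating the schema, not just the JSON syntax, is what keeps malformed multilingual responses from reaching the frontend.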

We also implemented:

  • Voice transcription using Whisper
  • Text-to-speech using ElevenLabs
  • PDF generation using ReportLab with multilingual font support

Frontend

The frontend was built with React, Mapbox, and Three.js.

We focused on making the experience intuitive and visually clear:

  • Color-coded risk markers (high, medium, low)
  • Interactive map with 3D elements
  • Sliding panel showing business insights
  • Animated score visualization

Challenges

One of the biggest challenges was working with incomplete and inconsistent data.

  • License data lacked coordinates, requiring a custom geocoding pipeline
  • API calls (Google Places) failed silently at scale, requiring fallback handling and checkpointing
  • Inconsistent interpretation of risk scores across the backend, frontend, and AI responses, which caused repeated bugs

We also ran into issues with:

  • Multilingual AI responses returning inconsistent formats
  • Unicode rendering in PDF generation
  • Synchronizing assumptions across team members working on different parts of the stack

What We Learned

As a team, we learned that the hardest part of building a data product is not the model, but the pipeline. Cleaning, joining, and validating data took significantly more effort than training the model itself.

We also saw that:

  • Explainability (SHAP) is essential for making ML outputs useful
  • Structured prompting and fallback models improve reliability in AI systems
  • Building end-to-end systems requires constant coordination across frontend, backend, and ML components

What’s Next

We plan to:

  • Expand coverage to all Chicago businesses
  • Incorporate additional signals such as lease timing and real-time trends
  • Build a generalized version that can be deployed for other cities

The long-term goal is to make this type of risk intelligence accessible to any small business, not just large organizations.
