RedwoodAI - AI-Based PII & Climate-Aware Underwriting Engine
Inspiration
First American's Sequoia initiative demonstrated how AI could transform title services by applying automated underwriting to residential purchase transactions. As they refine this technology for a national rollout beginning in 2026, we identified two massive gaps in the current landscape: the reliance on manual PII redaction, and the lack of integrated, property-specific climate risk data during the underwriting phase. RedwoodAI was inspired by the need to close both gaps with a dual-purpose engine that actively sanitizes sensitive data while assessing localized environmental risks.
How We Built It
We architected RedwoodAI as a full-stack MERN application with a heavy emphasis on real-time machine learning processing.
The Three-Tier PII Pipeline
To handle the complex, unstructured nature of real estate documents, we built a hybrid pipeline utilizing @xenova/transformers:
- BERT NER: We deployed an ONNX-optimized BERT model to perform Named Entity Recognition (NER), specifically targeting person, organization, and location entities within the text.
- Regex Patterns: We implemented strict regular expressions as a secondary net to catch deterministic PII like SSNs, phone numbers, emails, credit cards, account numbers, and dates of birth.
- Contextual Heuristics: We applied a final algorithmic layer to resolve conflicts between the ML and RegEx outputs.
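As a rough illustration of the regex tier, the sketch below scans text for a few deterministic PII formats and returns character spans. The patterns are simplified examples rather than the project's actual expressions, and the names `PiiSpan`, `PATTERNS`, and `findRegexPii` are hypothetical.

```typescript
// Hypothetical span shape; field names are illustrative, not taken from the project.
interface PiiSpan {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  label: string;
  source: "regex" | "ner";
}

// Simplified example patterns (assumptions); real-world formats vary more than this.
const PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
  { label: "PHONE", pattern: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/ },
  { label: "EMAIL", pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/ },
];

function findRegexPii(text: string): PiiSpan[] {
  const spans: PiiSpan[] = [];
  for (const { label, pattern } of PATTERNS) {
    // Build a fresh global regex per scan so lastIndex always starts at 0.
    const re = new RegExp(pattern.source, "g");
    let match: RegExpExecArray | null;
    while ((match = re.exec(text)) !== null) {
      spans.push({
        start: match.index,
        end: match.index + match[0].length,
        label,
        source: "regex",
      });
    }
  }
  // Return spans in document order so later stages can walk them left to right.
  return spans.sort((a, b) => a.start - b.start);
}
```

Returning offsets instead of matched strings lets the later tiers compare regex hits against NER hits on the same coordinate system.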
ML Climate-Risk Prediction
To predict environmental devaluation risk, we engineered a multi-model MLP (Multilayer Perceptron) ensemble. We process geospatial data into a 12-dimensional feature vector, denoted as $X$:
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{12} \end{bmatrix}$$
This feature vector is fed into our hidden layers to compute the linear prediction $z$, where $W$ is the weight matrix and $b$ is the bias vector:
$$z = W \cdot X + b$$
To output a normalized risk score, we apply a sigmoid function that maps the linear output to a devaluation probability between 0 and 1:
$$P(\text{Devaluation}|X) = \sigma(z) = \frac{1}{1 + e^{-z}}$$
Finally, the model's confidence $C$ for a given prediction is calculated by measuring the probability's distance from the decision boundary:
$$C = 2 \left| P(\text{Devaluation}|X) - 0.5 \right|$$
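The formulas above can be sketched for a single output unit. This is an illustration only: it assumes $W$ reduces to one row of 12 weights and $b$ to a scalar at the output, it omits the hidden layers and the ensemble, and the names `RiskResult` and `scoreDevaluation` are hypothetical.

```typescript
// Hypothetical result shape for one prediction.
interface RiskResult {
  z: number;           // linear output W . X + b
  probability: number; // sigma(z), the devaluation probability
  confidence: number;  // 2 * |P - 0.5|, distance from the decision boundary
}

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Assumption: a single output unit, so the weights form one 12-element row and the bias is a scalar.
function scoreDevaluation(features: number[], weights: number[], bias: number): RiskResult {
  if (features.length !== 12 || weights.length !== 12) {
    throw new Error("expected 12-dimensional feature and weight vectors");
  }
  let z = bias;
  for (let i = 0; i < features.length; i++) {
    z += weights[i] * features[i];
  }
  const probability = sigmoid(z);
  const confidence = 2 * Math.abs(probability - 0.5);
  return { z, probability, confidence };
}
```

With all weights at zero and no bias, $z = 0$, so the probability is 0.5 and the confidence is 0. The confidence approaches 1 as the probability nears 0 or 1.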
Geospatial & SASB Integration
We aligned our risk outputs directly with SASB standards to provide standardized quantitative and qualitative data such as GHG emissions. We visualized this data on the frontend using Recharts and an interactive Leaflet.js map featuring color-coded heatmap overlays.
What We Learned
- JS is surprisingly decent for ML. Running ONNX models straight in Node.js using Transformers.js was a game-changer. We didn't even need to spin up a separate Python microservice!
- Out-of-the-box BERT models get super confused by real estate language. We had to do a lot of heuristic filtering to make them actually understand title documents.
- We also learned that ML may not always be the best solution to a problem. Pairing BERT with good old-fashioned RegEx was the only way to catch all those weird edge-case SSNs and account numbers.
Challenges
- Trying to stitch entity strings back together from BERT's raw token outputs without completely destroying the document's original spacing was honestly brutal.
- We had a ton of overlap where both the ML model and our RegEx would flag the exact same text span, which initially crashed our redaction logic.
- Our frontend kept trying to render the document before the backend finished running the PII detection. Getting the async states to work took way too long.
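One way to handle the overlap problem described above is to merge overlapping spans before redacting, so each stretch of text is replaced exactly once. The sketch below is a minimal illustration of that idea and not necessarily how RedwoodAI resolves it. The `Span` shape and function names are hypothetical, and keeping the label of the earliest span is an arbitrary choice.

```typescript
// Hypothetical span shape: [start, end) character offsets plus an entity label.
interface Span {
  start: number;
  end: number;
  label: string;
}

// Collapse overlapping or duplicate spans into one span each.
function mergeOverlaps(spans: Span[]): Span[] {
  // Sort by start offset; for ties, put the longer span first.
  const sorted = spans.slice().sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: Span[] = [];
  for (const span of sorted) {
    const last = merged[merged.length - 1];
    if (last && span.start < last.end) {
      // Overlap: extend the previous span instead of adding a second one.
      last.end = Math.max(last.end, span.end);
    } else {
      // Copy so the caller's objects are never mutated.
      merged.push({ start: span.start, end: span.end, label: span.label });
    }
  }
  return merged;
}

// Replace each merged span with a bracketed label, leaving all other text untouched.
function redact(text: string, spans: Span[]): string {
  let out = "";
  let cursor = 0;
  for (const span of mergeOverlaps(spans)) {
    out += text.slice(cursor, span.start) + `[${span.label}]`;
    cursor = span.end;
  }
  return out + text.slice(cursor);
}
```

Because the output is built left to right from slices of the original string, text outside the flagged spans keeps its original spacing.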
What's Next
- Catching more PII: Expanding our RegEx library and training the NER on weirder financial identifiers that we missed this time around.
- Smarter risk predictions: Right now we're using a 12-dimensional vector, but we want to feed the model even more data points to make the devaluation predictions tighter.
- More climate factors: We definitely want to add layers for things like long-term water scarcity, soil subsidence, etc.
Built With
- bert-base-ner
- express.js
- leaflet.js
- mongodb
- multilayer-perceptron-model
- node.js
- react
- recharts
- regex
- typescript
