Inspiration
The idea came from watching my roommate, a project engineer at Teichert (California's largest solar construction firm), manually review HCSS timesheets. He reviews seven to ten per week, at ten to fifteen minutes each, and the forms are repetitive and structured enough to be well suited to automation. Construction remains one of the few large industries largely untouched by AI, despite an estimated $60B in annual losses tied to clerical and numerical errors. Timesheets are tabular, follow predictable schemas, and have well-defined anomaly classes, which makes them a natural entry point for agentic AI in the space.
What it does
ConstructIQ ingests an HCSS timesheet PDF and returns a ranked, explainable audit report in roughly 90 seconds. The pipeline runs parsed records through a hybrid anomaly detection layer: rule-based detectors for structural issues (missing values, categorical mismatches, equipment standby excess) combined with an IsolationForest model deployed on Vertex AI for multivariate numerical outliers. Detected anomalies are then passed to Gemini, which produces natural-language explanations contextualizing each flag for the reviewing project engineer.
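As a rough illustration of the rule-based half of that layer, the sketch below checks one parsed row for the three structural issues named above. The field names, the cost-code set, and the standby threshold are hypothetical placeholders chosen for the example, not the actual HCSS schema or the rules used in the project.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical baseline values; a real deployment would load these
# from the historical cost-code baselines.
VALID_COST_CODES = {"1010", "2040", "3300"}
MAX_STANDBY_RATIO = 0.5  # assumed cap on standby hours per equipment hour


@dataclass
class TimesheetRow:
    # Hypothetical fields for one parsed timesheet line.
    employee_code: Optional[str]
    cost_code: Optional[str]
    equipment_hours: Optional[float]
    standby_hours: Optional[float]


def structural_flags(row: TimesheetRow) -> list[str]:
    """Return a list of flag names for one parsed row."""
    flags = []
    # Missing values: any field the parser left empty.
    for name, value in vars(row).items():
        if value is None:
            flags.append(f"missing:{name}")
    # Categorical mismatch: cost code absent from the known set.
    if row.cost_code is not None and row.cost_code not in VALID_COST_CODES:
        flags.append("unknown_cost_code")
    # Equipment standby excess: standby share above the assumed cap.
    if row.equipment_hours and row.standby_hours is not None:
        if row.standby_hours / row.equipment_hours > MAX_STANDBY_RATIO:
            flags.append("standby_excess")
    return flags


clean = TimesheetRow("E12", "1010", 8.0, 2.0)
suspect = TimesheetRow(None, "9999", 8.0, 6.0)
print(structural_flags(clean))
print(structural_flags(suspect))
```

Rows that pass these checks would still go to the IsolationForest model, since a row can be structurally valid and numerically unusual at the same time.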
How we built it
Uploaded timesheets are parsed via the Reducto OCR API, chosen because HCSS forms contain small-font alphanumeric codes that generic OCR engines extract unreliably. Parsed records are written to MongoDB Atlas alongside historical cost-code baselines, integrated through the MCP partner track. The detection pipeline runs structural rule-based checks against those baselines and forwards numerical features to a Vertex AI endpoint hosting an IsolationForest model for outlier scoring. Flags are then passed to Gemini 2.5 Flash, which generates the audit narrative. The pipeline is served by a FastAPI backend orchestrated through Google Cloud Agent Builder, with a Streamlit frontend deployed on Cloud Run.
Challenges we ran into
Real timesheet data was scarce, so we built our corpus through synthetic generation calibrated against a small set of real HCSS cards shared by a project engineer, who also reviewed our synthetic distribution for realism. The largest technical challenge was adapting IsolationForest to HCSS structure: many fields, such as employee codes and equipment codes, are numerically encoded but semantically categorical, and treating them as continuous variables corrupts the anomaly score. Compound hour entries such as "8/2.5" (8 regular hours and 2.5 overtime hours) required custom preprocessing rather than naive float conversion. We also hit integration friction between the Agent Platform API and our Vertex AI endpoint, which we resolved by carefully aligning the training-time and serving-time feature schemas.
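A minimal sketch of the two preprocessing steps described above: splitting a compound hour entry into regular and overtime hours, and encoding a categorical code so its numeric value carries no ordering. The function names are hypothetical, one-hot encoding is one common option rather than necessarily the encoding we used, and the fallback for entries without a slash is an assumption.

```python
def parse_hours(cell: str) -> tuple[float, float]:
    """Split an entry like "8/2.5" into (regular, overtime) hours.

    An entry without a slash is treated as regular hours only; this
    fallback is an assumption, not a documented HCSS rule.
    """
    regular, _, overtime = cell.strip().partition("/")
    overtime_hours = float(overtime) if overtime else 0.0
    return float(regular), overtime_hours


def one_hot(code: str, vocabulary: list[str]) -> list[float]:
    """Encode a categorical code as indicator columns."""
    return [1.0 if code == known else 0.0 for known in vocabulary]


def to_features(cell: str, equipment_code: str, vocabulary: list[str]) -> list[float]:
    """Build one numeric feature vector from an hour entry and an equipment code."""
    regular, overtime = parse_hours(cell)
    return [regular, overtime] + one_hot(equipment_code, vocabulary)


# Hypothetical equipment codes, used only for illustration.
equipment_vocabulary = ["4400", "4410"]
print(to_features("8/2.5", "4410", equipment_vocabulary))
```

Using the same `to_features` function at training time and at serving time is one way to keep the two feature schemas aligned.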
Accomplishments that we're proud of
We are proud of building an HCSS-specific anomaly pipeline that respects the structural and categorical nuances of the format rather than applying off-the-shelf detection to ill-suited data. Repeated review cycles with an active project engineer gave us domain grounding that is uncommon for a hackathon project. Beyond the build itself, the work clarified a broader thesis: timesheet auditing is one of many administrative bottlenecks in construction that are well-suited to agentic AI, and our architecture generalizes naturally to invoice review, dispatch logs, and equipment utilization tracking.
What we learned
Technically, we worked end-to-end with the GCP stack (Vertex AI endpoints, IAM, Cloud Run, billing) and built a clearer mental model of how Agent Builder orchestrates external tools. We also learned the operational differences between IAM-style access control and MongoDB Atlas's organization-level access model, which is a non-obvious distinction when integrating both into a single pipeline. On the algorithmic side, we engaged seriously with the mechanics of IsolationForest itself: how it isolates points through random partitioning rather than density estimation, and why that property makes it highly sensitive to feature-encoding choices, since every split treats a feature's numeric value as meaningful even when it is only a label. With mathematical backgrounds across the team, working through the underlying distribution-shift logic made the choice of algorithm deliberate rather than incidental.
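The random-partitioning idea can be sketched in plain Python. This is a simplified illustration, not the scikit-learn implementation: it omits subsampling and the path-length normalization that the real algorithm uses, and all names here are hypothetical. It counts how many random axis-aligned splits are needed before a point is alone; points that isolate in fewer splits are more anomalous.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Count random splits needed to separate `point` from the rest of `data`.

    `point` must be one of the rows in `data`.
    """
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary within this partition can be split.
    candidates = [
        f for f in range(len(point))
        if min(row[f] for row in data) != max(row[f] for row in data)
    ]
    if not candidates:
        return depth
    feature = rng.choice(candidates)
    values = [row[feature] for row in data]
    split = rng.uniform(min(values), max(values))
    # Keep only the rows that fall on the same side of the split as the point.
    if point[feature] < split:
        same_side = [row for row in data if row[feature] < split]
    else:
        same_side = [row for row in data if row[feature] >= split]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_depth(point, data, trees=200, seed=0):
    """Average isolation depth over many random partitionings."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(trees)) / trees


# Illustrative (regular hours, overtime hours) rows plus one extreme row.
cluster = [(8 + 0.1 * (i % 5), 1 + 0.1 * (i // 5)) for i in range(25)]
outlier = (16.0, 9.0)
data = cluster + [outlier]
print(mean_depth(outlier, data), mean_depth(cluster[12], data))
```

The same mechanism explains the encoding sensitivity: if an employee code such as 9001 sits in a column next to codes near 100, a random split will separate it almost immediately, so it scores as anomalous even though the number is only a label.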
What's next for ConstructIQ
The most immediate next step is generalizing the pipeline beyond HCSS. Most construction firms use one of a small number of timesheet platforms, so supporting two or three would substantially expand the addressable user base. With additional training data and compute, we expect to improve the agreement rate between our model and a human project engineer, which currently sits between 65% and 75%. Longer term, ConstructIQ already maintains historical records of who works on which project, which opens the door to resource allocation features: modeling crew composition, task affinity, and team chemistry to assist with project staffing. Construction has remained largely AI-naive despite being one of the most administratively dense industries, and we see ConstructIQ as one of many natural entry points for agentic systems to deliver measurable productivity gains.