Benchmarking Bias in College Application Review

Colleges and universities commonly require personal essays and recommendation letters as part of the admissions process. Artificial intelligence (AI) models trained on historical records can learn and reproduce systemic biases present in those data. As a growing number of institutions adopt AI-based systems to evaluate application materials, it is increasingly important to understand whether demographic signals embedded in essays and letters influence model judgments. Prior work suggests that admissions essays and recommendation letters can contain subtle biases that shape perceptions of applicants’ personal attributes.

In this study, we propose a benchmark for measuring gender- and race-related bias in AI-assisted college admissions evaluation. We developed a scoring framework that assesses admissions essays along four dimensions: grammar, tone, academic readiness, and life context. To isolate demographic effects, we systematically varied personal attributes, including the applicant’s name, hometown, and volunteer organization, while holding the essay content constant. Across the models we tested, we observed minimal overall bias, with a slight preference for Black applicants. However, we also identified unintended bias in our synthetic data that disadvantaged white applicants, and this likely contributed to the observed preference.
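The attribute-swapping design described above can be illustrated with a minimal sketch. It is not the study's implementation: the `Profile` fields, the template placeholders, the `stub_scorer` function, and the gap metric (group mean minus the mean over all variants) are assumptions introduced here for illustration. In practice, the scorer would wrap a call to the model under evaluation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# The four scoring dimensions named in the text.
DIMENSIONS = ("grammar", "tone", "academic_readiness", "life_context")

Scores = Dict[str, float]


@dataclass(frozen=True)
class Profile:
    """Hypothetical bundle of demographic signals swapped into an essay."""
    group: str          # label used only for aggregation
    name: str
    hometown: str
    organization: str


def build_variants(template: str, profiles: List[Profile]) -> Dict[Profile, str]:
    """Fill one essay template with each profile; all other text is identical."""
    return {
        p: template.format(name=p.name, hometown=p.hometown,
                           organization=p.organization)
        for p in profiles
    }


def score_gaps(template: str, profiles: List[Profile],
               scorer: Callable[[str], Scores]) -> Dict[str, Scores]:
    """Per-group mean score minus the mean over all variants, per dimension."""
    variants = build_variants(template, profiles)
    scored = {p: scorer(text) for p, text in variants.items()}
    overall = {d: mean(s[d] for s in scored.values()) for d in DIMENSIONS}
    gaps: Dict[str, Scores] = {}
    for group in sorted({p.group for p in profiles}):
        members = [s for p, s in scored.items() if p.group == group]
        gaps[group] = {d: mean(s[d] for s in members) - overall[d]
                       for d in DIMENSIONS}
    return gaps


def stub_scorer(essay: str) -> Scores:
    """Placeholder scorer (assumption): ignores the essay, returns 3.0 everywhere."""
    return {d: 3.0 for d in DIMENSIONS}


if __name__ == "__main__":
    template = ("My name is {name}. I grew up in {hometown} and "
                "volunteer with {organization}.")
    profiles = [
        Profile("group_a", "Name A", "Town A", "Organization A"),
        Profile("group_b", "Name B", "Town B", "Organization B"),
    ]
    print(score_gaps(template, profiles, stub_scorer))
```

Because every variant shares the same essay text, any nonzero gap can be attributed to the swapped attributes, provided the attributes themselves are balanced across groups. If the synthetic names, hometowns, or organizations are not balanced, they can introduce a bias of their own.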