Inspiration
Our inspiration came from public-sector technology failures where well-intentioned initiatives created serious harms for the communities they were meant to serve. A prime example is Australia’s Robodebt scheme, where an automated welfare debt-recovery system was introduced to improve efficiency and detect overpayments but ended up causing severe harm to welfare recipients. The system wrongfully recovered money from hundreds of thousands of people, many of whom lacked the resources, documentation, digital access, or legal support needed to challenge the decisions.
Looking at similar cases around the world, such as the Dutch childcare benefits scandal and Michigan’s MiDAS unemployment system, we saw a common pattern: decision-makers often evaluate projects through cost, feasibility, and intended benefits, while indirect social costs remain invisible until harm has already occurred. This risk becomes even sharper when new technologies are introduced into public systems that many stakeholders cannot easily inspect, question, or contest.
We built our project to help civic leaders ask, “What might we be missing?” before full public release, when problems can still be addressed at the root. By combining public evidence, structured intake, and human-in-the-loop review, we aim to surface potential externalities early so public innovation can be judged not only by whether it works, but by who it might burden, exclude, or leave behind.
What it does
Our product is a policy decision-support system that helps public institutions and civic organizations think through the hidden externalities of proposed AI or technology-enabled initiatives before they move forward. It takes a proposal, structures the key context, routes it through a staged evidence pipeline, and then surfaces evidence-backed risks, affected groups, uncertainty, and human-review questions in clear language. The goal is not to approve or reject a policy, but to broaden what decision-makers consider so they can make more informed and more equitable choices. In short, it turns a vague civic AI idea into a grounded externality review that highlights impacts on access, equity, workforce, language, and local community conditions.
How we built it
We built PolicyLens around two main parts: a public-data evidence layer and a guided AI review pipeline. First, we collected and cleaned public datasets for Cincinnati and Hamilton County, including ACS, BLS, CDC/ATSDR SVI, and O*NET. We transformed these raw datasets into a structured evidence store that the system can search and cite. Instead of letting the AI make unsupported claims, the evidence layer gives it prepared records about local demographics, digital access, workforce context, social vulnerability, language access, housing conditions, and occupation-level job characteristics.
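As a rough illustration of what a searchable, citable evidence store could look like, here is a minimal sketch. The `EvidenceRecord` fields, the record IDs, the metric descriptions, and the `search_evidence` and `cite` helpers are all hypothetical names chosen for this example, not the project's actual schema, and the values are deliberately left empty so that no statistics are implied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvidenceRecord:
    """One prepared record in the evidence store (hypothetical schema)."""
    record_id: str
    source: str        # e.g. "ACS", "BLS", "CDC/ATSDR SVI", "O*NET"
    geography: str
    topic: str
    metric: str
    value: Optional[float] = None  # left empty here; no real statistics are shown


def search_evidence(store: List[EvidenceRecord], topic: str, geography: str) -> List[EvidenceRecord]:
    """Return records that match both the topic and the geography exactly."""
    return [r for r in store if r.topic == topic and r.geography == geography]


def cite(record: EvidenceRecord) -> str:
    """Format a short citation string that a report can attach to a claim."""
    return f"[{record.record_id}] {record.source}, {record.geography}: {record.metric}"


# Illustrative records only; IDs and metric names are assumptions.
STORE = [
    EvidenceRecord("acs-001", "ACS", "Hamilton County", "digital_access",
                   "households without broadband"),
    EvidenceRecord("svi-001", "CDC/ATSDR SVI", "Hamilton County", "social_vulnerability",
                   "overall vulnerability ranking"),
    EvidenceRecord("onet-001", "O*NET", "national", "workforce",
                   "occupation task characteristics"),
]

hits = search_evidence(STORE, "digital_access", "Hamilton County")
citations = [cite(r) for r in hits]
```

Keeping the citation string tied to a record ID is one way to let later stages check that every claim points back to a stored record rather than to the model's general knowledge.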
Second, we designed the AI workflow as a sequence of smaller agents rather than one large chatbot. The system first checks whether the user’s proposal fits the civic externalities-review scope. It then checks whether enough basic project information has been provided, such as the location, affected users, public service area, implementing actor, and stage of implementation. If important information is missing, it asks targeted follow-up questions.
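The completeness check described above can be sketched as a simple required-fields pass. This is an assumption-laden illustration: the field keys, the question wording, and the `missing_fields` and `follow_up_questions` helpers are hypothetical, and the real system may well use a model rather than a dictionary lookup to decide what is missing.

```python
from typing import Dict, List, Optional

# Hypothetical field keys mapped to the follow-up question asked when one is missing.
REQUIRED_FIELDS: Dict[str, str] = {
    "location": "Where will the project operate?",
    "affected_users": "Who will use or be affected by it?",
    "service_area": "Which public service area does it touch?",
    "implementing_actor": "Who will implement it?",
    "implementation_stage": "What stage of implementation is it in?",
}


def missing_fields(intake: Dict[str, Optional[str]]) -> List[str]:
    """List required fields that are absent, None, or blank, in a stable order."""
    return [name for name in REQUIRED_FIELDS
            if not str(intake.get(name) or "").strip()]


def follow_up_questions(intake: Dict[str, Optional[str]]) -> List[str]:
    """Ask only about what is missing, so complete intakes get no questions."""
    return [REQUIRED_FIELDS[name] for name in missing_fields(intake)]


# Example intake with a blank field and two fields never provided.
intake = {
    "location": "Cincinnati",
    "affected_users": "   ",
    "service_area": "benefits enrollment",
}
gaps = missing_fields(intake)
questions = follow_up_questions(intake)
```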
Once the intake is complete, later agents extract the project context, decide which risk areas are relevant, retrieve matching public evidence, and draft a structured Community Externalities Review. A governance reviewer then checks the draft for unsupported claims, overstatement, missing uncertainty, or language that sounds like final policy advice. Only after this review does the system prepare the final user-facing report.
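To make the governance gate concrete, here is a minimal rule-based sketch of a reviewer that blocks a draft until its claims are supported and non-advisory. Everything in it is an assumption for illustration: the `Claim` structure, the `ADVISORY_PHRASES` list, and the `governance_review` and `finalize` functions are hypothetical, and a real reviewer agent would likely rely on a model rather than fixed phrase matching.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Claim:
    """One drafted statement plus the evidence record IDs that support it."""
    text: str
    evidence_ids: List[str] = field(default_factory=list)


# Hypothetical phrases that make a draft sound like a final policy decision.
ADVISORY_PHRASES = ("should approve", "should reject", "must adopt")


def governance_review(claims: List[Claim]) -> List[str]:
    """Return a list of issues; an empty list means the draft passes."""
    issues = []
    for index, claim in enumerate(claims):
        if not claim.evidence_ids:
            issues.append(f"claim {index}: no supporting evidence cited")
        lowered = claim.text.lower()
        if any(phrase in lowered for phrase in ADVISORY_PHRASES):
            issues.append(f"claim {index}: reads like final policy advice")
    return issues


def finalize(claims: List[Claim]) -> Dict[str, object]:
    """Release the report only when the governance review raises no issues."""
    issues = governance_review(claims)
    if issues:
        return {"status": "needs_revision", "issues": issues}
    return {"status": "ready", "claims": [c.text for c in claims]}


draft = [
    Claim("Households without broadband may struggle with an online-only form.",
          ["acs-001"]),
    Claim("The city should approve the rollout."),
]
result = finalize(draft)
```

Returning the issues instead of silently editing the draft keeps the reviewer's role narrow, which matches the staged design: one step drafts, a separate step checks.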
This structure made the project easier to debug, easier to validate, and safer to use. Each step has a narrow role, and the final output remains grounded in public evidence rather than relying only on the model’s general knowledge.
Challenges we ran into
One major challenge was turning messy proposal text into something the system could actually evaluate. Early-stage civic ideas are often incomplete: users may describe the goal, but leave out details such as who is affected, where the project will operate, what process will change, or how the technology will be used. We had to design an intake process that could separate useful information from vague background context and ask follow-up questions only when needed.
Another challenge was deciding what risks the system should focus on. Public-sector AI projects can create many kinds of externalities, but not all of them can be measured responsibly with available data. For the MVP, we focused on areas that were both socially meaningful and supportable with public evidence, such as digital access, language and accessibility barriers, workforce disruption, equity vulnerability, housing pressure, and local service context.
The data layer also required careful scoping. We chose Cincinnati and Hamilton County as the first test geography so that the system could be locally grounded instead of making broad national claims. This meant cleaning and organizing public datasets into a format that the AI pipeline could reliably use, while preserving limitations and uncertainty.
Finally, we had to decide how much autonomy the AI should have. We did not want the system to become an approval engine or a black-box scoring tool. To reduce that risk, we split the workflow into smaller stages, added evidence grounding, and included a governance review step before the final report is shown to the user.
Overall, the hardest part was not simply building an AI that could generate text. It was building a workflow that could take an incomplete civic proposal and turn it into a structured, evidence-backed review that remains understandable, cautious, and useful for human decision-makers.
Accomplishments that we're proud of
We are especially proud that we turned a fairly open-ended civic AI idea into a complete, working pipeline with clear stages, guardrails, and a real data foundation. Rather than relying on a single model to do everything, we designed a modular system that separates relevance checking, objective extraction, evidence gathering, synthesis, and final report writing. That made the project much more reliable and much easier to reason about.
We are also proud of the data layer we built. We constrained the MVP to Cincinnati and Hamilton County, Ohio, and assembled a preprocessing pipeline around public, defensible sources like ACS, BLS, O*NET, and CDC/ATSDR SVI. That gave us a grounded local evidence base while still leaving the architecture flexible enough to scale to new places and additional datasets later.
Another major accomplishment was the prompt and schema design work. We spent a lot of time deciding how to split the problem across agents, what each agent should see, and what each one should output. The result is a system with minimal redundancy and clean handoffs between steps, which is important both for reliability and for interpretability.
We are also proud that, in a short time, we built a proof-of-concept website along with a logo, mission statement, and catchphrase. That helped us translate the technical work into a clear public-facing product concept and made the project feel real and presentable beyond the backend pipeline.
Finally, we are proud that the final output is not just technically structured but also human-readable and policy-facing, creating value for communities and encouraging critical thinking. The system is designed to help decision-makers see externalities, uncertainty, and overlooked risks in a way that supports human judgment instead of replacing it.
What we learned
We learned that civic AI works best when it is treated as a decision-support system, not as a one-shot answer machine. The hardest part was not just generating text, but designing a workflow that can handle incomplete proposals, surface missing context, and stay grounded in evidence rather than speculation.
We also learned that scope matters a lot. Narrowing the first version to Cincinnati and Hamilton County made the system much more defensible, because it let us rely on public data we could actually preprocess well and test locally. That local focus also made it clearer which variables mattered, which data sources were strong enough to trust, and where uncertainty had to be preserved.
Another lesson was that complex civic questions are better handled through modular steps. Splitting the pipeline into smaller agents made the system easier to debug, easier to evaluate, and easier to keep aligned with the user’s actual needs. It also helped us keep the prompts and schemas clean, with each step responsible for one clear job.
Finally, we learned that the most useful output is not a score, but a structured explanation. Policymakers do not just need an answer; they need to see what was considered, what is still unknown, and what human review questions should come next. That shaped both the data pipeline and the LLM design.
What's next for PolicyLens
Next, we want to turn the prototype into a more polished pilot and test it with real civic use cases. The immediate focus is to strengthen the user experience, validate the outputs with feedback from policymakers and community stakeholders, and make sure the system continues to surface the kinds of externalities that matter most in practice.
On the technical side, we want to expand the evidence layer with additional public datasets, add more jurisdictions beyond Cincinnati and Hamilton County, and continue improving the modular pipeline so it stays easy to maintain and scale. We also want to refine the final report so it becomes even clearer, more concise, and more useful for decision-makers.
Longer term, the goal is to broaden the set of policy areas the system can support, while keeping the same core principle: human review first, with the AI helping surface overlooked risks, missing context, and relevant evidence.
References
Reuters (2025) ‘Australia agrees to record $309 million payout to victims of illegal debt recovery scheme’, Reuters, 4 September. Available at: Reuters. Accessed: 21 June 2026.
Reuters and Van Den Berg, S. (2021) ‘Dutch government quits over “colossal stain” of tax subsidy scandal’, Reuters, 15 January. Available at: Reuters. Accessed: 21 June 2026.
Associated Press (2023) ‘Judge OKs $20M deal in mess over jobless aid determinations’, AP News, 23 January. Available at: AP News. Accessed: 21 June 2026.