AI-Powered Test-Prep Comparison System

Problem

Students preparing for exams like the SAT, ACT, and LSAT face an overwhelming marketplace of test-prep options. Providers advertise different pricing models, instructional hours, guarantees, success claims, and teaching credentials, but this information is scattered across websites, inconsistently reported, and difficult to compare objectively. Students often rely on anecdotes, ads, or brand familiarity rather than structured evidence when choosing a provider that fits their budget, schedule, and learning preferences.

Solution

We propose a multi-agent AI chatbot that generates a ranked comparison table of test-prep providers tailored to user inputs:

User Inputs:

  • Test type (required): SAT, ACT, LSAT, etc.
  • Budget: the maximum amount the student can spend
  • Mode of instruction: online, in-person, hybrid, self-paced
  • Time commitment: the student's available study time
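
These inputs might be represented with a small structure like the following. This is an illustrative sketch; the field names are assumptions rather than the project's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserInputs:
    """Illustrative input schema; field names are assumed, not taken from the project code."""
    test_type: str                       # required: "SAT", "ACT", "LSAT", ...
    budget: Optional[float] = None       # maximum spend; None when the user enters "NA"
    mode: Optional[str] = None           # "online", "in-person", "hybrid", or "self-paced"
    study_time: Optional[float] = None   # available study hours; None when unspecified
```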

System Operation: Behind the scenes, a team of LLM agents automatically gathers verifiable data from provider websites and public sources, extracts structured facts, and computes an internal ranking. The system is designed not to fabricate information: if a field cannot be verified, it is left blank rather than guessed, and every populated field is cited to its source.
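
For instance, the extraction step can be written so that a field is kept only when its value can be confirmed in the fetched source text. The helper below is a minimal sketch of that rule with a hypothetical function name, not the production extractor:

```python
from typing import Optional

def verified_field(value: Optional[str], source_text: str, source_url: str) -> dict:
    """Keep an extracted value only if it appears verbatim in the source text.
    Hypothetical helper illustrating the 'blank unless verifiable' rule."""
    if value is not None and value in source_text:
        return {"value": value, "source": source_url}  # verified, with citation
    return {"value": None, "source": None}             # unverifiable, so left blank
```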

The chatbot presents users with a transparent, evidence-backed table that allows apples-to-apples comparison across providers.

System Architecture

Instead of relying on a single model to make a complex decision, we designed three specialized AI agents, each acting like an expert with a distinct responsibility:

  1. Budget Agent: Evaluates cost-effectiveness, focusing on price, cost per hour, guarantees, and improvement claims.
  2. Time and Schedule Agent: Analyzes whether a plan can realistically fit within the student's available study time and whether the schedule is flexible or fixed.
  3. Learning Format Agent: Examines whether the teaching style aligns with the student's preferred mode of instruction and class size.
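
Each agent's specialization comes from its prompt rather than from a separate model. The role prompts below are an illustrative sketch; the project's actual prompt wording is not reproduced here:

```python
# Illustrative role prompts; the real prompts are described in the text only at a high level.
AGENT_PROMPTS = {
    "budget": (
        "You are a cost-effectiveness analyst. Given a list of test-prep plans, rank them "
        "by price, cost per instructional hour, guarantees, and score-improvement claims."
    ),
    "schedule": (
        "You are a scheduling advisor. Given the student's available study time, rank plans "
        "by whether they realistically fit and how flexible their schedules are."
    ),
    "format": (
        "You are a learning-format specialist. Rank plans by how well their mode of "
        "instruction and class size match the student's stated preferences."
    ),
}
```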

All agents analyze the same set of plans independently and in parallel, then "vote" on the best option. The final recommendation is based on consensus, simulating how multiple human advisors might independently assess the same choices.

This design reduces single-model bias and makes the reasoning more robust and interpretable.
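
Assuming each agent returns a ranked list of plan IDs, the consensus step can be as simple as tallying first-choice votes and breaking ties by average rank. The sketch below shows one plausible voting rule, not necessarily the exact logic we implemented:

```python
from collections import Counter

def consensus(rankings: list[list[str]]) -> str:
    """Pick the winner from per-agent rankings: most first-place votes, ties broken
    by the best (lowest) average rank. One plausible voting rule; assumes every
    agent ranks the same set of plans."""
    first_votes = Counter(ranking[0] for ranking in rankings)
    avg_rank = {plan: sum(r.index(plan) for r in rankings) / len(rankings)
                for plan in rankings[0]}
    return max(first_votes, key=lambda plan: (first_votes[plan], -avg_rank[plan]))

# Example: "plan_b" wins with two of three first-place votes.
print(consensus([["plan_b", "plan_a", "plan_c"],
                 ["plan_a", "plan_b", "plan_c"],
                 ["plan_b", "plan_c", "plan_a"]]))
```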

Data

A critical component of this project was the creation of a structured database of test-prep plans. Because no public, standardized dataset exists for comparing SAT providers, we manually collected and normalized information from provider websites into a consistent CSV format. This step was essential: without structured data, the AI agents would have nothing reliable to reason over.

Key Design Decision: Each row in our dataset represents a specific test-prep plan offered by a company, rather than just the company itself. This distinction is important because many providers offer multiple plans that differ significantly in cost, duration, format, and guarantees. By treating each plan as its own data entry, we allowed the system to perform fine-grained comparisons rather than coarse company-level judgments.

Variables Recorded for Each Plan:

  • Company and plan name
  • Total cost of the program
  • Total instructional hours
  • Estimated score improvement claim
  • Mode of instruction (live online, self-paced, tutoring, hybrid)
  • Duration of the program
  • Class size (group vs individual)
  • Money-back guarantee availability
  • Key features and support offered
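
For example, a single record (shown here as a Python dict, with fictional values) captures the plan-level granularity described above; the key names are assumptions about the schema:

```python
# Fictional example record; values are illustrative, not real provider data.
example_plan = {
    "company": "ExampleCo",
    "plan_name": "SAT Live Intensive",
    "total_cost_usd": 799,
    "instructional_hours": 30,
    "score_improvement_claim": "+150 points",  # as advertised, not independently verified
    "mode": "live online",                     # live online / self-paced / tutoring / hybrid
    "duration_weeks": 8,
    "class_size": "group",                     # group vs individual
    "money_back_guarantee": True,
    "key_features": ["full-length practice tests", "on-demand video review"],
}
```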

Tools and Technology

Several tools and technologies were used to build this system:

  • Python for orchestration and data handling
  • asyncio and aiohttp for running the agents in parallel
  • Ollama with a locally hosted Llama model for LLM inference
  • Structured JSON datasets to standardize plan information across providers
  • Prompt engineering to specialize each agent's reasoning process
  • Consensus logic to determine the final recommendation
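
A minimal sketch of the parallel-inference pattern, assuming Ollama is running on its default local endpoint and a llama3 model tag has been pulled (the function names and prompt layout are illustrative):

```python
import asyncio
import aiohttp

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

async def ask_agent(session: aiohttp.ClientSession, role_prompt: str, plans_text: str) -> str:
    """Send one agent's specialized prompt to the local Llama model and return its reply."""
    payload = {
        "model": "llama3",  # assumed model tag; any locally pulled model would work
        "prompt": f"{role_prompt}\n\nPlans:\n{plans_text}",
        "stream": False,
    }
    async with session.post(OLLAMA_URL, json=payload) as resp:
        data = await resp.json()
        return data["response"]

async def run_agents(role_prompts: list[str], plans_text: str) -> list[str]:
    """Query all agents concurrently over the same set of plans."""
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(ask_agent(session, prompt, plans_text) for prompt in role_prompts)
        )
```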

Challenges

  1. Data Standardization: Every SAT prep site described its plans differently. Some listed instructional hours and some did not; some stated guarantees plainly while others buried them in marketing copy. We had to manually clean and standardize everything into a usable JSON format before the agents had anything reliable to reason over.
  2. Technical Implementation: Parallel calls to the local model caused delays, timeouts, and synchronization issues that had to be resolved before we could compute any result.
  3. Incomplete User Inputs: Users sometimes enter "NA" for time, budget, or goals, so the system had to handle missing values without breaking the filters (see the sketch after this list).
  4. Frontend-Backend Integration: We currently have a functional frontend and a functional backend, but we have not yet integrated the two.
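
One way to keep the filters robust to missing inputs is to treat "NA" as "no constraint", as in this sketch (the field names match the illustrative record above, not necessarily our actual schema):

```python
def matches(plan: dict, budget: float | None, mode: str | None) -> bool:
    """Apply only the constraints the user actually provided; 'NA' inputs are mapped
    to None and therefore filter nothing. Sketch, not the project's exact filter."""
    if budget is not None and plan["total_cost_usd"] > budget:
        return False
    if mode is not None and plan["mode"] != mode:
        return False
    return True
```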

Future Steps

  1. Automated Data Pipeline: Replace manual data entry with an automated pipeline that uses web scraping and information extraction tools to continuously collect and refresh plan details from provider websites and credible public sources (a rough sketch follows this list). This would allow the database to:

    • Grow significantly
    • Cover more exams (SAT/ACT/LSAT and beyond)
    • Remain up to date as companies change pricing, offerings, or guarantees
  2. Enhanced Interactivity: Make the chatbot more conversational by having the agents engage directly with the user through follow-up questions.
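
As a rough illustration of what one extraction step in such a pipeline might look like, the sketch below pulls a dollar amount from a chosen page element using requests and BeautifulSoup; the URL, CSS selector, and parsing rule are placeholders, since each real provider page would need its own:

```python
import re
import requests
from bs4 import BeautifulSoup

def extract_price(url: str, selector: str) -> float | None:
    """Fetch a provider page and return the first dollar amount in a chosen element.
    Placeholder logic: every real provider page needs its own selector and parsing rules."""
    html = requests.get(url, timeout=10).text
    element = BeautifulSoup(html, "html.parser").select_one(selector)
    if element is None:
        return None  # nothing verifiable found, so the field stays blank
    match = re.search(r"\$([\d,]+(?:\.\d{2})?)", element.get_text())
    return float(match.group(1).replace(",", "")) if match else None
```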
