Inspiration

My partner (Erika) is a dental professional who has seen firsthand the impact that undetected issues, and the poor treatment planning that follows, can have on patient health. A common challenge new dentists face is evaluating all possible problems and required treatments from a case file. In addition, many dentists in lower socioeconomic areas administer care with limited knowledge and understanding, and dentists everywhere are facing significant burnout as dental practices get busier. We hope this GPT helps them diagnose and provide essential care to their communities.

What it does

The Dental Assessment GPT generates an evidence-based dental assessment & plan from a structured case, and qualitatively grades the assistant's output against clinical principles.

The GPT takes inputs such as patient demographics, oral problems, medical history, clinical findings, radiographs, current medications, and habits.

How we built it

We built this using a mix of data collected from dentists via surveys, publicly available data, and specialised problem/treatment dental documentation.

1. Dataset Processing Pipeline

Initial Data Audit

  • Found 2,494 total cases in the dataset
  • Placeholder patterns (e.g., “Patient reports symptoms began approximately 2 weeks ago”) dominated the content

Clinical Enhancement

  • Enriched cases with demographics (age 18–75, gender balance, diverse occupations)
  • Inserted condition-specific findings:

    • Caries vs. periodontal disease vs. cysts vs. mucosal lesions
    • Matching radiographic findings (periapical radiolucency, bone loss patterns, cystic expansion, etc.)
    • Urgency levels (0 = elective, 1 = moderate, 2 = urgent)
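A single enriched case, after the steps above, might look like the following sketch. The field names and values here are our illustrative assumptions, not the exact production schema:

```python
# A sketch of one enriched case after the clinical-enhancement step.
# Field names and example values are illustrative, not the production schema.
enriched_case = {
    "demographics": {"age": 54, "gender": "female", "occupation": "teacher"},
    "presenting_problem": "intermittent pain in lower-left molar region",
    "clinical_findings": ["deep occlusal caries on #36", "tenderness to percussion"],
    "radiographic_findings": ["periapical radiolucency at #36 apex"],
    "urgency": 2,  # 0 = elective, 1 = moderate, 2 = urgent
}

# The urgency scale maps directly to the levels listed above.
URGENCY_LABELS = {0: "elective", 1: "moderate", 2: "urgent"}
print(URGENCY_LABELS[enriched_case["urgency"]])  # urgent
```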

Full Pipeline Application

  • Applied continuous refinements and enhancements to produce the final 2,494 cases. Cases went through expert feedback and LLM-as-a-judge tuning to establish robust causal links

Quality Control Gates

  • JSON schema validation (ensured `diagnosis`, `etiology`, `urgency`, `management`, `abx`, `follow_up`, `counseling`, `guideline`)
    • Internal checks for missing values, duplicate patient profiles, and inconsistent urgency assignments
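A minimal sketch of this kind of validation gate, assuming the required field names listed above; the `validate_case` helper is our own invention, not the pipeline's actual code:

```python
import json

REQUIRED_FIELDS = {"diagnosis", "etiology", "urgency", "management",
                   "abx", "follow_up", "counseling", "guideline"}

def validate_case(raw_line: str) -> list[str]:
    """Return a list of problems for one JSONL line; empty means it passes the gate."""
    problems = []
    try:
        case = json.loads(raw_line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    missing = REQUIRED_FIELDS - case.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if case.get("urgency") not in (0, 1, 2):
        problems.append(f"urgency out of range: {case.get('urgency')!r}")
    return problems

# Example: a case missing `abx` and with an out-of-range urgency fails both checks.
bad = ('{"diagnosis": "irreversible pulpitis", "etiology": "caries", "urgency": 5, '
      '"management": "RCT", "follow_up": "1 week", "counseling": "OH advice", '
      '"guideline": "ADA"}')
print(validate_case(bad))
```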

2. Expert Validation Process

Dentist Grading via Typeform

  • Dentists graded sample assistant outputs via a self-built Typeform, ranking four candidate answers per case

Agent Mode Research

  • Ran 40+ structured Agent Mode queries (e.g., “How would a periodontist classify this?”)
  • Extracted literature-backed treatment pathways.

AI Cross-Comparison

  • Benchmarked random cases against ChatGPT-5 “thinking” mode outputs
  • Flagged inconsistencies between enhanced cases vs. gold-standard reasoning

Structured Input–Output Linking

  • Built causal mapping:

    • Demographics + findings → Risk assessment
    • Risk assessment + urgency → Management plan
    • Management plan + systemic signs → Antibiotic indication
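The causal chain above can be sketched as toy functions. The thresholds and rules here are illustrative assumptions, not the actual mapping used in the dataset:

```python
# Toy sketch of the causal mapping: demographics + findings -> risk,
# risk + urgency -> management, management + systemic signs -> antibiotics.
# All thresholds and rule wording are illustrative assumptions.
def risk_assessment(age: int, findings: list[str]) -> str:
    score = (age >= 60) + sum("periapical" in f or "bone loss" in f for f in findings)
    return "high" if score >= 2 else "moderate" if score == 1 else "low"

def management_plan(risk: str, urgency: int) -> str:
    # Urgency: 0 = elective, 1 = moderate, 2 = urgent (as defined above).
    if urgency == 2 or risk == "high":
        return "same-day intervention + specialist referral"
    if urgency == 1:
        return "scheduled treatment within 2 weeks"
    return "elective treatment + recall"

def antibiotics_indicated(plan: str, systemic_signs: bool) -> bool:
    # Antibiotics only when systemic involvement accompanies active treatment.
    return systemic_signs and "intervention" in plan

risk = risk_assessment(67, ["periapical radiolucency"])
plan = management_plan(risk, urgency=2)
print(risk, "|", plan, "|", antibiotics_indicated(plan, systemic_signs=True))
```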

Challenges we ran into

Data Issues

  • Duplication Noise – 95% of data was placeholders
  • Template Lock-In – Generic time markers ("2 weeks ago") everywhere
  • Missing Clinical Context – No age, gender, or systemic history initially
  • Radiographic Gaps – No condition-specific images described
  • Flat Urgency Levels – Every case looked the same complexity-wise
  • Dentist Buy-In – the Typeform required about 45 minutes of a dentist's time, which was hard to secure in this short window
  • Over-Representation of Healthy Cases – Dataset skewed toward low-complexity “routine” or “checkup” visits, underrepresenting challenging pathologies.
  • Ambiguity in Diagnoses – Some cases were vague or combined multiple possible conditions, creating fuzzy labels for model training.

Technical Problems

  • Python Failures – Syntax errors during JSONL transformation
  • Scope Creep – Accidentally generated “healthy check-up” patients instead of pathology-driven cases
  • Expert Coordination – Dentists flagged inconsistencies requiring multiple feedback loops
  • AI Variability – ChatGPT-5 outputs differed across sessions even with structured prompts

Accomplishments that we're proud of

We have learnt a lot about the dental space over the last two weeks, including the long-term impacts poor dental treatment can have on people. We are most pleased that dentists feel heard when we talk to them, and that we were able to create something that can alleviate some of the pressures they face. Alongside this holistic accomplishment, we are also proud of:

Data & Clinical

  • Deduplication First – prevents amplifying noise
  • Schema Discipline – enforce required fields early
  • Scope Boundaries – healthy cases ≠ pathology training data
  • Urgency Calibration – cases must span elective to urgent for realism
  • Causal Pathways – clinical data must logically connect to management
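The "deduplication first" principle can be sketched as a hash-based pass over normalized case text. The normalization rules here are our own illustrative assumptions:

```python
import hashlib

# Minimal dedup-first sketch: hash the normalized case text and keep only the
# first occurrence. Normalization (lowercase, collapse whitespace) is illustrative.
def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def dedupe(cases: list[str]) -> list[str]:
    seen, unique = set(), []
    for case in cases:
        digest = hashlib.sha256(normalize(case).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(case)
    return unique

raw = ["Deep caries #36, urgent", "deep caries  #36, urgent", "Gingivitis, elective"]
print(len(dedupe(raw)))  # 2 -- the near-duplicate collapses into the first case
```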

Technical

  • JSONL Handling – strict error checks & rollback points
  • Template Detection – auto-flag placeholder cases pre-enhancement
  • Version Control – multiple checkpoints during processing
  • Iterative Sampling – small test batches before scaling full dataset
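Template detection of the kind described above can be sketched as a regex pass; the pattern list below is a hypothetical stand-in for the real one:

```python
import re

# Placeholder phrases a template-detection pass could flag pre-enhancement;
# these two patterns are illustrative, not the pipeline's actual list.
TEMPLATE_PATTERNS = [
    re.compile(r"approximately \d+ weeks? ago", re.IGNORECASE),
    re.compile(r"patient reports symptoms began", re.IGNORECASE),
]

def is_template(text: str) -> bool:
    return any(p.search(text) for p in TEMPLATE_PATTERNS)

cases = [
    "Patient reports symptoms began approximately 2 weeks ago",
    "54-year-old teacher, deep caries #36, periapical radiolucency, urgent",
]
flagged = [c for c in cases if is_template(c)]
print(len(flagged))  # 1 -- only the placeholder case is flagged
```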

Expert Validation

  • Typeform Feedback Loop – dentist-ranked scores provided a measurable quality signal. We built the Typeform ourselves and had dentists rank four candidate answers per case. Here is the form: https://form.typeform.com/to/RFEHs2Xy
  • Agent Mode Testing – revealed weak spots where the AI diverged from expert consensus; good for rapid dataset production
  • Cross-AI Comparison – over several days we continuously sampled, compared, and refined the training data, grading each sample out of 100. ChatGPT-5 thinking mode was far superior to ChatGPT normal mode at understanding input–output pairs (normal mode tended to score training data relatively higher when used to compare samples)

What we learned

Despite enhancements, the final dataset skewed heavily toward “healthy” patient cases, limiting its usefulness for training pathology-focused dental AI. This underlined the need for domain-specific, expert-tuned validation criteria:

  • Minimum pathology density
  • Balanced representation of caries, periodontal, mucosal, and radiographic conditions
  • Urgency distribution reflecting real-world triage
  • Scale – nuance can only be captured with volume: rather than 2,494 high-quality gold outputs alone, a larger pool of samples curated and reviewed by experts captures nuance far better

What's next for Dental Assessment GPT

We have a strong feeling that this GPT can become the spine/brains for several dental AI agents with further refinement. We plan to get dentist buy-in and refine the model further with RLHF on the fine-tuned outputs. At inference, we hope to implement in-context RAG and longer token sequences to make the model fully context-aware. In line with recent breakthroughs in understanding model hallucinations, we also plan to improve its ability to reject prompts rather than always aiming for the best answer, raising its “I don't know” quotient for multivariate dental cases.
