HealthTest AI

Automating IEC 62304-Compliant Test Case Generation for Healthcare Software

The Problem

Healthcare software development operates under strict regulatory frameworks — IEC 62304, FDA 21 CFR Part 11, ISO 13485, and HIPAA.

Quality assurance (QA) teams in medical device companies spend 40–60% of their time manually converting requirements into compliant test cases.

A single software module can take weeks to generate hundreds of test cases — each carefully mapped to requirements, compliance tags, and risk classifications.

We’ve seen QA engineers spending 2–3 hours per requirement just to write documentation, often using large spreadsheets that are difficult to maintain during audits.

This creates bottlenecks that delay product launches, increase costs, and limit scalability for fast-growing healthtech startups.

Modern LLMs can understand complex regulatory language — so why not build a system that generates IEC 62304-compliant test cases instantly?

What It Does

HealthTest AI automates the entire test case generation workflow for healthcare software.

Key Features

Multi-Format Requirements Processing Accepts requirements in PDF, Word, or plain text format.

AI-Powered Test Case Generation Uses advanced LLMs to generate comprehensive, compliance-ready test cases covering:

Positive and negative scenarios

Boundary conditions

Security validations

Approval Workflow

QA teams can approve, reject, or enhance AI-generated test cases before finalization.

Compliance-First Output Each test case includes:

Traceability IDs linking to source requirements

Risk levels per IEC 62304 classification

Compliance tags (FDA 21 CFR Part 11, HIPAA, ISO 27001)

Detailed preconditions, test steps, and expected results

Jira Integration Mock export functionality that generates ticket IDs for immediate QA workflow integration.

How We Built It

Backend Architecture

FastAPI web framework with async support for efficient processing

Document parsing pipeline using PyPDF2 and python-docx

JSON-based persistence for requirements, test cases, exports, and logs

SHA-256 hashing for duplicate requirement detection

AI Prompt Engineering

Our custom system prompt deeply understands:

IEC 62304 software safety classifications

FDA electronic record requirements

ISO 13485 design controls

HIPAA security safeguards

The prompt instructs the LLM to generate structured JSON output with realistic, audit-ready test cases that would pass regulatory review.

Frontend Stack

Vanilla JavaScript for simplicity and performance

CSS Grid/Flexbox for responsive layouts

Hash-based routing for navigation

Real-time polling for generation status updates

Challenges We Ran Into

API Cost Management

Claude API costs $15 per million tokens.

One large document could cost $2–3 to process.

Implemented prompt caching (90% cost reduction) and multi-LLM fallback to free tiers.

Empty Response Handling

Some providers returned empty strings or malformed JSON.

Added comprehensive error handling and fallback test cases to ensure system reliability.

Windows Environment Variable Loading

.env file failed to load due to UTF-8 BOM encoding issues.

Implemented manual file parsing as a fallback solution.

Accomplishments

Technical

Achieved 90% cost reduction through caching and fallback strategies

Generated realistic test cases with proper compliance terminology

Built a robust system compatible with multiple LLMs

Business Value

Reduced test case creation time from hours to seconds

Estimated 60% QA time savings

Ensured comprehensive compliance coverage

Made enterprise-grade QA automation accessible to startups

User Experience

Clean and professional UI suitable for healthcare enterprises

Approval workflow integrated into existing QA processes

Mock Jira integration demonstrating real-world applicability

What We Learned

Provider Abstraction is Critical: Avoid hardcoding API calls to prevent failures from rate limits.

Healthcare Compliance is Complex: Test cases must reflect regulatory terminology and structure.

Error Handling Over the Happy Path: Robust fallback handling ensures production stability.

JSON Parsing is Non-Trivial: Each LLM formats responses differently; a multi-stage parser was necessary.

What’s Next for HealthTest AI

Short-Term (Next 3 Months)

Real Jira/Azure DevOps integration with OAuth authentication

Traceability matrix visualization linking requirements to test cases

Batch processing for multiple requirement documents

Excel export with regulatory audit formatting

Medium-Term (6–12 Months)

PostgreSQL backend replacing JSON file storage

Role-based access control for multi-user collaboration

Test execution automation integration

Custom compliance templates (add regulations beyond FDA/IEC)

Version control for requirements with automatic test case regeneration

Long-Term Vision

End-to-end QA automation: requirements → test cases → execution → defect tracking

AI-powered maintenance that auto-updates test cases when requirements change

Compliance dashboard showing regulatory coverage metrics

GDPR-compliant enterprise deployment for EU healthcare companies

CI/CD integration for continuous compliance validation

Future Enhancement: RAG Integration

In future releases, Retrieval-Augmented Generation (RAG) will be integrated to enhance robustness and reduce hallucinations.

RAG will:

Retrieve verified compliance and regulatory documents dynamically

Improve factual accuracy and contextual grounding

Ensure all generated test cases reference authoritative standards

Provide audit-proof traceability from requirement → standard → test case

This will make HealthTest AI more reliable, explainable, and production-ready for enterprise healthcare QA teams.

Built With

4.5
anthropic
claude
css
fastapi
git
html
javascript
pydantic
pypdf2
python
sonnet
uvicorn

Submitted to

Accel + Anthropic | Dev Day Community Showcase

Created by

I designed the healthcare-specific prompt system and implemented Claude API integration with document processing. The trickiest part was prompt engineering to ensure consistent IEC 62304 and FDA-compliant output—getting Claude to generate structured test cases with proper risk classifications, traceability IDs, and compliance tags required iterative refinement. I'm most proud of creating a system prompt that produces audit-ready test cases matching real regulatory standards. I learned that domain expertise (healthcare compliance) in prompts is more valuable than raw model capability for specialized tasks.

Akshay Kanthed
Anurag Phad
Ninad Saswade

Updates

Akshay Kanthed started this project — Oct 05, 2025 01:09 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.