MedReport Logo
MedReport Screenshot 1
MedReport Screenshot 2

MedReport: Transforming Doctor-Patient Conversations into Professional Medical Reports

OpenAI Open Model Hackathon Submission

Where Natural Conversation Becomes Professional Documentation

Video Link

Video Demo

Inspiration

Hi there! I'm Young-wouk, and I wear two hats: by day I'm a practicing physician with over 10 years of clinical experience, and by night I'm a developer with 25+ years of programming under my belt (fueled, of course, by a serious coffee habit).

Living at the intersection of medicine and technology, I’ve experienced firsthand the soul-crushing reality of spending more time staring at computer screens documenting patient encounters than actually looking patients in the eye. Every consultation ended the same way: “Great, now let me spend 15 minutes typing everything we just discussed...” It was driving me — and my colleagues — absolutely insane.

When OpenAI released gpt-oss-120b and gpt-oss-20b with their revolutionary open-weight reasoning capabilities, something clicked. Here was a model that could understand complex medical reasoning, run completely offline (goodbye privacy nightmares!), and had the power to actually solve healthcare's biggest workflow problem. Paired with OpenAI's Whisper for medical-grade speech recognition, I got super motivated — the kind of motivated that makes you forget what sunlight looks like.

After many days of coding, many more nights than I care to admit, and way too many tutorials on how to use video editors — because presenting your work is just as important as building it — MedReport was born.

The vision was simple: What if every natural conversation between doctor and patient could automatically become professional documentation? What if we could give physicians back those precious hours to do what they love most — actually care for people?

Turns out, with a lot of determination, OpenAI's incredible gpt-oss architecture, and the perfect synergy between Whisper and gpt-oss-20b, we could make it happen. And yes, it works beautifully — completely offline.

The Problem: Healthcare's Documentation Crisis

Healthcare faces a critical workflow challenge that directly impacts both patient care quality and physician wellbeing:

Current State:

10-15 minutes per consultation lost to post-documentation tasks
2+ hours daily spent by physicians on paperwork instead of patient care
60% of physician burnout directly attributed to administrative burden
25% of global population faces language barriers in healthcare access
Privacy concerns prevent adoption of cloud-based medical AI solutions

Why Current Solutions Fail:

Medical Secretaries: Limited availability, expensive, still require dictation time
Dictation Software: Requires formal post-consultation dictation, breaks consultation flow
Electronic Health Record Systems: Focus on manual data entry rather than natural conversation capture
Cloud AI Services: Privacy concerns make them unusable in confidential medical settings

Technical Gap: No existing system can process natural doctor-patient conversations in real-time while maintaining complete privacy and generating professional medical documentation — until now.

What MedReport does

MedReport transforms unstructured doctor-patient conversations into professional medical documentation in real-time using OpenAI's gpt-oss-20b for medical reasoning and OpenAI Whisper for medical-grade speech recognition — all completely offline.

Core Functionality:

Natural Consultation Flow: Doctor and patient converse normally during visits
Real-Time Transcription: OpenAI Whisper processes audio locally with superior medical terminology accuracy
Intelligent Medical Analysis: gpt-oss-20b understands clinical context and applies medical reasoning patterns
Professional Documentation: Generates structured medical reports in a few minutes
Document Integration: Seamlessly incorporates external medical documents when relevant
Smart Prescription Detection: Advanced reasoning automatically identifies and formats prescriptions as PDFs
Multilingual Support: Documentation in 10+ languages
100% Privacy: Complete offline functionality with zero data transmission

Impact Metrics:

75% reduction in documentation time (10-15 minutes saved per consultation)
2+ hours daily returned to physicians for patient care
>20× faster report generation compared to manual creation
95%+ accuracy in medical terminology recognition

How we built MedReport

Architecture Overview: A Local-First Medical AI System

MedReport operates as a complete offline solution, built around OpenAI's gpt-oss-20b and Whisper models to ensure absolute privacy while delivering professional-grade medical documentation. Our architecture eliminates all external dependencies, making it deployable anywhere healthcare is practiced.

Model Selection & Deployment Strategy

Why OpenAI's gpt-oss Models Over Alternatives: Healthcare demands absolute privacy and data sovereignty while requiring excellent performance, making OpenAI's open-weight models the only viable choice that delivers both:

Exceptional Performance: Remarkable reasoning capabilities and medical knowledge produce professional-grade results with high accuracy
Complete Offline Operation: Patient conversations never leave the device — critical for HIPAA compliance
Open-Weight Architecture: Full transparency and control for medical applications
Zero Privacy Risk: No API calls, no cloud dependencies, no data transmission
Universal Accessibility: Works in remote clinics, disaster zones, and areas with poor connectivity

Strategic Decision: Base Models Over Fine-Tuning: The base gpt-oss-20b model demonstrated exceptional performance for medical documentation tasks right out of the box. To validate this, we prepared 200+ audio transcription pairs and 1000+ medical report training examples using Unsloth for potential fine-tuning, but found only minor improvements that didn't justify the additional computational overhead. Key reasons for skipping fine-tuning include:

Excellent prompt-guided results: With well-crafted prompts, base model delivers excellent and consistent professional outputs
Extensive medical knowledge: Pre-training already embedded comprehensive clinical knowledge
Strong clinical reasoning: Natural understanding of medical contexts across specialties
Broad flexibility: Maintained versatility across medical contexts without domain-specific constraints
Faster deployment: No training overhead or specialized fine-tuning infrastructure required

Core Technology Stack Implementation

Audio Processing Pipeline with OpenAI Whisper:

Audio Input → 16kHz Mono Conversion → 10s Chunks (8s Overlap) → Whisper-Small → Real-time Transcription

Model Selection: Whisper-small optimized for medical terminology while running efficiently on standard laptops
Real-time Processing: Audio chunks sent incrementally for immediate transcription feedback, enhancing user experience with live text display
Chunking Strategy: 10-second segments with 8-second overlaps preserve conversational continuity without losing clinical context
Context Preservation: Complete transcription sent to gpt-oss-20b at the end of recording (not fragmented chunks) maintaining clinical narrative flow
Noise Handling: Optimized for typical medical office environments with background conversations and equipment sounds

Medical Reasoning Engine with OpenAI's gpt-oss-20b:

Full Transcription ± External Documents → Clinical Context Analysis → Structured Medical Report

Deployment: F16.gguf quantization providing optimal accuracy-performance balance on consumer hardware
Prompt Engineering: Sophisticated prompts that leverage the model's reasoning capabilities to:
- Distinguish symptoms from diagnoses from treatment plans
- Apply clinical decision-making logic
- Maintain professional medical documentation standards
- Handle complex multi-symptom presentations
- Generate structured reports using specific examples provided based on selected format

Document Integration Workflow:

PDF Upload via Import Button → PyMuPDF Text Extraction → Relevance Analysis → Context Integration → Report Generation

User-Controlled Import: Users can optionally load external PDFs (lab results, referral letters, imaging reports) via dedicated Import button
Text Extraction: PyMuPDF extracts content which is passed to model in prompt for contextual analysis
Smart Processing: gpt-oss-20b determines which external document content is consultation-relevant based on transcribed conversation
Selective Integration: Only includes information actually discussed or pertinent to the current visit

Advanced Feature Implementation

Prescription Detection & PDF Generation: Our most innovative feature uses gpt-oss-20b's reasoning to automatically identify prescriptions and generate professional documents through single-inference processing — the model simultaneously generates the structured medical report and appends any detected prescriptions in custom XML format:

Detection Phase: Model analyzes transcription for medication mentions during clinical reasoning
Integrated Output: Prescription data automatically formatted in custom XML and appended to the medical report through specific prompt instructions specifying XML structure with title, patient, content, and context fields
Function Calling: The presence of prescription XML in the report automatically triggers the creation of action buttons in the UI — when clicked, these buttons generate print-ready prescription PDFs with complete medical report integration
Clean Display: XML removed from user-visible report while maintaining structured data for processing

Multi-Format Report Generation:

General Format: Structured documentation with bullet points and clear sections for quick reference, guided by a specific report example provided in the prompt
Narrative Format: Comprehensive prose-style reports with detailed clinical descriptions for thorough documentation, using a narrative report example in the prompt
Custom Templates: Configurable via custom_report_format.txt — content is loaded and passed as a report example in the prompt for specialty-specific needs

User Interface & Experience Design

Modular GUI Architecture: Built with separated, maintainable components, with particular attention paid to design: simple and functional, yet easy and pleasant to use. The software was designed and developed by a physician (with extensive programming experience) for physicians:

UI Manager: Non-blocking interface ensuring responsiveness during AI processing
Audio Handler: Real-time capture with visual feedback and buffer management
Animation System: Real-time audio wave visualization providing engagement feedback
PDF Generator: Professional document creation with medical formatting standards

Performance Optimization:

Thread-Safe Operations: Audio processing, AI inference, and UI updates run independently
Memory Management: Smart buffer handling prevents memory leaks during extended consultations
Resource Efficiency: Optimized to run smoothly on laptops without dedicated GPUs
Error Handling: Graceful degradation with user-friendly error messages

Privacy & Security Implementation

Complete Offline Architecture: Every component designed for zero network dependency:

Local Model Storage: Both Whisper and gpt-oss-20b models stored locally
Offline Processing: All transcription, analysis, and document generation happens on-device
No Telemetry: Zero data collection or transmission of any kind
HIPAA Compliance: Meets all healthcare privacy requirements by design

Data Handling:

Session-Based: Audio and transcriptions exist only during active consultation
Secure Cleanup: Automatic cleanup of temporary files and memory buffers
Local Storage: Generated reports saved locally under user control

This architecture demonstrates how OpenAI's open-weight models can power sophisticated healthcare applications while maintaining absolute privacy — setting a new standard for medical AI deployment.

Challenges we ran into

Model Selection & Optimization: Finding the right model configuration took considerable experimentation. We evaluated gpt-oss-20b vs gpt-oss-120b and various quantized versions, alongside different Whisper model sizes. We ultimately chose our current configuration for the optimal balance between performance, speed, and deployment accessibility — ensuring the system runs smoothly on standard laptops without requiring high-end computers or specialized hardware.

Multimodal Architecture Integration: Designing a complete architecture that seamlessly integrates not only gpt-oss-20b but also another open model like Whisper presented significant challenges. Creating a cohesive multimodal application required careful coordination between audio processing, transcription, and medical reasoning components while maintaining offline functionality and performance efficiency.

Audio Processing Complexity: Managing real-time audio capture while maintaining conversation context was technically challenging. The most difficult aspect was "simulating" stream processing — sending audio chunks for analysis while simultaneously recording the next chunk. We solved this with smart buffer management and overlapping audio segments to ensure clinical continuity.

Medical Reasoning Accuracy: Getting gpt-oss-20b to consistently understand complex medical contexts required extensive prompt engineering. We developed enhanced prompts through multiple iterations, ultimately achieving very good and consistent results that leverage the model's reasoning capabilities to distinguish between symptoms, diagnoses, and treatment plans accurately.

Prescription Detection: Developing reliable prescription recognition and PDF generation required innovative use of gpt-oss-20b’s function calling. We built a system where the model generates structured XML inside reports, which then triggers automatic prescription document creation. XML was chosen over native function calls because it ensures interoperability with healthcare systems, creates a durable and auditable record for compliance, and allows strict schema validation of prescription details. It also supports document-centric workflows with headers, metadata, and signatures, while enabling asynchronous PDF generation and balancing both human readability and machine parsing. This approach bridges model reasoning with real-world medical document standards, making prescriptions both trustworthy and system-ready.

GUI Responsiveness: Creating a smooth user experience while processing intensive AI tasks required careful thread management. We implemented non-blocking operations to ensure the audio wave animations and UI remained responsive during gpt-oss-20b inference.

Privacy Architecture: Ensuring complete offline functionality while maintaining professional medical documentation standards required careful architecture design. Every component needed to work seamlessly without any internet dependency while leveraging OpenAI's open-weight models effectively.

Accomplishments that we're proud of

Revolutionary Healthcare Impact:

Created the first system to automatically transform natural medical conversations into professional documentation using OpenAI's gpt-oss models
75% reduction in documentation time could impact 2.1 billion annual patient encounters globally
Demonstrated that sophisticated healthcare AI can achieve professional results while maintaining absolute privacy using open-weight models

Technical Excellence with OpenAI Models:

Perfect synergy between OpenAI Whisper and gpt-oss-20b for healthcare applications
Advanced function calling implementation generating structured reports with embedded prescriptions in single inference
Complete offline deployment showcasing the power of OpenAI's open-weight model architecture
Optimized to run efficiently on consumer-grade hardware - no specialized GPUs or high-end workstations required
Usable by any healthcare provider regardless of technical infrastructure or resources

Privacy Leadership:

100% local processing with zero data transmission sets new standards for medical AI
Proves privacy and performance are not mutually exclusive using OpenAI's open models
HIPAA-compliant solution accessible globally without cloud infrastructure requirements

Real-World Validation:

Built by a practicing physician who understands actual healthcare workflow challenges
Professional-grade medical documentation that meets clinical standards
Already being tested in clinical environments with promising initial results
Immediate deployment capability for healthcare providers worldwide

What we learned

OpenAI's gpt-oss Models Are Game-Changers: The reasoning capabilities of gpt-oss-20b exceeded our expectations for medical applications. The model's ability to understand complex clinical contexts without fine-tuning demonstrates the sophistication of OpenAI's open-weight architecture.

Base Models vs. Fine-Tuning: We discovered that gpt-oss-20b's extensive pre-training provides excellent medical knowledge foundation, making fine-tuning unnecessary for many healthcare applications. Despite preparing 200+ audio files with transcriptions and 1000 transcript-to-report training pairs for fine-tuning Whisper and gpt-oss-20b respectively using Unsloth, the additional improvements were surprisingly minor. This insight could reshape how we approach medical AI development - the base OpenAI models are already remarkably capable.

Multimodal Integration Excellence: The seamless integration between OpenAI Whisper and gpt-oss-20b creates powerful multimodal capabilities that feel natural and professional in clinical settings.

Privacy-Performance Balance: OpenAI's open-weight models prove that cutting-edge AI performance doesn't require sacrificing privacy. This is crucial for healthcare applications where data sovereignty is non-negotiable.

Healthcare Workflow Transformation: Real-world testing confirmed that natural conversation-to-documentation using OpenAI's models can genuinely transform healthcare workflows without disrupting the doctor-patient relationship.

What's next for MedReport

Advanced OpenAI Model Integration:

gpt-oss-120b Implementation: Leverage the larger model for complex multi-specialty consultations
Enhanced Function Calling: Develop more sophisticated medical reasoning workflows
Model Optimization: Fine-tune models for specialized medical domains and niche clinical applications

Technology Evolution:

Real-time Clinical Decision Support: Expand beyond documentation to treatment recommendations
Integration Ecosystem: APIs for Electronic Health Record systems and medical device integration
Advanced Analytics: Population health insights while maintaining complete privacy

Immediate Global Impact:

Healthcare Provider Partnerships: Collaborate with medical institutions to deploy MedReport using OpenAI's gpt-oss models
Specialty Medicine Integration: Expand templates across all medical specialties leveraging gpt-oss reasoning capabilities
Regulatory Validation: Pursue FDA clearance and international medical device approvals

Global Healthcare Transformation:

Developing Nations Deployment: Bring advanced medical documentation to underserved areas using offline OpenAI models
Emergency Response Systems: Rapid deployment capabilities for disaster medical scenarios
Medical Education: Create training platforms using consistent documentation examples

Vision Realization: Transform every medical conversation globally into professional documentation using OpenAI's open-weight models, giving physicians worldwide the gift of time — time to look patients in the eye, time to truly listen, and time to practice the art of healing rather than the administrative burden of documentation.

MedReport represents the future of healthcare AI: powerful, private, and profoundly human-centered, built on the foundation of OpenAI's revolutionary open models.

Built With

gpt-oss
python
tkinter
transformers
whisper
windows-10