Inspiration

Many professionals — therapists, doctors, lawyers, journalists — spend hours manually transcribing and analyzing recorded conversations. We wanted to eliminate that bottleneck and give people a tool that not only transcribes audio instantly but turns it into actionable, structured insights using AI.

What it does

Narralyze is a web application that lets users upload or record audio, automatically transcribes it using OpenAI's Whisper model, and then runs a deep AI analysis on the transcript. Users can choose from different analysis modes — General, Therapy, Legal, and Medical — to get context-aware reports. The final report can be exported as a professional PDF or Word document, saved, renamed, and managed from a personal dashboard.

How we built it

We built Narralyze with Node.js and Express on the backend, using EJS for server-side rendering. Audio transcription runs locally via the Xenova Whisper model through the @xenova/transformers library. AI analysis is powered by the Gemini API. We used MongoDB with Mongoose for user authentication, bcrypt for password hashing, and session-based login. The frontend uses vanilla JavaScript with a custom UI for recording, drag-and-drop uploads, waveform visualization, and a file management dashboard. Reports are generated using PDFKit and the docx library.
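Before a report reaches PDFKit or the docx library, the AI's markdown output has to be flattened into plain text blocks. As a rough, simplified sketch of the kind of parser this involves (the function names and block shapes here are illustrative, not our exact implementation):

```javascript
// Illustrative helper: strip inline markdown markers (bold, italics,
// inline code, heading hashes) so PDF/DOCX writers get plain text.
function stripInlineMarkdown(line) {
  return line
    .replace(/^#{1,6}\s+/, '')        // leading heading markers
    .replace(/\*\*(.+?)\*\*/g, '$1')  // **bold**
    .replace(/__(.+?)__/g, '$1')      // __bold__
    .replace(/\*(.+?)\*/g, '$1')      // *italic*
    .replace(/_(.+?)_/g, '$1')        // _italic_
    .replace(/`(.+?)`/g, '$1');       // `inline code`
}

// Split a markdown report into typed blocks that a PDF or DOCX
// writer can render with the appropriate styling.
function parseReport(markdown) {
  return markdown
    .split('\n')
    .filter(line => line.trim())
    .map(line => {
      const h = line.match(/^(#{1,6})\s+(.*)/);
      if (h) {
        return { type: 'heading', level: h[1].length, text: stripInlineMarkdown(h[2]) };
      }
      return { type: 'paragraph', text: stripInlineMarkdown(line) };
    });
}
```

Each block then maps to a styled call in the export layer, e.g. headings to a larger font in PDFKit and to heading paragraphs in docx.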

Challenges we ran into

Getting the Whisper model to correctly parse WAV headers and extract PCM audio data was tricky, since different audio sources produce headers of different sizes. We also hit Gemini API free-tier quota limits during development and had to handle model fallbacks gracefully. Rendering the AI's markdown output cleanly into both PDF and DOCX formats required building custom parsers to strip and reformat inline markdown such as bold, italics, and headings.
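The fix for the header problem was to stop assuming a fixed 44-byte WAV header and instead walk the RIFF chunk list until the `data` chunk is found. A simplified sketch of that approach (illustrative, not our exact code):

```javascript
// Walk the RIFF chunk list of a WAV file held in a Node Buffer.
// Some encoders insert extra chunks (e.g. LIST/INFO) before "data",
// so the PCM payload is not always at byte 44.
function findDataChunk(buf) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk starts right after the RIFF/WAVE header
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    if (id === 'data') return { offset: offset + 8, size };
    offset += 8 + size + (size % 2); // chunks are padded to even byte boundaries
  }
  throw new Error('No "data" chunk found');
}

// Convert 16-bit signed little-endian PCM samples into the
// Float32Array (range -1..1) that the Whisper pipeline expects.
function pcm16ToFloat32(buf, offset, size) {
  const out = new Float32Array(size / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = buf.readInt16LE(offset + i * 2) / 32768;
  }
  return out;
}
```

The resulting Float32Array (resampled to mono 16 kHz) can be passed straight to the transcription pipeline.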

What's next for Narralyze

We aim to enhance Narralyze by introducing real-time speaker diarization, allowing the system to automatically detect and label different speakers instead of relying on generic labels like “Person 1” and “Person 2.”

In addition, we plan to implement a team collaboration mode, enabling multiple users to share, review, and annotate transcripts collaboratively. This will be complemented by a history timeline feature that tracks changes across multiple sessions of the same ongoing conversation—particularly useful for professionals such as therapists or legal teams who need to monitor case progress over time.

We also plan to add accessibility features so that deaf and hard-of-hearing users, as well as users with speech impairments, can fully benefit from the application. This includes enhanced real-time transcription, visual cues, and other assistive tools that let these users follow and participate in conversations.

Note (for the deployed website): If the logout option doesn't appear while you are on the home page, press the Narralyze button in the navbar to go to the login page.
