Inspiration

AI excels at summarization, which gave me the idea to apply it to audio. This would be especially useful for:

  • Summarizing long office meetings and generating meeting notes.
  • Creating action items from appointments with doctors.
  • Keeping records of other professional meetings.
  • Recording and organizing audio notes, then summarizing them for future reference.

What It Does

The tool captures, transcribes, and summarizes content from both live recordings and pre-recorded audio files.

How We Built It

1. Testing APIs and Their Limits

  • Started by testing Whisper and GPT-4o-mini APIs to check accuracy and file size limits for long-form content.

2. Storing and Retrieving Files

  • Used secure S3 cloud storage to store and retrieve audio files.

3. Handling Long Audio Files

  • Developed a mechanism to split audio into chunks using FFmpeg if the file exceeds Whisper's API limits.
  • Ensured overlap between audio chunks to prevent missed details and removed duplicate words when merging transcripts.
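The chunking and merging steps above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names, the 10-minute chunk length, the 5-second overlap, and the word-level matching rule are all assumptions made for the example.

```python
def plan_chunks(duration_s, chunk_s=600.0, overlap_s=5.0):
    """Return (start, end) windows that cover the audio, each overlapping the previous one."""
    if overlap_s >= chunk_s:
        raise ValueError("overlap must be shorter than the chunk length")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words at the boundary appear in both chunks.
        start = end - overlap_s
    return windows


def ffmpeg_args(src, dst, start, end):
    """Build an FFmpeg command that cuts one window without re-encoding (assumed settings)."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}",
            "-i", src, "-c", "copy", dst]


def merge_transcripts(parts, max_overlap_words=50):
    """Join chunk transcripts, dropping words repeated at each chunk boundary."""
    merged = []
    for part in parts:
        words = part.split()
        limit = min(max_overlap_words, len(merged), len(words))
        overlap = 0
        # Find the longest run of words that ends the merged text and starts this part.
        for n in range(limit, 0, -1):
            if [w.lower() for w in merged[-n:]] == [w.lower() for w in words[:n]]:
                overlap = n
                break
        merged.extend(words[overlap:])
    return " ".join(merged)
```

Exact word matching is the simplest possible rule. Transcripts of the same overlapping audio can differ slightly in wording or punctuation, so a real implementation would probably need fuzzier matching.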

4. Creating Effective Summaries

  • Designed and refined prompts that produce clear, detailed meeting-note summaries.

5. Summarizing Transcripts

  • Broke large transcripts into smaller parts to fit within GPT-4o-mini's limits, then generated a combined summary from the individual summaries.
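One way to picture this split-then-combine approach is the sketch below. It is illustrative only: `summarize` stands in for whatever call sends text to the model, and the word-count budget is an assumed proxy for the model's real token limit.

```python
from typing import Callable, List


def split_transcript(text: str, max_words: int = 3000) -> List[str]:
    """Split a transcript into consecutive parts of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 3000) -> str:
    """Summarize each part separately, then summarize the joined partial summaries."""
    parts = split_transcript(text, max_words)
    if not parts:
        return ""
    if len(parts) == 1:
        # Short transcripts fit in a single request.
        return summarize(parts[0])
    partials = [summarize(part) for part in parts]
    # Second pass: combine the per-part summaries into one final summary.
    return summarize("\n\n".join(partials))
```

The sketch assumes the joined partial summaries fit within the limit themselves. For very long recordings, the combining step might need to be repeated.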

6. Building the Frontend (iOS)

  • Tuned the AVAudioRecorder recording settings to get the best audio quality at the smallest file size.
  • Explored creating a real-time waveform view while recording.
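The quality-versus-size trade-off comes down to simple arithmetic for constant-bitrate audio. The helper below is a rough sketch: the 32 kbps bitrate and the 25 MB upload limit are assumed values chosen for the example, not settings confirmed by the project.

```python
def estimated_size_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Approximate file size of constant-bitrate audio, ignoring container overhead."""
    return bitrate_bps * duration_s / 8


def max_duration_s(limit_bytes: int, bitrate_bps: int) -> float:
    """Longest recording that stays under an upload size limit at a given bitrate."""
    return limit_bytes * 8 / bitrate_bps


# Assumed example values: 32 kbps audio and a 25 MB upload limit.
one_hour = estimated_size_bytes(32_000, 3600)           # 14,400,000 bytes
longest = max_duration_s(25 * 1024 * 1024, 32_000)      # about 6,553 seconds
```

At these assumed values, a one-hour recording comes to about 14.4 MB, and anything longer than roughly 109 minutes would need to be split before transcription.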

7. Uploading and Processing Files

  • Developed features for fetching files from iCloud/local phone storage and uploading them for transcription and summary.
  • Made the upload process smooth so users don't need to keep the app open. Once the audio is uploaded to the backend, processing continues on the server. The frontend tracks progress through polling, and users receive a push notification when processing is complete.
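The polling side of this flow might look something like the sketch below. It is written in Python for brevity rather than in the app's iOS code. The status fields (`state`, `progress`), the backoff intervals, and the `fetch_status` callable are all hypothetical.

```python
import time


def poll_until_done(fetch_status, on_progress=None, interval=2.0,
                    max_interval=30.0, max_attempts=100, sleep=time.sleep):
    """Poll a job status until it finishes, waiting longer between attempts each time."""
    delay = interval
    for _ in range(max_attempts):
        status = fetch_status()  # hypothetical call returning e.g. {"state": ..., "progress": ...}
        if on_progress is not None:
            on_progress(status.get("progress", 0))
        if status.get("state") in ("done", "failed"):
            return status
        sleep(delay)
        # Exponential backoff, capped so long jobs are still checked regularly.
        delay = min(delay * 2, max_interval)
    raise TimeoutError("job did not finish within the polling budget")
```

Backing off between requests keeps server load low for long recordings. The push notification covers the case where the user has left the app and polling has stopped.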

8. Displaying and Sharing Results

  • Showed the transcript and summary with markup support.
  • Added options for sharing the summary and transcript as PDFs, the audio file, and the individual transcript or summary text.

9. Managing Past Recordings

  • Created a feature to display a list of all recordings and file uploads, with options to view transcripts and summaries and to listen to recordings.

10. Payment Integration

  • Integrated Superwall dynamic paywalls and RevenueCat for handling in-app purchases and reporting.

Challenges We Faced

  1. Configuring Recording Settings – Balancing audio quality and file size for optimal recording.
  2. Managing Large File Sizes – Efficiently handling long recordings by splitting files into chunks.
  3. Designing Effective Summary Prompts – Crafting prompts that consistently produced accurate and clear summaries.
  4. File Upload and Progress Display – Ensuring smooth uploads for large files and clearly displaying progress for users.
  5. Implementing Push Notifications – Setting up reliable notifications to inform users when processing is complete.
  6. Designing an Intuitive and Good-Looking UI – Creating a user-friendly and visually appealing interface.

Accomplishments We’re Proud Of

  • Received great feedback from family and friends on the app's functionality.
  • Built a robust backend despite primarily being a frontend engineer. The backend handles audio processing, file storage, transcription, summarization, and notifications efficiently.
  • Designed the best UI in any of the apps I’ve built so far, ensuring an intuitive and smooth user experience.
  • Completed the app in 2 weeks, from ideation and design through development to release on the App Store.

What We Learned

  • Gained deep knowledge about audio recording, file uploads, processing, transcription using Whisper, creating and merging multiple transcripts and summaries, and enabling audio playback.

What's Next for Minat - AI Meeting Notes

Here are some items we plan to add:

  1. Customizing and editing summaries.
  2. Live backup during recording to prevent data loss.
  3. Phone call recording.
  4. Importing audio from YouTube and podcasts.
  5. Syncing across devices.
  6. Detecting different speakers.
  7. Allowing follow-up questions based on meeting notes.
  8. Integration with calendars.
  9. Adding folders for organizing audio files.