🧠 Inspiration
In a world flooded with distractions, jotting down a thought before it disappears is harder than it sounds. We wanted to capture those fleeting ideas effortlessly and let speech flow naturally into structured, editable notes. Voice Notes is our attempt to bridge intuition and code: it turns thinking aloud into a seamless workflow built from the ground up in low-level C++.
🎙️ What it does
Voice Notes listens, transcribes, and remembers.
It records audio directly from your microphone, converts it into text with whisper.cpp (a native C++ port of OpenAI’s Whisper model), and displays synchronized voice–text notes inside a minimalist, SFML-powered interface.
Each note is stored as a pair of .wav and .txt files: editable, replayable, and fully offline. With a single hotkey, your voice becomes structured memory.
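To make the .wav/.txt pairing concrete, here is a minimal sketch of how one note could be written to disk. It assumes mono 16-bit PCM audio and a UTF-8 transcript; the names writeLE, writeWav, and saveNote are hypothetical and not taken from the project's source.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <ostream>
#include <string>
#include <vector>

// Write the low `bytes` bytes of `value` in little-endian order,
// independent of the host's endianness.
inline void writeLE(std::ostream& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Serialize mono 16-bit PCM samples as a WAV stream with the usual 44-byte header.
inline void writeWav(std::ostream& out, const std::vector<std::int16_t>& samples,
                     std::uint32_t sampleRate) {
    assert(sampleRate > 0);
    const std::uint32_t channels = 1;       // assumption: mono recording
    const std::uint32_t bitsPerSample = 16;
    const std::uint32_t blockAlign = channels * bitsPerSample / 8;
    const std::uint32_t dataBytes =
        static_cast<std::uint32_t>(samples.size()) * blockAlign;

    out.write("RIFF", 4);
    writeLE(out, 36 + dataBytes, 4);           // size of everything after this field
    out.write("WAVE", 4);
    out.write("fmt ", 4);
    writeLE(out, 16, 4);                       // fmt chunk size for plain PCM
    writeLE(out, 1, 2);                        // audio format 1 = PCM
    writeLE(out, channels, 2);
    writeLE(out, sampleRate, 4);
    writeLE(out, sampleRate * blockAlign, 4);  // byte rate
    writeLE(out, blockAlign, 2);
    writeLE(out, bitsPerSample, 2);
    out.write("data", 4);
    writeLE(out, dataBytes, 4);
    for (std::int16_t s : samples) {
        writeLE(out, static_cast<std::uint16_t>(s), 2);
    }
}

// Write <stem>.wav and <stem>.txt side by side; returns false on any I/O error.
inline bool saveNote(const std::string& stem, const std::vector<std::int16_t>& samples,
                     std::uint32_t sampleRate, const std::string& transcript) {
    std::ofstream wav(stem + ".wav", std::ios::binary);
    std::ofstream txt(stem + ".txt", std::ios::binary);
    if (!wav || !txt) return false;
    writeWav(wav, samples, sampleRate);
    txt << transcript;
    wav.flush();
    txt.flush();
    return wav.good() && txt.good();
}
```

Sharing one file stem between the two files keeps the pairing implicit, so editing the .txt in any text editor never touches the audio.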
🧩 How we built it
Voice Notes was built entirely in C++, combining:
- SFML for real-time graphics, window management, and microphone input
- whisper.cpp (built on GGML) for efficient on-device transcription
- Low-level file I/O and threading to synchronize recording, saving, and UI rendering
- Custom state management for settings, hotkeys, and data persistence
No web stack and no high-level UI framework: just system-level code designed for raw performance and control.
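One common way to keep recording, transcription, and rendering from blocking each other is to hand finished recordings to a worker thread through a queue. The sketch below shows that pattern with the standard library only. JobQueue and processOnWorker are hypothetical names, and counting samples stands in for the actual transcription call.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A small blocking queue: producers push jobs, one or more workers pop them.
template <typename T>
class JobQueue {
public:
    void push(T job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            jobs_.push(std::move(job));
        }
        ready_.notify_one();  // notify outside the lock to avoid a needless wake-then-block
    }

    // Signal that no more jobs will arrive; wakes every waiting worker.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a job is available. Returns false once the queue is
    // closed and fully drained, which is the worker's cue to exit.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
        if (jobs_.empty()) return false;
        out = std::move(jobs_.front());
        jobs_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> jobs_;
    bool closed_ = false;
};

// Hypothetical demo: the calling thread plays the recorder, the worker
// plays the transcriber. Returns the number of samples the worker saw.
inline std::size_t processOnWorker(const std::vector<std::vector<std::int16_t>>& recordings) {
    JobQueue<std::vector<std::int16_t>> queue;
    std::size_t totalSamples = 0;
    std::thread worker([&] {
        std::vector<std::int16_t> job;
        while (queue.pop(job)) {
            totalSamples += job.size();  // stand-in for running transcription
        }
    });
    for (const auto& r : recordings) queue.push(r);
    queue.close();
    worker.join();  // join() makes the worker's writes visible to this thread
    return totalSamples;
}
```

Because the only shared state lives behind a single mutex and the worker exits through close(), there is no lock ordering to get wrong, which removes one common source of deadlocks.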
⚙️ Challenges we ran into
The main challenge was working close to the metal: managing multiple threads for recording, processing, and rendering without crashes or deadlocks. Integrating Whisper at a native level meant navigating memory alignment, audio resampling, and cross-platform quirks. We also faced the complexity of building a GUI from scratch in SFML, handling inputs, focus, and asynchronous behavior without the safety net of modern UI libraries.
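To illustrate the resampling step: whisper.cpp expects 16 kHz mono audio as 32-bit floats, while microphone capture typically yields 16-bit integer samples, often at 44.1 or 48 kHz. The sketch below converts and resamples with plain linear interpolation. The function names are hypothetical, and this is a deliberately simple stand-in: a production resampler would low-pass filter before downsampling to limit aliasing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM to floats in the range [-1, 1).
inline std::vector<float> toFloat(const std::vector<std::int16_t>& pcm) {
    std::vector<float> out(pcm.size());
    for (std::size_t i = 0; i < pcm.size(); ++i) {
        out[i] = static_cast<float>(pcm[i]) / 32768.0f;
    }
    return out;
}

// Resample mono audio from srcRate to dstRate using linear interpolation
// between the two nearest input samples.
inline std::vector<float> resampleLinear(const std::vector<float>& in,
                                         unsigned srcRate, unsigned dstRate) {
    assert(srcRate > 0 && dstRate > 0);
    if (in.empty() || srcRate == dstRate) return in;

    const std::size_t outLen = static_cast<std::size_t>(
        static_cast<std::uint64_t>(in.size()) * dstRate / srcRate);
    std::vector<float> out(outLen);
    const double step = static_cast<double>(srcRate) / dstRate;  // input samples per output sample

    for (std::size_t i = 0; i < outLen; ++i) {
        const double pos = static_cast<double>(i) * step;
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), in.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, in.size() - 1);  // clamp at the last sample
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        out[i] = in[i0] + (in[i1] - in[i0]) * frac;
    }
    return out;
}
```

A typical call chain would be resampleLinear(toFloat(samples), captureRate, 16000), with captureRate being whatever rate the recorder was started with.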
🏆 Accomplishments that we're proud of
- Running Whisper transcription natively in C++ with no Python bindings
- Building a complete offline voice-to-text workflow with real-time audio capture
- Designing a responsive and minimal UI from the ground up
- Creating a system that’s both low-level and user-friendly, where efficiency meets simplicity
📚 What we learned
We learned how much depth lies beneath “simple” applications. From managing buffers and sample rates to handling Unicode input and multithreaded state synchronization, Voice Notes forced us to think like systems engineers. We also gained a deeper appreciation for how low-level optimization and thoughtful UX can coexist beautifully.
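As an example of the Unicode handling involved: SFML delivers typed text as individual Unicode code points, so an editable note buffer has to encode them before saving. The sketch below assumes the note text is kept as UTF-8 in a std::string; appendUtf8 and backspaceUtf8 are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to a UTF-8 string.
// Returns false (and appends nothing) for surrogates or out-of-range values.
inline bool appendUtf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Remove the last code point (not just the last byte) from a UTF-8 string,
// which is what a backspace key press should do.
inline void backspaceUtf8(std::string& s) {
    if (s.empty()) return;
    std::size_t i = s.size() - 1;
    // Continuation bytes look like 10xxxxxx; walk back to the lead byte.
    while (i > 0 && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80) --i;
    assert(i < s.size());
    s.erase(i);
}
```

Deleting by code point rather than by byte avoids leaving a dangling half-character in the note, which would otherwise show up as a corrupted .txt file.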
🚀 What's next for Voice Notes
Next, we plan to:
- Integrate speaker diarization and sentiment detection
- Add optional, privacy-respecting cloud synchronization
- Introduce searchable transcriptions and note tagging
- Port the core engine to a cross-platform mobile version
- Optimize inference using GPU acceleration via Vulkan or CUDA
Ultimately, we want Voice Notes to evolve into a personal companion for thought, one that listens, understands, and organizes, all while keeping your data yours.
Built With
- c++
- sfml
- whisper