Inspiration

We're both hardcore thinkers with lots of ideas that tend to slip our minds. We never want to forget anything again, but we also don't want to bother writing things down in an organized way. We want to minimize the friction between having an idea, storing it, and easily accessing it later.

What it does

Scribe is a tool for tracking and managing ideas and thoughts: it maps your brain through voice recordings that get automatically organized in Notion. At the click of a button, talk to your phone and ramble about topics that are relevant to you and that you don't want to forget. This can range from a new business idea that popped up during a talk to a reminder about your academic, work, or social duties. It's instant, so there's no 'write it down later' risk. It's organized, so you won't need to dig through random sticky notes or 27 different apps. And it's personalized.

How we built it

We used SwiftUI and Xcode for the mobile app, which runs on iPhone. The app contains only a button for starting and stopping a recording. Each press sends a POST request to a webhook exposed through ngrok, which tunnels the request to localhost.

We then have a FastAPI server that listens for these POST requests on localhost:3000 and triggers one of two actions depending on the request:

  1. Retrieve the existing Notion page structure through the Notion API in the background when recording begins.
  2. Send the recorded audio to the webhook when recording stops.
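The two triggers above can be sketched with the standard library alone. This is a minimal illustration, not our actual server code: `RecordingSession`, `fetch_page_structure`, and `run_pipeline` are hypothetical names, and the two placeholder functions stand in for the real Notion retrieval and the processing pipeline. The point it shows is that the page structure is requested in a background thread when recording starts, so it is usually ready by the time the audio arrives.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


def fetch_page_structure() -> dict:
    """Hypothetical placeholder for the Notion page structure retrieval."""
    return {"id": "root", "children": []}


def run_pipeline(audio: bytes, structure: dict) -> str:
    """Hypothetical placeholder for transcription, restructuring, and update."""
    return f"processed {len(audio)} bytes against page {structure['id']}"


class RecordingSession:
    """Starts the slow retrieval at recording start, consumes it at stop."""

    def __init__(
        self,
        fetch: Callable[[], dict] = fetch_page_structure,
        process: Callable[[bytes, dict], str] = run_pipeline,
    ) -> None:
        self._fetch = fetch
        self._process = process
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None

    def on_start(self) -> None:
        # submit() returns immediately; the fetch runs in a worker thread
        # while the user is still talking.
        self._pending = self._executor.submit(self._fetch)

    def on_stop(self, audio: bytes) -> str:
        if self._pending is None:
            # No start event was seen, so fetch synchronously as a fallback.
            structure = self._fetch()
        else:
            # Blocks only if the background fetch has not finished yet.
            structure = self._pending.result()
            self._pending = None
        return self._process(audio, structure)


session = RecordingSession()
session.on_start()
result = session.on_stop(b"abc")
```

In a FastAPI server, `on_start` and `on_stop` would be called from the route handlers that receive the two kinds of POST request.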

Process 1 runs the Notion page structure retrieval on our locally hosted server as soon as recording begins. Process 2 runs after recording stops and executes our whole pipeline as described below. It consists of 3 main steps:

  1. Transcribe the audio using the AssemblyAI speech-to-text API.
  2. Merge the Notion page structure with the audio transcription and send it to Grok, which processes the input to produce a new Notion page structure containing the newly organized contents.
     2.1. We truncate the Notion page JSON to the essential information before feeding it (together with the appropriate prompt engineering) to the model.
     2.2. Grok then outputs an extended version of the truncated JSON, which is rebuilt into a full Notion page by a restructuring algorithm based on a general Notion page skeleton that we extracted from a JSON structure.
  3. In the final step, we compare the newly created Notion page to the original (unedited) one to identify the changes made, and make the necessary Notion API calls to update the page structure in the Notion app. These changes are also displayed in a basic Streamlit frontend that highlights the modified lines.
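To make steps 2.1 and 2.2 more concrete, here is a rough sketch of a truncate-and-rebuild round trip. It is an illustration under assumptions rather than our exact algorithm: the block shape is a simplified, Notion-like one (a `type` field plus a payload under the same key holding `rich_text`), children are assumed to be nested under a `children` key, and the helper names `truncate`, `index_by_id`, `skeleton`, and `rebuild` are made up for this example. Nodes that come back from the model without an `id` are treated as new and built from the skeleton.

```python
import copy


def plain_text(block: dict) -> str:
    """Concatenate the text content of a block's rich_text parts."""
    payload = block[block["type"]]
    return "".join(p["text"]["content"] for p in payload.get("rich_text", []))


def truncate(block: dict) -> dict:
    """Keep only what the model needs: id, type, text, and nesting."""
    return {
        "id": block["id"],
        "type": block["type"],
        "text": plain_text(block),
        "children": [truncate(c) for c in block.get("children", [])],
    }


def index_by_id(block: dict, index: dict = None) -> dict:
    """Map every block id to its full original block."""
    index = {} if index is None else index
    index[block["id"]] = block
    for child in block.get("children", []):
        index_by_id(child, index)
    return index


def skeleton(block_type: str, text: str) -> dict:
    """Generic full-block template for content the model added."""
    return {
        "object": "block",
        "type": block_type,
        block_type: {"rich_text": [{"type": "text", "text": {"content": text}}]},
    }


def rebuild(node: dict, originals: dict) -> dict:
    """Expand a truncated node back into a full block."""
    original = originals.get(node.get("id"))
    if original is None:
        full = skeleton(node["type"], node["text"])
    else:
        # Start from the original so attributes the model never saw survive.
        full = {k: copy.deepcopy(v) for k, v in original.items() if k != "children"}
        if plain_text(full) != node["text"]:
            full[full["type"]]["rich_text"] = [
                {"type": "text", "text": {"content": node["text"]}}
            ]
    children = [rebuild(c, originals) for c in node.get("children", [])]
    if children:
        full["children"] = children
    return full


page = {
    "object": "block", "id": "a", "type": "heading_2",
    "heading_2": {
        "rich_text": [{"type": "text", "text": {"content": "Ideas"}}],
        "color": "blue",
    },
    "children": [{
        "object": "block", "id": "b", "type": "paragraph",
        "paragraph": {"rich_text": [{"type": "text", "text": {"content": "Old note"}}]},
    }],
}
small = truncate(page)  # this compact form is what would go into the prompt
extended = copy.deepcopy(small)
extended["children"].append({"type": "paragraph", "text": "New idea", "children": []})
rebuilt = rebuild(extended, index_by_id(page))
```

Rebuilding from the original blocks, rather than from the model output alone, is what lets attributes such as `color` survive even though they were stripped before the prompt.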

Challenges we ran into

  • First time building a mobile app and linking it to a webhook.
  • Maintaining a hierarchical structure of notes and ideas through the LLM was difficult, as Grok cannot process big files derived from knowledge bases like Notion due to its limited number of tokens. We therefore had to come up with a truncation algorithm that simplifies the JSON structure into suitable input, and then a reconstruction algorithm that maps the extracted relevant features back to the original content in order to update Notion in the right spots.
  • It was also our first time using the Notion API, so we had to familiarize ourselves with it. The integration required some specific definitions, as each descriptive element requires certain attributes.
  • Developing a simple version control system that compares the newest and the previous version of files, flags whether content has been added, modified, or deleted, and merges the information accordingly.
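The comparison described in the last point can be sketched as follows. This is a simplified illustration, not our actual implementation: it assumes each node in the truncated page tree has a unique `id` (new nodes would need a temporary one) and a `text` field, and `flatten` and `diff_pages` are hypothetical helper names.

```python
def flatten(node: dict, out: dict = None) -> dict:
    """Walk the tree and map each node id to its text."""
    out = {} if out is None else out
    out[node["id"]] = node["text"]
    for child in node.get("children", []):
        flatten(child, out)
    return out


def diff_pages(old: dict, new: dict) -> dict:
    """Classify node ids as added, deleted, or modified between versions."""
    before, after = flatten(old), flatten(new)
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in shared if before[k] != after[k]),
    }


old_page = {"id": "root", "text": "Ideas", "children": [
    {"id": "a", "text": "Buy milk"},
    {"id": "b", "text": "Call Sam"},
]}
new_page = {"id": "root", "text": "Ideas", "children": [
    {"id": "a", "text": "Buy oat milk"},
    {"id": "c", "text": "Startup idea"},
]}
changes = diff_pages(old_page, new_page)
```

Each bucket maps naturally onto one kind of Notion API call (append children for added blocks, update for modified ones, delete for removed ones), and the same three buckets are enough to drive highlighting in a frontend.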

Accomplishments that we're proud of

Going from building an app that connects to our laptops, through building multiple components that turn audio into text and then into structured output, to having a fully fledged end-to-end pipeline was incredibly satisfying. Probably the biggest highlight is that it is a PoC that actually works out of the box.

What we learned

iOS development, webhooks, truncation and reconstruction algorithms, the Notion API, and coordinating the building of an end-to-end pipeline with multiple steps and components that interact asynchronously.

What's next for Scribe

Given that we got a working PoC in 36 hours and that it has already proven useful to us, we believe that with enough development time it could become a standalone product in the App Store. Therefore, Juan Carlos and Xabi will continue to iterate on it until we reach that point.
