Inspiration

Building an AI journal at a previous hackathon, I realized that managing prompts feels like magic. You see someone post "190 hacks for the ultimate ChatGPT prompt" on LinkedIn, but how do you know a prompt actually improves performance? There's currently no easy way to manage and track prompts and see how they perform in the real world.

What it does

One place to manage your prompts: see prompt version history, compare performance between versions, and integrate with New Relic to monitor the reliability of your prompts (latency, cost, frequency). Integrating PromptStore into your project is easy: just add a few lines from the PromptStore SDK to pull prompts into your application. You can swap out and version-control prompts without changing any code, and even A/B test prompts to find the best performer.
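To make the idea concrete, here is a minimal sketch of the pattern the SDK enables: prompts live in a store keyed by name and version, and an A/B experiment deterministically buckets each user into one variant. All names here (`PROMPTS`, `get_prompt`, the `journal_summary` prompt) are hypothetical illustrations, not the actual PromptStore API.

```python
import hashlib

# Toy in-memory stand-in for the PromptStore backend: each prompt name
# maps to a set of versions (in the real app these live in Supabase).
PROMPTS = {
    "journal_summary": {
        "v1": "Summarize this journal entry in two sentences: {entry}",
        "v2": "You are a reflective coach. Summarize this entry: {entry}",
    }
}

def get_prompt(name, user_id, experiment=None):
    """Fetch a prompt, optionally A/B-bucketing the user between two versions."""
    versions = PROMPTS[name]
    if experiment is None:
        # No experiment running: default to the latest version.
        return versions[sorted(versions)[-1]]
    # Deterministic bucketing: hash the user id so each user
    # always sees the same variant across requests.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2
    return versions[experiment[bucket]]

template = get_prompt("journal_summary", user_id="user-42", experiment=("v1", "v2"))
print(template.format(entry="Went for a run today."))
```

The key point is that the application only ever asks for a prompt by name, so editing, versioning, or re-weighting an experiment happens in the store, with no code change or redeploy.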

How we built it

A React app as the front-end, Supabase as the database that manages all the prompts, plus Amplitude and New Relic integrations. The sample JournalGPT application is built with Python and React.

Challenges we ran into

Prioritizing what actually needed to work and what could be faked. Learning the tools while building the integrations - we weren't Amplitude or New Relic users before this hackathon. However, the hackathon and vendor crews were super helpful in teaching us how to use them.

Accomplishments that we're proud of

We got a working application! PromptStore runs with a database backend (Supabase), stores prompts, and works with a sample JournalGPT application. The app lets you manage prompts, version-control them, and swap them out dynamically without any code changes. You can then compare different prompts and see how they perform relative to each other. We integrated with Amplitude and New Relic to track the performance of your prompts in terms of latency and cost, as well as user data such as retention and satisfaction.
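The comparison step above boils down to grouping logged call events by prompt version and aggregating the metrics. Here is a small self-contained sketch; the event shape and field names are assumptions for illustration (in the real app these metrics flow through the New Relic and Amplitude integrations).

```python
from collections import defaultdict

# Hypothetical logged events, one per model call, tagged with the
# prompt version that produced it.
events = [
    {"version": "v1", "latency_ms": 820, "cost_usd": 0.0021},
    {"version": "v1", "latency_ms": 790, "cost_usd": 0.0019},
    {"version": "v2", "latency_ms": 610, "cost_usd": 0.0028},
    {"version": "v2", "latency_ms": 640, "cost_usd": 0.0030},
]

def compare_versions(events):
    """Aggregate mean latency and cost per prompt version."""
    grouped = defaultdict(list)
    for e in events:
        grouped[e["version"]].append(e)
    return {
        v: {
            "mean_latency_ms": sum(e["latency_ms"] for e in es) / len(es),
            "mean_cost_usd": sum(e["cost_usd"] for e in es) / len(es),
        }
        for v, es in grouped.items()
    }

stats = compare_versions(events)
print(stats)  # in this sample data, v2 is faster but costs a bit more per call
```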

What we learned

It's super powerful to be able to swap prompts out on the fly while tracking the performance of various prompts and models! We saw immediate value in being able to edit, track, and version prompts - this lets your data scientists and ML engineers do what they do best, without requiring them to deploy new code for every small change. The per-prompt performance metrics help make sure we're improving our prompts over time.

What's next for PromptStore

Launch and get some users. We plan to add more features, including the ability to define custom metrics, more powerful A/B testing, and SDKs for several popular languages. In the future, we could also expand into automated prompt tuning done by an AI without any human intervention. Some other features: a better interface for "chat" prompts; a proper editor for prompts; a debugger to investigate what made your prompt performance go up or down; and highlighting changes between prompt versions (similar to GitHub).
