Inspiration

I was inspired by the world of Artificial Intelligence and wanted to build a better, more accurate conversational AI that responds the way the user wants it to.

What it does

It is a voice-driven AI where users can change the style of the response (e.g., chill Gen Z, professional, or hype friend). The AI responds differently depending on the option the user selects in the UI. The chat history is also kept, so the conversation continues with the established context.
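As a rough illustration of the idea above, the selected style and the stored chat history could be combined into a single prompt before each request. This is only a sketch under assumptions: the names `Persona`, `ChatTurn`, `STYLE_INSTRUCTIONS`, and `buildPrompt`, the wording of the style instructions, and the 10-turn history limit are all hypothetical and are not taken from the project's code.

```typescript
// Hypothetical names throughout; the project's real prompts are not shown in this write-up.
type Persona = "chill-gen-z" | "professional" | "hype-friend";

interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

// Illustrative style instructions, one per option in the UI.
const STYLE_INSTRUCTIONS: Record<Persona, string> = {
  "chill-gen-z": "Reply casually, like a relaxed Gen Z friend.",
  "professional": "Reply in a formal, concise, professional tone.",
  "hype-friend": "Reply with lots of energy and encouragement.",
};

// Builds one prompt string: style instruction, recent history, then the new message.
function buildPrompt(
  persona: Persona,
  history: ChatTurn[],
  userMessage: string,
  maxTurns = 10, // assumed cap so the prompt does not grow without bound
): string {
  // slice(-0) would return the whole array, so handle maxTurns <= 0 explicitly.
  const recent = maxTurns > 0 ? history.slice(-maxTurns) : [];
  const lines = recent.map((turn) => `${turn.role}: ${turn.text}`);
  return [STYLE_INSTRUCTIONS[persona], ...lines, `user: ${userMessage}`, "assistant:"].join("\n");
}
```

Keeping the style instruction separate from the history means the user can switch styles mid-conversation without losing context: only the first line of the prompt changes.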

How it was built

The project was built using the React SDK, with API keys from ElevenLabs and Google Cloud Platform (GCP). It was developed in Google's new Antigravity IDE.

Challenges I ran into

API usage and pricing: This was definitely the main challenge, but I turned it into an opportunity. GCP required a minimum of RM40 to be paid upfront. Although my usage came to less than RM0.05, the remaining balance will help with future projects. My ElevenLabs free-trial credits also ran out before development ended, so I had to get the Starter subscription for more credits. Even so, the subscription unlocked more features, which I have taken advantage of. For instance, I used the "create or clone" voice function with my own voice for this project.

Code logic: The challenge was writing logic that keeps the application from bugging out while it is speaking, recording, or thinking. I also added "safety code" to ensure the application does not crash.
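One common way to keep speaking, recording, and thinking from overlapping is a small state machine that only allows known transitions. The sketch below is an assumption about how such a guard could look, not the project's actual code: the `Phase` type, the `ALLOWED` table, and the `nextPhase` and `safely` helpers are hypothetical names.

```typescript
// Hypothetical phases of a voice assistant turn.
type Phase = "idle" | "recording" | "thinking" | "speaking";

// Which phases may follow each phase; anything else is ignored.
const ALLOWED: Record<Phase, Phase[]> = {
  idle: ["recording"],
  recording: ["thinking", "idle"],
  thinking: ["speaking", "idle"],
  speaking: ["idle"],
};

// Returns the requested phase if the transition is allowed, otherwise stays put.
function nextPhase(current: Phase, requested: Phase): Phase {
  return ALLOWED[current].indexOf(requested) !== -1 ? requested : current;
}

// "Safety code": runs an async task and falls back instead of crashing the app.
async function safely<T>(
  task: () => Promise<T>,
  fallback: T,
  onError: (err: unknown) => void = () => {},
): Promise<T> {
  try {
    return await task();
  } catch (err) {
    onError(err);
    return fallback;
  }
}
```

With this guard, a tap on the record button while the AI is still speaking is simply ignored, because `nextPhase("speaking", "recording")` returns `"speaking"`. Wrapping each API call in `safely` lets a failed request return the app to an idle state rather than leaving it stuck.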

GCP API integration (most stressful): As someone new to GCP, I struggled to figure out how to create the API key. Even once I had created it, the API still did not return a response, which I later found out was because the project was not under a billing account.

Accomplishments that I'm proud of

Honestly, this is my first project that connects GCP with ElevenLabs, and it made me realize that these two platforms together can create extraordinary projects. This was also my first time using an API key provided by GCP. Although the key I currently use has no restrictions, I learned that specific restrictions can be applied to enhance security.

What I've learned

Always double-check whether the project contains any API keys that the public can easily access. The first time I pushed the project to GitHub, I realized that my keys were in a normal config file instead of .env.
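A small helper can make a missing key fail loudly at startup instead of tempting a hard-coded fallback in a config file. This is a minimal sketch under assumptions: `requireEnv` and the `Env` type are hypothetical names, and the exact way environment variables are exposed depends on the build tool the project uses.

```typescript
// Hypothetical helper; the env object is passed in so the function works with any build tool.
type Env = Record<string, string | undefined>;

// Returns the value of a required variable, or throws if it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}
```

For example, `requireEnv(process.env, "ELEVENLABS_API_KEY")` would read a key from a Node environment (the variable name is illustrative). The .env file should also be listed in .gitignore so it is never pushed. Note that keeping keys in .env only keeps them out of the repository: any key that a client-side React app bundles into its build can still be read from the browser, so key restrictions or a small server-side proxy are worth considering as well.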

What's next for VoiceDirect

Training, prompting, chat history, clearer voices: As someone who keeps up with current trends, memes, and news (even the tiniest details), I am thinking of adding a trained model and more specific prompts based on the most recent trends. I also want to enhance the UI by adding a visible chat history for users. Most importantly, I want a clearer AI voice that sounds as realistic as possible.
