Motivation:
Difficult conversations can be tough for several reasons, including fear of conflict or rejection, lack of confidence, emotional investment, power dynamics, cultural norms, and lack of skills. These conversations often involve discussing sensitive or personal topics, which can be emotionally challenging for both parties. Some examples include talking to new people, saying No to friends, talking about your feelings on your first date or negotiating your salary with a HR. Most often we are not prepared for these scenarios, but practicing these scenarios before we actually face them can reduce social anxiety and build confidence in a lot of people. Research also suggests that around 63% people find it hard to start a conversation around sensitive topic and many people also suffer with social anxiety and isolation due to lack of these communication skills .
Our Solution
We want to build an AI-based application that acts as a speech buddy and enables users to navigate through these difficult conversations. Our tool is powered with Language, Speech and vision technologies to closely resemble the real world setting to practice a difficult conversation.
Core Features include:
Scenario-specific conversations: We fine-tuned the GPT models through prompt-engineering to adapt to the customized scenarios that the users want to practice. For instance, if the user wants to improve on negotiation skills, the bot would act like a HR and if the practice scenario is a first date, then the bot would talk like your romantic interest on a dinner date with you.
Speech and Text Chatting: Our AI tool can converse with the users through both text and speech. This provides the accessibility to users to talk through multiple modalities.
Humanized AI-Avatar: We have an avatar integrated into our application that can basically take the image of the person you want to have a conversation with and is animated in accordance to the speech generated by the AI chatbot. Overall, the AI Avatar can serve as a proxy for the real human to practice real-life conversation.
Personalized Feedback: Along with the generic speech use-case, our intelligent AI tool will also provide the relevant feedback based on the textual content. The feedback tried to quantify metrics like confidence and also suggestions on the phrases that could be potentially replaced/ avoided.
How did we build it
We used a sophisticated tech stack to build this product and handle different use-cases. We developed a web-based application with front-end built on React JS and backend database with Convex. Our backend Convex server is also connected with the Flask which interacts with the OpenAI API to get the responses from the GPT, Azure model for the Text to Speech service and a Wav2Lip model hosted on Modal to support the AI avatar animation.
Challenges Faced:
We faced technical challenges as we were trying to get the credits to host the models on different services like OpenAI and Modal. Our major difficulty was also to come up with an end-to-end pipeline with minimal latency in the conversation and have a seamless user experience. With all of us being very new to web development, we also found creating a complex chat application with language, video and audio to be a test to our technical abilities.
What's Next?
This tool has a lot of potential to disrupt the education and healthcare industry. We use the cutting edge technology of LLMs and Deep Learning models to provide a seamless and human-like conversation experience to the users. We believe that this could be a great user product that can be a personal coach and buddy to have conversations and learn from them. We want to work further on this to develop more features, conduct user research and release a completely working app to users to test different hypotheses. We believe that the app can work well as a consumer product that could be released as a freemium version.

Log in or sign up for Devpost to join the conversation.