Inspiration
Ever wish you had a little voice guiding you through life? Well, now you do—except this one can’t make up its mind. Meet SuperegoAI, your personal AI companion with a split personality!
This project is inspired by Freud’s superego and id (or, as we like to call them, the angel and devil on your shoulder). This chatbot listens to whatever you say and randomly chooses whether to lift you up or tear you down.
Feeling down? Maybe the angel will swoop in with words of comfort and wisdom. Too confident? The devil might just humble you in the most demeaning way possible. It’s all up to fate (and a random number generator). Expect pure, unpredictable chaos—sometimes uplifting, sometimes sarcastically honest, but always hilarious.
What it does
SuperegoAI lets users explore their alternate egos. At the start of the program, users can upload a picture of themselves. Based on that photo, the program shows them their purest and most evil selves with a little bit of photoshop. As you go about your day on your device, these two little avatars with polar-opposite personalities accompany you along the populated yet lonely stretches of your computer. Whenever you speak, one of them pops up at random and responds to your thoughts, no matter how serious or casual.
It can be your biggest comfort while you work through hard problems, or your greatest comedic hater on your best day. The fun part is, you never know which personality you are going to get, allowing for dynamic, fresh interactions each time you call on your SuperegoAI.
How we built it
This script creates an interactive voice assistant with a fun twist. It listens to what you say, determines whether you're being "good" or "evil," and responds in a fitting way, either helping you or offering sarcastic advice.
Main Components:

Speech-to-Text: The script records audio from the microphone, then converts that audio to text using Google’s Speech-to-Text API.
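For reference, a minimal sketch of the transcription step (not our exact code; it assumes the microphone audio has already been saved as a 16 kHz mono WAV file):

```python
# Minimal sketch of the speech-to-text step, assuming a 16 kHz mono
# LINEAR16 WAV file already recorded from the microphone.
from google.cloud import speech

def transcribe(wav_path: str) -> str:
    # The client reads credentials from GOOGLE_APPLICATION_CREDENTIALS.
    client = speech.SpeechClient()
    with open(wav_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    # Join the top alternative of each result into one transcript.
    return " ".join(r.alternatives[0].transcript for r in response.results)
```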
Mood Detection: Based on what you say, it decides whether your mood is "good" or "evil." If you mention the word "good" or "evil," it uses that; otherwise, it randomly picks one.
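The mood check is deliberately simple keyword matching with a random fallback, roughly:

```python
import random

def detect_mood(text: str) -> str:
    """Return "good" or "evil" based on keywords, else pick one at random."""
    lowered = text.lower()
    if "good" in lowered:
        return "good"
    if "evil" in lowered:
        return "evil"
    return random.choice(["good", "evil"])
```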
GPT-3.5 Response: It sends the text you spoke to OpenAI’s GPT-3.5 model, which responds with either helpful or sarcastic advice, depending on whether the mood was "good" or "evil."
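A sketch of that request using the current openai-python client (our hackathon code may have used an older client version, and the persona prompts below are illustrative, not our real ones):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical persona prompts for illustration only.
PERSONAS = {
    "good": "You are a kind guardian angel. Give warm, genuinely helpful advice.",
    "evil": "You are a sarcastic devil. Give brutally honest, mocking advice.",
}

def ask_gpt(user_text: str, mood: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": PERSONAS[mood]},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content
```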
Text-to-Speech: Once GPT gives a response, the script uses Google’s Text-to-Speech API to convert that text into audio. The tone of the voice changes based on the mood — "good" gets a neutral tone, and "evil" gets a lower pitch, making it sound darker.
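A sketch of the synthesis step with google-cloud-texttospeech; the -6.0 semitone pitch for "evil" is an illustrative value, not necessarily the one we shipped:

```python
from google.cloud import texttospeech

def synthesize(text: str, mood: str, out_path: str = "reply.mp3") -> str:
    client = texttospeech.TextToSpeechClient()
    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        # Drop the pitch (in semitones) for the "evil" persona to darken the voice.
        pitch=-6.0 if mood == "evil" else 0.0,
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)
    return out_path
```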
Audio Playback: The generated audio is saved as an MP3 file, and it’s played back to you using the Pygame library.
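Playback is only a few lines of Pygame, roughly:

```python
import pygame

def play_mp3(path: str) -> None:
    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    # Block until playback finishes.
    clock = pygame.time.Clock()
    while pygame.mixer.music.get_busy():
        clock.tick(10)
```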
UI with Tkinter: The script includes a simple graphical interface built with Tkinter. It has a "Persona Photo Booth" that displays the persona graphics (the "good" or "evil" avatar) matching each response's mood.
Threading: The script continuously listens for your voice and processes it in the background, using threading to run the voice recognition and the graphical interface at the same time. So while you're talking, the assistant can respond and interact with you.
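Putting it together, a minimal sketch of the threading pattern (Tkinter widgets can only be updated safely from the main thread, so this version hands results back through a queue; our actual script's wiring may differ). It reuses the helper functions sketched above:

```python
import queue
import threading
import tkinter as tk

ui_queue: "queue.Queue[str]" = queue.Queue()

def listen_loop():
    """Background thread: transcribe, respond, speak, then hand the result to the UI."""
    while True:
        text = transcribe("clip.wav")   # 15-second clip; recording code elided
        mood = detect_mood(text)
        reply = ask_gpt(text, mood)
        play_mp3(synthesize(reply, mood))
        # Tkinter widgets must only be touched from the main thread,
        # so pass the result through a queue instead of updating directly.
        ui_queue.put(f"[{mood}] {reply}")

root = tk.Tk()
label = tk.Label(root, text="Say something...", wraplength=400)
label.pack(padx=20, pady=20)

def poll_queue():
    # Runs on the main thread: drain the queue and refresh the label.
    while not ui_queue.empty():
        label.config(text=ui_queue.get_nowait())
    root.after(100, poll_queue)

threading.Thread(target=listen_loop, daemon=True).start()
poll_queue()
root.mainloop()
```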
How It Works:
1. The program records 15 seconds of your voice.
2. It turns your speech into text.
3. Based on your words, it picks a "mood" (good or evil).
4. It asks GPT to generate a response according to that mood.
5. The response is then turned into speech and played back to you.
All of this happens while the Tkinter graphical interface runs alongside.
Challenges we ran into
Connecting the APIs: we tried to connect several APIs from Google Cloud using IAM (identity and access management) and JSON credential files. Unfortunately, the APIs either returned incorrect output or failed to recognize our input, so we had to reconfigure almost all of them.
This was also our first time configuring and coordinating multiple libraries, APIs, and file inputs and outputs across 5 different files. A lot of dependencies had compatibility errors and limitations; for example, WSL could not access the computer's microphone input. Eventually, we had to research and read specific documentation to resolve the errors and to find libraries that aligned more tightly with our goals.
Accomplishments that we're proud of
Configuring APIs: for this project, we had to learn about APIs from scratch. We walked into this project without knowing what an API was, and we walked out having configured 3 different APIs, debugged a variety of API-related bugs, and gained a solid understanding of the concepts.
Applying past knowledge to more practical applications: This is also our second hackathon. Our first hackathon team consisted of 2 sophomores (us) and 2 grad students from another school whom we met by chance. During our first hackathon, we learned the general ins and outs of hackathons. We are glad to have had this opportunity to apply our newfound knowledge at another hackathon and see how much we have improved. This time we made sure to leave more time for our demo and planned it out more efficiently, which was a major point-deduction area for us at Wayne Hacks.
Additionally, to build this project, we had to use concepts that we had learned in a very controlled environment in far more varied settings, without our usual guidance; for instance, setting up different environments and working with niche libraries across different operating systems. Despite the increased difficulty and errors, we were glad to be able to apply these abstract concepts to something that we are actually passionate about.
Functional Project: With the number of things that did go wrong, we were glad that we ended up with something right (aka something that ran and worked correctly after the hours and caffeine poured into it).
Dedication and effort: Both my teammate and I had to step out of our comfort zones to code this project, learning new functionalities and systems within 24 hours.
What we learned
As mentioned before, we learned what an API is, common implementations, and common pitfalls.
We also learned how WSL interacts with the Windows system: it behaves almost like a virtual environment and cannot connect to the input devices of the underlying computer.
We also learned how to look up new command-line commands and permissions, and how to configure them properly so that the right libraries could read, write, and/or execute the correct files for the environment to come together.
What's next for SuperegoAI
Better GUI design: we would like to make the photo-upload GUI that runs at the beginning of the program more stylish.
Animation: Instead of just having the avatars pop up, we would also like them to have mouth movements synchronized to the output audio.
Reading screen input (especially for IDEs): Most coders, especially college students, code for hours in their room, isolated from the world (speaking from experience). Sometimes a friend that gives you coding advice, whether brutal or comforting, is just what we need when hitting a slump. We plan to expand SuperegoAI so that it can read what the user has on screen and give advice or make comments on the code based on the differences between screen captures (usually at 15-second intervals).
Built With
- google-cloud
- json
- openai
- python
