Inspiration

I was reading an article one day and came across something about autism. Curious, I researched it further. It turns out there is also a 'meme' where anyone who acts childish gets called autistic. This hurt me greatly, because autistic people are simply misunderstood: they want to see the world through their own childlike imagination. In the United States people are more understanding, but in a country like India they are often just laughed at. I wanted to do something about this, and so I started work on DigiFriend.

What it does

First, the parent fills in their child's details: name, likes, dislikes, preferences, etc. A custom DigiFriend model is then loaded for that specific child. Whenever the child wants to talk, they just raise the band; the camera starts capturing face data and the smartband starts recording audio. As soon as the emotion is captured, DigiFriend speaks the introline, and if the child is angry, it calms them down. DigiFriend also relates complex school concepts (like algebra) to everyday things such as a seesaw! This gives the child the friend they never had. Also, every hour, data on what the child is talking about, mood swings, learning outcomes, etc. is sent to the parent so that they can get to know and understand their child better.

How we built it

(i) The input
The camera on the cap sends a live feed to an app called 'Lookcam'. Lookcam is also installed on my computer (which is the server side), so my code has a function that takes 30 screenshots of the Lookcam window; this gives me the face input. For voice, the smartband records the user with Sounddevice and sends the audio to the code for feature extraction and prediction.
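The screenshot loop can be sketched as a small helper that repeatedly grabs a frame. In the real app the `grab` callable would screenshot the Lookcam window (e.g. with `pyautogui.screenshot`) and the audio side would be a `sounddevice.rec(...)` call; here `grab` is injected and the timing values are assumptions, so the loop itself stays self-contained:

```python
import time
from typing import Callable, List


def capture_frames(grab: Callable[[], object], n: int = 30,
                   interval: float = 0.1) -> List[object]:
    """Grab `n` frames by repeatedly calling `grab`.

    In DigiFriend, `grab` would take a screenshot of the Lookcam
    window; it is passed in here so the loop is testable without a
    camera. `interval` (seconds between shots) is an assumed value.
    """
    frames = []
    for _ in range(n):
        frames.append(grab())
        time.sleep(interval)
    return frames
```

The same dependency-injection idea works for the audio side: pass in a recorder callable instead of hard-wiring the Sounddevice call.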

(ii) The prediction
Using the FER library for facial emotion prediction, we get the emotion intensities for the face. Next comes the voice emotion prediction: we extract the raw audio features with the "Librosa" library, then train a scikit-learn model on synthetic data so it can recognize emotion. The extracted features are converted to NumPy arrays and run through that model, giving us the voice emotion intensities. Now that we have both sets of intensities, we compare them. If the same emotion has the maximum value in both, we choose it. When the two disagree, we sort them on the basis of "groups": each emotion belongs to a group, for example, happy belongs to the groups (Happy, Surprise and Neutral) and (Happy, Angry and Neutral). We then take the intensities within the shared group and choose the emotion from there.
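The fusion step might look like the following sketch. The exact group memberships and the tie-break rule (summing intensities inside the face prediction's group) are my assumptions, not the project's actual code:

```python
# Assumed emotion groups; the write-up gives (Happy, Surprise, Neutral)
# as one example grouping, the rest are placeholders.
EMOTION_GROUPS = {
    "happy":    {"happy", "surprise", "neutral"},
    "surprise": {"happy", "surprise", "neutral"},
    "neutral":  {"happy", "surprise", "neutral"},
    "angry":    {"angry", "sad", "fear"},
    "sad":      {"angry", "sad", "fear"},
    "fear":     {"angry", "sad", "fear"},
}


def fuse_emotions(face: dict, voice: dict) -> str:
    """Combine face and voice emotion intensities into final_emotion."""
    face_top = max(face, key=face.get)
    voice_top = max(voice, key=voice.get)
    if face_top == voice_top:  # both modalities agree
        return face_top
    # They disagree: restrict to the face prediction's group and pick
    # the emotion with the highest combined intensity inside it.
    group = EMOTION_GROUPS[face_top]
    combined = {e: face.get(e, 0.0) + voice.get(e, 0.0) for e in group}
    return max(combined, key=combined.get)
```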

(iii) The Chatbot
Once we get the variable "final_emotion", we use it to build a condition for our chatbot: based on final_emotion, we choose which introline to speak. Then we use the Web Audio API to let the user speak to the watch, and gTTS to make the watch speak back. When the user responds, the chatbot checks whether the response is found in the dataset or not. If yes, it uses the reply from the dataset; if not, it generates one using BlenderBot.
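A minimal sketch of the lookup-then-generate flow; the introline wording, the dataset entries, and the `generate` callable (standing in for the BlenderBot call) are all placeholders:

```python
# Placeholder introlines keyed by final_emotion (wording is assumed).
INTROLINES = {
    "happy": "You look cheerful today! What's up?",
    "angry": "Hey, let's take a deep breath together.",
}

# Toy stand-in for the response dataset.
RESPONSES = {
    "hello": "Hi there! How are you feeling today?",
}


def reply(user_text: str, generate) -> str:
    """Answer from the dataset when possible, otherwise fall back to
    generation (in DigiFriend, `generate` would call BlenderBot)."""
    key = user_text.strip().lower()
    if key in RESPONSES:
        return RESPONSES[key]
    return generate(user_text)
```

In the real app the chosen reply would then be spoken aloud by gTTS on the watch.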

(iv) The Deployment
The Flask app runs on my IP address, and I have a domain with an A record pointing to that IP. Then, using median.co, I converted the website into a PWA. I changed up build.gradle a bit so it would support Wear OS, uploaded the app to the smartband, and it works!!! The hardware is just a smart band and a camera.

(v) The Parent Side App

The parent-side app uses the same deployment strategy as the main child-side interface. The parent first enters the child's details, which are stored in a local SQLite3 database; it is kept local for data security. Whenever the data needs to be accessed, the database is pushed to Drive and then deleted from the cloud 5 seconds after a confirmation is received. The database still exists locally, which greatly improves security. The main DigiFriend child side then gets the child's data simply by accessing that database.
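The local store can be sketched with the standard-library sqlite3 module. The table layout and the sample row are placeholders, and the Drive push/delete step is omitted; `":memory:"` stands in for the local database file so the sketch is self-contained:

```python
import sqlite3

# In the real app this is a local file (kept local for data security);
# ":memory:" keeps this sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS child (
        name        TEXT,
        likes       TEXT,
        dislikes    TEXT,
        preferences TEXT
    )
""")

# Hypothetical sample row entered by the parent.
conn.execute(
    "INSERT INTO child VALUES (?, ?, ?, ?)",
    ("Aarav", "trains", "loud noises", "short, calm sentences"),
)
conn.commit()

# The child side later reads the same database.
row = conn.execute("SELECT name, likes FROM child").fetchone()
```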

Challenges we ran into

There were multiple challenges I encountered while making DigiFriend, such as:-

(i) I tried using different language models to fit my needs, but none of them worked. Eventually, I found that instead of fine-tuning or trying to make my own emotional AI (like Pi), a separate model layer would work perfectly, so I used Llama 3.3 and added the emotional support and customizations as a system prompt (which worked!)
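That "separate model layer" amounts to prepending a system prompt to every chat request instead of retraining anything. A sketch of the idea, where the prompt wording and field names are my guesses rather than the actual prompt:

```python
def build_messages(child: dict, final_emotion: str, user_text: str) -> list:
    """Build the chat messages sent to the LLM (e.g. Llama 3.3),
    layering emotional support and per-child customization on top
    as a system prompt rather than fine-tuning."""
    system = (
        "You are DigiFriend, a gentle, patient companion for a child. "
        f"The child's name is {child['name']}. They like {child['likes']} "
        f"and dislike {child['dislikes']}. Their current mood is "
        f"{final_emotion}. Reply in short, calm, friendly sentences and "
        "relate school concepts to everyday things."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]
```

The returned list follows the usual chat-completion message format, so swapping the underlying model only means pointing the request elsewhere.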

(ii) I tried building my application using the Kivy framework, but that didn’t work, so I had to switch the whole thing to Flask.

(iii) I tried using a subdomain to get the Flask site working, but since that didn’t work, I ended up using a full domain.

However, one of the biggest challenges, or rather opportunities, was doing the project solo. I got little to no help from any quarter, and things were at times very frustrating, with the ever-present stress of the timeframe, but it all helped in upping the ante. This was a very thrilling experience, and I’ve learnt a lot.

Accomplishments that we're proud of

1) DigiFriend (in a very early form) was showcased in front of the American Chamber of Commerce, India, and I got to meet a lot of representatives of big tech companies who saw promise in the idea.

2) Won the Intel AI Impact Festival 2024 under the Impact Creators (6-8) Category

What we learned

From working on DigiFriend, I have learnt a lot about TensorFlow, Torch, and how to use different LLMs. I also learnt about the different use cases of AI before settling on what I could use to make DigiFriend. I’ve improved my knowledge of Python and its many libraries.

Making DigiFriend helped me learn the following non-technical skills as well:-

a. Time management

b. Effective planning

c. Error Handling

What's next for DigiFriend

1) Since the headgear is currently quite bulky, one potential improvement is integrating a camera into the smartband itself. That way, whenever the child raises the watch, the prediction would start.

2) I'm currently working on a new mode, DigiFriend for Stress, designed mainly to assist Employee Assistance Programs. Currently, a psychologist listens to the employees and allows them to vent. However, there are two main problems with that: a) it's too expensive, and b) people don't really open up for fear of repercussions. So DigiFriend's stress mode is simple. First, it classifies the user into three stress levels:-

a) Minor Stress - Loads the flash model

b) Medium-High - Loads the full sized model

c) Very High (Suicidal Thoughts, etc) - Connects to a psychologist

This accomplishes four things:-

i) The cost is cut down by a significant amount

ii) The problems and stress points would be anonymously shared with the feedback team at the company, making the workplace more employee-friendly.

iii) It isn't completely AI-dependent, since very high stress is handed off to a psychologist.

iv) Each person has their own custom DigiFriend based on their preferences.
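The three-level routing above could be sketched like this; the level names and return values are placeholders for whatever the classifier and model loader would actually use:

```python
def route_stress(level: str) -> str:
    """Map a classified stress level to the stress-mode action."""
    if level == "minor":
        return "load_flash_model"       # lightweight model is enough
    if level == "medium_high":
        return "load_full_model"        # full-sized model
    if level == "very_high":            # e.g. suicidal thoughts
        return "connect_psychologist"   # hand off to a human
    raise ValueError(f"unknown stress level: {level!r}")
```

Keeping the "very high" branch as a hard hand-off is what makes the mode not completely AI-dependent.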

This is the version I am building right now. I hope to finish it by March.
