Inspiration

Modern lifestyles, characterized by digital interactions and often physical isolation, have led to loneliness and other mental health issues becoming widespread across the world. In many cases, those affected do not receive the help they need because of the stigma surrounding mental illness, the cost of therapy, or the lack of immediate support. Our team sought to solve this issue by prototyping a physical AI mental health expert who could provide a more accessible option for those who need it.

What it does

At his core, Tars is a listener. With an empathic attitude and a smile on his face, he has been engineered to hear the user out and provide authentic, helpful advice. The user can speak freely with him, and he will respond through text-to-speech technology and through his characteristic pixel-art facial expressions. While not yet a full substitute for therapy, Tars has immense potential as a companion willing to provide a judgement-free space to chat or seek advice at all hours of the day. Additionally, your privacy is a top priority for Tars, and he will never send any data over the internet: all of his insights come from secure, locally-run models.

How we built it

We started with an LED matrix and a Raspberry Pi, which gave us the initial idea to make an animated assistant. With the help of open-source drivers to control the matrix, we wrote C++ scripts to display animations. Next, we set up speech-to-text using whisper.cpp and ran it on a separate desktop computer to make Tars more responsive. The transcription was then fed into a locally-run instance of LLaMa 3 while maintaining the full context of the conversation. We tested many prompts to find one that would most effectively communicate like a therapist and attach an emotional context to each response. We sent the text to Piper to be vocalized and forwarded the emotions over HTTP to the Pi, which displays an appropriate facial expression on the LED matrix. The facial expressions were animated frame by frame, and the Pi switches between them with custom C++ logic.
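To give a feel for the glue between the desktop and the Pi, here is a minimal sketch of what the Pi-side emotion endpoint could look like. The port, the `show_expression` hook, and the plain-text body format are all assumptions for illustration, not our exact code:

```cpp
// Minimal sketch of the Pi-side emotion endpoint (names hypothetical).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <iostream>
#include <string>

// Hypothetical hook into the animation logic driving the LED matrix.
void show_expression(const std::string& emotion) {
    std::cout << "switching to expression: " << emotion << "\n";
}

int main() {
    int server = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(8080);  // port chosen arbitrarily
    bind(server, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(server, 4);

    while (true) {
        int client = accept(server, nullptr, nullptr);
        if (client < 0) continue;

        // The payload is tiny, so a single read suffices for this sketch.
        char buf[2048] = {};
        ssize_t n = read(client, buf, sizeof(buf) - 1);
        if (n > 0) {
            std::string request(buf, static_cast<size_t>(n));
            // The emotion tag is the request body: everything after
            // the blank line that ends the HTTP headers.
            size_t body = request.find("\r\n\r\n");
            if (body != std::string::npos)
                show_expression(request.substr(body + 4));

            const char* ok = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
            (void)write(client, ok, std::strlen(ok));
        }
        close(client);
    }
}
```

Keeping the protocol this small means the desktop only has to issue a single POST containing a word like "happy" after each LLaMa response.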

Challenges we ran into

  • Getting LLaMa 3 running was somewhat difficult, as it had been released to the public only three days before.
  • Efficiently timing the switches between facial expressions required using shared memory to communicate between C++ processes (a sketch follows this list).
  • We lacked level shifting hardware to raise the Pi's 3.3V logic to 5V, which led to graphical anomalies (which looked cool).
  • We wanted to run everything locally instead of using public APIs, so in some cases we resorted to connecting our services via SSH tunneling.
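For the shared-memory handoff mentioned above, a minimal sketch could look like the following. The segment name, the expression indexing scheme, and the frame rate are assumptions, not our exact values:

```cpp
// Sketch of the shared-memory handoff between two C++ processes.
// Link with -lrt on older glibc versions.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <atomic>

// Both processes agree on a table of animations; the shared slot just
// holds the index of the one to display. std::atomic<int> is lock-free
// here, so it is safe to place in shared memory.
struct Shared {
    std::atomic<int> expression;  // e.g. 0 = neutral, 1 = happy, ...
};

Shared* open_shared() {
    int fd = shm_open("/tars_expression", O_CREAT | O_RDWR, 0666);
    ftruncate(fd, sizeof(Shared));
    void* mem = mmap(nullptr, sizeof(Shared), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);
    // A fresh segment is zero-filled, which doubles as a valid
    // "neutral" initial state on typical platforms.
    return static_cast<Shared*>(mem);
}

// Writer side (the process receiving emotions over HTTP):
void publish(Shared* shared, int expression_index) {
    shared->expression.store(expression_index, std::memory_order_release);
}

// Hypothetical stand-ins for the animation code.
void load_animation(int index) { /* point at a new frame set */ }
void draw_next_frame()         { /* push one frame to the matrix */ }

// Reader side (the renderer): poll once per frame so an expression
// change never interrupts a frame mid-draw.
void render_loop(Shared* shared) {
    int current = -1;
    while (true) {
        int wanted = shared->expression.load(std::memory_order_acquire);
        if (wanted != current) {
            current = wanted;
            load_animation(current);
        }
        draw_next_frame();
        usleep(33000);  // ~30 fps
    }
}
```

Polling an atomic once per frame keeps the renderer's timing independent of when emotions arrive, which is what made the switches feel smooth.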

What's next for Tars?

Tars would greatly benefit from better physical design. We could create a custom PCB for a more permanent connection between the Pi and the LED matrix. Tars could also be packaged in a 3D-printed housing to increase durability and ease of transport. On the software side, we can always advance the quality of our models and prompts; this would make Tars more engaging to talk to, and reducing latency would let users converse with him more naturally. Additionally, we hope to increase Tars' expressiveness by creating more animations and by enhancing our sentiment analysis.

Open Source Software Used:

  • whisper.cpp (speech-to-text)
  • LLaMa 3 (language model)
  • Piper (text-to-speech)
  • open-source LED matrix drivers
