NOTE:

The video exceeds the time limit. We would really appreciate it if you played it at 2x speed to stay within the limit; we were not able to edit it down in time, and we are really sorry for this. https://www.loom.com/share/169fb8acd08b4397af336bda1d1298a6?sid=466517e4-5214-4e6a-b8b7-09b537b13922

Inspiration

We were inspired by how impressive Groq's new xRx multimodal solution is, so we wanted to explore a similar solution through a combination of Hexabot plugins, making it much more customizable and usable in many more scenarios and for more people.

What it does

Our solution provides x2x, which accepts any of three input modalities (text, audio, image) and produces either of two output modalities (text, audio).

How we built it

This is done through our provided helpers, speech-to-text and text-to-speech, and through our Vision and x2x plugin blocks, which use those helpers to add multimodal compatibility.
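At a high level, the x2x block can be thought of as routing every input modality through a text representation, then emitting the requested output modality. The sketch below illustrates that idea only; the names x2x, speechToText, textToSpeech, and describeImage are hypothetical stand-ins for the helpers described above, not Hexabot's actual helper APIs.

```typescript
// Illustrative sketch (assumed names, not the real Hexabot helper interfaces).
type InputModality = "text" | "audio" | "image";
type OutputModality = "text" | "audio";

interface Helpers {
  speechToText(audio: Buffer): string;   // assumed STT helper: audio -> transcript
  textToSpeech(text: string): Buffer;    // assumed TTS helper: text -> audio bytes
  describeImage(image: Buffer): string;  // assumed Vision helper: image -> caption
}

// Normalize any input to text, then convert to the requested output modality.
function x2x(
  input: { modality: InputModality; data: string | Buffer },
  output: OutputModality,
  helpers: Helpers
): string | Buffer {
  let text: string;
  switch (input.modality) {
    case "text":
      text = input.data as string;
      break;
    case "audio":
      text = helpers.speechToText(input.data as Buffer);
      break;
    case "image":
      text = helpers.describeImage(input.data as Buffer);
      break;
  }
  return output === "text" ? text : helpers.textToSpeech(text);
}
```

Funneling everything through text keeps the block simple: adding a new input modality only requires one new "to text" helper, and every output modality comes for free.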

Challenges we ran into

We ran into challenges both conceptually, about how exactly to integrate our solution (through helpers and/or plugins, and which APIs to use), and in the code itself, especially when managing the attachment system in Hexabot. But the Hexabot team was more than helpful throughout, which made the experience much better.

Accomplishments that we're proud of

What we learned

We learned a great deal, considering the relatively large size of this open-source project and the short span of time, especially since we weren't too familiar with Next.js or with the full array of features and services provided by Hexabot. It was definitely a rollercoaster.

What's next for Paranoid-Android

We hope to keep participating and contributing our ideas and efforts.

Built With
