posted an update

It's a network of LLMs in which the audio-to-text LLM is connected in series with the text-to-image LLM, while the text classification and image classification LLMs are connected in parallel. In theory it sounds constructive, but calling four LLMs synchronously would consume serious computing power. I have the setup in my repo; as more advances are made, it could become a good implementation.
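
To make the wiring concrete, here is a minimal sketch of that topology. It is not the code from my repo: the four functions are hypothetical placeholders standing in for the real models, and I am assuming the text classifier reads the transcript while the image classifier reads the generated image.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four models. Each one would wrap a real
# model call; here they return placeholder values so the sketch runs.
def audio_to_text(audio: bytes) -> str:
    return f"transcript of {len(audio)} bytes"

def text_to_image(prompt: str) -> bytes:
    return prompt.encode("utf-8")  # placeholder "image"

def classify_text(text: str) -> str:
    return "text-label"

def classify_image(image: bytes) -> str:
    return "image-label"

def run_pipeline(audio: bytes) -> dict:
    # Series stage: the transcript feeds the image generator.
    transcript = audio_to_text(audio)
    image = text_to_image(transcript)

    # Parallel stage (assumed inputs): both classifiers run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(classify_text, transcript)
        image_future = pool.submit(classify_image, image)
        return {
            "transcript": transcript,
            "image": image,
            "text_label": text_future.result(),
            "image_label": image_future.result(),
        }

result = run_pipeline(b"abcd")
```

Running the two classifiers side by side only shortens the wait; all four models still have to run for every input, so the total compute cost stays the same.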
