Inspiration

In academic settings, generative AI tools are increasingly used to aid research. We access these tools through a web browser and interact with them using text or speech prompts in English to find solutions to our problems. If the AI gives a satisfactory answer, we often want to share it with colleagues or with the broader public through blogs.

However, despite its user-friendly interface and built-in sharing options, Gemini (like other similar tools) currently lacks several key features:

Audio Sharing: There is no easy way to share an audio answer with friends, even though it would be very useful during exam season. Typing out the confusing parts is time-consuming; we would rather ask aloud and hear the answer to questions like "When should I use subnetting versus VLANs?" This is especially hard for people with stronger auditory memory or with visual impairments who cannot easily read from screens, because they cannot keep the answers as audio files. The current version does not allow saving the audio.

Fixed Audio Accent: Gemini does not let us change the voice of the audio response; users only get the default accent for each language. The audio is also not played automatically with each response, which makes it harder for users with disabilities to start playback.

Transcription: When using speech prompts, the system does not provide a written transcript of the prompt itself. This can be a problem for people who are not native English speakers: if they speak a prompt in their own language, the response might not reflect their original intent, and there is no written text of the prompt to check it against.

Text Sharing: The text response can only be saved in DOCX format, not in the widely used PDF format.

Since these limitations can be a real pain for students, why not build my own version that works better?

What it does

  1. Transcribes speech in all major languages.
  2. Translates over 120 languages and variants.
  3. Tells stories and poems.
  4. Helps with academics as well as daily life as an everyday AI assistant.
  5. Responds in different languages and accents, and lets you download the response as a PDF or as audio.

Challenges we ran into

The implementation seemed a bit difficult at first, but services like the Google Cloud Platform (GCP) Speech-to-Text and Text-to-Speech APIs, with their good accuracy, made it easier.
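As an illustration of how a different accent can be requested, here is a minimal Python sketch that builds a request body for the Cloud Text-to-Speech REST endpoint `text:synthesize`. It assumes the REST API is called directly (the project may use a client library instead), leaves out authentication, and the helper name `build_tts_request` is hypothetical rather than taken from the project's code.

```python
import json
from typing import Optional

# Cloud Text-to-Speech REST endpoint (v1). Actually sending the request needs
# an OAuth access token, which this sketch leaves out.
TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_tts_request(text: str, language_code: str,
                      voice_name: Optional[str] = None) -> dict:
    """Hypothetical helper: build a text:synthesize body for one language/accent.

    The BCP-47 language code selects the accent, e.g. "en-US", "en-GB",
    "en-IN" or "en-AU" for different English accents.
    """
    voice = {"languageCode": language_code}
    if voice_name is not None:
        # Optional specific voice; it must belong to language_code.
        voice["name"] = voice_name
    return {
        "input": {"text": text},
        "voice": voice,
        # MP3 output so the answer can be downloaded and shared.
        "audioConfig": {"audioEncoding": "MP3"},
    }


if __name__ == "__main__":
    # Same answer, three different English accents.
    for accent in ("en-US", "en-GB", "en-IN"):
        body = build_tts_request("Hello from your study assistant.", accent)
        print(TTS_URL, json.dumps(body))
```

Keeping the accent as a plain language-code parameter means the user's choice can be passed straight through from the UI without a separate mapping table.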

What we learned

We learned how programs handle binary files such as audio or images, and how to convert them into base64 text so they can be sent to and received from the APIs.
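To make the base64 step concrete, here is a minimal Python sketch using only the standard library. It assumes the request/response shapes of the Cloud Speech-to-Text `speech:recognize` and Text-to-Speech `text:synthesize` REST endpoints and 16 kHz LINEAR16 input audio; the helper names `build_stt_request` and `decode_audio_content` are hypothetical, not the project's own code.

```python
import base64
import json


def build_stt_request(audio_bytes: bytes, language_code: str = "en-US",
                      sample_rate_hz: int = 16000) -> dict:
    """Hypothetical helper: wrap raw 16-bit PCM audio in a speech:recognize body.

    JSON can only carry text, so the binary audio is base64-encoded to ASCII.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def decode_audio_content(response_json: str) -> bytes:
    """Hypothetical helper: turn the base64 audioContent of a
    text:synthesize response back into raw audio bytes."""
    return base64.b64decode(json.loads(response_json)["audioContent"])


if __name__ == "__main__":
    # Round trip with a few fake bytes instead of a real recording.
    fake_audio = b"\x00\x01\xff"
    request = build_stt_request(fake_audio, language_code="hi-IN")
    print(request["audio"]["content"])  # AAH/
    fake_response = json.dumps({"audioContent": request["audio"]["content"]})
    print(decode_audio_content(fake_response) == fake_audio)  # True
```

In an app like this one, the bytes returned by `decode_audio_content` could be written to an `.mp3` file, which is what makes the audio response downloadable and shareable.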
