Inspiration

We were inspired to convert web comics into audio for people who cannot easily read them!

What it does

TaleCraft converts web comics to text and audio. The user enters a comic's URL, selects a language, and gets back the text from the comic along with an audio reading of that text! This project helps those who cannot read, and those who prefer listening to stories rather than reading them!

How we built it

We built this application using Python, Google Cloud, and libraries such as vertexai (Gemini multimodal AI), gtts, pyppeteer, and gradio. First we built the front end, and then we worked on connecting the text and audio components. We used vertexai to interpret the images and read them into text, and used the gtts Python library for the audio component of our project.
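The flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the exact project code: the names `comic_to_audio` and `speak_with_gtts` are made up for the example, and the image-fetching and image-to-text steps are passed in as plain functions so the sketch does not guess at specific pyppeteer or vertexai calls.

```python
from typing import Callable, Iterable, List, Tuple


def comic_to_audio(
    url: str,
    lang: str,
    fetch_images: Callable[[str], Iterable[bytes]],
    describe_image: Callable[[bytes, str], str],
    synthesize: Callable[[str, str], bytes],
) -> Tuple[str, bytes]:
    """Turn a web comic URL into (text, audio bytes).

    All three callables are hypothetical stand-ins:
      fetch_images   -> e.g. a pyppeteer-based page scraper
      describe_image -> e.g. a Gemini call made through vertexai
      synthesize     -> e.g. speak_with_gtts below
    """
    panels: List[str] = []
    for image in fetch_images(url):
        text = describe_image(image, lang).strip()
        if text:  # skip panels where no text was read
            panels.append(text)

    story = "\n".join(panels)
    if not story:
        raise ValueError("no text could be extracted from " + url)
    return story, synthesize(story, lang)


def speak_with_gtts(text: str, lang: str) -> bytes:
    """Synthesize speech with gTTS and return the MP3 data as bytes."""
    import io

    from gtts import gTTS  # imported lazily; needs `pip install gTTS`

    buffer = io.BytesIO()
    gTTS(text=text, lang=lang).write_to_fp(buffer)
    return buffer.getvalue()
```

Keeping each stage behind a plain function makes it easy to swap one library for another, or to try the pipeline with stub functions before any cloud credentials are set up.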

Challenges we ran into

We ran into challenges converting the images into text and audio accurately. At first we used the BeautifulSoup Python library, which only gave us the page's HTML code, not the actual text in the images! We also had issues with the AI tools: image manipulation was not accurate, and the AI would repeatedly describe the images incorrectly!
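The HTML problem is easy to reproduce. The sketch below uses Python's built-in `html.parser` instead of BeautifulSoup so that it runs without extra installs, and the sample page is invented for illustration. Any HTML parser can only see what is in the markup: the page title and the image addresses, but none of the dialogue drawn inside the panels.

```python
from html.parser import HTMLParser
from typing import List


class ImageSourceCollector(HTMLParser):
    """Collect visible text and <img> sources from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []
        self.text_chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_data(self, data):
        if data.strip():
            self.text_chunks.append(data.strip())


# Hypothetical comic page: the dialogue lives inside the image files.
SAMPLE_PAGE = """
<html><body>
  <h1>Chapter 1</h1>
  <img src="/panels/001.png" alt="">
  <img src="/panels/002.png" alt="">
</body></html>
"""

collector = ImageSourceCollector()
collector.feed(SAMPLE_PAGE)
print("text found in HTML:", collector.text_chunks)
print("images to send to the vision model:", collector.sources)
```

The parsed text contains only the heading, so the image addresses are the useful output here: each image still has to be downloaded and passed to a model that can read it.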

Accomplishments that we're proud of

We're proud of how much we learned about AI tools, and of our ability to switch libraries and AI tools when we hit major errors while writing the code. The front end of our project has a sleek, clean design that is extremely user-friendly.

What we learned

We all learned a lot about LLMs; none of us were very familiar with them before this project. Converting images to text and audio is a difficult problem that was new to many of us. This project also gave us the chance to learn more about the Python libraries needed for image and audio conversion.

What's next for TaleCraft

We would love to expand this project by making it more useful for learning. For example, our project already lets the user change the language of the text and audio. This could allow people to learn new languages from web comics! The same image-to-text approach could also potentially help with surveillance systems.
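As a rough sketch of how the language option could drive both outputs, the snippet below maps a language name chosen in the interface to a short language code of the kind gtts accepts, and builds an instruction for the image-reading model. The table, the function names, and the prompt wording are all assumptions made for this example.

```python
# Hypothetical mapping from interface labels to language codes for gtts.
LANGUAGE_CODES = {
    "English": "en",
    "French": "fr",
    "Spanish": "es",
    "German": "de",
    "Japanese": "ja",
}


def resolve_language(label: str, default: str = "en") -> str:
    """Return the code for a language label, or the default if unknown."""
    return LANGUAGE_CODES.get(label.strip().title(), default)


def build_prompt(label: str) -> str:
    """Build an illustrative instruction for the image-to-text step."""
    return (
        "Read the text in this comic panel in reading order "
        "and write it in " + label.strip().title() + "."
    )


print(resolve_language("french"))
print(build_prompt("french"))
```

A language learner could then request the same comic twice, once in their own language and once in the one they are studying, and compare the two readings.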

Built With

  • gemini-multimodel-ai
  • gradio
  • gtts
  • pyppeteer
  • python
  • vertexai