We love comics. Since the earliest traces of human existence, people have told their stories through pictures. Those art forms include what we call comics today, and cultures all over the world have their own traditions of pictorial art and storytelling. My brother, like many people across the world, is visually impaired and therefore cannot take part in the rich and fulfilling experience of reading a comic book. Over the years I have also found that not understanding a foreign language is a roadblock to reading and enjoying comics from other cultures. One direct inspiration was the fact that some great manga have been published with very poor or subpar translations, and many classics have incomplete translations or none at all. To overcome these issues, we decided to build ComicVision: an open source AI tool, powered by Azure, which reads, translates, and speaks out comics in a variety of source and target languages, making the wonderful world of comics from around the world accessible to all.
What it does
To start, the service takes digital images of comic book pages as input. The system reads in the pages, converts them to grayscale, and segments them into panels and speech bubbles using OpenCV segmentation techniques. We use histogram equalization to boost contrast so that usable text can be extracted. We then send these segments to an Azure OCR service, which detects the language and returns the characters in each bubble. Next, we call an Azure text-to-speech engine, which reads the speech bubble out loud, and we initialize it with either a male or female voice depending on the characters in the comic. Finally, the output images are also viewable, with the text in the bubbles replaced by the translated output.
How we built it
Azure Notebooks: code host, central process
OpenCV: image segmentation, speech bubble isolation
Azure Cognitive Services OCR: text from speech bubbles
Azure Cognitive Services Vision: character recognition
Azure translation: translation to and from target languages
Azure Speech: text to speech
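As an illustration of the translation step, here is a minimal sketch of a request to the Azure Translator REST API (version 3.0). It assumes that "Azure translation" refers to that service. The helper names `build_translate_request` and `parse_translation` are hypothetical, and the key and region are placeholders you would supply from your own Azure resource. Leaving out the source language lets the service detect it.

```python
import json
import urllib.parse
import urllib.request

# Global Translator endpoint; a regional or custom endpoint may differ.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(text, target_lang, key, region):
    """Build (but do not send) a POST request that translates one string."""
    query = urllib.parse.urlencode({"api-version": "3.0", "to": target_lang})
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # placeholder credential
        "Ocp-Apim-Subscription-Region": region,  # placeholder region
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{TRANSLATOR_ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )


def parse_translation(response_body):
    """Pull the first translated string out of a Translator JSON response."""
    payload = json.loads(response_body)
    return payload[0]["translations"][0]["text"]
```

To actually send the request, pass it to `urllib.request.urlopen` and feed the response body to `parse_translation`. Keeping the request builder separate from the network call makes it easy to test without credentials.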
Challenges we ran into
Segmenting images that are not homogeneous is not easy
Language orientation differences: left to right, top to bottom, and vice versa
Text outside speech bubbles
Text with different coloring
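To illustrate the orientation challenge, here is a minimal sketch of putting detected speech bubbles into reading order for either left-to-right comics or right-to-left ones such as manga. It is an assumption about how this could be handled, not a description of what ComicVision does: the function name `reading_order`, the `(x, y, w, h)` box format, and the row tolerance are all hypothetical.

```python
def reading_order(boxes, right_to_left=False, row_tolerance=0.5):
    """Order (x, y, w, h) boxes row by row, then horizontally within a row.

    Two boxes share a row when their vertical centres differ by no more
    than row_tolerance times the taller of the two heights.
    """
    rows = []
    # Visit boxes from top to bottom so rows are created in page order.
    for box in sorted(boxes, key=lambda b: b[1]):
        _, y, _, h = box
        center_y = y + h / 2
        for row in rows:
            _, ry, _, rh = row[0]
            if abs(center_y - (ry + rh / 2)) <= row_tolerance * max(h, rh):
                row.append(box)
                break
        else:
            # No existing row is close enough, so start a new one.
            rows.append([box])

    ordered = []
    for row in rows:
        # Sort by horizontal centre; reverse the direction for RTL pages.
        ordered.extend(
            sorted(row, key=lambda b: b[0] + b[2] / 2, reverse=right_to_left)
        )
    return ordered
```

A simple row-based rule like this breaks down on irregular layouts, such as overlapping panels or bubbles that span several rows, which is part of why orientation is hard in practice.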
Accomplishments that we're proud of
For the most part, the system worked on all the samples we fed it from some of our favorite comics, such as Tintin.
What we learned
User testing is key for this type of product
Semantic, typographical, and cultural differences between languages are not easy to overcome
What's next for ComicVision
More testing, improvements to the speech services, and better support for Eastern languages written in logographic (character-based) scripts