Young children who are learning to read are typically helped by illustrations: picture books let a child build associations between words and the physical world they have already spent four years actively working to understand. However, not all children develop reading skills at the same rate, so some may fall behind when they are asked to read stories without pictures. The same idea applies to people learning a new language: pictures help illustrate the meaning without giving the answer away completely.

What it does

Illustratr is a web service that automatically generates illustrations for a given piece of text. Once the illustrations are generated, an easy-to-use interface lets the user view them inline with the text, as though it were a complete book. The interface also detects swipe gestures for turning pages, which keeps reading interactive.

How I built it

The process begins with a natural language processing step, built with nltk, that segments the text into "scenes"; each scene gets one illustration. Next, keywords falling under one of four categories are extracted: context (background), subject, verb, and object. With these keywords, the Bing Image Search API is used to collect candidate images for each category; the candidates are ranked on defining factors (e.g. the subject ideally has an easily removable background), and the top candidate is selected. The final illustration is generated by superimposing the subject and the object onto the background in a contextually aware fashion.
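The segmentation and ranking steps could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: a naive regex sentence splitter stands in for the nltk-based segmentation, and the `Candidate` fields, the fixed `sentences_per_scene` grouping, and the 50/50 scoring weights are all hypothetical. The Bing Image Search call and the compositing step are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    category: str                 # "context", "subject", "verb", or "object"
    relevance: float              # hypothetical search-relevance score in [0, 1]
    background_uniformity: float  # hypothetical: 1.0 = plain, easily removable background


def split_into_scenes(text, sentences_per_scene=2):
    """Group consecutive sentences into scenes (one illustration per scene).

    Assumption: a regex splitter replaces the nltk segmentation, and a fixed
    number of sentences per scene replaces any smarter scene detection.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_scene])
        for i in range(0, len(sentences), sentences_per_scene)
    ]


def score(candidate):
    """Rank a candidate image; the weights are illustrative only."""
    # Subjects and objects are pasted onto the background, so an easily
    # removable background counts for them; backgrounds use relevance alone.
    if candidate.category in ("subject", "object"):
        return 0.5 * candidate.relevance + 0.5 * candidate.background_uniformity
    return candidate.relevance


def pick_top(candidates):
    """Return the highest-scoring candidate for each category."""
    best = {}
    for c in candidates:
        if c.category not in best or score(c) > score(best[c.category]):
            best[c.category] = c
    return best


scenes = split_into_scenes("The fox ran. It jumped! The dog slept. The end.")
top = pick_top([
    Candidate("a.png", "subject", relevance=0.9, background_uniformity=0.1),
    Candidate("b.png", "subject", relevance=0.7, background_uniformity=0.9),
    Candidate("c.png", "context", relevance=0.6, background_uniformity=0.0),
    Candidate("d.png", "context", relevance=0.8, background_uniformity=0.0),
])
```

In this sketch a slightly less relevant subject image with a plain background (`b.png`) outranks a more relevant one with a cluttered background (`a.png`), which mirrors the "easily removable background" preference described above.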

Challenges I ran into

Accomplishments that I'm proud of

What I learned

What's next for Illustratr

Although the schema used to determine the imagery accounts for verbs, there was not enough time to detect subjects that actually embody those verbs, so that is a crucial next step. In addition, the image synthesis step still needs a lot of fine-tuning: the generated images do not always make sense.
