There are hundreds of programs that use image recognition and computer vision (CV) to map images to concepts. What if we went the other way? I wanted to build a program that creates images from descriptions of a scene. Imagine Google Images, but for images that don't exist.
What it does
- You input a sentence
- My algorithm loops through the sentence, creating entities based on part-of-speech tagging (nouns, verbs, adjectives)
- Images are downloaded for each entity
- Images are transformed based on their relation to each other
- Your creation comes to life
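The first two steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the tiny hand-written `LEXICON` stands in for a real part-of-speech tagger, and the names `tag_words` and `chunk` are hypothetical.

```python
# Hypothetical stand-in for a real part-of-speech tagger:
# a tiny hand-written lexicon mapping words to coarse tags.
LEXICON = {
    "a": "ARTICLE", "the": "ARTICLE",
    "cat": "NOUN", "dog": "NOUN", "table": "NOUN", "hat": "NOUN",
    "on": "PREP", "at": "PREP", "by": "PREP",
    "red": "ADJ", "big": "ADJ",
    "sits": "VERB", "is": "VERB",
}


def tag_words(sentence):
    """Return (word, tag) pairs; unknown words are tagged 'UNKNOWN'."""
    words = sentence.lower().replace(".", "").split()
    return [(w, LEXICON.get(w, "UNKNOWN")) for w in words]


def chunk(sentence):
    """Reduce a sentence to entities (nouns) and links (prepositions).

    Adjectives are recorded on the next noun; other words are dropped.
    """
    chunks = []
    pending_adjectives = []
    for word, tag in tag_words(sentence):
        if tag == "ADJ":
            pending_adjectives.append(word)
        elif tag == "NOUN":
            chunks.append(("entity", word, pending_adjectives))
            pending_adjectives = []
        elif tag == "PREP":
            chunks.append(("link", word))
        # VERB, ARTICLE, and UNKNOWN words are ignored in this sketch.
    return chunks


print(chunk("The big cat sits on a red table."))
# [('entity', 'cat', ['big']), ('link', 'on'), ('entity', 'table', ['red'])]
```

Each `entity` would then get an image, and each `link` would decide how the two neighbouring entities are arranged.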
How I built it
I created two classes for the entity recognition, or "chunking", algorithm:
Entities (Nouns/Things):
- Contain an image with a picture of the object
- Can be combined to make a new entity
- * Several images can be combined using a custom merging algorithm I wrote:
```python
# Merges sequential entities using a generic combining transformation
def mergestreak(subrange):
    subrange_length = next_power_of_2(len(subrange))
    while (len(subrange) < subrange_length):
        subrange.append(blankimg)
        print("adding a blank image!")
    print("Subrange Length:" + str(subrange_length))
    while (len(subrange) > 1):
        for i in range(0, len(subrange), 2):
            print(i)
            subrange[i] = combine(subrange[i], subrange[i + 1])
            showimg(subrange[i].image)
        del subrange[1::2]
    return subrange
```
Links (Prepositions/Relationships between two Entities):
- Prepositions such as "on", "at", and "by" describe how two Entities relate to each other in the picture.
- Using a dispatching function pattern, I assigned a function to each preposition that combines two Entities in a special way.
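One way to picture the dispatch pattern described above is a dictionary that maps each preposition to a combining function. A minimal sketch follows; the function names, the `(x, y)` offsets, and the dictionary-based entities are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch: entities are plain dicts with a name and an
# (x, y) position, standing in for the real image-carrying Entity class.

def place_on(subject, anchor):
    """Put the subject directly above the anchor (smaller y is higher)."""
    x, y = anchor["pos"]
    return [dict(subject, pos=(x, y - 1)), anchor]


def place_beside(subject, anchor):
    """Put the subject immediately to the right of the anchor."""
    x, y = anchor["pos"]
    return [dict(subject, pos=(x + 1, y)), anchor]


# The dispatch table: each preposition maps to one combining function.
PREPOSITION_HANDLERS = {
    "on": place_on,
    "at": place_beside,
    "by": place_beside,
}


def link(subject, preposition, anchor):
    """Look up the handler for a preposition and apply it."""
    try:
        handler = PREPOSITION_HANDLERS[preposition]
    except KeyError:
        raise ValueError(f"no handler for preposition {preposition!r}") from None
    return handler(subject, anchor)


cat = {"name": "cat", "pos": (0, 0)}
table = {"name": "table", "pos": (5, 5)}
print(link(cat, "on", table))
# [{'name': 'cat', 'pos': (5, 4)}, {'name': 'table', 'pos': (5, 5)}]
```

Supporting a new preposition then means writing one function and adding one dictionary entry, with no changes to `link` itself.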
Verbs and articles are ignored. Adjectives are noted, but not implemented.
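The `mergestreak` listing above calls helpers that are not shown (`next_power_of_2`, `combine`, `blankimg`). The sketch below is one plausible way to fill them in, using strings in place of images so the pairing order is visible. The helper bodies and the name `merge_pairwise` are assumptions, and unlike `mergestreak` this version returns the single merged item instead of a one-element list.

```python
def next_power_of_2(n):
    """Return the smallest power of two that is >= n (1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def merge_pairwise(items, combine, blank):
    """Pad to a power-of-two length, then merge neighbours until one is left."""
    items = list(items)  # work on a copy so the caller's list is untouched
    if not items:
        return blank
    # Padding guarantees every round has an even number of items to pair up.
    items.extend([blank] * (next_power_of_2(len(items)) - len(items)))
    while len(items) > 1:
        items = [combine(items[i], items[i + 1])
                 for i in range(0, len(items), 2)]
    return items[0]


def combine_names(a, b):
    """Stand-in for image merging: just record what was merged with what."""
    return f"({a}+{b})"


print(merge_pairwise(["sun", "sky", "sea"], combine_names, "_"))
# ((sun+sky)+(sea+_))
```

Three inputs are padded to four, so the merge takes two rounds: neighbours are paired first, then the two results are combined.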
Challenges I ran into
- Working with images was more challenging than I initially thought. Managing the alpha channel and other image properties came with a learning curve.
- Creating a custom morphing algorithm was something I didn't get to.
- I wish I had thought of using Colab IPython notebooks earlier as a way to host the program.
- Although Python is still my favorite language, I discovered the drawbacks of dynamically typed languages this weekend. When you're trying to hack something together, it's easy to pass the wrong data type to your own function.
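To give a sense of the alpha-channel bookkeeping mentioned above, here is a minimal sketch of the standard "over" compositing operator for a single RGBA pixel. It is not the project's code; the function name `over` and the 0.0 to 1.0 float representation are assumptions for illustration.

```python
def over(top, bottom):
    """Composite one non-premultiplied RGBA pixel over another.

    Each pixel is (r, g, b, a) with every channel a float in [0.0, 1.0].
    """
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom

    # Resulting opacity: the top pixel plus whatever shows through it.
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # both pixels are fully transparent

    def blend(t, b):
        # Weight each colour by its effective opacity, then normalise.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


# Half-transparent red over opaque blue gives an opaque purple.
print(over((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)))
# (0.5, 0.0, 0.5, 1.0)
```

Forgetting the normalisation by the output alpha, or mixing premultiplied and non-premultiplied pixels, is a common source of dark fringes around pasted objects.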
Accomplishments that I'm proud of
- First hackathon!
- Entity recognition and Entity merging were two challenging problems. Managing a changing list of entities tested my array manipulation skills.
- Created something new.
What I learned
- Lots of Python I didn't know before:
- * Debugging with PyCharm
- * Functional programming using the dispatch pattern
- * Complex array functions and algorithms
- Be careful when designing abstract programs - you'll have to rewrite almost everything.
What's next for Image Prototyper
- A better image morphing algorithm
- Support for adjectives