Picture this: you're a sophomore CS student. You want to participate in a hackathon, but you have no ideas. Alas, your life as an upper-middle-class college student is so hard, boo hoo. Suddenly, while talking with your English major friend, you get a burst of inspiration! They hear your woes and suggest you write a program that writes poems, as a way to learn more about natural language processing and machine learning. That, my friends, is where I found the inspiration for my first hackathon project.
What it does
This program reads through a file of poems and analyzes them line by line and word by word. It counts the syllables in each word, learns which words frequently follow other words, and then applies what it learned to write a poem of its own. Right now, it can write the following four poem formats:
a) Iambic pentameter (10 syllables per line)
b) Iambic tetrameter (8 syllables per line)
c) Haiku (5-7-5 syllable format)
d) Free verse (literally goes wild and does whatever it wants)
How I built it
I started by gathering all sorts of poems. I collected ~150 of them (16 of which were provided by my good English major friend) and put them in a single file for the program to parse. I then wrote the code that reads the file and tokenizes it into words. Each word is added to a map as a key, and its value is a vector holding the words that followed it in the example poems. Each word also gets its syllables counted, and that count is stored in a second map. All of this information feeds an algorithm that writes a poem: it picks a random word to start with, then uses that word's vector of trailing words (with frequency as a factor) to decide which word comes next. Syllable count is also taken into consideration for every format EXCEPT free verse.
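To make the word-to-followers idea concrete, here is a minimal sketch of how such a map could be built and sampled. It is not the project's actual code: the names `FollowerMap`, `addLine`, and `pickFollower` are made up for illustration, and it assumes words are split on whitespace only. Keeping duplicate entries in each vector is one simple way to make frequency a factor, since a uniform random pick then favors the followers that appeared most often.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: maps each word to every word seen directly after it.
// Repeats are kept on purpose, so a uniform pick is weighted by frequency.
using FollowerMap = std::map<std::string, std::vector<std::string>>;

// Record the followers found in one line of a poem.
// Assumption: words are separated by whitespace; punctuation is not stripped.
void addLine(FollowerMap& followers, const std::string& line) {
    std::istringstream in(line);
    std::string prev, word;
    if (!(in >> prev)) return;  // empty line: nothing to record
    while (in >> word) {
        followers[prev].push_back(word);
        prev = word;
    }
}

// Pick a random follower of `word`.
// Returns an empty string if the word was never followed by anything.
std::string pickFollower(const FollowerMap& followers,
                         const std::string& word,
                         std::mt19937& rng) {
    auto it = followers.find(word);
    if (it == followers.end() || it->second.empty()) return "";
    std::uniform_int_distribution<std::size_t> dist(0, it->second.size() - 1);
    return it->second[dist(rng)];
}
```

A generator built on this would call `addLine` once per line of the poem file, choose a starting word, and then call `pickFollower` repeatedly, stopping when the line's syllable budget is used up or when no follower exists.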
Challenges I ran into
Using two maps at once was difficult. I often couldn't remember which map I needed to iterate over at a given point in the algorithm, and I confused the two a lot. I was also not very familiar with iterators, so at times I got confused about how to use a specific iterator to accomplish a particular task. Keeping track of all the conditions in the algorithm was tiring too. Many conditions had to be met just to add a single word to a line, and keeping every one of them straight was a challenge.
Accomplishments that I'm proud of
Writing my own syllable count function. I was convinced that I was going to have to use a library, but after a few minutes of Googling about natural language processing I was able to figure out a decent algorithm for determining the syllable count of a word. Also, the results of the free verse option are either super deep (example: "bought from my being buried yet I can survive") or make no sense at all (example: "brushed past my phone calls while you could mere toil align thy swift punishment"), and both bring me much amusement and joy.
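For readers curious what a library-free syllable counter can look like, below is a rough sketch of one common heuristic: count groups of consecutive vowels, then discount a silent trailing "e". This is an assumption about the general approach, not necessarily the algorithm used in this project, and the function name `countSyllables` is made up for illustration. English has plenty of exceptions, so treat the result as an estimate.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Rough syllable estimate for a single English word (heuristic, not exact).
int countSyllables(const std::string& word) {
    auto lower = [](char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    };
    // 'y' is treated as a vowel, which is right more often than not.
    auto isVowel = [&](char c) {
        c = lower(c);
        return c == 'a' || c == 'e' || c == 'i' ||
               c == 'o' || c == 'u' || c == 'y';
    };

    // Step 1: each run of consecutive vowels counts as one syllable.
    int count = 0;
    bool prevVowel = false;
    for (char c : word) {
        bool v = isVowel(c);
        if (v && !prevVowel) ++count;
        prevVowel = v;
    }

    // Step 2: a final 'e' after a consonant is usually silent ("rose"),
    // unless the word ends in consonant + "le" ("table").
    std::size_t n = word.size();
    if (count > 1 && n >= 2 && lower(word[n - 1]) == 'e' && !isVowel(word[n - 2])) {
        bool consonantLe = n >= 3 && lower(word[n - 2]) == 'l' && !isVowel(word[n - 3]);
        if (!consonantLe) --count;
    }

    // Words with no vowel groups still count as one syllable.
    return count > 0 ? count : 1;
}
```

With a function like this, checking a haiku line comes down to summing the estimates for its words and comparing the total against 5 or 7. Words such as "poem" (estimated as one syllable because "oe" is a single vowel group) show where the heuristic falls short.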
What I learned
Language is much more complex and vast than just "nouns and verbs and adjectives and stuff!" There are many complexities that we take for granted because we learn language at such a young age.
What's next for Poem Writer
I hope to add stricter grammar rules so the poems come out more cohesive, instead of the occasional line of gibberish. I will need to study more natural language processing in order to do that.