SurpriseHaiku

SurpriseHaiku was an experiment that parsed random twitter tweets to see if they followed the Haiku cadence of 5 / 7 / 5 (syllables). I ran this experiment during the 2014 Olympics to try and focus around Olympics related tweets. This is one of my first applications that I built to learn more about the twitter API and to try some different packages.

See: https://twitter.com/surprisehaiku

Examples

A #haiku: https://twitter.com/chasedog6/status/432727303148142592 …

Lets Show Why Baseball

Is The Best Sport RT if

you want Baseball back

A #haiku: https://twitter.com/heyfrazier/status/432727300325388288 …

every time they

show putin i get really

uncomfortable

Dependencies

Python
PyHyphen
Tweepy
The Project Gutenberg Etext of Moby Hyphenator II by Grady Ward

Usage

Download this repo
Install dependencies
- Use either:

pip install pyhyphen
pip install tweepy

or...

pip install -r requirements.txt

Get yourself a registered twitter app then either enter your credentials as environment variables at line 18 of surprisehaiku.py
Run it with the below command!

python surprisehaiku.py

Now you wait for 30 seconds as the script gets a list of tweets and then it will print them to the terminal. You can see which words are missing from the dictionary and how the algorithm chooses which hyphenation is best (comparing the dictionary and the hyphenator).

Example:

ORIGINAL TWEET::
FOLLOW ME, I LOVE YOU, YOU IS MY LIFE, MY DREAM, I LOVE U, BRAZIL LOVES U @ShawnMendes #LOTPgift @taylorcaniff #TayTo1 x108
BY::
z4yumb4e
PARSED VERSION::
A #haiku: https://twitter.com/z4yumb4e/status/483482944573816832

FOLLOW ME I LOVE
YOU YOU IS MY LIFE MY DREAM
I LOVE U BRAZIL
length 126
Number of syllables in each word in the tweet...
                                 DIC  ALG  Best
FOLLOW                            2    2    2
ME                                0    1    1
I                                 0    1    1
LOVE                              1    1    1
YOU                               1    1    1
YOU                               1    1    1
IS                                0    1    1
MY                                0    1    1
LIFE                              1    1    1
MY                                0    1    1
DREAM                             1    1    1
I                                 0    1    1
LOVE                              1    1    1
U                                 0    1    1
BRAZIL                            2    1    2
LOVES                             0    1    1
U                                 0    1    1
ShawnMendes                       0    2    2
LOTPgift                          0    1    1
taylorcaniff                      0    4    4
TayTo1                            0    1    1
x108                              0    1    1

It's far from perfect as parsing something like "ShawnMendes" is tough. But overall it does a decent job finding true tweets and printing them out.

How it Works

Surprise Haiku will listen to the twitter firehose for a random set amount of time. Right now it the "track" or keywords it looks for in tweets in hard coded for the world cup. It will look at those tweets (which are a particularly hard bunch as they include lots of names and countries misspelled) and tries to find a haiku in them. The method for finding a haiku is explained below.

Method for identifying a Haiku

The simplest explanation is:

Normalize the tweets text
Split it by syllable/ each word by syllable
see if we get a 5/7/5 combination

Identifying the syllabyles on the English language is tough. We've got a lot of different ways of pronouncing things. Our application loads a dictionary of syllabic breaks along with words. Then we start analyzing tweets by normalizing the text, removing capitalization, and other punctuation. From there we iterate through each word to see if it is in the dictionary, we then test what our "PyHyphen" hyphenator returns the value to be.

Once we get our values (nearly instantly), we compare them and spit out the like breaks and word values. If it's a Haiku and we have the correct cadence (and the cadence ends at the end of a word) then we tweet/print out the result.

To Do/Improvements

Add in some more natural language processing so that Haikus have to end with certain parts of speech
Add in some natural language processing to throw out words with a lot of typos and or words we won't be able to recognize
Automate this so that it could be deployed to a server/doesn't have to be run manually
Automate adding to dictionary
Reporting mechanism for validation
Add plurals to dictionary
Better dictionary than just in memory/ maybe a database
parse command line arguments for time and whether to print them to twitter, etc
Write tests

Built With

Updates

Bill Chambers started this project — Nov 09, 2014 08:04 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.