What it does
Just as the eponymous avian is "notable for their superb ability to mimic natural and artificial sounds from their environment" [Wikipedia], the Lyrebird app was designed to mimic the Tweets from given Twitter accounts). Unfortunately, despite some careful planning and some interesting NLP ideas, the out put is decidedly moderate. On the other hand, Lyrebird comprises a variety of interesting technology making it a challenge to built and providing an expandable basis for further improvement.
How we built it
- Twitter scraper and pre-processor which removes problematic sub-strings such as @-mentions and URLS.
- A popular NLP toolkit for Python, NLTK, was used for tokenising the Tweets and tag the words with their respective Parts of Speech.
- A custom n-gram module is used to generate 1-grams for words from both the target Twitter account(s) and a large corpus of around 73,000 blog posts, and 3-grams for Parts of Speech from the target Twitter account(s).
- A Database basic management system for Amazon's Dynamo DB was programmed in Python.
- Likewise, Amazon AWS was used for data ingestion and processing. This has the potential to become very parallel, speeding up ingestion times.
- A sentence structure is generated first using the 3-grams for Parts of Speech. The idea behind this is to create a readable sentence structure before fitting specific words.
- On top of this general structure, we analyse the necessary part of speech for each word, as well as the 1-gram of the word before, to generate the next word.
- We dynamically choose, based on a probability model, which Twitter account (or blog corpus) from which to choose a word first, and fall back to the others if the first choice fails to deliver. Using this method, we had hoped to create amusing combinations of Twitter accounts, e.g. Trump and Mary Berry; or Beyonce and Stephen Hawking.
- The sleek user interface was lovingly handcrafted over the entire course of the Hackathon, and is probably one of the most striking parts of the project. It is just a pity that the Tweet generation is too slow for the dynamic text entry to have the desired effect.
Challenges I ran into
- AWS and Dynamo DB were both very challenging to understand and implement in the 24 hours we had, having never used them before.
- Despite careful planning, keeping a flexible code structure was tricky, especially as time pressed on and tiredness set in.
Accomplishments that I'm proud of
- Keeping working together effectively as a team despite fatigue and pressure.