What it does

How I built it

  • Download a scan of the 1912 Laird and Lee edition of Webster's dictionary
  • OCR all the pages (pyOCR)
  • Match up the consecutive lines that make up each definition (lots of heuristics)
  • Identify illustrations
  • Identify syllables and final-word rhymes (using Pronouncing, the CMU Pronouncing Dictionary, NLTK, and some guesses)
  • Sort rhymes by closeness to iambic meter (alternating unstressed/stressed syllables)
  • Format rhymes by popular schemes
    • Ballade: ABABBCBC -> BCBC
    • Cinquain: ABABB
    • Alternate Rhyme: ABAB
    • Limerick: AABBA
    • (some short/long rules, based on what sounds right more than any research)
  • Generate poems, with title referring to the definitions
  • Format generated poems as images (PIL)
  • Dynamic title line breaks
  • Pull out square-ish images from dictionary pages, create a mask, overlay onto header
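
The rhyme, meter, and scheme steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the pronunciations are a small hand-written sample in CMU-style ARPAbet notation (a real run would look them up through Pronouncing or the CMU dictionary), and every function name here (`stress_pattern`, `iambic_distance`, `rhyming_part`, `group_by_rhyme`, `fill_scheme`) is hypothetical. The scoring rule, counting syllables that break an unstressed/stressed alternation, is one plausible reading of "closeness to iambic meter".

```python
from collections import defaultdict

# Hand-written sample in CMU-style ARPAbet (assumption: stands in for a real
# dictionary lookup). Digits mark vowel stress: 0 none, 1 primary, 2 secondary.
SAMPLE_PHONES = {
    "delight": "D IH0 L AY1 T",
    "night": "N AY1 T",
    "light": "L AY1 T",
    "away": "AH0 W EY1",
    "today": "T AH0 D EY1",
    "garden": "G AA1 R D AH0 N",
}


def stress_pattern(phones):
    """Return only the stress digits of a pronunciation string."""
    return "".join(ch for ch in phones if ch.isdigit())


def iambic_distance(stresses):
    """Count syllables that break an unstressed/stressed alternation (0 = perfect fit)."""
    misses = 0
    for i, digit in enumerate(stresses):
        want_stressed = i % 2 == 1      # iambs stress every second syllable
        is_stressed = digit in "12"     # treat secondary stress as stressed
        if want_stressed != is_stressed:
            misses += 1
    return misses


def rhyming_part(phones):
    """Everything from the last stressed vowel to the end of the word."""
    parts = phones.split()
    for i in range(len(parts) - 1, -1, -1):
        if parts[i][-1] in "12":
            return " ".join(parts[i:])
    return phones  # no stressed vowel found: fall back to the whole word


def group_by_rhyme(words, phones_for=SAMPLE_PHONES):
    """Bucket words by their rhyming part."""
    groups = defaultdict(list)
    for word in words:
        groups[rhyming_part(phones_for[word])].append(word)
    return dict(groups)


def fill_scheme(scheme, groups):
    """Pick one end word per line of a scheme such as 'ABAB', or None if it cannot be filled."""
    needed = {letter: scheme.count(letter) for letter in dict.fromkeys(scheme)}
    # Greedy pairing: the letters that need the most words get the largest groups.
    letters = sorted(needed, key=needed.get, reverse=True)
    available = sorted(groups.values(), key=len, reverse=True)
    if len(letters) > len(available):
        return None
    chosen = {}
    for letter, group in zip(letters, available):
        if len(group) < needed[letter]:
            return None
        chosen[letter] = list(group)
    return [chosen[letter].pop(0) for letter in scheme]
```

For example, `fill_scheme("ABAB", group_by_rhyme(["night", "away", "light", "today"]))` yields the end words in scheme order, and `sorted(words, key=lambda w: iambic_distance(stress_pattern(SAMPLE_PHONES[w])))` puts the most iambic words first.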
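
The dynamic title line breaks could be handled with a greedy wrap like the sketch below. It is an assumption about the approach, not the project's code: `wrap_title` and its `measure` parameter are hypothetical names, and `measure` defaults to character count so the example runs without any imaging library. When rendering with PIL, a pixel-width function (for instance one built on `ImageDraw.textlength` with the chosen font) could be passed in instead.

```python
def wrap_title(title, max_width, measure=len):
    """Greedily break a title into lines whose measured width fits max_width.

    `measure` maps a string to its width; it defaults to character count,
    but a pixel-based measurement can be supplied when drawing onto an image.
    """
    lines = []
    current = ""
    for word in title.split():
        candidate = f"{current} {word}".strip()
        if current and measure(candidate) > max_width:
            # The next word would overflow: close the line and start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

A single word wider than `max_width` is kept on its own line rather than split, which suits short poem titles but would need extra handling for very long words.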

Challenges I ran into

Accomplishments that I'm proud of

  • Identifying the actual definitions from the OCR

What I learned

What's next for Definitions

  • Fix OCR, different source
  • Page numbers
  • Improve styling
  • Share the code!