Inspiration

Most current autocompletion tools predict only individual words. We wanted to predict sentences.

What it does

  • Reads in text and creates templates of groups of similar sentences
  • Takes in the start of a sentence and recommends an ending

How we built it

  • Started out by building the base of our infrastructure on a Flask server
  • Developed our method of grouping sentences together
    • Used sk-learn's CountVectorizer to convert individual sentences into vectors
    • Used sk-learn's DBSCAN to group together similar vectors
  • After obtaining groups of similar sentences, created templates that represent their structure
  • Inserted templates into a trie for easy lookup

Challenges we ran into

  • How to best represent sentences as vectors
  • Determining the type of machine learning to use to cluster similar sentences
  • Teasing out a common sentence structure from similar sentences

Accomplishments that we're proud of

  • Accurately group similar sentences and create templates that represent them

What we learned

  • Utilizing sci-kit learn's machine learning library
  • Different methods of unsupervised learning
  • Ways to represent sentences as vectors
  • Creating backend services using Flask
  • Trie implementation and traversal

What's next?

Keep up with what we’re doing:

Daniel Jiang - danieljiang.me

Scott Numamoto - scottnumamoto.com

Share this project:

Updates