Radient Gradient

We created a tool that lets users explore and compare ways of automatically tagging their German-language documents.

What we do:

We read and pre-process text from a wide range of file formats (e.g. .pdf, .doc).
We extract key phrases using several methods, selectable in our CLI tool: RAKE (Rapid Automatic Keyword Extraction), KEA (Keyphrase Extraction Algorithm), gensim's keywords method (based on TextRank with lemmatization) and a modified TextRank that uses word2vec word embeddings.
We developed, and made available, an evaluation metric that compares generated tags against gold-standard (e.g. human-annotated) tags: using a word2vec vector space model and cosine similarity, it determines whether the predicted tags are semantically similar to those chosen by people.
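The pre-processing step is not specified in detail above. The sketch below assumes the text has already been extracted from the file and shows typical clean-up for German text pulled out of a PDF: Unicode normalisation, re-joining words hyphenated across line breaks, and whitespace collapsing. `preprocess` and `tokenize` are hypothetical names, not the project's code.

```python
import re
import unicodedata


def preprocess(raw_text):
    """Clean up text extracted from a PDF/DOC file (illustrative steps only)."""
    # Compose Unicode characters, e.g. "u" + combining diaeresis -> "ü".
    text = unicodedata.normalize("NFC", raw_text)
    # Re-join words hyphenated across a line break ("Verschlag-" + newline + "wortung").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse the remaining line breaks and runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lower-cased word tokens; Python 3 word characters include umlauts and ß."""
    return re.findall(r"\w+", text.lower())


raw = "Die Verschlag-\nwortung deutscher\n\nDokumente  ist nu\u0308tzlich."
print(tokenize(preprocess(raw)))
# ['die', 'verschlagwortung', 'deutscher', 'dokumente', 'ist', 'nützlich']
```

Note that the de-hyphenation rule also joins genuine hyphenated compounds that happen to break at a line end (e.g. "Baden-Württemberg"), so a real pipeline may need a dictionary check.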
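RAKE, one of the extraction methods listed above, splits text into candidate phrases at stopwords and punctuation, scores each word by its degree (co-occurrences within candidate phrases) divided by its frequency, and scores a phrase as the sum of its word scores. A minimal pure-Python sketch, assuming a tiny hypothetical German stopword list (`STOPWORDS`); it illustrates the algorithm and is not the project's implementation:

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny German stopword list; a real setup would use a full one.
STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "für", "mit", "von", "zu", "in", "im"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and at stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"\w+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, top_n=5):
    """Return the top_n (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree = words co-occurring with `word` in its phrases (itself included).
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase's score is the sum of its member words' degree/frequency ratios.
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


text = ("Die automatische Verschlagwortung ist ein Werkzeug für deutsche Dokumente. "
        "Automatische Verschlagwortung spart Zeit.")
for phrase, score in rake(text):
    print(f"{score:5.1f}  {phrase}")
```

This prints four candidates scored 14.0, 6.0, 4.0 and 1.0, with "automatische verschlagwortung spart zeit" on top; RAKE's degree-based scoring deliberately favours longer phrases.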
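The text above does not spell out how TextRank is modified with word2vec. One plausible variant, sketched here purely as an assumption: link words that co-occur within a small window, weight each edge by the cosine similarity of the two word vectors (clipped at 0), and run weighted PageRank. `embedding_textrank`, its parameters and the toy vectors are hypothetical; in practice the vectors would come from a trained word2vec model, e.g. a gensim `KeyedVectors` object, which supports `word in kv` and `kv[word]`.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_textrank(tokens, vectors, window=2, damping=0.85, iterations=50):
    """Weighted TextRank: edges link tokens co-occurring within `window` positions
    and are weighted by the cosine similarity of their embeddings (clipped at 0)."""
    nodes = sorted({t for t in tokens if t in vectors})
    edges = {n: {} for n in nodes}
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + window]:
            if a != b and a in vectors and b in vectors:
                sim = max(cosine(vectors[a], vectors[b]), 0.0)
                edges[a][b] = edges[b][a] = sim
    # Total outgoing weight per node, used to normalise each neighbour's vote.
    out_weight = {n: sum(edges[n].values()) for n in nodes}
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Standard weighted PageRank update, computed from the previous scores.
        scores = {
            n: (1 - damping) + damping * sum(
                w * scores[m] / out_weight[m]
                for m, w in edges[n].items() if out_weight[m] > 0
            )
            for n in nodes
        }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy 2-d "embeddings" standing in for real word2vec vectors.
vectors = {"dokument": [1.0, 0.0], "text": [1.0, 0.0], "datei": [0.6, 0.8], "katze": [0.0, 1.0]}
tokens = ["katze", "dokument", "text", "dokument", "datei"]
for word, score in embedding_textrank(tokens, vectors):
    print(f"{score:.3f}  {word}")
```

Because "katze" is orthogonal to every neighbour, all its edges have weight 0 and it keeps only the baseline score of 1 − damping = 0.15, while the semantically related words reinforce each other. Plain TextRank would give every co-occurrence edge the same weight.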
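A sketch of an embedding-based evaluation metric in the spirit described above, under these assumptions: a multi-word tag is represented by the average of its word vectors, and a predicted tag counts as a hit when its cosine similarity to at least one gold tag reaches a hypothetical threshold of 0.7. The function names, the averaging and the threshold are illustrative choices, not the project's actual metric.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tag_vector(tag, vectors):
    """Average the word vectors of a (possibly multi-word) tag; None if no word is known."""
    known = [vectors[w] for w in tag.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def soft_precision(predicted, gold, vectors, threshold=0.7):
    """Share of predicted tags whose cosine similarity to some gold tag is >= threshold."""
    gold_vecs = [v for v in (tag_vector(g, vectors) for g in gold) if v is not None]
    if not predicted:
        return 0.0
    hits = 0
    for tag in predicted:
        vec = tag_vector(tag, vectors)
        # Tags with no known words can never match.
        if vec is not None and any(cosine(vec, g) >= threshold for g in gold_vecs):
            hits += 1
    return hits / len(predicted)


# Toy vectors standing in for a trained word2vec model.
vectors = {"auto": [1.0, 0.0], "fahrzeug": [0.9, 0.1], "elektro": [0.7, 0.3], "katze": [0.0, 1.0]}
print(soft_precision(["Fahrzeug", "Katze"], ["Auto"], vectors))  # 0.5
```

Unlike exact string matching, "Fahrzeug" is credited against the gold tag "Auto" because their vectors are close. Swapping the roles of predicted and gold tags gives a recall-style counterpart.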