Inspiration

Document classification opens up many possibilities. A non-exhaustive list of what it can be used for:

  • Classification of file types
  • Classification of document types
  • Classification of document languages
  • Classification of countries of origin
  • Classification of merchants
  • Classification of line items
  • Classification of urgency
  • Classification of privacy-sensitive data

What it does

It classifies a given set of documents.

How we built it

We used PyPDF for text extraction, Doc2Vec for feature extraction, and XGBoost for classification, with FastAPI serving inference.

Challenges we ran into

We ran into problems generating the tagged documents that Doc2Vec needs for training.

Accomplishments that we're proud of

We took the model a step further: a basic implementation of a multilingual classifier is done.

What we learned

NLP techniques and the complete flow of an NLP project.

What's next for Multi Linguistic Document Classifier

A very basic model of the multilingual classifier is implemented so far.

Built With

  • anaconda
  • doc2vec
  • fastapi
  • gensim
  • python
  • xgboost