Inspiration
Document classification has a wide range of applications. A non-exhaustive list of what it can be used for:
- Classification of file types
- Classification of document types
- Classification of document languages
- Classification of countries of origin
- Classification of merchants
- Classification of line items
- Classification of urgency
- Classification of privacy-sensitive data
What it does
It classifies a given set of documents.
How we built it
We used PyPDF for text extraction, Doc2Vec for feature extraction, XGBoost for classification, and FastAPI to serve inference.
Challenges we ran into
We had trouble implementing Doc2Vec, in particular generating the tagged documents it expects as training input.
Accomplishments that we're proud of
We extended the model beyond its original scope: a basic implementation of a Multi Linguistic classifier is done.
What we learned
We learned NLP techniques and the complete workflow of an NLP project, from text extraction to serving predictions.
What's next for Multi Linguistic Document Classifier
So far, only a very basic model for the Multi Linguistic classifier has been implemented.
Built With
- anaconda
- doc2vec
- fastapi
- gensim
- python
- xgboost