1. Developed a Backend Server with Flask
Built a Flask-based server to handle user requests, providing an endpoint for uploading PDF files and processing them for text extraction and entity recognition.
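A minimal sketch of what such an upload endpoint could look like. The route name `/upload`, the form field name `file`, and the response shape are assumptions made for illustration; the project's actual names are not stated here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/upload", methods=["POST"])  # hypothetical route name
def upload_pdf():
    # "file" is an assumed form field name, not taken from the project.
    uploaded = request.files.get("file")
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="no file provided"), 400
    if not uploaded.filename.lower().endswith(".pdf"):
        return jsonify(error="only PDF files are accepted"), 400

    data = uploaded.read()
    # Text extraction and entity recognition would run on `data` here.
    return jsonify(filename=uploaded.filename, size_bytes=len(data)), 200
```

Checking only the file extension is a light sanity check, not real validation; a production server would likely also cap the upload size.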
2. Implemented PDF Text Extraction using PyPDF
Used PyPDF to extract text from the PDF documents uploaded by the user, handling multi-page documents and preparing the text for further processing.
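Multi-page extraction might be sketched as follows, assuming the `pypdf` package (the successor to PyPDF2); the write-up does not say which PyPDF release was used. The function names are hypothetical.

```python
def extract_text(reader):
    """Join the text of every page of a pypdf PdfReader (or similar object)."""
    pages = []
    for page in reader.pages:
        # Pages without a text layer (e.g. scans) give little or no text.
        pages.append(page.extract_text() or "")
    return "\n".join(pages)


def extract_text_from_file(path_or_stream):
    """Open a PDF from a path or binary stream and return its full text."""
    # Imported here so extract_text() stays usable without pypdf installed.
    from pypdf import PdfReader

    return extract_text(PdfReader(path_or_stream))
```

`PdfReader` accepts a file path or a binary file-like object, so an uploaded file's stream could be passed to `extract_text_from_file` directly.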
3. Text Preprocessing
Cleaned the extracted text by removing unnecessary characters and stray formatting so that it was ready for analysis.
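The exact cleaning rules are not listed, so the following is only one plausible sketch: it normalizes Unicode, rejoins words hyphenated across line breaks, drops control characters, and collapses whitespace. The function name `clean_text` and each rule are assumptions.

```python
import re
import unicodedata


def clean_text(raw):
    """Return a whitespace-normalized, single-line version of extracted text."""
    # Fold ligatures and other compatibility characters (e.g. "ﬁ" -> "fi").
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop control and format characters, keeping newlines and tabs for now.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse every run of whitespace (including newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()
```

Collapsing newlines discards paragraph boundaries; that is acceptable for entity recognition but would need revisiting if layout mattered.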
4. Integrated Named Entity Recognition (NER) with spaCy
Integrated spaCy's pre-trained NER model to automatically detect entities (such as names, dates, and organizations) in the extracted text.
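As a rough sketch of the NER step: the snippet below runs a spaCy pipeline over text and returns each entity with its label and character offsets. The model name `en_core_web_sm` is an assumption (the write-up does not say which pre-trained model was used), and the function names are hypothetical.

```python
def extract_entities(text, nlp):
    """Run a spaCy pipeline over text and return its entities as dicts."""
    doc = nlp(text)
    return [
        {
            "text": ent.text,
            "label": ent.label_,  # e.g. PERSON, DATE, ORG
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]


def load_pipeline(name="en_core_web_sm"):
    """Load a pre-trained spaCy pipeline; the default name is an assumption."""
    # Requires `pip install spacy` and `python -m spacy download en_core_web_sm`.
    import spacy

    return spacy.load(name)
```

Loading the pipeline once at server startup and reusing it across requests avoids paying the model load time on every upload.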