Terabytes of data are generated every day in this digital era. Extracting insights from text data is a big challenge, and that's why we are working on TextGenix, an Auto NLP platform with all the features you need. We want to build it as a one-stop solution for all your NLP needs.
TextGenix is an NLP-driven platform for fast and accurate insights. It provides analytics on large amounts of content and text data in minutes.
What it does
Our tagline says “AUTOMATE YOUR CONTENT WORKFLOW WITH TEXTGENIX”. It is a one-of-a-kind platform for fast and accurate insights into text data. It generates text summaries and provides named-entity recognition (NER), document/text redaction and sanitization, a spell checker, a dictionary, phone and email extraction, and other insights.
Here is what it does:
The TextGenix platform helps you understand large amounts of content/text data in a matter of minutes. Through the platform, you can get actionable insights from hundreds of thousands of lines of text in no time. It generates an automated summary of long content and provides word-by-word analytics, from total word count to the meaning of each word. You can either enter a URL to summarize and get insights, or paste the content directly into the platform. TextGenix also detects the sentiment of your text data.
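As an illustration of the word-by-word analytics described above, a minimal pure-Python version of the word-count and frequency step might look like the sketch below. The function name and output shape are illustrative, not TextGenix's actual API:

```python
from collections import Counter
import re

def word_analytics(text):
    """Tokenize text and return the total word count plus per-word frequencies."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    freq = Counter(words)
    return {"total_words": len(words), "frequencies": dict(freq)}

stats = word_analytics("TextGenix analyzes text. Text analytics made fast.")
print(stats["total_words"])          # 7
print(stats["frequencies"]["text"])  # 2
```

A real analytics engine would add dictionary lookups for word meanings on top of this frequency table.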
Large volumes of text are hard to analyze, and doing it manually takes hours. TextGenix helps people get the same insights within minutes. For example, extracting mobile numbers or emails from text data is something TextGenix can do quickly and automatically. Media people, researchers, or anyone with an internet connection can access and use our automated platform.
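The phone/email extraction mentioned above can be sketched with regular expressions. The patterns below are simplified illustrations, not the ones TextGenix actually ships:

```python
import re

# Simplified patterns for illustration; real-world ones need more cases.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")

def extract_contacts(text):
    """Pull email addresses and phone-like numbers out of free text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [p.strip() for p in PHONE_RE.findall(text)],
    }

sample = "Reach us at help@textgenix.example or +91 98765 43210."
print(extract_contacts(sample))
```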
Humans are lazy by nature, and people want to save time. TextGenix uses NLP and deep learning to automate the content creation and management workflow. You can redact and sanitize your docs/content to keep them secure. It will also have a language translator, spell checker, and more soon.
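Here is a minimal sketch of the redaction/sanitization idea, assuming a simple regex-based approach; the real platform may rely on NER models instead, and these patterns are illustrative only:

```python
import re

# Illustrative patterns for sensitive fields; a production system would cover
# many more entity types (names, addresses, IDs, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def redact(text):
    """Replace each sensitive match with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@corp.example or 080-1234-5678."))
# Contact [EMAIL] or [PHONE].
```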
Why Use TextGenix?
Fast and Free
Ease of Use (User-friendly)
Secure (No content or data is saved on the server; the NLP results are returned directly to you at the frontend.)
How we built it
We built TextGenix using AI technologies, cloud technologies, and web technologies. The platform uses NLP as its core technique and leverages several other tools. The major technologies/frameworks are:
a. Core concept: NLP (Spacy, Sumy, Gensim, NLTK)
b. Web Technologies: HTML, CSS, Bootstrap, jQuery (JS)
c. Database and related tools: SQLITE3 and Firebase (Google's mobile platform)
d. Cloud: AWS and Heroku
Below are the steps that will give you a high-level overview of the solution:
Model Development and Automation: We used several NLP libraries and frameworks, including Spacy, Sumy, Gensim, and NLTK. Alongside our custom models, we also use pre-trained models like BERT for some tasks. The basic workflow of our NLP-based summarizer and analytics engine is:
1. Text preprocessing (remove stopwords and punctuation).
2. Build a word frequency table (how many times each word appears in the document).
3. Score each sentence based on the words it contains and the frequency table.
4. Build the summary by joining every sentence above a certain score threshold.
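The steps above can be sketched in pure Python. This is an illustrative toy, whereas the platform itself builds on Spacy, Sumy, Gensim, and NLTK; the stopword list here is deliberately truncated:

```python
import re
from collections import Counter

# Truncated stopword list for illustration; NLTK ships a full one.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "this"}

def summarize(text, max_sentences=2):
    """Frequency-based extractive summary: preprocess, build a word-frequency
    table, score each sentence, and keep the top-scoring ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Step 1-2: preprocess and build the frequency table.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    # Step 3: score sentences by the frequencies of the words they contain.
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Step 4: keep the top sentences, preserving their original order.
    ranked = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in ranked)

text = "NLP extracts insights from text. Cats sleep. NLP models summarize text quickly."
print(summarize(text))
```

Libraries like Sumy wrap exactly this kind of scoring behind ready-made summarizer classes.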
Deployment: After successfully integrating the backend and frontend into one platform, we deployed TextGenix to the cloud on Amazon Web Services (AWS) and Heroku.
Challenges we ran into
Right now, the biggest challenge is “The Novel Coronavirus” in India, and we are treating it as a challenge, not an opportunity. Our team is working on several verticals, from medical imaging and surveillance to bioinformatics and NLP, to fight this virus. There were a few major hurdles: the time constraint was a big one, because we had very little time to develop this, but we still pulled it off in that short span. We also hit problems while deploying our solution to the cloud but managed to work through them, and we are still testing the platform to make it robust and efficient.
Accomplishments that we're proud of
Propelled by modern technological innovation, data is to this century what oil was to the previous one. Today, our world is awash in the gathering and dissemination of huge amounts of data. In fact, the International Data Corporation (IDC) projects that the total amount of digital data circulating annually around the world will grow from 4.4 zettabytes in 2013 to 180 zettabytes in 2025. That's a lot of data! With so much data circulating in the digital space, there is a need for machine learning systems that can automate the content creation and management workflow. We are proud to have built TextGenix and made it open source, so anyone can use it for free on any kind of device to get insights from their text data.
What we learned
Learning is a continuous process throughout life. In this lockdown, we are not able to meet each other in person, but we learned how to work together virtually. Online tools — Zoom in our case, along with GitHub and Slack — helped everyone on the team collaborate and share code with each other.
We also strengthened our skills in NLP (BERT, Spacy, NLTK, etc.) and learned how to integrate our models with the frontend for end users. We spent a lot of time on the interface so that people enjoy using it. From design to deployment, many parts of the project improved our technical skills.
We learn new things every day, and going forward, we will keep learning new concepts and adding relevant features to the platform. See the next section for details.
What's next for TextGenix
We are working to build this solution into a full Auto NLP platform. Features like:
- Text cleaning
- Text classification
- Easy endpoint creation
- Synthetic data generation

will be added in the next few weeks. We are working on these now, and we are also using Apache Airflow to handle the data pipeline.
On the business front, we are looking to collaborate with enterprises that are happy to use our solution.
That's all for now.
Thanks to the Hackathon Team.