Inspiration
In our day-to-day research, we often run into problems because datasets and pre-trained models are scattered across many places. Searching for datasets that meet our needs, finding pre-trained models that are already available online, and benchmarking several of them is very time-consuming. These problems inspired us to create a library that gathers datasets from many different sources for fellow researchers to use in the future.
What it does
This library is a one-stop solution for researchers working on Sentiment Analysis. It provides the datasets and features listed below.
Datasets Available
- Sentiment Analysis
- IMDB Movie Reviews
- Pros and Cons
- Movie Review
- Trip Advisor
- City Search Data
- Yelp Review
Features
- Text Cleansing, e.g., removing hyperlinks
- WordPiece Tokenization with tagging for aspect extraction
- Entity metrics for aspect detection
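To illustrate the text cleansing feature, here is a minimal sketch of hyperlink removal using only the Python standard library. The names `URL_PATTERN` and `remove_hyperlinks` are hypothetical and chosen for illustration; they are not taken from torchsenti itself, and the regular expression is a simple assumption that will not catch every kind of link.

```python
import re

# Hypothetical helper for illustration; not torchsenti's actual API.
# Matches tokens starting with http://, https://, or www. up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def remove_hyperlinks(text: str) -> str:
    """Drop URLs from a review, then collapse the whitespace left behind."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())
```

A call such as `remove_hyperlinks("Great movie! See https://example.com/review for more")` would leave only the surrounding words of the review.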
How we built it
We built a feature that lets researchers download a specific dataset in raw or preprocessed format, then load and split it.
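As a rough sketch of what the split step could look like, the function below shuffles labeled samples with a fixed seed and cuts them into train and test portions. `split_dataset`, `train_ratio`, and `seed` are hypothetical names used for illustration, and the sample data is made up; the library's real interface may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle samples reproducibly and split them into (train, test) lists.

    Hypothetical helper for illustration; not torchsenti's actual API.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Made-up (text, label) pairs standing in for a loaded dataset.
reviews = [(f"review {i}", i % 2) for i in range(10)]
train, test = split_dataset(reviews, train_ratio=0.8, seed=42)
```

Fixing the seed keeps the split identical across runs, which matters when benchmarking several models on the same data.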
Challenges we ran into
We faced many difficulties in preprocessing each of the datasets.
What we learned
What we have learned so far is that software development is hard and requires careful attention to design patterns.
What's next for torchsenti
We have several feature updates planned for the near future, such as wrappers for text cleansing and WordPiece Tokenization, among others.