In our day-to-day research, we often run into problems because datasets and pre-trained models are scattered across many places. Searching for datasets that meet our needs, finding pre-trained models that are already available online, and benchmarking several of them is very time-consuming. These problems inspired us to create a library that gathers datasets from many different sources for fellow researchers to use.

What it does

This library is a one-stop solution for researchers working on sentiment analysis. It provides the features listed below.

Datasets available:

  • Sentiment Analysis
    • IMDB Movie Reviews
    • Pros and Cons
    • Movie Review
    • Trip Advisor
    • City Search Data
    • Yelp Review


  • Text Cleansing, e.g., removing hyperlinks
  • WordPiece Tokenization with tagging for aspect extraction
  • Entity metrics for aspect detection
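To make the first two features concrete, here is a minimal sketch of hyperlink removal and greedy WordPiece splitting. It is an illustration only and not torchsenti's actual API: the names `URL_PATTERN`, `remove_hyperlinks`, and `wordpiece_tokenize`, as well as the toy vocabulary in the usage note, are assumptions made for this example.

```python
import re

# Hypothetical pattern (an assumption, not torchsenti's own):
# matches http(s):// URLs and bare "www." links up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def remove_hyperlinks(text):
    """Strip URLs and collapse the whitespace they leave behind."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word.

    Pieces that do not start the word carry the "##" prefix.
    If some part of the word cannot be matched, the whole word
    is mapped to the unknown token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces
```

For example, `remove_hyperlinks("Great food! See https://example.com/menu for details")` yields `"Great food! See for details"`, and `wordpiece_tokenize("unaffable", {"un", "##aff", "##able"})` yields `["un", "##aff", "##able"]`. Note that the simple pattern also swallows punctuation attached to the end of a URL, which a production cleaner would likely want to handle.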

How we built it

We built features that let researchers download a specific dataset in raw or preprocessed format, then load and split it.
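As a rough illustration of the split step, the sketch below shuffles labeled samples reproducibly and divides them into train and test portions. It uses only the Python standard library; the function name `split_dataset`, its parameters, and the `(text, label)` sample layout are assumptions for this example, not torchsenti's actual interface.

```python
import random


def split_dataset(samples, test_ratio=0.2, seed=42):
    """Shuffle samples with a fixed seed and split them into (train, test).

    Hypothetical helper: `samples` is any sequence of (text, label) pairs.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(samples)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Usage with made-up reviews: label 1 = positive, 0 = negative.
reviews = [("review %d" % i, i % 2) for i in range(10)]
train, test = split_dataset(reviews, test_ratio=0.2)
```

Fixing the seed keeps the split identical across runs, which matters when several pre-trained models are benchmarked against the same held-out data.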

Challenges we ran into

We faced many difficulties in preprocessing each of the datasets.

What we learned

What we have learned so far is that software development is hard and requires careful attention to design patterns.

What's next for torchsenti

We plan several feature updates in the near future, such as wrappers for text cleansing and WordPiece tokenization, among others.
