torchsenti

Sentiment Analysis Library for Research with PyTorch

Inspiration

In our day-to-day research, we often run into problems because resources such as datasets and pre-trained models are scattered across many places. It is very time-consuming to search for datasets that meet our needs, look for pre-trained models that are already available online, and then benchmark several of those models. These problems inspired us to create a library that gathers datasets from many different sources for fellow researchers to use in the future.

What it does

This library is a one-stop solution for researchers working on Sentiment Analysis. It provides the datasets and features listed below.

Datasets Available

Sentiment Analysis
- IMDB Movie Reviews
- Pros and Cons
- Movie Review
- Trip Advisor
- City Search Data
- Yelp Review

Features

- Text cleansing, e.g., removing hyperlinks
- WordPiece tokenization with tagging for aspect extraction
- Entity metrics for aspect detection
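
As an illustration of the text cleansing feature, hyperlink removal can be sketched with a regular expression. This is only a minimal sketch and not torchsenti's actual API: the names `URL_PATTERN` and `remove_hyperlinks` are hypothetical, and the pattern only covers links that start with `http://`, `https://`, or `www.`.

```python
import re

# Hypothetical helper for illustration only; not part of torchsenti's API.
# Matches tokens that start with http://, https://, or www. and run until
# the next whitespace character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def remove_hyperlinks(text: str) -> str:
    """Remove hyperlinks from text and collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub(" ", text)
    # split() followed by join() collapses runs of whitespace into one space
    # and trims both ends.
    return " ".join(without_urls.split())


if __name__ == "__main__":
    review = "Great movie! See https://example.com/review for details."
    print(remove_hyperlinks(review))
```

Because the pattern consumes everything up to the next whitespace, punctuation attached to the end of a link is removed together with it. That is usually acceptable for review text, but a stricter pattern may be needed for other data.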

How we built it

We built features that let researchers download a specific dataset in either raw or preprocessed format, then load and split it.
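
The splitting step can be sketched as a seeded shuffle followed by a cut. This is a minimal sketch under our own assumptions, not the library's actual interface: the function name `split_dataset`, its parameters, and the `(text, label)` sample layout are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle samples reproducibly and split them into (train, test).

    Hypothetical helper for illustration only; not torchsenti's API.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(samples)  # copy so the caller's data is left untouched
    # A local Random instance keeps the split reproducible without
    # changing the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy (text, label) pairs standing in for a loaded dataset.
    data = [(f"review {i}", i % 2) for i in range(10)]
    train, test = split_dataset(data, test_ratio=0.2)
    print(len(train), len(test))
```

Fixing the seed means that two researchers who split the same dataset get the same partition, which matters when benchmark numbers are compared.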

Challenges we ran into

We faced many difficulties in preprocessing each of the datasets.

What we learned

What we have learned so far is that software development is hard and requires careful consideration of design patterns.

What's next for torchsenti

We have several feature updates planned for the near future, such as wrappers for text cleansing, WordPiece tokenization, etc.

Updates

Ruben Stefanus started this project — Aug 25, 2020 07:48 AM EDT
