Inspiration
From school course material, YouTube, and the scikit-learn website.
What it does
Multiclass classification using an MLP model: we first convert the words to a matrix of token counts and split the matrix into a training set and a validation set. We then train using 5-fold cross-validation with hyperparameter tuning. By adjusting the maximum number of iterations, the learning rate, and the size of a single hidden layer, we find the configuration with the highest accuracy and then score the model again.
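The steps above can be sketched with scikit-learn as follows. This is a hypothetical illustration, not our exact notebook: the example sentences, labels, and the particular hyperparameter values in the grid are assumptions.

```python
# Sketch of the pipeline: token counts, train/validation split, then a
# 5-fold cross-validated grid search over the three tuned hyperparameters
# (maximum iterations, learning rate, single hidden layer size).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier

# Toy stand-in data; the real data set is the provided CSV file.
sentences = ["drug A improves condition B",
             "drug C worsens condition D",
             "entity E is unrelated to entity F"] * 10
labels = [0, 1, 2] * 10

# Convert words to a matrix of token counts.
X = CountVectorizer().fit_transform(sentences)
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.2, random_state=42)

# 5-fold cross-validation over the hyperparameter grid (values assumed).
param_grid = {
    "max_iter": [200, 500],
    "learning_rate_init": [0.001, 0.01],
    "hidden_layer_sizes": [(50,), (100,)],
}
search = GridSearchCV(MLPClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

# Score the best model again on the held-out validation set.
print(search.best_params_)
print(search.score(X_val, y_val))
```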
How we built it
We used Google Colab to build our two training models, MLP and BERT. The training data set provided consists of a sentence, a pair of entities found in the sentence, their entity spans, and one of three target values: positive, negative, or not related (represented as 0, 1, and 2). The data set is in CSV (comma-separated values) format, so by importing the Pandas library we can read the input CSV file and extract the entity columns and the target value.
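Loading the CSV with Pandas looks roughly like this. The column names (`sentence`, `entity1`, `entity2`, `label`) and the inline sample rows are assumptions standing in for the provided file.

```python
# Minimal sketch of reading the CSV and extracting the entity columns
# and the target value, as described above.
import io
import pandas as pd

# Stand-in for the provided CSV file (real code would pass a file path).
csv_text = """sentence,entity1,entity2,label
"drug A improves condition B",drug A,condition B,0
"drug C worsens condition D",drug C,condition D,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Extract the columns of data (entities) and the target value.
entities = df[["entity1", "entity2"]]
targets = df["label"]
print(targets.tolist())  # → [0, 1]
```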
Challenges we ran into
BERT is computationally intensive; given the short amount of time available, our trained model did not reach high performance.
Accomplishments that we're proud of
Successfully trained 2 models.
What we learned
BERT implementation.
What's next for Phyla Challenge - INFORMATION EXTRACTION
Increase the BERT model's accuracy by adding more hidden layers and training epochs.