Inspiration
From school course material, YouTube, and the scikit-learn website.
What it does
Multiclass classification using an MLP model: we first convert the words to a matrix of token counts and split the matrix into a training set and a validation set. We then train using 5-fold cross-validation with hyperparameter tuning. By adjusting the maximum number of iterations, the learning rate, and the size of a single hidden layer, we find the configuration with the highest accuracy and then score the model again.
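The steps above can be sketched with scikit-learn as follows. This is a hypothetical illustration, not our exact notebook: the example sentences, labels, and the particular hyperparameter values in the grid are assumptions.

```python
# Sketch of the pipeline: token counts, train/validation split, then a
# 5-fold cross-validated grid search over the three tuned hyperparameters
# (maximum iterations, learning rate, single hidden layer size).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier

# Toy stand-in data; the real data set is the provided CSV file.
sentences = ["drug A improves condition B",
             "drug C worsens condition D",
             "entity E is unrelated to entity F"] * 10
labels = [0, 1, 2] * 10

# Convert words to a matrix of token counts.
X = CountVectorizer().fit_transform(sentences)
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.2, random_state=42)

# 5-fold cross-validation over the hyperparameter grid (values assumed).
param_grid = {
    "max_iter": [200, 500],
    "learning_rate_init": [0.001, 0.01],
    "hidden_layer_sizes": [(50,), (100,)],
}
search = GridSearchCV(MLPClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

# Score the best model again on the held-out validation set.
print(search.best_params_)
print(search.score(X_val, y_val))
```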
How we built it
We used Google Colab to build our two training models, MLP and BERT. The training data set provided consists of a sentence, a pair of entities found in the sentence, their entity spans, and one of three target values: positive, negative, or not related (represented as 0, 1, and 2). The data set is in CSV (comma-separated values) format, so by importing the Pandas library we can read the input CSV file and extract the entity columns and the target value.
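Loading the CSV with Pandas looks roughly like this. The column names (`sentence`, `entity1`, `entity2`, `label`) and the inline sample rows are assumptions standing in for the provided file.

```python
# Minimal sketch of reading the CSV and extracting the entity columns
# and the target value, as described above.
import io
import pandas as pd

# Stand-in for the provided CSV file (real code would pass a file path).
csv_text = """sentence,entity1,entity2,label
"drug A improves condition B",drug A,condition B,0
"drug C worsens condition D",drug C,condition D,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Extract the columns of data (entities) and the target value.
entities = df[["entity1", "entity2"]]
targets = df["label"]
print(targets.tolist())  # → [0, 1]
```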
Challenges we ran into
BERT is computationally intensive; given the short amount of time available, our trained model did not reach high performance.
Accomplishments that we're proud of
Successfully trained 2 models.
What we learned
BERT implementation.
What's next for Phyla Challenge - INFORMATION EXTRACTION
Increase the BERT model's accuracy by adding more hidden layers and training epochs.