Inspiration

Given the volatility and unpredictability of market movements, identifying trends allows investors to find profitable markets with greater accuracy. We identified four main factors that shape market trends and fluctuations:

  1. Government
  2. International transactions
  3. Speculation and expectation
  4. Supply and demand (Mitchell, 2022)

The influence of points 1, 2, and 4 can be estimated from historical and numerical data. Market sentiment, however, is current and constantly changing, and it arrives as textual data, which makes it difficult to analyze. It is nonetheless an important factor: it drives overall buying and selling activity in the market, influencing both present and future trends (Mitchell, 2022) and, in turn, stock market predictions.

Hence, we will use Natural Language Processing (NLP) and Machine Learning to automate the process of analyzing market sentiments.

What it does

The API scans online platforms, such as Twitter, for opinions regarding the stock market. The data is then passed to the NLP model, which parses the text to identify the stock or organization it references. The model also predicts whether the sentiment is positive or negative.

The API then returns a summary of the predicted results: whether sentiment towards a particular stock is positive or negative, and the confidence of the prediction.
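As a rough illustration of this summary step, the snippet below turns a raw model probability into a sentiment label plus a confidence value. The field names and the 0.5 threshold are illustrative choices, not the project's documented schema:

```python
def summarize(stock, probability):
    """Turn a raw model probability P(positive) into an API-style summary.
    Field names and threshold are illustrative, not the actual schema."""
    sentiment = "positive" if probability >= 0.5 else "negative"
    # Confidence is how far the probability leans towards the chosen label.
    confidence = probability if probability >= 0.5 else 1.0 - probability
    return {"stock": stock, "sentiment": sentiment, "confidence": round(confidence, 2)}
```

A call like `summarize("TSLA", 0.8)` would report positive sentiment with 0.8 confidence.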

How we built it

The stock-recognition portion of the NLP uses a tokenizer pipeline. It takes in a sentence and splits it into its component words and phrases, labels each phrase with its type (e.g. noun), and then extracts the important objects, among which the key stock will be found.
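The extraction step can be sketched in miniature. The ticker lexicon below is hypothetical, and a real pipeline would rely on a trained part-of-speech tagger or named-entity recognizer rather than a lookup table:

```python
import re

# Hypothetical lexicon mapping surface forms to tickers; a real pipeline
# would use a trained tagger/NER model instead of a hard-coded table.
KNOWN_STOCKS = {"tesla": "TSLA", "tsla": "TSLA", "apple": "AAPL", "aapl": "AAPL"}

def extract_stock(sentence):
    """Split a post into tokens and return the first token that matches
    a known stock, mimicking the tokenize -> tag -> extract pipeline."""
    tokens = re.findall(r"[A-Za-z$]+", sentence)
    for tok in tokens:
        ticker = KNOWN_STOCKS.get(tok.lstrip("$").lower())
        if ticker:
            return ticker
    return None  # no recognizable stock in this post
```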

Sentiment prediction is done with a neural network, specifically an LSTM (Long Short-Term Memory) network. This type of network can capture the relationships between words in a sequence and gives high accuracy on the final prediction.
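To make the mechanics concrete, here is a single LSTM cell step written out in pure Python with scalar, toy weights; the real model would use a deep-learning library with learned weight matrices over whole word-vector sequences:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar states. `w` maps each gate name to
    (input weight, recurrent weight, bias). Toy weights, illustration only."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate value
    c = f * c_prev + i * g   # cell state: retained memory plus gated new input
    h = o * math.tanh(c)     # hidden state passed to the next step / layer
    return h, c
```

The gated cell state is what lets the network carry context from early words in a post to the final sentiment prediction.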

The API was deployed on Heroku, since it is lightweight. However, we plan to move it to AWS to allow regular, scheduled scraping of new data and to accommodate more traffic.

The dashboard showcasing our work was built with Dash, which provided a UI for viewing the prediction results.

Challenges we ran into

The tokenizer was difficult to fine-tune. Online posts often have their own style of writing, including slang, abbreviations and misspellings. As a result, in some posts where the grammar was jumbled we could not identify the stock, even though we could find that same stock in other posts.

Deployment was difficult because we did not separate and standardize our environment at first, and we then had to resolve many dependency conflicts.

Accomplishments that we're proud of

Our prediction model achieved 80% accuracy, which is impressive considering that no grammar or spelling correction was applied to the data.

We were able to extract our results and display them on the dashboard in a user-friendly interface.

What we learned

Upon further research, we learned that different tokenizers can be combined, with a final layer in the pipeline resolving their differing results.
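A minimal sketch of such a resolution layer, assuming each tokenizer emits its best stock guess (or None when it finds nothing), is a simple majority vote:

```python
from collections import Counter

def resolve(candidates):
    """Pick the stock named most often across the tokenizers' outputs.
    Ties go to the candidate seen first; None guesses are ignored."""
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

More elaborate resolvers could weight each tokenizer by its historical accuracy instead of counting every vote equally.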

We have also learnt that standardizing the environment is an important first step to ensure that everyone on the team can work together smoothly. It also ensures that the deployment environment can be set up cleanly.

What's next for Outwitting the Stock Market

The demo currently uses pre-scraped data. We would like to set up a proper pipeline from scraping to storage to prediction to the API.

We would also like to fine-tune our model to improve its accuracy.

We would also like to move our deployment onto AWS to support any future expansion.

Finally, as our most ambitious future development, we would like to sort the data by time and analyze the sentiment of each stock over time. Using this time-series data, combined with NLP analysis of other textual sources such as news articles, we want to predict future sentiment. This can help our users see ahead and make more timely decisions.
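A first cut of the time-series step could average sentiment scores per stock per day; the (timestamp, stock, score) record format below is an assumption for illustration:

```python
from collections import defaultdict
from datetime import datetime

def sentiment_over_time(posts):
    """Average sentiment score per (stock, day).
    posts: iterable of (ISO-8601 timestamp, stock, score) tuples,
    where score is a sentiment value in [-1, 1] (assumed format)."""
    buckets = defaultdict(list)
    for ts, stock, score in posts:
        day = datetime.fromisoformat(ts).date().isoformat()
        buckets[(stock, day)].append(score)
    # Collapse each day's scores into a single daily average.
    return {key: sum(v) / len(v) for key, v in buckets.items()}
```

The resulting daily series per stock is what a forecasting model would then consume to predict future sentiment.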
