Reflection 1
Introduction: The Federal Reserve has been instrumental in managing business cycle fluctuations in the US economy, especially in times of economic uncertainty. The Federal Reserve communicates its interest rate decisions through statement releases, which also include key economic indicators and the Fed’s projections for the future. In addition, Federal Reserve chairs have been hosting live press conferences after the statement releases. These press conferences introduce additional modes of communication, such as the chair’s facial expressions and tone of voice, that may or may not convey additional information to the public. Traditional analysis has focused on how the phrasing of text statements affects the public’s perception of the future path of interest rates, and some recent work has started to look at how the tone and expressions of Fed chairs affect market prices, though usually one channel at a time. We want to combine all channels (video, audio, text, statement) of the Federal Open Market Committee press conferences to analyze the impact of Federal Reserve communications on economic stability and interest rate expectations, which we proxy with volatility in asset prices and movements in futures prices. We first want to apply multimodal learning to extract audio, text, and video features from the press conferences. We then want to build a deep learning model that classifies periods of high market volatility and movements in expectations using the previously extracted features. Finally, we want to apply local perturbations to estimate how variations in the different channels change the effects of Fed stabilization. We chose this topic because Haoyu is a PhD student in Economics studying the effects of Federal Reserve communications. He believes that press conferences are an important channel of Fed communication and that their impact on market expectations cannot be fully understood without the machinery of multimodal learning.
Kevin has always been interested in applying deep learning to finance. He has experience using various statistical models, including neural networks, to process cryptocurrency trade data. Throughout this course, he learned about the latest developments in deep learning and wanted to explore how to build a multimodal network that incorporates various inputs to draw insights from financial time series.
Challenges: What has been the hardest part of the project you’ve encountered so far?
Formulating the problem as one that transformers can solve is difficult. Since we are working with multimodal data (text and audio), we have been thinking about how to design attention heads that attend to the different modes of communication.
Matching trading data with the conference transcript is not easy. Sentences in the transcript vary in length, and it is challenging to align them with trades at the one-second level. We therefore need a structure that can hold sequences of varying lengths.
Cleaning the trading data showed that trades do not arrive at evenly spaced intervals. As a result, when we classify the scenarios, the target variable will be imbalanced.
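The last two challenges can be illustrated with a minimal sketch. It is not our model code: it assumes sentences are already tokenized into integer ids and that each sample carries a binary volatility label. The function names `pad_sequences` and `inverse_frequency_weights` and all data values are hypothetical. The idea is to right-pad sequences to a common length while keeping a mask of the real tokens, and to weight rare classes more heavily in the loss.

```python
from collections import Counter


def pad_sequences(seqs, pad_value=0):
    """Right-pad each sequence to the longest length and return a 0/1 mask.

    The mask marks real tokens with 1 and padding with 0, so that an
    attention layer or loss function can ignore the padded positions.
    """
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, which offsets class imbalance
    when the weights are passed to a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical token ids for three sentences of different lengths.
sentences = [[5, 8, 2], [7], [3, 9, 4, 1]]
padded, mask = pad_sequences(sentences)

# Hypothetical labels: 1 = high-volatility window, 0 = otherwise.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
```

Padding with a mask keeps the full sentences intact. Resampling the minority class would be an alternative to class weights, and we have not yet decided between the two.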
Insights: Are there any concrete results you can show at this point?
We are still getting the data ready for training. Since the scope of our project is relatively large, and we are dealing with many time series that need to be aligned, the process has been time-consuming. However, the bulk of our data processing is done: we have force-aligned the transcripts with timestamps, cleaned up the raw asset tick history, and imputed tones using a CNN model. The next step is to combine all these data series into a unified format that can be used for training, validation, and testing.
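One possible way to do this combination is an as-of join: for each timestamped sentence, look up the last trade at or before the sentence ends, and again some horizon later. The sketch below assumes tick times are sorted and measured in seconds from the start of the conference. The field names (`end`, `tone`), the 30-second horizon, and all values are hypothetical placeholders, not our actual schema or data.

```python
from bisect import bisect_right


def asof_price(tick_times, tick_prices, t):
    """Return the last traded price at or before time t, or None if none exists."""
    i = bisect_right(tick_times, t)
    return tick_prices[i - 1] if i > 0 else None


def build_records(sentences, tick_times, tick_prices, horizon=30.0):
    """Attach the as-of price and the price change over `horizon` seconds
    to each sentence record. tick_times must be sorted in ascending order."""
    records = []
    for s in sentences:
        p0 = asof_price(tick_times, tick_prices, s["end"])
        p1 = asof_price(tick_times, tick_prices, s["end"] + horizon)
        change = None if p0 is None or p1 is None else p1 - p0
        records.append({**s, "price_at_end": p0, "price_change": change})
    return records


# Hypothetical tick history: seconds since conference start, and prices.
tick_times = [0.0, 4.0, 12.5, 40.0]
tick_prices = [100.0, 100.5, 99.8, 101.0]

# Hypothetical force-aligned sentences with an imputed tone label.
sentences = [
    {"text": "first sentence", "start": 1.0, "end": 5.0, "tone": "neutral"},
    {"text": "second sentence", "start": 6.0, "end": 13.0, "tone": "negative"},
]

records = build_records(sentences, tick_times, tick_prices)
```

Because trades are irregularly spaced, the as-of lookup avoids resampling the tick history onto a fixed grid, at the cost of reusing a stale price when no trade occurs inside the horizon.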
Plan: Are you on track with your project?
What do you need to dedicate more time to?
Cleaning the data into a dataset that suits several model structures, so that we can explore different architectures.
What are you thinking of changing, if anything?
We should focus on a more liquid set of traded assets so that we can match the sentences with more granular trading data.
If we have more time, we should align the words within each time window instead of within each sentence, so that the model can work on sequences of the same length.
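The word-level alignment could look roughly like the sketch below, which assumes the forced alignment gives a start time in seconds for every word. The function name `bin_words`, the 5-second window, and the example words are hypothetical. Every conference of the same duration then yields the same number of windows, regardless of sentence boundaries.

```python
def bin_words(words, window=5.0, duration=None):
    """Group (word, start_time) pairs into consecutive fixed-width windows.

    Returns one list of words per window. Windows without any words stay
    empty, so the number of windows depends only on duration and window.
    """
    if duration is None:
        duration = max(t for _, t in words)
    n_bins = int(duration // window) + 1
    bins = [[] for _ in range(n_bins)]
    for word, t in words:
        idx = int(t // window)
        if 0 <= idx < n_bins:  # drop words outside the requested duration
            bins[idx].append(word)
    return bins


# Hypothetical word-level timestamps from a forced alignment.
words = [
    ("inflation", 0.4), ("remains", 1.1), ("elevated", 1.9),
    ("we", 6.2), ("will", 6.5),
    ("act", 12.0),
]

bins = bin_words(words, window=5.0)
```

The words inside each window still vary in number, so they would need to be pooled or padded before being fed to the model.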