Title: PowellReader: Analyzing Market Response to FOMC Press Conferences

Who: Haoyu Sheng (hsheng4), and Kevin Zhao (qzhao25)

Github Repo: click here

Final Writeup: click here (Brown University access only)

Introduction: The Federal Reserve has been instrumental in managing business cycle fluctuations in the US economy, especially in times of economic uncertainty. While the Federal Reserve communicates its interest rate decisions through its statement releases, which also includes key economic indicators and the Fed’s projection for the future, the Federal Reserve chairs have been hosting live press conferences after the statement releases. These press conferences introduce additional modes of communication, such as the chair’s facial expression and tone of voice, that might or might not convey additional information to the public. Traditional analysis has focused on how the phrasing of text statements impact the public’s perception of the future path of interest rates, and some recent attempts are starting to look at how the tone and the expressions of Fed chairs impact market prices, often disjointly. We want to combine all channels (video, audio, text, statement) at the Federal Open Market Committee press conferences to analyze the impacts of Federal Reserve communications on economic stability and interest rate expectations, which we proxy for using volatility in asset prices and movements in futures prices. We want to apply multimodal learning to extract audio, text, and video features of the press conferences. We then want to build a deep learning model to classify periods of high market volatility and movements in expectations using previously extracted video features. We then want to conduct local perturbations to back up how variations in different channels change the effects of Fed stabilization. We chose this topic because Haoyu is a PhD student in Economics studying the effects of Federal Reserve communications, and he believes that press conferences are an important channel of Fed communication and its impact on market expectations cannot be fully understood without the machinery of multimodal learning. Kevin has also always been interested in applying deep learning in the field of finance. He also had experience in using various statistical models including neural networks to process cryptocurrency trade data. Throughout this course, he learned the latest development in deep learning and wished to explore how to build a multimodal network to incorporate various inputs in order to make insights in financial time series.

Related Work: Professor Gorodnichenko, professor Pham, and professor Talavera have done similar work in their publication “THE VOICE OF MONETARY POLICY” and made contributions to this field. They developed a CNN-based deep learning model in detecting emotions embedded and their impact on the financial market from Federal Open Market Committee press conferences. They used neural networks to process the audio input and output an emotion score. They use the BERT language model to process text data to conduct sentiment analysis. Finally they summarize the inputs to a final outcome that incorporates various input and financial data.

Data: We will be using multiple data sets containing information about various aspects of the press conferences. First, we get transcripts from the Federal Reserve Board. We scrape each of the Federal Open Market Committee press conferences directly from YouTube. These videos are publicly available and downloadable through YouTube. We also obtain tick-level Federal Funds Futures data from Refinitiv Tick History. The first FOMC press conference happened in 2011. There are a total of around nine FOMC meetings every year. The press conferences took place every other FOMC meeting until 2014, and followed all FOMC meetings after 2014. Each press conference lasts around an hour to an hour and half, making each video file a few hundred megabytes large. We will clean the transcripts by splitting them into sentences, and force-align the sentences with the video timestamps. The time-stamped transcripts can be used along with the video frames to analyze movements in asset prices.

Methodology: We have all transcript, audio data and trade data from all 44 conferences. We will match transcript, audios into time windows and also aggregate all trade data into the same time windows so that they can be paired as input and outputs. Then we will split the data into training and validation. Our model is a multi-input transformer which aims to encode information from transcript, audio, and the statement, and then output a trade sequence which predicts whether there is trade happening or not in a time window. By minimizing our RMSE loss on predicted trade sequence and the real trade sequence, we will be able to attain a model with predictive power.

Metrics: What constitutes “success?” Our base goal is to train a single-input transformer, which uses minute-level transcript to predict the sequence of second-level trade indicators within a minute. Our target goal is to train a single-input transformer with meaningful perturbation that allows us to understand how changes in wording can impact trade movements. Our stretch goal is to train a multi-modal transformer. Our metric of “success” is to see if we can predict high volatility regimes accurately, and if we can decompose the relative contribution of each mode of communication. As we have the capacity of working with multiple modes of communication (audio, text, facial expressions), a natural set of experiments we can run is to evaluate how each mode of communication impacts the accuracy of classifying whether a time period is high or low volatility. While there isn’t any existing research that directly looks at the impact of press conference communication on switches in market volatility regimes, we can adopt some of the architecture of the current models analyzing FOMC press conferences, and compare them to our model on our desired task.

Ethics: What broader societal issues are relevant to your chosen problem space? This work is important for central bank communication, as one of the Federal Reserve’s main goals is to manage public expectations on the future states of the economy. What this research finds can have meaningful implications on how central bankers communicate their views on the economy and monetary policy. Why is Deep Learning a good approach to this problem? Deep learning is a good approach to this problem, since traditional data preprocessing techniques don’t have the capacity to work with audiovisual data. Also, the ability of neural networks to act as “universal function approximators” is key to helping us accurately predict what words/facial expressions induce market volatility.

Division of labor: Briefly outline who will be responsible for which part(s) of the project. Haoyu will scrape the YouTube videos, the transcripts, and force-align the transcripts with the YouTube videos. Haoyu will download the high-frequency financial data from Refinitiv. Kevin will clean the tick history data and turn it into a second-level indicator for whether any trade occurred in the given time window. Kevin will also run baseline metrics of a fine tuned language model. Haoyu will create audio emotion labels for the FOMC press conference, using separately trained convolutional neural networks. Haoyu and Kevin will build a transformer model that takes minute-level transcripts as input, and a sequence of second-level trade indicators as output.

Built With

Share this project:

Updates

posted an update

Reflection 1

Introduction: The Federal Reserve has been instrumental in managing business cycle fluctuations in the US economy, especially in times of economic uncertainty. While the Federal Reserve communicates its interest rate decisions through its statement releases, which also includes key economic indicators and the Fed’s projection for the future, the Federal Reserve chairs have been hosting live press conferences after the statement releases. These press conferences introduce additional modes of communication, such as the chair’s facial expression and tone of voice, that might or might not convey additional information to the public. Traditional analysis has focused on how the phrasing of text statements impact the public’s perception of the future path of interest rates, and some recent attempts are starting to look at how the tone and the expressions of Fed chairs impact market prices, often disjointly. We want to combine all channels (video, audio, text, statement) at the Federal Open Market Committee press conferences to analyze the impacts of Federal Reserve communications on economic stability and interest rate expectations, which we proxy for using volatility in asset prices and movements in futures prices. We want to apply multimodal learning to extract audio, text, and video features of the press conferences. We then want to build a deep learning model to classify periods of high market volatility and movements in expectations using previously extracted video features. We then want to conduct local perturbations to back up how variations in different channels change the effects of Fed stabilization. We chose this topic because Haoyu is a PhD student in Economics studying the effects of Federal Reserve communications, and he believes that press conferences are an important channel of Fed communication and its impact on market expectations cannot be fully understood without the machinery of multimodal learning. Kevin has also always been interested in applying deep learning in the field of finance. He also had experience in using various statistical models including neural networks to process cryptocurrency trade data. Throughout this course, he learned the latest development in deep learning and wished to explore how to build a multimodal network to incorporate various inputs in order to make insights in financial time series. Challenges: What has been the hardest part of the project you’ve encountered so far? Formulating the problem to one that can be solved by transformers is difficult. Since we are working with multi-modal data - text and audio, we have been thinking about how to design attention heads to attend to different modes of communication. Matching trading data with the conference transcript is not easy. Sentences in the transcript do not come in equal length and it is challenging to align them at the second level. Therefore, we need to come up with a structure that can host sequences that have various lengths. Cleaning trading data shows that the trades do not come in equal time windows. Therefore when classifying the scenarios, we will have an imbalanced dataset for the target variable. Insights: Are there any concrete results you can show at this point? We are still in the process of getting the data ready for training. Since the scope of our project is relatively big, and we are dealing with a lot of time series data that needs to be aligned, the process has been time-consuming. However, we have the bulk of our data processing done - we have force aligned the transcripts with time stamps, cleaned up the raw asset tick history, and imputed tones from a CNN model. The next step would be to combine all these data series into a unified format that can be used for our training, validation, and testing.
Plan: Are you on track with your project? What do you need to dedicate more time to? Cleaning the data to make a dataset that is suitable for various structures so that we can explore the structure. What are you thinking of changing, if anything? We should focus our problem on a trading set that is more liquid so that we could match the sentences with more granular trading data If we have more time, we should align the words in each time window instead of each sentences so that we could have sequences of the same length for the model to work on.

Log in or sign up for Devpost to join the conversation.