The box plot shows the moving-median component that must be removed before time series analysis.
Before March 2003, when the drastic drop in the middle of the plot occurs, the average ratio of Chinese travelers was slightly less than 1.0.
The model curve, fitted over the original data and shown in red, gives an approximate fit, along with a forecast for the next 24 months.
The grey ribbon around the blue forecast widens farther into the future because forecast uncertainty grows with the horizon.
The graph shows a linear trend and a seasonal component, along with a polynomial function at the end.
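The widening ribbon follows from how ARIMA forecast errors accumulate. The writeup does not state the model order, so as a minimal sketch assume the simplest integrated case, ARIMA(0,1,0) (a random walk) with normally distributed innovations: the h-step-ahead point forecast is the last observed value and the forecast-error variance is h·σ², so an approximate 95% interval widens in proportion to √h. The function name and the values of `last_value` and `sigma` are illustrative assumptions, not taken from the team's model.

```python
import math


def random_walk_intervals(last_value, sigma, horizon, z=1.96):
    """Approximate prediction intervals for an ARIMA(0,1,0) (random walk) forecast.

    Assumption: i.i.d. normal innovations with standard deviation `sigma`,
    so the h-step forecast error has variance h * sigma**2.
    Returns a list of (h, lower, upper) tuples for h = 1..horizon.
    """
    intervals = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)  # grows like sqrt(h)
        intervals.append((h, last_value - half_width, last_value + half_width))
    return intervals


if __name__ == "__main__":
    # Illustrative values only: a ratio of 1.0 and a made-up monthly sigma of 0.05.
    for h, lower, upper in random_walk_intervals(1.0, 0.05, 24)[::6]:
        print(f"h={h:2d}: [{lower:.3f}, {upper:.3f}]")
```

Models with autoregressive or seasonal terms have more complicated variance formulas, but the same qualitative behavior: the further ahead the forecast, the wider the band.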
As expected, our model predicts a stable continuation of the post-outbreak level (above 1.0).
This plot overlays each year from 2001 through 2005, inclusive; we used it to analyze the year-by-year seasonal pattern.
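The captions mention removing a moving-median component and inspecting seasonality year by year. A minimal Python sketch of that pair of steps, assuming a centered moving median as the trend estimate and one value per month: subtract the trend, then average the detrended values by calendar month to get a seasonal profile. The 13-month window and the function names are hypothetical choices; the writeup does not specify them.

```python
from statistics import mean, median


def moving_median(series, window=13):
    """Centered moving median of `series`; None where the full window does not fit.

    Assumption: an odd window (13 here is a hypothetical choice, not from the
    writeup), so each median is centered on a single month.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered median")
    half = window // 2
    trend = []
    for i in range(len(series)):
        if i < half or i + half >= len(series):
            trend.append(None)  # not enough neighbours on one side
        else:
            trend.append(median(series[i - half:i + half + 1]))
    return trend


def seasonal_profile(series, start_month=1, window=13):
    """Mean detrended value for each calendar month (1 = January).

    `series` holds one value per month, the first one for `start_month`.
    """
    trend = moving_median(series, window)
    by_month = {m: [] for m in range(1, 13)}
    for i, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            month = (start_month - 1 + i) % 12 + 1
            by_month[month].append(value - level)
    return {m: mean(vals) for m, vals in by_month.items() if vals}
```

With 60 monthly values (2001 through 2005) and a 13-month window, the first and last six months have no trend estimate, which leaves four detrended values per calendar month. A median is used rather than a mean because a single extreme month, such as the SARS drop, barely moves it.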
The threat of a megavirus has long been a source of fear for global society. From the bubonic plague of the 1350s to diseases like SARS, the idea of mass sickness and death has loomed. Although our medical technology has come a long way and is improving rapidly, epidemics still cause global anxiety, which the media only exacerbates.
As we find ourselves in the midst of the COVID-19 coronavirus outbreak, we believe it is essential to contain fear by drawing scientific inferences from past outcomes. By carefully studying a previous epidemic that originated in China, we have produced predictions that we hope will comfort people during this time. Large-scale fear also has economic implications: when the future is uncertain, many people hold onto their money instead of spending it, which can lead to recessions. For investors, being able to give clients an estimate of when the market will return to its pre-outbreak levels may help retain a lot of business.
We began by examining the most recent epidemics to cause global anxiety, Ebola and H1N1. However, Ebola originated near the Ebola River in Africa, and for a variety of historical and structurally perpetuated reasons, such as the wealth gap, the US has received far fewer travelers from the African countries affected by Ebola than from China. H1N1, or swine flu, broke out in the US, so its travel data would have measured travel from the US to the US, which is meaningless. We therefore chose to compare the current coronavirus outbreak with the 2003 SARS outbreak in China. Although SARS occurred close to two decades ago, comparing travelers departing from the same country let us build a more accurate model than we could have with data from any other country, however recent its outbreak.
Our goal was to create a model that predicts how long after the coronavirus outbreak it will take for travel from China to the US to return to pre-outbreak levels, based on comparable data from the SARS epidemic in China.
What it does
Because the SARS outbreak was so similar in nature to the current coronavirus outbreak, we used tourism data from around the time of SARS to inform our model of the coronavirus's effect on tourism from China to the United States. We gathered data from the NTTO (National Travel and Tourism Office) datasets for 2001 through 2005, focusing on the number of Chinese citizens entering the US each month in that period. These data, an example of an epidemic after which travel rebounded, are used to train our predictive model.
How we built it
To forecast the potential effects of the coronavirus on tourism, we used data from “Final COR Volume.xlsx”, provided by Fidelity Investments, and fed it into our predictive model to estimate when tourism will rebound from the coronavirus.
To get the data into an easily readable form, we compiled selected data from “Final COR Volume.xlsx”, along with data we found from the SARS outbreak period, into Google Sheets. We then downloaded these sheets as Excel files, which could easily be read into R.
These data showed that, after the SARS outbreak hit, the number of travelers from China to the US rebounded to pre-outbreak levels.
We built a forecasting model to predict the trend in the number of Chinese visitors coming to the USA. We fed data on the change in Chinese visitors entering the USA before, during, and after the SARS outbreak into an ARIMA (AutoRegressive Integrated Moving Average) model. To build the model, we used R's built-in acf() and pacf() functions, along with box plots, to identify and remove the trend and seasonality. We validated the model by checking for serial correlation with the Ljung-Box test up to lag 5; the result, p ≈ 0.2, gives no significant evidence of serial correlation.
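The diagnostics in this paragraph can be sketched in a few lines. The team worked in R, where the test is typically run as `Box.test(x, lag = 5, type = "Ljung-Box")`; the stdlib-only Python below is an illustrative re-implementation, not their code. It shows lag-1 and lag-12 differencing (a common way to remove trend and monthly seasonality before fitting an ARIMA model), the sample autocorrelation that `acf()` reports, and the Ljung-Box statistic Q = n(n + 2) · Σ r_k² / (n − k) over k = 1..h, which is compared against a chi-square distribution with h − fitdf degrees of freedom. The helper names and the `fitdf` default of 0 are assumptions.

```python
import math

# Illustrative re-implementation; the team used R's acf()/pacf() and Ljung-Box test.


def difference(series, lag=1):
    """Lag-`lag` differences: lag=1 removes a linear trend, lag=12 a monthly seasonal pattern."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag, normalized as in R's acf()."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x), via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def ljung_box(residuals, lag=5, fitdf=0):
    """Ljung-Box Q over lags 1..lag and its p-value.

    fitdf: number of fitted ARMA parameters (p + q) subtracted from the degrees
    of freedom when testing ARIMA residuals (R's Box.test has the same argument).
    """
    n = len(residuals)
    r = acf(residuals, lag)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf(q, lag - fitdf)


if __name__ == "__main__":
    # Made-up alternating series, strongly autocorrelated, so the test should reject.
    # Real use would pass the fitted model's residuals instead.
    toy = [1.0, -1.0] * 30
    q, p = ljung_box(toy, lag=5)
    print(f"Q = {q:.1f}, p = {p:.3g}")
```

A large p-value, such as the p ≈ 0.2 reported above, means the test did not detect autocorrelation in the first five lags; it does not prove the residuals are independent.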
We used this model as a baseline, fitted it to the new coronavirus time series, and predicted how long it would take for the ratio of Chinese travelers to the USA to return to pre-outbreak levels. Our model indicates that the number of travelers will return to pre-outbreak levels from July 2020 onwards. The visualizations accompanying this writeup helped lead us to this conclusion.
We created this prediction model so that Fidelity investors can feel more certain about how long the disruption caused by the coronavirus will last. Our model could also help make investments in certain sectors more secure. With more time, we would like to explore how a change in tourism affects different market sectors in the U.S. We would also like to generalize the model to cover economies affected by epidemics anywhere in the world, not only in China. Doing so would require enough data to control for factors such as inflation, major world events (such as 9/11 or the Olympics), and historically or environmentally devastated regions.
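One way to turn a monthly forecast into a statement like "from July 2020 onwards" is to find the first month after which every forecast value stays at or above the pre-outbreak baseline. The sketch below assumes exactly that rule; the function name, the `tolerance` parameter, and how the baseline is computed (for example, as the mean pre-outbreak ratio) are hypothetical, since the writeup does not spell out its criterion.

```python
from datetime import date


def first_sustained_recovery(forecast, start, baseline, tolerance=0.0):
    """First month from which every remaining forecast is >= baseline - tolerance.

    Hypothetical recovery rule; the writeup does not state the exact criterion.
    forecast: monthly point forecasts, the first one for the month of `start`.
    Returns a datetime.date (day set to 1), or None if the forecast never
    settles at or above the threshold within the horizon.
    """
    threshold = baseline - tolerance
    recovery_index = None
    for i, value in enumerate(forecast):
        if value >= threshold:
            if recovery_index is None:
                recovery_index = i  # candidate start of a sustained run
        else:
            recovery_index = None  # dipped below again: reset
    if recovery_index is None:
        return None
    months = start.month - 1 + recovery_index
    return date(start.year + months // 12, months % 12 + 1, 1)


if __name__ == "__main__":
    # Made-up forecast ratios for illustration only (not the team's output).
    demo = [0.5, 1.01, 0.9, 1.0, 1.1]
    print(first_sustained_recovery(demo, date(2021, 11, 1), baseline=1.0))
```

Requiring the forecast to stay above the threshold, rather than just touch it once, avoids declaring a recovery on a single noisy month.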
Challenges we ran into
Despite carefully selecting our data, our model rests on some assumptions. First, we assumed that there will be no future travel bans on travelers from China, even given the current political climate; such bans, together with the coronavirus, could greatly affect future tourism from China.
We also assumed that the coronavirus does not become a pandemic, that is, that it is brought under control in about the same time SARS was. In addition, we suspect that US tourism was still recovering from the 9/11 attacks, after which tourism in general decreased by 50%, when the SARS outbreak began.
Lastly, we assumed that uncertainties in other countries and the global political climate will not affect tourism from China to the United States.
Accomplishments that we're proud of
Before Saturday morning, only two of the four of us had met. Two of our team members are sophomores at Brown, one is a junior at Brown, and our final member is a master’s student at UCSD. Between us, we also study a range of fields: data science, machine learning, applied math, graphics, cyber security, and political science. As a result, we drew on a plethora of skills to come up with our research question and create the visualizations and model.
What we learned
We learned how to use ARIMA models, how to write R, and how to choose the best visualizations.
What's next for Predicting Stability after COVID-19
Next, we hope to find more datasets that give insight into the specific market sectors that Chinese tourists contribute to. Changes in these sectors of the economy may help investors make future decisions.