Inspired by the Fama-French three-factor model and CAPM, we look for a model with three significant factors and apply machine learning to obtain a better set of estimates. First, we cluster stocks from the S&P 500 into 15 groups using k-means clustering and form a portfolio of stocks drawn from different clusters, chosen to minimize volatility. Then, by penalizing the coefficients to guard against overfitting, the Elastic Net algorithm predicts alphas with slightly more bias but lower total error overall.
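As a rough illustration of the selection step, the sketch below takes cluster labels that have already been computed (for example by a k-means routine) and keeps the lowest-volatility stock in each cluster. This is only one possible reading of "minimized volatility". The function name, the tickers, and the return series are hypothetical and are not taken from our actual data.

```python
from statistics import stdev

def lowest_volatility_per_cluster(returns_by_ticker, cluster_of):
    """Pick, for each cluster, the ticker whose returns have the lowest
    sample standard deviation.

    returns_by_ticker: dict mapping ticker -> list of periodic returns
    cluster_of:        dict mapping ticker -> cluster label (assumed to
                       come from a separate k-means step)
    """
    best = {}  # cluster label -> (ticker, volatility)
    for ticker, returns in returns_by_ticker.items():
        vol = stdev(returns)
        cluster = cluster_of[ticker]
        if cluster not in best or vol < best[cluster][1]:
            best[cluster] = (ticker, vol)
    return {cluster: ticker for cluster, (ticker, _) in best.items()}

# Hypothetical example: three made-up tickers in two clusters.
example_returns = {
    "AAA": [0.01, 0.02, 0.01, 0.02],
    "BBB": [0.10, -0.08, 0.12, -0.09],
    "CCC": [0.03, -0.01, 0.02, 0.00],
}
example_clusters = {"AAA": 0, "BBB": 0, "CCC": 1}
picks = lowest_volatility_per_cluster(example_returns, example_clusters)
```

In a full pipeline the selected stocks would still need weights; this sketch only covers the one-stock-per-cluster choice.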

Cumulative alpha plots at the end of the documentation show the potential excess returns.

Admittedly, we made many assumptions to simplify our model, but it is still exciting to see how extensible the overall model and the concepts we've built are in every respect.

Our k-means clustering could be improved by including transaction costs, which would also let us extend the model to short selling. Accounting for the reduction in alpha caused by transaction costs would bring our returns closer to what is achievable in real-life conditions. We had also considered including more factors in our k-means clustering, such as trading volume, but could not because of limits on data extraction and time. Adding more factors in the future could help the model fit different market situations. Also, all factors so far are fundamental indicators. In the future, we might explore short-term technical indicators.

For the monthly adjustment mechanism, we used the Sharpe ratio to choose the portfolio with the highest risk-adjusted return. If we had more time, we would like to use other ratios and measures, such as the Treynor ratio (which considers market risk), the Sortino ratio (which considers downside risk), and the company’s growth rate.
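To make the differences between these ratios concrete, here is a minimal sketch using only the Python standard library. It assumes simple periodic returns and a constant per-period risk-free rate. The function names and the sample numbers are illustrative and are not our project code.

```python
import math
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample std dev of excess returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

def sortino_ratio(returns, risk_free=0.0):
    """Mean excess return divided by downside deviation.

    Only returns below the risk-free rate contribute to the denominator.
    Raises ZeroDivisionError if no return falls below the risk-free rate.
    """
    excess = [r - risk_free for r in returns]
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    return mean(excess) / downside

def treynor_ratio(returns, market_returns, risk_free=0.0):
    """Mean excess return divided by beta against the market."""
    excess = [r - risk_free for r in returns]
    market = [m - risk_free for m in market_returns]
    mean_e, mean_m = mean(excess), mean(market)
    n = len(excess)
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(excess, market)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market) / (n - 1)
    beta = cov / var_m
    return mean_e / beta

# Made-up monthly returns, for illustration only.
sample = [0.02, -0.01, 0.03, -0.02, 0.01]
sharpe = sharpe_ratio(sample)
sortino = sortino_ratio(sample)
treynor = treynor_ratio(sample, sample)  # beta is 1 against itself
```

Swapping the ranking metric in the monthly adjustment would then amount to passing a different one of these functions to whatever selects the best portfolio.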

We also considered capping the percentage of capital invested in any one company or stock to avoid concentration risk. However, we could not implement the idea due to time constraints.
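One way such a cap could work is sketched below: weights above the cap are clipped and the excess is redistributed proportionally among the remaining positions until every weight fits. This is an assumption about how the idea might be implemented, not something we built. The function name and the long-only (non-negative weights) restriction are ours.

```python
def cap_weights(weights, max_weight):
    """Return long-only weights summing to 1 with no entry above max_weight.

    Excess weight from capped positions is redistributed to the uncapped
    positions in proportion to their current weights.
    """
    n = len(weights)
    if max_weight * n < 1.0 - 1e-12:
        raise ValueError("cap too low: weights cannot sum to 1")
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so weights sum to 1
    capped = [False] * n
    while True:
        over = [i for i in range(n) if not capped[i] and w[i] > max_weight + 1e-12]
        if not over:
            return w
        for i in over:
            capped[i] = True
            w[i] = max_weight
        free = [i for i in range(n) if not capped[i]]
        remaining = 1.0 - max_weight * sum(capped)
        free_total = sum(w[i] for i in free)
        for i in free:
            if free_total > 0:
                w[i] = w[i] / free_total * remaining
            else:
                w[i] = remaining / len(free)  # all free weights were zero

# Illustrative input: one position holds 60% before capping at 40%.
capped_example = cap_weights([0.6, 0.3, 0.1], 0.4)
```

In the example, the first pass caps the 60% position, which pushes the second position above 40%, so a second pass is needed. The loop handles that cascade.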

Moreover, the three factors of the Fama-French three-factor model might not be the best regressors for explaining excess return. We could improve the model by including other factors. Due to time constraints, we have not tried linear combinations of factors or principal component analysis. Many studies have found that momentum could also be included in the asset pricing model. The five-factor model is another option to consider.
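For reference, the regressions mentioned above can be written out as follows. The notation is the standard textbook form rather than anything specific to our code: $R_{i,t}$ is the return of asset $i$, $R_{f,t}$ the risk-free rate, $R_{m,t}$ the market return, and $\alpha_i$ the intercept we interpret as alpha. The momentum factor is written as $\mathrm{MOM}_t$ here; other sources label it differently.

```latex
% Fama-French three-factor model
R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t})
    + s_i \,\mathrm{SMB}_t + h_i \,\mathrm{HML}_t + \varepsilon_{i,t}

% Three factors plus momentum
R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t})
    + s_i \,\mathrm{SMB}_t + h_i \,\mathrm{HML}_t + m_i \,\mathrm{MOM}_t + \varepsilon_{i,t}

% Fama-French five-factor model (adds profitability and investment)
R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t})
    + s_i \,\mathrm{SMB}_t + h_i \,\mathrm{HML}_t
    + r_i \,\mathrm{RMW}_t + c_i \,\mathrm{CMA}_t + \varepsilon_{i,t}
```

In each case the extension only adds regressors, so the same penalized fitting procedure could in principle be reused with a wider design matrix.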

For the social media analysis, after tracking the correlation between social media scores and prices, we think our model would improve further if we scored each post according to the emotional intensity of its wording instead of only counting likes, shares, and responses.
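A very small sketch of what wording-based scoring could look like is shown below. The lexicon and its values are entirely hypothetical placeholders. A real version would need a proper sentiment lexicon or a trained model, which we have not built.

```python
import re

# Hypothetical word -> emotion score mapping, for illustration only.
LEXICON = {
    "surge": 1.0,
    "strong": 0.5,
    "beat": 0.5,
    "miss": -0.5,
    "lawsuit": -1.0,
    "crash": -1.0,
}

def post_sentiment(text, lexicon=LEXICON):
    """Average lexicon score over all word tokens in the post.

    Words missing from the lexicon count as neutral (0.0), and an empty
    post scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)

positive = post_sentiment("Strong quarter, shares surge")
negative = post_sentiment("Lawsuit news, stock crash")
```

A score like this could be combined with the engagement counts we already use, for example by weighting each post's score by its likes and shares.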

We believe that by implementing the ideas and improvements proposed above, we can make our model a better fit for different market situations across countries.
