In this thesis, I study 298 capital market anomalies over the period from 1979 to 2019 using four machine learning models: linear regression, logistic regression, principal component regression, and XGBoost. For each model, I apply three different ways of representing the training labels (the returns) and two different ways of defining the training set. The results indicate that models trained on standardized labels achieve better portfolio returns than models trained on actual returns, and that models whose training window lies closer to the out-of-sample test date perform better. I compare the results with 298 classical single-anomaly portfolios as well as with a single-signal portfolio that uses a linear combination of all 298 anomalies as its one signal. The results show that combining anomalies yields significantly better results than single-anomaly portfolios. The machine learning models outperform the classical models and deliver higher average monthly out-of-sample portfolio returns. A more complex machine learning model such as XGBoost, given a large enough training dataset, can perform better than the linear models, which indicates the existence of non-linear relationships among anomalies.