On Day 26, we extended the comparative error analysis to the original, 12-by-12 strategy and showed how results were similar to the unadjusted strategy relative to the adjusted one. The main observation that emerged was that the adjusted strategy performed better than the others due to identifying most of the big moves when it was correct and not missing the big moves when it was not. This was borne out by statistical tests that showed the mean difference between returns for the true positives and false negatives for the adjusted strategy were indeed significant relative to the others.
The last five days! On Day 25, we compared the peformance of the adjusted vs. unadjusted strategy for different prediction scenarios: true and false positives and negatives. For true positives and false negatives, the adjusted strategy performed better than the unadjusted. For true negatives and false positives, the unadjusted strategy performed better. Today, we run the same comparisons with the original 12-by-12 strategy. We present the confusion matrices below for all three strategies.
On Day 24, we explained in detail how the error correction term led to somewhat unexpected outperformance relative to the original and unadjusted strategies. The reason? We hypothesized that it was due to the the error term adjusting the prediction in a trending direction when or if the current walk-forward model was mean reverting. We noted that the walk-forward models tended to have negative size effects, so were likely mean reverting.
On Day 23 we dove into the deep end to understand why the error correction we used worked as well as it did. We showed how traditional machine learning uses loss functions and then hypothesized how our use helped improve predictions through its effect on the correlation of the signs of the prediction with that of the forward return. We have to admit that our decision to use the error term in the way we did was a bit hacky so while it did generate improvements vial trial and error, one wouldn’t have necessarily thought to use it in the way we did.
On Day 22 we saw a meaningful improvement in our strategy by waiting an additional week to quantify model error and then using that error term to adjust the prediction on the most recently completed week of data. What was even more dramatic was comparing this improved strategy to one that followed the same waiting logic, but did not include the error correct. It turned an underperforming strategy into an outperforming one!
On Day 21, we wrung our hands with frustration over how to proceed. The results of our circular block sampling suggested we shouldn’t expect a whole lot of outperformance in our 12-by-12 model out-of-sample. To deal with this our choices were, back to the drawing board or off to the waterboard to start over or to torture the data until it told us what we wanted. However, we found a third way, in which we could use the information we already had, to make a few minor tweaks to improve the model.
On Day 20 we completed our analysis of the 12-by-12 strategy using circular block sampling on the 3 and 7 blocks. We found the strategy did not outperform buy-and-hold on average and its frequency of outperformance was modest – in the 28-31% range – insufficient to warrant actually executing the strategy. What to do? Back to the drawing board to test a new strategy? Or to the water board to torture the current one?
On Day 19, we introduced circular block sampling and used it to test the likelihood the 200-day SMA strategy would outperform buy-and-hold over a five year period. We found that the 200-day outperformed buy-and-hold a little over 25% of the time across 1,000 simulations. The frequency of the 200-day’s Sharpe Ratio exceeding buy-and-hold was about 30%. Today, we apply the same analysis to the 12-by-12 strategy.
When we run the simulations we find that the strategy has an average underperformance of 11% points vs.
On Day 18 we started to discuss simulating returns to quantify the probability of success for a strategy out-of-sample. The reason for this was we were unsure whether or how much to merit the 12-by-12’s performance relative to the 200-Day SMA. We discussed various simulation techniques, but we settled on historical sampling that accounts for autocorrelation of returns. We then showed the autocorrelation plots for the strategy and the three benchmarks – buy-and-hold, 60-40, and 200-day – with up to 10 lags.
On Day 17 , we compared the 12-by-12 and 200-day SMA strategies in terms of magnitude and duration of drawdowns, finding in favor of the 200-day. We also noted that most of the contributors to the differences in performance were due to two periods at the beginning and end of the period we were examining. That suggests regime driven performance, which begs the question, how much can we rely on our analysis if we have no certainty that any of the regimes identified will recur, or, even be forecastable?