
Day 30: Summing up

On Day 29, we conducted our out-of-sample test on the four strategies and found that the adjusted strategy came out on top. We reached this conclusion by ranking the strategies across the following metrics: cumulative return, Sharpe ratio, and max drawdown. If we wanted to commit capital, there would be a lot more work to do. But with the bulk of the backtesting over, it’s time to sum up what we learned.
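As a refresher on the mechanics of that ranking, here is a minimal sketch. The metric definitions follow standard conventions (zero risk-free rate for the Sharpe ratio); the strategy names and random return series are purely illustrative, not the actual backtest data:

```python
import numpy as np

def cumulative_return(returns):
    """Total compounded return over the period."""
    return np.prod(1 + returns) - 1

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return np.mean(returns) / np.std(returns) * np.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve (a negative number)."""
    equity = np.cumprod(1 + returns)
    peaks = np.maximum.accumulate(equity)
    return np.min(equity / peaks - 1)

def rank_strategies(returns_by_name):
    """Rank each strategy on each metric (1 = best) and sum the ranks."""
    names = list(returns_by_name)
    # Higher is better for all three: return, Sharpe, and drawdown
    # (a less negative drawdown is the better one).
    scores = {
        "cum_ret": [cumulative_return(returns_by_name[n]) for n in names],
        "sharpe": [sharpe_ratio(returns_by_name[n]) for n in names],
        "max_dd": [max_drawdown(returns_by_name[n]) for n in names],
    }
    totals = dict.fromkeys(names, 0.0)
    for vals in scores.values():
        order = np.argsort(vals)[::-1]  # descending: best value first
        for rank, idx in enumerate(order, start=1):
            totals[names[idx]] += rank
    return sorted(totals, key=totals.get)  # lowest rank total = best overall

rng = np.random.default_rng(0)
rets = {name: rng.normal(0.0005, 0.01, 252) for name in
        ["buy-and-hold", "original", "unadjusted", "adjusted"]}
print(rank_strategies(rets))
```

Summing ranks weights the three metrics equally; a capital-committing version would want to justify that weighting explicitly.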

Day 29: Out of sample

The moment of truth has arrived! On Day 28, we revisited all the metrics we had previously used to analyze the robustness of our strategy. We found the new adjusted strategy performed better than the original and adjusted strategies, and that performance was statistically significant for key scenarios. In simulation, however, buy-and-hold beat the new adjusted strategy on average across different sampling methods. Now it’s time to look at how our various strategies would have performed out of sample.

Day 28: Reveal

On Day 27, we had our strategy enhancement reveal. By modifying the arithmetic behind our error correction, we chiseled out another 16 percentage points of outperformance vs. buy-and-hold and the original 12-by-12 strategy. All that remains now is to run the prediction scenario metrics and conduct circular block sampling. Given that we’ve laid the groundwork for these analyses in past posts, we will only spill a little bit of virtual, binary ink in the discussion.

Day 27: Enhancement

On Day 26, we extended the comparative error analysis to the original 12-by-12 strategy and showed that its results, relative to the adjusted strategy, resembled those of the unadjusted one. The main observation was that the adjusted strategy outperformed the others because it identified most of the big moves when it was correct and did not miss the big moves when it was not. This was borne out by statistical tests showing that, for the adjusted strategy, the mean difference between returns for the true positives and false negatives was indeed significant relative to the others.
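A test of that kind can be sketched as follows, using Welch's t-test from scipy on synthetic return samples. The group sizes, means, and volatilities here are made-up stand-ins for the actual scenario returns:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical forward returns grouped by prediction scenario:
# moves captured when the strategy was right vs. moves involved when wrong.
tp_returns = rng.normal(0.020, 0.01, 80)  # true-positive weeks
fn_returns = rng.normal(0.008, 0.01, 40)  # false-negative weeks

# Welch's t-test (unequal variances): is the mean TP return
# significantly different from the mean FN return?
t_stat, p_value = stats.ttest_ind(tp_returns, fn_returns, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A significant difference here supports the claim that the strategy's correct calls coincide with the bigger moves.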

Day 26: Adjusted vs. Original

The last five days! On Day 25, we compared the performance of the adjusted vs. unadjusted strategy for different prediction scenarios: true and false positives and negatives. For true positives and false negatives, the adjusted strategy performed better than the unadjusted. For true negatives and false positives, the unadjusted strategy performed better. Today, we run the same comparisons with the original 12-by-12 strategy. We present the confusion matrices below for all three strategies.
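For concreteness, the four scenarios can be tabulated from the signs of the predictions and forward returns along these lines. This is a sketch with made-up numbers; treating zero or negative predictions as "down" calls is an assumption, not necessarily the series' convention:

```python
import numpy as np

def confusion_counts(predicted, actual):
    """Classify each week by the signs of the predicted vs. forward return.

    TP: predicted up, market up      FP: predicted up, market down
    TN: predicted down, market down  FN: predicted down, market up
    """
    pred_up = np.asarray(predicted) > 0
    act_up = np.asarray(actual) > 0
    return {
        "TP": int(np.sum(pred_up & act_up)),
        "FP": int(np.sum(pred_up & ~act_up)),
        "TN": int(np.sum(~pred_up & ~act_up)),
        "FN": int(np.sum(~pred_up & act_up)),
    }

preds = [0.01, -0.02, 0.005, 0.03, -0.01]  # illustrative weekly predictions
fwd = [0.02, 0.01, -0.01, 0.04, -0.03]     # illustrative forward returns
print(confusion_counts(preds, fwd))
```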

Day 25: Positives and Negatives

On Day 24, we explained in detail how the error correction term led to somewhat unexpected outperformance relative to the original and unadjusted strategies. The reason? We hypothesized that the error term adjusted the prediction in a trending direction when the current walk-forward model was mean reverting. We noted that the walk-forward models tended to have negative size effects, so they were likely mean reverting.

Day 24: Lucky Logic

On Day 23, we dove into the deep end to understand why the error correction we used worked as well as it did. We showed how traditional machine learning uses loss functions and then hypothesized how our usage improved predictions through its effect on the correlation between the sign of the prediction and that of the forward return. We have to admit that our decision to use the error term in the way we did was a bit hacky; while it did generate improvements via trial and error, one wouldn’t necessarily have thought to use it the way we did.
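That sign effect can be quantified with something like the sketch below. Correlating the raw signs via `np.corrcoef` is just one plausible choice of measure, and the series here are invented:

```python
import numpy as np

def sign_correlation(predictions, forward_returns):
    """Correlation between the sign of the prediction and the sign of the
    forward return -- a crude measure of directional accuracy."""
    s_pred = np.sign(predictions)
    s_fwd = np.sign(forward_returns)
    return float(np.corrcoef(s_pred, s_fwd)[0, 1])

preds = np.array([0.01, -0.02, 0.005, 0.03, -0.01, 0.02])
fwd = np.array([0.02, -0.01, -0.01, 0.04, -0.03, 0.01])
print(round(sign_correlation(preds, fwd), 3))
```

An adjustment that raises this correlation, even slightly, gets the strategy on the right side of more moves.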

Day 23: Logic or Luck

On Day 22, we saw a meaningful improvement in our strategy by waiting an additional week to quantify model error and then using that error term to adjust the prediction on the most recently completed week of data. What was even more dramatic was comparing this improved strategy to one that followed the same waiting logic but did not include the error correction. It turned an underperforming strategy into an outperforming one!
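Mechanically, the wait-and-adjust logic looks something like this sketch. The variable names and the simple additive one-week-lag correction are illustrative, not the exact code from the post:

```python
def error_corrected_prediction(raw_pred_prev, realized_prev, raw_pred_curr):
    """Adjust the current prediction by the previous week's model error.

    We wait one extra week so that realized_prev is known, measure the
    error of last week's raw prediction, and add it to this week's.
    """
    error = realized_prev - raw_pred_prev
    return raw_pred_curr + error

# Toy walk-forward loop over weekly predictions and realized returns
raw_preds = [0.010, -0.005, 0.008, 0.002]
realized = [0.020, 0.001, -0.004, 0.006]
adjusted = [error_corrected_prediction(raw_preds[t - 1], realized[t - 1], raw_preds[t])
            for t in range(1, len(raw_preds))]
print(adjusted)
```

Note the cost of the improvement: the first tradable signal arrives one week later than in the uncorrected version.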

Day 22: Error Correction

On Day 21, we wrung our hands in frustration over how to proceed. The results of our circular block sampling suggested we shouldn’t expect a whole lot of outperformance from our 12-by-12 model out-of-sample. Our choices seemed to be: back to the drawing board to start over, or off to the waterboard to torture the data until it told us what we wanted. However, we found a third way, in which we used the information we already had to make a few minor tweaks to improve the model.

Day 21: Drawing Board

On Day 20, we completed our analysis of the 12-by-12 strategy using circular block sampling with block sizes of 3 and 7. We found the strategy did not outperform buy-and-hold on average, and its frequency of outperformance was modest – in the 28-31% range – insufficient to warrant actually executing the strategy. What to do? Back to the drawing board to test a new strategy? Or to the waterboard to torture the current one?
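For reference, circular block sampling can be sketched as below. The wrap-around indexing is what makes it "circular", letting every observation start a block; the weekly return series and simulation count are illustrative:

```python
import numpy as np

def circular_block_sample(returns, block_size, rng):
    """Resample a return series in contiguous blocks, wrapping around the
    end of the series, to preserve short-run autocorrelation."""
    n = len(returns)
    n_blocks = int(np.ceil(n / block_size))
    starts = rng.integers(0, n, size=n_blocks)
    idx = np.concatenate([(s + np.arange(block_size)) % n for s in starts])
    return returns[idx[:n]]

rng = np.random.default_rng(0)
rets = rng.normal(0.001, 0.02, 260)  # ~5 years of hypothetical weekly returns

# Distribution of cumulative returns across resampled histories
sims = [np.prod(1 + circular_block_sample(rets, block_size=3, rng=rng)) - 1
        for _ in range(1000)]
print(f"mean simulated cumulative return: {np.mean(sims):.2%}")
```

Running the same resampling on strategy and buy-and-hold returns side by side is what yields the frequency-of-outperformance figure quoted above.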