Day 12: Iteration

In Day 11, we presented an initial iteration of train/forecast steps to see if one combination performs better than another. Our metric of choice was root mean-squared error (RMSE)1 which is frequently used to compare model performance in machine learning circles. The advantage of RMSE is that it is in the same units as the forecast variable. The drawback is that it is tough to interpret on its own. A model with a RMSE of 10 is better than one with 20.

Day 11: Autocorrelation

On Day 10, we analyzed the performance of the 12-by-12 model by examining the predicted values and residuals. Our initial takeaway suggested the model did seem not overly biased or misspecified in the -10% to 10% region. But when it gets outside that range, watch out! We suspected that there was some autocorrelation in the residuals, which we want to discuss today. There are different statistical tests for autocorrelation and normality that would take too long to explain in a blog post.

Day 10: Residuals

On Day 9 we conducted a walk-forward analysis on the 12-by-12 week lookback-look forward combination. We then presented the canonical the actual vs. predicted value graph with a \(45^o\) line overlay to show what a perfect forecast would look like. Here’s the graph again. As noted previously, we limited the scale of the axes to make it easier to interpret. This omits some outliers, which we’ll touch on below. The main body of the graph shows a nice scattering of the data around the line.

Day 9: Forecast

Yesterday we finished up our analysis of the regression models we built using different combinations of lookback and look forward momentum values. Today, we see if we can generate good forecasts using that data. If you’re wondering why we still haven’t tested Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility, the reason is that we’re first trying to establish some rigor – albeit modest – to our tests.

Day 8: Baseline effects

Yesterday, we discussed the size effects, their statistical significance (e.g., p-values), and some other summary statistics for the various momentum combinations – namely, 3, 6, 9, and 12 week lookback and look forward returns. We found that size effects were small, but a few were significant, and that in the case of the 12-by-12 combination about 75% of the results clustered in the -10% to 10% range for both directions – forward and back.

Day 7: Size effects

Welcome to the last day of the first week of 30 days of backtesting! We hope you’re enjoying the ride. If you have any questions or concerns, you can reach us at the contact details listed at the bottom of this post. On Day 6 we defined momentum rather roughly and ran a bunch of tests to identify the linear relationship between different lookback and look forward periods. However, we didn’t go into detail about the results.

Day 6: Momentum

Yesterday we examined the eponymous Fama-French factors to see if we could find something that will help us develop an investment strategy to backtest. It turned out the best performing factor was the market risk premium, which is essentially the return to the market in excess of the risk-free rate. In other words, the best factor is buy-and-hold! I guess that means we’ve finished 24 days early. Just buy the index.

Day 5: Trifactor

The day has finally arrived! Time to start backtesting! We’ve always wanted to test how Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility would perform while implementing rolling stop-loss updates based on the ATR scaled by the 7-day minus 5-day implied volatility rank.1 Maybe we’re getting ahead of ourselves. Expeditions are fun and it’s always thrilling to explore uncharted territory. But it’s also easy to get lost and forget that we’re ultimately trying to generate superior, risk-adjusted returns.

Day 3: Metrics

Yesterday we investigated the effect of using the 200-day simple moving average (200SMA) as a proxy for a rules-based investing method. The idea was to approximate what a reasonably rational actor/agent might do in addition to the buy-and-hold approach. When folks talk about research, backtesting, and forecast comparisons, they usually use a naive model against which one compares performance. In econometric forecasting, that naive approach is often represented mathematically as \(f(x_n) = x_{n-1}\) or something like that.

Day 2: Hello World

On Day 1, we decided on a few benchmarks to use for our backtest. That is, a 60-40 and 50-50 weighting of the SPY and IEF ETFs. What we want to add in now is the Hello World version of trading strategies – the 200-DAY MOVING AVERAGE! Why are we adding this to our analysis? As we pointed out yesterday, the typical benchmark against which to compare a trading strategy is buy-and-hold.