Machines

Day 10: Residuals

On Day 9 we conducted a walk-forward analysis on the 12-by-12 week lookback-look forward combination. We then presented the canonical actual vs. predicted value graph with a \(45^\circ\) line overlay to show what a perfect forecast would look like. Here’s the graph again. As noted previously, we limited the scale of the axes to make it easier to interpret. This omits some outliers, which we’ll touch on below. The main body of the graph shows a nice scattering of the data around the line.
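For readers who want the idea in code, here is a minimal sketch of how residuals relate to the \(45^\circ\) line. The numbers are made up for illustration and are not the post's data, and the function name `residuals` is our own. We assume actual values sit on the vertical axis, so a positive residual plots above the line.

```python
# Hypothetical actual and predicted forward returns (illustrative only).
actual = [0.04, -0.02, 0.10, 0.01]
predicted = [0.03, 0.00, 0.06, 0.02]


def residuals(actual, predicted):
    """Return actual minus predicted for each observation."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


res = residuals(actual, predicted)

# A point lies on the 45-degree line exactly when its residual is zero.
above_line = sum(1 for r in res if r > 0)  # actual exceeded the forecast
below_line = sum(1 for r in res if r < 0)  # actual fell short of the forecast
```

A roughly even split between `above_line` and `below_line` is what a scattering around the line would look like in counts, though it says nothing about the size of the misses.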

Day 9: Forecast

Yesterday we finished up our analysis of the regression models we built using different combinations of lookback and look forward momentum values. Today, we see if we can generate good forecasts using that data. If you’re wondering why we still haven’t tested Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility, the reason is that we’re first trying to establish some rigor – albeit modest – to our tests.

Day 8: Baseline effects

Yesterday, we discussed the size effects, their statistical significance (e.g., p-values), and some other summary statistics for the various momentum combinations – namely, 3-, 6-, 9-, and 12-week lookback and look forward returns. We found that size effects were small, but a few were significant, and that in the case of the 12-by-12 combination about 75% of the results clustered in the -10% to 10% range for both directions – forward and back.
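As a rough illustration of what a lookback and look forward pair is, the sketch below builds the pairs from a list of weekly closing prices. The helper names `paired_returns` and `share_within` are hypothetical, the prices are invented, and reading "clustered in the -10% to 10% range for both directions" as "both returns fall inside the band" is our assumption.

```python
def paired_returns(prices, lookback, lookforward):
    """Pair each trailing `lookback`-period return with the
    `lookforward`-period return that follows it."""
    pairs = []
    for t in range(lookback, len(prices) - lookforward):
        back = prices[t] / prices[t - lookback] - 1
        fwd = prices[t + lookforward] / prices[t] - 1
        pairs.append((back, fwd))
    return pairs


def share_within(pairs, bound=0.10):
    """Fraction of pairs whose lookback AND look forward returns are within +/- bound."""
    if not pairs:
        return 0.0
    inside = [p for p in pairs if abs(p[0]) <= bound and abs(p[1]) <= bound]
    return len(inside) / len(pairs)


# Invented weekly closes, with a short 2-by-2 window to keep the example small.
weekly_closes = [100, 102, 101, 105, 107, 110, 108]
pairs = paired_returns(weekly_closes, lookback=2, lookforward=2)
share = share_within(pairs)
```

For the 12-by-12 case one would pass `lookback=12, lookforward=12` on a much longer weekly series.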

Day 7: Size effects

Welcome to the last day of the first week of 30 days of backtesting! We hope you’re enjoying the ride. If you have any questions or concerns, you can reach us at the contact details listed at the bottom of this post. On Day 6 we defined momentum rather roughly and ran a bunch of tests to identify the linear relationship between different lookback and look forward periods. However, we didn’t go into detail about the results.

Day 6: Momentum

Yesterday we examined the eponymous Fama-French factors to see if we could find something that will help us develop an investment strategy to backtest. It turned out the best performing factor was the market risk premium, which is essentially the return to the market in excess of the risk-free rate. In other words, the best factor is buy-and-hold! I guess that means we’ve finished 24 days early. Just buy the index.

Day 5: Trifactor

The day has finally arrived! Time to start backtesting! We’ve always wanted to test how Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility would perform while implementing rolling stop-loss updates based on the ATR scaled by the 7-day minus 5-day implied volatility rank. Maybe we’re getting ahead of ourselves. Expeditions are fun and it’s always thrilling to explore uncharted territory. But it’s also easy to get lost and forget that we’re ultimately trying to generate superior, risk-adjusted returns.

Day 3: Metrics

Yesterday we investigated the effect of using the 200-day simple moving average (200SMA) as a proxy for a rules-based investing method. The idea was to approximate what a reasonably rational actor/agent might do in addition to the buy-and-hold approach. When folks talk about research, backtesting, and forecast comparisons, they usually use a naive model against which one compares performance. In econometric forecasting, that naive approach is often represented mathematically as \(f(x_n) = x_{n-1}\) or something like that.
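The naive model \(f(x_n) = x_{n-1}\) is simple enough to sketch in a few lines. The function names and the sample series below are ours, chosen only for illustration, and mean absolute error is just one of several ways one might score the forecast.

```python
def naive_forecast(series):
    """Forecast each value as the previous observation: f(x_n) = x_{n-1}.

    The returned list lines up with series[1:].
    """
    return series[:-1]


def mean_absolute_error(actual, forecast):
    """Average absolute miss between actual values and forecasts."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


x = [10.0, 11.0, 10.5, 12.0]           # invented series
forecast = naive_forecast(x)            # [10.0, 11.0, 10.5]
mae = mean_absolute_error(x[1:], forecast)
```

Any candidate model would then need to beat `mae` on the same data to claim it adds something over simply carrying the last value forward.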

Day 2: Hello World

On Day 1, we decided on a few benchmarks to use for our backtest. That is, a 60-40 and 50-50 weighting of the SPY and IEF ETFs. What we want to add in now is the Hello World version of trading strategies – the 200-DAY MOVING AVERAGE! Why are we adding this to our analysis? As we pointed out yesterday, the typical benchmark against which to compare a trading strategy is buy-and-hold.
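Here is one way the moving average rule might look in code. This is a sketch under our own assumptions: the rule is taken to be "hold when the close is above its moving average, otherwise stay out", the warm-up period counts as out of the market, and the function names are hypothetical.

```python
def sma(prices, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    running = 0.0
    for i, p in enumerate(prices):
        running += p
        if i >= window:
            running -= prices[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def sma_signal(prices, window=200):
    """1 when the close is above its moving average, else 0 (warm-up included)."""
    averages = sma(prices, window)
    return [1 if m is not None and p > m else 0 for p, m in zip(prices, averages)]


# Tiny invented series with a 3-period window so the result is easy to check by hand.
signal = sma_signal([1, 2, 3, 4, 3, 2], window=3)
```

In a backtest the signal from one day would normally be applied to the next day's return, since the close used to compute it is not known until the day is over.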

Day 1: Benchmarks

Yesterday we set out our plan to backtest a strategy using the SPY ETF, which tracks the S&P 500. Before we commence, we obviously need to establish a baseline. What metrics will we use to assess the strategy? How will we define success? What benchmarks will we use? Typically, for a single asset strategy the comparison is buy-and-hold performance. That is, if you’re using Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility to generate buy and sell signals, you’ll usually compare the performance of that strategy to one in which you bought the underlying on the first day of the test and held it until the end.
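The buy-and-hold comparison comes down to two numbers. The sketch below uses invented prices and returns, and the helper names are hypothetical. It shows the bought-on-day-one, held-to-the-end return next to the compounded return of some strategy's per-period returns.

```python
def buy_and_hold_return(prices):
    """Total return from buying at the first price and holding to the last."""
    return prices[-1] / prices[0] - 1


def cumulative_return(period_returns):
    """Compound a sequence of per-period returns into one total return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1 + r
    return growth - 1


prices = [100.0, 110.0, 121.0]            # invented closes for the underlying
strategy_returns = [0.10, 0.00]           # invented: in the market, then out
benchmark = buy_and_hold_return(prices)
strategy = cumulative_return(strategy_returns)
excess = strategy - benchmark             # negative means the strategy lagged
```

Total return is only one yardstick. A risk-adjusted comparison would also need some measure of volatility or drawdown.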

Day 0: So it begins

Over the next 30 days or so, we’ll be conducting a test of the emergency backtesting system. The test will attempt to go through all the steps one might usually follow to analyze, build, test, and then deploy an investment strategy. Now this probably won’t be a strategy that knocks the cover off the ball. It may not even be profitable. But our thought process is as follows. How often does one see the entire backtesting process presented step-by-step in a reproducible way?