Data Science

On Day 9 we conducted a walk-forward analysis on the 12-by-12 week lookback-look forward combination. We then presented the canonical the actual vs. predicted value graph with a \(45^o\) line overlay to show what a perfect forecast would look like. Here’s the graph again. As noted previously, we limited the scale of the axes to make it easier to interpret. This omits some outliers, which we’ll touch on below. The main body of the graph shows a nice scattering of the data around the line.

Yesterday we finished up our analysis of the regression models we built using different combinations of lookback and look forward momentum values. Today, we see if we can generate good forecasts using that data. If you’re wondering why we still haven’t tested Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility, the reason is that we’re first trying to establish some rigor – albeit modest – to our tests.

Yesterday, we discussed the size effects, their statistical significance (e.g., p-values), and some other summary statistics for the various momentum combinations – namely, 3, 6, 9, and 12 week lookback and look forward returns. We found that size effects were small, but a few were significant, and that in the case of the 12-by-12 combination about 75% of the results clustered in the -10% to 10% range for both directions – forward and back.

Welcome to the last day of the first week of 30 days of backtesting! We hope you’re enjoying the ride. If you have any questions or concerns, you can reach us at the contact details listed at the bottom of this post. On Day 6 we defined momentum rather roughly and ran a bunch of tests to identify the linear relationship between different lookback and look forward periods. However, we didn’t go into detail about the results.

Yesterday we examined the eponymous Fama-French factors to see if we could find something that will help us develop an investment strategy to backtest. It turned out the best performing factor was the market risk premium, which is essentially the return to the market in excess of the risk-free rate. In other words, the best factor is buy-and-hold! I guess that means we’ve finished 24 days early. Just buy the index.

The day has finally arrived! Time to start backtesting! We’ve always wanted to test how Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility would perform while implementing rolling stop-loss updates based on the ATR scaled by the 7-day minus 5-day implied volatility rank.1 Maybe we’re getting ahead of ourselves. Expeditions are fun and it’s always thrilling to explore uncharted territory. But it’s also easy to get lost and forget that we’re ultimately trying to generate superior, risk-adjusted returns.

We’re four days in and you’re probably wondering when are we actually going to start backtesting?! The answer is that while it is natural to want to rush to the fun part – the hope and elation of generating outsized returns and Sharpe Ratios greater than 2 – the reality is getting the foundation right should serve us well in the future. Or so that’s what we always hear from those longer in the tooth.

Yesterday we investigated the effect of using the 200-day simple moving average (200SMA) as a proxy for a rules-based investing method. The idea was to approximate what a reasonably rational actor/agent might do in addition to the buy-and-hold approach. When folks talk about research, backtesting, and forecast comparisons, they usually use a naive model against which one compares performance. In econometric forecasting, that naive approach is often represented mathematically as \(f(x_n) = x_{n-1}\) or something like that.

On Day 1, we decided on a few benchmarks to use for our backtest. That is, a 60-40 and 50-50 weighting of the SPY and IEF ETFs. What we want to add in now is the Hello World version of trading strategies – the 200-DAY MOVING AVERAGE! Why are we adding this to our analysis? As we pointed out yesterday, the typical benchmark against which to compare a trading strategy is buy-and-hold.

Yesterday we set out our plan to backtest a strategy using the SPY ETF, which tracks the S&P 500. Before we commence, we obviously need to establish a baseline. What metrics will we use to assess the strategy? How will we define success? What benchmarks will we use? Typically, for a single asset strategy the comparison is buy-and-hold performance. That is, if you’re using Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility to generate buy and sell signals, you’ll usually compare the performance of that strategy to one in which you bought the underlying on the first day of the test and held it until the end.

Data Science

Day 10: Residuals

Day 9: Forecast

Day 8: Baseline effects

Day 7: Size effects

Day 6: Momentum

Day 5: Trifactor

Day 4: First analysis

Day 3: Metrics

Day 2: Hello World

Day 1: Benchmarks