Introduction to Portfolio Backtesting

Prerequisites

CRSP and Compustat Data Familiarity
Basic Programming

(There is no great book on the subject. Perhaps the closest is Andrew Ang’s “Asset Management.”)

Major Homework Tasks For Spring-I

Task 0: Learn to Program

Task 1: CRSP

Use CRSP To Create Firm-Size Data Sets (2 Weeks, early summer)

This task should be done individually by each and every student.
It is easiest to download the relevant data from WRDS first and then write the program on your laptop.
For each stock and year, calculate the marketcap on the last day of the year.
Calculate the rank of each stock on the last day of the year (NA if the stock is not trading on the last day of the year).
Save your data as a csv data set:

permno, cusip, ticker, year, yyyymmdd, mktcap, rank(mktcap)
Handcheck outliers.
- For example, which firms had the biggest change in rank from year to year? Does this seem right?
Create some test cases, e.g.,

assert( (permno==10000) and (year==1986) and (rank(mktcap)==223) )
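A minimal pandas sketch of Task 1. It assumes a daily CRSP extract with columns permno, date, prc, shrout (CRSP's PRC and SHROUT, lowercased). cusip and ticker would come from the CRSP names file and are left out. The function name yearend_mktcap_rank is hypothetical, and "last day of the year" is taken to mean the last day on which any stock traded.

```python
import pandas as pd


def yearend_mktcap_rank(daily: pd.DataFrame) -> pd.DataFrame:
    """Year-end market cap and size rank for every stock-year.

    Assumes hypothetical input columns permno, date (datetime64), prc, shrout.
    CRSP stores prc as negative when it is a bid/ask midpoint, and shrout in
    thousands of shares, hence abs(prc) * shrout * 1000.
    """
    df = daily.copy()
    df["year"] = df["date"].dt.year
    df["mktcap"] = df["prc"].abs() * df["shrout"] * 1000
    df["yyyymmdd"] = df["date"].dt.strftime("%Y%m%d").astype(int)

    # The market's last trading day in each calendar year.
    last_day = df.groupby("year")["date"].transform("max")
    at_end = df.loc[df["date"] == last_day].copy()

    # Rank 1 = largest firm in that year; ties share the lower rank number.
    at_end["rank_mktcap"] = at_end.groupby("year")["mktcap"].rank(
        ascending=False, method="min")

    # Keep every stock-year that traded at some point in the year; stocks
    # that did not trade on the last day get NA for mktcap and rank.
    stock_years = df[["permno", "year"]].drop_duplicates()
    cols = ["permno", "year", "yyyymmdd", "mktcap", "rank_mktcap"]
    out = stock_years.merge(at_end[cols], on=["permno", "year"], how="left")
    return out.sort_values(["year", "rank_mktcap"]).reset_index(drop=True)
```

With a result `res` built from the full CRSP file, a test case like the one above is easiest to write as filter-then-check: `row = res[(res.permno == 10000) & (res.year == 1986)]`, then `assert row["rank_mktcap"].iloc[0] == 223`.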

Task 2: Program CRSP

Create Annual Return Data Sets (1 Week, early summer)

Write a flexible function that can calculate holding-period rates of return
- a holding-period rate (i.e., compound rate of return) is not the percent change in price. Use the “1 plus” formula: $R = (1+r_1)(1+r_2)\cdots(1+r_T) - 1$.
Use this function to create four data sets:
- Create one data set with annual Jan-Dec buy-and-hold rates of return for all stock-years.
- Create one data set but starting Jan 10 instead of Jan 1.
- Do the same for both, but restrict to stocks that ended the previous year among the top 2,000 stocks by market cap.
Alternatively, you can stick all the data into one data set (e.g., with an islarge flag).
Save your result as a csv data set:

permno, islarge, cusip, ticker, year, mktcap-last-year, rank(mktcap-last-year), yyyymmdd.1, yyyymmdd.T, ret.jan1dec31, ret.jan10dec31
Handcheck outliers
- For example, which firms had the biggest returns in the year? Does this seem right?
- For example, which firms had the biggest difference between ret.jan1dec31 and ret.jan10dec31?
Create some test cases, e.g.,

assert( (permno==10000) and (year==1986) and (rank(mktcap)==223) )
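A minimal sketch of the flexible holding-return function and its use for the Jan 1 and Jan 10 windows. It assumes daily columns permno, date and ret (a daily total return like CRSP's RET). The function names are hypothetical, and two conventions are assumptions: a window with any missing daily return yields NaN, and the start date's own return is included (i.e., you buy at the previous trading day's close).

```python
import pandas as pd


def holding_return(daily_rets) -> float:
    """Buy-and-hold (compound) return via the "1 plus" formula: prod(1 + r) - 1.

    Returns NaN if the window is empty or any daily return is missing
    (a conservative, assumed convention).
    """
    r = pd.Series(daily_rets, dtype="float64")
    if r.empty or r.isna().any():
        return float("nan")
    return float((1.0 + r).prod() - 1.0)


def annual_returns(daily: pd.DataFrame, start_mmdd: str = "0101",
                   end_mmdd: str = "1231") -> pd.Series:
    """Per-(permno, year) buy-and-hold return between two calendar dates.

    Assumes hypothetical columns permno, date (datetime64) and ret. The
    start date's own daily return is included in the window.
    """
    df = daily.copy()
    df["year"] = df["date"].dt.year
    mmdd = df["date"].dt.strftime("%m%d")  # zero-padded, so string order = date order
    window = df.loc[(mmdd >= start_mmdd) & (mmdd <= end_mmdd)]
    return (window.sort_values("date")
                  .groupby(["permno", "year"])["ret"]
                  .apply(holding_return)
                  .rename(f"ret.{start_mmdd}-{end_mmdd}"))
```

`annual_returns(daily)` gives the Jan-Dec version and `annual_returns(daily, "0110")` the Jan 10 version. The top-2,000 restriction is then a merge with the previous year's rank from Task 1. Note that `holding_return([-0.9, 10.0])` is about +10%, while simply adding the two percent changes would suggest +910%.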

Task 3: Compustat

Create a Compustat-based data set (2 weeks, early summer)

For each stock and year, pull off the earnings as of June 30.
- Be careful about how Compustat uses fiscal year.
When a stock ends its fiscal year in December, you want to calculate year-to-date earnings.
Handcheck a few outliers against SEC or Bloomberg information.
Make sure that you have the timing correct for one firm ending fiscal year in March and one ending in October.
Create some test cases, e.g.,

assert( (gvkey==1005) and (year==1986) and (earnings==2.0) )
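A sketch of the annual-data timing logic only (not the year-to-date quarterly part). It encodes Compustat's fiscal-year label, under which fiscal years ending January through May carry the prior calendar year, and picks the latest fiscal year whose numbers should be public by a given date. The reporting lag is an assumption: later slides say earnings tend to be known 4-5 months after fiscal year end, so 6 months is used here as a conservative, hypothetical default. All function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's last day."""
    years, month0 = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def compustat_fyear(fyend: date) -> int:
    """Compustat's fiscal-year label: years ending January-May get the prior calendar year."""
    return fyend.year - 1 if fyend.month <= 5 else fyend.year


def latest_usable_fyend(fyend_month: int, as_of: date, lag_months: int = 6) -> date:
    """Most recent fiscal-year end whose annual earnings are assumed public by as_of.

    lag_months is an assumed reporting delay (hypothetical default of 6).
    """
    year = as_of.year
    while True:
        last_dom = calendar.monthrange(year, fyend_month)[1]
        fyend = date(year, fyend_month, last_dom)
        if add_months(fyend, lag_months) <= as_of:
            return fyend
        year -= 1  # data not yet public: step back one fiscal year
```

For June 30, 1986, this picks the fiscal year ending 1985-03-31 for a March firm (Compustat fyear 1984) and the one ending 1985-10-31 for an October firm (fyear 1985). These are exactly the two cases worth handchecking.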

Finance and Portfolio Basics

Data
Factors
(Abnormal) Performance (Alphas)
Exposures (Betas)
Regressions
Signals
Strategies
Timing

Important Finance Data Sources

CRSP
Compustat
Bloomberg
Real-Time Providers of stock and other data
TAQ

Warning: Many real-time vendors of cheap financial data introduce survivorship bias by dropping stocks that are no longer trading. They also often forget about dividends.

CRSP is also pretty good at defining what does and does not remain a firm in a merger or name change.

Important: percent changes in price are not rates of return!
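A toy illustration of why the two differ, with made-up prices. The rate of return counts distributions (and adjusts for splits), while the percent change in price does not. CRSP's RET already includes distributions; the split_factor convention below (dividend per original share) is an assumption of this sketch.

```python
def price_change(p0: float, p1: float) -> float:
    """Percent change in price only: ignores dividends and splits."""
    return p1 / p0 - 1.0


def total_return(p0: float, p1: float, dividend: float = 0.0,
                 split_factor: float = 1.0) -> float:
    """One-period rate of return including distributions.

    dividend: cash paid per share held at the start of the period.
    split_factor: new shares per old share (2.0 for a 2-for-1 split), so that
    p1 * split_factor is the end value of one original share.
    """
    return (p1 * split_factor + dividend) / p0 - 1.0
```

A stock going from \$100 to \$98 while paying a \$3 dividend shows a -2% price change but a +1% rate of return. After a 2-for-1 split, a move from \$100 to \$51 looks like -49% in price but is a +2% return.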

Thinking of Stock Return Data

Stocks form an irregular panel of [securities, trading days].
Too many stock-days to shoehorn into a regular panel.
- 1963 is a typical start, due to Compustat data availability.
- 50k stocks (not 5k). 23k (14k) trading days from 1926 (1963).
- would be about 10GB for daily returns alone in a regular panel.
- but sparse, so more like 2GB.
- for easy processing, you need 3-5 times as much RAM as data.
Annual data is much smaller and fairly easy to handle.
You often need to “transpose” the “matrix.”
- we may want all stocks on a given day [a row?]
- we may want all days for a given stock [a column?]
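A small pandas sketch of the "transpose" with made-up permnos and returns. CRSP-style long data (one row per stock-day) is pivoted into a wide days-by-stocks matrix, so that a row gives all stocks on a day and a column gives all days for a stock, and then melted back. The last helper reproduces the dense-panel size estimate above, assuming 8-byte floats.

```python
import pandas as pd

# Long format: one row per stock-day (toy values, not real CRSP data).
long = pd.DataFrame({
    "permno": [10001, 10001, 10002, 10003, 10003],
    "date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-03",
                            "2018-01-02", "2018-01-03"]),
    "ret": [0.01, -0.02, 0.03, 0.00, 0.05],
})

# Wide "matrix": rows = trading days, columns = stocks; missing stock-days are NaN.
wide = long.pivot(index="date", columns="permno", values="ret")

day_row = wide.loc[pd.Timestamp("2018-01-03")]   # all stocks on one day
stock_col = wide[10001]                          # all days for one stock

# Back to long format, dropping the NaN holes that the wide form created.
back = (wide.reset_index()
            .melt(id_vars="date", var_name="permno", value_name="ret")
            .dropna(subset=["ret"]))


def dense_panel_gb(n_stocks: int, n_days: int, bytes_per_cell: int = 8) -> float:
    """Memory for a dense float64 stock-by-day panel, in gigabytes."""
    return n_stocks * n_days * bytes_per_cell / 1e9
```

`dense_panel_gb(50_000, 23_000)` comes to 9.2, consistent with "about 10GB" for a regular panel. The NaN cells in `wide` are why the sparse long form is so much smaller.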

Stock Data Questions

What fraction of stocks disappear every year?
- about 10%. Not necessarily negative, due to buyouts.
Does this create “survivorship” bias?
- Not on CRSP, because most delistings are announced ex ante, and CRSP provides a delisting rate of return.
How should you deal with a $(-90\%, 1000\%)$ return sequence?
- Was this a price recording blip or not?
How do you deal with firms that were not traded for a long time, and then suddenly came back?
- How does CRSP deal with it?
How does stock price data compare to other data? Indexes? Options? Futures? Bonds? Private Equity? Compustat?
- stocks are typically cleaner and practically marked to market at day’s end.
Should you worry about non-normality?
- Not much for stocks, even though the normal approximation isn’t perfect.
- But be aware of non-representativeness, especially with respect to (hard-to-estimate) rare events.
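One way (a heuristic of this sketch, not CRSP's own procedure) to surface suspicious $(-90\%, 1000\%)$ sequences for handchecking. It flags a big move that the next day's move almost exactly undoes. The function name and all three thresholds are hypothetical and would need tuning.

```python
import pandas as pd


def flag_reversals(ret: pd.Series, drop: float = -0.5, jump: float = 1.0,
                   tol: float = 0.5) -> pd.Series:
    """Flag day t when returns on t and t+1 look like a price-recording blip.

    A blip here means a large move immediately undone by the next day's
    move: one day at or below `drop`, the other at or above `jump`, and a
    two-day compound return within +/- `tol` of zero. All thresholds are
    hypothetical.
    """
    nxt = ret.shift(-1)
    two_day = (1.0 + ret) * (1.0 + nxt) - 1.0
    big_pair = ((ret <= drop) & (nxt >= jump)) | ((ret >= jump) & (nxt <= drop))
    return big_pair & (two_day.abs() <= tol)
```

For the $(-90\%, 1000\%)$ pair, the two-day compound return is $0.1 \cdot 11 - 1 = +10\%$, so the pair gets flagged. A genuine crash without a rebound does not. Flagged days are candidates for a handcheck, not for automatic deletion.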

ASAM

Fortunately, ASAM is not a high-activity fund
We buy-and-hold roughly for one calendar year.
This also makes data handling easier:
- we can work with annual stock returns.
- we can create them in one first pass
- we can throw out firms that are too small.
- oh, and you already created these data sets in Tasks 1 and 2

Finance Questions

What is abnormal performance?
- actual minus normal performance
What is “normal” or expected performance?
- shit, we don’t have a great benchmark model.
- presumably risk- and liquidity-related, not idiosyncratic
What kind of things do you believe we should control for? Stuff that gives us high but not abnormal returns?
- Market exposure
- X-costs
- Liq-risk
- But what about “value”?
Market exposure is not necessarily the CAPM.
- Could be a simple market model.
- if you have a stock that has a beta of 2 and the market went up by 10%, your stock should go up by 20%.
- the CAPM is related but says something more and different
  - it is about how beta should relate to the intercept alpha

What is a Factor?

Net-of-rf rate of return on the stock market
Value vs. Growth: HML (high B/M vs low B/M)
Firm Size: SMB (small minus big)
Robust Minus Weak: RMW
Conservative Minus Aggressive: CMA
Momentum: Rate of return from –2 to $\approx$ –12 months.
Reversal: Rate of return in previous month (–1).

Other factors are available from Ken French’s online Data Library.

Definition of Factor

A factor is a time-series of the rate of return on a zero-investment (self-financed) pfio, formed on the basis of an a-priori known signal or characteristic.
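A minimal sketch of turning a characteristic into a factor per this definition. In every period, sort stocks on a signal known at the start of the period, go \$1 long the top third and \$1 short the bottom third (equal-weighted), and record the return difference. Tercile breakpoints, equal weighting, and the column names period, signal and ret_next are assumptions. Published factors such as HML use their own breakpoints and value weighting.

```python
import pandas as pd


def long_short_factor(panel: pd.DataFrame, signal: str = "signal",
                      ret: str = "ret_next", q: int = 3) -> pd.Series:
    """Equal-weighted top-minus-bottom quantile return, one value per period.

    panel: hypothetical long-format data with columns period, <signal>
    (known at the start of the period) and <ret> (realized over the period).
    Each period needs at least q stocks.
    """
    def one_period(g: pd.DataFrame) -> float:
        # rank(method="first") breaks ties so qcut always finds q distinct bins
        buckets = pd.qcut(g[signal].rank(method="first"), q, labels=False)
        top = g.loc[buckets == q - 1, ret].mean()
        bottom = g.loc[buckets == 0, ret].mean()
        return top - bottom  # $1 long minus $1 short: zero net investment

    return panel.groupby("period")[[signal, ret]].apply(one_period)
```

The output is one number per period, i.e., a time series of returns on a zero-investment portfolio. The signal itself is not part of the output, which is the distinction drawn on the next slide.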

What is not a factor?

A “characteristic” is not a factor, but something that attaches to a firm-month. For example, IBM’s marketcap was $109.2B in Oct 2018.
A “factor-loading” is not a factor. Factor-loadings are like market-betas. They are specific to stocks.

Attribution

How could you have a –8% performance but a +6% abnormal performance?
- your portfolio had a market-beta of 2
- the market (net of rf) dropped 7%
- your portfolio (net of rf) dropped 8%
- and your benchmark model was the CAPM.
Your total (net-of-rf) return is the sum of the abnormal return plus the betas times the factor realizations.
- here, abnormal was +6%
- the part due to the market was $2\cdot(-7\%) = -14\%$
- and $6\% + 2\cdot(-7\%)= -8\%$.
Note that you chose to have stocks with a beta of 2, too!
- this did not fall from heaven
- you could have hedged this market risk
- in which case you could have had a market beta of 0, and your net-of-rf return would have been just the abnormal +6%
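The attribution arithmetic as a tiny function. It assumes a one-factor, CAPM-style benchmark in which the normal (net-of-rf) return is beta times the market's net-of-rf return. The name capm_attribution is hypothetical.

```python
def capm_attribution(port_excess: float, mkt_excess: float, beta: float):
    """Split a net-of-rf portfolio return into the market part and the abnormal part.

    Normal return = beta * market excess return; abnormal (alpha) = actual - normal.
    """
    due_to_market = beta * mkt_excess
    abnormal = port_excess - due_to_market
    return abnormal, due_to_market


abnormal, market_part = capm_attribution(port_excess=-0.08, mkt_excess=-0.07, beta=2.0)
print(f"abnormal {abnormal:+.2%}, due to market {market_part:+.2%}")
# prints: abnormal +6.00%, due to market -14.00%
```

With more factors, the same idea holds: subtract each beta times its factor realization, and whatever remains is the abnormal return.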

Signals

The hedge fund jargon is “signal.” A signal means some numeric value that tells you what to invest in.
To test whether a “signal” is useful, we want to ask:
- in a given month, did stocks that have more signal perform better in terms of rate of return later on?
The signal must be “comfortably” known ahead of time.
- The earnings for 2017 are not known on 1/1/2018!
- (They tend to be known about 4-5 months after the fiscal year end.)
Aggregate statistics are typically not signals, because they are the same for all stocks.
- GDP growth is useless as a cross-sectional signal
Signal Examples
- each stock’s market cap,
- its book/market ratio,
- its investment rate,
- its beta,
- its sigma,
- its momentum,
- the age of the parents of the CEO
- etc
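One quick way to ask "did stocks with more signal do better later on?" is the monthly rank correlation between the signal and the next-period return, often called an information coefficient. This is a common practitioner check, not a method from these slides. The column names month, signal and ret_next are assumptions. Because the signal is ranked within each month, an aggregate variable that is identical for all stocks yields an undefined (NaN) correlation.

```python
import pandas as pd


def monthly_ic(panel: pd.DataFrame, signal: str = "signal",
               ret: str = "ret_next") -> pd.Series:
    """Per-month Spearman rank correlation between a signal and next-period return.

    Spearman is computed as the Pearson correlation of within-month ranks.
    A signal that is the same for every stock in a month (e.g., GDP growth)
    has no cross-sectional variation, so its correlation is NaN.
    """
    def one_month(g: pd.DataFrame) -> float:
        return g[signal].rank().corr(g[ret].rank())

    return panel.groupby("month")[[signal, ret]].apply(one_month)
```

Averaging the monthly values over time and dividing by their standard error gives a rough significance check, similar in spirit to the Fama-MacBeth approach that appears later.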

What Kind of Stock Investment Strategies Are There?

Long Only
- ASAM’s domain
- Implicitly “sort-of-short” by not holding some stocks
Long-Short
- (Often relatively) immune to overall market movements
(Market) Timing
- Move in and out of stocks.
Event-Related
- e.g., earnings drift
- requires a lot of attention

Investment Strategies

An investment strategy is a function that maps (known) signal(s) to an investment (pfio weights). Typically but not always, it is a monotonic function.
Here is a silly strategy:
- If \$100 million $<$ marketcap $<$ \$500 million, and the book-to-market ratio (call it BM) is greater than 1.0, buy $\text{BM}^2$ dollars’ worth of shares in this stock (long leg).
- If marketcap $>$ \$500 million and BM is greater than 0.8, short $\sqrt{\text{BM}}$ dollars’ worth of shares in this stock (short leg).
- All others: invest \$0.
It is very common to scale investment strategies:
- Scale the pfio to invest \$1 long and \$1 short, so that it is a zero-investment strategy.
- For example, if your long leg goes from \$1 to \$1.25 and the short leg from \$1 to \$1.10, you will have earned \$0.25 - \$0.10 = \$0.15.
- This is not a rate of return, because the net cost was (academically) zero.
- The strategy’s rate of return will be a lot easier to interpret (see below).
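A plain-Python sketch of the silly strategy and its \$1-long/\$1-short scaling. Market caps are in \$ millions, and the function names silly_position, scale_long_short and dollar_pnl are hypothetical. The boundary cases follow the strict inequalities on the slide, so a firm with BM exactly 1.0 gets no long position.

```python
import math


def silly_position(mktcap_mm: float, bm: float) -> float:
    """Raw dollar position from the slide's rules (positive = long, negative = short)."""
    if 100 < mktcap_mm < 500 and bm > 1.0:
        return bm ** 2              # long leg
    if mktcap_mm > 500 and bm > 0.8:
        return -math.sqrt(bm)       # short leg
    return 0.0                      # everyone else


def scale_long_short(raw: list) -> list:
    """Rescale so the long leg sums to +$1 and the short leg to -$1."""
    long_total = sum(w for w in raw if w > 0)
    short_total = -sum(w for w in raw if w < 0)
    return [w / long_total if w > 0 else (w / short_total if w < 0 else 0.0)
            for w in raw]


def dollar_pnl(weights: list, rets: list) -> float:
    """Dollar payoff of the scaled positions over one period."""
    return sum(w * r for w, r in zip(weights, rets))


raw = [silly_position(200, 2.0), silly_position(150, 1.0),
       silly_position(1000, 4.0), silly_position(800, 1.0)]
weights = scale_long_short(raw)   # [1.0, 0.0, -2/3, -1/3]
```

If the long stock rises 25% and both shorted stocks rise 10%, `dollar_pnl(weights, [0.25, 0.0, 0.10, 0.10])` is the \$0.15 from the example above. It is a dollar payoff on zero net investment, not a rate of return.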
PS: A signal is more generic than investment weights.
- You could just call the weights from a zero-investment strategy a signal, too. The weights are known.
- But a signal is not necessarily a zero-investment strategy.

What To Predict: The Period-Ahead Rate of Return

The dependent variable is always a rate of return, typically over one month or one year, and usually in the future.
For each stock, we have a lagged (known) signal predicting current return
- $\equiv$ current signal predicting future return.
The independent variables in the regression can be “signals” or “signal-based pfio investments.” Again, these signals typically must vary from stock to stock.
We can subtract the risk-free (or any other aggregate) rate of return from the dependent stock return, and it will not make much if any difference. It simply shifts all stocks the same way up or down a little bit, which does not change the slope that we are interested in.
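A quick numerical check of this claim on simulated data. In a single cross-sectional regression with an intercept, subtracting the same rf from every stock's return leaves the slope unchanged and moves the intercept by exactly rf. The signal, returns and rf value are made up. In a pooled regression across months with a time-varying rf, the equivalence is only approximate.

```python
import numpy as np


def cross_sectional_fit(signal: np.ndarray, ret: np.ndarray):
    """OLS of one period's stock returns on a signal; returns (intercept, slope)."""
    slope, intercept = np.polyfit(signal, ret, 1)
    return intercept, slope


rng = np.random.default_rng(0)
signal = rng.normal(size=500)                                   # simulated signal
ret = 0.01 + 0.02 * signal + rng.normal(scale=0.05, size=500)   # simulated returns
rf = 0.003                                                      # hypothetical risk-free rate

a_raw, b_raw = cross_sectional_fit(signal, ret)
a_ex, b_ex = cross_sectional_fit(signal, ret - rf)
# The slopes agree; only the intercept moves, by exactly rf.
```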

Testing Methods

There are two basic methods to test for cross-sectional signals/strategies. This will become clear soon.

[Fama-MacBeth]: These are “pooled averages of coefficients from many (monthly) cross-sectional regressions.” They work directly with signals.
[Fama-French]: This is one time-series regression given a strategy’s returns. The difficulty here is that we must define our strategy first.

I will explain them in the next sessions.
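Ahead of those sessions, here is a bare-bones numpy sketch of the two approaches as they are usually described. The first runs one cross-sectional OLS per month and averages the coefficients, with a t-stat from their time-series standard error. The second runs a single time-series regression of a strategy's excess returns on factor returns, whose intercept is the alpha. Column names and function names are assumptions, and the standard errors have no autocorrelation (e.g., Newey-West) adjustment.

```python
import numpy as np
import pandas as pd


def ols(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """OLS coefficients, with an intercept column prepended to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def fama_macbeth(panel: pd.DataFrame, y: str = "ret_next", xs=("signal",)):
    """One cross-sectional OLS per month, then average the monthly coefficients.

    Returns (mean coefficients, t-stats), intercept first. The t-stat uses the
    time-series standard error of the monthly coefficients.
    """
    coefs = np.array([ols(g[y].to_numpy(), g[list(xs)].to_numpy())
                      for _, g in panel.groupby("month")])
    mean = coefs.mean(axis=0)
    se = coefs.std(axis=0, ddof=1) / np.sqrt(len(coefs))
    return mean, mean / se


def time_series_alpha(strategy_excess, factor_returns):
    """Regress a strategy's excess returns on factor returns.

    Returns (alpha, betas); alpha is the intercept, i.e., abnormal performance.
    """
    coef = ols(np.asarray(strategy_excess, dtype=float),
               np.asarray(factor_returns, dtype=float))
    return coef[0], coef[1:]
```

The design difference shows up in the inputs. `fama_macbeth` takes the raw signal stock by stock, while `time_series_alpha` needs a strategy's return series, so the strategy has to be defined first.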

In addition, there is a wrong method, which can be useful because it can make quick-and-dirty exploration simpler:

[One Giant Pooled Regression]: Regress future return on your independent variables (signals).
- Do not trust the t-stats; if you want, mentally divide them by 10.
- This should never be used for real, only for exploratory work.