Fama-MacBeth X-Sect Method (1973)

Explanation of FM Technique

Fama-MacBeth (FM) is the easiest correct way to check quickly whether a signal is likely to work.
(One giant pooled regression is even easier, but it should be used only in the exploratory phase, because its standard errors are incorrect.)

One Regression in Each Month

Run one cross-sectional regression in one month.
- Ignore all other months for the moment.
- The regression inputs are (constructed from) the 2,000 or so stocks
- Each stock has (lagged) signals and a (subsequent) stock return.
- This one-month x-sect regression gives one coefficient per signal
- x-sect means cross-sectional
Can you believe the OLS T-statistics on this one regression’s coefficients?
- Absolutely not. The residuals are correlated across stocks, e.g., within an industry, but OLS standard errors assume iid residuals.
- ergo, you will ignore the T-stats completely
- keep only the coefficients
PS: It can be shown that each x-sect slope coefficient is also the rate of return on a zero-investment portfolio that invests according to the signal.
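A quick numerical check of this PS, as a sketch. It reuses the four-stock, time-2 cross-section from the stylized example further below (x are the lagged signals, y the subsequent returns). The weights `w` are the standard OLS algebra, (x - mean(x)) / sum((x - mean(x))^2); the variable names are illustrative only.

```r
# Time-2 cross-section from the stylized example: lagged signals and subsequent returns.
x <- c(3, -2, -1, 0)
y <- c(-0.06, -0.02, -0.07, -0.02)

# Slope of the cross-sectional OLS regression (intercept included).
b <- unname(coef(lm(y ~ x))[2])

# OLS-implied portfolio weights: demeaned signal scaled by its sum of squares.
w <- (x - mean(x)) / sum((x - mean(x))^2)

cat("sum of weights (zero investment):", round(sum(w), 12), "\n")
cat("exposure to the signal:          ", round(sum(w * x), 12), "\n")
cat("portfolio return:                ", sum(w * y), "\n")
cat("regression slope:                ", b, "\n")
```

The portfolio is long stocks with above-average signals and short those with below-average signals; its weights sum to zero, its signal exposure is one, and its return equals the slope.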

Reflect on How Many Months Should Be In Line

Even for the best of signals and strategies, how often do you think there will be positive abnormal performance?
- Maybe 51-53 out of 100 months?
- PS: The question is indicative, but not fully insightful:
  - you can build a bad E(r) strategy that wins 95 out of 100 months, too.
  - like doubling up on roulette
How many months of performance data on a strategy do you want to use?
- As many as possible?
- What is still relevant? Should 1933 still count? 1973? 1993?

Combine Many Monthly One-Month Regressions

$\Longrightarrow$ after you run the regressions in each month, you have a time-series of coefficients
- and each of these coefficients is also the monthly rate of return on a portfolio.
Are the portfolio returns (coefficients) independent across months?
- Most likely yes.
Can you rely on normality inference for the mean and T-stat of the time-series of monthly rates of return and coefficients?
- Yes. They are nearly uncorrelated over time.
- If not, you could get rich!
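One way to check the "nearly uncorrelated" claim on your own slope series is sketched below. The slopes here are simulated placeholders (independent by construction), not real data; with real FM output you would replace the `slopes` vector. `Box.test` is the Ljung-Box test from base R's `stats` package.

```r
# Sketch: is a time series of monthly FM slopes serially uncorrelated?
# The slopes are SIMULATED placeholders, not real data.
set.seed(42)
slopes <- rnorm(240, mean = 0.002, sd = 0.02)   # hypothetical: 20 years of monthly slopes

# Lag-1 autocorrelation of the slope series.
rho1 <- cor(head(slopes, -1), tail(slopes, -1))

# Rough 95% band for a sample autocorrelation under independence: +/- 2/sqrt(T).
band <- 2 / sqrt(length(slopes))
cat("lag-1 autocorrelation:", round(rho1, 3), "  rough band: +/-", round(band, 3), "\n")

# Joint Ljung-Box test of the first 12 autocorrelations.
lb <- Box.test(slopes, lag = 12, type = "Ljung-Box")
print(lb)
```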

Why the Name FM?

This method is generally referred to as a Fama-MacBeth (1973) (FM) regression.
- FM did not invent it, but they did it much better.
Nowadays, "FM" simply means averaging, over time, the coefficients from many cross-sectional regressions.
Despite its age, FM is not obsolete. Every quant fund in the world runs these.
Everyone halfway competent knows this technique.

A Stylized Example of FM

Example Data

Four Stocks, A through D
Seven Years, 1-7 (often also months instead of years)
One Signal (i.e., one signal per stock per year).
- The signal could be the market-cap, for example.

# one signal, with realization for each of 4 stocks A-D, and 7 years
sig1A <- c( 3,  4,  1, -3,  0,  2,  NA )
retA <- c( NA, -6, 12, -1, -22, 6,  -6 )/100.0

sig1B <- c(-2, -2,  1,  1,  2,  4,  NA )
retB <- c( NA, -2,-34,  1,  13, 12, 36 )/100.0

sig1C <- c(-1,  3, -4, -1,  4, -2,  NA )
retC <- c( NA, -7, 25,  2, -19, -6, -10)/100.0

sig1D <- c( 0,  4, -2,  2, -1,  1,  NA )
retD <- c( NA, -2, 29, 11, -5, -30, 7  )/100.0

Let’s print this better

d <-  data.frame( time=1:length(sig1A), sig1A, retA, sig1B, retB, sig1C, retC, sig1D, retD )
print(d)

Known Signals

To be investable, the signal must be known before you form the portfolio.
Example:
- Do you know the sig1A[t=2]=4 in time 2? Can you use it to short A to earn the retA[t=2]= -0.06 rate of return?
- In this example, no. The signal is known only at the end of year 2.
You can only use sig[t] to invest at the end of time t to earn the rate of return at time t+1.
PS: (You can ignore the risk-free rate. It is the same for every stock in a given period, so subtracting it shifts only the intercept and leaves the slope unchanged.)

Step 1: Align Signals With Subsequent Returns

Align known signals with (subsequent) rates of returns in each month.
You want to regress the rate of return on the lagged signal
- equivalently, you want to regress the future rate of return on the current signal
Call the lagged signal lsig. This is how R aligns it properly:

d <- within(d, {
    lsig1D <- c(NA, sig1D[1:6])
    lsig1C <- c(NA, sig1C[1:6])
    lsig1B <- c(NA, sig1B[1:6])
    lsig1A <- c(NA, sig1A[1:6])
})
d <- subset(d, TRUE, select=c(lsig1A,retA,lsig1B,retB,lsig1C,retC,lsig1D,retD))
print(d)

Make sure you understand this new data alignment. Look back at the main table.
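With 2,000 stocks you would not create one column per stock. A common alternative, sketched here under the assumption of a long (one row per stock and period) layout with hypothetical column names `stock`, `time`, `sig`, `ret`, lags the signal within each stock. For stocks A and B it reproduces the wide-format alignment above.

```r
# Long (panel) layout: one row per stock and time (column names are hypothetical).
panel <- data.frame(
  stock = rep(c("A", "B"), each = 7),
  time  = rep(1:7, times = 2),
  sig   = c( 3,  4,   1, -3,   0,  2, NA,
            -2, -2,   1,  1,   2,  4, NA),
  ret   = c(NA, -6,  12, -1, -22,  6, -6,
            NA, -2, -34,  1,  13, 12, 36) / 100
)

# Sort by stock, then time, so that "previous row" means "previous period".
panel <- panel[order(panel$stock, panel$time), ]

# Lag the signal within each stock: the first period of every stock gets NA.
panel$lsig <- ave(panel$sig, panel$stock, FUN = function(v) c(NA, head(v, -1)))

# Keep only rows where both the lagged signal and the return are known.
usable <- subset(panel, !is.na(lsig) & !is.na(ret), select = c(stock, time, lsig, ret))
print(usable)
```

This row-based lag assumes no missing periods within a stock; with gaps in the panel, you would have to lag by date rather than by row.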

Step 2: Regress Subsequent Returns on Signals in One Month

Now regress the subsequent rate of return on the known signal in the cross-section.
Here, the first usable observation (row 2) holds the rates of return at time 2, explained by the signals from time 1.
The dependent variable is the rate of return on each stock:

y <- subset( d[2,], TRUE, select=c(retA, retB, retC, retD) )
print( y )

the independent variable is the lagged signal on each stock:

x <- subset(d[2,], TRUE, c(lsig1A,lsig1B,lsig1C,lsig1D))  ## the lagged signal!
print(x)

Run one x-sect regression, explaining y with x. For time 2, this gives:

print( coef( lm( as.numeric(y) ~ as.numeric(x)) ) )

##   (Intercept) as.numeric(x) 
##       -0.0425       -0.0050

Note that lm() includes an intercept by default.
- We will ignore this intercept. Consider it useless.
- The second coefficient, the slope, tells us whether stocks with a higher (lagged) signal earned a higher rate of return in this one month.

Step 3: Run an Equivalent Regression in Every Month

Let’s repeat such x-sect regressions, one in each year:

options(digits=3)
coefs <- c()
for (t in 2:7) {
  y <- with(d[t,], c(retA, retB, retC, retD))          ## returns during t
  x <- with(d[t,], c(lsig1A, lsig1B, lsig1C, lsig1D))  ## signals known at the end of t-1

  coefs <- rbind(coefs, coef( lm( y ~ x ) ))
  cat("\nThe two coefficients explaining returns at time ", t, " are:\n")
  print( tail(coefs, 1) )  ## the row just added
}

coefs <- as.data.frame(coefs)
names(coefs) <- c("a.notused", "b.sigeffect")
print(coefs)

## 
## The two coefficients explaining returns at time  2  are:
##      (Intercept)      x
## [1,]     -0.0425 -0.005
## 
## The two coefficients explaining returns at time  3  are:
##      (Intercept)      x
## [2,]      -0.134 0.0949
## 
## The two coefficients explaining returns at time  4  are:
##      (Intercept)        x
## [3,]      0.0231 -0.00944
## 
## The two coefficients explaining returns at time  5  are:
##      (Intercept)      x
## [4,]      -0.069 0.0541
## 
## The two coefficients explaining returns at time  6  are:
##      (Intercept)      x
## [5,]     -0.0895 0.0356
## 
## The two coefficients explaining returns at time  7  are:
##      (Intercept)      x
## [6,]      -0.016 0.0668

Because each regression includes an intercept, the slope on the signal is the second coefficient (coefs[,2], named b.sigeffect), not the first.

Step 4: Run a T-Test On Coefficients’ Time-Series

The average slope coefficient (b.sigeffect) measures how strongly the signal predicted subsequent rates of return, on average.
You can run a T-test of whether this effect has a mean of zero:

## T = mean / (sd / sqrt(n)), the usual standard error of a sample mean
mt <- function(x) c(mean=mean(x), sd=sd(x), T=mean(x)/sd(x)*sqrt(length(x)))

mt(coefs[,2])  ## Effect

##   mean     sd      T 
## 0.0395 0.0410 2.3570

Conclude: Here, stocks with higher signals had higher subsequent average rates of return, almost statistically significantly so.
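For reference, the whole procedure can be written more compactly in long format. The sketch below rebuilds the stylized example from scratch, runs one regression per period, and compares the FM T statistic with base R's `t.test`. The T statistic here uses the conventional standard error sd/sqrt(number of periods); the layout and names are illustrative, not the notes' own code.

```r
# Sketch: the whole FM procedure on the stylized example, in long format.
sig <- list(A = c( 3,  4,  1, -3,  0,  2, NA),
            B = c(-2, -2,  1,  1,  2,  4, NA),
            C = c(-1,  3, -4, -1,  4, -2, NA),
            D = c( 0,  4, -2,  2, -1,  1, NA))
ret <- list(A = c(NA, -6,  12, -1, -22,   6,  -6),
            B = c(NA, -2, -34,  1,  13,  12,  36),
            C = c(NA, -7,  25,  2, -19,  -6, -10),
            D = c(NA, -2,  29, 11,  -5, -30,   7))

panel <- do.call(rbind, lapply(names(sig), function(s) data.frame(
  stock = s,
  time  = 1:7,
  lsig  = c(NA, head(sig[[s]], -1)),   # signal known at the end of t-1
  ret   = ret[[s]] / 100               # return earned during t
)))
panel <- panel[complete.cases(panel), ]

# One cross-sectional regression per period; keep only the slope.
slopes <- sapply(split(panel, panel$time),
                 function(cs) unname(coef(lm(ret ~ lsig, data = cs))[2]))

# FM estimate: mean of the slopes; standard error: sd / sqrt(number of periods).
fm_mean <- mean(slopes)
fm_se   <- sd(slopes) / sqrt(length(slopes))
print(c(mean = fm_mean, se = fm_se, T = fm_mean / fm_se))

# Base R's one-sample t-test gives the same T statistic.
print(t.test(slopes, mu = 0))
```

The mean slope matches the 0.0395 above; `t.test` also reports a p-value based on only 5 degrees of freedom.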
Q: What does the regression coefficient of 0.0395 suggest?
- for example, compare a stock with a signal of 0 vs one with a signal of 3.
- how would their rates of return be expected to differ?
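You can also include several signals at the same time, i.e., run a multiple regression in each cross-section; each signal then gets its own time series of slopes.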
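A minimal sketch of FM with two signals at once. The data are simulated (signal 1 is built to matter, signal 2 is not), and the names `lsig1`, `lsig2` are hypothetical; each signal's slopes are averaged and tested separately, exactly as with one signal.

```r
# Sketch: FM with two signals at once, on SIMULATED data (not real returns).
set.seed(7)
n_stocks <- 200; n_periods <- 60
panel <- data.frame(
  time  = rep(seq_len(n_periods), each = n_stocks),
  lsig1 = rnorm(n_stocks * n_periods),   # hypothetical signal 1 (lagged)
  lsig2 = rnorm(n_stocks * n_periods)    # hypothetical signal 2 (lagged)
)
# Simulated returns: signal 1 matters, signal 2 does not, plus noise.
panel$ret <- 0.005 * panel$lsig1 + rnorm(nrow(panel), sd = 0.10)

# One multiple regression per period; keep both slopes (drop the intercept).
slopes <- t(sapply(split(panel, panel$time), function(cs)
  coef(lm(ret ~ lsig1 + lsig2, data = cs))[c("lsig1", "lsig2")]))

# FM mean, standard error, and T statistic for each signal separately.
fm <- apply(slopes, 2, function(b)
  c(mean = mean(b), se = sd(b) / sqrt(length(b)), T = mean(b) / (sd(b) / sqrt(length(b)))))
print(round(fm, 4))
```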

FM Implementation Details

Important Advice

Don’t be sloppy explaining what you are doing.
- Tell sample (types and time periods)
- Describe exact alignment
- Tell selection criteria and throw-out criteria
- Tell signals
- Tell returns
Specify exactly what your universe / data selection requirements are.
For example, here is a set of useful criteria. To be included, a stock had to
- be among the 2,000 stocks with the largest lagged market cap (good for ASAM),
- have more than 200 days of valid returns over the calendar year preceding the investment (good for ASAM),
- have share code 10 or 11 (always a good idea),
- have a market cap of at least $100 million (good for ASAM),
- and have a price above $1 at the end of the preceding period (good idea).
Be very careful with ex-post data requirements. Ideally, use none.
- Super-bad: requiring a stock price above $1 at the end of the holding period, which you cannot know when you invest.
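A sketch of applying the inclusion criteria above using only information known before investing. The data frame, its column names (`shrcd`, `lag_mcap`, `lag_prc`, `lag_ndays`) and its values are hypothetical placeholders, not a real CRSP extract; applying the top-2,000 cut last is an assumption about ordering.

```r
# Sketch: inclusion filters built only from lagged (pre-investment) information.
# Column names and values are hypothetical placeholders.
universe <- data.frame(
  permno    = 1:6,
  shrcd     = c(10, 11, 12, 10, 11, 10),              # share code
  lag_mcap  = c(250, 90, 500, 1200, 150, 300),        # market cap at end of prior year, $ million
  lag_prc   = c(12, 3, 40, 0.8, 25, 7),               # price at end of prior year, $
  lag_ndays = c(250, 240, 252, 251, 180, 230),        # valid daily returns in the prior calendar year
  ret       = c(0.10, -0.20, 0.05, 0.30, -0.05, 0.02) # subsequent return: NOT a filter input
)

keep <- with(universe,
  shrcd %in% c(10, 11) &   # common shares
  lag_ndays > 200 &        # enough trading history before investing
  lag_mcap >= 100 &        # at least $100 million
  lag_prc > 1)             # price above $1 at the end of the preceding period

# Largest 2,000 by lagged market cap (trivially satisfied in this tiny example).
sel <- universe[keep, ]
sel <- head(sel[order(-sel$lag_mcap), ], 2000)
print(sel)
```

Note that `ret` never appears inside `keep`; any filter that touches it would be an ex-post requirement.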
Specify exactly what the signal (the independent variable in the regressions) is.
For example,
- An indicator that equals 1 if
  - the lagged market cap is at least $500 million,
  - the firm is in the S&P 100 index,
  - and the firm's lagged book-to-market ratio is among the top 50 stocks.
  - Otherwise, 0.
- Note that even simple signals can have many variations. For example,
  - Not an indicator, but the lagged book-to-market ratio itself,
  - set to 0 if the firm is not in the S&P 100.
- Note that each signal-function variation is a different strategy!
- You can try many variants, but don’t go crazy or you will just overfit (=torture) the data.
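Two of the signal variants above, sketched in R. All column names and values are hypothetical; the "top 50" cut is shrunk to a hypothetical `top_n = 2` so the five-firm example is meaningful, and ranking among all firms in the cross-section is an assumption.

```r
# Sketch: two variants of the same idea, built only from lagged inputs.
# All column names and values are hypothetical.
firms <- data.frame(
  lag_mcap = c(800, 300, 2500, 600, 1500),        # $ million
  in_sp100 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  lag_bm   = c(0.9, 1.4, 0.3, 1.1, 0.7)           # lagged book-to-market ratio
)

# Rank of lagged book-to-market within the cross-section (1 = highest).
bm_rank <- rank(-firms$lag_bm, ties.method = "first")
top_n <- 2   # hypothetical stand-in for "top 50" in this 5-firm example

# Variant 1: indicator signal.
firms$sig_indicator <- as.integer(firms$lag_mcap >= 500 & firms$in_sp100 & bm_rank <= top_n)

# Variant 2: the lagged book-to-market ratio itself, set to 0 outside the S&P 100.
firms$sig_bm <- ifelse(firms$in_sp100, firms$lag_bm, 0)

print(firms)
```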
Again, be precise. Do not forget that the inclusion criteria and signal must be known by the time you plan to invest. It is easy to get miraculous results when you make a mistake here.
In your description, show the signal inputs for a few sample stocks, together with the subsequent stock returns that these signal inputs are actually predicting in your FM regression.

Are Your FM Inferences Robust? Still Relevant?

Print/plot the two (or more) FM coefficients for each year, over about 40 years' worth of coefficients. Do not forget to tell us the N and the $R^2$ of each regression.
- You can usually ignore the intercept. (It has nothing to do with the finance "alpha.")
- Your prime interest is the (average) coefficient on your signal across all 40 annual coefficients.
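A sketch of the per-year report: one cross-sectional regression per year, recording intercept, slope, N, and R^2, then plotting the slopes with their FM average. The 40 years of data are simulated placeholders; the years and stock counts are hypothetical.

```r
# Sketch: per-year cross-sectional regressions with slope, intercept, N, and R^2.
# Data are SIMULATED placeholders (40 years, varying number of stocks).
set.seed(123)
years <- 1980:2019
panel <- do.call(rbind, lapply(years, function(yr) {
  n <- sample(1500:2000, 1)                       # hypothetical number of stocks that year
  lsig <- rnorm(n)
  data.frame(year = yr, lsig = lsig, ret = 0.01 * lsig + rnorm(n, sd = 0.30))
}))

per_year <- do.call(rbind, lapply(split(panel, panel$year), function(cs) {
  fit <- lm(ret ~ lsig, data = cs)
  data.frame(year      = cs$year[1],
             intercept = coef(fit)[[1]],
             slope     = coef(fit)[["lsig"]],
             N         = nrow(cs),
             R2        = summary(fit)$r.squared)
}))
print(head(per_year))

# Plot the slope by year, with the FM average as a dashed horizontal line.
plot(per_year$year, per_year$slope, type = "h", xlab = "year", ylab = "slope on signal")
abline(h = mean(per_year$slope), lty = 2)
```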
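Do not run the regressions only among stocks that have the signal; run them among all stocks.
For example, with a dividend-yield (or earnings-yield) signal, do not drop the firms that pay no dividends (or have no earnings); keep them in the cross-section.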
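A small dividend-yield illustration, with made-up numbers. It assumes the raw data code non-payers as `NA` (a hypothetical convention); `lm()` silently drops `NA` rows, so the naive regression runs only among payers, while recoding non-payers to a yield of 0 keeps every stock in the cross-section.

```r
# Sketch: keep non-payers in the cross-section instead of silently dropping them.
# Values and column names are hypothetical.
cs <- data.frame(
  lag_dy = c(0.03, NA, 0.05, NA, 0.01, 0.04),   # NA = paid no dividend last year (assumed coding)
  ret    = c(0.08, 0.15, 0.06, -0.10, 0.12, 0.02)
)

# Not what we want: lm() drops NA rows, so only dividend payers are used.
b_payers_only <- coef(lm(ret ~ lag_dy, data = cs))[["lag_dy"]]

# Intended: a non-payer has a dividend yield of 0, so it stays in the regression.
cs$lag_dy0 <- ifelse(is.na(cs$lag_dy), 0, cs$lag_dy)
b_all <- coef(lm(ret ~ lag_dy0, data = cs))[["lag_dy0"]]

cat("N used, payers only:", sum(!is.na(cs$lag_dy)), " slope:", round(b_payers_only, 3), "\n")
cat("N used, all stocks: ", nrow(cs), " slope:", round(b_all, 3), "\n")
```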
Fama-MacBeth regressions are sometimes run with log returns. There are good reasons to do so, and good reasons not to. Log returns are not actual rates of return: a slope estimated on log returns is no longer the return of a zero-investment portfolio.
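A two-stock illustration, with made-up returns, of why log returns are not actual rates of return: the weighted average of the stocks' log returns is not the log return of the portfolio, so a cross-sectional slope on log returns does not correspond to any portfolio's return.

```r
# Sketch: averaging log returns is not the same as a portfolio's return.
r <- c(0.50, -0.40)                 # hypothetical simple returns of two stocks
w <- c(0.5, 0.5)                    # equal-weighted portfolio

port_simple <- sum(w * r)           # the portfolio's actual rate of return (5%)
port_log    <- log(1 + port_simple) # its log return
avg_log     <- sum(w * log(1 + r))  # weighted average of the stocks' log returns

cat("portfolio simple return:", port_simple, "\n")
cat("log of portfolio return:", round(port_log, 4), "\n")
cat("average of log returns: ", round(avg_log, 4), "\n")
```

Here the portfolio gains 5%, yet the average of the log returns is negative.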

ASAM Considerations

Analysis Time Units

For ASAM, we can work with yearly rates of return, which is much more convenient.
This is because ASAM mostly wants to avoid active trading.
Recall the first assignments:
- Create an annual stock return data set first.
- Create a second data set, where you exclude the first week in January (because the first Jan week is often strange!)
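A sketch of both data sets for a single stock: annual returns compounded from daily returns, with and without the first week of January. The daily returns are simulated; treating "first week" as January 1-7 and using a weekdays-only calendar (no holidays) are assumptions.

```r
# Sketch: compound daily returns into annual returns, with and without the first
# week of January. SIMULATED daily returns for one hypothetical stock.
set.seed(2019)
dates <- seq(as.Date("2015-01-01"), as.Date("2018-12-31"), by = "day")
wday  <- as.POSIXlt(dates)$wday                 # 0 = Sunday, 6 = Saturday
dates <- dates[!wday %in% c(0, 6)]              # crude calendar: weekdays only, no holidays
daily <- data.frame(date = dates, ret = rnorm(length(dates), mean = 0.0003, sd = 0.015))

daily$year <- as.integer(format(daily$date, "%Y"))
# "First week of January" taken here as January 1-7 (an assumption).
first_jan_week <- format(daily$date, "%m") == "01" & as.integer(format(daily$date, "%d")) <= 7

compound <- function(r) prod(1 + r) - 1         # annual return from daily returns

annual <- data.frame(
  year        = sort(unique(daily$year)),
  ret_all     = as.vector(tapply(daily$ret, daily$year, compound)),
  ret_ex_jan1 = as.vector(tapply(daily$ret[!first_jan_week],
                                 daily$year[!first_jan_week], compound))
)
print(annual)
```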

ASAM Summer I

Learn FM techniques
Search for good signals with FM.

ASAM Summer II

Confirm strategy with FF techniques.
Get Ready to Implement!

Fama-MacBeth X-Sect Method (1973)

Ivo Welch

9/30/2019
