1. Introduction
  2. Fama-MacBeth X-Sect Method (1973)
  3. Time-Series Method
  4. Translating Signals Into Investment Strategies
  5. Simulating World
  6. Further Important Aspects (Meta/Taxes/Factors)

Explanation of FM Technique

One Regression in Each Month

  • Run one cross-sectional regression in one month.
    • Ignore all other months for the moment.
    • The regression inputs are (constructed from) the 2,000 or so stocks
    • Each stock has (lagged) signals and a (subsequent) stock return.
    • This one-month x-sect regression gives one coefficient per signal
    • x-sect means cross-sectional
  • Can you believe the OLS T-statistics on this one regression’s coefficients?
    • Absolutely not. The residuals are correlated across stocks, e.g., within industries, while OLS standard errors assume iid residuals.
    • ergo, you will ignore the T-stats completely
    • keep only the coefficients
  • PS: It can be shown that the x-sect coefficients are also the rates of return on a zero-investment portfolio that invests according to the signal.
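The portfolio interpretation can be checked numerically. The following is a minimal sketch, assuming one signal and four stocks; the numbers are the time-2 cross-section of the stylized example in these notes, and the names `x`, `y`, and `w` are illustrative. The OLS slope equals the return on a portfolio whose weights are the demeaned signal scaled by its sum of squares. These weights sum to zero, so the portfolio needs no net investment.

```r
# Lagged signal and subsequent return for four stocks in one period
x <- c(3, -2, -1, 0)
y <- c(-6, -2, -7, -2) / 100

# Portfolio weights: demeaned signal, scaled by its sum of squares
w <- (x - mean(x)) / sum((x - mean(x))^2)

slope <- unname(coef(lm(y ~ x))[2])  # cross-sectional OLS slope
pfio  <- sum(w * y)                  # return on the signal-weighted portfolio

print(c(sum.of.weights = sum(w), ols.slope = slope, pfio.return = pfio))
```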

Reflect on How Many Months Should Be In Line

  • Even for the best of signals and strategies, how often do you think there will be positive abnormal performance?
    • Maybe 51-53 out of 100 months?
    • PS: The question is indicative, but not fully insightful
      • you can build a bad E(r) strategy that wins 95 out of 100 months, too.
      • like doubling up on roulette
  • How many months of performance data on a strategy do you want to use?
    • As many as possible?
    • What is still relevant? Should 1933 still count? 1973? 1993?
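The roulette remark can be made concrete with a little arithmetic. This is a sketch under assumptions that are not in the notes: an American wheel (18 winning pockets out of 38 on an even-money bet), a starting bet of 1 unit, and a cap of `k = 5` doublings per "month". The win rate is very high, yet the expected profit is negative.

```r
# Assumptions (not from the notes): American wheel, even-money bet,
# start with a 1-unit bet, double after each loss, stop after k losses.
p.loss <- 20 / 38   # probability of losing one even-money spin
k      <- 5         # maximum number of consecutive bets per "month"

p.bust   <- p.loss^k     # probability of losing all k bets in a row
p.win    <- 1 - p.bust   # otherwise the month ends 1 unit ahead
loss     <- 2^k - 1      # total stake lost in a bust: 1 + 2 + ... + 2^(k-1)
e.profit <- p.win * 1 - p.bust * loss

print(c(win.rate = p.win, expected.profit = e.profit))
```

The expected profit simplifies to 1 - (2 * p.loss)^k, which is negative whenever the chance of losing a spin exceeds one half, no matter how often the strategy "wins".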

Combine Many Monthly One-Month Regressions

  • \(\Longrightarrow\) after you run the regressions in each month, you have a time-series of coefficients
    • each of which is also the monthly rate of return on a portfolio.
  • Are the portfolio returns (coefficients) independent across months?
    • Most likely yes.
  • Can you rely on normality inference for the mean and T-stat of the time-series of monthly rates of return and coefficients?
    • Yes. They are nearly uncorrelated across months, so the usual T-statistic on the time-series mean is reasonable.
    • If not, you could get rich!
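Whether the coefficients are uncorrelated over time can be eyeballed with a lag-1 autocorrelation. The sketch below uses the six (rounded) slope coefficients from the stylized example later in these notes. Six observations are far too few for a real test, so treat it only as an illustration of the mechanics; the 2/sqrt(n) yardstick is a rough rule of thumb.

```r
# Slope coefficients from the six periods of the stylized example (rounded)
b <- c(-0.0050, 0.0949, -0.00944, 0.0541, 0.0356, 0.0668)

n    <- length(b)
rho1 <- cor(b[-1], b[-n])   # correlation of b[t] with b[t-1]

# Rough yardstick: without autocorrelation, rho1 has a standard error of
# about 1/sqrt(n), so |rho1| beyond 2/sqrt(n) would be a warning sign.
print(c(rho1 = rho1, yardstick = 2 / sqrt(n)))
```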

Why the Name FM?

  • This method is generally referred to as a Fama-MacBeth (1973) (FM) regression.

    • FM did not invent it, but they did it much better.
  • Nowadays, FM simply means “pooled time-series coefficient averages from many cross-sections.”

  • Despite its age, FM is not obsolete. Every quant fund in the world runs these.

  • Everyone halfway competent knows this technique.

A Stylized Example of FM

Example Data

  • Four Stocks, A through D
  • Seven Years, 1-7 (often also months instead of years)
  • One Signal (i.e., one signal per stock per year).
    • The signal could be the market-cap, for example.
# one signal, with realization for each of 4 stocks A-D, and 7 years
sig1A <- c( 3,  4,  1, -3,  0,  2,  NA )
retA <- c( NA, -6, 12, -1, -22, 6,  -6 )/100.0

sig1B <- c(-2, -2,  1,  1,  2,  4,  NA )
retB <- c( NA, -2,-34,  1,  13, 12, 36 )/100.0

sig1C <- c(-1,  3, -4, -1,  4, -2,  NA )
retC <- c( NA, -7, 25,  2, -19, -6, -10)/100.0

sig1D <- c( 0,  4, -2,  2, -1,  1,  NA )
retD <- c( NA, -2, 29, 11, -5, -30, 7  )/100.0

Let’s print this better

d <-  data.frame( time=1:length(sig1A), sig1A, retA, sig1B, retB, sig1C, retC, sig1D, retD )
p(d)

Known Signals

  • To be investable, the signal must be known before you form the portfolio.

  • Example:

    • Do you know sig1A[t=2]=4 at the start of time 2? Can you use it to short A and earn the 6% that A lost (retA[t=2] = -0.06)?
    • In this example, no. The signal is known only at the end of year 2.
  • You can only use sig[t] to invest at the end of time t to earn the rate of return at time t+1.

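  • PS: (You can ignore the risk-free rate. Subtracting the same risk-free rate from every stock's return in a period changes only the intercept of that period's x-sect regression, not the slopes.)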
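A quick check of the risk-free-rate point, as a sketch: the signals and returns are the time-2 cross-section of the example data, and `rf = 0.03` is a hypothetical risk-free rate chosen only for illustration. Regressing raw returns and excess returns on the same signal gives the same slope; only the intercept moves, by exactly `rf`.

```r
x  <- c(3, -2, -1, 0)            # lagged signals for four stocks
y  <- c(-6, -2, -7, -2) / 100    # their subsequent returns
rf <- 0.03                       # hypothetical risk-free rate for the period

raw    <- coef(lm(y ~ x))
excess <- coef(lm(I(y - rf) ~ x))

print(rbind(raw, excess))  # slopes agree; intercepts differ by rf
```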

Step 1: Align Signals With Subsequent Returns

  • Align known signals with (subsequent) rates of returns in each month.

  • You want to regress the rate of return on the lagged signal

    • you could also say you want to regress the future rate of return on the current signal
  • Call the lagged signal lsig. This is how R aligns it properly:

d <- within(d, {
    lsig1D <- c(NA, sig1D[1:6])
    lsig1C <- c(NA, sig1C[1:6])
    lsig1B <- c(NA, sig1B[1:6])
    lsig1A <- c(NA, sig1A[1:6])
})
d <- subset(d, TRUE, select=c(lsig1A,retA,lsig1B,retB,lsig1C,retC,lsig1D,retD))
p(d)
  • Make sure you understand this new data alignment. Look back at the main table.
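The alignment hard-codes the index range `1:6`. A small helper avoids that when the number of periods changes. This is a sketch; `lag1` is a hypothetical name, not a function from the notes.

```r
# Hypothetical helper (not in the notes): shift a series back by one period,
# so that position t holds the value that was known at t-1
lag1 <- function(x) c(NA, head(x, -1))

sig1A  <- c(3, 4, 1, -3, 0, 2, NA)
lsig1A <- lag1(sig1A)

print(rbind(sig1A, lsig1A))
```

The lagged series has the same length as the original, with `NA` in the first period, where no earlier signal exists.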

Step 2: Regress Subsequent Returns on Signals in One Month

  • Now regress the subsequent rate of return on the known signal in x-section.

  • Here, the first observations are the rates of return at time 2, explained by signals from time 1.

  • the dependent variable is the rate of return on each stock:

y <- subset( d[2,], TRUE, select=c(retA, retB, retC, retD) )
p( y )
  • the independent variable is the lagged signal on each stock:
x <- subset(d[2,], TRUE, c(lsig1A,lsig1B,lsig1C,lsig1D))  ## the lagged signal!
p(x)
  • Run one x-sect regression, explaining y with x. For time 2, this gives:
print( coef( lm( as.numeric(y) ~ as.numeric(x)) ) )
##   (Intercept) as.numeric(x) 
##       -0.0425       -0.0050
  • Note that the regression automatically included an intercept

    • We will ignore this intercept. Consider it useless.

    • The second coefficient, the slope, tells us whether a higher (lagged) signal meant, on average, a higher rate of return in this one month.

Step 3: Run an Equivalent Regression in Every Month

  • Let’s repeat such x-sect regressions, one in each year:
coefs <- c()
options(digits=3)
for (t in 2:7) {
  y <- with(d[t,], c(retA,retB,retC,retD ))          # returns at time t
  x <- with(d[t,], c(lsig1A,lsig1B,lsig1C,lsig1D))   # signals known at time t-1

  coefs <- rbind(coefs, coef( lm( y ~ x ) ))
  cat("\nThe two coefficients explaining returns at time ", t, " are:\n")
  print( tail(coefs, 1) )   # last row = this period's coefficients (base R)
}

coefs <- as.data.frame(coefs)
names(coefs) <- c("a.notused", "b.sigeffect")
p(coefs)
## 
## The two coefficients explaining returns at time  2  are:
##      (Intercept)      x
## [1,]     -0.0425 -0.005
## 
## The two coefficients explaining returns at time  3  are:
##      (Intercept)      x
## [2,]      -0.134 0.0949
## 
## The two coefficients explaining returns at time  4  are:
##      (Intercept)        x
## [3,]      0.0231 -0.00944
## 
## The two coefficients explaining returns at time  5  are:
##      (Intercept)      x
## [4,]      -0.069 0.0541
## 
## The two coefficients explaining returns at time  6  are:
##      (Intercept)      x
## [5,]     -0.0895 0.0356
## 
## The two coefficients explaining returns at time  7  are:
##      (Intercept)      x
## [6,]      -0.016 0.0668
  • Because the regressions include an intercept, the coefficient on the signal is the second column, coefs[,2], and not the first, coefs[,1].
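The loop can also be wrapped in a function that works on period-by-stock matrices. This is a sketch of one possible layout; `fm.slopes`, `sig`, and `ret` are hypothetical names, and the data are the four stocks from the example. Row t of `ret` is explained by row t-1 of `sig`, so the lag is built into the indexing.

```r
# Signals and returns as period-by-stock matrices (rows = periods 1-7)
sig <- cbind(A = c( 3,  4,  1, -3,  0,  2, NA),
             B = c(-2, -2,  1,  1,  2,  4, NA),
             C = c(-1,  3, -4, -1,  4, -2, NA),
             D = c( 0,  4, -2,  2, -1,  1, NA))
ret <- cbind(A = c(NA, -6,  12, -1, -22,   6,  -6),
             B = c(NA, -2, -34,  1,  13,  12,  36),
             C = c(NA, -7,  25,  2, -19,  -6, -10),
             D = c(NA, -2,  29, 11,  -5, -30,   7)) / 100

# Hypothetical helper: one cross-sectional regression per period,
# explaining returns at t with signals from t-1
fm.slopes <- function(sig, ret) {
  periods <- 2:nrow(ret)
  out <- t(sapply(periods, function(t) coef(lm(ret[t, ] ~ sig[t - 1, ]))))
  dimnames(out) <- list(periods, c("a.notused", "b.sigeffect"))
  out
}

coefs <- fm.slopes(sig, ret)
print(round(coefs, 4))
```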

Step 4: Run a T-Test On Coefficients’ Time-Series

  • The average slope coefficient (b.sigeffect) measures how signals influence subsequent rates of return, on average.

  • You can run a T-test to see whether this effect has a mean of zero:

mt <- function(x) c(mean=mean(x), sd=sd(x), T=mean(x)/(sd(x)/sqrt(length(x))))

mt(coefs[,2])  ## Effect
##   mean     sd      T 
## 0.0395 0.0410 2.3570
  • Conclude: Here, stocks with higher signals had higher average rates of return, almost statistically significantly so.

  • Q: What does the regression coefficient of 0.0395 suggest?

    • for example, compare a stock with a signal of 0 vs one with a signal of 3.
    • how would their rates of return be expected to differ?
  • You could include several signals at the same time
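With several signals, each cross-sectional regression simply has several slopes, and each slope gets its own time series. The sketch below uses simulated data only (50 stocks, 24 months, two hypothetical signals `sig1` and `sig2`, with returns constructed so that only `sig1` matters). It illustrates the mechanics and says nothing about real signals. The T-statistic comes from R's standard one-sample `t.test`.

```r
set.seed(1)                      # simulated data, for illustration only
n.stocks <- 50
n.months <- 24

# One cross-sectional regression per month; keep only the two slopes
slopes <- t(sapply(1:n.months, function(m) {
  sig1 <- rnorm(n.stocks)        # hypothetical lagged signal 1
  sig2 <- rnorm(n.stocks)        # hypothetical lagged signal 2
  # Simulated returns: sig1 matters (true slope 0.01), sig2 does not
  ret  <- 0.01 * sig1 + rnorm(n.stocks, sd = 0.10)
  coef(lm(ret ~ sig1 + sig2))[-1]   # drop the intercept
}))

# FM inference: mean and T-stat of each slope's time series
fm <- apply(slopes, 2, function(b)
  c(mean = mean(b), T = unname(t.test(b)$statistic)))
print(fm)
```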

FM Implementation Details

Important Advice

  • Don’t be sloppy explaining what you are doing.

    • Tell sample (types and time periods)
    • Describe exact alignment
    • Tell selection criteria and throw-out criteria
    • Tell signals
    • Tell returns
  • Specify exactly what your universe / data selection requirements are.

  • For example, here is a set of useful criteria. To be included, a stock had to

    • be among the 2,000 largest stocks by lagged marketcap (good for ASAM),
    • have \(>200\) days of valid returns over the calendar year preceding the investment (good for ASAM),
    • have share code 10 or 11 (always a good idea),
    • have a marketcap of at least $100 million (good for ASAM),
    • and have a price above $1 at the end of the preceding period (good idea).
  • Be very careful with ex-post data requirements. Ideally none.

    • Super-Bad: requiring a stock price above $1 at the end of the investment period. You could not have known this when you invested.
  • Specify exactly what the signal (the independent variable in the regs) is.

  • For example,

    • An indicator that takes 1 if
      • the lag marketcap is at least $500 million,
      • the firm is in the S&P100 index,
      • and the lag-book-to-market ratio of the firm is among the top 50 stocks.
      • Otherwise, 0.
    • Note that even simple signals can have many variations. For example,
      • Not an indicator, but the lag-book-to-market-ratio itself
      • 0 if the firm is not in the S&P100.
    • Note that each signal-function variation is a different strategy!
    • You can try many variants, but don’t go crazy or you will just overfit (=torture) the data.
  • Again, be precise. Do not forget that the inclusion criteria and signal must be known by the time you plan to invest. It is easy to get miraculous results when you make a mistake here.

  • In your description, show the signal inputs for a few sample stocks, together with what subsequent stock returns these signal inputs are actually predicting in your FM reg.
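The inclusion criteria and the indicator signal above can be written down in a few lines. This is a sketch on a made-up six-stock data frame: all tickers, column names, and values are hypothetical, and `top.n = 2` stands in for the "top 50" of the example so that the toy data stay non-trivial. Ranking book-to-market within the filtered universe, rather than among all stocks, is also an assumption. Every input is lagged, i.e., known before the investment period starts.

```r
# Hypothetical stock-level data for one formation date (all values lagged)
stocks <- data.frame(
  ticker    = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  lag.mcap  = c(900, 50, 2500, 600, 150, 4000),   # $ millions
  lag.price = c(25, 0.8, 60, 12, 3, 110),
  lag.ndays = c(250, 240, 252, 180, 251, 252),    # valid return days, prior year
  shrcd     = c(11, 11, 10, 11, 12, 10),          # share code
  in.sp100  = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  lag.bm    = c(0.9, 1.5, 0.4, 1.1, 0.7, 0.6),    # book-to-market
  stringsAsFactors = FALSE
)

# Universe: apply the inclusion criteria, all based on lagged information
universe <- subset(stocks,
                   rank(-lag.mcap) <= 2000 & lag.ndays > 200 &
                   shrcd %in% c(10, 11) & lag.mcap >= 100 & lag.price > 1)

# Indicator signal: 1 if all three conditions hold, 0 otherwise
top.n <- 2   # the example in the notes uses 50
universe$signal <- with(universe,
  as.numeric(lag.mcap >= 500 & in.sp100 & rank(-lag.bm) <= top.n))

print(universe[, c("ticker", "lag.mcap", "in.sp100", "lag.bm", "signal")])
```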

Are Your FM Inferences Robust? Still Relevant?

  • Print/plot the two (or more) FM coefficients for each year, for about 40 years' worth of coefficients. Do not forget to tell us the N and the \(R^2\) of each regression.

    • You can usually ignore the intercept. (It has nothing to do with the finance “alpha.”)
    • Your prime interest is the (average) coefficient on your signal across all 40 annual coefficients.
  • Do not run the regression only among stocks that have a signal; run it among all stocks.

  • Perhaps give an earnings or dividend-yield example

  • Fama-MacBeth regressions are sometimes run with log returns. There are good reasons why one should do so, and good reasons why one should not do so. Log returns are not the simple returns that an investor actually earns.
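One reason for caution with log returns is that they do not aggregate across stocks the way simple returns do, which matters if you want to read an FM coefficient as a portfolio return. Here is a small sketch with two hypothetical stocks, held in equal weights: one gains 50% and the other loses 50%.

```r
# Two hypothetical stocks, equal-weighted: one gains 50%, one loses 50%
r <- c(0.50, -0.50)
w <- c(0.5, 0.5)

pfio.simple <- sum(w * r)            # actual portfolio return
avg.log     <- sum(w * log(1 + r))   # weighted average of the log returns
pfio.log    <- log(1 + pfio.simple)  # log of the actual portfolio return

print(c(pfio.simple = pfio.simple, avg.log = avg.log, pfio.log = pfio.log))
```

The portfolio breaks even, yet the average log return is clearly negative: a weighted average of log returns is not the log return of the portfolio.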

ASAM Considerations

Analysis Time Units

  • For ASAM, we can work with yearly rates of return, which is much more convenient.

  • This is because ASAM mostly wants to avoid active trading.

  • Recall first assignments:

    • Create an annual stock return data set first.

    • Create a second data set, where you exclude the first week in January (because the first Jan week is often strange!)
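The two annual data sets can be built by compounding daily returns within each year. The sketch below rests on simplifying assumptions: one hypothetical stock, a constant return of 0.1% on every calendar day (real data would have trading days only), and "the first week in January" read as January 1 through 7. The names `daily`, `compound`, `annual.all`, and `annual.exjan` are illustrative.

```r
# Hypothetical daily data for one stock: a constant 0.1% return per day
dates <- seq(as.Date("2001-01-01"), as.Date("2002-12-31"), by = "day")
daily <- data.frame(date = dates, ret = 0.001)
daily$year <- as.integer(format(daily$date, "%Y"))

# Compound daily returns into one return per year
compound <- function(r) prod(1 + r) - 1
annual.all <- aggregate(ret ~ year, data = daily, FUN = compound)

# Second data set: drop January 1-7 before compounding
first.week <- format(daily$date, "%m") == "01" &
              as.integer(format(daily$date, "%d")) <= 7
annual.exjan <- aggregate(ret ~ year, data = daily[!first.week, ], FUN = compound)

print(merge(annual.all, annual.exjan, by = "year",
            suffixes = c(".all", ".exjan")))
```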

ASAM Summer I

  • Learn FM techniques

  • Search for good signals with FM.

ASAM Summer II

  • Confirm strategy with FF techniques.

  • Get Ready to Implement!