Fama-MacBeth (FM) regression is the easiest correct way to check quickly whether a signal is likely to work.
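(One giant pooled regression is even easier, but it should be used only in the exploratory phase, because its standard errors are incorrect: they treat every stock-period as an independent observation, although stock returns in the same period are correlated.)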
The method is named after Fama and MacBeth (1973).
Nowadays, FM simply means "time-series averages of the coefficients from many cross-sectional regressions."
Despite its age, FM is not obsolete. Virtually every quant fund runs these regressions, and anyone halfway competent knows the technique.
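The contrast between one pooled regression and FM can be sketched in a few lines of R. This is a minimal sketch on simulated data (the panel size, the 0.01 true slope, and the period-wide components `sig_common` and `ret_common` are all made-up assumptions, not real data). Both approaches estimate the same slope here, but the pooled OLS standard error assumes 3,000 independent observations. When signals and returns both share a period-wide component, as assumed below, that pooled standard error will typically come out too small, while the FM standard error comes from the time-series variation of the 30 period slopes.

```r
# Simulated panel (hypothetical data, for illustration only): 100 stocks x 30 periods.
# Signals and returns each share a period-wide component, so observations
# within the same period are correlated.
set.seed(123)
n_stocks  <- 100
n_periods <- 30
panel <- expand.grid(stock = 1:n_stocks, time = 1:n_periods)
sig_common <- rnorm(n_periods)[panel$time]             # period-wide signal level
ret_common <- rnorm(n_periods, sd = 0.05)[panel$time]  # period-wide market shock
panel$lsig <- sig_common + rnorm(nrow(panel))
panel$ret  <- 0.01 * panel$lsig + ret_common + rnorm(nrow(panel), sd = 0.10)

# (1) One pooled regression over all stock-periods. Its OLS standard error
#     treats every stock-period as an independent observation.
pooled <- summary(lm(ret ~ lsig, data = panel))$coefficients["lsig", c("Estimate", "Std. Error")]

# (2) Fama-MacBeth: one cross-sectional regression per period, then the
#     time-series mean of the slopes and the standard error of that mean.
slopes <- sapply(split(panel, panel$time),
                 function(cs) coef(lm(ret ~ lsig, data = cs))[["lsig"]])
fm <- c(Estimate = mean(slopes), "Std. Error" = sd(slopes) / sqrt(length(slopes)))

print(rbind(pooled = pooled, fama_macbeth = fm))
```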
# one signal, with a realization for each of 4 stocks (A-D) in each of 7 years
sig1A <- c( 3, 4, 1, -3, 0, 2, NA )
retA <- c( NA, -6, 12, -1, -22, 6, -6 )/100.0
sig1B <- c(-2, -2, 1, 1, 2, 4, NA )
retB <- c( NA, -2,-34, 1, 13, 12, 36 )/100.0
sig1C <- c(-1, 3, -4, -1, 4, -2, NA )
retC <- c( NA, -7, 25, 2, -19, -6, -10)/100.0
sig1D <- c( 0, 4, -2, 2, -1, 1, NA )
retD <- c( NA, -2, 29, 11, -5, -30, 7 )/100.0
Let’s print this more readably:
d <- data.frame( time=1:length(sig1A), sig1A, retA, sig1B, retB, sig1C, retC, sig1D, retD )
p(d)   ## p() is assumed to be a pretty-print helper defined elsewhere in these notes; print(d) works too
To be investable, the signal must be known before you form the portfolio.
Example: suppose you learn sig1A[t=2] = 4 in time 2. Can you use it to short A and earn the retA[t=2] = -0.06 rate of return? No. You can only use sig[t] to invest at the end of time t, to earn the rate of return in time t+1.
PS: You can ignore the risk-free rate. Within one cross-section it is the same for every stock, so subtracting it shifts only the intercept and leaves the slope unchanged.
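A quick numerical check of the PS on the risk-free rate: subtracting a constant from every stock's return in one cross-section moves the OLS intercept by exactly that constant and leaves the slope untouched. The signals and returns below are the time-2 cross-section from the example; the value `rf = 0.03` is a hypothetical risk-free rate chosen only for illustration.

```r
# One cross-section from the example: signals known at time 1, returns at time 2
x  <- c(3, -2, -1, 0)
y  <- c(-0.06, -0.02, -0.07, -0.02)
rf <- 0.03   # hypothetical risk-free rate, the same for every stock this period

b_raw    <- coef(lm(y ~ x))
b_excess <- coef(lm(I(y - rf) ~ x))   # regress excess returns instead

print(rbind(raw = b_raw, excess = b_excess))
# Only the intercept moves (by exactly -rf); the slope is identical.
stopifnot(isTRUE(all.equal(unname(b_raw[2]), unname(b_excess[2]))))
stopifnot(isTRUE(all.equal(unname(b_raw[1] - rf), unname(b_excess[1]))))
```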
Align the known signals with the (subsequent) rates of return in each period.
You want to regress the rate of return on the lagged signal.
Call the lagged signal lsig. This is how to align it properly in R:
## shift each signal down one row, so that row t holds the signal known at the end of time t-1
d <- within(d, {
lsig1D <- c(NA, sig1D[1:6])
lsig1C <- c(NA, sig1C[1:6])
lsig1B <- c(NA, sig1B[1:6])
lsig1A <- c(NA, sig1A[1:6])
})
d <- subset(d, TRUE, select=c(lsig1A,retA,lsig1B,retB,lsig1C,retC,lsig1D,retD))
p(d)
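The within() block above hard-codes the index range 1:6, which breaks as soon as the series has a different length. A small helper avoids that. The name `lag1` is hypothetical (it is not part of base R or of the original notes); inside within(d, {...}) one could write `lsig1A <- lag1(sig1A)` instead. This sketch assumes each vector is one stock's series, sorted by time.

```r
# Hypothetical helper: shift a series down by one period, so that row t
# holds the value known at the end of period t-1. Works for any length.
lag1 <- function(x) c(NA, head(x, -1))

sig1A <- c(3, 4, 1, -3, 0, 2, NA)
retA  <- c(NA, -6, 12, -1, -22, 6, -6) / 100
aligned <- data.frame(time = seq_along(sig1A), lsig1A = lag1(sig1A), retA)
print(aligned)

# Same result as the hard-coded version c(NA, sig1A[1:6]):
stopifnot(identical(aligned$lsig1A, c(NA, sig1A[1:6])))
```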
Now regress the subsequent rate of return on the known signal in the cross-section.
Here, the first usable observation is row 2: the rates of return at time 2, explained by the signals from time 1.
The dependent variable is the rate of return on each stock:
y <- subset( d[2,], TRUE, select=c(retA, retB, retC, retD) )
p( y )
x <- subset(d[2,], TRUE, c(lsig1A,lsig1B,lsig1C,lsig1D)) ## the lagged signal!
p(x)
print( coef( lm( as.numeric(y) ~ as.numeric(x)) ) )
## (Intercept) as.numeric(x)
## -0.0425 -0.0050
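For a single signal, the slope that lm() reports is simply cov(x, y)/var(x), and the intercept is mean(y) minus slope times mean(x). This short check uses the same time-2 cross-section as the regression above.

```r
x <- c(3, -2, -1, 0)                 # signals known at time 1 (stocks A-D)
y <- c(-0.06, -0.02, -0.07, -0.02)   # returns realized at time 2

b <- cov(x, y) / var(x)              # slope
a <- mean(y) - b * mean(x)           # intercept
print(c(a = a, b = b))               # a = -0.0425, b = -0.005, as lm() reported

stopifnot(isTRUE(all.equal(c(a, b), unname(coef(lm(y ~ x))))))
```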
Note that the regression automatically included an intercept.
We will ignore this intercept; consider it useless for our purpose.
The second coefficient, the slope, tells us whether stocks with a higher (lagged) signal had a higher rate of return in this one period. Now repeat this cross-sectional regression for every period t = 2, ..., 7 and collect the coefficients:
options(digits=3)
coefs <- c()
for (t in 2:7) {
  y <- with(d[t,], c(retA, retB, retC, retD))          ## returns at time t
  x <- with(d[t,], c(lsig1A, lsig1B, lsig1C, lsig1D))  ## signals known at time t-1
  coefs <- rbind(coefs, coef( lm( y ~ x ) ))           ## one row per period
  cat("\nThe two coefficients explaining returns at time ", t, " are:\n")
  print( tail(coefs, 1) )                              ## last row so far (base R tail)
}
coefs <- as.data.frame(coefs)
names(coefs) <- c("a.notused", "b.sigeffect")
p(coefs)
##
## The two coefficients explaining returns at time 2 are:
## (Intercept) x
## [1,] -0.0425 -0.005
##
## The two coefficients explaining returns at time 3 are:
## (Intercept) x
## [2,] -0.134 0.0949
##
## The two coefficients explaining returns at time 4 are:
## (Intercept) x
## [3,] 0.0231 -0.00944
##
## The two coefficients explaining returns at time 5 are:
## (Intercept) x
## [4,] -0.069 0.0541
##
## The two coefficients explaining returns at time 6 are:
## (Intercept) x
## [5,] -0.0895 0.0356
##
## The two coefficients explaining returns at time 7 are:
## (Intercept) x
## [6,] -0.016 0.0668
The coefficient of interest is coefs[2] (the slope), and not coefs[1] (the intercept).
The average slope coefficient (b.sigeffect) measures how signals influence subsequent rates of return, on average.
You can run a t-test to see whether the mean of this effect differs from zero:
mt <- function(x) c(mean=mean(x), sd=sd(x), T=mean(x)/sd(x)*sqrt(length(x)))  ## sd() already divides by n-1, so scale by sqrt(n)
mt(coefs[,2]) ## Effect
##   mean     sd      T
## 0.0395 0.0410 2.3570
Conclusion: here, stocks with higher (lagged) signals had higher subsequent rates of return, almost statistically significantly so.
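As a cross-check on the t-statistic, base R's t.test() computes the conventional one-sample statistic mean/(sd/sqrt(n)), where n is the number of periods and sd() uses the n-1 denominator. The six slopes below are typed in from the printed output above, so they are rounded and any result matches only up to that rounding.

```r
# Period slopes (times 2-7) as printed above, rounded to 3 significant digits
slopes <- c(-0.005, 0.0949, -0.00944, 0.0541, 0.0356, 0.0668)
n <- length(slopes)

t_manual <- mean(slopes) / (sd(slopes) / sqrt(n))   # sd() divides by n-1
tt <- t.test(slopes, mu = 0)                        # two-sided one-sample test

print(c(t_manual = t_manual, t_test = unname(tt$statistic), p_value = tt$p.value))
stopifnot(isTRUE(all.equal(t_manual, unname(tt$statistic))))
```

With only six periods, the p-value uses 5 degrees of freedom, which is why a t-statistic above 2 can still fall short of the usual 5% cutoff.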
Q: What does the regression coefficient of 0.0395 suggest?
You could also include several signals at the same time, with one multiple regression per cross-section.
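A minimal sketch of FM with two signals, on simulated data: the panel size, the signals `lsig1` and `lsig2`, and the true coefficients are hypothetical (by construction only `lsig1` predicts returns). Each period gets one multiple regression; the FM estimate for each signal is the time-series average of its slope, with the t-statistic computed as before.

```r
# Simulated panel, for illustration only: 50 stocks x 20 periods,
# two hypothetical lagged signals; only lsig1 truly predicts returns here.
set.seed(1)
panel <- expand.grid(stock = 1:50, time = 1:20)
panel$lsig1 <- rnorm(nrow(panel))
panel$lsig2 <- rnorm(nrow(panel))
panel$ret   <- 0.01 * panel$lsig1 + rnorm(nrow(panel), sd = 0.10)

# One multiple regression per cross-section: one row of coefficients per period
fm_coefs <- t(sapply(split(panel, panel$time),
                     function(cs) coef(lm(ret ~ lsig1 + lsig2, data = cs))))

# FM summary per signal: average slope and its t-statistic (sd() divides by n-1)
fm_summary <- apply(fm_coefs[, c("lsig1", "lsig2")], 2, function(b)
  c(mean = mean(b), T = mean(b) / sd(b) * sqrt(length(b))))
print(fm_summary)
```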
Don’t be sloppy when explaining what you are doing.
Specify exactly what your universe / data selection requirements are.
For example, here is a set of useful criteria. To be included, a stock had to
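Be very careful with ex-post data requirements, that is, inclusion criteria that can only be checked after the investment period (for example, requiring a stock to survive through the year). Ideally, use none.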
Specify exactly what the signal (the independent variable in the regs) is.
For example,
Again, be precise. Do not forget that the inclusion criteria and signal must be known by the time you plan to invest. It is easy to get miraculous results when you make a mistake here.
In your description, show the signal inputs for a few sample stocks, together with what subsequent stock returns these signal inputs are actually predicting in your FM reg.
Print or plot the two (or more) FM coefficients for each year, for about 40 years' worth of coefficients. Do not forget to tell us the N and the \(R^2\) of each regression.
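With 40 years of data it is easier to store the panel in long format (one row per stock-period) and to report N and \(R^2\) next to the coefficients of each cross-sectional regression. This sketch does that for the 4-stock example; the long-format layout, the lag via base R's ave(), and the column names `a`, `b`, `N`, `R2` are illustrative choices, not a required format.

```r
# The 4-stock example in long format: one row per stock-period
sig <- list(A = c( 3,  4,  1, -3,   0,   2, NA),
            B = c(-2, -2,  1,  1,   2,   4, NA),
            C = c(-1,  3, -4, -1,   4,  -2, NA),
            D = c( 0,  4, -2,  2,  -1,   1, NA))
ret <- list(A = c(NA, -6,  12, -1, -22,   6,  -6),
            B = c(NA, -2, -34,  1,  13,  12,  36),
            C = c(NA, -7,  25,  2, -19,  -6, -10),
            D = c(NA, -2,  29, 11,  -5, -30,   7))
panel <- data.frame(stock = rep(names(sig), each = 7),
                    time  = rep(1:7, times = 4),
                    sig   = unlist(sig),
                    ret   = unlist(ret) / 100)

# Lag the signal within each stock (rows are sorted by time within stock)
panel$lsig <- ave(panel$sig, panel$stock, FUN = function(x) c(NA, head(x, -1)))

# One cross-sectional regression per period, with N and R^2 next to the coefficients
rows <- lapply(split(panel, panel$time), function(cs) {
  cs <- cs[complete.cases(cs[, c("ret", "lsig")]), ]
  if (nrow(cs) < 3) return(NULL)              # too few stocks for a regression
  fit <- lm(ret ~ lsig, data = cs)
  data.frame(time = cs$time[1], N = nrow(cs),
             a = coef(fit)[[1]], b = coef(fit)[[2]],
             R2 = summary(fit)$r.squared)
})
fm_table <- do.call(rbind, rows)              # NULL entries (time 1) are dropped
print(fm_table, digits = 3)
```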
Do not run the regressions only among stocks that have the signal; run them among all stocks.
Perhaps give an earnings or dividend-yield example
Fama-MacBeth regressions are sometimes run with log returns. There are good reasons to do so, and good reasons not to. Keep in mind that log returns are not the rates of return that investors actually earn.
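One concrete illustration of why log returns are not the returns an investor earns: an equal-weighted portfolio earns the average simple return, but the average log return is smaller than the log of that portfolio return (Jensen's inequality, since log is concave). The numbers are the time-3 returns of stocks A-D from the example.

```r
# Time-3 simple returns of stocks A-D from the example
r  <- c(0.12, -0.34, 0.25, 0.29)
lr <- log1p(r)                     # log returns: log(1 + r)

ew_port <- mean(r)                 # return an equal-weighted portfolio actually earns
print(c(mean_simple      = ew_port,
        mean_log         = mean(lr),
        log_of_portfolio = log1p(ew_port)))

# Averaging log returns understates the portfolio's log return,
# so the cross-sectional mean of log returns is not a return anyone earns.
stopifnot(mean(lr) < log1p(ew_port))
```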
For ASAM, we can work with yearly rates of return, which are much more convenient, because ASAM mostly wants to avoid active trading.
Recall the first assignments:
Create an annual stock return data set first.
Create a second data set, where you exclude the first week in January (because the first Jan week is often strange!)
Learn FM techniques
Search for good signals with FM.
Confirm strategy with FF techniques.
Get ready to implement!
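For the first two assignments, annual returns come from compounding shorter-horizon returns: (1 + r1)(1 + r2)...(1 + rn) - 1. This sketch uses simulated daily returns on calendar days (a simplification; real data would have trading days only) and assumes that "the first week in January" means calendar days January 1 to 7. Both the data and that definition are assumptions for illustration, not the assignment's official specification.

```r
# Simulated daily returns (hypothetical, for illustration only), calendar days 2020-2021
set.seed(42)
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
daily <- data.frame(date = dates,
                    ret  = rnorm(length(dates), mean = 0.0003, sd = 0.01))
daily$year <- as.integer(format(daily$date, "%Y"))

# Assumption: "first week in January" = calendar days January 1-7
first_jan_week <- format(daily$date, "%m") == "01" &
                  as.integer(format(daily$date, "%d")) <= 7

compound <- function(r) prod(1 + r) - 1   # compound daily returns into one annual return

annual     <- aggregate(ret ~ year, data = daily, FUN = compound)
annual_xjw <- aggregate(ret ~ year, data = daily[!first_jan_week, ], FUN = compound)
print(merge(annual, annual_xjw, by = "year", suffixes = c(".all", ".ex_jan_wk1")))
```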