This Rmarkdown document simulates a panel of stock returns and signals.
All stock returns are already quoted net of the risk-free rate of return. Thus, formed portfolios are all zero investment, too.
Each stock-month’s rates of returns are influenced by two aspects:
The rate of return on the market, multiplied by a firm-specific market-beta.
A firm-specific measurable lagged signal, x.
There is also a second signal, y, but it is constructed to be useless on the margin. (It is not useless in itself, because it correlates with x).
lagseries <- function( x ) c( NA, x[1:(length(x)-1)] )
lagpanelseries <- function( x, fmid=fmid, timeid=NULL ) ifelse( lagseries(fmid)==fmid, lagseries(x), NA )
The second function is a “grouping” histogram-ming function. It assigns a vector of numbers by rank into one of g groups.
Let me illustrate this with an example in which 12 random numbers are assigned to tertiale groups:
grouphist <- function(x, g=NQUANTILE) cut(x, quantile(x, (0:g)/g, na.rm=T) + c(-1e-10, rep(0,g)), labels=F)
x <- sample(1:12) ## a random rearrangement
p(rbind(x, grouphist(x,3))) ## show off both x and which group it goes into
We can also confirm that the frequency of observations in each group is 4:
print(table(grouphist(x,3)))
rm(x)
##
## 1 2 3
## 4 4 4
options(digits=3) ## nicer output
set.seed(0) ## replicability
sim.world <- function(N,M) {
d <- data.frame( fmid=rep(NA,M*N), month=NA, ri=NA, x=NA, z=NA )
mf <- rnorm(M, 0.01, 0.10)
ri <- betai <- xit <- zit <- NA; fmid <- 10000
for (k in 1:(M*N)) {
if ((k %% M)==1) {
## next firm starts
fmid <- fmid+1; m <- 1
ri <- NA
betai <- rnorm( 1, 1.0, 1.0 )
xit <- rnorm( 1, 100.0, 2.0 ) - (betai-1)*5
zit <- rnorm( 1, 0.0, 2.0 ) + xit/10
}
d[k,] <- c(fmid, m, ri, xit, zit )
ri <- betai*mf[m] + 0.2*(xit-100) + rnorm(1,0,0.1); ri <- round(ri, 2)
xit <- xit + rnorm(1)
zit <- xit + zit + rnorm(1)
m <- m+1
}
d <- round(d, 4)
d <- within(d, rm <- ave( ri, month ))
d
}
Here is an example of what the sim.world
generates, albeit with 2 firms and 3 months only:
p(sim.world(2,3))
Now we are ready to generate our real data set:
N <- 300 ; M <- 120 ## number of months and firms
d <- sim.world( N, M ); ## generate our large data set
First, we must decide which stocks go into which portfolio in each month. This can only be done on the basis of lagged information available to the investor. So we align past signals with current returns.
d <- within(d, {
lag.x <- lagpanelseries(x, fmid)
lag.z <- lagpanelseries(z, fmid)
})
Next, we want to find cutoffs that categorize each variable into NQUANTILE
groups.
For this example, we shall work with tertiale portfolios.
NQUANTILE <- 3
d <- do.call("rbind",
by( d, d$month, function(dd) { with(dd,
cbind( dd, g.lag.z=grouphist( lag.z, NQUANTILE ), g.lag.x=grouphist( lag.x, NQUANTILE ) )) })
)
For a two-dimensional rate of return table, we also want to put each stock return into buckets based on both attributes.
d <- within(d, g.lag.cross <- g.lag.x*10 + g.lag.z )
Each row in d now has a g.lag.cross
variable that designates into which bucket we want to put its rate of return. Here is the number of observations in each bucket (portfolio):
table(d$g.lag.cross) ## print frequencies of each group
##
## 11 12 13 21 22 23 31 32 33
## 9384 2399 117 2308 7281 2311 208 2220 9472
One problem with independent cutoffs when signals are correlated is that, in any given month, different portfolios have different numbers of observations. In our case, there were few firm-months with high X’s and Low Z’s or vice-versa. When we have few observations, our rate of return averages is noisy.
To avoid this problem, researchers often use clever double-sorts. This is explained in bt-sig2inv.
For each bucket, into which each stock-month (row) has been put in, calculate the average (equal-weighted) rate of return in each month:
pfio.ret.ts <- by(d, d$g.lag.cross, function(dd) { ## by bucket
rpfio <- rep(NA, M)
for (m in 1:M) rpfio[m] <- with(subset(dd, month==m), mean(ri,na.rm=TRUE))
rpfio
})
## some R magic to get the portfolio returns back into a matrix
pfio.ret.ts <- as.data.frame(t(do.call("rbind",pfio.ret.ts)))
## also, calculate the rate of return on the market
rm <- do.call("c",by(d, d$month, function(dd) dd$rm[1], simplify=FALSE ))
p(round(pfio.ret.ts,3)) ## show some sample output
The row name is the month. The column name is the portfolio id.
Recall the problem of “uneven number of observations in groups”? Here, we have a few extreme cases. In these extremes, some portfolios can even have zero rates of returns in some months, because there were no observations!
Now we can run a Black-Jensen-Scholes/Fama-French analysis. We will do this relative to the risk-neutral model (which are just the means, all already net of the risk-free rate) and relative to a 1-factor CAPM.
The following are the rate of return values portfolio by portfolio:
NPFIOS <- ncol( pfio.ret.ts )
alphas <- rep(NA,NPFIOS)
for (pfioid in 1:NPFIOS) {
rs <- pfio.ret.ts[,pfioid]
if (sum(!is.na(rs))==0) next
alphas[pfioid] <- coef(lm(rs ~ rm))[1]
cat("\tPfio ", names(pfio.ret.ts)[pfioid], ": ",
" mean=", sprintf("%+.2f", mean(rs, na.rm=TRUE)),
" alpha=", sprintf("%+.2f",alphas[pfioid]), "\n")
}
## Pfio 11 : mean= -1.96 alpha= -2.01
## Pfio 12 : mean= -1.13 alpha= -1.15
## Pfio 13 : mean= -1.02 alpha= -1.03
## Pfio 21 : mean= -0.21 alpha= -0.26
## Pfio 22 : mean= +0.05 alpha= +0.02
## Pfio 23 : mean= +0.31 alpha= +0.30
## Pfio 31 : mean= +1.17 alpha= +1.12
## Pfio 32 : mean= +1.26 alpha= +1.23
## Pfio 33 : mean= +1.96 alpha= +1.96
The meaning of our analysis becomes a whole lot clearer in a two-dimensional table:
alphas <- matrix(alphas , nrow=NQUANTILE)
rownames(alphas) <- paste0("z",1:NQUANTILE)
colnames(alphas) <- paste0("x",1:NQUANTILE)
pa <- data.frame(alphas)
pa <- cbind( pa, xavg=rowMeans(pa) )
pa <- rbind( pa, zavg=colMeans(pa) )
pa <- round(pa,3)
print(pa)
p(pa)
## x1 x2 x3 xavg
## z1 -2.01 -0.260 1.12 -0.384
## z2 -1.15 0.025 1.23 0.033
## z3 -1.03 0.298 1.96 0.410
## zavg -1.40 0.021 1.44 0.020
In this world, CAPM alphas are always strongly increasing for x categories (the horizontal dimension). The pattern should be less clear along the z (vertical) dimension.
Q: How reliable is the z1-x3 alpha (average abnormal rate of return)?
Q: How reliable is the z1-x1 alpha (average abnormal rate of return)?
NOT FULLY TRUE This needs to be improved.