Simulating A Financial World and Testing For Alpha

Setup

This Rmarkdown document simulates a panel of stock returns and signals.
All stock returns are already quoted net of the risk-free rate of return. Thus, formed portfolios are all zero investment, too.
Each stock-month’s rates of returns are influenced by two aspects:
1. The rate of return on the market, multiplied by a firm-specific market-beta.
2. A firm-specific measurable lagged signal, x.
There is also a second signal, y, but it is constructed to be useless on the margin. (It is not useless in itself, because it correlates with x).

Utility Functions

Lagging

lagseries <- function( x ) c( NA, x[1:(length(x)-1)] )
lagpanelseries <- function( x, fmid=fmid, timeid=NULL ) ifelse( lagseries(fmid)==fmid, lagseries(x), NA )

Group Histograms

The second function is a “grouping” histogram-ming function. It assigns a vector of numbers by rank into one of g groups.

Let me illustrate this with an example in which 12 random numbers are assigned to tertiale groups:

grouphist <- function(x, g=NQUANTILE) cut(x, quantile(x, (0:g)/g, na.rm=T) + c(-1e-10, rep(0,g)), labels=F)

x <- sample(1:12)           ## a random rearrangement
p(rbind(x, grouphist(x,3))) ## show off both x and which group it goes into

We can also confirm that the frequency of observations in each group is 4:

print(table(grouphist(x,3)))
rm(x)

## 
## 1 2 3 
## 4 4 4

Simulation of World

options(digits=3)   ## nicer output
set.seed(0)     ## replicability

sim.world <- function(N,M) {
    d <- data.frame( fmid=rep(NA,M*N), month=NA, ri=NA, x=NA, z=NA )

    mf <- rnorm(M, 0.01, 0.10)
    ri <- betai <- xit <- zit <- NA; fmid <- 10000
    for (k in 1:(M*N)) {
        if ((k %% M)==1) {
        ## next firm starts
            fmid <- fmid+1; m <- 1
            ri <- NA
            betai <- rnorm( 1, 1.0, 1.0 )
            xit <- rnorm( 1, 100.0, 2.0 ) - (betai-1)*5
            zit <- rnorm( 1, 0.0, 2.0 ) + xit/10
        }
        d[k,] <- c(fmid, m, ri, xit, zit )

        ri <- betai*mf[m] + 0.2*(xit-100) + rnorm(1,0,0.1); ri <- round(ri, 2)
        xit <- xit + rnorm(1)
        zit <- xit + zit + rnorm(1)
        m <- m+1
    }
    d <- round(d, 4)
    d <- within(d, rm <- ave( ri, month ))
    d
}

Here is an example of what the sim.world generates, albeit with 2 firms and 3 months only:

p(sim.world(2,3))

Now we are ready to generate our real data set:

N <- 300 ; M <- 120  ## number of months and firms
d <- sim.world( N, M );  ## generate our large data set

Investor Analysis of This World

1. Classify Returns Based On Lagged Signals

First, we must decide which stocks go into which portfolio in each month. This can only be done on the basis of lagged information available to the investor. So we align past signals with current returns.

d <- within(d, {
    lag.x <- lagpanelseries(x, fmid)
    lag.z <- lagpanelseries(z, fmid)
})

Next, we want to find cutoffs that categorize each variable into NQUANTILE groups.

For this example, we shall work with tertiale portfolios.

NQUANTILE <- 3

d <- do.call("rbind",

     by( d, d$month, function(dd) { with(dd,
          cbind( dd, g.lag.z=grouphist( lag.z, NQUANTILE ), g.lag.x=grouphist( lag.x, NQUANTILE ) )) })

)

For a two-dimensional rate of return table, we also want to put each stock return into buckets based on both attributes.

d <- within(d, g.lag.cross <- g.lag.x*10 + g.lag.z )

Each row in d now has a g.lag.cross variable that designates into which bucket we want to put its rate of return. Here is the number of observations in each bucket (portfolio):

table(d$g.lag.cross)    ## print frequencies of each group

## 
##   11   12   13   21   22   23   31   32   33 
## 9384 2399  117 2308 7281 2311  208 2220 9472

One problem with independent cutoffs when signals are correlated is that, in any given month, different portfolios have different numbers of observations. In our case, there were few firm-months with high X’s and Low Z’s or vice-versa. When we have few observations, our rate of return averages is noisy.

To avoid this problem, researchers often use clever double-sorts. This is explained in bt-sig2inv.

2. Calculate Each Portfolio Return Time-Series

For each bucket, into which each stock-month (row) has been put in, calculate the average (equal-weighted) rate of return in each month:

pfio.ret.ts <- by(d, d$g.lag.cross, function(dd) {   ## by bucket
        rpfio <- rep(NA, M)
        for (m in 1:M) rpfio[m] <- with(subset(dd, month==m), mean(ri,na.rm=TRUE))
        rpfio
    })


## some R magic to get the portfolio returns back into a matrix
pfio.ret.ts <- as.data.frame(t(do.call("rbind",pfio.ret.ts)))

## also, calculate the rate of return on the market
rm <- do.call("c",by(d, d$month, function(dd) dd$rm[1], simplify=FALSE ))

p(round(pfio.ret.ts,3)) ## show some sample output

The row name is the month. The column name is the portfolio id.

Recall the problem of “uneven number of observations in groups”? Here, we have a few extreme cases. In these extremes, some portfolios can even have zero rates of returns in some months, because there were no observations!

3. BJSFF Analysis on Each Portfolio

Now we can run a Black-Jensen-Scholes/Fama-French analysis. We will do this relative to the risk-neutral model (which are just the means, all already net of the risk-free rate) and relative to a 1-factor CAPM.

The following are the rate of return values portfolio by portfolio:

NPFIOS <- ncol( pfio.ret.ts )

alphas <- rep(NA,NPFIOS)
for (pfioid in 1:NPFIOS) {
    rs <- pfio.ret.ts[,pfioid]
    if (sum(!is.na(rs))==0) next

    alphas[pfioid] <- coef(lm(rs ~ rm))[1]
    cat("\tPfio ", names(pfio.ret.ts)[pfioid], ": ",
            " mean=", sprintf("%+.2f", mean(rs, na.rm=TRUE)),
            " alpha=", sprintf("%+.2f",alphas[pfioid]), "\n")

}

##  Pfio  11 :   mean= -1.96  alpha= -2.01 
##  Pfio  12 :   mean= -1.13  alpha= -1.15 
##  Pfio  13 :   mean= -1.02  alpha= -1.03 
##  Pfio  21 :   mean= -0.21  alpha= -0.26 
##  Pfio  22 :   mean= +0.05  alpha= +0.02 
##  Pfio  23 :   mean= +0.31  alpha= +0.30 
##  Pfio  31 :   mean= +1.17  alpha= +1.12 
##  Pfio  32 :   mean= +1.26  alpha= +1.23 
##  Pfio  33 :   mean= +1.96  alpha= +1.96

The meaning of our analysis becomes a whole lot clearer in a two-dimensional table:

alphas <- matrix(alphas , nrow=NQUANTILE)

rownames(alphas) <- paste0("z",1:NQUANTILE)
colnames(alphas) <- paste0("x",1:NQUANTILE)

pa <- data.frame(alphas)
pa <- cbind( pa, xavg=rowMeans(pa) )
pa <- rbind( pa, zavg=colMeans(pa) )

pa <- round(pa,3)

print(pa)
p(pa)

##         x1     x2   x3   xavg
## z1   -2.01 -0.260 1.12 -0.384
## z2   -1.15  0.025 1.23  0.033
## z3   -1.03  0.298 1.96  0.410
## zavg -1.40  0.021 1.44  0.020

In this world, CAPM alphas are always strongly increasing for x categories (the horizontal dimension). The pattern should be less clear along the z (vertical) dimension.

Q: How reliable is the z1-x3 alpha (average abnormal rate of return)?
Q: How reliable is the z1-x1 alpha (average abnormal rate of return)?

NOT FULLY TRUE This needs to be improved.