ROC and Precision-Recall Curves using Bayesian MCMC estimates generalized

This function generates ROC and Precision-Recall curves after fitting a Bayesian logit or probit regression. For fast calculation for from an "rjags" object use mcmcRocPrc

mcmcRocPrcGen(
  modelmatrix,
  mcmcout,
  modelframe,
  curves = FALSE,
  link = "logit",
  fullsims = FALSE
)

Arguments

modelmatrix	model matrix, including intercept (if the intercept is among the parameters estimated in the model). Create with model.matrix(formula, data). Note: the order of columns in the model matrix must correspond to the order of columns in the matrix of posterior draws in the `mcmcout` argument. See the `mcmcout` argument for more and Beger (2016) for background.
mcmcout	posterior distributions of all logit coefficients, in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed into a matrix using the function as.mcmc() from the coda package for `jags` class objects, as.matrix() from base R for `mcmc`, `mcmc.list`, `stanreg`, and `stanfit` class objects, and `object$sims.matrix` for `bugs` class objects. Note: the order of columns in this matrix must correspond to the order of columns in the model matrix. One can do this by examining the posterior distribution matrix and sorting the variables in the order of this matrix when creating the model matrix. A useful function for sorting column names containing both characters and numbers as you create the matrix of posterior distributions is `mixedsort()` from the gtools package.
modelframe	model frame in matrix form. Can be created using as.matrix(model.frame(formula, data))
curves	logical indicator of whether or not to return values to plot the ROC or Precision-Recall curves. If set to `FALSE` (default), results are returned as a list without the extra values.
link	type of generalized linear model; a character vector set to `"logit"` (default) or `"probit"`.
fullsims	logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is `FALSE`. Note: If `TRUE` is chosen, the function takes notably longer to execute.

Value

This function returns a list with 4 elements:

area_under_roc: area under ROC curve (scalar)
area_under_prc: area under precision-recall curve (scalar)
prc_dat: data to plot precision-recall curve (data frame)
roc_dat: data to plot ROC curve (data frame)

Details

This function generates ROC and precision-recall curves after fitting a Bayesian logit or probit model.

References

Beger, Andreas. 2016. “Precision-Recall Curves.” Available at SSRN: https://ssrn.com/Abstract=2765419. http://dx.doi.org/10.2139/ssrn.2765419.

Examples

.old_wd <- setwd(tempdir())
# \donttest{
if (interactive()) {
# simulating data

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

# formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

# creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

# processing the data
mm <- model.matrix(Y ~ X1 + X2, data = df)
xframe <- as.matrix(model.frame(Y ~ X1 + X2, data = df))
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xframe)]

# using mcmcRocPrcGen
fit_sum <- mcmcRocPrcGen(modelmatrix = mm,
                      modelframe = xframe,
                      mcmcout = mcmc_mat,
                      curves = TRUE,
                      fullsims = FALSE)
}
# }

setwd(.old_wd)