Demo 1 - slzhang-fd/mvreprobit GitHub Wiki

Four-process 2-level (1 random effect) probit model

In this simulation, we fit a four-variate, two-level (1 group random effect) probit model to simulated data using the mvreprobit_1re_Gibbs() function in the mvreprobit package.

We index the multivariate responses by $r$, time points by $t$ and individuals (level 2 units) by $i$. For each binary response $y_{rti}$, there exists an underlying continuous variable $y_{rti}^*$ such that

$$ y_{rti}=1 \Leftrightarrow y_{rti}^* > 0, $$

where $r=1,\ldots,J,t=1,\ldots,T_i,i=1,\ldots,n$. The underlying variables $\boldsymbol y_{ti}^* = ( y_{1ti}^* , \ldots , y_{Jti}^* )^\top$ satisfy

$$ \boldsymbol y_{ti}^* =\mathbf B \boldsymbol x_{ti} + \boldsymbol u_i + \boldsymbol e_{ti}, $$

where the $\mathbf{B}=(\boldsymbol \beta_1,\ldots,\boldsymbol\beta_J)$ are coefficient parameters, $\boldsymbol u_i\sim N(0,\Sigma_u)$ is a J-dimensional vector of individual random effects, $\boldsymbol e_{ti}\sim N(0,\Sigma_e)$ is a J-dimensional vector of time-varying error term. For identification, the diagonal elements of $\Sigma_e$ are set to one.

The model above is the same as the model (1) in the paper

Steele, F., Zhang, S., Grundy, E., and Burchardt, T. (2022). Longitudinal analysis of exchanges of support between parents and children in the UK.

###################################################
## Install the `mvreprobit` R package and its dependant
## when running for the first time

required_packages <- c("remotes", "truncnorm", "mvtnorm", "MCMCpack", "matrixcalc")
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

## install `mvreprobit` from Github
remotes::install_github("slzhang-fd/mvreprobit")

## load required packages
library(truncnorm)
library(mvtnorm)
library(mvreprobit)

###
## clean workspace
rm(list = ls())

#####################################################
## 1. Four-variate, two-level hierarchical probit model
## (1 level 2 random effects)
##
## Simulate data

# 4 response variables
K <- 4
# N: number of level 2 units (individuals)
N <- 1000
# NN: number of observations
# 4 time points for each individual
NN <- 4000
# p: number of explanatory variables (including constant)
p <- 5

## set true parameters
set.seed(123)
coeffs <- matrix(rnorm(K*p),p,K)
x_covs <- cbind(rep(1,NN),matrix(rnorm(NN*(p-1)),NN,(p-1)))

# Residual covariance matrix
Sigma_e <- matrix(0.2, K, K)
diag(Sigma_e) <- 1
B_e <- t(chol(Sigma_e))

# Level 2 random effects covariance matrix
Sigma_u <- matrix(0.3, K,K)
diag(Sigma_u) <- runif(K, 1,2)
B_u <- t(chol(Sigma_u))

# Level 2 identifier
i_ind <- rep(1:N, each=4)

## generate random variables
U_all <- rmvnorm(N, sigma = B_u %*% t(B_u))
Y_star <- x_covs %*% coeffs + U_all[i_ind,] + rmvnorm(NN, sigma = Sigma_e)

## generate response matrix
Y <- 1 * (Y_star>0)

## Fit 4-process 1 random effect model
## Run for 5,000 iterations (including burn-in)
##
## Note: for running K-process model (K>1), the MH step
## size cor_step_size needs to be specified.
## Please choose a large step size and adjust the rejection rate
## output between 0.7~0.9 when running the program (output every 100 steps)
res_1re_4process <-
  mvreprobit_1re_Gibbs(Y, x_covs=x_covs, i_ind=i_ind,
                       max_steps = 5000, cor_step_size = 0.06)

#####################################################
## Plot Gibbs posterior samples credible interval
## versus simulation true
res_compare_plot(res_1re_4process$coeffs_all, coeffs, burn_in = 1000)
res_compare_plot(res_1re_4process$Sigma_e_all, Sigma_e, burn_in = 1000)
res_compare_plot(res_1re_4process$Sigma_u_all, Sigma_u, burn_in = 1000)

## Obtain estimated posterior means and credible intervals
## (based on sample size of 4,000 as burn-in is 1,000)
## Compare each with true values used in simulation (in coeffs, Sigma_e and Sigma_u)
##
## SDs and 95% credible intervals are also stored in res_summary_1re
res_summary_1re <- res_summary_1re_Gibbs(res_1re_4process, x_covs, burnin = 1000)

round(coeffs, 3)
round(res_summary_1re$coeffs, 3)

round(Sigma_e,2)
round(res_summary_1re$Sigma_e, 2)

round(Sigma_u,2)
round(res_summary_1re$Sigma_u, 2)

Results after 5,000 iterations of Gibbs sampling (first 1,000 as burnin)

Time spent: 1 min 8 sec.

## Posterior means are used as estimated values.

coefficients:
True:
       [,1]   [,2]   [,3]   [,4]
[1,] -0.560  1.715  1.224  1.787
[2,] -0.230  0.461  0.360  0.498
[3,]  1.559 -1.265  0.401 -1.967
[4,]  0.071 -0.687  0.111  0.701
[5,]  0.129 -0.446 -0.556 -0.473
Estimated:
       [,1]   [,2]   [,3]   [,4]
[1,] -0.607  1.678  1.119  1.782
[2,] -0.202  0.387  0.333  0.534
[3,]  1.506 -1.316  0.418 -2.099
[4,]  0.015 -0.697  0.141  0.739
[5,]  0.176 -0.470 -0.521 -0.492

Sigma.e (covariance matrix for e_ti, diagnal one for identifiability)
True:
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.2  0.2  0.2
[2,]  0.2  1.0  0.2  0.2
[3,]  0.2  0.2  1.0  0.2
[4,]  0.2  0.2  0.2  1.0
Estimated
     [,1] [,2] [,3] [,4]
[1,] 1.00 0.31 0.23 0.19
[2,] 0.31 1.00 0.24 0.11
[3,] 0.23 0.24 1.00 0.19
[4,] 0.19 0.11 0.19 1.00

Sigma.u (covariance matrix for u_ij, individual-level random effect)
True:
     [,1] [,2] [,3] [,4]
[1,] 1.72 0.30 0.30 0.30
[2,] 0.30 1.24 0.30 0.30
[3,] 0.30 0.30 1.17 0.30
[4,] 0.30 0.30 0.30 1.78
Estimated
     [,1] [,2] [,3] [,4]
[1,] 1.63 0.17 0.33 0.34
[2,] 0.17 1.18 0.29 0.34
[3,] 0.33 0.29 1.07 0.47
[4,] 0.34 0.34 0.47 2.26

Plot Gibbs posterior credible intervals against simulation true values

coeffcients