| Title: | Doubly Regularized Matrix-Variate Regression |
|---|---|
| Description: | The doubly regularized matrix-variate regression solves a low-rank-plus-sparse structure for matrix-variate generalized linear models through a weighted combination of nuclear-norm and L1-norm. The methodology implemented by this package is described in the paper "Doubly Regularized Matrix-Variate Regression", which has been tentatively accepted for publication but does not yet have a DOI or URL. A formal citation will be added in a future update once the final publication details are available. |
| Authors: | Zengchao Xu [aut, cre, cph], Shan Luo [aut], Binyan Jiang [aut] |
| Maintainer: | Zengchao Xu <[email protected]> |
| License: | AGPL-3 |
| Version: | 0.3.2 |
| Built: | 2026-05-21 08:32:58 UTC |
| Source: | https://github.com/paradoxical-rhapsody/drrglm |
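Solve the doubly regularized problem that decomposes the coefficient matrix into a low-rank component L and a sparse component S, penalizing L with the nuclear norm and S with the L1 norm; lambda1 and lambda2 weight the two penalties.
The loss is the negative log-likelihood under the pre-specified GLM family.
In the linear model, the loss reduces to least squares.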
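For orientation, a low-rank-plus-sparse problem of this kind is commonly written as below. This is a sketch pieced together from the package description (nuclear norm plus L1 norm) and the returned components L and S. The pairing of lambda1 with the nuclear norm and lambda2 with the L1 norm, and the 1/(2N) scaling of the linear-model loss, are assumptions rather than the package's documented definition.

```latex
\min_{L,\,S}\; \ell(L + S) + \lambda_1 \|L\|_{*} + \lambda_2 \|S\|_{1},
\qquad
\ell(C) = \frac{1}{2N} \sum_{i=1}^{N} \bigl( y_i - \langle X_i, C \rangle \bigr)^2
\quad \text{(linear model)}
```

Here \|L\|_{*} is the sum of the singular values of L, \|S\|_{1} is the sum of the absolute entries of S, and \langle X_i, C \rangle = \operatorname{tr}(X_i^{\top} C).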
    drrglm(x, y, family, lambda1, lambda2, C0 = NULL,
           tol = 0.001, maxIter = 300, verbose = FALSE)
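| Argument | Description |
|---|---|
| x | Numeric array; the example uses dimensions r0 x c0 x N. |
| y | Numeric vector of length N. |
| family | GLM family; the example uses "gaussian". |
| lambda1 | Regularization parameter. |
| lambda2 | Regularization parameter. |
| C0 | Initial value of the coefficient matrix. |
| tol | Convergence tolerance. |
| maxIter | Maximum number of iterations. |
| verbose | Print iterations? |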
list(L, S, Lsv, iter, lambda1, lambda2, isConvergent).
    set.seed(2025)
    r0 <- 13
    c0 <- 17
    N <- 500
    x <- array(runif(r0*c0*N), c(r0, c0, N))
    y <- rnorm(N)
    family <- "gaussian"
    lambda1 <- 0.15
    lambda2 <- 0.04
    system.time( egg <- drrglm(x, y, family, lambda1, lambda2) )
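Nuclear-norm and L1 penalties are usually handled through their proximal maps: singular-value soft-thresholding for the former and elementwise soft-thresholding for the latter. The base R sketch below illustrates both maps on a 13 x 17 matrix. The helper names `soft_threshold` and `svt` are hypothetical, and nothing here is claimed to be the algorithm that drrglm runs internally.

```r
# Hypothetical helpers for illustration; not part of drrglm.

# Elementwise soft-thresholding: proximal map of tau * ||S||_1.
soft_threshold <- function(M, tau) {
  sign(M) * pmax(abs(M) - tau, 0)
}

# Singular-value thresholding: proximal map of tau * ||L||_*.
svt <- function(M, tau) {
  dec <- svd(M)
  d <- pmax(dec$d - tau, 0)
  # Rebuild the matrix from the shrunken singular values.
  dec$u %*% (d * t(dec$v))
}

set.seed(1)
M <- matrix(rnorm(13 * 17), 13, 17)
S_hat <- soft_threshold(M, 1.0)
L_hat <- svt(M, 3.0)

sparsity <- mean(S_hat == 0)                 # share of entries set exactly to zero
rank_hat <- sum(svd(L_hat)$d > 1e-8)         # numerical rank after thresholding
```

Larger thresholds give a sparser `S_hat` and a lower-rank `L_hat`, which is the qualitative role the two regularization parameters play in a low-rank-plus-sparse fit.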
This data arises from a large study to examine EEG correlates of genetic predisposition to alcoholism. See details.
    EEG
list(alcholic, control).
The original data are fully open-access and available from the UCI Machine Learning Repository (http://kdd.ics.uci.edu/databases/eeg/). They include two groups of subjects: 77 alcoholic and 45 control. In the original study, each subject was exposed to either a single stimulus (S1) or two stimuli (S1 and S2), which were pictures chosen from the 1980 Snodgrass and Vanderwart picture set. Each subject underwent 120 trials under each condition.
Here we provide preprocessed data: the averages over the 120 trials under the S1 condition, which have been studied in several papers. The dataset is structured as a list containing two arrays:
EEG$alcholic: Array of dimensions 256 x 64 x 77.
EEG$control: Array of dimensions 256 x 64 x 45.
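To use both groups in a single fit, the two arrays can be stacked along the third dimension and paired with a group label. The sketch below uses simulated stand-ins with the documented dimensions so that it runs without the package; with the package installed, `EEG$alcholic` and `EEG$control` would take their place. Coding the alcoholic group as 1 is an arbitrary choice made for illustration.

```r
# Simulated stand-ins with the documented shapes; with the package installed,
# use EEG$alcholic and EEG$control instead.
alcoholic <- array(rnorm(256 * 64 * 77), c(256, 64, 77))
control   <- array(rnorm(256 * 64 * 45), c(256, 64, 45))

# R stores arrays in column-major order, so concatenating the raw values
# and re-dimensioning stacks the subjects along the third dimension.
n1 <- dim(alcoholic)[3]
n0 <- dim(control)[3]
x <- array(c(alcoholic, control), c(256, 64, n1 + n0))

# Arbitrary labelling for illustration: 1 = alcoholic, 0 = control.
y <- c(rep(1, n1), rep(0, n0))
```

The result is a three-way array with one slice per subject and a label vector of matching length, the same layout as the `x` and `y` in the drrglm example.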
If you use this dataset, we would be very grateful if you could cite both the original data source and our work:
Zengchao Xu, Shan Luo, and Binyan Jiang. "Doubly Regularized Matrix-Variate Regression". Submitted.
    data(EEG, package="drrglm")
Initialize via nuclear-norm regularized regression.
    ini_paras(x, y, family,
              maxRank = min(floor(NROW(x)/2), floor(NCOL(x)/2)),
              tol = 0.001, verbose = FALSE)
| Argument | Description |
|---|---|
| x | Numeric array. |
| y | Numeric vector. |
| family | GLM family, e.g. "gaussian". |
| maxRank | Maximum rank to be detected. |
| tol | Convergence tolerance. |
| verbose | Print iterations? |
list(family, grad, lipschitz, sigma0, rankStar, rankStarHat, lambda.star, L0.star).
simu_factor_model_paras: simulate (B, S).
simu_factor_model_data: simulate y.
    simu_factor_model_paras(p, r, noise, seed = 2023)
    simu_factor_model_data(N, B, S)
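| Argument | Description |
|---|---|
| p | Dimension of data. |
| r | Number of factors. |
| noise | Mode of the noise structure; the examples use 'diag', 'rand', 'tridiag', and 'block'. |
| seed | Random seed. |
| N | Sample size. |
| B | Matrix. |
| S | Symmetric matrix. |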
simu_factor_model_paras: list(B, S).
simu_factor_model_data: Numeric matrix y.
    set.seed(2025)
    N <- 50
    p <- 15
    r <- 7
    # Each call overwrites paras; the last noise mode ('block') is the one used.
    paras <- simu_factor_model_paras(p, r, 'diag')
    paras <- simu_factor_model_paras(p, r, 'rand')
    paras <- simu_factor_model_paras(p, r, 'tridiag')
    paras <- simu_factor_model_paras(p, r, 'block')
    DT <- simu_factor_model_data(N, paras[["B"]], paras[["S"]])
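As a point of reference, the standard factor model writes each observation as y_i = B f_i + e_i with f_i ~ N(0, I_r) and e_i ~ N(0, S), so that the covariance of y_i is B B^T + S, a low-rank part plus a structured noise part. The base R sketch below simulates from that model. It assumes this textbook form and a diagonal S; whether `simu_factor_model_data` generates data in exactly this way is not confirmed by the documentation.

```r
set.seed(2025)
N <- 5000; p <- 15; r <- 7

# Hypothetical parameters: random loadings and a diagonal noise covariance.
B <- matrix(rnorm(p * r), p, r)
S <- diag(runif(p, 0.5, 1.5))

# y_i = B f_i + e_i with f_i ~ N(0, I_r) and e_i ~ N(0, S); rows are samples.
f <- matrix(rnorm(N * r), N, r)
e <- matrix(rnorm(N * p), N, p) %*% chol(S)
y <- f %*% t(B) + e

# Implied covariance: low-rank part plus noise part.
Sigma <- B %*% t(B) + S
rel_err <- norm(cov(y) - Sigma, "F") / norm(Sigma, "F")
```

With a large sample, `rel_err` is small, which confirms that the sample covariance of `y` approximates the low-rank-plus-sparse target.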
Simulation: Matrix-Variate GLM
    simu_reg_coefs(p1, p2, rank0, S.type, seed = 2023)
    simu_reg_data(family, N, C0, err.student.dof = NULL)
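| Argument | Description |
|---|---|
| p1 | Row dimension. |
| p2 | Column dimension. |
| rank0 | Rank of L. |
| S.type | Type of S; the examples use "zero", "rand", "block", and "diag". |
| seed | Random seed. |
| family | GLM family; the examples use "binomial" and "gaussian". |
| N | Sample size. |
| C0 | Coefficient matrix. |
| err.student.dof | Degrees of freedom of the Student-t errors. |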
simu_reg_coefs: list(L, S).
simu_reg_data: list( train=list(x, y), test=list(x, y) ).
    p1 <- 10
    p2 <- 9
    rank0 <- 3
    # Coefficients under each type of S.
    coefs <- simu_reg_coefs(p1, p2, rank0, "zero")
    coefs <- simu_reg_coefs(p1, p2, rank0, "rand")
    coefs <- simu_reg_coefs(p1, p2, rank0, "block")
    coefs <- simu_reg_coefs(p1, p2, rank0, "diag")
    # Data simulated from the coefficient matrix C0 = L + S.
    N <- 100
    S.type <- "rand"
    family <- "binomial"
    coefs <- simu_reg_coefs(p1, p2, rank0, S.type)
    C0 <- coefs[["L"]] + coefs[["S"]]
    set.seed(2025)
    DT <- simu_reg_data(family, N, C0)
    DT <- simu_reg_data("gaussian", N, C0, err.student.dof=3)
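In a matrix-variate GLM, the coefficient matrix typically enters through the trace inner product between each covariate matrix and the coefficient matrix, that is, the sum of their elementwise products. The base R sketch below computes that linear predictor for all samples at once and then draws illustrative responses. The stand-in `x` and `C0`, the unit-variance Gaussian noise, and the logistic link are assumptions for illustration, not a description of what `simu_reg_data` does internally.

```r
set.seed(2025)
p1 <- 10; p2 <- 9; N <- 100

x  <- array(rnorm(p1 * p2 * N), c(p1, p2, N))   # stand-in covariates
C0 <- matrix(rnorm(p1 * p2), p1, p2)            # stand-in coefficient matrix

# Flatten each p1 x p2 slice into a column, so the inner product with C0
# becomes an ordinary dot product between column i and vec(C0).
X_flat <- matrix(as.vector(x), p1 * p2, N)
eta <- drop(crossprod(X_flat, as.vector(C0)))

# Same quantity, computed slice by slice as a check.
eta_loop <- apply(x, 3, function(Xi) sum(Xi * C0))

# Illustrative responses under two families.
y_gaussian <- eta + rnorm(N)
y_binomial <- rbinom(N, 1, plogis(eta))
```

The flattened form avoids an explicit loop over samples, which matters when N or the matrix dimensions are large.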
Simulation: Setups in Regularized Matrix Regression
    simu_zhouandli2014(N, p1, p2, r0, s0, family)
| Argument | Description |
|---|---|
| N | Sample size. |
| p1 | Row dimension. |
| p2 | Column dimension. |
| r0 | Rank. |
| s0 | Signal proportion. |
| family | GLM family; the example uses 'gaussian'. |
list( C0, train=list(x, y), test=list(x, y) ).
Zhou Hua and Li Lexin (2014). Regularized Matrix Regression. Journal of the Royal Statistical Society (Series B). https://doi.org/10.1111/rssb.12031
    N <- 100
    p1 <- 64
    p2 <- 64
    r0 <- 1
    s0 <- 0.05
    family <- 'gaussian'
    set.seed(2026)
    DT <- simu_zhouandli2014(N, p1, p2, r0, s0, family)
Doubly regularized decomposition for covariance matrix in factor model.
    tune_drr_factor_model(y, lambda1.factor = 1.1^(-2:2),
                          S.diag.penalize = FALSE,
                          tol = 0.001, maxIter = 500)
| Argument | Description |
|---|---|
| y | Numeric matrix. |
| lambda1.factor | Multiplicative factors applied to lambda1. |
| S.diag.penalize | Whether to penalize the diagonal elements of S. |
| tol | Convergence tolerance. |
| maxIter | Maximum number of iterations. |
List.
    set.seed(2025)
    N <- 500
    p <- 30
    r <- 10
    noise <- "diag"
    paras <- simu_factor_model_paras(p, r, noise)
    y <- simu_factor_model_data(N, paras[["B"]], paras[["S"]])
    lambda1.factor <- 1.1^(-1:1)
    system.time( egg <- tune_drr_factor_model(y, lambda1.factor) )
Tune the regularization parameters in drr.
    tune_drrglm(iniParas, x, y,
                lambda1.factor = 1.3^seq(-3, 3, 0.2),
                maxCard = 30, tol = 0.001, maxIter = 500)
| Argument | Description |
|---|---|
| iniParas | Initial values returned by ini_paras. See examples below. |
| x | Numeric array. |
| y | Numeric vector. |
| lambda1.factor | Factor on the initial guess of lambda1. |
| maxCard | Maximal cardinality (number of nonzero entries) of the sparse component S. |
| tol | Convergence tolerance. |
| maxIter | Maximum number of iterations. |
List.
    p1 <- 10
    p2 <- 9
    rank0 <- 3
    S.type <- "rand"
    coefs <- simu_reg_coefs(p1, p2, rank0, S.type)
    N <- 500
    family <- "gaussian"
    L0 <- coefs[["L"]]
    S0 <- coefs[["S"]]
    C0 <- L0 + S0
    DT <- simu_reg_data(family, N, C0)
    x <- DT[["train"]][["x"]]
    y <- DT[["train"]][["y"]]
    lambda1.factor <- 1.1^(0:1)
    maxRank <- 5
    # Initialize first, then tune over the grid of factors.
    system.time( iniParas <- ini_paras(x, y, family, maxRank) )
    system.time( drrObj <- tune_drrglm(iniParas, x, y, lambda1.factor) )
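The default `lambda1.factor = 1.3^seq(-3, 3, 0.2)` is a multiplicative grid of 31 factors, evenly spaced on the log scale and centred at 1. The sketch below shows how such a grid would scale a base value. The name `lambda_base` and its value are hypothetical placeholders for whatever initial guess the tuning routine starts from.

```r
lambda1.factor <- 1.3^seq(-3, 3, 0.2)    # default grid from the usage of tune_drrglm

lambda_base <- 0.15                      # hypothetical initial guess
lambda1_grid <- lambda_base * lambda1.factor

n_grid <- length(lambda1_grid)           # number of candidate values
grid_range <- range(lambda1.factor)      # from 1.3^-3 to 1.3^3
log_steps <- diff(log(lambda1.factor))   # constant step of 0.2 * log(1.3)
```

A log-spaced grid explores values below and above the initial guess symmetrically in relative terms; the example above narrows it to `1.1^(0:1)` to keep the run time short.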