| Title: | Partial Profile Score Feature Selection in High-Dimensional Generalized Linear Interaction Models |
|---|---|
| Description: | This is an implementation of the partial profile score feature selection (PPSFS) approach for generalized linear (interaction) models. PPSFS is highly scalable, even for ultra-high-dimensional feature spaces. See the paper by Xu, Luo and Chen (2022, <doi:10.4310/21-SII706>). |
| Authors: | Zengchao Xu [aut, cre], Shan Luo [aut], Zehua Chen [aut] |
| Maintainer: | Zengchao Xu <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.2 |
| Built: | 2026-05-28 07:42:29 UTC |
| Source: | https://github.com/paradoxical-rhapsody/ppsfs |

`ppsfs`: PPSFS for main effects.

`ppsfsi`: PPSFS for interaction effects.
```r
ppsfs(
  x, y, family,
  keep = NULL, I0 = NULL, ...,
  ebicFlag = 1,
  maxK = min(NROW(x) - 1, NCOL(x) + length(I0)),
  verbose = FALSE
)

ppsfsi(
  x, y, family,
  keep = NULL, ...,
  ebicFlag = 1,
  maxK = min(NROW(x) - 1, choose(NCOL(x), 2)),
  verbose = FALSE
)
```
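The default `maxK` values above are ordinary R expressions. As a quick check (not part of the package), the snippet below evaluates them for the dimensions used in the examples and builds the `choose(p, 2)` pairwise products that bound the `ppsfsi` default. The helper `pairwise_interactions()` and its `"j:k"` column labels are hypothetical; the labels only mirror the `keep = c(1, "5:8")` argument in the examples, and the package may index interactions differently.

```r
# Default maxK expressions from the usage, evaluated for the example sizes.
n <- 300
x_main <- matrix(0, n, 1000)   # p = 1000, as in the main-effect example
x_int  <- matrix(0, n, 150)    # p = 150, as in the interaction example
I0 <- NULL

maxK_ppsfs  <- min(NROW(x_main) - 1, NCOL(x_main) + length(I0))
maxK_ppsfsi <- min(NROW(x_int) - 1, choose(NCOL(x_int), 2))
c(maxK_ppsfs, maxK_ppsfsi, choose(150, 2))   # 299 299 11175

# Hypothetical helper: all pairwise products x_j * x_k with j < k.
# The "j:k" column labels are an assumption, not a documented convention.
pairwise_interactions <- function(x) {
  pairs <- combn(NCOL(x), 2)   # 2 x choose(p, 2) matrix of index pairs
  z <- x[, pairs[1, ], drop = FALSE] * x[, pairs[2, ], drop = FALSE]
  colnames(z) <- paste(pairs[1, ], pairs[2, ], sep = ":")
  z
}

z <- pairwise_interactions(matrix(rnorm(20), 5, 4))
dim(z)        # 5 6
colnames(z)   # "1:2" "1:3" "1:4" "2:3" "2:4" "3:4"
```

In both examples the sample size, not the number of candidates, is the binding limit (n - 1 = 299). Materializing every pair is only practical for small p; the sketch shows the size of the candidate space, not how `ppsfsi` searches it.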
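
| Argument | Description |
|---|---|
| `x` | Matrix of candidate features, one row per observation. |
| `y` | Response vector. |
| `family` | Model family, e.g. `"gaussian"` as in the examples. |
| `keep` | Initial set of features that are included in model fitting. |
| `I0` | Index set of interaction effects to be identified (e.g. the output of `ppsfsi`, as in the examples). |
| `...` | Additional arguments passed to `glm.fit`. |
| `ebicFlag` | The procedure stops when the EBIC increases after |
| `maxK` | Maximum number of features to identify. |
| `verbose` | Logical; if `TRUE`, print the selection path. |
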
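The `ebicFlag` and `maxK` arguments control when the search stops. Because the description of `ebicFlag` is cut off above, the following is only a sketch under assumptions: it uses the common EBIC form `-2 log L + k log(n) + 2 * gamma * log(choose(p, k))` with a hypothetical `gamma = 1`, scores a hand-picked nested path with a Gaussian likelihood (up to an additive constant), and stops at the first increase. The functions `ebic()` and `gauss_neg2ll()` and the path itself are illustrative, not the package's code.

```r
# Hypothetical helper: EBIC of a submodel with k of p candidate features.
# gamma = 1 is an assumed default; the package's value is not documented here.
ebic <- function(neg2loglik, k, n, p, gamma = 1) {
  neg2loglik + k * log(n) + 2 * gamma * lchoose(p, k)
}

# Hypothetical helper: Gaussian -2 log-likelihood of intercept + x[, S],
# up to an additive constant that is the same for every submodel.
gauss_neg2ll <- function(x, y, S) {
  X <- cbind(1, x[, S, drop = FALSE])
  rss <- sum(lm.fit(X, y)$residuals^2)
  length(y) * log(rss / length(y))
}

# Simulated data generated as in the first example.
set.seed(2022)
n <- 300; p <- 1000
x <- matrix(rnorm(n * p), n)
eta <- drop(x[, 1:3] %*% runif(3, 1.0, 1.5))
y <- eta + rnorm(n, sd = sd(eta) / 5)

# Hand-picked nested path: the three active features, then two noise features.
path <- c(1, 2, 3, 4, 5)
scores <- sapply(seq_along(path), function(k)
  ebic(gauss_neg2ll(x, y, path[seq_len(k)]), k, n, p))

# Keep the path up to the last step before EBIC first increases.
stop_at <- which(diff(scores) > 0)[1]
if (is.na(stop_at)) stop_at <- length(path)
selected <- path[seq_len(stop_at)]
```

Each added noise feature buys roughly one unit of fit but costs about `log(n) + 2 * log(p / k)` in penalty, which is why the EBIC turns upward once the active features are in.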

Note that `ppsfs(x, y, family = "gaussian")` is an implementation of the sequential lasso method proposed by Luo and Chen (2014, <doi:10/f6kfr6>).
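For the Gaussian case, the link to the sequential lasso can be illustrated with a greedy forward search in which each step adds the feature with the largest absolute partial score `|x_j' (I - H_S) y|`, where `H_S` is the hat matrix of the current model (intercept plus selected set `S`). This is a simplified sketch for intuition only: `greedy_partial_score()` is hypothetical, runs a fixed number of steps with no EBIC stopping, and omits the standardization and other details of `ppsfs`.

```r
# Hypothetical helper: greedy forward selection by absolute partial score.
greedy_partial_score <- function(x, y, steps) {
  S <- integer(0)
  for (k in seq_len(steps)) {
    X <- cbind(1, x[, S, drop = FALSE])   # intercept + selected features
    r <- lm.fit(X, y)$residuals           # (I - H_S) y
    score <- abs(drop(crossprod(x, r)))   # |x_j' (I - H_S) y| for every j
    score[S] <- -Inf                      # never reselect a feature
    S <- c(S, which.max(score))
  }
  S
}

# Simulated data generated as in the first example.
set.seed(2022)
n <- 300; p <- 1000
x <- matrix(rnorm(n * p), n)
eta <- drop(x[, 1:3] %*% runif(3, 1.0, 1.5))
y <- eta + rnorm(n, sd = sd(eta) / 5)

S <- greedy_partial_score(x, y, steps = 3)
sort(S)
```

On data like this the three active features have far larger scores than the noise columns, so a search of this kind tends to pick them first; a stopping rule such as the EBIC then decides how many steps to take.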

Returns the index set of identified features.

Z. Xu, S. Luo and Z. Chen (2022). Partial profile score feature selection in high-dimensional generalized linear interaction models. Statistics and Its Interface. doi:10.4310/21-SII706
```r
## ***************************************************
## Identify main-effect features
## ***************************************************
set.seed(2022)
n <- 300
p <- 1000
x <- matrix(rnorm(n*p), n)
eta <- drop( x[, 1:3] %*% runif(3, 1.0, 1.5) )
y <- eta + rnorm(n, sd=sd(eta)/5)
print( A <- ppsfs(x, y, 'gaussian', verbose=TRUE) )

## ***************************************************
## Identify interaction effects
## ***************************************************
set.seed(2022)
n <- 300
p <- 150
x <- matrix(rnorm(n*p), n)
eta <- drop( cbind(x[, 1:3], x[, 4:6]*x[, 7:9]) %*% runif(6, 1.0, 1.5) )
y <- eta + rnorm(n, sd=sd(eta)/5)
print( group <- ppsfsi(x, y, 'gaussian', verbose=TRUE) )
print( A <- ppsfs(x, y, "gaussian", I0=group, verbose=TRUE) )
print( A <- ppsfs(x, y, "gaussian", keep=c(1, "5:8"), I0=group, verbose=TRUE) )
```

`ppsfs.fit`: a simplified but more stable implementation of main-effect selection than `ppsfs`. It omits the standardization step but adds weights to the partial profile score.
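The exact weighted score used by `ppsfs.fit` is not given here. As a rough illustration of the ingredients, the sketch below fits a logistic model on a selected set `S` with `glm.fit`, takes the IRLS working weights `w`, and computes for every candidate `j` the raw score `u_j = x_j' (y - mu)` and a standardized version `u_j / sqrt(v_j)`, where `v_j = x_j' W x_j - x_j' W X (X' W X)^{-1} X' W x_j` is the information left after profiling out the current coefficients. This is the textbook Rao score statistic, used only as a stand-in; `candidate_scores()` is hypothetical and need not match how the package weights or standardizes.

```r
# Hypothetical helper: score every column of x given a logistic fit on S.
candidate_scores <- function(x, y, S) {
  X <- cbind(1, x[, S, drop = FALSE])             # intercept + selected features
  fit <- glm.fit(X, y, family = binomial())
  mu <- fit$fitted.values
  w <- fit$weights                                # IRLS working weights, mu * (1 - mu) here
  u <- drop(crossprod(x, y - mu))                 # raw score x_j' (y - mu)
  WX <- X * w                                     # rows of X scaled by w
  A <- crossprod(x, WX)                           # row j holds x_j' W X
  B <- solve(crossprod(X, WX))                    # (X' W X)^{-1}
  v <- colSums(x^2 * w) - rowSums((A %*% B) * A)  # partial information v_j
  v[S] <- NA                                      # selected features: nothing left to score
  list(raw = u, standardized = u / sqrt(v))
}

set.seed(1)
n <- 300; p <- 50
x <- matrix(rnorm(n * p), n)
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 2 * x[, 2]))

sc <- candidate_scores(x, y, S = 1)
best <- which.max(abs(sc$standardized))  # strongest candidate given x[, 1]
```

Dividing by `sqrt(v_j)` puts candidates on a common scale; that is one common form of standardization, and the raw score `u_j` is returned alongside it for comparison.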
```r
ppsfs.fit(
  x, y, family,
  fitFun = glm.fit, ...,
  keep = NULL, maxK = NULL,
  verbose = FALSE
)
```
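
| Argument | Description |
|---|---|
| `x` | Matrix of candidate features, one row per observation. |
| `y` | Response vector. |
| `family` | Model family, e.g. `"gaussian"` as in the example. |
| `fitFun` | Fitting function; `glm.fit` by default. |
| `...` | Additional arguments passed to `glm.fit`. |
| `keep` | Initial index set of features that are included in model fitting. |
| `maxK` | Maximum number of features to identify. |
| `verbose` | Logical; if `TRUE`, print the feature path. |
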
Note that `ppsfs(x, y, family = "gaussian")` is an implementation of the sequential lasso method proposed by Luo and Chen (2014, <doi:10/f6kfr6>).

Z. Xu, S. Luo and Z. Chen (2022). Partial profile score feature selection in high-dimensional generalized linear interaction models. Statistics and Its Interface. doi:10.4310/21-SII706
```r
set.seed(2025)
n <- 300
p <- 1000
x <- matrix(rnorm(n*p), n)
eta <- drop( x[, 1:3] %*% runif(3, 1.0, 1.5) )
y <- eta + rnorm(n, sd=sd(eta)/5)
print( A <- ppsfs.fit(x, y, 'gaussian', verbose=TRUE) )
```