Title: | Miniature Logistic-Normal Multinomial Models |
---|---|
Description: | Logistic-normal Multinomial (LNM) models are common in problems with multivariate count data. This package gives a simple implementation with a 30 line 'Stan' script. This lightweight implementation makes it an easy starting point for other projects, in particular for downstream tasks that require analysis of "compositional" data. It can be applied whenever a multinomial probability parameter is thought to depend linearly on inputs in a transformed, log ratio space. Additional utilities make it easy to inspect, create predictions, and draw samples using the fitted models. More about the LNM can be found in Xia et al. (2013) "A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis" <doi:10.1111/biom.12079> and Sankaran and Holmes (2023) "Generative Models: An Interdisciplinary Perspective" <doi:10.1146/annurev-statistics-033121-110134>. |
Authors: | Kris Sankaran [aut, cre] |
Maintainer: | Kris Sankaran |
License: | CC0 |
Version: | 0.1.0 |
Built: | 2025-02-11 05:45:09 UTC |
Source: | https://github.com/krisrs1128/minilnm |
Helper function for printing ANSI escape sequences (e.g., colored text) in R Markdown output. Use this at the start of your R Markdown files so that colors in printed object names are kept in the final compiled output.
ansi_aware_handler(x, options)
x | A character vector potentially including ANSI escape sequences. |
options | Unused placeholder argument. |
Taken from the post at
https://blog.djnavarro.net/posts/2021-04-18_pretty-little-clis/
A string with HTML reformatted to ensure colors appear in printed code blocks in rmarkdown output.
knitr::knit_hooks$set(output = ansi_aware_handler)
options(crayon.enabled = TRUE)
Average the variational Bayes (VB) posterior samples of the beta parameter to estimate its posterior mean. This is used to get predicted compositions when calling predict on an lnm model.
beta_mean(fit)
fit | An object of class lnm. |
A matrix holding the posterior mean of the beta parameter of the LNM model, with rows corresponding to predictors and columns to outcomes.
example_data <- lnm_data(N = 50, K = 10)
xy <- dplyr::bind_cols(example_data[c("X", "y")])
fit <- lnm(
  starts_with("y") ~ starts_with("x"),
  xy, iter = 25, output_samples = 25
)
beta_mean(fit)
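To illustrate what averaging posterior draws amounts to, the base-R sketch below assumes the draws are stored as an S x D x K array (S draws of a D x K coefficient matrix) and averages over the draw dimension. The array layout and the helper name beta_mean_sketch are hypothetical; this is not how beta_mean() reads draws from the 'rstan' fit.

```r
# Hypothetical storage: S posterior draws of a D x K coefficient matrix,
# held in an S x D x K array (not the actual 'rstan' layout).
set.seed(1)
S <- 25; D <- 3; K <- 4
beta_draws <- array(rnorm(S * D * K), dim = c(S, D, K))

# Average over the first (draw) dimension to get a D x K posterior mean.
beta_mean_sketch <- function(draws) {
  apply(draws, c(2, 3), mean)
}

b_hat <- beta_mean_sketch(beta_draws)
dim(b_hat)
```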
Return multiple draws of the beta parameter from the VB posterior. This is used to simulate new compositions when calling sample on an lnm model.
beta_samples(fit, size = 1)
fit | An object of class lnm. |
size | The number of draws from the posterior to return. |
A matrix whose rows are predictors and columns are outcomes in the beta parameter for the LNM model.
example_data <- lnm_data(N = 50, K = 10)
xy <- dplyr::bind_cols(example_data[c("X", "y")])
fit <- lnm(
  starts_with("y") ~ starts_with("x"),
  xy, iter = 25, output_samples = 25
)
beta_samples(fit, size = 2)
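Drawing several coefficient matrices can be pictured as picking stored posterior draws at random. The sketch below assumes the same hypothetical S x D x K array of draws and resamples draw indices with replacement; the layout and the name beta_samples_sketch are illustrative assumptions, not the package internals.

```r
# Hypothetical storage: S posterior draws of a D x K coefficient matrix.
set.seed(2)
S <- 25; D <- 3; K <- 4
beta_draws <- array(rnorm(S * D * K), dim = c(S, D, K))

# Pick `size` stored draws at random (with replacement) and return them
# as a size x D x K array.
beta_samples_sketch <- function(draws, size = 1) {
  idx <- sample(dim(draws)[1], size, replace = TRUE)
  draws[idx, , , drop = FALSE]
}

b_draws <- beta_samples_sketch(beta_draws, size = 2)
dim(b_draws)
```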
This function fits a logistic normal multinomial (LNM) model to the data using R's formula interface. The LNM model generalizes multinomial logistic regression by adding multivariate normal variation on the log-ratio scale, which allows for overdispersion and for correlation across the categories of the response. It can be used to learn the relationship between experimental or environmental factors and community composition: it estimates the probabilities of the different outcomes in a multinomial distribution given a set of covariates, assuming that log-ratios of the outcome probabilities follow a multivariate normal distribution. By fitting the LNM model to observed data, we can infer the effects of the covariates on the outcome compositions.
lnm(formula, data, sigma_b = 2, l1 = 10, l2 = 10, ...)
formula | A formula specifying the model structure. |
data | A data frame containing the variables specified in the formula. |
sigma_b | The prior standard deviation of the beta coefficients in the LNM model. See the 'Stan' code definition in inst/stan/lnm.stan for the full model specification. |
l1 | The first inverse gamma hyperprior parameter for sigmas_mu. |
l2 | The second inverse gamma hyperprior parameter for sigmas_mu. |
... | Additional arguments passed to the underlying vb() call from 'rstan'. |
An object of class "lnm" representing the fitted LNM model.
example_data <- lnm_data(N = 50, K = 10)
xy <- dplyr::bind_cols(example_data[c("X", "y")])
fit <- lnm(
  starts_with("y") ~ starts_with("x"),
  xy, iter = 25, output_samples = 25
)
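For orientation, a generic LNM in the spirit of Xia et al. (2013) can be written as below, where x_i is the covariate vector of sample i, n_i its total count, and \phi^{-1} the inverse log-ratio map documented under phi_inverse. This is a sketch under assumptions: the zero-mean normal prior on B uses the sigma_b argument, while the form of the covariance \Sigma and the way the inverse gamma hyperparameters l1 and l2 enter are left abstract; the exact specification lives in inst/stan/lnm.stan and may differ.

```latex
\begin{aligned}
y_i \mid \mu_i &\sim \operatorname{Multinomial}\left(n_i,\ \phi^{-1}(\mu_i)\right) \\
\mu_i &\sim \mathcal{N}\left(B^\top x_i,\ \Sigma\right) \\
B_{dk} &\sim \mathcal{N}\left(0,\ \sigma_b^2\right)
\end{aligned}
```

Because \phi^{-1} appends a reference coordinate, \mu_i has one fewer coordinate than there are outcome categories.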
Simulates data from a Logistic Normal Multinomial Model.
lnm_data(N = 100, D = 5, K = 10)
N | The number of samples in the output data. |
D | The number of covariates, each of which can influence the response composition vector (e.g., the timepoint or disease status). |
K | The number of output dimensions (e.g., number of taxa). |
A list with the following components:
An N x D matrix of covariates.
The N x K matrix of simulated counts.
The D x K coefficient matrix relating covariates to outputs.
lnm_data(5, 3, 3)
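The kind of simulation described above can be sketched in base R: draw covariates and coefficients, form a noisy linear predictor on the log-ratio scale, map it to compositions, and draw multinomial counts. The function simulate_lnm_sketch is hypothetical; the distributions, the noise scale, the fixed depth, and the use of K - 1 log-ratio coordinates are assumptions and may not match what lnm_data() does.

```r
# Hypothetical simulator; all distributional choices are illustrative.
simulate_lnm_sketch <- function(N = 100, D = 5, K = 10, depth = 1000) {
  X <- matrix(rnorm(N * D), N, D)              # N x D covariates
  B <- matrix(rnorm(D * (K - 1)), D, K - 1)    # D x (K - 1) log-ratio coefficients
  # Linear predictor plus independent normal noise on the log-ratio scale
  # (a diagonal covariance, chosen only for simplicity).
  mu <- X %*% B + matrix(rnorm(N * (K - 1), sd = 0.5), N, K - 1)
  # Inverse additive log-ratio: append a reference coordinate of 0, then softmax.
  probs <- t(apply(mu, 1, function(m) {
    z <- exp(c(m, 0) - max(c(m, 0)))
    z / sum(z)
  }))
  # One multinomial draw of total `depth` per sample.
  y <- t(apply(probs, 1, function(p) rmultinom(1, depth, p)[, 1]))
  list(X = X, y = y, B = B)
}

set.seed(3)
sim <- simulate_lnm_sketch(N = 5, D = 3, K = 3)
str(sim)
```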
The Logistic Normal Multinomial (LNM) model is used to learn the relationship between experimental or environmental factors and community composition. It is a statistical model that estimates the probabilities of different outcomes in a multinomial distribution, given a set of covariates. The LNM model assumes that log-ratios of the outcome probabilities follow a multivariate normal distribution. By fitting the LNM model to observed data, we can infer the effects of the covariates on the outcome compositions.
This class combines all information into three slots:
estimate | The fitted logistic normal multinomial model, with parameter B relating covariates to outcome compositions. |
template | The data used to estimate the parameters in the estimate slot. |
formula | The R formula representation of the relationship between output compositions and input variables. |
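The three-slot layout can be mirrored with a small S4 class definition. The class name lnm_sketch and the slot types below are assumptions chosen for illustration (for instance, the real estimate slot holds the 'rstan' fit, which is represented here only by a placeholder list).

```r
library(methods)

# Hypothetical class with the three documented slots; slot types are assumptions.
setClass("lnm_sketch", slots = c(
  estimate = "ANY",        # stands in for the fitted model object
  template = "data.frame", # data used for fitting
  formula = "formula"      # relationship between outputs and inputs
))

obj <- new("lnm_sketch",
  estimate = list(B = matrix(0, 2, 3)),
  template = data.frame(x1 = 1:2, y1 = 3:4),
  formula = y1 ~ x1
)
slotNames(obj)
obj@formula
```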
This function applies an inverse log-ratio transformation to a vector: it appends a reference coordinate and maps the result to a composition whose entries lie in the range (0, 1).
phi_inverse(mu)
mu | A numeric vector to transform using an inverse log-ratio transformation. |
A numeric vector one element longer than mu (a reference coordinate is added), with values mapped to the range (0, 1).
phi_inverse(c(-5, 0, 5))
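A common way to realize such a map is the inverse additive log-ratio: append a zero for the reference coordinate and apply a softmax, so the entries are positive and sum to one. The sketch below assumes the reference coordinate goes last; phi_inverse_sketch is a hypothetical name, and the package's phi_inverse() may place or scale the reference differently.

```r
# Inverse additive log-ratio sketch. Assumption: the reference coordinate
# is appended last with log-ratio 0.
phi_inverse_sketch <- function(mu) {
  z <- c(mu, 0)
  z <- exp(z - max(z))   # subtract the max for numerical stability
  z / sum(z)
}

p <- phi_inverse_sketch(c(-5, 0, 5))
p
sum(p)
```

Subtracting the maximum before exponentiating leaves the result unchanged mathematically but avoids overflow for large log-ratios.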
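Given an input dataset, predict the output composition. Specifically, this outputs $\phi^{-1}(X\hat{B})$, where $\phi^{-1}$ is the inverse log-ratio transformation applied to each row, $X$ is the design matrix, and $\hat{B}$ is the fitted coefficient matrix.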
## S4 method for signature 'lnm'
predict(object, newdata = NULL, ...)
object | An object of class lnm with fitted parameters. |
newdata | New samples on which to form predictions. Defaults to NULL, in which case predictions are made at the same design points as those used during the original training. |
... | Additional keyword arguments, for consistency with R's predict generic (never used). |
A matrix with predictions along rows and outcomes along columns. Rows sum up to one.
example_data <- lnm_data(N = 50, K = 10)
xy <- dplyr::bind_cols(example_data[c("X", "y")])
fit <- lnm(
  starts_with("y") ~ starts_with("x"),
  xy, iter = 25, output_samples = 25
)
head(predict(fit))
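The prediction step amounts to a matrix product on the log-ratio scale followed by a row-wise inverse log-ratio. The sketch below uses random placeholders for the design matrix and for a hypothetical posterior mean B_hat; phi_inverse_rows assumes the reference coordinate is appended last, which may differ from the package.

```r
# Row-wise inverse additive log-ratio (reference coordinate appended last; assumption).
phi_inverse_rows <- function(mu) {
  z <- cbind(mu, 0)
  z <- exp(z - apply(z, 1, max))  # subtract each row's max for stability
  z / rowSums(z)
}

set.seed(4)
X <- matrix(rnorm(6 * 2), 6, 2)      # 6 design points, 2 covariates
B_hat <- matrix(rnorm(2 * 3), 2, 3)  # hypothetical posterior mean, 3 log-ratios
pred <- phi_inverse_rows(X %*% B_hat)
rowSums(pred)
```

Each row of pred is a composition over 4 outcomes, so every row sums to one, matching the documented return value.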
This is a helper function to form the design matrix for an LNM regression starting from a fitted model's formula object. It is an analog of model.matrix for the multiresponse setting.
prepare_newdata(fit, newdata = NULL)
fit | An object of class lnm. |
newdata | A data.frame containing variables in the formula definition of the fit, but which hasn't been converted into the matrix format needed for internal prediction. |
A design matrix that can be multiplied by the fitted beta parameter to obtain fitted compositions (after the inverse log-ratio transformation).
example_data <- lnm_data(N = 10, K = 5)
xy <- dplyr::bind_cols(example_data[c("X", "y")])
fit <- lnm(
  starts_with("y") ~ starts_with("x"),
  xy, iter = 5, output_samples = 5
)
prepare_newdata(fit, example_data[["X"]])
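For a formula such as starts_with("y") ~ starts_with("x"), building the design matrix roughly means keeping the covariate columns that match the right-hand side and converting them to a numeric matrix. The sketch below is a hypothetical stand-in (prepare_newdata_sketch, the prefix argument, and the absence of an intercept column are all assumptions); it mimics tidyselect's case-insensitive prefix matching with base R.

```r
# Hypothetical: keep columns whose names start with `prefix`
# (case-insensitively, like tidyselect::starts_with()) and return a matrix.
prepare_newdata_sketch <- function(newdata, prefix = "x") {
  keep <- startsWith(tolower(names(newdata)), tolower(prefix))
  as.matrix(newdata[, keep, drop = FALSE])
}

newdata <- data.frame(x1 = c(0.1, 0.2), x2 = c(1, 2), y1 = c(5, 7))
M <- prepare_newdata_sketch(newdata)
M
```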
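Given an input dataset, sample compositions that are consistent with the input. Specifically, this samples from a multinomial with mean composition $\phi^{-1}(X\hat{B})$, where $\hat{B}$ is a draw of the beta parameter from the fitted posterior. The default depth is 5e4; modify the depth argument to change this.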
## S4 method for signature 'lnm'
sample(x, size = 1, depth = 50000, newdata = NULL, ...)
x | An object of class lnm with fitted parameters. |
size | The number of samples to generate. |
depth | The depth to use when sampling the multinomial for each simulated element. |
newdata | New samples on which to form predictions. Defaults to NULL, in which case predictions are made at the same design points as those used during the original training. |
... | Additional keyword arguments, for consistency with R's sample generic (never used). |
A matrix of dimension size x n_outcomes, where each row represents one sample from the posterior predictive of the fitted logistic-normal multinomial model. Each row sums to the depth argument, which defaults to 5e4.
example_data <- lnm_data(N = 50, K = 10)
xy <- dplyr::bind_cols(example_data[c("X", "y")])
fit <- lnm(
  starts_with("y") ~ starts_with("x"),
  xy, iter = 25, output_samples = 25
)
head(sample(fit))
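The final multinomial step can be sketched with base R's rmultinom(): given one composition, draw size count vectors, each totaling depth. The helper sample_counts_sketch is hypothetical, and how the package's sample() combines several design points or posterior draws is not modeled here.

```r
# Draw `size` count vectors of total `depth` from one composition `prob`.
sample_counts_sketch <- function(prob, size = 1, depth = 50000) {
  t(rmultinom(size, depth, prob))  # rmultinom returns K x size; transpose to size x K
}

set.seed(5)
counts <- sample_counts_sketch(c(0.2, 0.3, 0.5), size = 3)
rowSums(counts)  # each row sums to depth
```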