Assistant Professor
Department of Statistics
University of Wisconsin
Madison, Wisconsin

In microbiome and genomic studies, regression on compositional data is a crucial tool for identifying microbial taxa or genes associated with clinical phenotypes. To account for variation in sequencing depth and the high dimensionality of read counts, a high-dimensional log-contrast model is often used, in which the log compositions of read counts serve as covariates.
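
As a rough illustration of the log-contrast idea (not the speaker's high-dimensional method), the sketch below fits a low-dimensional log-contrast regression by ordinary least squares under a zero-sum constraint on the coefficients. The simulated data, the function name `fit_log_contrast`, and the coefficient values are all assumptions made for this example. The zero-sum constraint is what makes the fit insensitive to per-sample scaling such as sequencing depth.

```python
import numpy as np

def fit_log_contrast(X, y):
    """Least squares for y = log(X) @ beta + noise, subject to sum(beta) = 0.

    Solves the KKT system of the equality-constrained problem directly.
    This is a plain low-dimensional fit, not an L1-regularized estimator.
    """
    Z = np.log(X)
    p = Z.shape[1]
    ones = np.ones((p, 1))
    kkt = np.block([[Z.T @ Z, ones], [ones.T, np.zeros((1, 1))]])
    rhs = np.concatenate([Z.T @ y, [0.0]])
    return np.linalg.solve(kkt, rhs)[:p]

# Hypothetical toy data: strictly positive abundances, so no zeros arise here.
rng = np.random.default_rng(0)
W = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 5))  # unnormalized abundances
X = W / W.sum(axis=1, keepdims=True)                   # compositions (rows sum to 1)

beta_true = np.array([1.0, -1.0, 0.5, -0.5, 0.0])      # assumed; sums to zero
y = np.log(X) @ beta_true + 0.1 * rng.standard_normal(200)

beta_hat = fit_log_contrast(X, y)

# Because the coefficients sum to zero, rescaling each sample (for example,
# by its sequencing depth) leaves the fitted values unchanged.
fitted_from_compositions = np.log(X) @ beta_hat
fitted_from_abundances = np.log(W) @ beta_hat
```

The high-dimensional setting described in this abstract replaces the plain least-squares step with a regularized estimator, and zero counts would need separate handling, which this toy example sidesteps by construction.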

In this talk, we first introduce a method for constructing confidence intervals and p-values for such regression. Our method is based on a de-biased version of the L1-regularized constrained estimator. Theoretical justifications and simulation results are provided to show the validity of the confidence intervals. We also extend our method to include multiple linear constraints imposed on subcompositions, so that the subcompositional coherence property holds. Simulation results show that imposing linear constraints can lead to shorter confidence intervals. The method is applied to a microbiome study that relates body mass index to human gut microbiome composition.

Then, we address the issues of zero read counts and high variability in the sequencing reads. We introduce a surprisingly simple, interpretable, and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method both corrects for possible overdispersion in the sequencing data and avoids any subjective imputation of zero read counts. We provide theoretical justification with matching upper and lower bounds for the estimation error. We also consider a general log-error-in-variable regression model and a corresponding method to accommodate broader situations. The merit of the procedure is illustrated through real data analysis and simulation studies.

Sponsor(s)
Medicine: Biostatistics
Speaker(s)
Pixu Shi, Ph.D.
Audience
All (Open to the public)