Title: | Bayesian Multivariate Analysis of Summary Statistics |
---|---|
Description: | Multivariate tool for analyzing genome-wide association study results in the form of univariate summary statistics. The goal of 'bmass' is to comprehensively test all possible multivariate models given the phenotypes and datasets provided. Multivariate models are determined by assigning each phenotype to being either Unassociated (U), Directly associated (D) or Indirectly associated (I) with the genetic variant of interest. Test results for each model are presented in the form of Bayes factors, thereby allowing direct comparisons between models. The underlying framework implemented here is based on the modeling developed in "A Unified Framework for Association Analysis with Multiple Related Phenotypes", M. Stephens (2013) <doi:10.1371/journal.pone.0065245>. |
Authors: | Michael Turchin [aut, cre], Matthew Stephens [aut], Peter Carbonetto [ctb] |
Maintainer: | Michael Turchin <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.3 |
Built: | 2025-01-26 04:40:34 UTC |
Source: | https://github.com/mturchin20/bmass |
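This is a rough illustration of the model space described above, not the package's internal code. It enumerates every assignment of K phenotypes to U, D, or I using base R's expand.grid. The numeric coding 0 = U, 1 = D, 2 = I and the function name enumerate_udi_models are assumptions made here for illustration.

```r
# Hypothetical helper (not part of bmass): enumerate all U/D/I assignments.
# Coding assumed here: 0 = U (unassociated), 1 = D (direct), 2 = I (indirect).
enumerate_udi_models <- function(phenotypes) {
  grid <- expand.grid(rep(list(0:2), length(phenotypes)))
  colnames(grid) <- phenotypes
  as.matrix(grid)
}

models <- enumerate_udi_models(c("HDL", "LDL", "TG", "TC"))
nrow(models)                          # 3^4 = 81 assignments

# A model with no D phenotype implies no association with the variant,
# so it is equivalent to the null model (all U)
has_direct <- apply(models == 1, 1, any)
sum(has_direct)                       # 81 - 2^4 = 65
```

With four phenotypes there are 81 assignments, and the 16 that contain no D phenotype carry no association signal, which is why one might set them aside.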
bmass
Run bmass on a set of phenotypes that each have univariate GWAS statistics on the same set of SNPs
bmass(DataSources, GWASsnps = NULL, SNPMarginalUnivariateThreshold = 1e-06, SNPMarginalMultivariateThreshold = 1e-06, GWASThreshFlag = TRUE, GWASThreshValue = 5e-08, NminThreshold = 0, PrintMergedData = FALSE, PrintProgress = FALSE, ...)
DataSources |
A character vector giving the variable names of the input data files and phenotypes. No default value. |
GWASsnps |
A data.table containing rows of SNPs that were
univariate genome-wide significant in the phenotypes being used for
analysis; |
SNPMarginalUnivariateThreshold |
A numerical value indicating
the univariate p-value threshold to use when collecting marginally
significant SNPs for final |
SNPMarginalMultivariateThreshold |
A numerical value
indicating the basic multivariate p-value threshold to use when
collecting marginally significant SNPs for final |
GWASThreshFlag |
A logical |
GWASThreshValue |
A numerical value indicating the univariate
p-value threshold to use in conjunction with the |
NminThreshold |
A numerical value giving a sample size threshold; SNPs with a sample size below this value are removed. Default is 0. |
PrintMergedData |
A logical |
PrintProgress |
A logical |
... |
Additional optional arguments. |
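Here is a minimal base-R sketch of how the threshold arguments above might act on summary statistics. It assumes two-sided univariate p-values, a "+"/"-" direction per SNP and phenotype, and a basic multivariate statistic Z' V^-1 Z compared against a chi-squared distribution with one degree of freedom per phenotype, where V is the correlation of Z-scores across phenotypes. Several choices are assumptions for illustration, not the package's code: keeping a SNP if it passes either p-value threshold, using the smallest sample size across phenotypes for NminThreshold, and the function names signed_z and filter_snps.

```r
# Hypothetical sketch (names and filtering rule are assumptions, not bmass code).
# Convert a two-sided p-value and a "+"/"-" direction into a signed Z-score.
signed_z <- function(p, direction) {
  ifelse(direction == "-", -1, 1) * qnorm(p / 2, lower.tail = FALSE)
}

# pvals, directions, n: SNPs x phenotypes matrices; V: correlation matrix of
# Z-scores across phenotypes (assumed estimated beforehand, e.g. from null SNPs).
filter_snps <- function(pvals, directions, n, V,
                        uni_thresh = 1e-6, multi_thresh = 1e-6, n_min = 0) {
  Z <- signed_z(pvals, directions)
  stat <- rowSums((Z %*% solve(V)) * Z)              # Z' V^-1 Z per SNP
  multi_p <- pchisq(stat, df = ncol(Z), lower.tail = FALSE)
  uni_pass <- apply(pvals, 1, min) < uni_thresh      # any phenotype below threshold
  n_pass <- apply(n, 1, min) >= n_min                # drop SNPs with too small N
  list(Z = Z, multi_p = multi_p,
       keep = (uni_pass | multi_p < multi_thresh) & n_pass)
}

pvals <- rbind(c(1e-8, 0.2), c(0.01, 0.02), c(0.5, 0.9))
dirs  <- rbind(c("+", "-"), c("+", "+"), c("-", "+"))
n     <- rbind(c(90000, 85000), c(90000, 40000), c(90000, 90000))
V     <- matrix(c(1, 0.2, 0.2, 1), nrow = 2)
res <- filter_snps(pvals, dirs, n, V, n_min = 50000)
res$keep                                             # TRUE FALSE FALSE
```

The second toy SNP is dropped both because neither p-value passes and because one of its sample sizes falls below n_min.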
A list containing model, SNP, and posterior information for both the previously significant univariate SNPs (PreviousSNPs) and the newly significant multivariate SNPs (NewSNPs). For a full breakdown of the bmass output list structure, please see the associated vignettes.
bmass(c("HDL","LDL","TG","TC"), GWASsnps, NminThreshold = 50000)
bmass(c("HDL","LDL","TG","TC"), GWASsnps, GWASThreshValue = 1e-8,
NminThreshold = 50000, PrintProgress = TRUE)
bmass(c("HDL", "LDL", "TG", "TC"), GWASsnps, GWASThreshFlag = FALSE,
SNPMarginalUnivariateThreshold = 1e-4,
SNPMarginalMultivariateThreshold = 1e-4,
PrintMergedData = TRUE)
bmassOutput <- bmass(c("HDL","LDL","TG","TC"),
GWASsnps, NminThreshold = 50000)
Phenotypes <- c("bmass_SimulatedData1", "bmass_SimulatedData2")
bmassOutput <- bmass(Phenotypes, bmass_SimulatedSigSNPs)
summary(bmassOutput)
bmassOutput$NewSNPs$SNPs
A manually created sample dataset for use in Roxygen2 documents and vignettes.
A data frame with 11 rows and 9 variables:
chromosome
basepair position
rsID or other SNP identifier
minor allele frequency
reference allele
alternative allele
direction of the association effect size (+ or -)
p-value of the GWAS association
sample size
Manually created
A manually created sample dataset for use in Roxygen2 documents and vignettes.
A data frame with 11 rows and 9 variables:
chromosome
basepair position
rsID or other SNP identifier
minor allele frequency
reference allele
alternative allele
direction of the association effect size (+ or -)
p-value of the GWAS association
sample size
Manually created
A manually created list of GWAS significant SNPs to be used in conjunction with 'bmass_SimulatedData1' and 'bmass_SimulatedData2'.
A data frame with 2 rows and 2 variables:
chromosome
basepair position
Manually created
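This toy example sketches how a per-phenotype summary statistics table with the nine variables listed above might be matched against a two-column list of significant SNPs on chromosome and base-pair position. The documentation describes the variables but not their names, so the column names (Chr, BP, Marker, MAF, A1, A2, Direction, pValue, N), the GWASsig flag, and all values are made-up assumptions for illustration.

```r
# Hypothetical column names and toy values for the nine variables described above.
toy_data <- data.frame(
  Chr = c(1, 1, 2), BP = c(1000, 2000, 5000),
  Marker = c("rs1", "rs2", "rs3"), MAF = c(0.10, 0.35, 0.22),
  A1 = c("A", "C", "G"), A2 = c("G", "T", "A"),
  Direction = c("+", "-", "+"), pValue = c(3e-9, 0.04, 2e-7),
  N = c(90000, 88000, 91000), stringsAsFactors = FALSE
)
# Two-column list of GWAS significant SNPs (chromosome, base-pair position)
sig_snps <- data.frame(Chr = c(1, 2), BP = c(1000, 5000))

# Left join on (Chr, BP): rows with no match get NA, which is then set to FALSE
flagged <- merge(toy_data, transform(sig_snps, GWASsig = TRUE),
                 by = c("Chr", "BP"), all.x = TRUE)
flagged$GWASsig[is.na(flagged$GWASsig)] <- FALSE
flagged[, c("Marker", "GWASsig")]
```

Joining on both columns, rather than on position alone, avoids false matches between SNPs that share a base-pair position on different chromosomes.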
A manually created sample dataset for use in unit tests.
A data frame with 11 rows and 9 variables:
chromosome
basepair position
rsID or other SNP identifier
minor allele frequency
reference allele
alternative allele
direction of the association effect size (+ or -)
p-value of the GWAS association
sample size
Manually created
A manually created sample dataset for use in unit tests.
A data frame with 11 rows and 9 variables:
chromosome
basepair position
rsID or other SNP identifier
minor allele frequency
reference allele
alternative allele
direction of the association effect size (+ or -)
p-value of the GWAS association
sample size
Manually created
A manually created list of GWAS significant SNPs to be used in conjunction with 'bmass_TestData1' and 'bmass_TestData2'.
A data frame with 2 rows and 2 variables:
chromosome
basepair position
Manually created
Get the marginal posterior probabilities that each individual phenotype belongs to each of the categories {U,D,I}, for every SNP
GetMarginalPosteriors(DataSources, ListSNPs, Models, LogFile)
DataSources |
A character vector giving the variable names of the input data files and phenotypes. |
ListSNPs |
A list produced from running |
Models |
A matrix describing the models being explored
(default output from running |
LogFile |
A matrix of string outputs for function logging
purposes (default output from running |
A list containing three SNP-by-phenotype matrices of marginal posteriors, one for each category {U,D,I}; this list is appended to the input ListSNPs as a new object, Marginals (the full returned object is a list containing the input ListSNPs and the input LogFile).
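The marginal posterior that a phenotype falls in category U, D, or I can be obtained by summing the posterior probabilities of all models that place it there. The sketch below does this with one matrix product per category. It is not the GetMarginalPosteriors implementation: the coding 0 = U, 1 = D, 2 = I, the function name marginal_posteriors, and the random toy posteriors are assumptions for illustration.

```r
# Hypothetical sketch, not the GetMarginalPosteriors implementation.
# post:   SNPs x models matrix of posterior probabilities (rows sum to 1)
# models: models x phenotypes matrix coded 0 = U, 1 = D, 2 = I (assumed coding)
marginal_posteriors <- function(post, models) {
  cats <- c(U = 0, D = 1, I = 2)
  # For each category, sum the posteriors of the models that put a phenotype there
  lapply(cats, function(code) post %*% ((models == code) * 1))
}

models <- as.matrix(expand.grid(HDL = 0:2, LDL = 0:2))
set.seed(1)
w <- matrix(runif(2 * nrow(models)), nrow = 2)
post <- w / rowSums(w)                 # toy posteriors for two SNPs over 9 models
marg <- marginal_posteriors(post, models)
marg$D                                 # 2 x 2: P(phenotype in D) per SNP
```

Because every model assigns each phenotype to exactly one category, the three matrices sum to 1 element-wise.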
Phenotypes <- c("bmass_SimulatedData1", "bmass_SimulatedData2")
bmassOutput <- bmass(Phenotypes, bmass_SimulatedSigSNPs)
bmassOutput[c("PreviousSNPs", "LogFile")] <- GetMarginalPosteriors(Phenotypes,
    bmassOutput$PreviousSNPs, bmassOutput$Models, bmassOutput$LogFile)
bmassOutput$PreviousSNPs$Marginals
Creates a matrix containing the model descriptions and their associated priors.
GetModelPriorMatrix(DataSources, Models, ModelPriors, LogFile, SigmaAlphas = c(0.005, 0.0075, 0.01, 0.015, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.15))
DataSources |
A character vector giving the variable names of the input data files and phenotypes. |
Models |
A matrix describing the models being explored
(default output from running |
ModelPriors |
A vector containing the priors on each model
across each tranche of sigma alpha (default output from running
|
LogFile |
A matrix of string outputs for function logging
purposes (default output from running |
SigmaAlphas |
A vector containing the different values traversed for this 'effect size controlling' hyperparameter (see "Prior on Sigma_Alpha" in Stephens 2013 PLoS ONE, https://doi.org/10.1371/journal.pone.0065245). |
A matrix containing the original description of each model, sorted by prior, along with each model's trained prior, the cumulative prior distribution, and the model's original order position.
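The sketch below reproduces the layout described above: models sorted by prior, with the prior, the cumulative prior, and the original position. It is a sketch under stated assumptions, not GetModelPriorMatrix itself. It assumes the priors arrive as a models-by-SigmaAlpha matrix that is collapsed by summing over the sigma alpha grid. The column names, the function name model_prior_matrix, and the toy weights are also hypothetical.

```r
# Hypothetical sketch of the layout described above; column names are assumptions.
# models:         models x phenotypes matrix (0 = U, 1 = D, 2 = I, assumed coding)
# prior_by_sigma: models x SigmaAlpha-values matrix of prior weights (assumed layout)
model_prior_matrix <- function(models, prior_by_sigma) {
  prior <- rowSums(prior_by_sigma)                 # collapse over the sigma_alpha grid
  ord <- order(prior, decreasing = TRUE)           # highest-prior models first
  data.frame(Model = apply(models, 1, paste, collapse = "")[ord],
             Prior = prior[ord],
             CumPrior = cumsum(prior[ord]),
             OrigOrder = ord,
             stringsAsFactors = FALSE)
}

models <- as.matrix(expand.grid(P1 = 0:2, P2 = 0:2))
p <- c(0, 0.05, 0, 0.05, 0.2, 0.05, 0, 0.05, 0.1)  # toy weights per sigma value
pm <- model_prior_matrix(models, cbind(p, p))
head(pm, 2)
```

Keeping OrigOrder lets the sorted table be mapped back to the unsorted model matrix, and the last CumPrior entry equals the total prior mass.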
Phenotypes <- c("bmass_SimulatedData1", "bmass_SimulatedData2")
bmassOutput <- bmass(Phenotypes, bmass_SimulatedSigSNPs)
bmassOutput[c("ModelPriorMatrix", "LogFile")] <- GetModelPriorMatrix(Phenotypes,
    bmassOutput$Models, bmassOutput$ModelPriors, bmassOutput$LogFile)
head(bmassOutput$ModelPriorMatrix)
Get a summary of the top models per SNP across all multivariate {U,D,I} combinations based on posterior probabilities.
GetTopModelsPerSNPViaPosteriors(DataSources, ListSNPs, ModelPriorMatrix, LogFile)
DataSources |
A character vector giving the variable names of the input data files and phenotypes. |
ListSNPs |
A list produced from running |
ModelPriorMatrix |
A matrix detailing the models being
explored and their associated priors (obtained by running
|
LogFile |
A matrix of string outputs for function logging
purposes (default output from running |
A matrix containing each model that was a SNP's top model
at least once, along with related information; this matrix is
appended to the input ListSNPs as a new object, TopModels
(the full returned object is a list containing the input ListSNPs and
the input LogFile).
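A per-SNP top model can be read off posterior probabilities proportional to prior times Bayes factor. The sketch below does this and tabulates which models were a top model at least once. It is not GetTopModelsPerSNPViaPosteriors. The log10 scale for the Bayes factors, the function name top_models, and the toy labels, priors, and Bayes factors are assumptions for illustration.

```r
# Hypothetical sketch, not the GetTopModelsPerSNPViaPosteriors implementation.
# log10BF: SNPs x models matrix of log10 Bayes factors (assumed scale)
# priors:  prior probability of each model; model_labels: one label per model
top_models <- function(log10BF, priors, model_labels) {
  lw <- sweep(log10BF * log(10), 2, log(priors), "+")  # log(prior * BF)
  lw <- lw - apply(lw, 1, max)                         # stabilise before exp()
  post <- exp(lw) / rowSums(exp(lw))                   # normalise per SNP
  best <- max.col(post, ties.method = "first")
  data.frame(TopModel = model_labels[best],
             Posterior = post[cbind(seq_len(nrow(post)), best)],
             stringsAsFactors = FALSE)
}

labels  <- c("00", "10", "11")
priors  <- c(0.5, 0.3, 0.2)
log10BF <- rbind(c(0, 4, 6), c(0, 3, 1))
res <- top_models(log10BF, priors, labels)
res
table(res$TopModel)   # models that were a SNP's top model at least once
```

Subtracting each row's maximum before exponentiating keeps large Bayes factors from overflowing without changing the normalised posteriors.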
Phenotypes <- c("bmass_SimulatedData1", "bmass_SimulatedData2")
bmassOutput <- bmass(Phenotypes, bmass_SimulatedSigSNPs)
bmassOutput[c("ModelPriorMatrix", "LogFile")] <- GetModelPriorMatrix(Phenotypes,
    bmassOutput$Models, bmassOutput$ModelPriors, bmassOutput$LogFile)
bmassOutput[c("PreviousSNPs", "LogFile")] <- GetTopModelsPerSNPViaPosteriors(Phenotypes,
    bmassOutput$PreviousSNPs, bmassOutput$ModelPriorMatrix, bmassOutput$LogFile)
head(bmassOutput$PreviousSNPs$TopModels)
A list of the univariate GWAS significant SNPs from the GlobalLipids2013 dataset to be used in the second introductory bmass vignette.
A data frame with 157 rows and 2 variables:
chromosome
basepair position
Supplementary Tables 2 and 3 from https://doi.org/10.1038/ng.2797.