phylosmith is a supplementary package that builds on the phyloseq-object from the phyloseq package. Phyloseq-objects are a great data standard for microbiome and gene-expression data; this package aims to provide easy data-wrangling and visualization.
A lot of these functions are just to make “data-wrangling” easier for investigators. Others will implement complex routines in a, hopefully, efficient and concise manner. I have also made functions to make figures for quick examination of data, but they may or may not be suitable for publication, as some may require parameter optimization.
On some Linux systems you may need to install the following libraries through your terminal.
Ubuntu example:
sudo apt-get install libmysqlclient-dev libgdal-dev libudunits2-dev
These libraries are required by some dependencies and may not come with your default OS distribution.
If you are working on Windows, you will likely need to install Rtools from CRAN. When prompted, select "add rtools to system PATH".
phylosmith depends on the phyloseq package released by Dr. Paul McMurdie. The package is maintained on Bioconductor, and can be installed through R using the following commands:
if(!requireNamespace("BiocManager", quietly = TRUE)){
install.packages("BiocManager")
}
BiocManager::install("phyloseq")
Additionally, the package imports a number of other packages to use their advanced functions. These packages may install with the phylosmith installation, but it is always best to install them independently.
install.packages(c("devtools", "RcppEigen", "RcppParallel", "Rtsne", "ggforce", "units"))
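To see which of the packages listed above are still missing on a given machine, a check along these lines may help. It is a minimal sketch using only base R; the variable names (`needed`, `missing_pkgs`) are illustrative and not part of phylosmith.

```r
# Packages named in the installation step above
needed <- c("devtools", "RcppEigen", "RcppParallel", "Rtsne", "ggforce", "units")

# requireNamespace() returns FALSE instead of raising an error when a package
# is absent (note that it loads the namespace of any package it does find)
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

Anything reported as missing can then be passed to install.packages() as shown above.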
The package is hosted on GitHub, and can be installed through R with:
devtools::install_github('schuyler-smith/phylosmith')
library(phylosmith)
Call | Description |
---|---|
conglomerate_samples | combines samples based on common factor within sample_data |
conglomerate_taxa | combines taxa that have same classification |
melt_phyloseq | melts a phyloseq object into a data.table |
merge_treatments | combines multiple columns in meta-data into a new column |
set_sample_order | sets the order of the samples of a phyloseq object |
set_treatment_levels | sets the order of the factors in a sample_data column |
taxa_extract | creates a phyloseq-object containing only taxa of choice |
taxa_filter | filter taxa by the proportion of samples they are seen in |
taxa_prune | remove taxa from a phyloseq-object |
Call | Description |
---|---|
library_size | transform abundance data using library-size normalization |
relative_abundance | transform abundance data to relative abundance |
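For readers unfamiliar with the term, relative abundance conventionally means dividing each count by the total count of its sample. The sketch below shows that arithmetic on a made-up count matrix in base R; it illustrates the idea only and is not a claim about how relative_abundance is implemented internally.

```r
# Toy count table: rows are taxa, columns are samples (values are made up)
counts <- matrix(
  c(10, 30, 60,
     5,  5, 10),
  nrow = 3,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"),
                  c("sample_1", "sample_2"))
)

# Divide every count by the total of its sample (its column)
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# After the transformation, every sample sums to 1
colSums(rel_abund)
```

Here taxon_c makes up 60 of 100 reads in sample_1 and 10 of 20 reads in sample_2, so its relative abundance is 0.6 and 0.5 respectively even though the raw counts differ sixfold.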
Call | Description |
---|---|
common_taxa | find taxa common to each treatment |
taxa_core | filter taxa by proportion of samples and relative abundance |
taxa_proportions | computes the proportion of a taxonomic classification |
unique_taxa | find taxa unique to each treatment |
Call | Description |
---|---|
abundance_heatmap | create a ggplot object of a heatmap of the abundance table |
abundance_lines | create a ggplot object of the abundance data as a line graph |
phylogeny_profile | create a ggplot barplot object of the compositions of each sample at a taxonomic level |
taxa_abundance_bars | create a ggplot object of the abundance of taxa in each sample |
taxa_core_graph | create a ggplot object of the core taxa over a range of parameters |
variable_correlation_heatmap | create a ggplot heatmap of the correlation of numerical variables with taxa |
Call | Description |
---|---|
alpha_diversity_graph | create a ggplot-object box-plot of the alpha-diversity from a phyloseq-object |
dendrogram_phyloseq | create a ggplot-object dendrogram of the distance measurement from a phyloseq-object |
nmds_phyloseq | create a ggplot object of the NMDS from a phyloseq object |
pcoa_phyloseq | create a ggplot object of the PCoA from a phyloseq object |
tsne_phyloseq | create a ggplot object of the t-SNE from a phyloseq object |
Call | Description |
---|---|
co_occurrence_network | creates a network of the co-occurrence of taxa |
network_layout_ps | creates a layout object for a network |
network_ps | creates a network object from a phyloseq_object |
variable_correlation_network | creates a network of the correlation of taxa and sample variables |
Call | Description |
---|---|
co_occurrence | calculate co-occurrence between taxa |
curate_co_occurrence | filter co-occurrence tables |
permute_rho | runs permutations of the otu_table to calculate a significant \(\rho\) value |
histogram_permuted_rhos | create a ggplot object of the distribution of rho values |
quantile_permuted_rhos | calculate quantiles for the permuted rho values from the Spearman-rank co-occurrence |
variable_correlation | calculate the correlation of numerical variables with taxa abundances |
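The permutation idea behind the rho functions above can be illustrated in base R: compute the observed Spearman correlation between two taxa, then shuffle one of them many times to see what rho values arise by chance. This is only a sketch on made-up data. The shuffling scheme, the number of permutations, and the 95% cutoff are assumptions chosen for illustration and may differ from what permute_rho and quantile_permuted_rhos actually do.

```r
set.seed(1)  # reproducible toy data

# Toy abundances for two taxa across 20 samples (made-up values)
taxon_x <- rpois(20, lambda = 10)
taxon_y <- taxon_x + rpois(20, lambda = 3)  # built to co-vary with taxon_x

# Observed Spearman rank correlation between the two taxa
observed_rho <- cor(taxon_x, taxon_y, method = "spearman")

# Null distribution: shuffle one taxon across samples and recompute rho
n_perm <- 1000
permuted_rhos <- replicate(
  n_perm,
  cor(taxon_x, sample(taxon_y), method = "spearman")
)

# Value that 95% of the permuted rhos fall at or below
cutoff <- quantile(permuted_rhos, probs = 0.95)

# TRUE suggests the observed association is stronger than chance shuffles
observed_rho > cutoff
```

A histogram of permuted_rhos, for example with hist(permuted_rhos), gives a quick visual check of where the observed value sits relative to the null distribution.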
Originally I had created 2 mock phyloseq objects (mock_phyloseq and mock_phyloseq2) that had no real-world data but served to show simple examples of how the functions worked.
Then I decided that I should include a real example of microbiome data (soil_column) because it’s always nice to see real examples. soil_column is a published dataset from my lab group. The data are from an experiment that looked at the microbial composition of farmland soil before and after manure application, over time, using 16S sequencing.
Schuyler Smith
Ph.D. Bioinformatics and Computational Biology