Examples in this vignette use the GlobalPatterns dataset from phyloseq.
library(phyloseq)
data(GlobalPatterns)
Merges samples within a phyloseq-class object that match on the given criteria (treatment). Any sample_data factors that do not match will be set to NA. otu_table counts will be reassigned as the mean of all the samples that are merged together.
Use this with caution: replicate samples may be crucial to the experimental design, and should be shown statistically to be similar enough to combine before downstream analysis.
Usage
conglomerate_samples(phyloseq_obj, treatment, subset = NULL)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
treatment | Column name as a string, or a vector of column names, in the sample_data. |
subset | A factor within the treatment. This will remove any samples that do not contain this factor. This can be a vector of multiple factors to subset on. |
Examples
phyloseq::sample_sums(GlobalPatterns)
## CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr M11Plmr F21Plmr
## 864077 1135457 697509 1543451 2076476 718943 433894 186297
## M31Tong M11Tong LMEpi24M SLEpi20M AQC1cm AQC4cm AQC7cm NP2
## 2000402 100187 2117592 1217312 1167748 2357181 1699293 523634
## NP3 NP5 TRRsed1 TRRsed2 TRRsed3 TS28 TS29 Even1
## 1478965 1652754 58688 493126 279704 937466 1211071 1216137
## Even2 Even3
## 971073 1078241
conglomerated <- conglomerate_samples(GlobalPatterns, treatment = 'SampleType')
phyloseq::sample_sums(conglomerated)
## Soil Feces Skin Tongue
## 899014.3 1442116.0 446378.0 1050294.5
## Freshwater Freshwater (creek) Ocean Sediment (estuary)
## 1667452.0 1741407.3 1218451.0 277172.7
## Mock
## 1088483.7
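The mean-based merge described above can be sketched in base R on a toy count matrix. This is only an illustration of the idea, not the package's implementation; the objects `counts` and `treatment` and their values are hypothetical.

```r
# Toy OTU count matrix: taxa in rows, samples in columns (hypothetical values).
counts <- matrix(
  c(10, 20, 0, 4,
     2,  6, 8, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("otu1", "otu2"), c("s1", "s2", "s3", "s4"))
)

# Hypothetical treatment label for each sample, in column order.
treatment <- c("soil", "soil", "water", "water")

# Group the sample indices by treatment, then average each taxon's counts
# across the samples in each group.
merged <- sapply(
  split(seq_along(treatment), treatment),
  function(idx) rowMeans(counts[, idx, drop = FALSE])
)

merged
```

Each column of `merged` is one treatment level and each value is the mean count for that taxon, which is why the merged `sample_sums` in the example above are no longer whole numbers.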
A re-write of phyloseq::tax_glom(). This iteration runs faster through its use of data.table.
Usage
conglomerate_taxa(phyloseq_obj, classification, hierarchical = TRUE)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
classification | Column name as a string in the tax_table for the factor to conglomerate by. |
hierarchical | Whether the order of factors in the tax_table represents a decreasing hierarchy (TRUE) or the factors are independent (FALSE). If FALSE, only the factor given by classification is returned. |
Examples
conglomerate_taxa(GlobalPatterns, classification = 'Phylum', hierarchical = TRUE)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 67 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 67 taxa by 2 taxonomic ranks ]
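As a rough picture of what conglomerating by a rank means, the base R sketch below collapses a toy count matrix by phylum. It assumes counts are summed within each group, as phyloseq::tax_glom() does; the source does not show the internals of conglomerate_taxa, and `counts` and `phylum` are hypothetical.

```r
# Toy OTU count matrix: taxa in rows, samples in columns (hypothetical values).
counts <- matrix(
  c(5, 1,
    3, 0,
    2, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2"))
)

# Hypothetical phylum assignment for each OTU, in row order.
phylum <- c("Firmicutes", "Firmicutes", "Proteobacteria")

# rowsum() adds up the rows that share a group label, leaving one row per phylum.
by_phylum <- rowsum(counts, group = phylum)

by_phylum
```

Three OTUs collapse to two rows, in the same way that the 19,216 GlobalPatterns taxa collapse to 67 in the example above.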
Converts the otu_table, tax_table, and sam_data to a 2-dimensional data.table.
Usage
melt_phyloseq(phyloseq_obj)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
Examples
melt_phyloseq(GlobalPatterns)
## Warning in `[.data.table`(sample_data, , `:=`(Sample, NULL)): Column 'Sample'
## does not exist to remove
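The shape of a melted table can be illustrated with base R. This sketch uses plain data frames rather than data.table, and the column names (`OTU`, `Sample`, `Abundance`) and toy values are assumptions for illustration, not the documented output of melt_phyloseq.

```r
# Toy OTU counts with named dimensions (hypothetical values).
counts <- matrix(
  c(5, 3, 1, 0),
  nrow = 2,
  dimnames = list(OTU = c("otu1", "otu2"), Sample = c("s1", "s2"))
)

# Hypothetical taxonomy and sample metadata tables.
tax <- data.frame(OTU = c("otu1", "otu2"),
                  Phylum = c("Firmicutes", "Proteobacteria"))
sam <- data.frame(Sample = c("s1", "s2"),
                  SampleType = c("Soil", "Feces"))

# One row per OTU/sample pair, with the count in an Abundance column.
long <- as.data.frame(as.table(counts),
                      responseName = "Abundance",
                      stringsAsFactors = FALSE)

# Attach the taxonomy and the sample metadata to every row.
long <- merge(merge(long, tax, by = "OTU"), sam, by = "Sample")

long
```

The result has one row for every taxon in every sample, which is the long format that plotting and grouping functions usually expect.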
Combines multiple columns from the sample-data into a single column. Doing this can make it easier to subset and look at the data on multiple factors.
Usage
merge_treatments(phyloseq_obj, ...)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. It must contain sample_data() with information about each sample. |
treatment | Column name as a string, or a vector of column names, in the sample_data. |
Examples
merge_treatments(GlobalPatterns, c('Final_Barcode', 'Barcode_truncated_plus_T'))
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 19216 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 8 sample variables ]
## tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
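Combining two metadata columns into one can be pictured with a small base R sketch. The data frame `sam`, the new column name, and the `"_"` separator are all hypothetical choices for illustration; the source does not state how merge_treatments names or separates the merged values.

```r
# Hypothetical sample metadata with two treatment columns.
sam <- data.frame(
  Site = c("A", "A", "B"),
  Day  = c(0, 10, 0),
  row.names = c("s1", "s2", "s3")
)

# Paste the two columns together into a single combined factor column.
sam$Site_Day <- paste(sam$Site, sam$Day, sep = "_")

sam
```

The metadata gains one column, matching the change from 7 to 8 sample variables in the example above.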
Arranges the phyloseq object so that the samples are listed in a given order, or sorted on metadata. This is most useful for visual inspection of the metadata, and for presenting the samples in the correct order in ggplot2 figures.
Usage
set_sample_order(phyloseq_obj, treatment)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
treatment | Column name as a string, or a vector of column names, in the sample_data. |
Examples
phyloseq::sample_names(GlobalPatterns)
## [1] "CL3" "CC1" "SV1" "M31Fcsw" "M11Fcsw" "M31Plmr"
## [7] "M11Plmr" "F21Plmr" "M31Tong" "M11Tong" "LMEpi24M" "SLEpi20M"
## [13] "AQC1cm" "AQC4cm" "AQC7cm" "NP2" "NP3" "NP5"
## [19] "TRRsed1" "TRRsed2" "TRRsed3" "TS28" "TS29" "Even1"
## [25] "Even2" "Even3"
ordered_obj <- set_sample_order(GlobalPatterns, "SampleType")
phyloseq::sample_names(ordered_obj)
## [1] "M31Fcsw" "M11Fcsw" "TS28" "TS29" "LMEpi24M" "SLEpi20M"
## [7] "AQC1cm" "AQC4cm" "AQC7cm" "Even1" "Even2" "Even3"
## [13] "NP2" "NP3" "NP5" "TRRsed1" "TRRsed2" "TRRsed3"
## [19] "M31Plmr" "M11Plmr" "F21Plmr" "CL3" "CC1" "SV1"
## [25] "M31Tong" "M11Tong"
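Sorting samples on a metadata column amounts to reordering both the metadata rows and the count columns with the same index. The base R sketch below shows the idea on hypothetical toy objects; it is not the package's code.

```r
# Hypothetical sample metadata and a matching count matrix.
sam <- data.frame(
  SampleType = c("Soil", "Feces", "Soil", "Feces"),
  row.names = c("s1", "s2", "s3", "s4")
)
counts <- matrix(
  1:8, nrow = 2,
  dimnames = list(c("otu1", "otu2"), rownames(sam))
)

# order() gives the row positions that sort the metadata column.
ord <- order(sam$SampleType)

# Apply the same ordering to the metadata rows and the count columns.
sam_sorted <- sam[ord, , drop = FALSE]
counts_sorted <- counts[, rownames(sam_sorted)]

colnames(counts_sorted)
```

Samples of the same type end up next to each other, as in the reordered `sample_names` output above.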
Sets the order of the levels of a factor in the sample-data. Primarily useful for controlling the order in which ggplot2 displays samples.
Usage
set_treatment_levels(phyloseq_obj, treatment, order)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
treatment | Column name as a string, or a vector of column names, in the sample_data. |
order | The order of the factors in the treatment column as a vector of strings. If set to "numeric", the levels are sorted in ascending numerical order. |
Examples
levels(soil_column@sam_data$Day)
## [1] "0" "10" "108" "24" "38" "59" "80"
ordered_days <- set_treatment_levels(soil_column, 'Day', 'numeric')
levels(ordered_days@sam_data$Day)
## [1] "0" "10" "24" "38" "59" "80" "108"
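The difference between the default and the numeric level order comes from how R sorts character levels. A minimal base R sketch of the "numeric" case, using a hypothetical `day` factor with the same values as above:

```r
# Day values stored as text, as they often are in sample metadata.
day <- factor(c("0", "10", "108", "24", "38", "59", "80"))

# By default the levels are sorted as text, so "108" comes before "24".
levels(day)

# Sort the levels by their numeric value, then convert back to text.
numeric_order <- as.character(sort(as.numeric(levels(day))))

# Rebuild the factor with the levels in numeric order; the data are unchanged.
day <- factor(day, levels = numeric_order)

levels(day)
```

Only the level order changes, which is what determines the axis and legend order in ggplot2.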
Creates a new phyloseq-object containing the defined taxa. Taxa names can be a substring or an entire taxa name. The function matches that string in all taxa levels unless a specific classification level is declared.
Usage
taxa_extract(phyloseq_obj, taxa_to_extract, classification = NULL)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
taxa_to_extract | A string, or a vector of strings, naming the taxa of interest. |
classification | Column name as a string in the tax_table to restrict the match to. |
Examples
GlobalPatterns
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 19216 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
taxa_extract(GlobalPatterns, c("Cyano", "Proteo","Actinobacteria"))
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 8441 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 8441 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 8441 tips and 8440 internal nodes ]
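Substring matching across all taxonomic levels can be sketched in base R as below. The toy `tax` table is hypothetical, and treating the strings as case-sensitive patterns joined with `|` is an assumption for illustration, not the documented behavior of taxa_extract.

```r
# Hypothetical taxonomy table: taxa in rows, ranks in columns.
tax <- matrix(
  c("Bacteria", "Cyanobacteria",
    "Bacteria", "Proteobacteria",
    "Bacteria", "Firmicutes"),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), c("Kingdom", "Phylum"))
)

# Substrings of interest, combined into a single "match any of these" pattern.
taxa_to_extract <- c("Cyano", "Proteo")
pattern <- paste(taxa_to_extract, collapse = "|")

# A taxon is kept if the pattern matches at any taxonomic level in its row.
hits <- apply(tax, 1, function(ranks) any(grepl(pattern, ranks)))
keep <- names(hits)[hits]

keep
```

With a real object, a character vector of taxa names like `keep` could be passed to phyloseq::prune_taxa() to build the reduced object.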
This is a robust function that is used by nearly every other function in this package. It uses many of the subsetting processes distributed within phyloseq, but strives to make them more user-friendly and to combine them into a one-stop function. The function works in several steps.
1. Checks whether treatments were specified. If so, it splits the phyloseq object into separate objects for each treatment to process.
2. Filters on frequency (filtering out taxa seen in few samples) and then merges back to one object.
3. If subset is declared, removes all treatments outside of the subset.
4. If drop_samples is TRUE, removes any samples that have 0 taxa observed after filtering (this is a very situational need).
If frequency is set to 0 (default), then the function removes any taxa with no abundance in any sample.
Usage
taxa_filter(phyloseq_obj, treatment = NULL, subset = NULL, frequency = 0, below = FALSE, drop_samples = FALSE)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
treatment | Column name as a string, or a vector of column names, in the sample_data. |
subset | A factor within the treatment. This will remove any samples that do not contain this factor. This can be a vector of multiple factors to subset on. |
frequency | The proportion of samples the taxa is found in. |
below | Whether frequency defines the minimum (FALSE) or maximum (TRUE) proportion of samples the taxa is found in. |
drop_samples | Whether the function should remove samples that are empty after removing taxa filtered by frequency (TRUE). |
Examples
The GlobalPatterns data has 19,216 OTUs listed in its tax_table.
GlobalPatterns
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 19216 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
However, 228 of those taxa are not actually seen in any of the samples.
sum(phyloseq::taxa_sums(GlobalPatterns) == 0)
## [1] 228
taxa_filter with frequency = 0 will remove those taxa.
taxa_filter(GlobalPatterns, frequency = 0)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 18988 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 18988 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 18988 tips and 18987 internal nodes ]
Say that we wanted to only look at taxa that are seen in 80% of the samples.
taxa_filter(GlobalPatterns, frequency = 0.80)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 435 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 435 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 435 tips and 434 internal nodes ]
But if we want taxa that are seen in 80% of the samples in any one treatment group:
taxa_filter(GlobalPatterns, frequency = 0.80, treatment = 'SampleType')
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 435 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 435 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 435 tips and 434 internal nodes ]
It returns a larger number of taxa, since they need to be seen in fewer samples overall.
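The difference between filtering on overall frequency and on frequency within a treatment can be worked through on a toy matrix. This base R sketch assumes a taxon passes when its proportion of non-zero samples is at least `frequency`; the exact comparison used by taxa_filter is not stated in the source, and all object names and values here are hypothetical.

```r
# Toy OTU counts: taxa in rows, samples in columns (hypothetical values).
counts <- matrix(
  c(1, 2, 0, 0,   # otu1: seen in both soil samples, never in water
    3, 0, 4, 0,   # otu2: seen in one sample of each treatment
    0, 0, 0, 0),  # otu3: never seen
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2", "s3", "s4"))
)
treatment <- c("soil", "soil", "water", "water")
frequency <- 0.8

# Proportion of all samples in which each taxon is observed.
prevalence <- rowMeans(counts > 0)
overall <- names(prevalence)[prevalence >= frequency]

# The same proportion, computed separately within each treatment group.
by_group <- sapply(
  split(seq_along(treatment), treatment),
  function(idx) rowMeans(counts[, idx, drop = FALSE] > 0)
)

# Keep a taxon if it reaches the threshold in at least one treatment.
per_treatment <- rownames(by_group)[apply(by_group >= frequency, 1, any)]

overall
per_treatment
```

Here no taxon reaches 80% of all samples, but `otu1` is present in every soil sample, so it passes once the threshold is applied per treatment.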
Creates a new phyloseq-object omitting the defined taxa. Taxa names can be a substring or an entire taxa name. The function matches that string in all taxa levels unless a specific classification level is declared.
Usage
taxa_prune(phyloseq_obj, taxa_to_remove, classification = NULL)
Arguments
Call | Description |
---|---|
phyloseq_obj | A phyloseq-class object. |
taxa_to_remove | A string, or a vector of strings, naming the taxa to remove. |
classification | Column name as a string in the tax_table to restrict the match to. |
Examples
GlobalPatterns
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 19216 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
taxa_prune(GlobalPatterns, c("Cyano", "Proteo","Actinobacteria"))
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 17585 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 7 sample variables ]
## tax_table() Taxonomy Table: [ 17585 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 17585 tips and 17584 internal nodes ]
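Restricting the match to one classification level and then dropping the matches can be sketched in base R as follows. The toy `tax` table and the case-sensitive pattern matching are assumptions for illustration, not the internals of taxa_prune.

```r
# Hypothetical taxonomy table: taxa in rows, ranks in columns.
tax <- matrix(
  c("Bacteria", "Proteobacteria",
    "Bacteria", "Firmicutes",
    "Archaea",  "Euryarchaeota"),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), c("Kingdom", "Phylum"))
)

# Substrings to remove, searched for only in the declared classification column.
taxa_to_remove <- c("Proteo", "Eury")
classification <- "Phylum"
pattern <- paste(taxa_to_remove, collapse = "|")

# Flag the taxa whose Phylum matches, then keep everything else.
matched <- grepl(pattern, tax[, classification])
remaining <- rownames(tax)[!matched]

remaining
```

Declaring a classification avoids accidental matches at other ranks, for example a substring that also appears in a genus name.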
Schuyler Smith
Ph.D. Bioinformatics and Computational Biology