vsearch is the primary tool we use for chimera detection and removal.
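A large number of reads remain in our dataset, and many of them are identical. It doesn't make sense to analyze the same sequence repeatedly, so we collapse duplicate reads into a single representative sequence. derep_fulllength merges identical sequences and writes the results sorted by decreasing abundance. The --sizeout option records each sequence's abundance in its header, and --minuniquesize 2 discards sequences that occur only once.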
$VSEARCH --threads $CORES --derep_fulllength $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q.fa --output $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2.fa --sizeout --minuniquesize 2
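As a quick sanity check on the dereplication step, the sketch below counts the unique sequences in a FASTA file and sums the abundances stored in the headers. It assumes the headers carry a `size=N` annotation as written by --sizeout; the function names `count_seqs` and `sum_sizes` are hypothetical helpers, not part of vsearch.

```shell
# Count FASTA records (header lines start with ">").
count_seqs() {
    grep -c '^>' "$1"
}

# Sum the "size=N" abundance annotations found in the headers.
# Headers without a size annotation contribute 0.
sum_sizes() {
    awk -F'size=' '/^>/ { split($2, a, ";"); total += a[1] } END { print total + 0 }' "$1"
}
```

Running both on the dereplicated file, for example `count_seqs` and `sum_sizes` on all_combined_q$Q\_unique_sort_min2.fa, shows how many unique sequences were kept and how many reads they represent.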
There is debate as to whether both de novo and reference filtering are necessary. Most people seem to think that reference filtering is superior and will catch most of the chimeras that de novo filtering would catch anyway. As written, this workflow assumes both will be done. If you want to skip one, then the files need to
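uchime_denovo evaluates each sequence without an outside reference database. Instead, it uses the more abundant sequences in the dataset itself as candidate parents, which is why the abundance annotations written by --sizeout are needed.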
$VSEARCH --threads $CORES --uchime_denovo $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2.fa --chimeras $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo.chimera --nonchimeras $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo.good
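To get a feel for how aggressive the de novo step was, the following sketch reports the percentage of unique sequences that ended up in the chimeras file. The function name `pct_chimeric` is a hypothetical helper; it counts unique sequences only and does not weight them by abundance.

```shell
# Usage: pct_chimeric <chimeras.fasta> <input.fasta>
# Prints the percentage of input records that appear in the chimeras file.
pct_chimeric() {
    chim=$(grep -c '^>' "$1" || true)   # grep exits non-zero when the count is 0
    total=$(grep -c '^>' "$2" || true)
    awk -v c="$chim" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.1f\n", 100 * c / t; else print "NA" }'
}
```

Here the first argument would be the .chimera file written by --chimeras and the second the dereplicated input file.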
uchime_ref compares your sequences against the selected chimera reference database ($CHIMERA_DB) and flags those that look like chimeras of reference sequences.
$VSEARCH --threads $CORES --uchime_ref $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo.good --nonchimeras $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo_ref.good --db $CHIMERA_DB
A final derep_fulllength pass relabels the remaining sequences with the prefix U_ followed by a running number.
$VSEARCH --threads $CORES --derep_fulllength $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo_ref.good --relabel "U_" --output $DIRECTORY/quality_check/chimera_removal/relabeled_denovo_ref.good
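To see how many sequences survive each step, a small report like the one below can help. This is a minimal sketch: `report_counts` is a hypothetical helper, and the commented usage assumes the file names produced by the commands above.

```shell
# Print "<count><TAB><file>" for each FASTA file given,
# or "missing<TAB><file>" if the file does not exist.
report_counts() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            printf '%s\t%s\n' "$(grep -c '^>' "$f")" "$f"
        else
            printf 'missing\t%s\n' "$f"
        fi
    done
}

# Example usage (paths as used in the commands above):
# OUT=$DIRECTORY/quality_check/chimera_removal
# report_counts \
#     "$OUT/all_combined_q$Q.fa" \
#     "$OUT/all_combined_q${Q}_unique_sort_min2.fa" \
#     "$OUT/all_combined_q${Q}_unique_sort_min2_denovo.good" \
#     "$OUT/all_combined_q${Q}_unique_sort_min2_denovo_ref.good" \
#     "$OUT/relabeled_denovo_ref.good"
```

A sharp drop between two consecutive files points to the step that removed the most sequences.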
Schuyler Smith
Ph.D. Student - Bioinformatics and Computational Biology
Iowa State University. Ames, IA.