• 3.1 Dereplicating and Sorting Reads
  • 3.2 Chimera Detection
    • De Novo
    • Reference
  • 3.3 Sort for Clustering


vsearch is the primary tool we use for chimera detection and removal.

3.1 Dereplicating and Sorting Reads

There are still an enormous number of reads in our dataset. Many of these reads are identical. It doesn’t make sense to analyze the same sequence repeatedly, so we concatenate the duplicate reads. derep_fulllength removes duplicate sequences and sorts the sequences by length.

$VSEARCH --threads $CORES --derep_fulllength $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q.fa --output $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2.fa --sizeout --minuniquesize 2


3.2 Chimera Detection

There is debate to whether de novo and refernce filtering is necessary. Most people seem to think that reference is superior and will get most of the reads that de novo would get anyways. The way this is written, it assumes both will be done. If you want to skip one, then the files need to

De Novo

uchime_denovo evaluates each sequence without an outside reference base.

$VSEARCH --threads $CORES --uchime_denovo $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2.fa --chimeras $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo.chimera --nonchimeras $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo.good

Reference

uchime_ref screens the selected chimera database for potential matches in your sequences.

$VSEARCH --threads $CORES --uchime_ref $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo.good --nonchimeras $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo_ref.good --db $CHIMERA_DB

3.3 Sort for Clustering

$VSEARCH --threads $CORES --derep_fulllength $DIRECTORY/quality_check/chimera_removal/all_combined_q$Q\_unique_sort_min2_denovo_ref.good --relabel "U_" --output $DIRECTORY/quality_check/chimera_removal/relabeled_denovo_ref.good



Schuyler Smith
Ph.D. Student - Bioinformatics and Computational Biology
Iowa State University. Ames, IA.