Consensus Tree Methods & Prelim Results
Genomes from 17 bivalves with RefSeq annotations were retrieved from NCBI (Table 1).
Table 1. Species with RefSeq-annotated genomes retrieved from NCBI, with genome size.
Assembly Accession | Organism Name | Common Name | Gbp | # Seq in DB (genome) |
GCF_002022765.2 | Crassostrea virginica | Eastern Oyster | 0.118 | 60213 |
GCF_026914265.1 | Mya arenaria | Soft Shell Clam | 0.126 | 61919 |
GCF_963853765.1 | Magallana gigas | Pacific Oyster | 0.110 | 51045 |
GCF_041381155.1 | Argopecten irradians | Bay Scallop | 0.085 | 41903 |
GCF_963676685.1 | Mytilus edulis | Blue Mussel | 0.134 | 62345 |
GCF_902652985.1 | Pecten maximus | Great Scallop | 0.072 | 39918 |
GCF_025612915.1 | Magallana angulata | Portuguese Oyster | 0.102 | 49950 |
GCF_947568905.1 | Ostrea edulis | European Flat Oyster | 0.117 | 57349 |
GCF_021730395.1 | Mercenaria mercenaria | Hard Clam | 0.121 | 63247 |
GCF_020536995.1 | Dreissena polymorpha | Zebra Mussel | 0.140 | 75288 |
GCF_036588685.1 | Mytilus trossulus | Pacific Blue Mussel | 0.099 | 53269 |
GCF_026571515.1 | Ruditapes philippinarum | Manila Clam | 0.097 | 57637 |
GCF_002113885.1 | Mizuhopecten yessoensis | Yesso Scallop | 0.082 | 41567 |
GCF_021869535.1 | Mytilus californianus | California Mussel | 0.104 | 49739 |
GCF_033153115.1 | Saccostrea echinata | Blacklip Rock Oyster | 0.065 | 36217 |
GCF_032062105.1 | Saccostrea cuccullata | Natal Rock Oyster | 0.091 | 56432 |
GCF_031769215.1 | Ylistrum balloti | Ballot Saucer Scallop | 0.042 | 24047 |
I created a consensus tree using ultraconserved elements (UCEs) present within my selected bivalves.
To identify and harvest UCEs, I used Phyluce (v1.7.3) with two sets of 10,000 bait sequences designed for molluscs (v1) and bivalves (v2) by Yi-Xuan et al. (2024). Bivalve genomes were aligned to each bait file using `phyluce_probe_run_multiple_lastzs_sqlite`, keeping only matches with at least 80% identity. Matched sequences were extracted as FASTA using `phyluce_probe_slice_sequence_from_genomes`, with 500 bp of flanking sequence included on either side of each matched bait locus to retain additional sequence information (and therefore potential variation) outside the UCE core itself. Extracted sequences were then matched to bait sequences for downstream analysis using `phyluce_assembly_match_contigs_to_probes`.
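To make the flanking step concrete, here is a minimal sketch of extracting a matched region plus up to 500 bp on each side, clamped to the contig ends. It is an illustration only, not Phyluce's implementation: the function name `slice_with_flank`, the 0-based end-exclusive coordinates, and the toy contig are all assumptions of this example.

```python
def slice_with_flank(contig: str, match_start: int, match_end: int, flank: int = 500) -> str:
    """Return the matched region plus up to `flank` bases on each side.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    The flank is shortened when the match sits near either end of the contig.
    """
    if not 0 <= match_start < match_end <= len(contig):
        raise ValueError("match coordinates fall outside the contig")
    left = max(0, match_start - flank)
    right = min(len(contig), match_end + flank)
    return contig[left:right]


# Toy contig: 300 bases upstream, a 120 bp "UCE core", 1000 bases downstream.
contig = "A" * 300 + "C" * 120 + "G" * 1000
locus = slice_with_flank(contig, 300, 420)

# Only 300 bases exist upstream, so the left flank is truncated to 300 bp,
# while the right flank gets the full 500 bp.
print(len(locus))
```

A locus close to a scaffold edge therefore ends up shorter than core + 1000 bp, which is one reason extracted loci can differ in length between taxa.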
Two datasets were created for each bait version using `phyluce_assembly_get_match_counts`: a complete set, which contains only the UCEs present in all taxa, and an incomplete set, which contains the UCEs present in any of the taxa. This resulted in four datasets in total (complete v1, complete v2, incomplete v1, incomplete v2). FASTA sequences were extracted from these datasets using `phyluce_assembly_get_fastas_from_match_counts`, then aligned and trimmed.
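The difference between the complete and incomplete sets can be sketched as a simple set comparison. This is a toy illustration rather than what Phyluce does internally; the locus names (`uce-1` etc.), the three-taxon subset, and the function name `build_locus_sets` are hypothetical.

```python
def build_locus_sets(matches: dict[str, set[str]], taxa: set[str]) -> tuple[list[str], list[str]]:
    """Split loci into a complete set (found in every taxon) and an
    incomplete set (found in at least one taxon)."""
    complete = sorted(locus for locus, present in matches.items() if taxa <= present)
    incomplete = sorted(locus for locus, present in matches.items() if present & taxa)
    return complete, incomplete


# Hypothetical match table: locus -> taxa in which the locus was recovered.
taxa = {"Crassostrea_virginica", "Mya_arenaria", "Mytilus_edulis"}
matches = {
    "uce-1": {"Crassostrea_virginica", "Mya_arenaria", "Mytilus_edulis"},
    "uce-2": {"Crassostrea_virginica", "Mytilus_edulis"},
    "uce-3": set(),  # bait with no hit in any genome
}

complete, incomplete = build_locus_sets(matches, taxa)
print(complete)    # loci present in all taxa
print(incomplete)  # loci present in at least one taxon
```

By construction the complete set is always a subset of the incomplete set, so the incomplete datasets are the ones that can be thinned to different coverage levels later.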
Alignment and edge trimming were done using `phyluce_align_seqcap_align` with arguments for 17 taxa and MAFFT multi-sequence alignment. [megan note: edge-trimmed outputs were not used for downstream analyses, but were produced as part of the phyluce tutorial for learning. Internal trimming was done next, and the outputs from that step were used for downstream analyses.]
Alignment was done using `phyluce_align_seqcap_align` for 17 taxa with MAFFT multi-sequence alignment and the `--incomplete-matrix` and `--no-trim` arguments (the latter to skip edge trimming). For internal trimming, `phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed` was used to trim the aligned FASTAs. Aligned and trimmed sequences were cleaned using `phyluce_align_remove_locus_name_from_files` to remove locus names from the taxon labels.
Aligned, cleaned, and trimmed sequences could then be used to generate data matrices with varying degrees of coverage. Matrices were created for the incomplete datasets (both v1 and v2) with a specified 50%, 75%, or 95% coverage of the 17 taxa using `phyluce_align_get_only_loci_with_min_taxa`. Each matrix was concatenated and converted to PHYLIP format for use in IQ-TREE using `phyluce_align_concatenate_alignments` with the `--phylip` argument.
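The coverage thresholds can be illustrated with a short sketch that keeps a locus only if it is present in at least a given fraction of taxa. This is an assumption-laden toy, not the Phyluce code: the function name, the four-taxon example, and the locus names are hypothetical, and Phyluce's own rounding of the minimum taxon count may differ from the plain fraction comparison used here.

```python
def loci_with_min_coverage(matches: dict[str, set[str]], n_taxa: int, min_fraction: float) -> list[str]:
    """Keep loci recovered in at least `min_fraction` of the `n_taxa` taxa.

    Uses a direct fraction comparison; the real tool may round the
    minimum taxon count differently (assumption of this sketch).
    """
    return sorted(
        locus for locus, present in matches.items()
        if len(present) / n_taxa >= min_fraction
    )


# Hypothetical four-taxon example: locus -> taxa with that locus.
matches = {
    "uce-a": {"t1", "t2", "t3", "t4"},
    "uce-b": {"t1", "t2", "t3"},
    "uce-c": {"t1", "t2"},
    "uce-d": {"t1"},
}

for fraction in (0.50, 0.75, 0.95):
    kept = loci_with_min_coverage(matches, n_taxa=4, min_fraction=fraction)
    print(f"{fraction:.0%} coverage: {len(kept)} loci -> {kept}")
```

Raising the threshold can only remove loci, so the stricter matrices are nested inside the more permissive ones. The trade-off is fewer loci with less missing data versus more loci with more missing data.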
IQ-TREE (v3.0.1) was used to generate consensus trees from the 50% and 75% coverage matrices [megan note: the 75% matrix has 100 (v2) or 66 (v1) loci; the 50% matrix has 283 (v2) or 222 (v1); the 95% matrix has 14 (v2) or 7 (v1)]. The substitution model was selected using the ModelFinder Plus (MFP) option in IQ-TREE, and a treefile was then generated in IQ-TREE using the best-fitting model and 1000 bootstraps. Trees were visualized using FigTree (v1.4.4).
[Figure: consensus tree, 50% coverage (v1 baits)]
[Figure: consensus tree, 50% coverage (v2 baits)]
[Figure: consensus tree, 75% coverage (v1 baits)]
[Figure: consensus tree, 75% coverage (v2 baits)]
[megan note]:
75_v1 model: TVM+F+I+R3
75_v2 model: GTR+F+I+R3
50_v1 model: TVM+F+R3
50_v2 model: GTR+F+R3
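For readers unfamiliar with the model strings in the note above, the sketch below splits a name such as `TVM+F+I+R3` into its parts. The component meanings follow standard IQ-TREE model notation; the helper `describe_model` and its lookup tables are hypothetical and only cover the components that appear in these four models.

```python
import re

# Only the pieces that occur in the models listed above.
BASE_MODELS = {
    "GTR": "general time reversible substitution model",
    "TVM": "transversion model (as GTR, but with equal transition rates)",
}
EXTRAS = {
    "F": "empirical base frequencies",
    "I": "proportion of invariable sites",
}


def describe_model(model: str) -> list[str]:
    """Translate a '+'-separated model string into plain-language parts."""
    base, *extras = model.split("+")
    parts = [BASE_MODELS.get(base, f"unrecognized base model {base}")]
    for extra in extras:
        free_rate = re.fullmatch(r"R(\d+)", extra)
        if free_rate:
            parts.append(f"FreeRate heterogeneity with {free_rate.group(1)} categories")
        else:
            parts.append(EXTRAS.get(extra, f"unrecognized component {extra}"))
    return parts


for name in ("TVM+F+I+R3", "GTR+F+I+R3", "TVM+F+R3", "GTR+F+R3"):
    print(name, "->", "; ".join(describe_model(name)))
```

Read this way, the 75% and 50% matrices differ only in the `+I` term, and the v1 and v2 baits differ only in the base model (TVM versus GTR).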