Goals
As a recap, July goals were:
✅ Finish clam chapter intro + discussion, send to committee
✅ Get gene lists for all bivalve species
Get consensus tree (or ✅ get UCEs mapped for ALL species)
August goals are then to:
Get consensus tree
Make gene trees to run Robinson-Foulds distances and identify genes of interest
Which should then put me in a good place for mapping genes of interest onto the tree and having results (or at the very least, solid tentative results) by end of summer, keeping me right on schedule with the plan I presented at my committee meeting :D
Dailies
Aug 1
Previewed some of my outputs to see if they looked as expected. The 2 bait files I was using each had 10,000 sequences that are ultraconserved throughout Mollusca (bait v1) and Bivalvia (bait v2), yet some of the species had very few hits, and some (zebra mussel) had 0 total hits. Not sure why this is happening, so I spent some time reading previous issues on the phyluce GitHub page and troubleshooting to see why some UCEs weren't found in some genomes. Another issue was that the samples were losing sequences when converting from lastz to fasta (see below). Spent some time troubleshooting that as well. Luke and I are going to meet to talk about it, but we're both out of the office for the next ~two weeks so it'll have to wait.
Aug 19
Back in town! Met with Luke this morning to review outputs and troubleshoot together. It seems that the reduction from lastz to fasta sequences is expected as the conversion only keeps the UCE sequence (DB) that the genome sequence (query) had the strongest match to.
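To convince myself why the counts drop, here is a minimal sketch of the "keep only the strongest match" idea. This is not phyluce's actual code; the `Hit` record, the sequence names, and the scores are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical record for one alignment hit: a genome sequence (query),
# the UCE it matched (target), and an alignment score.
Hit = namedtuple("Hit", ["query", "target", "score"])


def best_hit_per_query(hits):
    """Keep only the highest-scoring hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return best


# Made-up example: contig_1 matches two UCEs, contig_2 matches one.
hits = [
    Hit("contig_1", "uce-10", 420),
    Hit("contig_1", "uce-11", 980),
    Hit("contig_2", "uce-10", 610),
]
kept = best_hit_per_query(hits)
print(len(hits), "hits in,", len(kept), "sequences out")
```

Three hits go in and two sequences come out, which is the same kind of reduction I was seeing between the lastz and fasta outputs.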
The issue of some samples not having any hits seems to be with phyluce's default significance or identity match setting. An argument was omitted from the phyluce alignment step in the tutorial I was following, but I was able to find it elsewhere in the phyluce documentation and reran the alignment with --identity 50 and --identity 80 with much more success. The average number of UCE matches at 50 identity was ~860 (v1) and ~740 (v2), and at 80 identity was ~500 (v1) and ~550 (v2). (v1 is the bait for Mollusca more generally, and v2 is the bait for Bivalvia more specifically.) This is on par with the ~850 average UCE matches found by Li et al., 2024 (who provided the bait sequences used).
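A toy illustration of why a lower identity cutoff recovers more matches, assuming each candidate hit has a percent identity and is kept only if it meets the cutoff. The identity values below are invented, not taken from my runs.

```python
def count_hits(identities, min_identity):
    """Count candidate hits whose percent identity meets the cutoff."""
    return sum(1 for pct in identities if pct >= min_identity)


# Invented percent identities for six candidate UCE matches in one genome.
identities = [97.5, 82.0, 76.3, 64.1, 55.0, 48.9]

for cutoff in (50, 80):
    print(f"identity >= {cutoff}: {count_hits(identities, cutoff)} hits")
```

More divergent species would have more of their matches sitting in the lower identity range, so a strict cutoff can drop them entirely.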
Oh and here’s the repo. Phyluce stuff is in data/phyluce/
Aug 21
Now that I had my UCEs successfully harvested, I had to switch to a different phyluce tutorial. Eventually I want to have folders with fastas of UCEs that map to 50%, 75%, and 95% of the taxa, and matrices in the proper format representing these, so that they can be input into IQ-TREE for the species consensus tree I want to build. The species consensus tree will then be what I compare the (repro) gene trees against using Robinson-Foulds distance to identify reproductive genes of interest.
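For reference, a minimal sketch of what the Robinson-Foulds distance measures: the number of splits (bipartitions of the taxa) found in one tree but not the other. This assumes trees are written as nested tuples of taxon names rather than parsed from Newick, and the taxon names are placeholders. For the real analysis I would use an established phylogenetics tool instead.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits (both sides have >1 taxon) of a tree."""
    all_taxa = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - clade
        if len(clade) > 1 and len(other) > 1:
            # Store the split as an unordered pair so rooting does not matter.
            splits.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Placeholder taxa: a species tree and a gene tree that disagrees on A/B/C.
species_tree = (("sp_A", "sp_B"), ("sp_C", ("sp_D", "sp_E")))
gene_tree = (("sp_A", "sp_C"), ("sp_B", ("sp_D", "sp_E")))
print(robinson_foulds(species_tree, gene_tree))
```

A gene tree identical to the species tree gives a distance of 0; larger values flag genes whose history conflicts more with the species tree.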
the general workflow for all this is then:
harvest UCEs (identifying which UCEs are present in each of the 17 bivalves) (Tutorial III) –>
create two data matrices – one with complete representation (UCEs in all samples) and one with incomplete (UCEs in at least 3 samples). treat the UCE outputs from tutorial III as contigs (daily use phyluce tutorial) –>
use these matrices to then extract, align, and trim UCEs of interest and generate fasta files for them (tutorial I, starting with the aligning UCE loci step)
So today I took the outputs from tutorial III and started working on the data matrices to process UCEs.
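The complete vs. incomplete matrix idea can be sketched as a simple occupancy filter. This is only an illustration under my own assumptions (it is not how phyluce implements it), and the taxon and locus names are made up.

```python
def loci_by_occupancy(presence, min_taxa):
    """Return loci found in at least `min_taxa` taxa.

    `presence` maps each taxon name to the set of UCE loci recovered in it.
    """
    counts = {}
    for loci in presence.values():
        for locus in loci:
            counts[locus] = counts.get(locus, 0) + 1
    return sorted(locus for locus, n in counts.items() if n >= min_taxa)


# Made-up presence data for four taxa.
presence = {
    "taxon_1": {"uce-1", "uce-2", "uce-3"},
    "taxon_2": {"uce-1", "uce-2"},
    "taxon_3": {"uce-1", "uce-2", "uce-3"},
    "taxon_4": {"uce-1", "uce-3", "uce-4"},
}

complete = loci_by_occupancy(presence, min_taxa=len(presence))  # in all taxa
incomplete = loci_by_occupancy(presence, min_taxa=3)  # in at least 3 taxa
print(complete)
print(incomplete)
```

The complete matrix is just the special case where the minimum equals the total number of taxa, so it is always a subset of the incomplete one.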
Aug 22
Finished with the daily use tutorial and got UCE fastas for all the UCEs found within my bivalves. Next step is to continue on to tutorial I and align and trim them to prep for IQ-TREE.
Aug 25
raven down…womp womp
Started reviewing some of the comments on my proposal draft from like 9 months ago lol. Realized a common theme was that I needed more sources and a bit more explanation regarding 1) how gametogenesis in bivalves works, and 2) why it matters. So started some lit review for that. Mostly reading chapters 1, 2, 4, and 6 in “Reproduction in Aquatic Animals” (Yoshida M, Asturiano JF (eds) (2020) Reproduction in aquatic animals, 2020th ed. Springer, Singapore), which covered spermatogenesis, oogenesis, fertilization, etc.
Aug 26
Raven was still down in the morning, so I continued with some of my lit review, looking at parental effects on offspring (specifically looking for paternal impacts, since the manila clam stuff gave a decent dive into maternal influence).
Then started work on tutorial I.
Aug 27
Continued and concluded work on tutorial I. Aligned and trimmed the UCE loci using MAFFT alignment and internal trimming (due to the potential evolutionary distance between some of the bivalve species) and cleaned the alignments. Then was able to generate data folders of 50%, 75%, and 95% representation (each folder contained the fasta files of the UCEs present in at least that % of taxa, with each file stating the UCE and the corresponding taxa sequences) and matrices that concatenated these files. Then converted the concatenated files into PHYLIP format for use in IQ-TREE.
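As a note to self on what the concatenation and PHYLIP conversion are doing conceptually, here is a minimal sketch. It assumes each locus is already aligned, pads taxa missing from a locus with "?", and writes a relaxed PHYLIP layout (name, whitespace, sequence). The loci, taxa, and sequences are invented, and this is not the phyluce implementation.

```python
def concatenate(alignments, taxa):
    """Join per-locus alignments into one sequence per taxon.

    `alignments` maps locus -> {taxon: aligned sequence}. Taxa missing from
    a locus are padded with '?' for the full length of that locus.
    """
    parts = {taxon: [] for taxon in taxa}
    for locus in sorted(alignments):
        aligned = alignments[locus]
        length = len(next(iter(aligned.values())))
        for taxon in taxa:
            parts[taxon].append(aligned.get(taxon, "?" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def to_phylip(matrix):
    """Format a {taxon: sequence} matrix as relaxed PHYLIP text."""
    n_sites = len(next(iter(matrix.values())))
    width = max(len(name) for name in matrix) + 2
    lines = [f"{len(matrix)} {n_sites}"]
    for name, seq in matrix.items():
        lines.append(f"{name.ljust(width)}{seq}")
    return "\n".join(lines) + "\n"


# Invented alignments: uce-1 lacks taxon_3, uce-2 lacks taxon_2.
alignments = {
    "uce-1": {"taxon_1": "ACGT", "taxon_2": "AC-T"},
    "uce-2": {"taxon_1": "GG", "taxon_3": "GA"},
}
matrix = concatenate(alignments, ["taxon_1", "taxon_2", "taxon_3"])
print(to_phylip(matrix), end="")
```

The lower the representation threshold (50% vs. 95%), the more loci get included, but the more "?" padding ends up in the concatenated matrix.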
Aug 28
Created the IQ-TREE trees and visualized them with FigTree. Noticed an error fairly upstream in the UCE/phyluce code, so went back and edited that and reran all downstream analyses. Wrote up methods to date for the whole consensus tree process.