July Goals

Finish clam chapter intro + discussion, send to committee
Get gene lists for all bivalve species
Get consensus tree (or get UCEs mapped for at least 2 species)

July Daily Updates

Note: once Spring quarter and finals finished, I was moving, so I didn't get much done aside from a little outlining for the clam chapter.

July 1-9th

Lumping this swath of days together, as it was pretty much one continuous effort of putting together the clam chapter. I pulled the methods and results from the shared MS and added the extra contextualizing details as needed (e.g., experimental setup info). I also pulled the last discussion draft I had worked on with Mac and Emma before Emma did the final push and revision for the MS draft. There was still some work to be done on it: writing concluding statements, bringing in more sources to support my interpretations and compare the results to similar studies, and connecting the DEGs to carryover effects (i.e., can these results really indicate that the parental conditions primed the offspring?). I wrote the intro from scratch. Much of this time was me sifting through literature (incl. many maternal RNA papers), outlining, and staring at my screen trying to figure out what I wanted to say. Eventually I got a draft and spent a day or two refining it before sending it out to the committee on the 10th.

There was also a day in here where I stepped away from the writing and worked some more on the workflow for the other chapter. Started with the C. gigas genome. Ran BLAST using tblastn, with the genome as the database and the protein sequences of the genes under the GO term as the query. Set the e-value cutoff to 1e-10, which returned ~3000 hits, with some sequence IDs in the genome having multiple hits from different species' versions of the same gene (e.g., MOUSE_TEST1, HUMAN_TEST1). Code here.
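A minimal sketch of the BLAST+ commands for this step, assuming BLAST+ is installed and the genome is a nucleotide FASTA. The file names (gigas_genome.fna, go_term_proteins.faa, etc.) and the chosen output columns are hypothetical placeholders, not the actual files or settings from the linked code. By default it only prints the commands; pass dry_run=False to execute them.

```python
import shlex
import subprocess

# Columns requested from BLAST+ tabular output (-outfmt 6 with custom fields).
# This column choice is an assumption for illustration.
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore"]


def makeblastdb_cmd(genome_fasta: str, db_name: str) -> list[str]:
    """Build a nucleotide BLAST database from the genome FASTA."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]


def tblastn_cmd(query_faa: str, db_name: str, out_tsv: str,
                evalue: float = 1e-10) -> list[str]:
    """Search protein queries (GO-term genes) against the translated genome DB."""
    return [
        "tblastn",
        "-query", query_faa,
        "-db", db_name,
        "-evalue", str(evalue),                      # str(1e-10) -> "1e-10"
        "-outfmt", "6 " + " ".join(BLAST_COLUMNS),
        "-out", out_tsv,
    ]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical file names for the C. gigas run.
run(makeblastdb_cmd("gigas_genome.fna", "gigas_db"))
run(tblastn_cmd("go_term_proteins.faa", "gigas_db", "gigas_hits.tsv"))
```

Keeping the commands as argument lists (rather than shell strings) makes it easy to loop the same two steps over other genomes later.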

July 10th

Final lookover of the clam chapter, double-checking that all sources were cited (and hunting down some that seemed to elude my Paperpile folders, which were mostly the methods/code-package-related sources). Sent to committee (first July goal done!). Shifting gears to the bivalve chapter.

(was out of town July 11th-14th)

July 15th

Continued with the gmt chapter workflow. Got the GO:BP info from UniProt and finished the annotation. Since there were multiple hits, I also filtered for the top hit by highest bitscore (so if MOUSE_TEST1 had a bitscore of 98 and HUMAN_TEST1 had a bitscore of 87, the mouse version of the gene was the one kept). A potential issue is that different genes (e.g., TEST1 vs. some other gene) could map to the same genome sequence, each with, let's say, ~30% identity. One solution might be to redo the top-hit filtering based on % identity, so the protein version with the higher % match is the one kept, but I'm also not sure whether I want to filter further with a cutoff (e.g., 'anything below 30% identity gets excluded'). Currently, the lowest % identity after filtering is ~17%.
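To compare the two filtering options, here is a minimal sketch that keeps one hit per genome sequence (sseqid), ranked either by bitscore or by pident, with an optional % identity cutoff. It assumes BLAST tabular output with the columns qseqid, sseqid, pident, length, evalue, bitscore (an assumption, not necessarily the layout of the linked outputs). The example rows, including the pident values and the OTHER2/scaffold IDs, are made up for illustration; only the 98 vs. 87 bitscores come from the note above.

```python
import csv
import io

# Assumed column order of the BLAST tabular (outfmt 6) file.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore"]


def read_hits(tsv_text: str) -> list[dict]:
    """Parse tab-separated BLAST hits into dicts with numeric fields."""
    hits = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hit["length"] = int(hit["length"])
        hits.append(hit)
    return hits


def top_hit_per_subject(hits, score="bitscore", min_pident=None):
    """Keep one hit per genome sequence (sseqid): the one with the highest `score`.

    If min_pident is given, hits below that % identity are dropped first.
    """
    best = {}
    for hit in hits:
        if min_pident is not None and hit["pident"] < min_pident:
            continue
        current = best.get(hit["sseqid"])
        if current is None or hit[score] > current[score]:
            best[hit["sseqid"]] = hit
    return best


# Hypothetical hits: two species' versions of TEST1 on the same genome sequence.
EXAMPLE = (
    "MOUSE_TEST1\tscaffold_1\t45.0\t300\t1e-30\t98\n"
    "HUMAN_TEST1\tscaffold_1\t52.0\t290\t1e-25\t87\n"
    "OTHER2\tscaffold_2\t17.0\t120\t1e-11\t40\n"
)
hits = read_hits(EXAMPLE)
by_bitscore = top_hit_per_subject(hits)                 # scaffold_1 -> MOUSE_TEST1
by_pident = top_hit_per_subject(hits, score="pident")   # scaffold_1 -> HUMAN_TEST1
with_cutoff = top_hit_per_subject(hits, min_pident=30)  # scaffold_2 dropped
```

Because bitscore already accounts for alignment length while pident does not, the two rankings can disagree, as in the made-up scaffold_1 case; keeping pident in the filtered table makes it easy to check how often that happens.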

Luke is at a conference currently, but I've emailed him to set up a time to meet and discuss some of this. My goal is to finalize this example workflow with an output that sets me up nicely for the Robinson-Foulds mapping. I want my gene list to be filtered but need guidance on which approach is most justified. Code here. Outputs here (note that the gene list is filtered for the top bitscore hit but does not include pident, while the annotation file includes pident but is not filtered for the top bitscore hit).

Looking forward

By early next week I'd like to have the code running for the other 16 bivalve species, with all of my gene lists done by the end of next week. Concurrently, I may start building my consensus tree workflow, but since I'm very unfamiliar with this methodology, I'm not sure what the timeline or procedure will be. So between coding tasks, I'll be looking through forums and literature on building trees from UCEs, and in the more immediate term (i.e., this week) I'll also look into the bitscore vs. pident filtering question.

July 21st

Finished gene list generation for all bivalve species:

Assembly Accession	Organism Name	Common Name	Gbp	# Seq in DB (genome)	# Seq in Query	# of Query Matches
GCF_002022765.2	Crassostrea virginica	Eastern Oyster	0.118	60213	4234	3129
GCF_026914265.1	Mya arenaria	Soft Shell Clam	0.126	61919	4234	3116
GCF_963853765.1	Magallana gigas	Pacific Oyster	0.110	51045	4234	3127
GCF_041381155.1	Argopecten irradians	Bay Scallop	0.085	41903	4234	3129
GCF_963676685.1	Mytilus edulis	Blue Mussel	0.134	62345	4234	3128
GCF_902652985.1	Pecten maximus	Great Scallop	0.072	39918	4234	3116
GCF_025612915.1	Magallana angulata	Portuguese Oyster	0.102	49950	4234	3135
GCF_947568905.1	Ostrea edulis	European Flat Oyster	0.117	57349	4234	3140
GCF_021730395.1	Mercenaria mercenaria	Hard Clam	0.121	63247	4234	3058
GCF_020536995.1	Dreissena polymorpha	Zebra Mussel	0.140	75288	4234	3105
GCF_036588685.1	Mytilus trossulus	Pacific Blue Mussel	0.099	53269	4234	3128
GCF_026571515.1	Ruditapes philippinarum	Manila Clam	0.097	57637	4234	3098
GCF_002113885.1	Mizuhopecten yessoensis	Yesso Scallop	0.082	41567	4234	3122
GCF_021869535.1	Mytilus californianus	California Mussel	0.104	49739	4234	3083
GCF_033153115.1	Saccostrea echinata	Blacklip Rock Oyster	0.065	36217	4234	3143
GCF_032062105.1	Saccostrea cuccullata	Natal Rock Oyster	0.091	56432	4234	3120
GCF_031769215.1	Ylistrum balloti	Ballot Saucer Scallop	0.042	24047	4234	3106
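A minimal sketch of how the count columns above could be tallied per species. It assumes "# Seq in DB" and "# Seq in Query" are FASTA record counts and that "# of Query Matches" means distinct query IDs with at least one hit; that last definition is an assumption about the column, not a confirmed description of how it was computed. All function names are hypothetical.

```python
def count_fasta_records(fasta_text: str) -> int:
    """Number of sequences = number of header lines starting with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def count_matched_queries(blast_tsv_text: str) -> int:
    """Distinct query IDs (first tab-separated column) with at least one hit."""
    return len({line.split("\t")[0]
                for line in blast_tsv_text.splitlines() if line.strip()})


def pct_matched(n_matches: int, n_queries: int) -> float:
    """Share of the query set that found a match, as a percentage."""
    return 100 * n_matches / n_queries


def summary_row(accession: str, genome_fasta: str, query_fasta: str,
                blast_tsv: str) -> dict:
    """One row of the summary table, built from the raw file contents."""
    n_query = count_fasta_records(query_fasta)
    n_match = count_matched_queries(blast_tsv)
    return {
        "accession": accession,
        "n_seq_db": count_fasta_records(genome_fasta),
        "n_seq_query": n_query,
        "n_query_matches": n_match,
        "pct_matched": pct_matched(n_match, n_query),
    }


# Using the Eastern Oyster row above: 3129 of 4234 queries.
print(round(pct_matched(3129, 4234), 1))  # 73.9
```

Adding a percentage column would make it easier to compare match rates across species, since the query set (4234 sequences) is the same for all of them.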

July 24th

Started phyluce Tutorial III to retrieve UCEs from each of the bivalve genomes; also installed packages (and troubleshot the issues associated with that).

July 28th

Continued the phyluce tutorial. My first run at aligning the genomes to the bait sequences kept hanging; troubleshot with Sam.

July 29th

Troubleshooting continued, and I was able to successfully align all genomes to the bait sequences (probe files) and extract UCEs.

July 30th

Continued UCE extraction and file conversion. Ran into an issue where the UCE matches for each genome (which come in lastz output format) would not convert to FASTA. It was a naming-scheme issue: I had to go upstream, remove the lcl| prefix from the sequence names, and rerun the phyluce alignment, which solved it.
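One way to do the upstream renaming step, as a minimal sketch: strip a leading lcl| from every FASTA header before rerunning the alignment. The function names and file paths are hypothetical; this is not necessarily how the fix was actually applied.

```python
from pathlib import Path


def strip_lcl(fasta_text: str, prefix: str = "lcl|") -> str:
    """Remove `prefix` from the start of each FASTA header; sequence lines are untouched."""
    marker = ">" + prefix
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(marker):
            line = ">" + line[len(marker):]
        lines.append(line)
    return "\n".join(lines) + "\n" if lines else ""


def strip_lcl_file(src: str, dst: str) -> None:
    """Write a cleaned copy of a genome FASTA (hypothetical paths)."""
    Path(dst).write_text(strip_lcl(Path(src).read_text()))


print(strip_lcl(">lcl|seq_1 description\nACGT\n"), end="")
```

Writing to a new file rather than overwriting keeps the original download intact in case the renaming needs to be redone.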

July 31st

Finished converting lastz outputs to fasta – UCEs extracted for all my species :D