July Goals + Dailies

goals, daily
Author

Megan Ewing

Published

July 16, 2025

July Goals

  • Finish clam chapter intro + discussion, send to committee

  • Get gene lists for all bivalve species

  • Get consensus tree (or get UCEs mapped for at least 2 species)

July Daily Updates

Note: once Spring quarter and finals finished, I was moving, so I didn’t get much done aside from a little bit of outlining for the clam chapter.

July 1-9th

Lumping this swath of days together as it was pretty much one continuous effort of putting together the clam chapter. I pulled the methods and results from the shared MS and added in the extra contextualizing details as needed (e.g., experimental setup info). I also pulled the last discussion draft I had worked on with Mac and Emma before Emma did the final push and revision for the MS draft. There was still a little bit of work to be done on that, including concluding statements, tying in some more sources to support my interpretations and compare the results to similar studies, and tying the DEGs to carryover effects (i.e., can these results really indicate that the parent conditions primed the offspring?). The intro I wrote from scratch. Much of this time was me sifting through literature (incl. many maternal RNA papers), outlining, and staring at my screen trying to figure out what I wanted to say. Eventually got a draft and then spent a day or two refining it before sending it out to the committee on the 10th.

There was also a day in here where I stepped away from the writing and worked some more on the workflow for the other chapter. Started with the C. gigas genome. Ran BLAST using tblastn with the genome as the database and the genes under the GO term as the query. Set the e-value to 1e-10, which returned ~3000 matches, with some sequence IDs in the genome having multiple hits for different species versions of the same gene (e.g., MOUSE_TEST1, HUMAN_TEST1). Code here.
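For reference, a minimal sketch of what that tblastn step could look like when assembled from Python. This is not the linked code. The file names (`cgigas_genome.fna`, `go_term_proteins.faa`, `cgigas_hits.tsv`) and the database prefix `cgigas_db` are hypothetical placeholders, and tabular output (`-outfmt 6`) is an assumption; only the tblastn program, the genome-as-database setup, and the 1e-10 e-value come from the notes above.

```python
import shlex

def makeblastdb_cmd(genome_fasta, db_prefix):
    """Command to build a nucleotide BLAST database from a genome FASTA."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_prefix]

def tblastn_cmd(query_fasta, db_prefix, out_tsv, evalue="1e-10"):
    """Command to search protein queries against a nucleotide database."""
    return [
        "tblastn",
        "-query", query_fasta,   # protein sequences for the genes under the GO term
        "-db", db_prefix,        # genome database built by makeblastdb
        "-evalue", evalue,       # e-value cutoff from the notes
        "-outfmt", "6",          # assumption: 12-column tabular output
        "-out", out_tsv,
    ]

# Hypothetical file names, for illustration only.
cmds = [
    makeblastdb_cmd("cgigas_genome.fna", "cgigas_db"),
    tblastn_cmd("go_term_proteins.faa", "cgigas_db", "cgigas_hits.tsv"),
]

for cmd in cmds:
    print(shlex.join(cmd))
    # To actually run it where BLAST+ is installed:
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the commands as lists keeps the same two functions reusable for each additional genome by swapping in different file names.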

July 10th

Final look-over of the clam chapter, double-checking that all sources were cited (and hunting down some of them that seemed to elude my Paperpile folders, which were mostly just the methods/code package related sources). Sent to committee (first July goal done!). Shifting gears to the bivalve chapter.

(was out of town July 11th-14th)

July 15th

Continued with the gmt chapter workflow. Got the GO:BP info from UniProt and finished the annotation. Since there were multiple hits, I also filtered to keep only the top hit by highest bitscore (so if MOUSE_TEST1 had a bitscore of 98 and HUMAN_TEST1 had a bitscore of 87, the mouse version of the gene was the one kept). A potential issue with this is that different genes could be mapped to the same sequence, like TEST1 vs. some other gene that both had, let’s say, a 30% match. One solution might be to redo my top-hit filtering based on % identity, so the protein version with the higher % match is the one that is kept, but I’m also not sure if I want to filter down further and say, for example, ‘anything less than 30% match gets excluded’. Currently, the lowest % identity post-filter is ~17%.
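To make the bitscore vs. % identity question concrete, here is a small sketch of the top-hit filter written so the ranking column can be swapped. It is not the linked code. It assumes the hits are in BLAST's standard 12-column tabular format and that the genome sequence ID is the subject (`sseqid`). The toy rows are made up: only the MOUSE_TEST1/HUMAN_TEST1 bitscores of 98 and 87 come from the example above, and the pident values are invented purely to show the two filters disagreeing.

```python
import csv
import io

# Standard 12 columns of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def read_outfmt6(handle):
    """Read tab-separated BLAST hits into a list of dicts."""
    return list(csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"))

def top_hit_per_subject(rows, rank_by="bitscore", min_pident=None):
    """Keep one hit per genome sequence: the one with the highest `rank_by`.

    If `min_pident` is given, hits below that % identity are dropped first.
    Ties keep the hit that appears first in the input.
    """
    best = {}
    for row in rows:
        if min_pident is not None and float(row["pident"]) < min_pident:
            continue
        current = best.get(row["sseqid"])
        if current is None or float(row[rank_by]) > float(current[rank_by]):
            best[row["sseqid"]] = row
    return best

# Toy hits (invented values apart from the two bitscores).
toy = (
    "MOUSE_TEST1\tseq_001\t30.0\t200\t140\t0\t1\t200\t1\t600\t1e-20\t98\n"
    "HUMAN_TEST1\tseq_001\t34.0\t150\t99\t0\t1\t150\t1\t450\t1e-15\t87\n"
    "MOUSE_TEST2\tseq_002\t45.0\t180\t99\t0\t1\t180\t1\t540\t1e-30\t120\n"
)
rows = read_outfmt6(io.StringIO(toy))

by_bitscore = top_hit_per_subject(rows)
by_pident = top_hit_per_subject(rows, rank_by="pident")
strict = top_hit_per_subject(rows, min_pident=40.0)

print(by_bitscore["seq_001"]["qseqid"])  # MOUSE_TEST1
print(by_pident["seq_001"]["qseqid"])    # HUMAN_TEST1
print(sorted(strict))                    # ['seq_002']
```

In the toy data the two rankings pick different proteins for seq_001, which is exactly the situation the filtering decision needs to settle; the `min_pident` argument is where a cutoff like 30% would go if that route is chosen.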

Luke is currently at a conference, but I’ve emailed him to set a time to meet and discuss some of this. My goal is to get this example workflow finalized with an output that sets me up nicely for the Robinson-Foulds mapping. I want my gene list to be filtered but need guidance on which way might be the most justified. Code here. Outputs here (note that the gene list is filtered for top bitscore hit but does not include pident, while the annotation file does include pident but is not filtered for top bitscore hit).

Looking forward

By early next week I’d like to get the code running for the other 16 bivalve species and have all of my gene lists by the end of next week. Concurrently, I may start working on creating my consensus tree workflow, but since I’m very unfamiliar with this methodology, I’m not sure what the timeline or procedure is for that. So between coding things, it’ll be a little bit of looking through forums and literature about building trees using UCEs and I’ll look into the bitscore vs. pident filter debate a bit as well in the more immediate meantime (aka this week).

July 21st

Finished gene list generation for all bivalve species:

| Assembly Accession | Organism Name | Common Name | Gbp | # Seq in DB (genome) | # Seq in Query | # of Query Matches |
|---|---|---|---|---|---|---|
| GCF_002022765.2 | Crassostrea virginica | Eastern Oyster | 0.118 | 60213 | 4234 | 3129 |
| GCF_026914265.1 | Mya arenaria | Soft Shell Clam | 0.126 | 61919 | 4234 | 3116 |
| GCF_963853765.1 | Magallana gigas | Pacific Oyster | 0.110 | 51045 | 4234 | 3127 |
| GCF_041381155.1 | Argopecten irradians | Bay Scallop | 0.085 | 41903 | 4234 | 3129 |
| GCF_963676685.1 | Mytilus edulis | Blue Mussel | 0.134 | 62345 | 4234 | 3128 |
| GCF_902652985.1 | Pecten maximus | Great Scallop | 0.072 | 39918 | 4234 | 3116 |
| GCF_025612915.1 | Magallana angulata | Portuguese Oyster | 0.102 | 49950 | 4234 | 3135 |
| GCF_947568905.1 | Ostrea edulis | European Flat Oyster | 0.117 | 57349 | 4234 | 3140 |
| GCF_021730395.1 | Mercenaria mercenaria | Hard Clam | 0.121 | 63247 | 4234 | 3058 |
| GCF_020536995.1 | Dreissena polymorpha | Zebra Mussel | 0.140 | 75288 | 4234 | 3105 |
| GCF_036588685.1 | Mytilus trossulus | Pacific Blue Mussel | 0.099 | 53269 | 4234 | 3128 |
| GCF_026571515.1 | Ruditapes philippinarum | Manila Clam | 0.097 | 57637 | 4234 | 3098 |
| GCF_002113885.1 | Mizuhopecten yessoensis | Yesso Scallop | 0.082 | 41567 | 4234 | 3122 |
| GCF_021869535.1 | Mytilus californianus | California Mussel | 0.104 | 49739 | 4234 | 3083 |
| GCF_033153115.1 | Saccostrea echinata | Blacklip Rock Oyster | 0.065 | 36217 | 4234 | 3143 |
| GCF_032062105.1 | Saccostrea cuccullata | Natal Rock Oyster | 0.091 | 56432 | 4234 | 3120 |
| GCF_031769215.1 | Ylistrum balloti | Ballot Saucer Scallop | 0.042 | 24047 | 4234 | 3106 |

July 24th

Started on phyluce Tutorial III to retrieve UCEs from each of the bivalve genomes, plus installing packages (and the troubleshooting associated with that).

July 28th

Continued the phyluce tutorial. First run at aligning genomes to bait sequences. Kept hanging; troubleshot with Sam.

July 29th

Troubleshooting continued, and I was able to successfully align all genomes to the bait sequences (probe files) and extract UCEs.

July 30th

Continued UCE extraction and file conversion. Ran into an issue where the output UCE matches for each genome (which come in lastz output format) were failing to convert to FASTA. It was a naming scheme issue: I had to go upstream and remove the lcl| from the sequence names and re-run the phyluce alignment, which solved the issue.
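A sketch of the header cleanup, in case it helps next time. This is one possible way to strip the lcl| prefix from FASTA headers before the alignment step, not necessarily how I did it; the function name and the toy records are hypothetical.

```python
def strip_lcl_prefix(lines, prefix="lcl|"):
    """Yield FASTA lines with `prefix` removed from the start of header lines.

    Sequence lines and headers without the prefix pass through unchanged.
    """
    for line in lines:
        if line.startswith(">" + prefix):
            yield ">" + line[1 + len(prefix):]
        else:
            yield line

# Toy records (made up) to show the effect.
toy_fasta = [
    ">lcl|contig_1 toy record\n",
    "ACGTACGT\n",
    ">contig_2\n",
    "GGCCGGCC\n",
]
cleaned = list(strip_lcl_prefix(toy_fasta))
print("".join(cleaned), end="")

# For a real genome file (hypothetical paths), stream line by line:
# with open("genome.fna") as src, open("genome.clean.fna", "w") as dst:
#     dst.writelines(strip_lcl_prefix(src))
```

Only the start of each header is touched, so any lcl| that happens to appear later in a description line is left alone.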

July 31st

Finished converting lastz outputs to fasta – UCEs extracted for all my species :D