Home and schedule - barrettlab/2021-Genomics-bootcamp GitHub Wiki

Welcome to the 2021-Genomics-bootcamp wiki!

This itinerary also exists as a color-coded Google Doc (sorry, no colors here on GitHub!)

Itinerary

WVU CPING Genomics training workshop

Tuesday, June 1 - Friday, June 4, 2021

(Year 2) 4-day Virtual Workshop for Regional PIs

Participants, WVU:

Craig Barrett (CPING Co-PI)

Dhanu Ramachandran (CPING postdoc)

Cameron Corbett (PhD student, WVU), Sam Skibicki (PhD candidate, WVU), Hana Thixton (PhD candidate, WVU)

CREU students at WVU

Zoe Bender (Gettysburg College)

Trezalka Budinsky (Pitt)

Regional PIs plus their CREU students:

Pamela Puppo (Marshall University) & Rebecca Foy (Marshall U)

Melanie Link-Perez (Eastern Kentucky University, no student this summer)

Michael Sundue (University of Vermont) & River Pasquale (U Vermont)

GitHub Wiki (for most tutorials):

GitHub Wiki for Bootcamp

Google Drive Link for CREU student activities, usually 4-5pm, 3-5pm Fridays

You may need to request access for the link above!

Week prior to workshop

-pre-workshop survey (asynchronous)

Some video resources:

-sampling herbarium specimens (James Beck)

https://www.invasiongenomics.com/herbarium.html

-How to conduct a 96-well CTAB DNA extraction from herbarium or silica-dried material:

https://www.invasiongenomics.com/dna.html

-introduction to UNIX video:

https://www.invasiongenomics.com/computing.html

-Craig will configure logins to myco (Craig’s server) and/or Thorny Flat (WVU-HPC)

-Regionals prepare 5-10 minute ppt with their project objectives and experimental plan

-Regionals download any software and data locally for the workshop (UNIX shell, etc.)

-First reading (read before Day 1): McKain et al. (2018) “Practical considerations for plant phylogenomics”

McKain et al., 2018

Day 1: Tuesday June 1, 2021

Morning:

- 9-10:30am, or before workshop: introduction to CPING projects and hub structure

*Watch from: 1h 35m – 1h 55m 30s:

Intro to CPING hub structure, people, and projects

- 10:30-11am: participant introductions

- 11am-12pm: connection to user interests

Afternoon:

- introduction to common genomics workflows video (from 2020 Botany Intro to HTS workshop): Watch from: Start – 1h 35m

Genomics workflows

-2-4pm: Practice following the GitHub Wiki:

a. Logging into ‘myco’ (the Barrett Lab server): this is a short video by Dhanu Ramachandran, our CPING postdoc. (https://drive.google.com/file/d/1XBrYKiYUAhHE8LM6G032xCjJYCKbhLpK/view?usp=sharing)

b. Command line (self-guided):

https://github.com/barrettlab/2021-Genomics-bootcamp/wiki/Preliminaries:-Using-the-commandline-and-signing-into-a-server

c. Getting started with UNIX (self-guided):

https://github.com/barrettlab/2021-Genomics-bootcamp/wiki/I.-Getting-Started-with-UNIX

Evening:

Second reading(s), for Day 2:

Sinn, Simon et al. (in review, Methods in Ecology & Evolution). ISSRseq: an extensible, low-cost, and efficient method for reduced representation sequencing

https://www.biorxiv.org/content/10.1101/2020.12.21.423774v1

Campbell et al., 2018. Would an RRS by any other name sound as RAD? Methods in Ecology and Evolution.

https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13038

UNIX "homework" problem:

Tuesday June 1, 2021 -- UNIX "homework" assignment

Take the following rbcL sequence, and modify the following aspects:

1. This file is 'interleaved,' which can cause issues in some applications
   Your first task is to make it a non-interleaved file, such that the sequence is all in a single line:
   
   >Fasta_header
   ATGAGTTGTAGGGAGGG...yadayadayada...TAA

2. The Fasta header is ugly and not very useful, as downloaded from GenBank.
   Using your newly acquired UNIX expertise, modify the fasta file to look like:
   
   >EU391358_Corallorhiza_trifida_Barrett_0161c
   ATGAGTTGTAGGGAGGG...yadayadayada...TAA

3. BONUS: combine your two commands into a single command (a "one-liner")
   Super extra BONUS: Write a shell script that does this for a multi-fasta file (i.e. a file with multiple species or accessions)


>EU391358.1 Corallorhiza trifida voucher Barrett 0161c atpB-rbcL intergenic spacer, partial sequence; ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, complete cds; and rbcL-accD intergenic spacer, partial sequence; chloroplast
ATGAGTTGTAGGGAGGGACTTATGTCACCACAAACAGAAACTAAAGCGAGCGTTGGATTTAAAGCTGGTG
TTAAAGATTACAAATTGACTTATTATACTCCTGACTACGAAACCAAAGATACTGATATCTTGGCAGCATT
CCGAGTAACTCCTCAACCGGGAGTTCCGCCTGAAGAAGCGGGGGCTGCGGTAGCTGCCGAATCTTCTACT
GGTACATGGACAACTGTGTGGACTGATGGACTTACCAGTCTTGATCGTTACAAAGGACGATGCTACCACA
TTGAGGTCGTTGTTGGGGAGGAAAATCAATATATTGCTTATGTAGCTTATCCTTTAGACCTTTTTGAAGA
AGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTCTTTGGTTTCAAAGCCCTGCGAGCTCTA
CGTCTGGAAGATCTGCGAATTCCCACTTCTTATTCCAAAACTTTCCAGGGTCCACCTCATGGCATCCAAG
TTGAAAGAGATAAATTAAACAAGTATGGTCGTCCCCTATTGGGATGTACTATTAAACCAAAATTGGGATT
ATCCGCAAAAAACTACGGCAGAGCGGTTTATGAATGTCTACGGGGTGGACTTGATTTTACTAAAGATGAT
GAAAACGTAAATTCACAACCATTTATGCGTTGGAGAGATCGTTTCTTATTTTGTGCCGAAGCAATTTATA
AAGCGCAAGCCGAAACGGGTGAAATTAAAGGACATTACTTGAATGCAACTGCGGGTACGTGTGAAGAAAT
GATGAAAAGAGCAGTATTTGCCAGAGAATTGGGAGTTCCTATCGTAATGCATGACTACTTAACTGGGGGG
TTCACCGCAAATACTAGCTTGTCTCATTATTGCCGCGACAATGGTCTACTTCTTCACATCCATCGCGCAA
TGCATGCAGTTATTGATAGACAGAAAAATCATGGTATGCATTTTCGTGTACTAGCTAAAGCATTACGTAT
GTCTGGTGGAGATCATATTCATGCTGGTACAGTAGTGGGTAAACTGGAGGGGGAGCGTGAGATGACTTTG
GGTTTTGTTGATTTATTACGTGATGATTTTATTGAAAAAGATCGAAGTCGTGGTATTTTTTTCACTCAAG
ACTGGGTCTCTATGCCAGGTGTTCTGCCCGTGGCTTCAGGGGGTATTCATGTTTGGCATATGCCTGCCCT
AACTGAAATCTTTGGGGATGATTCCGTACTACAGTTCGGTGGAGGAACTCTAGGACACCCTTGGGGAAAT
GCACCCGGCGCAGTAGCTAATCGGGTGGCTTTAGAAGCATGTGTACAAGCTCGTAATGAGGGACGTGACC
TTGCTCGTGAAGGTAATGATATTATTCGTGAAGCTACCAAATGGAGCCCTGAGCTAGCCGCTGCTTGTGA
AGTATGGAAAGAGATCACATTCGATTTCGACCCAGTGGATAAGCTAGATAAAGAGACAAAATAA

4. Super duper extra bonus: Can you come up with a simple 'grep' command to quickly count how many sequences are in a fasta file?
### Hint: focus on the greater-than sign '>' that starts each fasta header
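
To check your shell answers, the same transformations can be sketched in Python. This is only a sketch, not the intended UNIX solution. The function names (parse_fasta, tidy_header, count_sequences) are made up for illustration, and tidy_header assumes the GenBank header follows the pattern "accession.version Genus species voucher Collector number ..." as in the rbcL record above. Headers that do not fit that pattern just get their spaces replaced with underscores.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) pairs, joining wrapped (interleaved) sequence lines."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tidy_header(header):
    """Return accession_Genus_species_Collector_number (assumed header pattern)."""
    match = re.match(r"(\w+)\.\d+ (\S+) (\S+) voucher (\S+) (\S+)", header)
    if match is None:
        # Fallback for headers that do not fit the assumed pattern.
        return header.replace(" ", "_")
    return "_".join(match.groups())


def count_sequences(text):
    """Count records by counting lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


# Toy two-line sequence, not the full rbcL record.
example = """>EU391358.1 Corallorhiza trifida voucher Barrett 0161c atpB-rbcL intergenic spacer
ATGAGTTGTAGG
GAGGGACTTATG
"""

for header, sequence in parse_fasta(example):
    print(">" + tidy_header(header))
    print(sequence)
print(count_sequences(example), "sequence(s)")
```

Because parse_fasta handles any number of records, the same loop also works on a multi-fasta file, as long as each header fits the assumed pattern.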

Day 2: Wednesday June 2, 2021

Morning:

- 9am-9:30am or night before: ISSR-seq – Watch Craig’s Botany2020 Virtual talk

https://drive.google.com/file/d/17orQiTyfZJ-qfOoIVqS0gsiwueHyuMdH/view?usp=sharing

ISSRseq + library prep protocol:

https://github.com/barrettlab/2021-Genomics-bootcamp/wiki/Lab-Protocol-for-ISSR-seq

- Begin ISSR-seq ~10am

YouTube Video protocol link

On CPING website

• Dilute gDNAs to 20 ng/ul

• Set up PCR plate with master mix

• Pool 4 primers, so only 1 PCR needed

• PCR ~3 hours, so plan to put the reactions on the cycler ~10am

• Pour a gel for later

• Store amplicons in a 4 °C fridge temporarily, or at -20 °C for longer term
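
The dilution step above is ordinary C1 x V1 = C2 x V2 arithmetic. A minimal sketch follows; the function name and the 50 ul final volume are assumptions for illustration, not values from the protocol, so use the volumes given in the ISSRseq lab protocol.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=20.0, final_volume_ul=50.0):
    """Return (stock volume, water volume) in ul for a C1*V1 = C2*V2 dilution.

    final_volume_ul defaults to a hypothetical 50 ul; adjust it to your protocol.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Example: a gDNA measured at 100 ng/ul, diluted to 20 ng/ul.
stock_ul, water_ul = dilution_volumes(100.0)
print(f"{stock_ul:.1f} ul gDNA + {water_ul:.1f} ul water")
```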

Afternoon:

-ISSRseq gel, nanodrop, pooling amplicons across samples

Evening:

- 5-6pm data exploration on HPC (structure of FASTQ files, moving data, trimming reads); day 2 debriefing

*this will involve previously generated data, lots of options here
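
As a preview of the FASTQ structure covered in this session: each read takes four lines (an '@' header, the sequence, a '+' separator, and one quality character per base). The sketch below reads that layout in Python. The function name read_fastq is made up for illustration, and it assumes unwrapped four-line records with Phred+33 quality encoding, which is typical of current Illumina output but worth confirming for your own data.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality scores) from an iterable of FASTQ lines.

    Assumes four lines per record and Phred+33 encoded qualities.
    """
    lines = (line.rstrip("\n") for line in handle)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input (a trailing partial record is ignored)
        name, sequence, separator, quality = record
        if not name.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a four-line FASTQ record: " + name)
        yield name[1:], sequence, [ord(char) - 33 for char in quality]


# Toy record; with a real file, pass an open file handle instead of a list.
toy = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
for name, sequence, scores in read_fastq(toy):
    print(name, sequence, scores)
```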

Reading for day 3: Chown et al. (2014) “Biological invasions, climate change, and genomics”

https://onlinelibrary.wiley.com/doi/10.1111/eva.12234

Day 3: Thursday June 3, 2021

Morning:

-Illumina library preps, part I: gDNA quantification with Qubit, fragmentation and adapter ligation, bead cleanups

YouTube Video protocol link

On CPING website

• Preps will be conducted either on gDNA OR on ISSRseq pooled amplicons, depending on your particular project!

Afternoon:

- Illumina library preps, part II: Library amplification, cleanups, Qubit, and pooling

Evening:

-7PM (or TBD based on preferences) -- Invasive species trivia night!

Virtual beer will be provided.

Reading for Day 4: Andermann et al., 2020: “A Guide to Carrying Out a Phylogenomic Target Sequence Capture Project”

https://www.frontiersin.org/articles/10.3389/fgene.2019.01407/full

Day 4: Friday June 4, 2021

Morning:

- sequence capture video (~3min):

https://www.youtube.com/watch?v=VsXAbF972xk

Sequence capture & reduced representation presentation

- 10:30-11:30am: Group discussion with Sam Skibicki, PhD student in Barrett lab using sequence capture in Asteraceae

Afternoon:

- 1-2pm: talk about metabarcoding? (Hana, Craig, Regional PIs)

- 2-5pm: data analysis on server (plastid genome assembly, alignment, phylogeny reconstruction), day 4 debriefing (Barrett Lab, Regional PIs)

*this will involve previously generated data, lots of options here

Evening:

Reading for Day 5 (over the weekend): Braasch et al. 2019. “Expansion history and environmental suitability shape effective population size in a plant invasion”

https://onlinelibrary.wiley.com/doi/pdf/10.1111/mec.15104

Day 5: Date TBD, after bootcamp

Morning:

- 9am-11am: Data analysis on HPC (ISSRseq analysis pipeline using Corallorhiza, Microstegium, or “20-20-20” data; possibly metabarcoding)

Afternoon:

- 11am-1pm: Discussion of results from data analysis; regional group project planning

After workshop

- post-workshop survey

- Botany2021-Virtual. This will include our workshop, “Introduction to HTS”.

- If you can’t get into the workshop, we will record and share with you and your students!

- In-person visits to WVU on individual basis (Late Summer or Fall 2021 if possible?)

- CPING-wide meeting/symposium. In Louisiana? Virtual? Mid-August before classes start