Repositories for the dataset examples - BelenJM/supeRbaits GitHub Wiki

This page lists the references and download sources for the datasets we used to define the regions/areas of interest and exclusion for the three species in our study:

Atlantic salmon

  • Reference genome: described in Lien et al., 2016; it can be downloaded from NCBI.

  • Areas of exclusion: the RepeatMasker file can be downloaded from NCBI, file name "GCA_000233375.4_ICSASG_v2_rm.out.gz". We downloaded it in November 2016. From this file we extracted the chromosome name (column 6) and the position_Begin and position_End within the chromosome (columns 7 and 8), and saved them for further use.

  • Regions of interest: these were genes of interest or regions with known Quantitative Trait Loci (QTL):

-Related to growth (Baranski, Moen, & Våge, 2010 and Tsai et al., 2015);

-Susceptibility to pancreatic disease (Gonen et al., 2015) or infectious pancreatic necrosis (Houston et al., 2010; Houston et al., 2012; Moen, Baranski, Sonesson, & Kjøglum, 2009; Moen et al., 2015);

-Genes related to survival in the wild (Besnier et al., 2015)

  • Points of interest: mainly Single Nucleotide Polymorphisms (SNPs) of interest from:

-SNP chip, which can be downloaded from Karlsson et al. (2011);

-List of SNPs identified as related to parasite-driven evolution in Atlantic salmon (Zueva et al., 2014);

-Growth-related traits (Gutierrez, Yáñez, Fukui, Swift, & Davidson, 2015; Vasemägi, Kahar, & Ozerov, 2016);

-Immune-related areas (Kjærner-Semb et al., 2016);

-Hatchery-related environment or differentiation between wild/hatchery individuals (Karlsson et al., 2011; Pritchard et al., 2016);

-Specific SNPs from SNP chips (e.g. Karlsson et al., 2011), including the ones known to be linked to age of maturation at sea (Barson et al., 2015).

  • Random areas: baits in this category were designed using the whole genome, excluding the masked regions.
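
As a rough illustration of how the exclusion and random-area steps above fit together, the sketch below pulls chromosome, begin and end from RepeatMasker-style rows, merges overlapping masked intervals, and returns the unmasked remainder of each chromosome. It is not the code used in the study: the function names, the `chrom_lengths` mapping and the example rows are all hypothetical. Column numbers are passed as arguments (1-based, after splitting each row on whitespace) because the count depends on how the leading whitespace of a row is treated. The text above counts columns 6, 7 and 8, whereas a plain whitespace split of a standard RepeatMasker `.out` row usually puts the same fields at 5, 6 and 7, so check the numbers against your copy of the file.

```python
from collections import defaultdict


def extract_regions(lines, chrom_col, start_col, end_col):
    """Return (chrom, start, end) tuples from whitespace-separated rows.

    Column numbers are 1-based. Rows that are too short, or whose
    coordinate fields are not integers (header and blank lines), are skipped.
    """
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < max(chrom_col, start_col, end_col):
            continue
        try:
            start = int(fields[start_col - 1])
            end = int(fields[end_col - 1])
        except ValueError:
            continue
        regions.append((fields[chrom_col - 1], start, end))
    return regions


def merge_regions(regions):
    """Merge overlapping or adjacent 1-based inclusive intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1] + 1:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def unmasked_regions(merged, chrom_lengths):
    """Complement of the masked intervals within 1..length of each chromosome."""
    result = {}
    for chrom, length in chrom_lengths.items():
        free, cursor = [], 1
        for start, end in merged.get(chrom, []):
            if start > cursor:
                free.append((cursor, start - 1))
            cursor = max(cursor, end + 1)
        if cursor <= length:
            free.append((cursor, length))
        result[chrom] = free
    return result


# Made-up rows in RepeatMasker .out layout (not taken from the real file).
example_rows = [
    "   SW   perc perc perc  query     position in query    matching  repeat",
    "score   div. del. ins.  sequence  begin  end  (left)   repeat    class/family",
    "",
    "  300  10.0  0.0  0.0  chrA  11  20  (80)  +  rep1  Simple_repeat  1  10  (0)  1",
    "  250  12.0  1.0  0.0  chrA  18  30  (70)  +  rep2  LINE/L2  1  13  (0)  2",
    "  410   5.0  0.0  0.0  chrA  61  70  (30)  C  rep3  DNA/hAT  (0)  10  1  3",
]

masked = merge_regions(extract_regions(example_rows, 5, 6, 7))
available = unmasked_regions(masked, {"chrA": 100})
print(masked)
print(available)
```

Merging before taking the complement keeps overlapping repeat annotations from producing spurious gaps. The chromosome lengths would have to come from the reference genome itself, for example from a FASTA index.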

Atlantic cod

  • Reference genome: we downloaded the latest reference genome of the species from Tørresen et al., 2017.

  • Areas of exclusion: we used the repeat dataset published as part of Tørresen et al., 2017 and downloaded from here. In particular, we used the "gadMor2_annotation_complete.gff" file, from which we selected all entries labelled "repeatmasker" or "repeatrunner". These identify all the repeats masked by MAKER based on a RepeatModeler library, plus all sequences in RepBase.

  • Regions of interest:

-Inversions: we extracted the start and end positions of the four known inversions in Atlantic cod that characterize the different ecotypes. We found this information in Barney et al., 2017 and Kirubakaran et al., 2016. Barth et al., 2019 summarizes the location of the four inversions (quoted from the paper: "LG01: positions 9,114,741–26,192,386; LG02: positions 18,609,260–23,660,985; LG07: positions 13,622,710–23,019,113; LG12 positions: 426,531–13,445,150").

-Genes: gene regions within the inversions in Linkage Groups (LG) 01 (Kirubakaran et al., 2016) and LG02, LG07 and LG12 (published in Barney et al., 2017).

  • Points of interest:
  • Random areas: these were designed using the inversion areas (see above) and the rest of the genome, excluding the masked areas described above.
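
The cod exclusion and inversion steps above could be sketched roughly as follows. This is an illustrative sketch rather than the procedure actually run for the study. It assumes that the "repeatmasker" and "repeatrunner" labels sit in the source column (column 2) of the tab-separated GFF file, and that the quoted inversion coordinates are 1-based and inclusive at both ends. The function names and the example GFF rows are hypothetical.

```python
REPEAT_SOURCES = {"repeatmasker", "repeatrunner"}

# Inversion coordinates as quoted above from Barth et al., 2019.
INVERSIONS = {
    "LG01": (9_114_741, 26_192_386),
    "LG02": (18_609_260, 23_660_985),
    "LG07": (13_622_710, 23_019_113),
    "LG12": (426_531, 13_445_150),
}


def repeat_regions_from_gff(lines, sources=REPEAT_SOURCES):
    """Return (seqid, start, end) for GFF rows whose source is in `sources`.

    Assumes tab-separated rows with the source label in column 2 and
    1-based inclusive start/end in columns 4 and 5. Comment lines, blank
    lines and rows with fewer than five fields are skipped.
    """
    regions = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        if fields[1].lower() in sources:
            regions.append((fields[0], int(fields[3]), int(fields[4])))
    return regions


def inversion_lengths(inversions=INVERSIONS):
    """Length in bp of each inversion, treating both ends as inclusive."""
    return {lg: end - start + 1 for lg, (start, end) in inversions.items()}


# Made-up GFF rows for illustration only.
example_gff = [
    "##gff-version 3",
    "LG01\trepeatmasker\tmatch\t100\t250\t.\t+\t.\tID=r1",
    "LG01\tmaker\tgene\t300\t900\t.\t+\t.\tID=g1",
    "LG02\trepeatrunner\tprotein_match\t40\t75\t.\t-\t.\tID=r2",
]

repeats = repeat_regions_from_gff(example_gff)
lengths = inversion_lengths()
print(repeats)
print(lengths)
```

Overlapping or nested repeat rows would still need to be merged before the resulting intervals are used as an exclusion set.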

Tiger shark

  • Transcriptome: we used the transcriptome kindly provided by the authors of Swift et al., 2016.

  • Areas of interest:

  • Regions containing genes of interest, selected based on the annotation of the transcriptome in Blast2Go (Conesa et al. 2005);
  • Points of interest:
  • Random areas: baits falling in these areas were designed using contigs larger than 200 bp.
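
The contig size filter mentioned above can be sketched as below. This is a minimal example, assuming the transcriptome is a plain FASTA file and that "larger than 200 bp" means strictly more than 200 bases. The function names and the example sequences are hypothetical and not taken from the Swift et al., 2016 data.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_longer_than(lines, min_length=200):
    """Keep contigs strictly longer than `min_length` bp."""
    return [(name, seq) for name, seq in read_fasta(lines) if len(seq) > min_length]


# Made-up contigs for illustration only.
example_fasta = [
    ">contig_short",
    "ACGT" * 25,   # 100 bp
    ">contig_edge",
    "ACGT" * 50,   # exactly 200 bp, not kept
    ">contig_long",
    "ACGT" * 40,
    "ACGT" * 40,   # 320 bp spread over two lines
]

kept = contigs_longer_than(example_fasta)
print([(name, len(seq)) for name, seq in kept])
```

To read a real file, pass an open file handle as `lines`, since the parser only iterates over lines.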