Many of the analysis services provided by the RRC cores, e.g. the Research Informatics Core (RIC) and the Genome Research Core (GRC), require accurate sample information from the researcher to determine the proper grouping of samples for each analysis. To reduce the workload for RRC analysts and the resulting cost to the researcher, please follow these guidelines when preparing your sample information.

Basic Guidelines

Sample information should be provided as an Excel spreadsheet, a comma-separated values (CSV) file, or a tab-separated values file. Excel is the preferred format.
All spreadsheets should have clear headers on the first row.
The first column of any sample information spreadsheet should be the sample ID that was provided when the sample was submitted for sequencing or any other data acquisition service. If you are unsure of your sample IDs, please contact the core to which you submitted your samples, or the Research Informatics Core, for a sample list for your project.
Sample IDs and group or factor names should start with an alphabetic character (A-Z, a-z) and consist of alphanumeric characters (A-Z, a-z, 0-9) and underscores (_) only. Please avoid using spaces or special characters. If more information is needed to describe the differences between conditions, you can include additional columns as notes with this information.
Do NOT merge or span cells in the spreadsheet.
Do NOT use colors or special formatting, e.g. italics or bold characters, to indicate experimental groups. For most analyses, we will be converting the spreadsheet to a plain text format that will be used by our analysis tools. In that case, all formatting will be lost during conversion.
If you are providing samples from multiple technologies (e.g., RNA-seq and ATAC-seq, or whole transcript RNA-seq and 3' RNA-seq), please indicate these differences in additional columns in the spreadsheet.
Use consistent values, including letter case, when designating a level/group for a factor.
- For example, if providing gender information, use either M/F or Male/Female throughout, as in this example:
  
  SampleID	Gender
  GC_001	M
  GC_002	F
  GC_003	M
- Please do NOT mix values, as in this example:
  
  SampleID	Gender
  GC_001	M
  GC_002	F
  GC_003	Male
If providing numerical values, do NOT include non-numerical characters. For example, provide 100 and NOT ~100 or 100 mg.
- Units, if applicable, should be indicated in column headers.
- If you need to indicate number ranges, code your ranges, e.g., high, medium, low, and include a second tab in your Excel file with data definitions, e.g., high = 10-20 mg.
Do NOT combine factors. If a combined comparison is needed, it is easier to combine separate factors than to split an existing combined factor.
- For example, provide information like this:
  
  SampleID	Gender	Treatment
  GC_001	Male	Control
  GC_002	Female	Control
  GC_003	Male	Drug
  GC_004	Female	Drug
- Instead of this:
  
  SampleID	Group
  GC_001	Male-Control
  GC_002	Female-Control
  GC_003	Male-Drug
  GC_004	Female-Drug
If subsets of the data should be analyzed separately, please provide a factor indicating the subset, where possible, rather than separate spreadsheets. For example:

SampleID	Set	Treatment
GC_001	1	Control
GC_002	1	Drug
GC_003	2	Control
GC_004	2	Drug
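
Several of the guidelines above can be checked automatically before a sheet is sent to the core. The following is a minimal sketch in Python, not a tool provided by the RRC cores: the function name `check_sample_sheet`, its parameters, and the `Dose_mg` column in the usage example are hypothetical and exist only for illustration. It assumes the sheet has been exported as tab-separated text with headers on the first row and the sample ID in the first column.

```python
import csv
import io
import re

# Guideline: IDs start with a letter, then only letters, digits, and underscores.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_sample_sheet(text, delimiter="\t", numeric_columns=()):
    """Return a list of problems found in a delimited sample sheet (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    headers = reader.fieldnames or []
    if not headers:
        return ["no header row found"]

    problems = []
    id_column = headers[0]  # the first column is expected to hold the sample ID
    spellings = {}  # (column, lowercased value) -> set of spellings seen

    for row in reader:
        line_no = reader.line_num  # line in the source text, header is line 1
        sample_id = row[id_column] or ""
        if not ID_PATTERN.fullmatch(sample_id):
            problems.append(f"line {line_no}: invalid sample ID {sample_id!r}")
        for column in headers[1:]:
            value = row[column] or ""
            if column in numeric_columns:
                try:
                    float(value)  # rejects values such as "~100" or "100 mg"
                except ValueError:
                    problems.append(f"line {line_no}: {column} is not numeric: {value!r}")
            else:
                spellings.setdefault((column, value.lower()), set()).add(value)

    # The same level written with different letter case (e.g. Male vs male).
    for (column, _), seen in spellings.items():
        if len(seen) > 1:
            problems.append(f"{column}: inconsistent letter case: {sorted(seen)}")
    return problems


if __name__ == "__main__":
    sheet = (
        "SampleID\tGender\tDose_mg\n"
        "GC_001\tMale\t100\n"
        "GC_002\tmale\t~100\n"
        "3_GC\tFemale\t50\n"
    )
    for problem in check_sample_sheet(sheet, numeric_columns=("Dose_mg",)):
        print(problem)
```

A check like this only catches mechanical problems. It cannot tell that M and Male are meant to be the same level, so the list of levels for each factor should still be reviewed by eye.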

Providing Genomic Information

Please use the following guidelines when providing genomic coordinates.

Include separate columns for chromosome, start, and end positions.
Include the strand, if appropriate.
Include important identifiers, such as gene or locus name, in additional columns.
If you have a nucleotide sequence, include it as well in a separate column.
Please clearly indicate the genome build or accession number for the coordinates, e.g. mm10, hg19 (genome builds), or NC_000913.3 (NCBI accession number).

Chromosome (mm10)	Start	End	Strand	Gene
chr1	1000101	1000150	+	ABCD
chr2	2001010	2002001	+	EGFH
chr3	5010430	5010600	-	IJKL
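
A coordinate table in this layout can be sanity-checked before submission. The sketch below is illustrative only: `check_coordinates` is a hypothetical helper, the column names `Start`, `End`, and `Strand` are taken from the example table above, and the rules that start must not exceed end and that strand is either `+` or `-` are assumptions rather than stated requirements of the cores.

```python
import csv
import io


def check_coordinates(text):
    """Return a list of problems found in a tab-separated coordinate table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    problems = []
    for row in reader:
        line_no = reader.line_num  # line in the source text, header is line 1
        try:
            start, end = int(row["Start"]), int(row["End"])
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: Start and End must be whole numbers")
            continue
        # Assumption: coordinates are given with start <= end on either strand.
        if start > end:
            problems.append(f"line {line_no}: Start {start} is greater than End {end}")
        # Assumption: strand is optional and, when present, is "+" or "-".
        strand = row.get("Strand")
        if strand not in (None, "", "+", "-"):
            problems.append(f"line {line_no}: unexpected strand {strand!r}")
    return problems


if __name__ == "__main__":
    table = (
        "Chromosome (mm10)\tStart\tEnd\tStrand\tGene\n"
        "chr1\t1000101\t1000150\t+\tABCD\n"
        "chr3\t5010600\t5010430\t-\tIJKL\n"
    )
    for problem in check_coordinates(table):
        print(problem)
```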

Providing sample information - uic-ric/uic-ric.github.io GitHub Wiki
