Sharing data that has already been uploaded to NCBI's Sequence Read Archive - MariaAlvBla/NCBI-Tutorial GitHub Wiki


Once data is archived, NCBI's SRA creates a summary table with the sample metadata (e.g., the sample's geographic location and sample type) and technical metadata (e.g., the sequencer used) of all samples in a BioProject or BioSample. Many of these fields have no standard nomenclature, so integrating metadata from different projects is difficult. For example, the word amplicon can be written as amplicon, Amplicon, or AMPLICON. When several projects are combined, values that mean the same thing but are written differently are treated as distinct, which hinders large-scale analysis of genetic data.
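As a minimal sketch of the problem, the Python snippet below maps differently written values onto one spelling by comparing them case-insensitively against a lookup table. The vocabulary `VOCABULARY` and the function `normalize` are hypothetical illustrations; the actual standard terms are the ones defined in the Datathon's custom table.

```python
# Hypothetical controlled vocabulary: lower-cased raw value -> standard term.
# The real standard terms are defined in the Datathon's custom table.
VOCABULARY = {
    "amplicon": "amplicon",
    "wgs": "WGS",
}


def normalize(value: str, vocabulary: dict[str, str]) -> str:
    """Return the standard spelling of a raw metadata value.

    Case and surrounding or repeated whitespace are ignored, so
    "amplicon", "Amplicon" and " AMPLICON " all map to the same term.
    """
    key = " ".join(value.split()).lower()
    if key not in vocabulary:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return vocabulary[key]


raw_values = ["amplicon", "Amplicon", "AMPLICON"]
print({normalize(v, VOCABULARY) for v in raw_values})
```

Raising an error for unknown values, rather than passing them through, makes unexpected spellings visible instead of silently creating a new category.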

For the Datathon, we have created a custom table with fixed nomenclatures for the categories essential for large-scale analysis. Following this standard nomenclature makes it easier for researchers to share and combine data. This tutorial will guide users through accessing existing NCBI SRA metadata tables and transforming them to this standard nomenclature.


Table of contents


  1. Accessing the metadata associated with a BioProject
    1. By searching the accession code
    2. By tracking it in "my submissions"
  2. Downloading the summary table of a BioProject or a BioSample
  3. Transferring metadata to the Datathon's customized table
    1. Opening NCBI's summary table
    2. Filling in the customized table


Accessing the metadata associated with a BioProject


To access the genetic data of either an entire BioProject or a specific BioSample within a BioProject, one must first access the BioProject containing the data. From the BioProject, one can then access all the Experiments of either the BioProject or the BioSample, as explained below. If you already know how to access the Experiments, jump to Downloading the summary table of a BioProject or a BioSample.

The data associated with a BioProject can be accessed in two ways: 1) by searching the accession code, if you remember it, or 2) by tracking it in "my submissions", if you don't. Please select the option that best suits your case.

By searching the accession code


To find the public display of the genetic data of your BioProject, follow these steps:


  1. Go to NCBI's homepage and enter the accession code of the BioProject (PRJNA#) you are looking for in the search box.
  2. Click on the correct BioProject in the displayed list.
  3. The BioProject's public information will be displayed. To find the genetic data related to the BioProject, click the number to the right of SRA Experiments.
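If you prefer to reach the BioProject page from a script or a bookmark, the sketch below checks that an accession looks like a BioProject accession and builds a BioProject search URL for it. The URL pattern `https://www.ncbi.nlm.nih.gov/bioproject/?term=...`, the accepted prefixes (PRJNA, plus PRJEB and PRJDB for BioProjects registered through ENA and DDBJ), the helper `bioproject_search_url`, and the placeholder accession are all assumptions of this sketch.

```python
import re
from urllib.parse import urlencode

# Assumed accession format: "PRJ", a two-letter archive code, then digits,
# e.g. PRJNA123456 (a placeholder, not a real project).
ACCESSION_PATTERN = re.compile(r"PRJ(NA|EB|DB)\d+")


def bioproject_search_url(accession: str) -> str:
    """Build an NCBI BioProject search URL for a BioProject accession."""
    accession = accession.strip().upper()
    if not ACCESSION_PATTERN.fullmatch(accession):
        raise ValueError(f"{accession!r} does not look like a BioProject accession")
    # Assumed URL pattern for a BioProject search on the NCBI website.
    return "https://www.ncbi.nlm.nih.gov/bioproject/?" + urlencode({"term": accession})


print(bioproject_search_url("PRJNA123456"))
# https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA123456
```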



Jump to Downloading the summary table of a BioProject or a BioSample.

By tracking it in "my submissions"


To find the public display of the genetic data of your BioProject, follow these steps:


  1. While logged in, go to NCBI's homepage.
  2. Click Submit.
  3. At the Submission Portal, click My submissions.
  4. Find the submission whose title corresponds to the BioProject you are looking for and click its accession code (PRJNA#).
  5. The BioProject's public information will be displayed. To find the genetic data related to the BioProject, click the number to the right of SRA Experiments.



Jump to Downloading the summary table of a BioProject or a BioSample.


Downloading the summary table of a BioProject or a BioSample


Follow these steps to access and download the summary table containing the attributes and metadata of all the genetic data included in a BioProject or BioSample.


  1. Click on the name of any of the Experiments displayed after clicking SRA Experiments in the previous section.
  2. On the public display of the Experiment, click All Runs for either the BioProject or the BioSample you are interested in. Whichever you choose, the summary table will include the selected Experiment and all other Experiments in that BioProject or BioSample.
  3. To download the metadata of all the runs in the BioProject or BioSample, click Metadata in the Total row.




Alternatively, you can download the metadata of specific Runs by checking the boxes of the runs you want and clicking Metadata in the Selected row.
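If you have already downloaded the table for the whole BioProject, the same selection can also be made afterwards. The Python sketch below keeps only the rows whose run accession is in a given set. It assumes the table has a column named `Run` holding the run accessions (SRR#) and that you know whether the file is comma- or tab-separated; `select_runs` and the example data are hypothetical.

```python
import csv
import io


def select_runs(table_text: str, run_ids: set[str], delimiter: str = ",") -> list[dict[str, str]]:
    """Return the rows of a run table whose "Run" value is in run_ids."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    return [row for row in reader if row["Run"] in run_ids]


# Made-up example table with placeholder accessions.
example = "Run,BioSample\nSRR0000001,SAMN0000001\nSRR0000002,SAMN0000002\n"
print(select_runs(example, {"SRR0000002"}))
# [{'Run': 'SRR0000002', 'BioSample': 'SAMN0000002'}]
```

For a file on disk, read its text first, e.g. with `open(path, newline="", encoding="utf-8").read()`, and pass `delimiter="\t"` if the file is tab-separated.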




Transferring metadata to the Datathon's customized table


After downloading NCBI's summary table, it must be opened with specific settings to display its information correctly. After opening it, some of its data needs to be transferred to a customized table created for the Datathon to ensure that all shared data has a consistent format.


Opening NCBI's summary table


If you followed the steps in the previous section, you should now have a text file (.txt) named SraRunTable. Instead of opening it in a text editor, open it in spreadsheet software such as OpenOffice Calc or Excel.

When using OpenOffice Calc, check the Separator Options in the import dialog so that the columns' values are not shifted.

In the following picture, the settings are wrong: values containing spaces have been split at each space and shifted to the right. Instead of University of Copenhagen appearing in one cell, it has been split across three columns:



After you correct the separator options (for example, by making sure Space is not selected as a separator; the exact settings may depend on the language chosen for the import), values containing spaces are no longer split. In our example, the university's full name is now in one cell.
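The same problem can be avoided when reading the table in a script instead of a spreadsheet. The Python sketch below splits each line only on tabs or only on commas, never on spaces, so a value such as University of Copenhagen stays in one field. It assumes the file is tab-separated if its header line contains a tab and comma-separated otherwise; `read_run_table` and the column names in the example are hypothetical.

```python
import csv
import io


def read_run_table(text: str) -> list[dict[str, str]]:
    """Parse an SraRunTable-like text into one dict per row.

    The delimiter is a tab if the header line contains one, otherwise a
    comma. Spaces are never used as separators.
    """
    header = text.splitlines()[0] if text else ""
    delimiter = "\t" if "\t" in header else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


example = "Run\tcenter_name\nSRR0000001\tUniversity of Copenhagen\n"
print(read_run_table(example)[0]["center_name"])  # University of Copenhagen
```

To use it on the downloaded file: `with open("SraRunTable.txt", newline="", encoding="utf-8") as f: rows = read_run_table(f.read())`. In comma-separated files, values that themselves contain commas are expected to be quoted, which the `csv` module handles.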



Filling in the customized table


Now you can download and open the following custom table:

Already_uploaded_Data_Dataton.xlsm

This table contains Excel macros, which must be enabled manually. To do so, before opening the table, open the file's Properties and, in the General tab, tick the Unblock box in the Security section.


This table includes the names of selected columns from NCBI's summary table. Please transfer only the information in these selected columns, and contact the MiCoDA team if you would like to add more columns to the file. Make sure you have carefully read the instructions provided in the table before filling in the data, and keep in mind that some of the columns have drop-down menus.
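If you prepare the values in a script before entering them, the Python sketch below shows the idea: keep only the selected columns and reject values that a drop-down menu would not accept. The rows are dicts such as those produced by `csv.DictReader`. The column list `SELECTED_COLUMNS`, the options in `DROPDOWN_OPTIONS`, and the helper `to_custom_rows` are hypothetical; the real columns and drop-down options are the ones in Already_uploaded_Data_Dataton.xlsm.

```python
# Hypothetical column subset; the real list of selected columns is the
# one in Already_uploaded_Data_Dataton.xlsm.
SELECTED_COLUMNS = ["Run", "BioProject", "Assay Type"]

# Hypothetical drop-down options for one of the selected columns.
DROPDOWN_OPTIONS = {"Assay Type": {"AMPLICON", "WGS"}}


def to_custom_rows(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only the selected columns and check drop-down values."""
    custom_rows = []
    for row in rows:
        missing = [c for c in SELECTED_COLUMNS if c not in row]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        new_row = {c: row[c] for c in SELECTED_COLUMNS}
        for column, options in DROPDOWN_OPTIONS.items():
            if new_row[column] not in options:
                raise ValueError(f"{new_row[column]!r} is not an allowed value for {column!r}")
        custom_rows.append(new_row)
    return custom_rows


ncbi_rows = [{"Run": "SRR0000001", "BioProject": "PRJNA0000001",
              "Assay Type": "AMPLICON", "Instrument": "Illumina MiSeq"}]
print(to_custom_rows(ncbi_rows))
# [{'Run': 'SRR0000001', 'BioProject': 'PRJNA0000001', 'Assay Type': 'AMPLICON'}]
```

This sketch does not write into the .xlsm file; the checked values would still be entered in the macro-enabled table.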

If you want to include your data in MiCoDA's database, please send your completed copy of Already_uploaded_Data_Dataton.xlsm, describing the data you submitted to NCBI, to [email protected]. Before sending it, rename the file to include the last names of the first three authors of the data in the following manner: "Last name author 1_Last name author 2_Last name author 3_Already_uploaded_Data_Dataton.xlsm". In the email, include the full name and contact email of each author.
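The naming rule can be sketched in a few lines of Python. The helper `submission_filename` and the example names are hypothetical; the sketch uses the last names exactly as given and, as an assumption not covered by the instructions above, uses fewer names when the data has fewer than three authors.

```python
SUFFIX = "Already_uploaded_Data_Dataton.xlsm"


def submission_filename(last_names: list[str]) -> str:
    """Build the file name from the last names of the first three authors."""
    first_three = [name.strip() for name in last_names[:3]]
    if not first_three or not all(first_three):
        raise ValueError("provide the non-empty last names of the authors")
    return "_".join(first_three + [SUFFIX])


print(submission_filename(["Smith", "Jensen", "Garcia", "Lee"]))
# Smith_Jensen_Garcia_Already_uploaded_Data_Dataton.xlsm
```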

Note: If you want to share the information of several BioProjects, please fill in a separate table for each BioProject.
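If your downloaded rows come from several BioProjects, a sketch like the one below groups them so that each group can go into its own table. It assumes each row is a dict with a `BioProject` column; `split_by_bioproject` and the placeholder accessions are hypothetical.

```python
from collections import defaultdict


def split_by_bioproject(rows: list[dict[str, str]]) -> dict[str, list[dict[str, str]]]:
    """Group rows by their "BioProject" value, one group per table."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row["BioProject"]].append(row)
    return dict(groups)


rows = [
    {"Run": "SRR0000001", "BioProject": "PRJNA0000001"},
    {"Run": "SRR0000002", "BioProject": "PRJNA0000002"},
    {"Run": "SRR0000003", "BioProject": "PRJNA0000001"},
]
for bioproject, group in split_by_bioproject(rows).items():
    print(bioproject, [row["Run"] for row in group])
# PRJNA0000001 ['SRR0000001', 'SRR0000003']
# PRJNA0000002 ['SRR0000002']
```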
