Returning data to NIPH - folkehelseinstituttet/mobagen GitHub Wiki

Background

Contacts and more information

This page is an extension to the official Returning data to MoBa page. That page is mainly concerned with returning results from the research itself, while this page covers the genetics-related scans that very few MoBa projects perform at all.

Contact persons etc. are the same regardless of what kind of data is to be returned. However, for genetic data there are many scanner-specific details that we try to address here. For these details, the official NIPH page actually refers to this one.

Some historical details

The Norwegian Institute of Public Health (NIPH) manages one of the largest birth cohort studies in the world: the Norwegian Mother, Father and Child Cohort Study (MoBa).

The biological material from MoBa has been used in various research projects over the last decades, for instance in projects performing genetic sequencing.

In the written agreements between NIPH and the research projects there has always been a clause stating that NIPH can obtain a copy of the analysis results. However, due to a legal limitation, NIPH was not allowed to receive such data until a legislative change made this possible in 2019. This has created a backlog of genetic data that is now being returned, hence this page.

Definitions

In the following

  • Biological material is the batch of biological samples provided by NIPH for use in the research project in question.
  • Lab is the facility that has scanners to produce digital data from biological material.
  • Analysis results are the results from analysis performed by the lab on the biological material.
  • In situations where a research project relies on samples from multiple cohorts, the regulations described below pertain solely to results generated from samples in cohorts controlled by NIPH.

Why we need the analysis results

Biological samples are a scarce resource in any cohort. It is important to use these samples wisely, and to make sure the vast amount of research funding and the goodwill of participants result in the best possible research output. Consequently, it is necessary to reuse analysis results from previous research projects instead of redoing the same analyses and spending more of this scarce resource. NIPH aims to make results from analyses of NIPH-controlled cohorts available to the research community as quickly as possible after the results are ready. To achieve this goal with limited staff, we are forced to set some strict rules for retrieving data, as described below.

  1. Any project with analysis results from biological samples in any NIPH-controlled cohort is required to transfer a copy of the raw/unedited data as described below.
  2. All samples delivered to the project from the biobank have to be accounted for in the returned results. If a sample for some reason does not have any associated analysis results, the reason for this needs to be listed in an associated document.
  3. Documentation describing the details of the performed analysis must be included.

The research project will receive the connection files required to link biological samples with phenotypes in their project once the data transfer is complete and the content has been checked by one of our data handlers. We understand that projects are eager to get started with their analyses, which is why we prioritize this task.

Will my data be available to everybody?

Analysis results transferred back to NIPH will be made available to other acknowledged research projects as soon as possible. While NIPH acknowledges the time and resources associated with generating new data, efficient sharing of data is beneficial for all projects in the long term. This is exemplified by the 100,000+ genotyped samples in MoBa now available to the research community, a result of several contributing projects over the last decade.

Before retrieved data is made generally available to other research projects, proper quality control (QC) and accompanying documentation are necessary. This is crucial to reduce the risk of errors in downstream analyses and to ensure that information is easily available to users with varying technical skills. (In certain cases, raw data (the original scan) might be available before a full MoBa Genetics QC has been done.)

Sharing data

Any data sharing between two projects has to go through NIPH. Direct sharing between two projects is forbidden. All projects with the required approvals will obtain data directly from NIPH. There is a small exception concerning sharing of QC-data.

What we need from you

First, please make sure we do not receive analysis results from samples other than the ones you received from NIPH. In situations where projects include samples from multiple sources, make sure only results from the samples sent from the NIPH biobank are returned to NIPH. NIPH is not allowed to receive data on samples not in cohorts controlled by the institution.

The information below is structured in two sections: information about your project, and data you will receive from the lab you use for the analysis (GWAS/EWAS, telomere etc.). You should tell the lab in advance what is expected from them.

Project information

General information about your project:

  • Project name
  • Contact person for feedback concerning data-delivery
  • NIPH project number (PDBxxxx)
  • NIPH biobank retrieval/batch number you received with the samples (nnnn)

Info to order/get from the lab

Different analyses necessarily produce different output, so it is difficult to list every single piece of information in detail. As a general rule, NIPH wants a copy of any information provided by the lab in unedited form.

Please send the data provided by the lab in a strongly encrypted archive (.zip, .7z, .gzip, .gpg) after having checked that it contains the necessary information (see below).

See How to send us data for details on encryption.

If the lab runs its own analysis (e.g. estimating sex or flagging poor quality samples), that is of course OK - but it is not a replacement for the raw data. We often see that during such 'enhancing' analysis, certain information gets lost (like raw data or plate/location information).

Common info

Much of this can be found in the lab's own QC-report.

  • Information about run date
  • Chip (version and year). If the chip is a custom chip, what extra markers does it have and what standard chip is it based on?
  • A link to the chip’s documentation, typically a manifest file
  • Samplesheet(s) that at minimum maps chip-positions to the sample-id you got from the NIPH biobank (typically 6-digit numbers)
  • Analyzed data for all samples. If you got 1222 samples, we need data about exactly 1222 samples. Sometimes samples are of poor quality or tests fail for some reason. This is OK, but we still need a description of which samples could not be analyzed and why.
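
The last point above can be checked before sending anything. The following is a minimal sketch, not an official NIPH tool: it compares the sample IDs you received from the biobank with the IDs found in a samplesheet. The column name `Sample_ID`, the other columns, and all IDs shown are made-up assumptions; adjust them to whatever your lab actually delivers.

```python
import csv
import io


def account_for_samples(delivered_ids, samplesheet_rows, id_column="Sample_ID"):
    """Compare biobank sample IDs with the IDs listed in the samplesheet.

    `id_column` is a hypothetical column name; labs name this differently.
    """
    delivered = {str(s).strip() for s in delivered_ids}
    analyzed = {row[id_column].strip() for row in samplesheet_rows}
    return {
        # Delivered by the biobank but absent from the results:
        # each of these needs an explanation in the accompanying document.
        "missing": sorted(delivered - analyzed),
        # Present in the results but not delivered by NIPH:
        # these must not be sent to NIPH.
        "unexpected": sorted(analyzed - delivered),
    }


# Tiny made-up samplesheet; a real one comes from the lab.
example_sheet = io.StringIO(
    "Sample_ID,Chip,Position\n"
    "100001,204000000001,R01C01\n"
    "100002,204000000001,R02C01\n"
    "999999,204000000001,R03C01\n"
)
rows = list(csv.DictReader(example_sheet))
report = account_for_samples(["100001", "100002", "100003"], rows)
print(report)
```

An empty `missing` list means every delivered sample has a row; any entry in `unexpected` points to a sample that probably comes from another source.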

The sections below list typical data needed for different analysis types.

snpArray/GWAS details

If different chips were used, do not mix the data. Note that the term GWAS is used loosely below, since these are really raw data suitable for GWAS analysis rather than the GWAS itself.

  • Raw data files (.idat-files) that themselves can be identified by the chip-positions. These will normally have some companion files (like .xml files).
  • Plink-files that correspond to the idat-files
  • Clusterinfo (.egt file) used to generate plink-files
  • Plate Information: What samples were on what plate (usually found on the samplesheet)

Methylation (EWAS) details

  • Raw data files (.idat-files) that themselves can be identified by the chip-positions. These will normally have some companion files (.xml or .jpg files).
  • Plate Information: What samples were on what plate (usually found on the samplesheet)
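
For both snpArray and methylation deliveries, it can help to verify that every chip-position in the samplesheet has its raw files. The sketch below assumes the common Illumina naming pattern `<chip>_<position>_Grn.idat` / `<chip>_<position>_Red.idat`; if your lab names files differently, the pattern has to be adapted. The chip barcode and positions are made up for the demonstration.

```python
from pathlib import Path
import tempfile


def missing_idat_files(idat_dir, chip_positions, channels=("Grn", "Red")):
    """Return the expected .idat file names that are not found under idat_dir.

    Assumes files are named <chip>_<position>_<channel>.idat.
    """
    present = {p.name for p in Path(idat_dir).rglob("*.idat")}
    missing = []
    for chip, position in chip_positions:
        for channel in channels:
            name = f"{chip}_{position}_{channel}.idat"
            if name not in present:
                missing.append(name)
    return missing


# Made-up demonstration using empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "204000000001_R01C01_Grn.idat",
        "204000000001_R01C01_Red.idat",
        "204000000001_R02C01_Grn.idat",
    ):
        (Path(tmp) / name).touch()
    positions = [("204000000001", "R01C01"), ("204000000001", "R02C01")]
    missing = missing_idat_files(tmp, positions)
    print(missing)
```

In practice, `chip_positions` would be read from the samplesheet rather than typed in by hand.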

How to send us data

Password/pass-phrase

As stated before, the archive/compressed file must be encrypted with a strong pass-phrase before you send it to us. (The pass-phrase should be at least 20 characters, typos are welcomed …)

  • If the files came pre-packed and strongly encrypted from the lab, all is well
  • If not, just use software like 7zip to make an archive. 7zip works on most platforms and lets you encrypt the data as a bonus. You can also use zip, gpg and most standard encryption methods.
  • If you are using uncommon/licensed encryption software: check before sending that we will actually be able to decrypt the data.
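
If you need to create the pass-phrase yourself, one option is to generate it rather than invent it. This is a minimal sketch using Python's standard `secrets` module; the default length of 24 and the letters-and-digits alphabet are assumptions chosen for illustration, and the only requirement taken from this page is the minimum of 20 characters.

```python
import secrets
import string


def make_passphrase(length=24):
    """Generate a random pass-phrase of letters and digits."""
    if length < 20:
        # This page asks for at least 20 characters.
        raise ValueError("use at least 20 characters")
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


passphrase = make_passphrase()
print(len(passphrase))
```

Avoiding punctuation in the alphabet makes the pass-phrase easier to read out over the phone or type from an SMS.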

The pass-phrase should be sent to us using a secure channel. Email is not secure!

We suggest:

  • Signal: can be downloaded for mobile and desktop. It has many advantages, such as being free/open source, encrypting messages, and offering the option for messages to time out
  • Voice (direct conversation or phone)
  • SMS

Contact us to get a Signal/phone number. If you want, we can initiate the SMS/Signal connection (and create the password).

File transfer

Our preferred solution is that we send you a link so you can deliver data directly to our secure server. The link will work unless your data is already on TSD. If so, see the TSD section below.

Upload voucher

We can provide you with a voucher/link that lets you upload data. Remember to make an encrypted archive/zip file (see How to send us data for details on encryption).

Please let us know roughly how large the file you will upload is. Some of these files can be really big, and we need to order more storage if your file is huge (>1TB).
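
A quick way to find the number to report is to read the size of the finished archive. The sketch below is only an illustration: the file name `delivery.7z` is made up, and treating 1TB as 10**12 bytes is an assumption.

```python
from pathlib import Path
import tempfile

TERABYTE = 10**12  # assumption: decimal terabyte


def describe_upload(path, threshold=TERABYTE):
    """Report the size of an archive and whether it exceeds the threshold."""
    size = Path(path).stat().st_size
    return {
        "bytes": size,
        "gigabytes": round(size / 10**9, 3),
        "needs_extra_storage": size > threshold,
    }


# Demonstration with a small placeholder file instead of a real archive.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "delivery.7z"
    archive.write_bytes(b"\0" * 2048)
    info = describe_upload(archive)
    print(info)
```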

TSD

If you already have your data on TSD, you have at least 3 choices:

  1. Contact TSD and ask them to transfer the files to p229. Note that doing so will not by itself give other projects access to the data. p229 and p229mobagenetics/p229methylation are different directories - p229 is a purely administrative area where nobody has access.
  2. Include Gutorm Høgåsen as a (temporary) 'associated member' of the project. You can then use the publication portal to share data with us, passing us a link. Note that if you use the publication portal for anything else, make sure that the password is extra strong and that the data is removed as soon as we acknowledge the data.
  3. Use the TSD project sharing functionality. This might be overkill if you want to do just one transfer, and you might get charged for disk-space.

Download voucher

If you have access to a service that lets us download data, we are happy to do that. Just make sure to give us the necessary credentials/links.