Enter neutralization data (plasma or serum samples) - Jefflier/covid-drdb-payload GitHub Wiki

For plasma or serum, samples are obtained from study subjects before or after SARS-CoV-2 infection, or after immunization with any kind of vaccine. The neutralization titers are the most important data. Other metadata are also required because they describe the conditions under which the plasma was collected. The subject history, for example, includes the infecting variant, vaccine, dosage, disease severity, age, and infection or immunization date.

Steps

  1. Read the paper and annotate key information
  2. Extract neutralization data
  3. Format data
  4. Submit data

Read the paper and annotate key information

Please pay attention to the following information in the Abstract, Materials and Methods, Results, or supplementary content:

  • Paper metadata
    • first author's name
    • DOI, for publications without DOI, please provide the URL
    • year of publication
  • Neutralizing potency
    • the most important value is the 50% neutralization titer (NT50; some papers report it as ID50 or IC50)
    • we also record the 80% and 90% values (IC80, IC90, etc.) if available
  • Neutralization assay
    • Variants and mutations of SARS-CoV-2 used in the assay
    • assay type, for example live virus, pseudovirus
    • assay procedure and details
  • infection metadata
    • infection date
    • infection species
    • infection SARS-CoV-2 variant (PANGOLIN or WHO name)
  • immunization metadata
    • vaccine name
    • shot number (1st shot, 2nd shot, 1st booster shot, 2nd booster shot)
    • vaccination date

Extract neutralization data

Heterogeneous data representation

Papers represent neutralization titers in different ways. Some representations are easy to extract from; others need more work. Some papers provide only fold changes rather than neutralization titers. Neutralization titers are considered finer-grained than fold-change data.

  • Table of plasma conditions, titer, and variants
  • dose-response curve of each plasma
  • neutralization titer figure with paired points
  • neutralization titer figure with unpaired points
  • neutralization titer figure with average values (geomean, mean, etc), we call it aggregate data
  • neutralization titer from the body text
  • neutralization titer fold change between control and test variants

Despite different representations, the key information is the same:

  • plasma and the subject exposure metadata
  • control and test variants and their mutations
  • neutralization assay type
  • neutralization titer values
  • LLOQ (lower limit of quantification)

Granularity

In general, there are four levels of granularity for neutralization titers:

| Aggregation \ Pairing | Yes | No |
|---|---|---|
| No | I | II |
| Yes | III | IV |

The "aggregation" means that the convalescent plasma (CP) or vaccinee plasma (VP) titer value is an average over subjects sharing the same exposure conditions. An example of aggregated data is a study that reports only the geometric mean titer (GMT).
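As a small illustration of what an aggregated value is, the sketch below computes a GMT from individual titers. The titer values are hypothetical and not taken from any paper.

```python
from statistics import geometric_mean

# Hypothetical NT50 titers from five subjects sharing the same exposure history.
titers = [40, 80, 160, 320, 640]

# The GMT is the n-th root of the product of the n titers,
# i.e. the ordinary mean taken on the log scale.
gmt = geometric_mean(titers)
print(round(gmt, 1))
```

A paper that reports only this single number for the group is level III or IV data, because the five individual titers cannot be recovered from it.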

The "pairing" means that the titers of a plasma tested against different variants can be linked to the same subject. An example of unpaired data is a study that provides a figure with individual points that are not linked by lines, so we cannot tell whether two points tested on different variants come from the same plasma sample.

Important note

Level I (unaggregated, paired data) provides the most detail and is the preferred form. If level I is not achievable, level II (unaggregated, unpaired data) or level III (aggregated, paired data) is acceptable, with level II preferred over level III. Level IV is reserved for data sets where only fold changes are available. Unlike level I, II, and III data, which can be deposited into the rx_potency table, level IV data can only be deposited into the rx_fold table.
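The granularity rules can be summarized in a few lines of code. This is only a sketch of the decision table; the function names `granularity_level` and `target_table` are made up for illustration and are not part of the database tooling.

```python
def granularity_level(aggregated: bool, paired: bool) -> str:
    """Return the granularity level (I-IV) from the aggregation/pairing table."""
    if not aggregated:
        return "I" if paired else "II"
    return "III" if paired else "IV"


def target_table(level: str) -> str:
    """Levels I-III go to rx_potency; level IV (fold change only) goes to rx_fold."""
    return "rx_fold" if level == "IV" else "rx_potency"


# Individual points linked by lines across variants: level I, rx_potency.
level = granularity_level(aggregated=False, paired=True)
print(level, target_table(level))
```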

Comment: Some papers report the same data more than once using different representations. To avoid duplication, enter only the representation with the finest granularity.

Source data files

Some journals publish source data files with the paper; please try to find these first. Some papers provide raw data in supplementary tables, so please download and check the supplementary materials before entering neutralization data.

Figures

Most of the papers provide figures plotted with titer data points. Those individual data points can be extracted using image processing software and converted back to titer data. Extracting from figures is difficult. Before diving into the figures, try to find source data files (usually in Excel file form) or tables.

Summary tables

Summary tables usually report the aggregated data.

A note on figures

Given the variety of data-processing and data-visualization programs available to researchers, the figures in papers differ in their style. The format of a figure can be rasterized or vectorized, which results in the need for data extraction techniques particular to each paper.

For rasterized figures, you can extract data using image editors (Adobe Illustrator, Adobe Photoshop). First, mark the points and measure their x-y coordinates. Second, use a formula to calculate the actual IC50 values from the coordinates. You can use the same method to get the ULOQ from the figure, if it is shown.

For vectorized figures, you can use the same method as for rasterized figures. You can also find the individual parts of each point in the layers panel, which saves you the time of measuring the x-y coordinates. You can also write scripts for image editors to measure and calculate data automatically.
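The coordinate-to-value formula is not spelled out here, so the following is a minimal sketch under one assumption: the titer axis is log10-scaled, which is typical for titer plots. Two axis ticks with known values calibrate the axis; the function name `pixel_to_titer` and all coordinates are hypothetical.

```python
import math


def pixel_to_titer(y_px, y_px_ref1, value_ref1, y_px_ref2, value_ref2):
    """Convert a measured pixel y-coordinate to a titer on a log10 axis.

    (y_px_ref1, value_ref1) and (y_px_ref2, value_ref2) are two axis ticks
    whose pixel positions and labelled values are both known.
    """
    log1, log2 = math.log10(value_ref1), math.log10(value_ref2)
    # Fraction of the way from tick 1 to tick 2, measured in pixels.
    frac = (y_px - y_px_ref1) / (y_px_ref2 - y_px_ref1)
    # Interpolate linearly in log space, then convert back to a titer.
    return 10 ** (log1 + frac * (log2 - log1))


# Tick "10" sits at y=500 px and tick "1000" at y=100 px (image y grows downward).
print(pixel_to_titer(300, 500, 10, 100, 1000))
```

For a linear axis, drop the `log10` and the final exponentiation and interpolate the values directly.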

Figures with paired points

It is common for papers to use lines to link/pair data points from the same plasma sample. Please use numeric suffixes like _1, _2 to distinguish different samples from the same figure, or add the section name like _figure1A_, _Fig2B_ to distinguish plasmas from different figures.

Some publications may overlay the paired points data with a box-plot. We prefer the paired points if you can measure the value, but if it's hard to distinguish points, you can enter the average value (geomean, mean, etc) in the box plot. Please ignore the confidence interval and p-values.

Dose-response curve

Please pay attention to the value at 50% neutralization. If the curve does not cross 50% neutralization, neutralization was not detected in the assay; in that case, use the LLOQ as the neutralization potency value.
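Reading the 50% crossing off a curve can be sketched as an interpolation between the two measured dilutions that bracket 50%. This is a simplification: papers usually fit a sigmoidal model, and the function name `nt50_from_curve` and the data points below are hypothetical.

```python
import math


def nt50_from_curve(points, threshold=50.0):
    """Estimate the reciprocal dilution giving `threshold` % neutralization.

    `points` is a list of (reciprocal_dilution, percent_neutralization).
    Returns None when the curve never crosses the threshold, in which case
    the quantification limit is recorded instead, as described above.
    """
    pts = sorted(points)
    for (d1, n1), (d2, n2) in zip(pts, pts[1:]):
        # The two points bracket the threshold if they lie on opposite sides.
        if (n1 - threshold) * (n2 - threshold) <= 0 and n1 != n2:
            frac = (n1 - threshold) / (n1 - n2)
            log_d = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_d
    return None


curve = [(20, 95), (80, 75), (320, 25), (1280, 5)]
print(nt50_from_curve(curve))
```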

Estimate date

If the paper reports no exact date, estimated date, average date, date range, or even month of infection, estimate the infection date from the variant:

  • original variant or B.1: before 2020-09-30
  • Alpha or Beta: between about 2020-12-01 and 2021-04-01
  • Delta: about 2021-07-01
  • Omicron BA.1: between about 2021-12-01 and 2022-02-28
  • Omicron BA.2: after about 2022-03-01

Because the waves overlap, these estimates are only approximate. You can also infer the variant from the infection date reported in the paper; if it is uncertain, use Unknown variant. For animal model studies, the infection date can be assumed to be close to the publication date.
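The rule of thumb for infection dates can be kept as a lookup table. The sketch below copies the windows from the text; the dictionary keys are illustrative labels, not necessarily the var_name values used in the variants table, and `estimate_infection_window` is a hypothetical helper.

```python
from datetime import date

# Windows copied from the rule of thumb above; None marks an open end.
INFECTION_WINDOWS = {
    "Original": (None, date(2020, 9, 30)),
    "B.1": (None, date(2020, 9, 30)),
    "Alpha": (date(2020, 12, 1), date(2021, 4, 1)),
    "Beta": (date(2020, 12, 1), date(2021, 4, 1)),
    "Delta": (date(2021, 7, 1), date(2021, 7, 1)),
    "Omicron BA.1": (date(2021, 12, 1), date(2022, 2, 28)),
    "Omicron BA.2": (date(2022, 3, 1), None),
}


def estimate_infection_window(variant):
    """Return (earliest, latest) for a variant, or None if no rule applies."""
    return INFECTION_WINDOWS.get(variant)


print(estimate_infection_window("Alpha"))
```

Any date taken from such a window is an estimate, so infection_date_cmp would be '~' rather than '='.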

For vaccination, because most vaccines were approved for emergency use around January 2021, we can assume the first shot was given between 2021-01-01 and 2021-05-01, the second shot about 1 month later, and the first booster shot about 6 months after the second shot. Use this rule if the paper does not report an exact date, estimated date, average date, date range, or even the month.
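The vaccination rule is simple date arithmetic. The sketch below assumes a first-shot date has already been chosen from the window above; `add_months` and `estimate_schedule` are hypothetical helper names, and the keys of the returned dictionary follow the dosage numbering (1, 2, 3).

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def estimate_schedule(first_shot):
    """Second shot about 1 month later; first booster about 6 months after that."""
    second = add_months(first_shot, 1)
    booster = add_months(second, 6)
    return {1: first_shot, 2: second, 3: booster}


print(estimate_schedule(date(2021, 3, 1)))
```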

For breakthrough infections, some papers report the average time between exposures; you can use these intervals to estimate the dates of infection and vaccination.

Surrogate neutralization test

Surrogate virus neutralization test (sVNT) data are not neutralization titers but percentages of neutralization. Normally the test is run at a fixed dilution, for example 1:20, and compares the neutralization potency of different plasma samples, or of the same plasma against different variants. In this situation, when filling in the rx_potency table, set potency_type to NC; if the fixed dilution is known, use NCxx, for example NC20. The potency is the percentage value and the potency_unit is percent.
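The NC/NCxx naming can be illustrated as follows. The helper `svnt_potency_type` is hypothetical and the percentage is a made-up value; only the three column names come from the rx_potency table.

```python
def svnt_potency_type(fixed_dilution=None):
    """Return 'NC', or 'NC<dilution>' (e.g. 'NC20' for 1:20) when it is known."""
    return "NC" if fixed_dilution is None else f"NC{int(fixed_dilution)}"


# Fragment of an rx_potency row for an sVNT result measured at 1:20.
row = {
    "potency_type": svnt_potency_type(20),
    "potency": 85.0,  # hypothetical percent neutralization
    "potency_unit": "percent",
}
print(row)
```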

Format data

Please use this Excel template specifically for plasma to format the data.

In this section, we describe each table and its columns. The primary key or joint primary keys of a table are highlighted in bold.

Metadata tables

Common tables

Please read Enter neutralization data (metadata tables)

Neutralization potency tables

⚠️ Note: if potency values (for example, GMT) are available, you do not need to fill in the rx_fold table.

rx_potency

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| ref_name | RefID | | | enter the 'ref_name' in the 'articles' table |
| rx_name | freetext to describe plasma, you can provide infected variant name, vaccine name, dosages, etc to distinguish different plasmas | | | |
| iso_name | iso_name of tested virus, the name should be in isolates table | | | |
| section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |
| assay_name | Must be a value from the assay_name column in the 'assay' tables | | | |
| potency_type | | | NT50 | |
| potency | Neutralization titer | | | |
| cumulative_count | number of data points share the same value | | | |
| potency_upper_limit | ULOQ (upper limit of quantification) | | | |
| potency_lower_limit | LLOQ (lower limit of quantification) | | | |
| potency_unit | | | NULL | |
| date_added | | YYYY-MM-DD | | |
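To show how cumulative_count relates to the other columns, the sketch below builds rx_potency rows from a list of unpaired titers that share one rx_name, merging identical values. It is an assumption that rows are grouped this way; the function `potency_rows` and every value passed to it are hypothetical.

```python
from collections import Counter
from datetime import date


def potency_rows(ref_name, rx_name, iso_name, section, assay_name, titers):
    """Build rx_potency rows, merging identical titers via cumulative_count."""
    today = date.today().isoformat()  # date_added in YYYY-MM-DD format
    return [
        {
            "ref_name": ref_name,
            "rx_name": rx_name,
            "iso_name": iso_name,
            "section": section,
            "assay_name": assay_name,
            "potency_type": "NT50",
            "potency": titer,
            "cumulative_count": count,
            "date_added": today,
        }
        for titer, count in sorted(Counter(titers).items())
    ]


# Three unpaired points, two of which share the value 20.
rows = potency_rows("Doe22", "CP_Delta", "Omicron", "Figure 1A",
                    "Pseudovirus", [20, 20, 160])
for r in rows:
    print(r["potency"], r["cumulative_count"])
```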

rx_fold

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| ref_name | RefID | | | |
| rx_name | freetext to describe plasma, you can provide infected variant name, vaccine name, dosages, etc to distinguish different plasmas | | | |
| control_iso_name | iso_name of control virus, the name should be in isolates table | | | |
| iso_name | iso_name of test virus, the name should be in isolates table | | | |
| section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |
| assay_name | Must be a value from the assay_name column in the assays.csv table | | | |
| potency_type | | | NT50 | |
| fold_cmp | IF test NT50 < LLOQ then use ">", else use "=" | | | |
| fold | Fold change (control NT50 / test NT50) | | | |
| resistance_level | | | NULL | |
| ineffective | | | NULL | |
| cumulative_count | number of data points share the same value | | | |
| date_added | | YYYY-MM-DD | | |
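The fold and fold_cmp columns can be derived together. The sketch below makes one assumption not stated in the table: when the test NT50 is below the LLOQ, the LLOQ is substituted for it, so the reported fold is a lower bound. The function name `fold_change` and the titers are hypothetical.

```python
def fold_change(control_nt50, test_nt50, lloq):
    """Return (fold_cmp, fold), where fold = control NT50 / test NT50."""
    if test_nt50 < lloq:
        # Assumption: substitute the LLOQ for the unmeasurable test titer,
        # so the true fold change is at least this large.
        return ">", control_nt50 / lloq
    return "=", control_nt50 / test_nt50


print(fold_change(640, 80, 20))
print(fold_change(640, 10, 20))
```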

ref_isolate_pairs

This table records which isolate is the control and which isolate is the test. It is used together with the rx_potency table; if all data are in the rx_fold table, this table can be ignored.

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| ref_name | RefID | | | |
| control_iso_name | control iso_name from rx_potency table | | | Most of the time it's wild type virus or virus with D614G mutation |
| iso_name | test iso_name from rx_potency table | | | |

subject_plasma

The subject_name represents an individual or a group of people sharing the same infection or immunization (exposure) history.

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| ref_name | RefID | | | |
| subject_name | Unique identifier to represent subjects sharing same exposure history | Freetext | | |
| rx_name | rx_name in rx_potency table | | | |
| collection_date_cmp | If the accurate plasma isolation dates are reported in the paper then use '=', else use '~' | | '~' | |
| collection_date | | YYYY-MM-DD | | |
| location | Country name where the plasma was isolated | | | |
| cumulative_group | Same as subject_name | | | |
| section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |

subject_infections

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| ref_name | RefID | | | |
| subject_name | subject_name from subject_plasma table | | | |
| infection_date_cmp | If the accurate infection dates are reported in the paper then use '=', else use '~' | | | |
| infection_date | | YYYY-MM-DD | | |
| infected_var_name | var_name from variants table | | | If the infection variant name is not reported, use Unknown variant |
| location | Country name | | | |
| immune_status | | | NULL | |
| severity | Mild, Moderate, Hospitalized, Non-Hospitalized | | | |
| section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |

subject_vaccines

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| ref_name | RefID | | | |
| subject_name | subject_name from subject_plasma table | | | |
| vaccination_date_cmp | If the accurate immunization dates are reported in the paper then use '=', else use '~' | | | |
| vaccination_date | | YYYY-MM-DD | | |
| vaccine_name | vaccine_name from vaccines table | | | If the vaccine name is not reported, use Unknown vaccine |
| dosage | 1st shot as 1, 2nd shot as 2, booster shot as 3, etc | Integer | | |
| location | Country name | | | |
| section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |

Submit data

If you are not familiar with programming, skip this step and attach the Excel file to the issue page. Please also let the admin know that the data file is ready to use. We will convert the data file into a database-friendly format and check its consistency.

Please see how to submit the data in Enter neutralization data (submit data)