Enter neutralization data (plasma or serum samples) - Jefflier/covid-drdb-payload GitHub Wiki
Plasma or serum samples are obtained from study subjects before or after SARS-CoV-2 infection, or after immunization with any kind of vaccine. The neutralization titers are the most important data. Other metadata are also required because they describe the condition of the subject whose plasma was tested. The subject history includes, for example, the infecting variant, vaccine, dosage, disease severity, age, and infection or immunization date.
Steps
- 2.1 Heterogeneous data representation
- 2.2 A note on figures
- 2.3 Figures with paired points
- 2.4 Dose-response curve
- 2.5 Estimate date
- 2.6 Surrogate neutralization test
Read the paper and annotate key information
Please pay attention to the information below in the Abstract, Methods and materials, Results, or supplementary content:
- Paper metadata
  - first author's name
  - DOI; for publications without a DOI, please provide the URL
  - year of publication
- Neutralizing potency
  - the most important data is the 50% neutralization concentration (IC50)
  - we also record IC80, IC90, etc. if available
- Neutralization assay
  - variants and mutations of SARS-CoV-2 used in the assay
  - assay type, for example live virus or pseudovirus
  - assay procedure and details
- Infection metadata
  - infection date
  - infection species
  - infection SARS-CoV-2 variant (PANGOLIN or WHO name)
- Immunization metadata
  - vaccine name
  - shot number (1st shot, 2nd shot, 1st booster shot, 2nd booster shot)
  - vaccination date
Extract neutralization data
Heterogeneous data representation
Different papers use different methods to represent neutralization titers. Some are easy to extract; others need more work. Some papers also provide only fold changes rather than neutralization titers. Neutralization titers are considered finer-grained than fold change data.
- Table of plasma conditions, titer, and variants
- dose-response curve of each plasma
- neutralization titer figure with paired points
- neutralization titer figure with unpaired points
- neutralization titer figure with average values (geomean, mean, etc.), which we call aggregate data
- neutralization titer from the body text
- neutralization titer fold change between control and test variants
Despite different representations, the key information is the same:
- plasma and the subject exposure metadata
- control and test variants and their mutations
- neutralization assay type
- neutralization titer values
- LLOQ (lower limit of quantification)
Granularity
In general, there are four levels of granularity for neutralization titers:
Aggregation \ Pairing | Yes | No |
---|---|---|
No | I | II |
Yes | III | IV |
"Aggregation" means that the convalescent plasma (CP) or vaccinee plasma (VP) titer value is the average over subjects sharing the same exposure conditions. An example of aggregated data is a study that reports only the geometric mean titer (GMT).
"Pairing" means that the plasma tests on different variants can be linked to the same subject. An example of unpaired data is a study that provides a figure with individual points that are not linked by lines, so we cannot know whether two points tested on different variants come from the same plasma sample.
Important note
Level I granularity (unaggregated, paired data) provides the most detailed data and is the preferred form. If level I is not achievable, level II (unaggregated, unpaired data) or level III (aggregated, paired data) is acceptable, although level II is preferred over level III. Level IV is designated for data sets where only fold changes are available. Unlike level I, II, and III data, which can be deposited into the `rx_potency` table, level IV data can only be deposited into the `rx_fold` table.
Comment: Some papers report the same data more than once using different representations. To reduce duplication, enter only the representation with the finer granularity.
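The granularity table and the deposit rule above can be sketched as a small lookup. This is only an illustration: the function names `granularity_level` and `target_table` are hypothetical and are not part of the database tooling.

```python
def granularity_level(aggregated: bool, paired: bool) -> str:
    """Return the granularity level (I-IV) from the aggregation/pairing table."""
    if not aggregated:
        return "I" if paired else "II"
    return "III" if paired else "IV"


def target_table(level: str) -> str:
    """Levels I-III go to rx_potency; level IV (fold change only) goes to rx_fold."""
    return "rx_fold" if level == "IV" else "rx_potency"


# Example: a figure with individual points that are not linked by lines.
level = granularity_level(aggregated=False, paired=False)
print(level, target_table(level))
```

When a paper reports the same data at several levels, the level that comes first in the order I, II, III, IV is the one to enter.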
Source data files
Some journals publish source data files with the paper; please try to find them first. Some papers provide raw data in supplementary tables, so please download and check the supplementary materials before entering neutralization data.
Figures
Most of the papers provide figures plotted with titer data points. Those individual data points can be extracted using image processing software and converted back to titer data. Extracting from figures is difficult. Before diving into the figures, try to find source data files (usually in Excel file form) or tables.
Summary tables
Summary tables usually report aggregated data.
A note on figures
Given the variety of data-processing and data-visualization programs available to researchers, the figures in papers differ in their style. The format of a figure can be rasterized or vectorized, which results in the need for data extraction techniques particular to each paper.
For rasterized figures, you can extract data using image editors (Adobe Illustrator, Adobe Photoshop). First, mark the points and measure their x-y coordinates. Second, use a formula to calculate the actual IC50 values from the coordinates of the axis ticks. You can use the same method to get the ULOQ from the figure if it is shown.
For vectorized figures, you can use the same method as for rasterized figures. You can also find the individual parts of each point in the layers panel, which saves you the time of measuring the x-y coordinates. You can also write scripts for the image editor to measure and calculate the data automatically.
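The "formula" step can be sketched as follows, assuming the titer axis is logarithmic (base 10), which is common but must be checked for each figure. The function name `pixel_to_value` and the pixel numbers are hypothetical; the two reference ticks are any two labeled axis ticks whose pixel positions you measured.

```python
import math


def pixel_to_value(y_px, tick1_px, tick1_value, tick2_px, tick2_value):
    """Convert a measured pixel coordinate to a titer on a log10 axis.

    Interpolates linearly in log10 space between two labeled axis ticks.
    """
    log1 = math.log10(tick1_value)
    log2 = math.log10(tick2_value)
    frac = (y_px - tick1_px) / (tick2_px - tick1_px)
    return 10 ** (log1 + frac * (log2 - log1))


# Example: tick "10" measured at y=400 px, tick "1000" at y=200 px,
# and a data point measured at y=300 px (halfway in log space).
print(pixel_to_value(300, 400, 10, 200, 1000))
```

For a linear axis, the same interpolation applies without the `log10` and `10 **` steps.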
Figures with paired points
It is common for papers to use lines to link (pair) data points from the same plasma sample. Please use numeric suffixes like `_1`, `_2` to distinguish different samples from the same figure, or add the section name like `_figure1A_`, `_Fig2B_` to distinguish plasmas from different figures.
Some publications overlay the paired points with a box plot. We prefer the paired points if you can measure their values, but if the points are hard to distinguish, you can enter the average value (geomean, mean, etc.) from the box plot. Please ignore confidence intervals and p-values.
Dose-response curve
Please pay attention to the value at 50% neutralization. If the curve doesn't cross 50% neutralization, neutralization was not detected in the assay; please use ULOQ as the neutralization potency data.
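As a rough sketch of reading the 50% crossing from a digitized curve, the snippet below interpolates in log10 dilution space between the two measured points that bracket 50%. This is a simple approximation, not the curve fit the paper's authors may have used, and the name `interpolate_nt50` is hypothetical. It returns `None` when the curve never crosses 50%, in which case the quantification limit is entered as described above.

```python
import math


def interpolate_nt50(dilutions, neutralization_pct):
    """Estimate the reciprocal dilution giving 50% neutralization.

    dilutions: reciprocal dilutions in increasing order (e.g. 20, 60, 180)
    neutralization_pct: percent neutralization measured at each dilution
    Returns None if no pair of adjacent points brackets 50%.
    """
    points = list(zip(dilutions, neutralization_pct))
    for (d1, n1), (d2, n2) in zip(points, points[1:]):
        if n1 != n2 and (n1 - 50) * (n2 - 50) <= 0:
            frac = (50 - n1) / (n2 - n1)
            log_d = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_d
    return None


print(interpolate_nt50([20, 200], [75, 25]))   # crossing between the two points
print(interpolate_nt50([20, 200], [30, 10]))   # never reaches 50%: None
```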
Estimate date
For infection, if the patient was infected by the original variant or the B.1 variant, the estimated date can be before 2020-09-30. For the Alpha and Beta variants, it is between about 2020-12-01 and 2021-04-01. For the Delta variant, it is about 2021-07-01. For the Omicron BA.1 variant, it is between about 2021-12-01 and 2022-02-28. For the Omicron BA.2 variant, it is after about 2022-03-01. Use this rule only if the paper reports no accurate date, estimated date, average date, date range, or even month. Because the waves overlap, the date cannot be estimated precisely. You can also estimate the variant from the infection date reported in the paper; if there is uncertainty, please use `Unknown variant`. For animal model studies, the infection date can be close to the publication date.
For vaccination, because most vaccines were approved for emergency use around January 2021, we can assume the first shot was between 2021-01-01 and 2021-05-01, the second shot about 1 month later, and the first booster shot about 6 months after the second shot. Use this rule only if the paper reports no accurate date, estimated date, average date, date range, or even month.
For breakthrough infection, some papers report the average time between exposures; you can use it to estimate the dates of infection and vaccination.
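The fallback rules above can be written down as a lookup table, shown here as a minimal sketch. The dictionary keys are illustrative labels and are not necessarily the `var_name` values used in the database; "1 month" and "6 months" are approximated as 30 and 182 days, which is an assumption of this example.

```python
from datetime import date, timedelta

# Approximate infection windows (start, end) from the rules above.
# None marks an open-ended bound.
INFECTION_WINDOWS = {
    "Original": (None, date(2020, 9, 30)),
    "B.1": (None, date(2020, 9, 30)),
    "Alpha": (date(2020, 12, 1), date(2021, 4, 1)),
    "Beta": (date(2020, 12, 1), date(2021, 4, 1)),
    "Delta": (date(2021, 7, 1), date(2021, 7, 1)),
    "Omicron BA.1": (date(2021, 12, 1), date(2022, 2, 28)),
    "Omicron BA.2": (date(2022, 3, 1), None),
}


def estimate_infection_window(variant):
    """Return the (start, end) window for a variant, or None if no rule applies."""
    return INFECTION_WINDOWS.get(variant)


def estimate_vaccination_dates(first_shot):
    """Estimate 2nd shot (~1 month later) and 1st booster (~6 months after the 2nd)."""
    second = first_shot + timedelta(days=30)
    booster = second + timedelta(days=182)
    return {1: first_shot, 2: second, 3: booster}


print(estimate_infection_window("Alpha"))
print(estimate_vaccination_dates(date(2021, 3, 1)))
```

Any date produced this way is an estimate, so the corresponding `infection_date_cmp` or `vaccination_date_cmp` column should be '~' rather than '='.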
Surrogate neutralization test
The surrogate virus neutralization test (sVNT) data are not neutralization titers but percentages of neutralization. Normally, the test is run at a fixed titer (dilution), for example 1:20, and compares the neutralization potency of different plasma samples or of the same plasma against different variants. In this situation, when filling in the `rx_potency` table, the `potency_type` is `NC`; if the fixed titer is known, it is `NCxx`, for example `NC20`. The `potency` is the percentage value and the `potency_unit` is `percent`.
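A minimal sketch of how the three sVNT fields could be filled for one data point. The helper name `svnt_potency_fields` is hypothetical; only the column names and values come from the rule above.

```python
def svnt_potency_fields(percent_neutralization, fixed_dilution=None):
    """Build the rx_potency fields for one sVNT measurement.

    fixed_dilution: reciprocal of the fixed dilution (20 for 1:20), or None if unknown.
    """
    potency_type = "NC" if fixed_dilution is None else f"NC{fixed_dilution}"
    return {
        "potency_type": potency_type,
        "potency": percent_neutralization,
        "potency_unit": "percent",
    }


print(svnt_potency_fields(85.0, fixed_dilution=20))
print(svnt_potency_fields(42.5))
```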
Format data
Please use this Excel template specifically for plasma to format the data.
In this section, we describe each table and its columns. The primary key or joint primary keys of a table are highlighted in bold.
Metadata tables
Common tables
Please read Enter neutralization data (metadata tables)
Neutralization potency tables
⚠️ Note: if potency (GMT) data are available, you do not need to fill in the rx_fold table.
rx_potency
Column name | Description | Format | Default | Comment |
---|---|---|---|---|
ref_name | RefID | enter the 'ref_name' in the 'articles' table | | |
rx_name | free text to describe the plasma; you can provide the infected variant name, vaccine name, dosages, etc. to distinguish different plasmas | | | |
iso_name | iso_name of the tested virus; the name should be in the isolates table | | | |
section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |
assay_name | Must be a value from the assay_name column in the 'assay' table | | | |
potency_type | NT50 | | | |
potency | Neutralization titer | | | |
cumulative_count | number of data points sharing the same value | | | |
potency_upper_limit | ULOQ (upper limit of quantification) | | | |
potency_lower_limit | LLOQ (lower limit of quantification) | | | |
potency_unit | NULL | | | |
date_added | YYYY-MM-DD | | | |
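
One possible reading of `cumulative_count` is that identical data points for the same plasma and isolate are collapsed into a single row with a count. The sketch below illustrates that reading; the helper name `collapse_identical_points` and the sample values are made up for the example.

```python
from collections import Counter


def collapse_identical_points(points):
    """Collapse identical (rx_name, iso_name, potency) tuples into counted rows."""
    counts = Counter(points)
    return [
        {"rx_name": rx, "iso_name": iso, "potency": potency, "cumulative_count": n}
        for (rx, iso, potency), n in counts.items()
    ]


# Two points with the same titer against B.1 become one row with a count of 2.
points = [("CP", "B.1", 40), ("CP", "B.1", 40), ("CP", "BA.1", 20)]
for row in collapse_identical_points(points):
    print(row)
```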
rx_fold
Column name | Description | Format | Default | Comment |
---|---|---|---|---|
ref_name | RefID | | | |
rx_name | free text to describe the plasma; you can provide the infected variant name, vaccine name, dosages, etc. to distinguish different plasmas | | | |
control_iso_name | iso_name of the control virus; the name should be in the isolates table | | | |
iso_name | iso_name of the test virus; the name should be in the isolates table | | | |
section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |
assay_name | Must be a value from the assay_name column in the assays.csv table | | | |
potency_type | NT50 | | | |
fold_cmp | If test NT50 < LLOQ then use ">", else use "=" | | | |
fold | Fold change (control NT50 / test NT50) | | | |
resistance_level | NULL | | | |
ineffective | NULL | | | |
cumulative_count | number of data points sharing the same value | | | |
date_added | YYYY-MM-DD | | | |
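
The `fold_cmp` and `fold` columns can be sketched together as below. One assumption is made that the table does not state: when the test NT50 is below the LLOQ, the LLOQ is used as the denominator, so the reported fold is a lower bound. The function name `fold_change` is hypothetical.

```python
def fold_change(control_nt50, test_nt50, lloq):
    """Return (fold_cmp, fold) for one control/test pair.

    Assumption: a test titer below LLOQ is replaced by LLOQ, so the true
    fold change is greater than the returned value (fold_cmp is ">").
    """
    if test_nt50 < lloq:
        return ">", control_nt50 / lloq
    return "=", control_nt50 / test_nt50


print(fold_change(640, 80, lloq=20))   # quantifiable test titer
print(fold_change(640, 10, lloq=20))   # test titer below LLOQ
```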
ref_isolate_pairs
This table records which isolate is the control and which isolate is the test. It is used together with the `rx_potency` table; if all data are in the `rx_fold` table, this table can be ignored.
Column name | Description | Format | Default | Comment |
---|---|---|---|---|
ref_name | RefID | | | |
control_iso_name | control iso_name from rx_potency table | Most of the time it's the wild type virus or a virus with the D614G mutation | | |
iso_name | test iso_name from rx_potency table | | | |
subject_plasma
The concept of `subject_name` represents an individual or a group of people sharing the same infection or immunization (exposure) history.
Column name | Description | Format | Default | Comment |
---|---|---|---|---|
ref_name | RefID | | | |
subject_name | Unique identifier to represent subjects sharing the same exposure history | Freetext | | |
rx_name | rx_name in rx_potency table | | | |
collection_date_cmp | If the accurate plasma isolation dates are reported in the paper then use '=', else use '~' | '~' | | |
collection_date | YYYY-MM-DD | | | |
location | Country name where the plasma was isolated | | | |
cumulative_group | Same as subject_name | | | |
section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |
subject_infections
Column name | Description | Format | Default | Comment |
---|---|---|---|---|
ref_name | RefID | | | |
subject_name | subject_name from subject_plasma table | | | |
infection_date_cmp | If the accurate infection dates are reported in the paper then use '=', else use '~' | | | |
infection_date | YYYY-MM-DD | | | |
infected_var_name | var_name from variants table | If the infection variant name is not reported, use Unknown variant | | |
location | Country name | | | |
immune_status | NULL | | | |
severity | Mild, Moderate, Hospitalized, Non-Hospitalized | | | |
section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |
subject_vaccines
Column name | Description | Format | Default | Comment |
---|---|---|---|---|
ref_name | RefID | | | |
subject_name | subject_name from subject_plasma table | | | |
vaccination_date_cmp | If the accurate immunization dates are reported in the paper then use '=', else use '~' | | | |
vaccination_date | YYYY-MM-DD | | | |
vaccine_name | vaccine_name from vaccines table | If the vaccine name is not reported, use Unknown vaccine | | |
dosage | 1st shot as 1, 2nd shot as 2, booster shot as 3, etc. | Integer | | |
location | Country name | | | |
section | Figure, table, supplementary content or paragraph number from where the data are extracted | | | |
Submit data
If you're not familiar with programming, please skip this step and attach the Excel file to the issue page. Please also let the admin know that the data file is ready to use. We will convert the data file into a database-friendly format and check its consistency.
Please see how to submit the data in Enter neutralization data (submit data)