Gateway to Research dataset - nestauk/discovery_utils GitHub Wiki
Gateway to Research dataset
Gateway to Research (GtR) is the UKRI portal onto publicly funded research.
We're collecting an updated data snapshot every week.
Raw data
To load the data tables, first initialise the GtR data getter class:
from getters import gtr
# Initialise class to access the most recent data version
Gtr = gtr.GtrGetter()
The getters provide access to four raw data tables:
- projects: Titles and abstracts of projects funded by the UKRI
- funds: Data about the amount of funding for each project
- organisations: List of organisations linked to the projects
- persons: List of people linked to the projects
Below we provide a simple schema for each table, i.e. column names and descriptions. Note that some columns contain no information (likely an API artefact), but their names are kept for the sake of completeness.
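To check which columns are empty in a given snapshot, a small helper along these lines may be useful. This is only a sketch: it assumes the rows are available as plain dictionaries, and the function name `empty_columns` and the toy records are invented for illustration rather than taken from the getters.

```python
def empty_columns(records):
    """Return the sorted names of columns that hold no information in any record."""

    def is_empty(value):
        # Treat None, empty strings and empty containers as "no information"
        return value is None or value == "" or value == [] or value == {}

    columns = {key for record in records for key in record}
    return sorted(
        column
        for column in columns
        if all(is_empty(record.get(column)) for record in records)
    )


# Toy records, invented for illustration (not real GtR data)
toy_projects = [
    {"id": "A1", "title": "Project one", "ext": None, "outcomeid": ""},
    {"id": "B2", "title": "Project two", "ext": None, "outcomeid": None},
]

print(empty_columns(toy_projects))
```

If the tables are loaded in a different form (for example as data frames), missing values may be encoded differently and the emptiness check would need adjusting.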
Projects
Gtr.projects
Information about 150,000+ projects funded by the UKRI (as of autumn 2024).
Column | Description |
---|---|
links | Links to other GtR data tables |
ext | n/a |
id | Unique database entry id |
outcomeid | n/a |
href | API link |
created | Timestamp indicating when the record was created |
updated | n/a |
identifiers | Project reference numbers (can be more than one) |
title | Title of the project |
status | Current status of the project: 'Active' or 'Closed' |
grantCategory | Type of grant, such as 'Research Grant', 'Studentship', 'Collaborative R&D' and others |
leadFunder | Lead funding organisation, usually a research council or Innovate UK |
leadOrganisationDepartment | Department in the funded organisation leading the project |
abstractText | Description of the project |
techAbstractText | Technical description of the project (if available) |
potentialImpact | Statement describing the potential impact of the project or research (if available) |
healthCategories | n/a |
researchActivities | n/a |
researchSubjects | Research categories |
researchTopics | Research categories (it is unclear how these differ from researchActivities) |
rcukProgrammes | n/a |
start | n/a |
end | n/a |
participantValues | Organisations participating in the project (if available) |
Links
Projects are linked to other types of data (via the links field in the table above). The possible relationships and corresponding endpoints (i.e. other tables) are listed below.
Relationship | Endpoint |
---|---|
FUND | funds |
COFUND_ORG | organisations |
COLLAB_ORG | organisations |
FELLOW_ORG | organisations |
LEAD_ORG | organisations |
PARTICIPANT_ORG | organisations |
COI_PER | persons |
FELLOW_PER | persons |
PI_PER | persons |
PM_PER | persons |
RESEARCH_COI_PER | persons |
RESEARCH_PER | persons |
STUDENT_PER | persons |
SUPER_PER | persons |
TGH_PER | persons |
TRANSFER | projects |
TRANSFER_FROM | projects |
STUDENTSHIP | projects |
STUDENTSHIP_FROM | projects |
ARTISTIC_AND_CREATIVE_PRODUCT | outcomes/artisticandcreativeproducts |
COLLABORATION | outcomes/collaborations |
DISSEMINATION | outcomes/disseminations |
FURTHER_FUNDING | outcomes/furtherfundings |
IMPACT_SUMMARY | outcomes/impactsummaries |
IP | outcomes/intellectualproperties |
KEY_FINDING | outcomes/keyfindings |
POLICY | outcomes/policyinfluences |
PRODUCT | outcomes/products |
PUBLICATION | outcomes/publications |
RESEARCH_DATABASE_AND_MODEL | outcomes/researchdatabaseandmodels |
RESEARCH_MATERIAL | outcomes/researchmaterials |
SOFTWARE_AND_TECHNICAL_PRODUCT | outcomes/softwareandtechnicalproducts |
SPIN_OUT | outcomes/spinouts |
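The relationship table above can be used to sort a project's links by the table they point to. The sketch below assumes each link is a dictionary with `rel` and `href` keys; the actual layout of the links field in the snapshot may differ. The mapping covers only a few relationships, and the function name and example URLs are hypothetical.

```python
from collections import defaultdict

# A subset of the relationship-to-endpoint table above
REL_TO_ENDPOINT = {
    "FUND": "funds",
    "LEAD_ORG": "organisations",
    "PARTICIPANT_ORG": "organisations",
    "PI_PER": "persons",
    "COI_PER": "persons",
    "PUBLICATION": "outcomes/publications",
}


def group_links_by_endpoint(links):
    """Group link hrefs by the endpoint (table) their relationship points to."""
    grouped = defaultdict(list)
    for link in links:
        # Relationships missing from the mapping are collected under "unknown"
        endpoint = REL_TO_ENDPOINT.get(link["rel"], "unknown")
        grouped[endpoint].append(link["href"])
    return dict(grouped)


# Toy links with placeholder URLs (assumed structure, not real GtR data)
toy_links = [
    {"rel": "LEAD_ORG", "href": "https://example.org/organisations/org-1"},
    {"rel": "PI_PER", "href": "https://example.org/persons/per-1"},
    {"rel": "COI_PER", "href": "https://example.org/persons/per-2"},
    {"rel": "FUND", "href": "https://example.org/funds/fund-1"},
]

print(group_links_by_endpoint(toy_links))
```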
Funds
Gtr.funds
Information about project funding.
Column | Description |
---|---|
links | Links to other GtR data tables |
ext | n/a |
id | Unique database entry id |
outcomeid | n/a |
href | API link |
created | Timestamp indicating when the record was created |
updated | n/a |
start | Timestamp when the funding period started |
end | Timestamp when the funding period ended |
valuePounds | Amount of funding |
category | Funding category, one of three values: INCOME_ACTUAL, EXPENDITURE_ACTUAL or SUPPLEMENTARY_UNIT_AWARDS. Most of the entries are INCOME_ACTUAL |
type | n/a |
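Because the funds table mixes several categories, it is usually worth filtering on category before adding up amounts. The sketch below assumes that valuePounds holds a dictionary with `currencyCode` and `amount` keys (the names that appear in the linked projects_funds table); the function name and the toy rows are invented for illustration.

```python
def total_funding(funds, category="INCOME_ACTUAL"):
    """Sum funding amounts per currency over the rows in one category."""
    totals = {}
    for row in funds:
        if row["category"] != category:
            continue  # skip rows from other funding categories
        value = row["valuePounds"]
        currency = value["currencyCode"]
        totals[currency] = totals.get(currency, 0) + value["amount"]
    return totals


# Toy rows (assumed structure, not real GtR data)
toy_funds = [
    {"category": "INCOME_ACTUAL", "valuePounds": {"currencyCode": "GBP", "amount": 100000}},
    {"category": "INCOME_ACTUAL", "valuePounds": {"currencyCode": "GBP", "amount": 50000}},
    {"category": "EXPENDITURE_ACTUAL", "valuePounds": {"currencyCode": "GBP", "amount": 70000}},
]

print(total_funding(toy_funds))
```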
Organisations
Gtr.organisations
Column | Description |
---|---|
links | Links to other GtR data tables such as projects or persons |
ext | n/a |
id | Unique database entry id |
outcomeid | n/a |
href | API link |
created | Timestamp indicating when the record was created |
updated | n/a |
name | Name of the organisation |
regNumber | n/a |
website | n/a |
addresses | Address of the organisation (if available) |
Persons
Gtr.persons
Column | Description |
---|---|
links | Links to other GtR data tables such as projects and organisations |
ext | n/a |
id | Unique database entry id |
outcomeid | n/a |
href | API link |
created | Timestamp indicating when the record was created |
updated | n/a |
firstName | First name |
otherNames | Other names |
surname | Last name |
email | n/a |
orcidId | ORCID if available |
Linked data
The getter class also does some data wrangling to provide access to useful linked data:
- projects_funds: Gtr.projects table with added `start`, `end`, `funds_id`, `funds_category`, `currencyCode` and `amount`
- projects_persons: Project `id` and `title`; `persons_rel`, which indicates the relationship between the project and the person; and all other fields from the Gtr.persons table
- projects_organisations: Project `id` and `title`; `organisations_rel`, which indicates the relationship between the project and the organisation; and all other fields from the Gtr.organisations table
- persons_organisations: Joined-up Gtr.persons and Gtr.organisations tables
- projects_enriched: Same as `projects_funds` but with `url` added as well
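To illustrate the kind of join behind a table like projects_persons, here is a minimal sketch. It is not the getters' actual implementation. It assumes that each project carries a list of link dictionaries with `rel` and `href` keys, and that the last path segment of a person link is that person's id. The `persons_id` column name, the function name and the toy records are all hypothetical.

```python
def link_projects_to_persons(projects, persons):
    """Build one row per (project, person) link, keeping the relationship type."""
    persons_by_id = {person["id"]: person for person in persons}
    rows = []
    for project in projects:
        for link in project["links"]:
            if not link["rel"].endswith("_PER"):
                continue  # only person relationships such as PI_PER or COI_PER
            # Assumption: the person id is the last path segment of the href
            person_id = link["href"].rstrip("/").split("/")[-1]
            person = persons_by_id.get(person_id)
            if person is None:
                continue  # link points to a person missing from the table
            row = {
                "id": project["id"],
                "title": project["title"],
                "persons_rel": link["rel"],
            }
            for key, value in person.items():
                # Rename the person's id so it does not overwrite the project id
                row["persons_id" if key == "id" else key] = value
            rows.append(row)
    return rows


# Toy records with placeholder URLs (assumed structure, not real GtR data)
toy_projects = [
    {
        "id": "proj-1",
        "title": "Example project",
        "links": [
            {"rel": "PI_PER", "href": "https://example.org/persons/per-1"},
            {"rel": "LEAD_ORG", "href": "https://example.org/organisations/org-1"},
        ],
    }
]
toy_persons = [{"id": "per-1", "firstName": "Ada", "surname": "Lovelace"}]

for row in link_projects_to_persons(toy_projects, toy_persons):
    print(row)
```

The same pattern, filtering on relationships ending in `_ORG` instead, would give a rough equivalent for organisations.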