Gateway to Research dataset - nestauk/discovery_utils GitHub Wiki

Gateway to Research dataset

Gateway to Research (GtR) is the UKRI portal onto publicly funded research.

We're collecting an updated data snapshot every week.

Raw data

To load the data tables, first initialise the GtR data getter class:

from getters import gtr

# Initialise class to access the most recent data version
Gtr = gtr.GtrGetter()

The getters provide access to four raw data tables:

  • projects: Titles and abstracts of projects funded by the UKRI
  • funds: Data about the amount of funding for each project
  • organisations: List of organisations linked to the projects
  • persons: List of people linked to the projects

Below we provide simple schemas, i.e. column names and descriptions. Note that some columns contain no information (likely an API artefact), but their names are kept for completeness.

Projects

Gtr.projects

Information about 150,000+ projects funded by the UKRI (as of autumn 2024).

Column Description
links Links to other GtR data tables
ext n/a
id Unique database entry id
outcomeid n/a
href API link
created Timestamp indicating when the record was created
updated n/a
identifiers Project reference numbers (can be more than one)
title Title of the project
status Current status of the project: 'Active' or 'Closed'
grantCategory Type of grant, such as 'Research Grant', 'Studentship', 'Collaborative R&D' and others
leadFunder Lead funding organisation, usually a research council or Innovate UK
leadOrganisationDepartment Department in the funded organisation leading the project
abstractText Description of the project
techAbstractText Technical description of the project (if available)
potentialImpact Statement describing the potential impact of the project or research (if available)
healthCategories n/a
researchActivities n/a
researchSubjects Research categories
researchTopics Research categories (it is unclear how these differ from researchActivities)
rcukProgrammes n/a
start n/a
end n/a
participantValues Organisations participating in the project (if available)
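
As a minimal sketch of how these columns might be used, the snippet below counts active projects per grant category. The rows are hypothetical stand-ins for records in Gtr.projects, written as plain dictionaries so the example runs without the getters; the ids, titles and the helper name count_active_by_category are illustrative and not part of the getters.

```python
from collections import Counter

# Hypothetical sample rows using a few of the columns described above.
sample_projects = [
    {"id": "p1", "title": "Project A", "status": "Active", "grantCategory": "Research Grant"},
    {"id": "p2", "title": "Project B", "status": "Closed", "grantCategory": "Studentship"},
    {"id": "p3", "title": "Project C", "status": "Active", "grantCategory": "Research Grant"},
    {"id": "p4", "title": "Project D", "status": "Active", "grantCategory": "Collaborative R&D"},
]


def count_active_by_category(projects):
    """Count projects whose status is 'Active', grouped by grantCategory."""
    return Counter(
        row["grantCategory"] for row in projects if row["status"] == "Active"
    )


active_counts = count_active_by_category(sample_projects)
print(active_counts.most_common())
```

The same pattern applies to any other categorical column, such as leadFunder.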

Links

Projects are linked to other types of data via the links field in the table above. The possible relationships and their corresponding endpoints (i.e. other tables) are listed below.

Relationship Endpoint
FUND funds
COFUND_ORG organisations
COLLAB_ORG organisations
FELLOW_ORG organisations
LEAD_ORG organisations
PARTICIPANT_ORG organisations
COI_PER persons
FELLOW_PER persons
PI_PER persons
PM_PER persons
RESEARCH_COI_PER persons
RESEARCH_PER persons
STUDENT_PER persons
SUPER_PER persons
TGH_PER persons
TRANSFER projects
TRANSFER_FROM projects
STUDENTSHIP projects
STUDENTSHIP_FROM projects
ARTISTIC_AND_CREATIVE_PRODUCT outcomes/artisticandcreativeproducts
COLLABORATION outcomes/collaborations
DISSEMINATION outcomes/disseminations
FURTHER_FUNDING outcomes/furtherfundings
IMPACT_SUMMARY outcomes/impactsummaries
IP outcomes/intellectualproperties
KEY_FINDING outcomes/keyfindings
POLICY outcomes/policyinfluences
PRODUCT outcomes/products
PUBLICATION outcomes/publications
RESEARCH_DATABASE_AND_MODEL outcomes/researchdatabaseandmodels
RESEARCH_MATERIAL outcomes/researchmaterials
SOFTWARE_AND_TECHNICAL_PRODUCT outcomes/softwareandtechnicalproducts
SPIN_OUT outcomes/spinouts
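
The relationship table above can be turned into a lookup, sketched below. The mapping is copied from the table; the shape of a link entry (a dictionary with rel and href keys) and the placeholder href values are assumptions made for illustration, so check them against the actual contents of the links column before relying on this.

```python
from collections import defaultdict

# Relationship -> endpoint, copied from the table above.
REL_TO_ENDPOINT = {
    "FUND": "funds",
    "COFUND_ORG": "organisations", "COLLAB_ORG": "organisations",
    "FELLOW_ORG": "organisations", "LEAD_ORG": "organisations",
    "PARTICIPANT_ORG": "organisations",
    "COI_PER": "persons", "FELLOW_PER": "persons", "PI_PER": "persons",
    "PM_PER": "persons", "RESEARCH_COI_PER": "persons",
    "RESEARCH_PER": "persons", "STUDENT_PER": "persons",
    "SUPER_PER": "persons", "TGH_PER": "persons",
    "TRANSFER": "projects", "TRANSFER_FROM": "projects",
    "STUDENTSHIP": "projects", "STUDENTSHIP_FROM": "projects",
    "ARTISTIC_AND_CREATIVE_PRODUCT": "outcomes/artisticandcreativeproducts",
    "COLLABORATION": "outcomes/collaborations",
    "DISSEMINATION": "outcomes/disseminations",
    "FURTHER_FUNDING": "outcomes/furtherfundings",
    "IMPACT_SUMMARY": "outcomes/impactsummaries",
    "IP": "outcomes/intellectualproperties",
    "KEY_FINDING": "outcomes/keyfindings",
    "POLICY": "outcomes/policyinfluences",
    "PRODUCT": "outcomes/products",
    "PUBLICATION": "outcomes/publications",
    "RESEARCH_DATABASE_AND_MODEL": "outcomes/researchdatabaseandmodels",
    "RESEARCH_MATERIAL": "outcomes/researchmaterials",
    "SOFTWARE_AND_TECHNICAL_PRODUCT": "outcomes/softwareandtechnicalproducts",
    "SPIN_OUT": "outcomes/spinouts",
}


def group_links_by_endpoint(links):
    """Group link hrefs by the endpoint their relationship points to."""
    grouped = defaultdict(list)
    for link in links:
        # Relationships missing from the table are collected under "unknown".
        endpoint = REL_TO_ENDPOINT.get(link["rel"], "unknown")
        grouped[endpoint].append(link["href"])
    return dict(grouped)


# Hypothetical link entries; the hrefs are placeholders, not real API links.
sample_links = [
    {"rel": "PI_PER", "href": "persons/person-1"},
    {"rel": "LEAD_ORG", "href": "organisations/org-1"},
    {"rel": "FUND", "href": "funds/fund-1"},
    {"rel": "COI_PER", "href": "persons/person-2"},
]
grouped = group_links_by_endpoint(sample_links)
print(grouped)
```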

Funds

Gtr.funds

Information about project funding.

Column Description
links Links to other GtR data tables
ext n/a
id Unique database entry id
outcomeid n/a
href API link
created Timestamp indicating when the record was created
updated n/a
start Timestamp when the funding period started
end Timestamp when the funding period ended
valuePounds Amount of funding
category Funding category, one of three values: INCOME_ACTUAL, EXPENDITURE_ACTUAL or SUPPLEMENTARY_UNIT_AWARDS. Most of the entries are INCOME_ACTUAL
type n/a
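
A minimal sketch of totalling funding per category follows. The rows are hypothetical. We assume here that valuePounds may hold either a plain number or a nested record with currencyCode and amount keys (suggested by the currencyCode and amount fields that the projects_funds linked table exposes); the helper handles both, but the real layout should be checked in Gtr.funds.

```python
from collections import defaultdict

# Hypothetical sample rows; amounts and ids are made up for illustration.
sample_funds = [
    {"id": "f1", "category": "INCOME_ACTUAL",
     "valuePounds": {"currencyCode": "GBP", "amount": 100000}},
    {"id": "f2", "category": "EXPENDITURE_ACTUAL",
     "valuePounds": {"currencyCode": "GBP", "amount": 25000}},
    {"id": "f3", "category": "INCOME_ACTUAL", "valuePounds": 50000},
]


def amount_of(value):
    """Return the numeric amount from a nested record or a plain number."""
    return value["amount"] if isinstance(value, dict) else value


def total_by_category(funds):
    """Sum funding amounts for each funding category."""
    totals = defaultdict(int)
    for row in funds:
        totals[row["category"]] += amount_of(row["valuePounds"])
    return dict(totals)


totals = total_by_category(sample_funds)
print(totals)
```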

Organisations

Gtr.organisations

Column Description
links Links to other GtR data tables such as projects or persons
ext n/a
id Unique database entry id
outcomeid n/a
href API link
created Timestamp indicating when the record was created
updated n/a
name Name of the organisation
regNumber n/a
website n/a
addresses Address of the organisation (if available)

Persons

Gtr.persons

Column Description
links Links to other GtR data tables such as projects and organisations
ext n/a
id Unique database entry id
outcomeid n/a
href API link
created Timestamp indicating when the record was created
updated n/a
firstName First name
otherNames Other names
surname Last name
email n/a
orcidId ORCID if available

Linked data

The getter class also does some data wrangling to provide access to useful linked data:

  • projects_funds: Gtr.projects table with added start, end, funds_id, funds_category, currencyCode and amount columns
  • projects_persons: Project id and title; persons_rel, which indicates the relationship between the project and the person; and all other fields from the Gtr.persons table
  • projects_organisations: Project id and title; organisations_rel, which indicates the relationship between the project and the organisation; and all other fields from the Gtr.organisations table
  • persons_organisations: Joined-up Gtr.persons and Gtr.organisations tables
  • projects_enriched: Same as projects_funds, with an added url column
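
To illustrate the kind of join behind a table like projects_funds, here is a rough sketch that follows FUND links from projects to funds. This is not the getters' actual implementation. The link layout (dictionaries with rel and href, where the last path segment of href is the fund id), the nested valuePounds record, and all sample values are assumptions made for the example.

```python
def fund_ids_from_links(links):
    """Return fund ids from FUND links, using the last path segment of href."""
    return [
        link["href"].rstrip("/").split("/")[-1]
        for link in links
        if link["rel"] == "FUND"
    ]


def join_projects_funds(projects, funds):
    """Produce one row per (project, linked fund) pair."""
    funds_by_id = {fund["id"]: fund for fund in funds}
    joined = []
    for project in projects:
        for fund_id in fund_ids_from_links(project["links"]):
            fund = funds_by_id.get(fund_id)
            if fund is None:
                continue  # the link points to a fund missing from the table
            joined.append({
                "id": project["id"],
                "title": project["title"],
                "start": fund["start"],
                "end": fund["end"],
                "funds_id": fund["id"],
                "funds_category": fund["category"],
                "currencyCode": fund["valuePounds"]["currencyCode"],
                "amount": fund["valuePounds"]["amount"],
            })
    return joined


# Hypothetical sample data.
sample_projects = [
    {"id": "p1", "title": "Project A", "links": [
        {"rel": "FUND", "href": "funds/f1"},
        {"rel": "PI_PER", "href": "persons/x1"},
    ]},
    {"id": "p2", "title": "Project B", "links": [
        {"rel": "FUND", "href": "funds/missing"},
    ]},
]
sample_funds = [
    {"id": "f1", "start": "2020-01-01", "end": "2021-01-01",
     "category": "INCOME_ACTUAL",
     "valuePounds": {"currencyCode": "GBP", "amount": 100000}},
]

joined = join_projects_funds(sample_projects, sample_funds)
print(joined)
```

In practice the ready-made linked tables should be preferred; the sketch only shows how the links field connects the raw tables.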