Datasets - nestauk/discovery_utils GitHub Wiki

Datasets that we have used in past and ongoing data-driven horizon scanning projects. Datasets marked with * are ones we haven't used yet but plan to use in the future.

| Dataset name | Insight | Short description | Coverage | Size | Status |
|---|---|---|---|---|---|
| Crunchbase | Companies (startups), investments | Database of global companies, including investment rounds | Global | 3.4M+ companies | In the pipeline: updated weekly |
| Gateway to Research | Research funding | Data on research projects funded by UKRI | UK | 100K+ research projects | In the pipeline: updated weekly |
| UK Parliamentary Debates (Hansard) | Policy discourse | Records of parliamentary debates in the UK | UK | Full record of House of Commons and House of Lords debates | In the pipeline: to be turned on |
| NIHR Open Data | Research funding | Data on medical research projects funded by NIHR | UK | ~10K research projects | Not added to the pipeline |
| OpenAlex | Research publishing | Database of academic publications | Global | 250M+ academic works | Not added to the pipeline (work in progress) |
| Google Patents | Patents | Database of patent publications from 100+ international patent offices | Global | 120M+ patents | Not added to the pipeline (work in progress) |
| Media Cloud* | Media discourse | Global database of news stories across multiple media sources | Global | 1B+ news stories (headlines or full text) | To be explored |
| 360Giving* | Philanthropic funding | Data on grant funding in the UK | UK | 1M+ grants | To be explored |