Paperboy - HealthHackAu2013/wiki GitHub Wiki

Search medical publications intelligently

Team name & bio

  • Tan Nguyen - scientist, leader, superstar
  • Ken Pang - scientist
  • Lyndon Maydwell - coder extraordinaire
  • Fred Michna - GIS guru
  • Lilly Ryan - novice coder, idea bouncer
  • Bronwyn Dixon - Team Science Librarian

Image reproduced for this project with permission from the work of Bradley David Santos http://www.bradleydavidsantos.net/

The Problem

When medical researchers search for peer-reviewed articles and scientific papers in academic databases such as PubMed and Google Scholar, they are faced with an extremely long list of results, much of it irrelevant, which can take days to trawl through. What current literature search methods lack is the ability to determine or visualise the connections between particular articles. This makes searching very difficult for anyone jumping into a new field of research, such as new PhD students. What is needed is a more intuitive way of displaying search results, one that clusters results together based on user-identified search parameters and allows visual connections to be drawn between relevant articles.

Although some applications, such as Knalij and Pured-MCL, can already cluster PubMed searches based on keywords, they often lack functionality such as zooming in on specific clusters and visualising which institutions are having the most impact on a specific field in terms of research and publication output.

This information is useful not only for new students who want to identify which institutions are currently driving the research in their desired field, but could also help senior medical researchers identify potential collaborators from around the world.

The Solution

We are aiming to enable researchers to plot academic paper search results on a heat map of the world, indicating volume of output based on university location. Points on this map will provide more information about articles when clicked. We are also planning to create network diagrams of search results, which visually demonstrate relationships between papers, and easily highlight or eliminate papers that have been retracted after publication.
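As a rough illustration of the heat map idea, the sketch below counts papers per geocoded location and skips retracted papers by default. The record shape (`pmid`, `institution`, `lat`, `lng`, `retracted`), the function name `heatPoints`, and the sample records are all assumptions made up for this example; they are not taken from the Paperboy code.

```javascript
// Hypothetical record shape: { pmid, institution, lat, lng, retracted }.
// These field names are illustrative assumptions, not Paperboy's schema.
function heatPoints(papers, { includeRetracted = false } = {}) {
  const byPlace = new Map();
  for (const p of papers) {
    // Retracted papers are left out unless the caller asks for them.
    if (p.retracted && !includeRetracted) continue;
    const key = `${p.lat},${p.lng}`;
    const entry = byPlace.get(key) ||
      { institution: p.institution, lat: p.lat, lng: p.lng, count: 0 };
    entry.count += 1;
    byPlace.set(key, entry);
  }
  // Most productive locations first.
  return [...byPlace.values()].sort((a, b) => b.count - a.count);
}

// Invented sample records, for demonstration only.
const samplePapers = [
  { pmid: "1", institution: "Uni A", lat: -37.8, lng: 144.96, retracted: false },
  { pmid: "2", institution: "Uni A", lat: -37.8, lng: 144.96, retracted: false },
  { pmid: "3", institution: "Uni B", lat: 51.5, lng: -0.13, retracted: true },
  { pmid: "4", institution: "Uni B", lat: 51.5, lng: -0.13, retracted: false },
];

console.log(heatPoints(samplePapers));
```

Each returned entry carries a location and a `count`, which a mapping layer could use as the heat intensity for that point.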

Application/Relevance

Paperboy will allow researchers of all levels to quickly reach relevant search results when looking for academic papers via PubMed. At a glance, researchers will be able to understand the citation relationships between research papers that come out of institutions. The application will also provide an avenue for potential students to find institutions or labs to join based on research output in a specific field that they are interested in. Senior researchers could also use this tool to identify potential collaborators or new research opportunities around the world. Taken together, we believe that Paperboy will eliminate the need for scrolling through hundreds of pages of mostly irrelevant results when attempting to explore specific areas of research interest.

Datasets

We are scraping PubMed search results. The test dataset is 'Type I interferon IFN dsRNA'. We are geocoding the institution of the first author of each paper through the Google Maps Geocoding API. We are creating lines to connect these papers on a map based on whether they cite one another. We are searching through the test dataset based on keywords in the title and abstract. All datasets can be found in the GitHub repo below.
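The keyword search and citation linking described above can be sketched as follows. This is a minimal example under stated assumptions: the record shape (`pmid`, `title`, `abstract`, `cites`), the helper names `matchesKeywords` and `citationLinks`, and the sample records are hypothetical and do not come from the Paperboy repository.

```javascript
// Hypothetical record shape: { pmid, title, abstract, cites: [pmid, ...] }.
// True when every keyword appears in the title or abstract (case-insensitive).
function matchesKeywords(paper, keywords) {
  const text = `${paper.title} ${paper.abstract}`.toLowerCase();
  return keywords.every((k) => text.includes(k.toLowerCase()));
}

// Builds one link per citation where both papers are in the given set,
// so every line drawn on the map has two known end points.
function citationLinks(papers) {
  const known = new Set(papers.map((p) => p.pmid));
  const links = [];
  for (const p of papers) {
    for (const target of p.cites || []) {
      if (known.has(target)) links.push({ source: p.pmid, target });
    }
  }
  return links;
}

// Invented sample records, for demonstration only.
const results = [
  { pmid: "10", title: "Type I interferon response to dsRNA",
    abstract: "Made-up abstract about IFN induction.", cites: ["11", "99"] },
  { pmid: "11", title: "Viral dsRNA sensing",
    abstract: "Made-up abstract mentioning interferon.", cites: [] },
  { pmid: "12", title: "Unrelated topic",
    abstract: "Made-up abstract.", cites: ["10"] },
];

const hits = results.filter((p) => matchesKeywords(p, ["interferon", "dsRNA"]));
console.log(hits.map((p) => p.pmid));
console.log(citationLinks(hits));
```

The `source`/`target` naming is a common convention for link lists in network diagrams, but any pair of identifiers would work.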

Links

Tech stack

  • Ruby
  • Meteor
  • D3
  • MongoDB
  • Handlebars
  • MiniMongo
  • Google Geocoding
  • PubMed
  • JavaScript
  • GitHub
  • Dropbox
  • Leaflet JS library

Tradeoffs/analysis

Some design choices ran into deprecation issues. We prioritised making things just work more easily. We scraped PubMed rather than using the PubMed API.
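For comparison with scraping, the sketch below builds a search URL for NCBI's E-utilities `esearch` endpoint, which is the public query interface for PubMed. The project did not use this approach; the function name `esearchUrl` and the default `retmax` value are assumptions chosen for illustration, and no network request is made here.

```javascript
// Builds an E-utilities esearch URL for a PubMed query.
// esearchUrl is a hypothetical helper, not part of Paperboy.
function esearchUrl(term, retmax = 20) {
  const params = new URLSearchParams({
    db: "pubmed",            // search the PubMed database
    term,                    // the free-text query
    retmax: String(retmax),  // maximum number of IDs to return
    retmode: "json",         // ask for a JSON response
  });
  return `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?${params}`;
}

console.log(esearchUrl("Type I interferon IFN dsRNA"));
```

Fetching this URL would return matching PubMed IDs, which would then need a second request to retrieve article details.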

Future functionality

Note: Because literature search applies widely across disciplines, this project would benefit from going to other hack events.

  • Live PubMed API.
  • Cluster view, Bubble view of key terms, Flagging retraction, Heatmap of density of papers.
  • Time filtering
  • Other literature databases for other disciplines.