Five Minute Search, Citizen Science, and Guides - petermr/CEVOpen GitHub Wiki
Note: From Open Virus wiki, moved 11 Oct 2021. SW
- Citizen Science Reader
- Citizen Science Reader: A Reusable Module with Open-source Software
- Above as article in LIBER CS Guide book
- Writeup a pitch to Open Energy Community
- Idea! 5 Min Search for GenR
CEV Blog / https://github.com/petermr/CEVOpen/wiki/CEVOpen-Blog-Ideas
A writeup for sharing with the wider community and the Open Energy Community
In words of PMR:
- researcher (includes citizens) formulates a question (selector 1) they are interested in. e.g.
- invasive plants in 2015-present
- climate change and nesting sea-birds
- energy accounting in Scandinavia (I'm guessing this makes sense!)
- battery materials without lithium
- The key thing is to include some specificity. "climate change" or "nesting seabirds" on their own are too broad.
- download 100 papers into an interactive browser/selector. Choose a section of interest:
- introduction
- methodology
- tables
- diagrams
- funders and collaborators (politically very useful) (References are generally not initially going to be useful for citizens)
- triage those that look interesting and useful (selector 2). This can be done with a button or even by swiping (cf. Tinder).
Note: 1. For CS we'll need to make sure there is a component of active participation for the people involved, as this is a CS prerequisite. 2. SW will look at the idea from the perspective of a Research Library or Public Library providing public-facing access to academic literature and Open Science resources.
to Google docs (24 Aug 2021, SW) https://docs.google.com/document/d/19dhwzMHhVJqF_Z3DaZc8gNsOnqbLh31_p8aSuzuKhp8/edit?usp=sharing
By Team OpenVirus, GitHub: OpenVirus
The module is designed to enable a hands-on activity for the easy use of modern Open Access infrastructures: finding, using, and sharing scientific literature. OpenVirus has developed open-source software to search across multiple research literature repositories such as Europe PMC or bioRxiv (pronounced "bio-archive"). This enables an automated literature survey within minutes, presenting the user with a summary of findings and allowing the download of the full articles.
OpenVirus is an example of an open search framework based on text data mining (TDM). Having 'open search' systems is important as search engines are the gateway to scientific knowledge. Search engines can be gamed to bias certain outcomes or be based on faulty algorithms. OpenVirus is built using Open Science methods so all parts of the system are open and verifiable, even down to a specific 'search results' query.
The module is for researchers who use Citizen Science in their research projects. It engages participants in conversations about formulating research questions and in consulting what is already known about a topic in the existing scientific literature. The activity starts with the participants creating a simple ten-term dictionary related to their topic and a bot retrieving a sample of 100 papers from Open Access repositories.
OpenVirus is a project that aims to develop knowledge resources and tools to help tackle the COVID-19 outbreak. The software can be used for any subject and not only COVID-19.
Despite over $100 billion being spent on medical research worldwide, much knowledge is behind publisher paywalls. Moreover, it is usually badly published and dispersed, without coherent knowledge tools. This particularly disadvantages the Global South. The project aims to use modern tools, especially Wikidata (and Wikipedia) and text mining, together with semantic tools, to create a modern integrated resource of all current published information on viruses and their epidemics. It relies on collaboration and gifts of labour and knowledge - to find out more read the Getting Started Guide or the How Can I Help section.
OpenVirus practices Open Notebook Science, which means there is no insider knowledge and all work is open and licensed for the freedom of reuse.
Video link: Example short video showing an application of OpenVirus.
Image: A simple one slide image summarising the process? A, B, C, D. Review > Dictionary > Summary > Review (repeat).
The activity will involve using the '5 minute literature search' from OpenVirus with a web browser to search open research literature repositories and sort the results to get up-to-the-minute relevant research related to the questions asked.
The activity can be used to share results on a public webpage and to update the search as often as is required - say once a week.
In this example the Citizen Science project is looking at the topic of 'zero-carbon plans'.
Note: Can we create and host this example search? or have a list of other searches already done?
OpenVirus carries out two types of search:
- Firstly, it searches repositories on the net and retrieves the papers;
- Secondly, it then analyzes the local full text copies of the papers that have been downloaded. The result is a swift and verifiable literature survey that might have once taken days, weeks, or months to complete if done manually.
- The researcher (includes citizens) formulates a question they are interested in,
- e.g. What 'zero-carbon' plans for tackling the problem of climate change are reliable enough for further adoption in cities or regions around the world, for example in green energy, transport, and housing? Many policies are being implemented that rely on such unverified plans, for example the EU's 'A European Green Deal', but if we want the public, industry, and governments to buy into them, such plans must be 'Open Science proof', that is, open and available for scrutiny: verifiable, reproducible, and reusable.
- e.g. A simpler version of the same question could be 'What zero-carbon plans can be used for the future of my local schools, city public transport, or municipal buildings, etc.?'
- From the 'research questions' a dictionary of terms important to the topic needs to be made. Ten dictionary terms is a good start. These terms are then input into OpenVirus in the browser and it goes off and collects the top one hundred research papers from your repository of choice. We use Europe PMC (Europe PubMed Central) as the default as it aggregates many other sources, but many other literature repositories could be used.
- Our example dictionary terms for 'zero-carbon' plans would be: rapid decarbonisation; zero-carbon; low-carbon; energy planning; decarbonisation; low energy transport; policy and planning; policy; low energy housing; low energy city planning; low energy schools.
- Run an OpenVirus local search on the downloaded papers after giving the papers a scan read. The local searches can be focused on paper sections (introductions, findings, etc.) or on content types such as illustrations or tables, guided by what is expected to be most productive in the papers.
- Refine and repeat depending on what looks useful. The dictionary of initial search terms should be updated as well as reviewing the local full-text search.
- OpenVirus downloads the full text of the papers, as well as PDF copies. It also produces a summary of how frequently each term occurs. The whole search results package can then be published and shared online.
- In addition, Wikidata can be linked to the dictionary terms being used. This allows more advanced semantic queries to be carried out and makes it possible to retrieve multilingual Wikipedia pages for, say, English terms used in papers.
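The two search stages described above can be sketched in a few lines of Python. This is only an illustration and not the OpenVirus code: the function names `build_query_url` and `term_frequencies` are hypothetical, the use of the public Europe PMC REST search endpoint with the `OPEN_ACCESS:y` filter is an assumption about how the remote search could be phrased, and the sample paper texts are made up. No network request is made; the sketch only builds the query URL and counts dictionary terms in text that is already local.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Assumption: the public Europe PMC REST search endpoint. OpenVirus itself
# may talk to the repository in a different way.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query_url(terms, page_size=100):
    """Combine dictionary terms into one OR query limited to open-access papers."""
    phrases = " OR ".join(f'"{term}"' for term in terms)
    query = f"({phrases}) AND OPEN_ACCESS:y"
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def term_frequencies(terms, papers):
    """Count case-insensitive whole-phrase matches of each term in each paper.

    `papers` maps a paper identifier to its locally stored full text.
    Overlapping terms are counted independently, so 'decarbonisation' is
    also counted inside 'rapid decarbonisation'.
    """
    table = {}
    for paper_id, text in papers.items():
        counts = Counter()
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
        table[paper_id] = counts
    return table


if __name__ == "__main__":
    # A few of the example dictionary terms from the text.
    dictionary = ["zero-carbon", "low-carbon", "decarbonisation", "energy planning"]
    print(build_query_url(dictionary))

    # Hypothetical stand-ins for downloaded full texts.
    local_papers = {
        "paper-1": "Zero-carbon cities depend on energy planning and decarbonisation.",
        "paper-2": "Low-carbon housing is one route to decarbonisation.",
    }
    for paper_id, counts in term_frequencies(dictionary, local_papers).items():
        print(paper_id, dict(counts))
```

The per-paper counts are the kind of summary table that can be published alongside the downloaded papers, and keeping the exact query URL with the results is one way to make a specific search verifiable.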
Two max.
END of Article
CEVOpen can be used on different levels: as a web service, in the way that a search engine is used, or as a way to dig into the tooling and acquire further Open Science skills.
Note: Need to get in a part about Wikidata and being multilingual
CEVOpen looks to develop guides and badges that can reward participants and demonstrate their learning. These will include badges for:
- Text Data Mining (TDM)
- Research literature survey competence
- GitHub use
- please add suggestions
- Citizen Science Reader can be carried out remotely or in person.
- CEVOpen can be customised to search most open repositories.
- The toolkit can be used in conjunction with other open infrastructures, for example for knowledge graph building, for creating custom searches to deal with difficult-to-manage digital content such as OCR PDFs, and for integration with machine learning and AI systems. Any other ideas appreciated.
- The project is in an alpha stage and welcomes engagement and support (July 2021)
- CEVOpen could be used to provide a 'citizen dashboard' maintained by local citizen experts in a variety of topics to show off daily examples of top research related to a given topic.
Deadline: 19.8.21
800 words
Climate change is too large a subject for most people to comprehend, so we need to look at particular aspects where a citizen scientist is already informed. They know climate change is important, but to measure it they need to see its effect on their discipline. For example, how does climate affect insect populations, tree cover, rainfall, and so on? What technologies can we muster to help counter the effects: energy production and storage, investment, the built environment? We imagine citizens who want to ask: "How does climate affect X?" or "What can Y do to help in mitigation?"
So we've set up a system, AMI, which can help people find knowledge in the scientific literature. There are several million scientific articles published every year so there will be many examples in every field. For example, we are already working on invasive plant species and would want to ask: "find articles discussing the effect of climate on plant invasions".
We find there are 1600 possible articles in the last 2 years. That's a lot to read, but our pyami system is able to filter it to a readable amount - perhaps by plant type, or geographical region. And it lets us select papers with the most relevant collection of information.
This is much more precise than normal search engines, which normally only give an overview and also don't allow downloading of articles. It's also more friendly since you can select just those parts of the paper you want to read. And there's an overview of a collection of papers in a single table.
The system also allows you to show the data in different ways, for example, plotted on maps or in interactive tables.
And it's extensible, which is where we can help a wide range of disciplines. If you give AMI a start (a general query) and then select the most promising articles, this selection can be fed back into the system. We call this a "snowball": start off with a small amount of precise knowledge and re-run the query, and the number of relevant answers gets larger, like a snowball.
This gives the CS a knowledge environment that is constantly improving as you do more searches. It's a human-machine symbiosis.
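To make the snowball idea concrete, here is a minimal sketch in Python. It is not how AMI or pyami implements it: the functions `search`, `suggest_terms`, and `snowball`, the tiny stopword list, and the toy corpus are all hypothetical, and "most frequent new word in the selected papers" is a deliberately crude stand-in for whatever feedback the real system uses. The `select` callback represents the human triage step.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"such", "with", "that", "this", "from", "along", "remains"}


def search(corpus, terms):
    """Return the papers whose text contains at least one dictionary term."""
    lowered_terms = [term.lower() for term in terms]
    return {
        paper_id: text
        for paper_id, text in corpus.items()
        if any(term in text.lower() for term in lowered_terms)
    }


def suggest_terms(selected_texts, known_terms, top_n=3):
    """Propose the most frequent words in the selected papers that are new."""
    known_words = {word for term in known_terms for word in term.lower().split()}
    counts = Counter()
    for text in selected_texts:
        for word in re.findall(r"[a-z]{4,}", text.lower()):
            if word not in STOPWORDS and word not in known_words:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]


def snowball(corpus, seed_terms, select, rounds=2, top_n=3):
    """Grow the dictionary from the papers a person marks as promising."""
    terms = list(seed_terms)
    for _ in range(rounds):
        hits = search(corpus, terms)
        # Human-in-the-loop step: keep only the papers the reader selects.
        chosen = [text for paper_id, text in hits.items() if select(paper_id, text)]
        new_terms = suggest_terms(chosen, terms, top_n)
        if not new_terms:
            break
        terms.extend(new_terms)
    # Re-run the query with the enlarged dictionary.
    return terms, search(corpus, terms)


if __name__ == "__main__":
    # Toy corpus standing in for downloaded papers.
    corpus = {
        "a": "Invasive plants such as knotweed spread as the climate warms; knotweed thrives.",
        "b": "Knotweed control along rivers remains costly.",
        "c": "Battery storage for solar power.",
    }
    terms, hits = snowball(
        corpus, ["invasive plants"], select=lambda pid, text: True, rounds=1, top_n=1
    )
    print(terms)
    print(sorted(hits))
```

In the toy run the seed term finds one paper, the reader accepts it, a new term is learned from it, and the re-run query then also finds a second paper that never mentioned the seed term.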
So what is your interest? Let's explore "climate change and X" together with AMI.
How do we explore? We've found that online workshops, or short hackathons, are very productive. We install the software, which is open and takes a minute or two, and start querying. We then collect the initial results, tabulate and display them, and see what the promising next directions are. Then we repeat: the snowball process. For some citizens, it may be their first encounter with the scientific literature, and they'll find that they can understand a lot of it. It is also something they can take away and keep working on as part of an online community.
For the Open Science blog I edit, I am planning to change the blog's direction in 2022 to support Open Science communities and global knowledge equity.
To support these communities I am trying to work out the idea of useful content packages for them. Figuring that these communities need users and need to profile their services and programmes, I am selecting formats to help with those priorities.
I would like to fit the '5 Minute Search' idea into this and get community experts to maintain a lit search that is regularly updated and maybe runs automatically to pull the latest papers.
Adding '5 minute search' would be part of a package including: 1. Blogs; 2. GenR endorsing services and programmes; 3. Lit search; 4. Guides; 5. Barcamps/hacks.
I would then work on getting institutions, funders, startups - to sponsor 'content packages', guides, events.