User Stories - WEHI-RCPStudentInternship/data-commons GitHub Wiki

This page presents the outcomes of the user stories, to help us understand how individual users will interact with the data at various stages of data processing.

User stories

#1 - Searching for datasets that are part of the collaboration

Before

As a new member of this collaboration without a data commons, I have to email one or more of my collaborators to ask for a list of all datasets that are part of the collaboration.
I might wait a few days to a few weeks for a response if people are away. I may have to remove duplicates, or I may receive contradicting information that I have to clarify.

After

As a new member of this collaboration with a data commons, I can easily see a list of all datasets that are part of the collaboration through the dataset registry.
I can search on specific topics and I can easily see which datasets I have full or partial access to.
I can do this in a matter of minutes.
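As a rough illustration of the search described above, the following is a minimal sketch only. The `DatasetRecord` fields, the `search_registry` function, and the example records are all hypothetical and do not describe the actual dataset registry.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical registry record; field names are illustrative only."""
    dataset_id: str
    title: str
    topics: set = field(default_factory=set)
    # Maps a user name to "full" or "partial"; users not listed have no access.
    access: dict = field(default_factory=dict)


def search_registry(records, topic, user):
    """Return (title, access level) for every record tagged with `topic`."""
    wanted = topic.lower()
    results = []
    for record in records:
        if wanted in {t.lower() for t in record.topics}:
            results.append((record.title, record.access.get(user, "none")))
    return results


# Made-up records standing in for the registry contents.
registry = [
    DatasetRecord("ds-001", "Example RNA-seq cohort", {"rna-seq", "breast cancer"}, {"alice": "full"}),
    DatasetRecord("ds-002", "Example proteomics cohort", {"proteomics", "breast cancer"}, {"alice": "partial"}),
    DatasetRecord("ds-003", "Example imaging cohort", {"imaging"}, {}),
]

for title, level in search_registry(registry, "breast cancer", "alice"):
    print(f"{title}: {level} access")
```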

#2 - Finding raw data from summarised data in a Data Portal like cBioPortal

Before

As a member of this collaboration without a data commons, I have to email my collaborator after finding an interesting dataset that is visualised in a data portal.
I might wait a few days to a few weeks to get the raw data if that person is away.

After

As a member of this collaboration with a data commons, I can easily find a link from the data portal of the dataset I am interested in to the dataset registry record for that dataset.
The dataset record in the dataset registry points to where the raw data can be found.
I can also easily copy a snippet of code in R or Python that allows me to access the data quickly if I have access to the dataset.
I can do this all within a matter of minutes.
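One way the copyable snippet mentioned above could be produced is sketched here. This is an assumption for illustration: the `access_snippet` function, the record keys, and the file path are hypothetical, and the generated snippet assumes the raw data is a CSV file readable with pandas.

```python
def access_snippet(record):
    """Build a copy-and-paste Python snippet for a dataset record.

    `record` is a hypothetical dict with "title" and "raw_data_path" keys.
    """
    return (
        f"# Load '{record['title']}' (requires read access to the path below)\n"
        "import pandas as pd\n"
        f"data = pd.read_csv({record['raw_data_path']!r})\n"
    )


# Placeholder record; the path is not a real location.
record = {"title": "Example RNA-seq cohort", "raw_data_path": "/path/to/raw/counts.csv"}
print(access_snippet(record))
```

The snippet is only text until the user runs it, so whether it works still depends on the user having access to the dataset.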

#3 - Finding summarised data from raw data on storage like VAST

Before

As a member of this collaboration without a data commons, I have to email my collaborator after finding the raw data for an interesting dataset.
I might wait a few days to a few weeks to get the summarised or visualised data if that person is away.

After

As a member of this collaboration with a data commons, I can easily find a link from the raw data folder of the dataset I am interested in to the dataset registry record for that dataset.
The dataset record in the dataset registry points to where the summarised / visualised data can be found in one or more data portals, if they exist.
I can choose one of these links and can then access the summarised / visualised data on the data portal within minutes.
I can even download the summarised data that makes up the visualisation on the data portal.

#4 - Adding a private dataset into the Data Commons

Before

As a member of this collaboration without a data commons, I will let my collaborators know via email that I have a private dataset ready for analysis / sharing.
It might take me a few minutes to email everyone in the collaboration, as I would have to find the email list of collaborators, and I might accidentally miss one of my colleagues when sending the email.
It might take a few days to load it into a data portal, as I have to install the data portal myself and then load the dataset into it.

After

As a member of this collaboration with a data commons, I can easily create a dataset record in the dataset registry that lets people know about the private dataset, where it is located, and who can access it, along with other valuable metadata.
I can use the command line to load it into the dataset registry, or I can use a web-based application.
I can follow the instructions on the dataset registry on how to upload the private dataset into the appropriate data portal, and I can use the tools created in Data Commons to help me transform and load this data in the right format into the appropriate data portal.
It might only take me a few hours to load it into the data portal.
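A command-line tool for creating a dataset record might look something like the sketch below. The option names (`--title`, `--location`, `--visibility`, `--allowed-user`) and the record layout are assumptions made for illustration, not the actual Data Commons tooling.

```python
import argparse
import json


def build_parser():
    """Hypothetical command-line interface for describing a dataset."""
    parser = argparse.ArgumentParser(description="Create a dataset record (illustrative only)")
    parser.add_argument("--title", required=True)
    parser.add_argument("--location", required=True, help="Where the data is stored")
    parser.add_argument("--visibility", choices=["private", "public"], default="private")
    # May be repeated to list several users who can access the dataset.
    parser.add_argument("--allowed-user", action="append", default=[], dest="allowed_users")
    return parser


def make_record(argv):
    """Turn command-line arguments into a record ready to send to a registry."""
    args = build_parser().parse_args(argv)
    return {
        "title": args.title,
        "location": args.location,
        "visibility": args.visibility,
        "allowed_users": args.allowed_users,
    }


# Arguments are passed as a list here; a real tool would use sys.argv[1:].
record = make_record([
    "--title", "Example private dataset",
    "--location", "/path/to/dataset",
    "--allowed-user", "alice",
])
print(json.dumps(record, indent=2))
```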

#5 - Adding a public dataset into a data portal

Before

As a member of this collaboration without a data commons, I will let my collaborators know via email that I have a public dataset added into a data portal.
It might take me a few minutes to email everyone in the collaboration, as I would have to find the email list of collaborators, and I might accidentally miss one of my colleagues when sending the email.
It might take a few days to load it into a data portal, as I have to run one myself or get permission to load it into a public data portal.

After

As a member of this collaboration with a data commons, I can easily create a dataset record in the dataset registry that lets people know about the public dataset and where the raw data is located on a public registry, and that provides a direct link to the dataset on the data portal.
I can do this in a few minutes using either command line tools or a web interface.
I can follow the instructions on the dataset registry on how to upload the public dataset into the appropriate data portal, and I can use the tools created in Data Commons to help me transform and load this data in the right format into the appropriate data portal.
It might only take me a few hours to load it into the data portal.

#6 - Grouping datasets together within the Data Commons

Before

After

As a member of this collaboration with a data commons, I can group together datasets that have the same samples / patients so that I can easily see all the datasets for a single group of samples / patients, such as those from a clinical trial.
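The grouping described above could be sketched as follows. This assumes that "the same samples / patients" means an identical set of sample identifiers; the `group_by_samples` function and the example data are hypothetical, and a real implementation might instead allow partial overlap or explicit, manually curated groups.

```python
from collections import defaultdict


def group_by_samples(datasets):
    """Group dataset ids that contain exactly the same set of sample ids.

    `datasets` maps a dataset id to an iterable of sample ids.
    Only groups with more than one dataset are returned.
    """
    groups = defaultdict(list)
    for dataset_id, sample_ids in datasets.items():
        # frozenset ignores ordering and duplicates within a dataset.
        groups[frozenset(sample_ids)].append(dataset_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up datasets: two share the same patients, one does not.
datasets = {
    "rna-seq": ["P1", "P2", "P3"],
    "proteomics": ["P3", "P1", "P2"],
    "imaging": ["P9"],
}
print(group_by_samples(datasets))
```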

#7 - Moving from one data portal to another to view the same samples/patients for different dataset types

Before

As a member of this collaboration without a data commons, I have to email my collaborators after finding an interesting dataset that is visualised in a data portal and ask them if there is another dataset type visualised in another data portal that has the same samples/patients.
I might wait a few days to a few weeks to get an answer if a key person is away.
Sometimes the answer may be that I will have to download the raw or processed data myself and look at it individually.
This could take days to do.

After

As a member of this collaboration with a data commons, I can easily find a link from the data portal of the dataset I am interested in to the dataset registry record for that dataset.
The dataset record in the dataset registry then shows me all the other datasets that are grouped with it because they share the same patients/samples.
I can choose the related dataset record for the type of data that I am interested in.
The dataset record I chose in the dataset registry points to where the summarised / visualised data can be found in one or more data portals, if they exist.
I can choose one of these links and can then access the summarised / visualised data on the data portal within minutes.
I can even download the summarised data that makes up the visualisation on the data portal.
I can do this all within a matter of minutes.
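The path from one dataset record to a related dataset's portal links can be sketched as a simple lookup. The registry layout, the `group`, `data_type`, and `portal_links` keys, and the example URLs are all hypothetical and shown only to make the steps concrete.

```python
def related_portal_links(registry, dataset_id, data_type):
    """Return portal links for datasets of `data_type` grouped with `dataset_id`.

    `registry` maps a dataset id to a dict with hypothetical "group",
    "data_type", and "portal_links" keys.
    """
    group = registry[dataset_id]["group"]
    links = []
    for other_id, record in registry.items():
        same_group = record["group"] == group
        if other_id != dataset_id and same_group and record["data_type"] == data_type:
            links.extend(record["portal_links"])
    return links


# Made-up registry; the URLs are placeholders, not real portals.
registry = {
    "ds-001": {"group": "trial-A", "data_type": "genomics",
               "portal_links": ["https://portal-one.example.org/ds-001"]},
    "ds-002": {"group": "trial-A", "data_type": "imaging",
               "portal_links": ["https://portal-two.example.org/ds-002"]},
    "ds-003": {"group": "trial-B", "data_type": "imaging",
               "portal_links": ["https://portal-two.example.org/ds-003"]},
}

print(related_portal_links(registry, "ds-001", "imaging"))
```

A dataset with no portal yet would simply have an empty `portal_links` list, which matches the "if they exist" condition in the story.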

Obsolete User Stories

The user stories previously on this website are now obsolete.