User Stories - WEHI-RCPStudentInternship/data-commons GitHub Wiki
This page presents user stories that illustrate how individual users will interact with the data commons during various stages of data processing.
User stories
#1 - Searching for datasets that are part of the collaboration
Before
- As a new member of this collaboration without a data commons, I have to email one or more of my collaborators to ask for a list of all datasets that are part of the collaboration.
- I might wait a few days to a few weeks for a response if people are away. I may have to remove duplicates or clarify contradictory information that I receive.
After
- As a new member of this collaboration with a data commons, I can easily see a list of all datasets that are part of the collaboration through the dataset registry.
- I can search on specific topics and I can easily see which datasets I have full or partial access to.
- I can do this in a matter of minutes.
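A minimal sketch of what a topic search over the dataset registry could look like, assuming each record carries a set of topics and an access level for the current user. The `DatasetRecord` class, its field names, and the sample records are hypothetical and are not taken from the actual dataset registry.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical registry record; field names are illustrative only."""
    name: str
    topics: set[str]
    access: str  # "full", "partial", or "none" for the current user


def search(records, topic):
    """Return (name, access) pairs for records tagged with the topic (case-insensitive)."""
    wanted = topic.lower()
    return [
        (r.name, r.access)
        for r in records
        if wanted in {t.lower() for t in r.topics}
    ]


# Made-up records standing in for the registry contents.
registry = [
    DatasetRecord("trial-a-rnaseq", {"RNA-seq", "clinical trial"}, "full"),
    DatasetRecord("trial-a-proteomics", {"proteomics", "clinical trial"}, "partial"),
    DatasetRecord("cohort-b-imaging", {"imaging"}, "none"),
]

hits = search(registry, "Clinical Trial")
for name, access in hits:
    print(f"{name}: {access} access")
```

Returning the access level next to each hit reflects the story's point that a member can see at a glance which datasets they have full or partial access to.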
#2 - Finding raw data from summarised data in a Data Portal like cBioPortal
Before
- As a member of this collaboration without a data commons, I have to email my collaborator after finding an interesting dataset that is visualised in a data portal.
- I might wait a few days to a few weeks to get the raw data if that person is away.
After
- As a member of this collaboration with a data commons, I can easily find a link from the data portal of the dataset I am interested in to the dataset registry record for that dataset.
- The dataset record in the dataset registry points to where the raw data can be found.
- I can also easily copy a snippet of code in R or Python that allows me to access the data quickly if I have access to the dataset.
- I can do this all within a matter of minutes.
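As a rough illustration of the copyable code snippet mentioned above, the sketch below renders a short Python snippet from a dataset record. It assumes the record is a dictionary with a `raw_data_path` and a `has_access` flag, and that the raw data is a CSV file readable with pandas; the keys, the path, and the file format are all hypothetical.

```python
def access_snippet(record):
    """Render a copy-and-paste Python snippet for a record's raw data location."""
    if not record["has_access"]:
        # Without access there is nothing useful to load, so return a comment only.
        return "# No access to this dataset."
    return (
        "import pandas as pd\n"
        f"df = pd.read_csv({record['raw_data_path']!r})\n"
        "print(df.head())"
    )


# Hypothetical record; the path is a placeholder, not a real location.
record = {
    "name": "trial-a-rnaseq",
    "raw_data_path": "/path/to/raw/trial-a/counts.csv",
    "has_access": True,
}

snippet = access_snippet(record)
print(snippet)
```

Generating the snippet from the record keeps the path in one place, so the code a member copies always matches where the registry says the raw data lives.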
#3 - Finding summarised data from raw data on storage like VAST
Before
- As a member of this collaboration without a data commons, I have to email my collaborator after finding the raw data for an interesting dataset.
- I might wait a few days to a few weeks to get the summarised or visualised data if that person is away.
After
- As a member of this collaboration with a data commons, I can easily find a link from the raw data folder of the dataset I am interested in to the dataset registry record for that dataset.
- The dataset record in the dataset registry points to where the summarised / visualised data can be found in one or more data portals, if they exist.
- I can choose one of these links and can then access the summarised / visualised data on the data portal within minutes.
- I can even download the summarised data that makes up the visualisation on the data portal.
#4 - Adding a private dataset into the Data Commons
Before
- As a member of this collaboration without a data commons, I will let my collaborators know via email that I have a private dataset ready for analysis / sharing.
- It might take me a few minutes to email everyone in the collaboration as I would have to find the email list of collaborators and I might accidentally miss one of my colleagues when sending the email.
- It might take a few days to load it into a data portal as I have to install the data portal myself and then load the dataset into it.
After
- As a member of this collaboration with a data commons, I can easily create a dataset record in the dataset registry that lets people know about the private dataset, where it is located, and who can access it, along with other valuable metadata.
- I can use the command line to load it into the dataset registry, or I can use a web-based application.
- I can follow the instructions on the dataset registry on how to upload the private dataset into the appropriate data portal, and I can use the tools created in Data Commons to help me transform and load this data in the right format into the appropriate data portal.
- It might only take me a few hours to load it into the data portal.
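The command-line route for creating a dataset record could be sketched as below. This is an assumption-laden illustration, not the Data Commons tooling: the tool name `register-dataset`, its flags, and the record fields are hypothetical, chosen to mirror the metadata named in the story (what the dataset is, where it is located, who can access it).

```python
import argparse
import json


def build_parser():
    # "register-dataset" and all flags below are hypothetical names.
    parser = argparse.ArgumentParser(prog="register-dataset")
    parser.add_argument("--name", required=True)
    parser.add_argument("--location", required=True)
    parser.add_argument("--visibility", choices=["private", "public"], default="private")
    parser.add_argument("--can-access", nargs="*", default=[])
    return parser


def make_record(argv):
    """Parse command-line arguments into a plain dictionary record."""
    args = build_parser().parse_args(argv)
    return {
        "name": args.name,
        "location": args.location,
        "visibility": args.visibility,
        "can_access": args.can_access,
    }


# Simulates: register-dataset --name trial-a-rnaseq --location ... --can-access alice bob
record = make_record([
    "--name", "trial-a-rnaseq",
    "--location", "/path/to/raw/trial-a",
    "--can-access", "alice", "bob",
])
print(json.dumps(record, indent=2))
```

A web-based form could build the same dictionary, which is one way both entry points in the story might share a single record format.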
#5 - Adding a public dataset into a data portal
Before
- As a member of this collaboration without a data commons, I will let my collaborators know via email that I have a public dataset added into a data portal.
- It might take me a few minutes to email everyone in the collaboration as I would have to find the email list of collaborators and I might accidentally miss one of my colleagues when sending the email.
- It might take a few days to load it into a data portal as I have to run one myself or I have to get permission to load it into a public data portal.
After
- As a member of this collaboration with a data commons, I can easily create a dataset record in the dataset registry that lets people know about the public dataset, shows where the raw data is located on a public registry, and provides a direct link to the dataset on the data portal.
- I can do this in a few minutes either using command line tools or a web interface.
- I can follow the instructions on the dataset registry on how to upload the public dataset into the appropriate data portal, and I can use the tools created in Data Commons to help me transform and load this data in the right format into the appropriate data portal.
- It might only take me a few hours to load it into the data portal.
#6 - Grouping datasets together within the Data Commons
Before
After
- As a member of this collaboration with a data commons, I can group together datasets that have the same samples / patients so that I can easily see all the datasets for a single group of patients / samples, such as those from a clinical trial.
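One possible way to express this grouping is sketched below, assuming that "the same samples / patients" means an identical set of sample IDs. The function name, dataset names, and sample IDs are hypothetical; a real grouping might instead be assigned by hand or allow partial overlap.

```python
from collections import defaultdict


def group_by_shared_samples(datasets):
    """Group dataset names whose sample ID sets are identical.

    `datasets` maps a dataset name to an iterable of sample IDs.
    Only groups with more than one dataset are returned.
    """
    groups = defaultdict(list)
    for name, samples in datasets.items():
        # frozenset ignores ordering and duplicates, so it works as a group key.
        groups[frozenset(samples)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up datasets: two share the same patients, one does not.
datasets = {
    "trial-a-rnaseq": ["P1", "P2", "P3"],
    "trial-a-proteomics": ["P3", "P1", "P2"],
    "cohort-b-imaging": ["P9"],
}

grouped = group_by_shared_samples(datasets)
print(grouped)
```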
#7 - Moving from one data portal to another to view the same samples/patients for different dataset types
Before
- As a member of this collaboration without a data commons, I have to email my collaborators after finding an interesting dataset that is visualised in a data portal and ask them if there is another dataset type visualised in another data portal that has the same samples/patients.
- I might wait a few days to a few weeks to get an answer if a key person is away.
- Sometimes the answer may be that I will have to download the raw or processed data myself and look at it individually.
- This could take days to do.
After
- As a member of this collaboration with a data commons, I can easily find a link from the data portal of the dataset I am interested in to the dataset registry record for that dataset.
- I can then see all the other datasets that are grouped together as they share the same patients/samples.
- The dataset record in the dataset registry shows me all the other datasets that are grouped together.
- I can choose the related dataset record for the type of data that I am interested in.
- The dataset record I chose in the dataset registry points to where the summarised / visualised data can be found in one or more data portals, if they exist.
- I can choose one of these links and can then access the summarised / visualised data on the data portal within minutes.
- I can even download the summarised data that makes up the visualisation on the data portal.
- I can do this all within a matter of minutes.
Obsolete User Stories
The user stories previously on this website are now obsolete.