4. Week 4 Learning and Application Notebook 10.4.22 - aboatwr/R4R GitHub Wiki

What we discussed this week:

This week's lesson

  • Data management
  • Data management plans
  • Data Storage

How can I apply what I've learned to my research:

  • I started a GitHub repository for a new project I am working on. I am trying to make sure all my code and data are organized from the start.
    • So far I have four main folders:
      • data, code, relevant papers, relevant new articles.
    • Each folder has its own README with links to and descriptions of each file in the folder.
  • Last week I collected a lot of observational data, and this week I will begin merging it, analyzing it, and looking at descriptive statistics. Are code management plans a part of the data management plan?
  • Data storage: I am planning to store my code on GitHub and my raw and cleaned data on Box. I also back up my data files to a hard drive most weeks.
  • For data management: I started using DMPTool, which has built-in templates that cover the questions you need to consider while creating a data management plan.
  • In my plan, I want to include a variable-naming and file-naming process so that I am not just making it up as I go.
  • Should data management plans be posted on GitHub?
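
One way to keep a file-naming process from drifting is to write the rule down as a pattern and check files against it. The sketch below is only an illustration: the convention itself (`YYYY-MM-DD_project_description_vNN.ext`), the example name `myproject`, and the helper names `follows_convention` and `find_nonconforming` are all assumptions made up for this example, not something from the lesson or from DMPTool.

```python
import re
from pathlib import Path

# Hypothetical convention (an assumption for this sketch):
#   YYYY-MM-DD_project_description_vNN.ext
#   e.g. 2022-10-04_myproject_raw-counts_v01.csv
FILENAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}"            # ISO-style date
    r"_[a-z0-9]+"                    # short project name
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"     # description, words joined by hyphens
    r"_v\d{2}"                       # two-digit version number
    r"\.[a-z0-9]+$"                  # file extension
)


def follows_convention(filename: str) -> bool:
    """Return True if the file name matches the hypothetical convention."""
    return FILENAME_PATTERN.match(filename) is not None


def find_nonconforming(folder: str) -> list:
    """List files in a folder (README files excluded) that break the convention."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file()
        and not path.name.lower().startswith("readme")
        and not follows_convention(path.name)
    )


if __name__ == "__main__":
    for name in ["2022-10-04_myproject_raw-counts_v01.csv", "final data (2).xlsx"]:
        print(name, "->", "ok" if follows_convention(name) else "rename")
```

Running `find_nonconforming("data")` before each commit would flag any file that was named in a hurry, and the pattern itself can be pasted into the data management plan as the written rule.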

Challenges:

  • Time for a data management plan: I think it is hard to find time to create a data management plan before working with data. Because of deadlines, I always feel rushed to get to the analysis part, and I don't end up organizing my data or making a plan before I start working with it. In the discussion, some people mentioned that they end up organizing their data after they decide the project is ready to be published, but this can lead to problems if you realize at the last minute that there was an error in your data.
  • I realized I cannot make a GitHub wiki for my new repository unless it's public (or I pay a fee), which is too bad. I was hoping to write a wiki while I work on the project so that it is ready to go when I make it public. For now, I am writing the information I would include on a wiki in my README files.

Things I want to look into more when I have time:

  • MLflow for managing parameter values (mentioned by Carlos)
  • Zenodo (research object tool)