Documentation required for publishing work - sparklabnyc/resources GitHub Wiki
Research documentation
When publishing research, it is important to make documentation available so that readers can understand the details of the research design behind the work. Research documentation provides the context for understanding the results of a given research output. There is no standard form for this documentation, and its location and format will depend on the type of research output produced. For academic materials, this documentation often takes the form of a structured methodological appendix. For policy outputs or products, it may be appropriate to include an informative README webpage or document. The most important step in preparing this documentation is retaining and organizing the needed information throughout the life of the project, so that the team does not have to search through communications or data archives for small details at publication time.
What to include?
Research documentation should include all the information that is needed to understand the underlying design for the research output. This can include descriptions of:
- Methods of sampling
- Populations of interest that informed the study
- Power calculations and pre-analysis plan
- Field work, including data collection or experimental manipulation, such as study protocols and monitoring or quality assurance information
- Data completeness, including planned units or quantities that were not observed, and "tracking" information
- Data collection
- Statistical approaches
Taken together, the research documentation should allow a reader to understand how information was gathered, what it represents, what kind of information and data files to expect, and how to relate that information to the results of the research. Research documentation is not a complete guide to the data: it does not need to provide the level of detail or instructions that would enable a reader to approach different research questions using the same data.
Structuring research documentation as a publication appendix
If you are preparing documentation to accompany the publication of an academic output such as a working paper or journal article, the most common form of research documentation is a structured supplemental appendix. Because a supplemental appendix is generally not bound by the length limits of the main text and you may have a large amount of material to include, organization is essential. It is appropriate to have several appendices that cover different aspects of the research. Each appendix should include relevant references. Supplementary exhibits should be numbered to correspond with the appendix they pertain to.
For example, Appendix A may include information about the study population and data, such as the total number of units available for observation, the number selected or included for observation, the number successfully included, and descriptive statistics about subgroups, strata, clusters, or other units relevant to the research. It could be accompanied by a tracking dataset with full information about the process.
Appendix B might include information about an intended experimental manipulation in one section, and information about implementation, take-up, and fidelity in a second section. It could be accompanied by a dataset with key indicators.
Appendix C might include data collection protocols, definitions of constructed variables, and comparisons with alternative definitions, and be accompanied by data collection instruments and illustrative figures.
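The counts described for Appendix A (units available, selected, and successfully included) can be derived directly from a tracking dataset. The following is a minimal sketch only: the records, the `unit_id` and `status` field names, and the status labels are all hypothetical and are not taken from any real study.

```python
from collections import Counter

# Hypothetical tracking records: one entry per unit, with its final status.
# Field names and status labels are assumptions made for illustration.
tracking = [
    {"unit_id": "u01", "status": "completed"},
    {"unit_id": "u02", "status": "completed"},
    {"unit_id": "u03", "status": "selected_not_reached"},
    {"unit_id": "u04", "status": "not_selected"},
    {"unit_id": "u05", "status": "completed"},
    {"unit_id": "u06", "status": "not_selected"},
]


def summarize_tracking(records):
    """Count units available, selected, and successfully included."""
    counts = Counter(record["status"] for record in records)
    available = len(records)
    selected = available - counts["not_selected"]
    completed = counts["completed"]
    return {
        "available": available,
        "selected": selected,
        "completed": completed,
        # Share of selected units that were successfully included.
        "completion_rate": completed / selected if selected else None,
    }


summary = summarize_tracking(tracking)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Keeping the summary as a function of the tracking file means the appendix table can be regenerated whenever the tracking data are updated, rather than being retyped by hand.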
Documentation for publications
The two most common types of documentation used in research are note citations and parenthetical citations. You might also see terms like “footnotes,” “endnotes,” or “references” when learning about documentation practices.
Documentation begins as soon as you start researching, and it continues throughout the writing process: drafting, revising, and editing. Therefore, you need to maintain a careful record of the sources you use and the exact material you take from them.
What to document
- Any direct quotation, even a single phrase or keyword, which must be attributed to its source work and to the exact place in that work
- Any paraphrase or summary of another individual’s written work, oral report, or presentation
- Any opinion (verbal or written) from another source that you relied on to reach your own views
- Any statistical data that you have not compiled yourself
- Any visuals that you have not prepared yourself
- Any software programs that you did not develop yourself
In general, you should maintain documentation at both the project level and the file level. The following examples are not meant to be an exhaustive list, but to illustrate the type of information you should try to document. Project-level documentation includes information about the processes used throughout the project, including how you and your collaborators are collecting, organizing, and analyzing your data. File-level documentation includes details related to individual files.
A good rule of thumb is to always document more than you think is necessary.
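One way to keep project-level and file-level notes together is a README skeleton that is generated from the folder contents and then filled in by hand. This is a rough sketch under stated assumptions: the `build_readme` helper, the `README.txt` name, and the section titles are hypothetical, and the data file name is borrowed from the example used later on this page.

```python
import tempfile
from pathlib import Path

README_NAME = "README.txt"  # hypothetical name for the README file


def build_readme(title, project_notes, folder):
    """Return README text with project-level notes and one line per file."""
    lines = [title, "", "Project-level notes", project_notes, "", "File-level notes"]
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and the README itself.
        if path.is_file() and path.name != README_NAME:
            lines.append(f"- {path.name}: TODO describe contents, source, and date")
    return "\n".join(lines) + "\n"


# Demonstrate on a temporary folder holding one placeholder data file.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "examplestudy_participant01_version01.csv"
    data_file.write_text("day,temp_f,hr_rest,spo2\n")
    readme_text = build_readme(
        "Example study",
        "TODO describe how data are collected, organized, and analyzed.",
        folder,
    )
    (Path(folder) / README_NAME).write_text(readme_text)

print(readme_text)
```

The TODO placeholders are deliberate: the script can list which files exist, but only the research team can say what each file contains and how it was produced.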
How to document
Documentation can be maintained in a variety of forms. Some common forms of documentation are:
- Readme - A Readme file is a text file located in a project-related folder that describes the contents and structure of the folder and/or a dataset so that a researcher can locate the information they need.
- Data Dictionary - Also known as a codebook, a data dictionary defines and describes the elements of a dataset so that it can be understood and used at a later date.
- Protocol - A protocol describes the procedure(s) or method(s) used in the implementation of a research project or experiment.
- Lab Notebook - For research groups that use them, lab notebooks are often the primary record of the research process. They are used to document hypotheses, experiments, analyses, and interpretations of experiments. For information about keeping a lab notebook, see this page from Stanford's Office of Technology Licensing.
- Metadata - Metadata is data about data. There are different types of metadata, including descriptive metadata (information about the content of your data), structural metadata (information about the physical structure of your data, including file format), and administrative metadata (information about how and when your data was created). Metadata often conforms to a specific scheme: a set of standardized rules about how the metadata is organized and used.
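To make the three metadata types above concrete, here is a small sketch that builds a metadata record for a CSV file and serializes it as JSON. The layout is an ad hoc illustration, not a standardized metadata scheme. The `describe_csv` helper and the key names are hypothetical, and the administrative fields are left as placeholders because that information cannot be read from the file itself.

```python
import csv
import io
import json

# Sample file contents, embedded as text so the example is self-contained.
CSV_TEXT = """day,temp_f,hr_rest,spo2
1,97.5,55,97
2,97.6,52,98
3,97.5,49,97
4,97.5,58,98
5,97.4,56,98
"""


def describe_csv(csv_text, title):
    """Build a metadata record with descriptive, structural, and administrative parts."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return {
        "descriptive": {"title": title, "variables": header},
        "structural": {
            "file_format": "csv",
            "n_columns": len(header),
            "n_rows": len(body),
        },
        # Placeholders: fill these in by hand from project records.
        "administrative": {"created_on": "UNKNOWN", "created_by": "UNKNOWN"},
    }


record = describe_csv(CSV_TEXT, "Example study, participant 01")
print(json.dumps(record, indent=2))
```

In practice a team would usually adopt an established scheme from their discipline or repository rather than inventing key names as this sketch does.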
Example
To demonstrate the value of documentation, consider an example dataset.
In the organization section, we discussed giving our data unique and descriptive names. The variable names do indeed give important information about what is in each column. However, additional information may still be necessary to understand the contents of the data and how it was collected and analyzed.
The contents of a file named examplestudy_participant01_version01.csv
day | temp_f | hr_rest | spo2 |
---|---|---|---|
1 | 97.5 | 55 | 97 |
2 | 97.6 | 52 | 98 |
3 | 97.5 | 49 | 97 |
4 | 97.5 | 58 | 98 |
5 | 97.4 | 56 | 98 |
Below is a simple data dictionary for the file examplestudy_participant01_version01.csv. It includes the name of each variable in the file (which contains no spaces or special characters), the variable's name written out in plain language, and information about the attributes of each variable (including units).
Variable | Full Name | Description |
---|---|---|
day | day | The day (out of 5) the measure was collected. Days are consecutive. |
temp_f | body temperature (Fahrenheit) | The body temperature of the participant, measured in degrees Fahrenheit. Body temperature was taken using a non-contact forehead thermometer. |
hr_rest | heart rate (resting) | The resting heart rate of the participant, measured in beats per minute. Heart rate was taken using a fingertip pulse oximeter. |
spo2 | oxygen saturation | Pulsatile oxygen saturation, measured as a percentage. SpO2 was taken using a fingertip pulse oximeter. |
This data dictionary does not contain information about the steps used to collect the data in this file, the software tools used to analyze the data, or other details that would be necessary to understand or build upon this data. Much of this information would be recorded in a protocol.
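A data dictionary is only useful if it stays in step with the data file. As a minimal sketch, the check below compares the column names in the example file with the variables defined in the dictionary. The `check_dictionary` helper is hypothetical, the file contents are embedded as text so that the example is self-contained, and the descriptions are shortened versions of the table entries above.

```python
import csv
import io

# Contents of examplestudy_participant01_version01.csv, embedded as text.
DATA_CSV = """day,temp_f,hr_rest,spo2
1,97.5,55,97
2,97.6,52,98
3,97.5,49,97
4,97.5,58,98
5,97.4,56,98
"""

# Shortened data dictionary: variable name -> plain-language description.
DATA_DICTIONARY = {
    "day": "The day (out of 5) the measure was collected",
    "temp_f": "Body temperature in degrees Fahrenheit",
    "hr_rest": "Resting heart rate in beats per minute",
    "spo2": "Oxygen saturation as a percentage",
}


def check_dictionary(csv_text, dictionary):
    """Return (columns missing from the dictionary, entries with no column)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    undocumented = [name for name in header if name not in dictionary]
    unused = [name for name in dictionary if name not in header]
    return undocumented, unused


undocumented, unused = check_dictionary(DATA_CSV, DATA_DICTIONARY)
print("Undocumented columns:", undocumented)
print("Dictionary entries without a column:", unused)
```

Running a check like this whenever a new version of the file is saved can catch a renamed or added variable before the documentation drifts out of date.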
Again, a good rule of thumb is to document more than you think is necessary. Even if you think you'll remember what your variables represent, what procedures you applied, what software you used, and other details, document it all.