Project structure - PADOH-DHI/contributor-guide GitHub Wiki

A project's structure is how its components (programs, documentation, even the sections of code inside the files) are organized. As files or lines of code are added, finding a specific part becomes harder and harder. But a good structure can make this easier, much like categorizing and alphabetizing books on a shelf.

There is no one right way to set up a project. As with code style, the most important thing is to be consistent.

Examples

Below are some suggested structures for different kinds of projects. These would work for any language, but if the project's language has a standard structure, it's best to use it. That way, newcomers won't need to decipher the layout before they can start contributing.

Parts common to all structures (like README.md) are only documented in the first.

Analysis report

This example is for regularly producing a report with analysis.

report-name/
├ README.md
├ LICENSE
├ CONTRIBUTING
├ project_master.sas
├ directions.md
├ input-data/
|   └ data_dictionary.md
├ output-data/
|   └ data_dictionary.md
├ sas/
└ report/
    └ figures/

README.md

This is the first file people will see in the repository. It should give a brief description of the project and whom to contact with any questions.

If it's a Markdown (.md) file, then GitHub will show the formatted version.

LICENSE

This is the distribution license for the code. Without a license, anyone who so much as copies and pastes the code risks being sued by you for copyright infringement, which means they won't use it.

CONTRIBUTING

This lists the rules contributors should follow. It isn't required, but it's a very good idea if more than one person is working on the project. Common topics are naming conventions, pull-request etiquette, and what information to give in bug reports.

project_master.sas

The more automated a project is, the better. As much output as possible should be created by this one program. It should use the programs in sas/ to analyze the data in input-data/, store the results in output-data/, create the figures in report/figures/, and then update the documents in report/.

To avoid sharing sensitive information or server paths, have this program read that type of data from a configuration file. Then each program run during the same session will have those settings.
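One way to do this, sketched here in Python rather than SAS, is to keep the settings in an INI file that is never committed. The file name config.ini, the [paths] section, and the key names are assumptions made for illustration, not part of this guide.

```python
import configparser
from pathlib import Path


def load_settings(config_path):
    """Read server paths from an INI file that is kept out of version control.

    The section name "paths" and both key names are hypothetical.
    """
    # ConfigParser.read() silently skips missing files, so check explicitly.
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"Missing configuration file: {config_path}")
    parser = configparser.ConfigParser()
    parser.read(config_path)
    paths = parser["paths"]
    return {
        "input_dir": Path(paths["input_dir"]),
        "output_dir": Path(paths["output_dir"]),
    }
```

The master program would call load_settings("config.ini") once and pass the result to each step, so no server path appears in the committed code.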

SAS is just used as an example. The same concept applies to any language.
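As a rough illustration in Python, a master program can be little more than an ordered list of scripts to run. The function name run_pipeline and the script names in the usage note are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_pipeline(script_dir, script_names):
    """Run each analysis script in order, stopping at the first failure."""
    completed = []
    for name in script_names:
        script = Path(script_dir) / name
        # check=True raises CalledProcessError if the script exits non-zero,
        # so later steps never run on top of a failed earlier step.
        subprocess.run([sys.executable, str(script)], check=True)
        completed.append(name)
    return completed
```

For example, run_pipeline("python", ["01_clean.py", "02_analyze.py", "03_figures.py"]) would run the three steps in that order. Stopping at the first error is a design choice: it keeps a broken step from feeding bad results into the report.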

directions.md

For that which can't be automated, give detailed instructions for the user. This could include things like who to request data from or how to fill out approval forms.

input-data/

The input data sits here. All data and statistics in the report should be based on what's in these files. Other than being replaced with updated files, they shouldn't be edited.

input-data/data_dictionary.md lists each input data file, describing its structure and any recodes used.
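A data dictionary is easier to keep current if its outline is generated from the files themselves. The Python sketch below assumes the input files are CSVs with a header row, which this guide does not require; the descriptions still have to be written by hand.

```python
import csv
from pathlib import Path


def dictionary_skeleton(data_dir):
    """Build a Markdown outline listing each CSV file and its columns."""
    lines = ["# Data dictionary", ""]
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        with open(csv_path, newline="") as handle:
            # Only the header row is read; an empty file yields no columns.
            header = next(csv.reader(handle), [])
        lines.append(f"## {csv_path.name}")
        lines.append("")
        for column in header:
            lines.append(f"- `{column}`: TODO describe values and recodes")
        lines.append("")
    return "\n".join(lines)
```

The returned text can be pasted into the data dictionary file as a starting point whenever a new input file arrives.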

Because these are never edited and often periodically replaced with recent data, including them in version control would have no benefit beyond backing them up.

output-data/

This is where to save data sets created using files in input-data/. output-data/data_dictionary.md describes them, just as input-data/data_dictionary.md describes the input files.

As with the input data, these aren't really "edited," and they will often be in formats that are illegible as text, so a version control system wouldn't do much good for them. You should still include the directory itself in the version-controlled repository.

sas/

The sas/ directory holds the SAS programs doing the analysis for the report. Similarly, the project may also have R/ or python/ directories.

report/

This contains the actual narrative text of the report, whether it's in Word documents, a spreadsheet, or R-Markdown files. Charts and other images for the report are saved in the report/figures/ directory.

Web pages

This example is for creating web pages for a website (or subsite).

website-name/
├ README.md
├ LICENSE
├ CONTRIBUTING
├ project_master.R
├ Rmd/
└ output/
    ├ pages/
    ├ images/
    └ documents/

Rmd/

This directory is for the R-markdown documents used to generate the HTML pages. Instead of opening and running them individually, it should be possible to create them in bulk using project_master.R and save the results to output/.
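The bulk build is a loop over the files in Rmd/. The guide's master program is written in R; the Python sketch below only builds the equivalent Rscript commands, using the rmarkdown package's render() function and its output_dir argument. The function name render_commands is hypothetical, and paths containing double quotes are not handled.

```python
from pathlib import Path


def render_commands(rmd_dir, output_dir):
    """Return one Rscript command per .Rmd file, in sorted order."""
    target = Path(output_dir).as_posix()
    commands = []
    for rmd in sorted(Path(rmd_dir).glob("*.Rmd")):
        # as_posix() keeps forward slashes, so the R string needs no escaping.
        expression = (
            f'rmarkdown::render("{rmd.as_posix()}", output_dir = "{target}")'
        )
        commands.append(["Rscript", "-e", expression])
    return commands
```

Each command can then be executed with subprocess.run(command, check=True), assuming R and the rmarkdown package are installed.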

output/

The output from the R-markdown files is saved here. The layout of this directory mirrors that of a website:

  • HTML documents go in output/pages/.
  • Image files go in output/images/.
  • Other files go in output/documents/.

This makes it simple to know where to save each file on the actual website server.
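The three rules above amount to a lookup by file extension. This Python sketch shows one way to encode them; the extension lists are assumptions and should be adjusted to whatever the site actually serves.

```python
from pathlib import PurePosixPath

# Hypothetical extension lists; extend them to match the real site.
PAGE_EXTENSIONS = {".html", ".htm"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def output_subdirectory(file_name):
    """Pick the output/ subdirectory for a file based on its extension."""
    suffix = PurePosixPath(file_name).suffix.lower()
    if suffix in PAGE_EXTENSIONS:
        return "output/pages"
    if suffix in IMAGE_EXTENSIONS:
        return "output/images"
    # Anything that is neither a page nor an image counts as a document.
    return "output/documents"
```

Because the comparison is case-insensitive, a file such as chart.PNG is still routed to output/images.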

SAS resources

This example is for creating SAS data sets, functions, and miscellaneous files to use in other programs. For R and Python packages, please see Other links below.

report-name/
├ README.md
├ LICENSE
├ CONTRIBUTING
├ project_master.sas
├ raw-data/
├ sas/
|   ├ functions/
|   |   └ functions.md
|   ├ datasets/
|   |   └ datasets.md
|   └ macros/
|       └ macros.md
└ tests/
    └ run_all_tests.sas

In general, the final output of this type of project will not be saved in the project itself. Instead, the resources will be copied to a location (specified in a config file) where multiple people can access them.
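The publishing step can be sketched in Python as a recursive copy. The function name publish_resources is hypothetical, and in practice shared_dir would come from the config file rather than being hard-coded.

```python
import shutil
from pathlib import Path


def publish_resources(build_dir, shared_dir):
    """Copy everything under build_dir into shared_dir, replacing older copies."""
    destination = Path(shared_dir)
    # dirs_exist_ok (Python 3.8+) lets repeated releases reuse the same folder.
    shutil.copytree(build_dir, destination, dirs_exist_ok=True)
    # Return the published files so the caller can log what was released.
    return sorted(
        path.relative_to(destination).as_posix()
        for path in destination.rglob("*")
        if path.is_file()
    )
```

Note that this overwrites files with the same name but never deletes anything from the shared location, so resources removed from the project have to be cleaned up separately.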

sas/functions/

These are programs creating data sets with compiled SAS functions. Details on the parameters and return values or effects of each function should be described in functions.md. Use the official SAS function documentation as a reference.

If your project creates a lot of functions, consider grouping them into separate packages and writing a different documentation file for each package.

sas/datasets/

This directory is for programs that create permanent data sets, not the data sets themselves. datasets/datasets.md will be the data dictionary for them, similar to output-data/data_dictionary.md in the report example above. If a data set has a lot of fields or needs a lot of detail, it could be a good idea to give it its own description file.

sas/macros/

These programs define SAS macros which will be shared either as an autocall library or through the stored compiled macro facility. sas/macros/macros.md gives details on what each macro does and how to use it. As usual, break it up into multiple files if needed.

tests/

This is a very important directory. Others won't use resources they don't trust, and nobody should trust anything that isn't tested. Programs in this directory should test each function, data set, and macro offered by the project. The run_all_tests.sas program should live up to its name. You never know which piece will break after a change.
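A run-everything test program is mostly a loop that refuses to stop early. The Python sketch below stands in for run_all_tests.sas and assumes, purely for illustration, that each test is a script named test_*.py that exits with a non-zero status when it fails.

```python
import subprocess
import sys
from pathlib import Path


def run_all_tests(test_dir):
    """Run every test_*.py script and return the names of those that failed."""
    failures = []
    for script in sorted(Path(test_dir).glob("test_*.py")):
        # No check=True here: keep going so every broken piece is reported.
        result = subprocess.run([sys.executable, str(script)])
        if result.returncode != 0:
            failures.append(script.name)
    return failures
```

An empty list means every test passed. Continuing after a failure is deliberate, since a single change can break more than one resource.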

Other links