Rmarkdown custom knit hook to compile a multi part document - lmmx/devnotes GitHub Wiki

The following notes accompany a blog post I've written on this undocumented RStudio/rmarkdown feature which may be more explanatory:

...and another on my current 'modular' code notebook:

An example of an Rmarkdown YAML header which threads together a document from component Rmarkdown subsections:

---
title: "Multi-part analysis"
knit: (function(inputFile, encoding) { for (section in list.files(pattern="analysis-part-.*?.Rmd")) { rmarkdown::render(section, encoding = encoding, quiet=TRUE) }; rmarkdown::render(inputFile, encoding = encoding, output_file = paste0(dirname(inputFile),'/README.md')) })
output:
  md_document:
    variant: markdown_github
    includes:
      after_body: [
        analysis-part-i.md,
        analysis-part-ii.md,
        analysis-part-iii.md
      ]
---
  • The title: field becomes a top-level header (#) in the output markdown

  • The knit: field (a currently undocumented hook) replaces rmarkdown::render with a custom function specifying parameters for the rendering.

    Yes, unfortunately it has to be a huge one-liner unless you have your own R package [with the function exported to the namespace] to source it from (as package::function). Here's the above more clearly laid out:

(
function(inputFile, encoding) {
  for (section in list.files(pattern="analysis-part-.*?.Rmd")) {
    rmarkdown::render(section, encoding = encoding, quiet=TRUE)
  };
  rmarkdown::render(inputFile, encoding = encoding,
                    output_file = paste0(dirname(inputFile),'/README.md'))
})
  • Firstly, every section's Rmarkdown file is rendered into markdown [with the same name by default]
    • Each of these files is 'included' after the 'body' (cf. the header) of this README, if it's in the includes: after_body: [...] list.
    • The quiet=TRUE parameter silences the standard "Output created: ..." message following render(), which would otherwise trigger the RStudio file preview (you presumably don't want to see the intermediate markdown).
  • After the component files are processed in the for loop, the final README markdown is rendered ('includes' appends their processed markdown contents), and this full document is previewed.
  • All Rmd files here contain a YAML header, the constituent files having only the output: md_document: variant: field:

output:
  md_document:
    variant: markdown_github


  ...before their sub-section contents:

Part 1: Comparison of cancer types surveyed

Comparing cancer types in this paper to CRUK's most up to date prevalence statistics (February 4th 2015).
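As a rough sketch of the package::function route mentioned above, the one-liner could be written as a named function. The names `knit_multipart`, `mypkg` and `render_fun` are hypothetical (none of them are part of rmarkdown). The `render_fun` argument is there only so the logic can be tried out without pandoc. Unlike the one-liner, this version assumes the sections sit next to `inputFile` rather than in the working directory, and uses a slightly stricter filename pattern:

```r
# Hypothetical helper; would be referenced from the YAML header as
#   knit: mypkg::knit_multipart
knit_multipart <- function(inputFile, encoding,
                           pattern = "^analysis-part-.*\\.Rmd$",
                           render_fun = rmarkdown::render) {
  doc_dir <- dirname(inputFile)
  sections <- list.files(doc_dir, pattern = pattern, full.names = TRUE)

  # Step 1: render each section to markdown, quietly, so that RStudio
  # does not try to preview the intermediate files
  for (section in sections) {
    render_fun(section, encoding = encoding, quiet = TRUE)
  }

  # Step 2: render the main document, whose includes: after_body: list
  # pulls in the sections' .md output
  render_fun(inputFile, encoding = encoding,
             output_file = file.path(doc_dir, "README.md"))
}
```

Because the default `render_fun` is only evaluated when it is first used, passing a stand-in function lets you check which files would be rendered, and in what order, before running the real thing.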


## Alternative modular setup

One of the problems custom knit functions can also solve is the time it takes for large manuscripts to compile.

*E.g.*, if using [`knitcitations`](https://github.com/cboettig/knitcitations), each reference is downloaded even if the bibliographic metadata has already been obtained. Along with generating individual figures etc., the time to 'compile' an Rmarkdown document can therefore scale exorbitantly when writing a moderately sized document, breaking the proper flow of writing and review.

A modular structure is the only rational way of doing this, but isn't described anywhere for Rmarkdown's dynamic documents.

A 'main' knit function like the one above, but without the first step (compiling each `.Rmd` ⇒ `.md`), would mean the `.md` files are simply `included` (instantly):

```YAML
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding, output_file = paste0(dirname(inputFile),'/README.md')) })
```

Much more sensibly, the component Rmarkdown files (subsections) wouldn't need to be re-processed (e.g. have all their references and figures regenerated) every time the main document is knit. Instead, this would be done per file whenever one is edited, each one with its own custom knit: hook:

---
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding, quiet=TRUE) })
output:
  md_document:
    variant: markdown_github
---

The idea would be to follow what this Software Carpentry video describes re: makefiles for reproducible research papers.
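In the spirit of that makefile approach, here is a minimal sketch (my own, not taken from the video) of a make-like staleness check in base R. The function name `stale_sections` is hypothetical, and it assumes each section's `.md` output sits beside its `.Rmd` source with the same base name:

```r
# List section .Rmd files whose rendered .md is missing or older than the source
stale_sections <- function(dir = ".", pattern = "^analysis-part-.*\\.Rmd$") {
  rmd <- list.files(dir, pattern = pattern, full.names = TRUE)
  md <- sub("\\.Rmd$", ".md", rmd)
  # file.mtime() gives NA for a file that does not exist; treat that as stale too
  out_of_date <- is.na(file.mtime(md)) | file.mtime(md) < file.mtime(rmd)
  rmd[out_of_date]
}

# Re-render only what has changed (needs rmarkdown and pandoc installed):
# for (section in stale_sections()) rmarkdown::render(section, quiet = TRUE)
```

This only compares modification times, so it will not notice a section whose data or bibliography changed while the `.Rmd` itself did not. A real makefile would list those as extra dependencies.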

The example above creates a README.md file suitable for display in a standard GitHub repo (though it's not really a good idea to have sprawling READMEs). It could just as easily be tweaked to give a paper.pdf, using a PDF YAML header instead for the final .md ⇒ .pdf step.

via Software Carpentry

For what it's worth, my current YAML header for a manuscript in PDF is:

---
bibliography: references.bib
mainfont: Arial
output:
  pdf_document:
    latex_engine: xelatex
---

... and in the top matter (after the YAML, before the markdown, for the LaTeX engine & R):

`\fontsize{10}{16}`
`r library(knitr)`
`r library(knitcitations)`
`r options("citation_format" = "pandoc")`

The way to provide titles programmatically for primary and secondary level headings is to put these at the top of the Rmarkdown body:

```{r echo=FALSE, results='asis', comment=''}
cat("#", rmarkdown::metadata$title)
```
```{r echo=FALSE, results='asis', comment=''}
cat("##", rmarkdown::metadata$title)
```

etc.

  • A little proof of concept one-click-knit Rmarkdown repo I've made from this is here. All code is self-contained in the documents, and all the documents get sewn together when the main 'Analysis.Rmd' is 'knitted' (from within the documents sub-directory until I start using Konrad Rudolph's modules package).

    The analysis is a brief first time use of dplyr, and cross-references cancers in the recent Tomasetti & Vogelstein 'bad luck' paper in Science against Cancer Research UK's published prevalence statistics.

  • One further thing you can do here is pre-process the Rmarkdown with brew, as in this example, to programmatically set YAML headers, e.g. to dynamically pass a list of files to the includes: after_body: field with something like

    includes: 
      after_body: <%= paste('[',paste(list.files(pattern = "^analysis-part-.*\\.md$"), collapse=','),']') %>

    (or something similar to generate the expected format; note that list.files() takes a regular expression, not a shell-style glob). A commenter on my GitHub wiki has suggested ending the 'main' document with a code chunk {r child = list.files(...)} as an alternative to brew - see here.

  • See also: this post describing my current workbook (not currently shareable in its entirety), which writes a fresh YAML header for the final HTML output programmatically, and compiles this one last time as Rmarkdown.
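To make the brew idea above a little more concrete, here is a small sketch of a helper that builds the bracketed list for the includes: after_body: field. The name `after_body_yaml` is hypothetical, and it assumes the rendered sections follow the analysis-part-*.md naming used earlier:

```r
# Hypothetical helper: collect the rendered sections and format them as a
# YAML flow sequence for the includes: after_body: field
after_body_yaml <- function(dir = ".") {
  # glob2rx() converts the shell-style glob into the regex list.files() expects
  files <- list.files(dir, pattern = utils::glob2rx("analysis-part-*.md"))
  paste0("[", paste(files, collapse = ", "), "]")
}
```

In a brew template this would be used as `after_body: <%= after_body_yaml() %>`. With no matching files it yields `[]`, an empty list.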
