Pandoc Filters - jgm/pandoc GitHub Wiki

Pandoc provides an interface for users to write programs (known as filters) which act on the intermediate AST. For more info see the filter tutorial and the Lua filter tutorial.

This page collects third-party filters that add functionality to pandoc.

Writing Filters

Filters can be written in any programming language. Pandoc wrappers and interfaces are available in the following programming languages to facilitate modification of the AST:

| Language | Link | Description |
|----------|------|-------------|
| Python | pandocfilters | A library for writing pandoc filters in Python. |
| Python | panflute | A Pythonic alternative to pandocfilters, with batteries included. It rebuilds the pandoc AST as an internal panflute AST, which makes working with the AST more seamless. (Recommended by @jgm on pandoc-discuss.) |
| Python | pantable | Specialized in writing table filters based on panflute; it provides a lossless conversion between its internal structure and the panflute AST. |
| PHP | pandocfilters-php | A port of the Python pandocfilters module to PHP, to make writing filters in PHP easier. |
| Node.js | pandoc-filter-node | A Node.js module for writing pandoc filters in JavaScript. |
| Perl | Pandoc::Elements | A CPAN module for writing pandoc filters in Perl. |
| Groovy | groovy-pandoc | A library for writing pandoc filters in Groovy. |
| Ruby | paru | A Ruby gem for writing pandoc filters in Ruby. |
| Lua | pandoc's official documentation | Pandoc includes a Lua interpreter by default, so Lua filters need no external interpreter and are quite lightweight. |
| Elixir | Panpipe | A library for writing pandoc filters in Elixir. |
| .NET | PandocFilters | A NuGet package for writing pandoc filters in .NET languages. |
| OCaml | ocaml-pandoc | An OCaml library for writing pandoc filters. |
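
As a rough illustration of what these libraries wrap: a JSON filter is a program that reads the pandoc AST as JSON on stdin and writes a (possibly modified) AST as JSON to stdout. The sketch below uses only the Python standard library. The helper names `walk` and `upper_str` are made up for this example and are not part of any library listed above.

```python
#!/usr/bin/env python3
"""Minimal JSON filter sketch: upper-case every Str element (illustrative only)."""
import json
import sys


def walk(node, action):
    """Return a rebuilt copy of a JSON AST fragment.

    `action` is called on every element (a dict with a "t" key). It returns a
    replacement dict, or None to keep the element and descend into it.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = action(node)
            if replaced is not None:
                return replaced
        return {key: walk(value, action) for key, value in node.items()}
    return node  # strings, numbers, booleans, null


def upper_str(elem):
    """Hypothetical action: replace Str elements with an upper-cased copy."""
    if elem["t"] == "Str":
        return {"t": "Str", "c": elem["c"].upper()}
    return None


def main():
    doc = json.load(sys.stdin)                   # AST arrives as JSON on stdin
    json.dump(walk(doc, upper_str), sys.stdout)  # modified AST goes to stdout


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # pandoc passes the output format as the first argument to a filter.
        main()
    else:
        # No arguments: run a small self-contained demo instead of reading stdin.
        sample = {
            "pandoc-api-version": [1, 23],
            "meta": {},
            "blocks": [{"t": "Para", "c": [{"t": "Str", "c": "hello"}]}],
        }
        print(json.dumps(walk(sample, upper_str)))
```

Saved as an executable file (the name `upper.py` is arbitrary), it could be run with `pandoc --filter ./upper.py input.md -o output.html`. The libraries in the table mostly replace the hand-written `walk` with typed element classes and ready-made traversal helpers.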

Other tools:

  • vimhl, a Vim plugin that makes Vim's syntax highlighting engine available in pandoc.
  • pandoc-jats, a Lua custom writer for Pandoc generating JATS XML.
  • 2bbcode, a Lua custom writer for BBCode.
  • pandocmeta.lua, a simple Lua package that converts Pandoc metadata types into a (possibly multi-dimensional) table.

Written Filters

See github.com/pandoc/lua-filters and https://github.com/pandoc-ext for a selection of filters written in Lua. Other known third-party filters:

Document (DOCX/ODT) related

  • Because DOCX and ODT files cannot use templates, the options for turning metadata into document content are limited. Several paru filters help with this, given a metadata format that includes authors with affiliation/correspondence fields and institute information. See the README. Individual filters: simplifyMetadata, prependInstitute, prependKeywords, prependAbstract, prependComments. Combined filter: prependAll.
  • pandoc-odt-filters: filters that improve ODT output. They create sequences in image and table captions (for an automatic list of figures and list of tables), correct links to images and tables, correct the bibliography style, and add custom styles for headers and spans, better list styles, and real small caps. Some of the filters are configurable.
  • commentary: a Pandoc filter and command line tool that preserves native-style comments + metadata between Markdown/docx conversions.

Images related

  • pandoc-svg, a pandoc filter by Jerome Robert that converts SVG files to PDF.
  • diagrams-pandoc for inserting images expressed in the Haskell diagrams DSL.
  • mermaid-pandoc for inserting images expressed in mermaid syntax
  • r-pandoc for inserting plots expressed in the R language
  • paru-screenshot.rb for automatically taking a screenshot of a web page and including it as an image in a markdown file.
  • pandoc-plot to generate and embed figures based on code blocks in documents, using a variety of toolkits (e.g. Matplotlib, MATLAB, gnuplot, ggplot2, etc.). Easy integration with Haskell libraries (e.g. Hakyll)
  • pandoc-figure to transform specific divs into complex figures (requires pandoc >= 3.0)

Numbering related

  • Numerical reference to sections, using a specified sign (by default #) in internal links. Metadata can configure special sign and whether links should be preserved or converted to plain text.
  • pandoc-fignos, for numbering figures and figure references.
  • pandoc-eqnos, for numbering equations and equation references.
  • pandoc-tablenos, for numbering tables and table references.
  • pandoc-crossref, for numbering and cross-referencing figures, equations and tables
  • pandoc-numbering, for numbering and cross-referencing any kinds of things such as examples, theorems, exercises and so on
  • pandoc-ling, for formatting, numbering and cross-referencing linguistic examples
  • pandoc-listof, for creating lists of any kinds (deprecated)
  • pandoc-amsthm: a package for pandoc that defines the use of amsthm through YAML front matter, targeting HTML and LaTeX output. For HTML, CSS counters are used and defined in a template (via the YAML variables). For LaTeX, the amsthm package is used and defined in a template (via the YAML variables).
  • definitionlist-filter.lua, for converting some definition lists to theorem-like (amsthm) environments and some references to cref tags in LaTeX

Math related

  • mathjax-pandoc-filter rendering math to SVG using mathjax-node
  • asciimathml-pandocfilter: to add read support for AsciiMathML syntax through conversion into LaTeX
  • pandoc-unicode-math replaces Unicode math symbols and Greek letters like ∀, ∈, →, λ, or Ω in math environments with the equivalent LaTeX commands, such as \forall, \in, \rightarrow, \lambda, or \Omega.
  • SugarTeX is a more readable LaTeX language extension and transcompiler to LaTeX. Fast Unicode autocomplete in Atom editor via SugarTeX Completions for Atom.
  • pandoc-logic-proof provides a way to write logic proofs in pandoc markdown and produce attractive output.

LaTeX related

Include/transclude related

  • Include Files: finds all the inline code blocks with attribute include, and replaces their contents with the contents of the file given
  • code-includes.lua Include code from source files. Keep your examples and documentation compiled and in-sync. Similar to the above except you don't have to install Haskell and you can select by line number.
  • transclude.lua Include content from another file just like in AsciiDoc and ReST.
  • include-files.lua Filter to include other files in the document.
  • include.py: Panflute filter to allow file includes. See doc.
  • pandoc-include-plus is another pandoc filter which supports "include" files. Key features:
    • Included files can include other files, recursively.
    • Paths to images are adjusted as needed to ensure that everything "just works".
    • Option to automatically promote or demote headings in included files.

RAW related

Tables related

  • pandoc-csv2table for including referenced csv files in markdown as markdown rendered tables.
  • pandoc-placetable lightweight implementation of the idea behind the above pandoc-csv2table (e.g. doesn't necessarily require pandoc as a cabal dependency)
  • ickc/pantable: CSV Tables in Markdown: Pandoc Filter for CSV Tables: a Python alternative to the two filters above, using panflute, with some enhancements (e.g. auto-width, fractional width, etc.)
  • Creating a link table at the end of your document.
  • pandocsql run SQL queries on tables, generating other tables
  • pandoc-linear-table Creating tables with cells that contain a lot of content can be difficult to do in standard Markdown. This Pandoc filter extends Markdown syntax to make the job easier.

Text related

  • pandoc-abbreviations allows the use of arbitrary abbreviations, defined in an abbreviations file or in the source document's YAML header, which are replaced on processing. Useful for maintaining consistency of terminology etc.
  • pandoc-acronyms is a filter for managing acronyms. It replaces acronyms like "FAQ" at first use with the full text "frequently asked questions (FAQ)". It is installed using pip.
  • count-para.lua adds numbering to paragraphs to allow for detailed citation (in a scientific context). Proposed as a replacement for page-number referencing, which does not work with adaptive design.
  • pandoc-lang automatically detects the (natural) language of text, as well as the programming language of code blocks
  • pandoc-mustache replaces variables like {{varname}} in a pandoc document with their values, which are stored in a separate YAML file.
  • pandoc-quotes.lua and the older pandoc-quotes replace non-typographic quotation marks with typographic ones for languages other than US English.
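
Variable substitution of the kind pandoc-mustache performs can be sketched roughly as below. This is not that filter's implementation: the values come from a plain dict rather than a YAML file, and the names `PLACEHOLDER` and `substitute` are invented for the example. It also assumes that a placeholder such as `{{varname}}` arrives as part of a single `Str` element, which would not hold if the placeholder contained spaces.

```python
import re

# Matches {{name}} where name consists of letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(node, variables):
    """Return a copy of the AST with {{name}} placeholders in Str elements
    replaced by variables[name]; unknown names are left untouched."""
    if isinstance(node, list):
        return [substitute(item, variables) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            text = PLACEHOLDER.sub(
                lambda m: str(variables.get(m.group(1), m.group(0))), node["c"]
            )
            return {"t": "Str", "c": text}
        return {k: substitute(v, variables) for k, v in node.items()}
    return node


if __name__ == "__main__":
    para = {"t": "Para", "c": [{"t": "Str", "c": "Dear"}, {"t": "Space"},
                               {"t": "Str", "c": "{{name}},"}]}
    print(substitute(para, {"name": "Ada"}))
```

Leaving unknown placeholders untouched is a design choice of this sketch; a real filter might instead warn or fail.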

Typesetting related

  • columns provides multiple columns support in HTML and LaTeX/PDF output.
  • first-line-indent provides smart first-line indents in HTML and LaTeX/PDF output.

Running Code related

  • R-pandoc for generating R plots
  • filter_pandoc_run_py for executing Python code written in code blocks and embedding the printed output and pyplot figures
  • pandoc-plot to generate and embed figures based on code blocks in documents, using a variety of toolkits (e.g. Matplotlib, MATLAB, gnuplot, ggplot2, etc.). Easy integration with Haskell libraries (e.g. Hakyll)
  • Knitty is a Pandoc filter for reproducible reports via Jupyter and Pandoc (a fork of Stitch, which is a Knitr/RMarkdown-like library). Insert Python code (or code for any other Jupyter kernel) into a Markdown document, or write plain Python/Julia/R/any-kernel-language with block-commented Markdown, and get the code's results in the Pandoc output document.
  • pandocsql which uses an in-memory SQLite database. It creates tables from tables in the document and executes queries in code blocks, showing the results as tables.
  • pannb, a pandoc filter to control the output from ipynb input. It can handle the metadata block, filter out Python code, and convert all raw blocks to native pandoc AST. The three can be mixed and matched.
  • pandoc-query, a pandoc filter that
    • defines a simple language for querying a collection of Pandoc documents and formatting the output, and
    • provides a way to embed queries in a Pandoc document, so that when the document is converted to a new format, the query is replaced with the results. Not a standalone filter, but a filter you can use as part of an application.
  • run-code-inline reads text (e.g., Markdown or Asciidoc) from stdin, echoes it to stdout, simultaneously running any commands and inserting the output immediately after the command. Useful for writing tutorials and software documentation. Not Pandoc-specific, but useful as part of a Pandoc toolchain.
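
The in-memory SQLite approach described for pandocsql can be sketched as follows. This shows only the database step, not that project's code. Extracting the header and rows from a pandoc table and rendering the result back into one are left out, and the function name `query_table` is made up for the example.

```python
import sqlite3


def query_table(name, header, rows, sql):
    """Load one table into an in-memory SQLite database, run `sql` against it,
    and return (result_header, result_rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Identifiers are double-quoted; names containing a double quote are
        # not handled by this sketch.
        columns = ", ".join(f'"{col}"' for col in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{name}" ({columns})')
        conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)
        cursor = conn.execute(sql)
        result_header = [col[0] for col in cursor.description]
        return result_header, cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    header = ["city", "population"]
    rows = [("Oslo", 700000), ("Bergen", 290000)]
    print(query_table("towns", header, rows,
                      "SELECT city FROM towns WHERE population > 500000"))
```

Because the query text would come from a code block in the document, a filter built this way executes whatever SQL the document author writes, which is acceptable only for trusted input.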

Citation related

  • pandoc-manubot-cite allows citing persistent identifiers directly like @doi:10/c7np or @pubmed:29618526. Removes the need for a reference manager by supporting DOIs, PubMed IDs, URLs, ISBNs, Wikidata IDs, and the hundreds of other ID types registered with https://identifiers.org. Written in Python. Available on PyPI.
  • pandoc-url2cite allows citing certain persistent identifiers directly (URLs, ISBNs, and DOIs). Basically a less opinionated and simpler version of pandoc-manubot-cite. Written in TypeScript. Available on npm.
  • pandoc-zotxt.lua looks up sources for citations in Zotero.
  • recursive-citeproc handles self-citing bibliographies.

Others

  • Adding support for indexing with the syntax (# term, subterm) in HTML and LaTeX
  • Adding non-breaking spaces inside a URL to preserve formatting
  • toc-css, a Lua filter that changes the appearance of Pandoc's basic HTML table of contents using some CSS and vanilla JavaScript.
  • lablinkfix updates links to the Swedish Labour Movement Archives and Library catalogues.
  • second-date changes date metadata to a different strftime format using Python's dateutil.
  • pandoc_abnt allows specifying the source of images and tables, and automatically corrects alíneas according to the Brazilian standard for academic writing (ABNT NBR 14724:2011).
  • nheengatu provides several resources for publishing multimedia content through formats such as LaTeX, HTML and EPUB.
  • pandoc-select-links is a Pandoc filter that takes an input document and returns a new document that contains only the links from the input document. The implementation is just a few lines of code, and provides a simple example of how to use the query function.
  • pandoc-select-code is a Pandoc filter that extracts just the code blocks from an input document. You might use this, for example, to extract sample code from a tutorial.
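
The idea behind the last two filters, querying the AST for one kind of element and returning a document made only of the matches, can be sketched in Python as below. The original filters use pandoc's query function; this is only an analogue on the JSON AST, and the names `collect` and `select_code` are made up for the example.

```python
def collect(node, wanted_type):
    """Return all elements of type `wanted_type` found anywhere under `node`,
    in document order."""
    found = []
    if isinstance(node, list):
        for item in node:
            found.extend(collect(item, wanted_type))
    elif isinstance(node, dict):
        if node.get("t") == wanted_type:
            found.append(node)  # matched elements are not searched further
        else:
            for value in node.values():
                found.extend(collect(value, wanted_type))
    return found


def select_code(doc):
    """Return a copy of the document whose blocks are only its code blocks."""
    return {**doc, "blocks": collect(doc["blocks"], "CodeBlock")}


if __name__ == "__main__":
    doc = {
        "pandoc-api-version": [1, 23],
        "meta": {},
        "blocks": [
            {"t": "Para", "c": [{"t": "Str", "c": "Intro"}]},
            {"t": "CodeBlock", "c": [["", ["python"], []], "print(1)"]},
        ],
    }
    print(select_code(doc)["blocks"])
```

Selecting links instead would need an extra step, because `Link` is an inline element and has to be wrapped in a block such as `Para` before it can appear in the `blocks` list.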