Pandoc - TEALSK12/tealsk12.github.io GitHub Wiki

Pandoc Links

Preparation

Prior to conversion, ensure the following in the source Word document:

  • No Text Boxes — Pandoc (and even Word save-as-text and control-a/select-all) will ignore content inside text boxes. Hoist all text box content out of the text box and into regular text.
  • Properly Anchored Tables — All tables should be anchored to flow with the document text (text-wrapping: none).
  • No Extraneous Paragraphs — Many novice Word users will enter empty paragraphs (hitting the return key) as a crude way to create desired space in the document. Find and remove these, or you'll need to remove the blank headers in the converted Markdown.
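
The text-box check can be partly automated before conversion. The following is a minimal sketch, assuming the standard .docx layout (a ZIP archive whose main body is stored in word/document.xml, with text-box content wrapped in w:txbxContent elements) and a grep that supports -o, as GNU and BSD grep do. The function name count_text_boxes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count text-box content blocks in WordprocessingML
# read from standard input. Any result above 0 means the document still
# contains text boxes that should be hoisted into regular text.
count_text_boxes() {
  # grep -o prints one line per opening tag; wc -l counts those lines.
  # tr strips the padding that some wc implementations add.
  grep -o '<w:txbxContent' | wc -l | tr -d ' '
}

# Usage (assumes unzip is installed and Lesson-303.docx exists):
#   unzip -p Lesson-303.docx word/document.xml | count_text_boxes
```

Word may store a text box more than once (a drawing plus a fallback copy), so the count can be higher than the number of boxes visible on the page. It is best read as a yes/no signal.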

Conversion Command Line

The following command converts a Word document to Markdown:

pandoc -t markdown_github --extract-media=. -o Lesson-303.md Lesson-303.docx
  • -t markdown_github — Select GitHub-flavored Markdown as the target format. Pandoc 2.0 and later deprecate markdown_github in favor of gfm.
  • --extract-media=. — All extracted images will be stored in ./media/....
  • -o Lesson-303.md — Name of target file
  • Lesson-303.docx — Name of source Word document
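
For a folder of lessons, the same command can be wrapped in a small function. This is a sketch rather than part of the original workflow: convert_docx, convert_all_docx, and the PANDOC override variable are hypothetical names introduced here, and the output name is assumed to be the source name with .docx replaced by .md.

```shell
#!/bin/sh
# Hypothetical wrapper around the conversion command.
# PANDOC may be set to another executable (e.g. echo) for a dry run.
convert_docx() {
  src=$1
  out="${src%.docx}.md"   # Lesson-303.docx -> Lesson-303.md
  "${PANDOC:-pandoc}" -t markdown_github --extract-media=. -o "$out" "$src"
}

# Convert every .docx file in the current directory.
convert_all_docx() {
  for f in *.docx; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    convert_docx "$f"
  done
}

# Usage:
#   convert_docx Lesson-303.docx
#   PANDOC=echo convert_all_docx   # print the arguments instead of converting
```

Because every run extracts into the same ./media directory, images from different documents could overwrite one another if their file names collide. Converting each document in its own directory is one way to avoid that.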

In addition to the above switches, you may wish to include the following options:

  • --include-before-body=FILE — File whose contents (for example, a Markdown blob) are included verbatim before the body of the converted document.
  • --include-after-body=FILE — File whose contents are included verbatim after the body of the converted document.
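
Both options take a file path. As a hedged illustration, the sketch below combines them with the conversion command. The function name convert_with_wrappers, the PANDOC override variable, and the file names header.md and footer.md are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: convert a Word document and wrap the result with
# the contents of two Markdown files.
convert_with_wrappers() {
  src=$1      # source Word document, e.g. Lesson-303.docx
  before=$2   # file included before the body, e.g. header.md (assumed name)
  after=$3    # file included after the body, e.g. footer.md (assumed name)
  "${PANDOC:-pandoc}" -t markdown_github --extract-media=. \
    --include-before-body="$before" \
    --include-after-body="$after" \
    -o "${src%.docx}.md" "$src"
}

# Usage:
#   convert_with_wrappers Lesson-303.docx header.md footer.md
```

Each option can be repeated to include several files, which pandoc adds in the order given.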