File conversion - electricbookworks/constitution GitHub Wiki
Apart from the English-language version, we've worked from the official DOJ PDF versions of the constitution.
Our conversion process developed as a work in progress. This is a record of that process.
- Open the PDF in Acrobat Pro.
- Delete prelim and endmatter pages (we'll create those manually later in markdown).
- Crop pages to remove header and footer.
- Save as Word (.docx).
- Convert .docx to markdown (.md) with pandoc:
  - At the command line, navigate to the folder containing the files to convert (using `cd` or, in Windows 10, type `powershell` into the address bar of the file explorer when in the relevant folder).
  - Run this command (changing `file` to the name of your file): `pandoc -S -f docx -t markdown file.docx --output=file.md`
- Save the markdown file in the relevant language folder as `scrub-en.md` while we work on it (where `en` is each language code).
- Run the scrub file through a batch regex search-and-replace.
- Clean up markdown manually (in a good code editor like Sublime Text 3 with the MarkdownEditing package installed and its syntax set to MultiMarkdown).
- Divide into separate files per chapter/schedule/annexure and create prelims, copy-pasting from PDF over a copy of the English-version prelims.
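
The batch regex search-and-replace step could be scripted along the following lines. This is only a minimal sketch, not a record of the rules we actually ran: the three patterns in `RULES`, the function names `scrub`, `scrub_file` and `main`, and the assumption that each language folder sits directly under the current directory are all hypothetical and would need to be replaced with whatever suits the real scrub files.

```python
import re
from pathlib import Path

# Hypothetical clean-up rules, applied in order as (pattern, replacement).
# These are illustrative only; the real rule set is not recorded here.
RULES = [
    # Strip trailing spaces and tabs at the end of every line.
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),
    # Collapse three or more newlines into a single blank line.
    (re.compile(r"\n{3,}"), "\n\n"),
    # Collapse runs of spaces between words, leaving indentation alone.
    (re.compile(r"(?<=\S) {2,}(?=\S)"), " "),
]


def scrub(text: str) -> str:
    """Apply every rule in RULES to the text, in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def scrub_file(path: Path) -> None:
    """Rewrite one markdown file in place with the rules applied."""
    original = path.read_text(encoding="utf-8")
    path.write_text(scrub(original), encoding="utf-8")


def main() -> None:
    # Assumes a layout like en/scrub-en.md, af/scrub-af.md, and so on.
    for path in sorted(Path(".").glob("*/scrub-*.md")):
        scrub_file(path)
        print(f"scrubbed {path}")


if __name__ == "__main__":
    main()
```

Keeping the rules in one ordered list means the same pass can be rerun on every language's scrub file, and a rule can be added or dropped without touching the rest of the script. Note that stripping trailing whitespace also removes markdown's two-space hard line breaks, so that rule should be dropped if the files rely on them.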