Wiki; GitHub Reconstruction - HWRM/KarosGraveyard GitHub Wiki

In May 2020, all 1,000-plus pages of the original Karos Graveyard were fully reconstructed on GitHub. The process is documented below:

  1. Downloaded the Karos Graveyard Offline CHM, with content from September 2007.
  2. Opened the KarosGraveyard.CHM file with 7-Zip and extracted all the HTML files.
  3. 18 HTML files had odd encoding and had to be resaved in UTF-8. Some punctuation characters turned into question marks and were later fixed manually.
  4. Used Pandoc v2.9.2.1 to convert the HTML pages to GitHub's Markdown format.
  5. Ran the following command in PowerShell to test convert one page:
pandoc FileName.html -f html-native_divs-native_spans -t gfm --wrap=none -s -o FileName.md
  6. Ran the following command in PowerShell to convert all pages:
gci -r -i *.html |foreach{$md=$_.directoryname+"\"+$_.basename+".md";pandoc -f html-native_divs-native_spans -t gfm --wrap=none -s $_.name -o $md}
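The re-save and batch-conversion steps above can also be sketched in Python. This is only an illustration, not what was actually run: the function names are hypothetical, the source encoding `cp1252` is an assumption (the page does not record which encoding the 18 files used), and `pandoc` is assumed to be on the PATH. The pandoc options are copied from the commands above.

```python
from pathlib import Path

# Same pandoc options as the PowerShell commands above.
PANDOC_ARGS = ["-f", "html-native_divs-native_spans", "-t", "gfm", "--wrap=none", "-s"]


def resave_as_utf8(path, source_encoding="cp1252"):
    """Rewrite one file in place as UTF-8.

    cp1252 is an assumed source encoding, not something the page states.
    """
    path = Path(path)
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


def pandoc_command(html_path):
    """Build the pandoc argument list for one page; the .md lands next to the .html."""
    html_path = Path(html_path)
    md_path = html_path.with_suffix(".md")
    return ["pandoc", *PANDOC_ARGS, str(html_path), "-o", str(md_path)]


def conversion_commands(root):
    """Return one pandoc command per .html file found recursively under root."""
    return [pandoc_command(p) for p in sorted(Path(root).rglob("*.html"))]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion. Building the commands separately from running them makes it easy to inspect what would be executed first.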
  7. Opened the directory containing all Markdown pages in Visual Studio Code. Ran a series of find and replace operations on all pages as documented below.
_Format_
Action To Take:
    Text to Find
    Text to Replace
    Text to Find
    Text to Replace

_RegEx Replace_
Change Header:
	^Homeworld 2 : \[
	**Homeworld Remastered Karos Graveyard** : [
	\*\*\[Function Reference\]\(FunctionReference.html\)\*\* :: \[Scope Reference\]\(ScopeReference.html\) :: \[Variable Reference\]\(VariableReference.html\)\n\n
	nothing
Remove extra leading spaces:
	^  - 
	- 
Remove link picture icons: (turn RegEx off for first operation)
	http://wiki.hw2.info/images/url.png
	url.png
	!\[([^\[]+)\]\(url.png\)
	nothing
Change comments heading:
	 Comments \\\[\[Hide comments/form\]\(.+
	# Comments
	There are .+ comments on this page. .+
	# Comments
	There is .+ comment on this page. .+
	# Comments
	There are no comments on this page. \\\[\[Add comment\]\(.+
	# Comments
Change footer:
	\[Page History\]\(.+\n
	nothing
	\[Valid XHTML 1.0 Transitional\]\(.+\n\n
	nothing
	Page was generated in .+ seconds
		# Page Status
		Updated Formatting? Initial  
		Updated for HWRM? Initial  

_Normal Replace_
Fix internal links:
	.html)
	)
	/edit.html "Create this page")
	)
	/edit "Create this page")
	)
	http://wiki.hw2.info/
	nothing
Make old links usable:
	(http://
	(http://web.archive.org/*/
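For readers who prefer a script to the editor's find and replace dialog, a few of the rules above can be sketched in Python. This is a hedged illustration only: `RULES` and `apply_rules` are hypothetical names, only a subset of the documented rules is included, and the original work was done in Visual Studio Code, not with this code. The order matters, since the `http://wiki.hw2.info/` prefix has to be stripped before the remaining `(http://` links are pointed at the Web Archive.

```python
import re

# (text to find, text to replace, is_regex), in the order listed above.
RULES = [
    (r"^Homeworld 2 : \[", "**Homeworld Remastered Karos Graveyard** : [", True),
    (r"^  - ", "- ", True),
    (".html)", ")", False),
    ("http://wiki.hw2.info/", "", False),
    ("(http://", "(http://web.archive.org/*/", False),
]


def apply_rules(text, rules=RULES):
    """Apply each find/replace rule to the text in sequence."""
    for pattern, replacement, is_regex in rules:
        if is_regex:
            # VS Code matches ^ at the start of every line; re.MULTILINE does the same.
            # A function replacement keeps the replacement text literal.
            text = re.sub(pattern, lambda _m, r=replacement: r, text, flags=re.MULTILINE)
        else:
            text = text.replace(pattern, replacement)
    return text
```

Keeping the rules in a list mirrors the find/replace pairs documented above and makes the order of operations explicit.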
  8. Ran many additional find and replace operations as needed. The results can be seen in the Revision History for May and June 2020.
  9. Some multi-line code formatting did not convert well. Opened the directory containing all original HTML pages in Visual Studio Code and ran a series of RegEx find operations on all pages as documented below, then manually corrected the code in the corresponding Markdown files.
^[^t]*?</tt>
manual fix
^[^>]*?</tt>
manual fix
"code"
manual fix
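The three searches above can be approximated with a short Python scan that lists the lines needing a manual look. This is a sketch under assumptions: `find_suspect_lines` is a hypothetical helper, and reading the first two patterns as "a closing `</tt>` with no opening tag earlier on the same line" is an interpretation, not something the page states.

```python
import re

# The three searches listed above; the last one is a literal match.
PATTERNS = [
    re.compile(r"^[^t]*?</tt>"),
    re.compile(r"^[^>]*?</tt>"),
    re.compile(re.escape('"code"')),
]


def find_suspect_lines(html_text):
    """Return (line_number, line) pairs that match any of the patterns."""
    hits = []
    for number, line in enumerate(html_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line))
    return hits
```

Run over each original HTML file, this gives a checklist of lines to compare against the corresponding Markdown page.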
  10. As of this writing, the conversion appears to have worked well. The most notable loss is the HTML indent formatting: it did not appear possible to properly convert the old, inconsistent HTML indents into Markdown bullet trees, and attempts produced significant inconsistencies. Some pages may therefore need to be manually formatted with Markdown bullet trees.

Comments

Page Status

Updated Formatting? Yes
Updated for HWRM? Yes
