Sanitizing - novoid/lazyblorg GitHub Wiki
The following table shows how Org-mode elements are sanitized
during HTMLization, as implemented in /lib/htmlizer.py →
sanitize_and_htmlize_blog_content(…). The columns are explained first.
- HTML chars: HTML characters
  - see sanitize_html_characters(…)
  - replaces `<`, `>`, `&`, and m-dash with their HTML representations
- Int. L.: Internal Links
  - see sanitize_internal_links(…)
  - replaces all internal Org-mode links of type `id:foo` or `[[id:foo][bar baz]]` with the relative paths to those blog articles
- Ext. L.: External Links
  - see sanitize_external_links(…)
  - replaces all external Org-mode links of type `[[foo][bar]]` with `<a href="foo">bar</a>` and rewrites plain URLs as HTML link tags as well
- Text Format: transforms simple text formatting syntax into HTML elements
  - see htmlize_simple_text_formatting(…)
  - see the help on basic text formatting
- URL Ampersands: fixes something I broke above
  - see fix_ampersands_in_url(…)
  - sanitize_html_characters(…) (mentioned above) is really dumb and replaces ampersands in URLs as well. This method finds those broken URLs and fixes them.
  - If fixing something afterwards that should have been done correctly in the first place smells funny, you are right. However, this seemed to be the more efficient way to implement it. Fix it, if you like :-)
  - NOTE: It does not replace several ampersands in the very same URL. However, several ampersands in one URL is a very rare use case.
- Pandoc: Org-mode to HTML conversion using pandoc
  - I introduced pandoc as a fall-back for converting Org-mode elements that are not yet supported. This turned out very well: great performance, great results. I might even think about moving the self-implemented HTMLization to pandoc.
  - I am using the Python package pypandoc.
- Templates involved: lazyblorg templates that are involved in the HTMLization process
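The interplay between the HTML chars step and the URL ampersand fix can be illustrated with a minimal sketch. This is not the code from /lib/htmlizer.py: the function names `escape_html_characters` and `fix_ampersand_in_urls`, the regular expression, and the choice of the em dash character (U+2014) for "m-dash" are all assumptions made for illustration.

```python
import re


def escape_html_characters(text):
    """Naively replace &, <, > and the em dash with HTML entities.

    Hypothetical stand-in for the "HTML chars" step; it is deliberately
    unaware of URLs, so ampersands inside URLs get escaped as well.
    """
    # '&' has to be handled first, otherwise the entities produced
    # by the following replacements would be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace("\u2014", "&mdash;")  # assumption: literal em dash
    return text


# Assumed pattern: an http(s) URL up to its first escaped ampersand.
BROKEN_URL = re.compile(r"(https?://[^\s<>\"]*?)&amp;")


def fix_ampersand_in_urls(text):
    """Turn the first '&amp;' inside each URL back into a plain '&'."""
    return BROKEN_URL.sub(r"\1&", text)


if __name__ == "__main__":
    raw = "Tom & Jerry: http://example.com/?a=1&b=2"
    print(fix_ampersand_in_urls(escape_html_characters(raw)))
```

Like the method described in the NOTE above, this sketch only repairs the first ampersand of a URL; any further ampersands in the same URL stay escaped.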
| Element | HTML chars | Int. L. | Ext. L. | Text Format | URL Ampersands | Pandoc | Templates involved |
|---|---|---|---|---|---|---|---|
| Paragraph | x | x | x | x | x | | #PAR-CONTENT# |
| Horizontal ruler | | | | | | | |
| Heading | x | x | x | x | x | | #SECTION-TITLE#, #SECTION-LEVEL# |
| List items | x | x | x | x | x | | #CONTENT# |
| HTML block | x | | | | | | #NAME# |
| Verse block | x | x | x | x | | | #NAME# |
| Example block | x | | | | | | #NAME# |
| Colon block | x | | | | | | #NAME# |
| Quote block | x | x | x | x | | | #NAME# |
| Src block | x | | | | | | #NAME# |
| Table | x | x | | | | | |
| LaTeX block | x | x | | | | | |
| Others | x | x | | | | | |
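For the elements marked in the Ext. L. column, the link rewriting could look roughly like the following sketch. It is an illustration only and not the implementation of sanitize_external_links(…): the function name `htmlize_external_links`, the pattern, and the decision to skip `id:` targets (leaving them for the internal link step) are assumptions.

```python
import re

# Assumed pattern: either an Org-mode bracket link [[target][label]]
# whose target is not an internal "id:" link, or a plain http(s) URL.
LINK = re.compile(
    r"\[\[(?!id:)(?P<target>[^\]]+)\]\[(?P<label>[^\]]+)\]\]"
    r"|(?P<plain>https?://[^\s<>\"\]]+)"
)


def htmlize_external_links(text):
    """Rewrite bracket links and plain URLs as HTML anchor tags.

    Both cases are handled in a single pass, so a URL that already
    became part of an anchor tag is never processed a second time.
    """

    def to_anchor(match):
        if match.group("plain"):
            url = match.group("plain")
            return f'<a href="{url}">{url}</a>'
        return f'<a href="{match.group("target")}">{match.group("label")}</a>'

    return LINK.sub(to_anchor, text)


if __name__ == "__main__":
    print(htmlize_external_links(
        "See [[http://example.com][my site]] or http://orgmode.org for details"
    ))
```

A simplification worth noting: punctuation that directly follows a plain URL, such as a trailing full stop, would become part of the link in this sketch.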