07 Special characters and escaping characters - practicalseries/GitHub-Wiki-Design-and-Implementation GitHub Wiki
GitHub uses the Segoe UI font (pronounced seg-o-ee) as standard across all its Wiki pages (at least it does on Windows-based browsers). Segoe UI is a large font with a very large number of characters, all of which can be accessed by using escape codes.
“Escaping” a character is a term that applies to special characters that would otherwise be used to format the text in some way (asterisks, for example); it allows the character to be displayed as a character rather than being interpreted as a formatting instruction.
This “escaping” process takes different forms for different languages (Markdown, HTML &c.), but always results in the true character being rendered.
This escaping process allows the full range of the Segoe UI font characters to be accessed (not just the ones that can be entered from a keyboard). Things like this:
โโโโ โ โ โ โฌ
There is a spreadsheet with the full Segoe UI character set here:
It’s a big list: there are 40,000 characters in it (not all of which render on GitHub, but 26,684 of them do).
These are also listed in Appendix C of this Wiki.
Markdown has a mechanism for displaying characters that would otherwise be used to format text (asterisks, for example); this mechanism is called “escaping the character”. It is also possible to use a Unicode value to display a specific character (in either decimal or hexadecimal format), and all the HTML symbol codes (the ones that begin with an ampersand) are also supported.
With Markdown, to display a literal character (i.e. to make the character appear in the text rather than format the text), precede it with a backslash character `\`.
For example, if the following Markdown text were used:
Markdown and GitHub output |
---|
`* Without a backslash this is rendered as a list.` |
• Without a backslash this is rendered as a list. |
Section 8 explains lists. The point here is that if we wish to display the asterisk as an asterisk, we need to escape it (by adding a backslash):
Markdown and GitHub output |
---|
`\* With a backslash it renders as an asterisk.` |
* With a backslash it renders as an asterisk. |
The following characters can all be “escaped” by placing a backslash before the character:
Character | Name | Escape symbol |
---|---|---|
`\` | Backslash | `\\` |
`` ` `` | Backtick | `` \` `` |
`*` | Asterisk | `\*` |
`_` | Underscore | `\_` |
`{ }` | Braces | `\{ \}` |
`[ ]` | Brackets | `\[ \]` |
`< >` | Angle brackets | `\< \>` |
`( )` | Parentheses | `\( \)` |
`#` | Hash sign | `\#` |
`+` | Plus sign | `\+` |
`-` | Minus sign (hyphen) | `\-` |
`.` | Full stop | `\.` |
`!` | Exclamation mark | `\!` |
`\|` | Pipe | `\\|` |
Table 7.1 – Markdown escapable characters
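The backslash rule can be applied mechanically. The following is a minimal Python sketch, not anything GitHub provides: the names `MARKDOWN_SPECIALS` and `escape_markdown` are made up for illustration, and the character set is simply copied from Table 7.1.

```python
import re

# Characters from Table 7.1 (copied from the table, not taken from any library).
MARKDOWN_SPECIALS = r"\`*_{}[]<>()#+-.!|"


def escape_markdown(text: str) -> str:
    """Put a backslash in front of every Table 7.1 character found in text."""
    # re.escape makes each character safe to use inside a regex character class.
    pattern = "[" + re.escape(MARKDOWN_SPECIALS) + "]"
    return re.sub(pattern, lambda match: "\\" + match.group(0), text)


print(escape_markdown("* not a list item"))
```

This escapes every occurrence, which is more than Markdown strictly needs (a full stop in the middle of a sentence is harmless, for instance), but the result still renders as the literal characters.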
Like Markdown, HTML has reserved characters, mostly the less than `<` and greater than `>` signs. These can also be escaped in HTML by using escape sequences.
There are several of these reserved characters in HTML:
Name | Character | Replacement code |
---|---|---|
Less than | `<` | `&lt;` |
Greater than | `>` | `&gt;` |
Ampersand | `&` | `&amp;` |
Double quotation mark | `"` | `&quot;` |
Single quotation mark | `'` | `&apos;` |
Table 7.2 – HTML reserved characters and escape sequences
HTML provides a series of escape sequences (sometimes called symbol codes) that start with an ampersand `&`, followed by a meaningful group of characters (well, meaningful in a way, some require a degree of interpretation) and ending with a semicolon `;`. For example, the escape sequence for a less than symbol `<` is `&lt;`.
Escape sequences always start with the ampersand character `&` and end with a semicolon `;`.
Basically, HTML escape sequences are a group of characters that are translated by the browser into a specific symbol.
Whenever the browser comes across the sequence of characters `&lt;`, it will display a less than sign `<`.
There are many other HTML escape sequences for characters not accessible via the keyboard; `&mu;`, for example, displays the Greek mu character μ. Appendix A contains a full list of all HTML escape sequences.
All of these HTML escape sequences work in Markdown; just put them in the text and GitHub will display them correctly.
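To see the translation in both directions without a browser, Python's standard `html` module can stand in for one. This is only an illustrative sketch; the sample string is made up.

```python
import html

raw = 'if a < b & b > c: print("ok")'

# html.escape replaces the reserved characters with escape sequences
# (quote=True is the default, so double and single quotes are replaced too).
escaped = html.escape(raw)
print(escaped)

# html.unescape does what the browser does: it turns sequences back into characters.
print(html.unescape("&mu; &lt; &amp;"))
```

Note that `html.escape` writes a single quote as the hexadecimal code `&#x27;` rather than as a named sequence.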
The escape sequences of the previous section are one way of displaying reserved and non-keyboard characters. These escape sequences are intended to be intuitive mnemonics for the symbols they represent (that said, I usually have to look them up). The problem is that not every character has one.
It is possible to use the Unicode value of the character as an “escape code” (as opposed to an escape sequence).
Every character that can be displayed has a Unicode value (a number known as a code point), which is usually stored using the Unicode Transformation Format-8 (UTF-8; see footnote 1). For example, the letter “A” has a Unicode value of `65`, “B” `66`, &c.; there is a full list on Wikipedia: https://en.wikipedia.org/wiki/List_of_Unicode_characters.
Appendix C contains a spreadsheet with the full character set.
In HTML and GitHub Flavoured Markdown, any character can be entered by using its Unicode value as an escape code. HTML escape codes are preceded by the ampersand and hash characters `&#` and finished with a semicolon `;`.
Continuing the previous example, the Unicode value for the letter “A” is `65` (decimal). To enter the letter “A” in HTML using an escape code, use the following:
`&#65;`
GitHub Markdown accepts the use of both HTML escape sequences and escape codes.
The following tables give a list of common escape sequences and escape codes. Appendix A has a complete list of all HTML escape sequences and codes (they mostly all work in GitHub Markdown; there are some exceptions though, and these are listed in section 7.2.1).
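Symbol | Mathematical | HTML | Code | Symbol | Mathematical | HTML | Code |
---|---|---|---|---|---|---|---|
× | Multiplication sign | `&times;` | `&#215;` | ∩ | Intersection | `&cap;` | `&#8745;` |
÷ | Division sign | `&divide;` | `&#247;` | ∫ | Integral | `&int;` | `&#8747;` |
− | Minus sign | `&minus;` | `&#8722;` | ≈ | Almost equal to | `&asymp;` | `&#8776;` |
± | Plus/minus sign | `&plusmn;` | `&#177;` | ≠ | Not equal to | `&ne;` | `&#8800;` |
⁄ | Fraction slash | `&frasl;` | `&#8260;` | ≡ | Identical to | `&equiv;` | `&#8801;` |
∏ | N-ary product | `&prod;` | `&#8719;` | < | Less than | `&lt;` | `&#60;` |
∑ | N-ary summation | `&sum;` | `&#8721;` | > | Greater than | `&gt;` | `&#62;` |
√ | Square root | `&radic;` | `&#8730;` | ≤ | Less than or equal to | `&le;` | `&#8804;` |
∞ | Infinity | `&infin;` | `&#8734;` | ≥ | Greater than or equal to | `&ge;` | `&#8805;` |

Symbol | HTML reserved | HTML | Code | Symbol | HTML reserved | HTML | Code |
---|---|---|---|---|---|---|---|
< | Less than | `&lt;` | `&#60;` | " | Quotation mark | `&quot;` | `&#34;` |
> | Greater than | `&gt;` | `&#62;` | ' | Single quote | `&apos;` | `&#39;` |
& | Ampersand | `&amp;` | `&#38;` | | | | |

Symbol | Miscellaneous | HTML | Code | Symbol | Miscellaneous | HTML | Code |
---|---|---|---|---|---|---|---|
← | Leftwards arrow | `&larr;` | `&#8592;` | ¦ | Broken vertical bar | `&brvbar;` | `&#166;` |
↑ | Upwards arrow | `&uarr;` | `&#8593;` | ° | Degree sign | `&deg;` | `&#176;` |
→ | Rightwards arrow | `&rarr;` | `&#8594;` | · | Middle dot | `&middot;` | `&#183;` |
↓ | Downwards arrow | `&darr;` | `&#8595;` | • | Bullet | `&bull;` | `&#8226;` |
↔ | Left right arrow | `&harr;` | `&#8596;` | | | | |

Symbol | Spacing | HTML | Code | Symbol | Spacing | HTML | Code |
---|---|---|---|---|---|---|---|
█&emsp;█ | Em space | `&emsp;` | `&#8195;` | █ █ | Space | | `&#32;` |
█&numsp;█ | Number space | `&numsp;` | `&#8199;` | █&emsp14;█ | Em/4 space | `&emsp14;` | `&#8197;` |
█&ensp;█ | En space | `&ensp;` | `&#8194;` | █&puncsp;█ | Punctuation space | `&puncsp;` | `&#8200;` |
█&emsp13;█ | Em/3 space | `&emsp13;` | `&#8196;` | █&thinsp;█ | Thin space | `&thinsp;` | `&#8201;` |
█&nbsp;█ | Non-breaking space | `&nbsp;` | `&#160;` | █&hairsp;█ | Hair space | `&hairsp;` | `&#8202;` |

Symbol | Currency | HTML | Code | Symbol | Currency | HTML | Code |
---|---|---|---|---|---|---|---|
$ | Dollar | `&dollar;` | `&#36;` | ¢ | Cent sign | `&cent;` | `&#162;` |
£ | Pound sign | `&pound;` | `&#163;` | ¥ | Yen | `&yen;` | `&#165;` |
€ | Euro sign | `&euro;` | `&#8364;` | ¤ | Currency sign | `&curren;` | `&#164;` |

Symbol | Numbers | HTML | Code | Symbol | Numbers | HTML | Code |
---|---|---|---|---|---|---|---|
¹ | Superscript one | `&sup1;` | `&#185;` | ½ | Fraction one half | `&frac12;` | `&#189;` |
² | Superscript two | `&sup2;` | `&#178;` | ¼ | Fraction one quarter | `&frac14;` | `&#188;` |
³ | Superscript three | `&sup3;` | `&#179;` | ¾ | Fraction three quarters | `&frac34;` | `&#190;` |

Symbol | Punctuation | HTML | Code | Symbol | Punctuation | HTML | Code |
---|---|---|---|---|---|---|---|
¡ | Inverted exclamation mark | `&iexcl;` | `&#161;` | … | Horizontal ellipsis | `&hellip;` | `&#8230;` |
¿ | Inverted question mark | `&iquest;` | `&#191;` | ‾ | Overline | `&oline;` | `&#8254;` |
“ | Left double quote | `&ldquo;` | `&#8220;` | § | Section sign | `&sect;` | `&#167;` |
” | Right double quote | `&rdquo;` | `&#8221;` | ¶ | Paragraph sign | `&para;` | `&#182;` |
„ | Double low-9 quote | `&bdquo;` | `&#8222;` | © | Copyright sign | `&copy;` | `&#169;` |
‘ | Left single quote | `&lsquo;` | `&#8216;` | ® | Registered trademark sign | `&reg;` | `&#174;` |
’ | Right single quote | `&rsquo;` | `&#8217;` | ™ | Trademark sign | `&trade;` | `&#8482;` |
‚ | Single low-9 quote | `&sbquo;` | `&#8218;` | ¬ | Not sign | `&not;` | `&#172;` |
◊ | Lozenge | `&loz;` | `&#9674;` | µ | Micro sign | `&micro;` | `&#181;` |
« | Left double angle quote | `&laquo;` | `&#171;` | ‰ | Per mille sign | `&permil;` | `&#8240;` |
» | Right double angle quote | `&raquo;` | `&#187;` | ′ | Prime (straight quote) | `&prime;` | `&#8242;` |
‹ | Single left angle quote | `&lsaquo;` | `&#8249;` | ″ | Double prime (straight quote) | `&Prime;` | `&#8243;` |
› | Single right angle quote | `&rsaquo;` | `&#8250;` | † | Dagger | `&dagger;` | `&#8224;` |
– | En dash | `&ndash;` | `&#8211;` | ‡ | Double dagger | `&Dagger;` | `&#8225;` |
— | Em dash | `&mdash;` | `&#8212;` | | | | |

Symbol | Greek small letters | HTML | Code | Symbol | Greek capital letters | HTML | Code |
---|---|---|---|---|---|---|---|
α | Alpha | `&alpha;` | `&#945;` | Α | Alpha | `&Alpha;` | `&#913;` |
β | Beta | `&beta;` | `&#946;` | Β | Beta | `&Beta;` | `&#914;` |
γ | Gamma | `&gamma;` | `&#947;` | Γ | Gamma | `&Gamma;` | `&#915;` |
δ | Delta | `&delta;` | `&#948;` | Δ | Delta | `&Delta;` | `&#916;` |
ε | Epsilon | `&epsilon;` | `&#949;` | Ε | Epsilon | `&Epsilon;` | `&#917;` |
ζ | Zeta | `&zeta;` | `&#950;` | Ζ | Zeta | `&Zeta;` | `&#918;` |
η | Eta | `&eta;` | `&#951;` | Η | Eta | `&Eta;` | `&#919;` |
θ | Theta | `&theta;` | `&#952;` | Θ | Theta | `&Theta;` | `&#920;` |
ι | Iota | `&iota;` | `&#953;` | Ι | Iota | `&Iota;` | `&#921;` |
κ | Kappa | `&kappa;` | `&#954;` | Κ | Kappa | `&Kappa;` | `&#922;` |
λ | Lambda | `&lambda;` | `&#955;` | Λ | Lambda | `&Lambda;` | `&#923;` |
μ | Mu | `&mu;` | `&#956;` | Μ | Mu | `&Mu;` | `&#924;` |
ν | Nu | `&nu;` | `&#957;` | Ν | Nu | `&Nu;` | `&#925;` |
ξ | Xi | `&xi;` | `&#958;` | Ξ | Xi | `&Xi;` | `&#926;` |
ο | Omicron | `&omicron;` | `&#959;` | Ο | Omicron | `&Omicron;` | `&#927;` |
π | Pi | `&pi;` | `&#960;` | Π | Pi | `&Pi;` | `&#928;` |
ρ | Rho | `&rho;` | `&#961;` | Ρ | Rho | `&Rho;` | `&#929;` |
ς | Sigma 1 | `&sigmaf;` | `&#962;` | Σ | Sigma | `&Sigma;` | `&#931;` |
σ | Sigma 2 | `&sigma;` | `&#963;` | | | | |
τ | Tau | `&tau;` | `&#964;` | Τ | Tau | `&Tau;` | `&#932;` |
υ | Upsilon | `&upsilon;` | `&#965;` | Υ | Upsilon | `&Upsilon;` | `&#933;` |
φ | Phi | `&phi;` | `&#966;` | Φ | Phi | `&Phi;` | `&#934;` |
χ | Chi | `&chi;` | `&#967;` | Χ | Chi | `&Chi;` | `&#935;` |
ψ | Psi | `&psi;` | `&#968;` | Ψ | Psi | `&Psi;` | `&#936;` |
ω | Omega | `&omega;` | `&#969;` | Ω | Omega | `&Omega;` | `&#937;` |

Table 7.3 – HTML common escape sequences and codes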
The escape codes listed above use decimal numbers for the Unicode characters, for example `&#65;`, where `65` is the decimal value of the Unicode number for “A”.
The escape codes can also be given in hexadecimal format and this will work within GitHub Markdown and Wiki pages. The hexadecimal equivalent of `65` is `41`. To use the hexadecimal number in an escape code, precede it with `&#x` and follow it with a semicolon `;`. Thus:
`&#65;` and `&#x41;` both display the “A” character.
Tip
Being able to use hexadecimal notation is useful, simply because Unicode characters are generally given in hexadecimal format. `U+0041` is Unicode for “A”.
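Both forms of escape code can be worked out from the character itself. Here is a small Python sketch of the arithmetic; the function name `escape_codes` is just an illustrative choice.

```python
def escape_codes(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal HTML escape codes for one character."""
    code_point = ord(char)  # the Unicode value, e.g. 65 for "A"
    return f"&#{code_point};", f"&#x{code_point:X};"


print(escape_codes("A"))
```

For “A” this gives `&#65;` and `&#x41;`, matching the example above.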
Markdown (and GitHub Markdown) ignores multiple spaces. In the following example, the two words “TEST” are separated by five spaces:
Markdown, HTML equivalence and GitHub output |
---|
`TEST     TEST` |
TEST TEST |
Table 7.4 – Markdown ignores multiple consecutive spaces
Markdown simply ignores the multiple, consecutive spaces.
Markdown does not, however, ignore the non-breaking space character `&nbsp;`; this has exactly the same spacing as a normal space character, but will always be rendered by Markdown.
This is the same example with five non-breaking spaces between the two words “TEST”:
Markdown, HTML equivalence and GitHub output |
---|
`TEST&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;TEST` |
TEST&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;TEST |
Table 7.5 – Markdown does not ignore multiple consecutive non-breaking spaces
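Typing out runs of `&nbsp;` by hand gets tedious. The following Python sketch shows one way to generate them; `preserve_spaces` is a hypothetical helper, and leaving single spaces alone is my own assumption about what is wanted.

```python
import re


def preserve_spaces(text: str) -> str:
    """Replace every space in a run of two or more spaces with &nbsp;."""
    return re.sub(r" {2,}", lambda run: "&nbsp;" * len(run.group(0)), text)


print(preserve_spaces("TEST     TEST"))
```

Single spaces are left as ordinary spaces so that normal line wrapping still works.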
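Markdown supports several such space characters. The full list is below; the spaces are bounded by full blocks to give an idea of the width of each type of space, and the second column shows four of each type of space to emphasize the different relative sizes:
Single Space | Four Spaces | Name | Escape sequence |
---|---|---|---|
█&emsp;█ | █&emsp;&emsp;&emsp;&emsp;█ | Em space | `&emsp;` |
█&numsp;█ | █&numsp;&numsp;&numsp;&numsp;█ | Number space | `&numsp;` |
█&ensp;█ | █&ensp;&ensp;&ensp;&ensp;█ | En space | `&ensp;` |
█&emsp13;█ | █&emsp13;&emsp13;&emsp13;&emsp13;█ | Em/3 space | `&emsp13;` |
█&nbsp;█ | █&nbsp;&nbsp;&nbsp;&nbsp;█ | Non-breaking space | `&nbsp;` |
█&emsp14;█ | █&emsp14;&emsp14;&emsp14;&emsp14;█ | Em/4 space | `&emsp14;` |
█&puncsp;█ | █&puncsp;&puncsp;&puncsp;&puncsp;█ | Punctuation space | `&puncsp;` |
█&thinsp;█ | █&thinsp;&thinsp;&thinsp;&thinsp;█ | Thin space | `&thinsp;` |
█&hairsp;█ | █&hairsp;&hairsp;&hairsp;&hairsp;█ | Hair space | `&hairsp;` |
Table 7.6 – Different spaces and relative widths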
These different size spaces are used extensively in the PracticalSeries Wiki page headings and tables of contents to ensure that the gaps between the heading numbers and the heading text are consistent.
The size of the gap between the heading number on the left and the heading text on the right depends on how many digits there are (90.10.20 has six digits, 1.2.4 only has three; both are valid section numbers, but the first will have a smaller space between the last full stop and the heading text).
The width of each type of space depends on where the space is used. The space in a heading (all headings are different) is generally larger than the same space used in body text. Similarly, if the text is in a sidebar or footer, the spacings are again different for headings and body text.
The following tables give the width in pixels of each different type of space character for all headings and body text in both the main page area and sidebars/footers (there is no difference between space sizes in sidebars and in footers, they are the same).
Space widths in pixels for the main page | H1 | H2 | H3 | H4 | H5 | H6 | Body text |
---|---|---|---|---|---|---|---|
Em space | 32.00 | 24.00 | 20.00 | 16.00 | 14.00 | 13.60 | 16.00 |
Number space | 17.77 | 13.33 | 11.10 | 8.87 | 7.77 | 7.53 | 8.63 |
En space | 16.00 | 12.00 | 10.00 | 8.00 | 7.00 | 6.80 | 8.00 |
Em/3 space | 10.63 | 8.00 | 6.63 | 5.33 | 4.67 | 4.53 | 5.33 |
Normal space | 8.80 | 6.60 | 5.47 | 4.40 | 3.87 | 3.73 | 4.37 |
Non-breaking space | 8.80 | 6.60 | 5.47 | 4.40 | 3.87 | 3.73 | 4.37 |
Em/4 space | 8.00 | 6.00 | 5.00 | 4.00 | 3.50 | 3.40 | 4.00 |
Punctuation space | 7.70 | 5.80 | 4.80 | 3.87 | 3.37 | 3.27 | 3.47 |
Thin space | 6.40 | 4.83 | 4.00 | 3.20 | 2.80 | 2.70 | 3.20 |
Hair space | 4.00 | 3.00 | 2.50 | 2.00 | 1.77 | 1.70 | 2.00 |
Two blocks ██ | 47.00 | 35.00 | 30.00 | 24.00 | 21.00 | 21.00 | 24.00 |
Table 7.7 – Space widths in the main page (in pixels)
Space widths in pixels for sidebars and footers | H1 | H2 | H3 | H4 | H5 | H6 | Body text |
---|---|---|---|---|---|---|---|
Em space | 24.00 | 18.00 | 15.00 | 12.00 | 10.50 | 10.20 | 12.00 |
Number space | 13.33 | 10.00 | 8.30 | 6.67 | 5.83 | 5.63 | 6.47 |
En space | 12.00 | 9.00 | 7.50 | 6.00 | 5.27 | 5.10 | 6.00 |
Em/3 space | 8.00 | 6.00 | 4.97 | 4.00 | 3.50 | 3.40 | 4.00 |
Normal space | 6.60 | 4.93 | 4.10 | 3.30 | 2.90 | 2.80 | 3.30 |
Non-breaking space | 6.60 | 4.93 | 4.10 | 3.30 | 2.90 | 2.80 | 3.30 |
Em/4 space | 6.00 | 4.50 | 3.73 | 3.00 | 2.63 | 2.53 | 3.00 |
Punctuation space | 5.80 | 4.33 | 3.60 | 2.90 | 2.53 | 2.43 | 2.60 |
Thin space | 4.83 | 3.60 | 3.00 | 2.40 | 2.10 | 2.03 | 2.40 |
Hair space | 3.00 | 2.23 | 1.87 | 1.47 | 1.33 | 1.27 | 1.50 |
Two blocks ██ | 35.00 | 27.00 | 23.00 | 18.00 | 16.00 | 16.00 | 18.00 |
Table 7.8 – Space widths in sidebars and footers (in pixels)
Note
All widths in the above tables are measured using the Edge browser with page magnification set to 100% on a monitor set to its native resolution (2560 × 1440 px).
For some reason, some HTML escape sequences do not work in GitHub Wiki Markdown.
This is true only when the escape sequences are between HTML tags, i.e. in a table `<table>…</table>` or between `<p>…</p>` tags, for example.
Mainly this affects some of the special space characters:
Name | Non-functional Esc sequence | Replacement Esc code (dec) | Replacement Esc code (hex) |
---|---|---|---|
Number space | `&numsp;` | `&#8199;` | `&#x2007;` |
Em/3 space | `&emsp13;` | `&#8196;` | `&#x2004;` |
Em/4 space | `&emsp14;` | `&#8197;` | `&#x2005;` |
Punctuation space | `&puncsp;` | `&#8200;` | `&#x2008;` |
Hair space | `&hairsp;` | `&#8202;` | `&#x200A;` |
Table 7.9 – Escape sequences that do not work in GitHub Markdown HTML
Important
This is only a partial list of the most common escape sequences; a full list is available in Appendix A.2.
The alternate decimal and hexadecimal escape codes work everywhere.
Note
The above escape sequences work perfectly well with just Markdown, it is only when they are inside an HTML tag that problems occur.
To complicate things, it is only Wiki Markdown that is affected, all the escape sequences work perfectly well in repository Markdown, see section 5.6.
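If a Wiki page already has these space sequences inside HTML tags, swapping them for the decimal codes is a simple find and replace. The sketch below is only illustrative: `NAMED_TO_DECIMAL` and `use_numeric_codes` are made-up names, and the mapping assumes the five affected sequences are the standard HTML5 names for the spaces listed in Table 7.9.

```python
# Assumed mapping: standard HTML5 names for the five spaces in Table 7.9,
# paired with the decimal code for the same Unicode character.
NAMED_TO_DECIMAL = {
    "&numsp;": "&#8199;",   # number space, U+2007
    "&emsp13;": "&#8196;",  # em/3 space, U+2004
    "&emsp14;": "&#8197;",  # em/4 space, U+2005
    "&puncsp;": "&#8200;",  # punctuation space, U+2008
    "&hairsp;": "&#8202;",  # hair space, U+200A
}


def use_numeric_codes(html_fragment: str) -> str:
    """Swap the affected named sequences for their decimal escape codes."""
    for named, numeric in NAMED_TO_DECIMAL.items():
        html_fragment = html_fragment.replace(named, numeric)
    return html_fragment


print(use_numeric_codes("<p>1.2&numsp;Heading text</p>"))
```

Sequences that are not in the mapping (`&nbsp;`, for example) pass through untouched.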
Emojis and emoticons are pictograms that can be embedded in text to convey some form of emotion, smiley face symbols, that sort of thing. They are popular with teenagers and the intellectually challenged.
GitHub supports a full set of Unicode emojis. These can be pasted directly into a Wiki or Markdown page, entered using short name abbreviations, or entered as either decimal escape codes `&#…;` or hexadecimal escape codes `&#x…;`.
There is a standard version of the short names that can be used for emojis, these are managed by the Unicode CLDR (Common Locale Data Repository), available here: https://cldr.unicode.org/.
The Unicode CLDR provides a full list of all emoji characters, their Unicode character (or string of characters) and the formal short form name, the list is available here: https://unicode.org/emoji/charts/full-emoji-list.html.
GitHub allows short names to be used; these are surrounded by a colon `:` before and after. Thus, the crossed fingers emoji is displayed in Markdown with the short name:
`:crossed_fingers:`
It looks like this: 🤞
The problem with this approach is that GitHub, in its wisdom, decided not to use the standardised (Unicode CLDR) short names; it uses its own versions with slightly different names.
I thought at first this was so that GitHub could use shorter names than the standard CLDR, for example where the CLDR has the name `grinning face` (😀) and GitHub just has `:grinning:`.
This argument falls down with the CLDR `smiling face with hearts` and the GitHub `:smiling_face_with_three_hearts:` (🥰). So I’ve no idea why GitHub have differed.
Appendix B contains a full list of all the emojis. For completeness, it shows both the GitHub short name and the standardised CLDR short name, the decimal escape code and the hexadecimal escape code.
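Because an emoji can be a single Unicode character or a string of them, its escape code is really one code per character, written one after another. A minimal Python sketch of this follows; `emoji_escape_codes` is an illustrative name, and the emoji are written as `\U…` escapes only to keep the source plain ASCII.

```python
def emoji_escape_codes(emoji: str) -> tuple[str, str]:
    """Return decimal and hexadecimal escape codes, one per Unicode character."""
    decimal = "".join(f"&#{ord(char)};" for char in emoji)
    hexadecimal = "".join(f"&#x{ord(char):X};" for char in emoji)
    return decimal, hexadecimal


# Crossed fingers is a single character, U+1F91E.
print(emoji_escape_codes("\U0001F91E"))

# Thumbs up with a skin tone modifier is a string of two characters.
print(emoji_escape_codes("\U0001F44D\U0001F3FD"))
```

The first call gives `&#129310;` and `&#x1F91E;`; the second gives two codes back to back in each format.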
It is possible to insert comments in Markdown text.
Comments are visible in the Markdown source, but are not displayed when the page is rendered (in a web browser).
Comments in Markdown are identical to those in HTML.
Any text between `<!--` and `-->` is a comment and will not be displayed:
Markdown, HTML equivalence and GitHub output |
---|
`Comments <!-- Like this --> are not displayed` |
`<p>Comments <!-- Like this --> are not displayed</p>` |
Comments are not displayed |
Table 7.10 – Body text examples
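The visible text that survives can be previewed by stripping everything between the comment markers. This is a rough Python sketch with a regular expression, not how GitHub actually renders the page; `strip_comments` is a made-up name.

```python
import re


def strip_comments(markdown: str) -> str:
    """Remove <!-- ... --> comments, including ones that span several lines."""
    return re.sub(r"<!--.*?-->", "", markdown, flags=re.DOTALL)


print(strip_comments("Comments <!-- Like this --> are not displayed"))
```

The result keeps the spaces on both sides of the comment, so there are two spaces between “Comments” and “are”; the browser collapses these into one when the page is displayed.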
I think I’m one of the few people who bother putting comments in their Markdown.
Footnotes:
Note
1 UTF-8 is a Unicode character encoding that is backwards compatible with the old 7-bit ASCII characters that those of us of a certain age will remember. The 8 means it uses 8-bit blocks (bytes to most people, but octets in the Unicode standard) to represent characters; it can use up to 4 bytes per character and can represent all Unicode characters (there are a lot of them, ’bout a million).
UTF-8 is the standard character encoding for web pages and E-mail.