Editorial Guidelines - DCMLab/ddd GitHub Wiki

Editorial Guidelines

1. General points

  1. We are not interested in a digital edition that reproduces the sources as closely as possible but in an edition that represents the content of the texts. As such, we do not transcribe title pages, tables of contents, indices etc. Those can be automatically generated from metadata and structural markup.
  2. Tables of contents and other types of non-content pages (e.g. index, corrections) will not be transcribed.
  3. In documents which contain a "corrections" page, the corrections will be carried out in the transcription, if possible.
  4. Sometimes there is something printed on the pages that stems from the printing or digitization procedure and which is not part of the text written by the author (e.g. "Riemann, Harmonielehre, 3", "Digitized by Google", etc.). We do not transcribe this.
  5. Separators, such as long lines or a sequence of asterisks between chapters or illustrative images are not to be transcribed.

2. Segmentation and use of Structure Types

with element Type = TextRegion

  • page-number (should only contain an arabic or roman numeral in transcription, disregarding graphical elaborations such as "- 6 -" or "1.")
  • heading (for all headings regardless of level)
  • paragraph (the beginning of a new paragraph is often indicated by the space at the beginning of the line)
  • paragraph-continued (continued tags are to be used if the selected region does not contain the end of the unit, but is going to be continued after a floating element, list or page-break.)
  • footnote (each footnote receives its own markup.)
  • footnote-continued (if a footnote is interrupted by a float or continued on the next page)
  • list (for vertically structured enumerations with two or more elements)
  • list-continued
  • caption
  • other (mostly has been used for title pages and such)
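The page-number rule above can be sketched as a small helper. This is only an illustration and not part of the project's tooling: the function name `clean_page_number` is hypothetical, and it assumes the input is the raw text of a region already tagged as page-number.

```python
import re

# Hypothetical helper, not part of Transkribus or the project's tooling.
_ARABIC = re.compile(r"\d+")
_ROMAN = re.compile(r"\b[IVXLCDM]+\b", re.IGNORECASE)


def clean_page_number(raw):
    """Return only the numeral of a page-number region, or None if there is none.

    Graphical elaborations such as dashes or a trailing period are dropped.
    """
    match = _ARABIC.search(raw)
    if match:
        return match.group()
    # Assumption: a page-number region without digits holds a roman numeral.
    match = _ROMAN.search(raw)
    if match:
        return match.group()
    return None


print(clean_page_number("- 6 -"))
print(clean_page_number("1."))
```

A region for which the helper returns None probably does not hold a page number and should be checked by hand.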

with element Type = Graphic OR Music OR Math OR Table

  • floating (graphics and musical examples are only segmented if they are floating, meaning not in line with the text, otherwise see "6. Placeholders")
  • floating-continued
  • missing (Where a part of the original page appears to be missing from the scan, it will be indicated by using the structure type "missing", to be later supplemented from a different source => Weitzmann 1860, Oettingen)

3. Emphases

  1. We do transcribe the following textual markup: italics, boldface, and l e t t e r s p a c e d, possibly nested, e.g. "This is really, r e a l l y important". A change of font in the source (e.g. roman font interjected into Fraktur text) will not be represented as textual markup.
  2. Text alignment, such as centered text, is also not represented in the transcription.
  3. In terms of textual markup on emphases, headings, note names etc., the transcription will try to follow the stylistic choices of the original where possible, except when this interferes with one of the guidelines stated here.
  4. Drop caps are ignored in the transcription.

4. Punctuation

  1. No spaces before commas (","), semicolons (";"), colons (":"), question marks ("?"), exclamation marks ("!"), etc.
  2. Spaces around dashes ("-" and "--") as in the source.
  3. Punctuation after a highlighted unit should not be included in the textStyle tag, unless it is actually highlighted as well. If the highlighting concerns longer units (phrases, sentences, paragraphs), punctuation (with the exception of quotation marks) within the unit may be included in the highlighting up to before the final punctuation mark.
  4. Ratios such as "80:81", often appearing as "80 : 81" in texts, should be represented without spaces around the ":", at least until a way to use non-breaking spaces is found.
  5. When a source, due to being set in Fraktur, uses a double line as a hyphen/dash, the transcription will modernize it to a single dash (e.g. "C dur=Tonart" becomes "C dur-Tonart").
  6. All double quotation marks will be unified to the following symbols: opening: „ (Double low-9 quotation mark, U+201E); closing: ” (Right double quotation mark, U+201D), as found on Transkribus' virtual keyboard.
  7. Correspondingly, single quotation marks become ‚ (Single low-9 quotation mark, U+201A) and ’ (Right single quotation mark, U+2019).
  8. Only the beginning and end of a quote will be indicated by a quotation mark, disregarding the antiquated citation style where each line begins with an opening quotation mark.
  9. Symbols indicating an omission in a quote will be transcribed as they appear in the text, i.e. as periods separated by spaces: . . . . .
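Some of the punctuation rules above are mechanical enough to be checked or applied by a script. The following is a minimal sketch, not an existing project tool: the function names are hypothetical, the set of characters treated as double quotation marks is an assumption, and the quote unification assumes that double quotes are balanced and not nested. Periods are deliberately left untouched so that omission marks (". . .") survive.

```python
import re

# Rule 1: no spaces before these marks. Periods are excluded on purpose,
# because omission marks are transcribed as periods separated by spaces.
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([,;:?!])")

# Rule 4: a colon with spaces on both sides between two digits is a ratio.
_RATIO_COLON = re.compile(r"(?<=\d)[ \t]+:[ \t]+(?=\d)")

# Assumption: these characters count as double quotation marks in a source.
_DOUBLE_QUOTES = "\"\u201c\u201d\u201e\u00ab\u00bb"

OPENING = "\u201e"  # Double low-9 quotation mark
CLOSING = "\u201d"  # Right double quotation mark


def normalize_punctuation(text):
    """Close up ratios first, then remove spaces before punctuation."""
    text = _RATIO_COLON.sub(":", text)
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


def unify_double_quotes(text):
    """Replace double quotes alternately with the opening and closing symbol."""
    result = []
    opening = True
    for char in text:
        if char in _DOUBLE_QUOTES:
            result.append(OPENING if opening else CLOSING)
            opening = not opening
        else:
            result.append(char)
    return "".join(result)


print(normalize_punctuation("Dur , Moll ; 80 : 81 !"))
print(unify_double_quotes('Er sagt: "Tonart".'))
```

Because the quote handling only alternates between the two symbols, a text with an unbalanced or nested quote would be converted incorrectly, so the output should be proofread rather than trusted.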

5. Musical expressions

  1. For musical alteration symbols, the Unicode characters ♭ (not "b"), ♮, ♯ (not "#") and 𝄪 (double sharp) are used (Unicode: U+266D, U+266E, U+266F, U+1D12A).
  2. Musical time signatures in the text will be transcribed using Unicode fractions, as generated by the tool https://qaz.wtf/u/fraction.cgi?text=4%2F4. For the sake of consistency, the "over and under" form will be used in all circumstances, even where a single-codepoint form happens to exist. So "³⁄₄" will be used instead of "¾", since most common musical time signatures have no comparable single-codepoint form.
  3. Roman numerals, as used in harmonic analysis, will be written using the normal letters I, V, i and v in combination.
  4. Subscript/superscript as in source.
  5. Lines written above or below notes (notation used by: Kunkel, Oettingen) are represented using one } per line above and one { per line below, always immediately after the name of the note. For example, c{ would be a c with one line below it, while c}} would be a c with two lines above it. Other modifiers to the right side of the note name (e.g. superscript) always come after the braces.
  6. For primes added after note names, the Unicode symbols ′, ″ and ‴ are used (see Transkribus virtual keyboard). A double prime is neither two single primes (′′), nor a closing quotation mark (”). The low equivalent of primes, however, is represented using one or several single low-9 quotation marks (not a comma). See this example line from Oettingen, p. 13 [= image 21]:

[image: example line from Oettingen, p. 13]
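Two of the conventions in this section, the "over and under" time signatures and the brace notation for lines above or below a note, can be illustrated with a short sketch. The function names `time_signature` and `note_with_lines` are hypothetical and serve only as an illustration; the time signature is assumed to be built from superscript digits, the fraction slash (U+2044) and subscript digits, which matches the example "³⁄₄" given above.

```python
# Hypothetical helpers for illustration, not part of the project's tooling.
_SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
_SUBSCRIPT = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)
FRACTION_SLASH = "\u2044"


def time_signature(numerator, denominator):
    """Build the 'over and under' form, never a single-codepoint fraction."""
    upper = str(numerator).translate(_SUPERSCRIPT)
    lower = str(denominator).translate(_SUBSCRIPT)
    return upper + FRACTION_SLASH + lower


def note_with_lines(name, above=0, below=0, suffix=""):
    """Append one } per line above and one { per line below the note name.

    Any other modifier (passed as suffix) comes after the braces.
    """
    return name + "}" * above + "{" * below + suffix


print(time_signature(3, 4))
print(note_with_lines("c", above=2))
print(note_with_lines("c", below=1))
```

Generating the fraction from digits, rather than typing it by hand, keeps every time signature in the same form, including ones such as 12/8 that have no single-codepoint equivalent.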

6. Placeholders

  1. For music examples or graphical illustrations which appear in line with the text, the placeholders $$MUSIC and $$GRAPHIC are used respectively. They are written directly into the transcription text.

  2. If specific musical symbols and expressions are difficult or impossible to transcribe using Transkribus (other than the ones mentioned above), we will also signify them with $$GRAPHIC and at a later stage replace them with appropriate LaTeX markup, which should be powerful enough to cover all cases. For example, Riemann (1880), p. 37 has these symbols inline: [image], which we can transcribe as $\bcancel{g}^{\text{7}}$, $\bcancel{a}^{\text{VII}}$.

  3. If the expression is mathematical in nature, such as a fraction or a matrix, $$MATH is to be used.
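Since the placeholders are meant to be replaced at a later stage, it can help to locate them in an exported transcription. The sketch below is an assumption about how this could be done and not an existing project script; the function names are hypothetical, and the input is assumed to be the plain transcription text.

```python
import re
from collections import Counter

# The three placeholders defined in this section.
PLACEHOLDER = re.compile(r"\$\$(MUSIC|GRAPHIC|MATH)\b")


def find_placeholders(text):
    """Return (character offset, kind) for every placeholder in the text."""
    return [(match.start(), match.group(1)) for match in PLACEHOLDER.finditer(text)]


def count_placeholders(text):
    """Count how often each kind of placeholder occurs."""
    return Counter(kind for _, kind in find_placeholders(text))


line = "Der Accord $$MUSIC und das Zeichen $$GRAPHIC, vgl. $$MATH."
print(find_placeholders(line))
print(count_placeholders(line))
```

The offsets make it possible to list the passages that still need LaTeX markup or an image, without changing the transcription itself.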