Project Description: Phase 1 - WaxCylinderRevival/frus-dates-project GitHub Wiki
-
Timeframe
- October 2016 to September 2017
-
Staff
- Metadata Specialist: Amanda Ross (@WaxCylinderRevival)
- Search Prototype Designer: Joe Wicentowski (@joewiz)
- Backlog Publishers: Virginia Kinniburgh, Stephanie Eckroth
-
By the Numbers
- Between October 21, 2016* and August 16, 2017, we've added at least one dateline to 49,916 historical documents (a 29.93% increase).
- As of August 30, 2017, 99.55% of historical documents (excluding attachments) in the FRUS digital archive have at least one dated dateline.
Category | October 12, 2016 | August 16, 2017 | Change | [August 30, 2017] |
---|---|---|---|---|
Total Documents | 192,930 | 233,684 | +40,754 documents (21.12% increase) | --- |
I. Editorial Notes | 7,765 (4.02%) | 7,916 (3.39%) | +151 editorial notes (1.94% increase) | --- |
II. Historical Documents | 185,165 (95.98%) | 225,768 (96.61%) | +40,603 historical documents (21.93% increase) | --- |
IIa. Historical Documents with at least 1 dateline | 166,774 (90.07%) | 216,690 (95.98%) | +49,916 (29.93% increase) | --- |
IIb. Historical Documents (excluding attachments) with at least 1 dateline | --- | --- | --- | [225,113 (99.71%)] |
IIc. Historical Documents (excluding attachments) with at least 1 dated dateline (`dateline//date`) | --- | --- | --- | [224,751 (99.55%)] |
[* October 21, 2016 is the date of the first FRUS-dates-project commit. October 12, 2016 is the date of the first query-based analysis of the FRUS corpus.]
As of August 2017, the vast majority of documents other than editorial notes have been given a date/date range, including those that past FRUS compilers left “undated.”
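The percentages above are plain ratios, so they are easy to recheck. The following Python sketch recomputes them from the raw counts in the table. The helper names `pct` and `change` are illustrative only and are not part of the project.

```python
# Counts copied from the table: (October 12, 2016, August 16, 2017).
COUNTS = {
    "Total Documents": (192_930, 233_684),
    "Editorial Notes": (7_765, 7_916),
    "Historical Documents": (185_165, 225_768),
    "Historical Documents with a dateline": (166_774, 216_690),
}

def pct(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage to 2 decimals."""
    return round(100 * part / whole, 2)

def change(before: int, after: int) -> tuple[int, float]:
    """Absolute change and percentage increase relative to the earlier count."""
    return after - before, pct(after - before, before)

for label, (before, after) in COUNTS.items():
    delta, rate = change(before, after)
    print(f"{label}: +{delta:,} ({rate}% increase)")

# Share of historical documents that carry a dateline, per column.
print(pct(166_774, 185_165), pct(216_690, 225_768))
```

Each increase is measured against the October 12, 2016 count. Each dateline share is measured against the number of historical documents in the same column.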
-
We attempted to establish the most precise date/date range possible for each document, using:
- Document header
- Document content
- Chapter or subchapter headings with dates/dateTimes
- Dates of sibling documents within the same chapter or subchapter
- Outside research
- Logical rules
- Volume date spans
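One possible way to choose the most precise range is to collect every range the clues above support and keep the narrowest. The Python sketch below assumes that rule. The `Candidate` class, the `narrowest` function, and the sample ranges are hypothetical and do not come from the project.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Candidate:
    """One date range suggested by a single clue (hypothetical structure)."""
    source: str        # which clue produced the range, e.g. "sibling dates"
    not_before: date   # earliest possible date, inclusive
    not_after: date    # latest possible date, inclusive

    @property
    def span_days(self) -> int:
        return (self.not_after - self.not_before).days

def narrowest(candidates: list[Candidate]) -> Candidate:
    """Pick the tightest range; on a tie, min() keeps the first one listed."""
    return min(candidates, key=lambda c: c.span_days)

# Invented candidates for a single undated document.
candidates = [
    Candidate("volume date span", date(1969, 1, 20), date(1972, 12, 31)),
    Candidate("chapter heading", date(1971, 1, 1), date(1971, 12, 31)),
    Candidate("sibling dates", date(1971, 4, 12), date(1971, 4, 15)),
]
best = narrowest(candidates)
print(best.source, best.not_before, best.not_after)
# sibling dates 1971-04-12 1971-04-15
```

Intersecting all candidate ranges instead of picking the narrowest one is an equally reasonable design. It is stricter, but it needs a fallback when two clues conflict and the intersection is empty.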
-
We used the same clues to generate date ranges for imprecise dates such as “April 1976.”
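For a month-level date, the range runs from the first to the last day of that month. The following is a minimal Python sketch. It assumes the imprecise date is written as a full English month name plus a four-digit year. The `month_range` name is hypothetical, and the resulting bounds could populate attributes such as `@notBefore`/`@notAfter`.

```python
import calendar
from datetime import date, datetime

def month_range(text: str) -> tuple[str, str]:
    """Turn an imprecise 'Month YYYY' string into inclusive ISO date bounds."""
    parsed = datetime.strptime(text, "%B %Y")  # e.g. "April 1976"
    last_day = calendar.monthrange(parsed.year, parsed.month)[1]
    start = date(parsed.year, parsed.month, 1)
    end = date(parsed.year, parsed.month, last_day)
    return start.isoformat(), end.isoformat()

print(month_range("April 1976"))  # ('1976-04-01', '1976-04-30')
```

`calendar.monthrange` takes care of month lengths and leap years, so "February 1976" ends on the 29th.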
-
We also identified non-Gregorian dates within the text, declared the original calendar used, and converted them to Gregorian/UTC. Examples include:

```xml
<dateline>Dated the <date when="1947-10-09" calendar="tibetan-phugpa">25th of the 8th month of Tibetan Fire-Pig Year [1947]</date>.</dateline>

<dateline>
  <date when="1865-06-18" ana="#date_undated-inferred-from-document-content"
    calendar="masonic-anno-lucis">24th day of the 3d month,
    in the year of light 5865</date>
</dateline>
```
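The converted value lives in `@when` and the original calendar in `@calendar`, so both can be read back mechanically. This Python sketch uses the standard-library `xml.etree.ElementTree`. It parses the snippets exactly as shown, without the namespace a full TEI file would normally declare. The `calendar_dates` and `anno_lucis_to_ad` helpers are hypothetical. The year arithmetic reflects the Masonic convention that Anno Lucis years run 4,000 ahead of the Common Era, which matches 5865 and 1865 above. It does not convert the month or day.

```python
import xml.etree.ElementTree as ET

# The two datelines above, without the TEI namespace a full document would declare.
SNIPPETS = [
    '<dateline>Dated the <date when="1947-10-09" calendar="tibetan-phugpa">'
    "25th of the 8th month of Tibetan Fire-Pig Year [1947]</date>.</dateline>",
    '<dateline><date when="1865-06-18" '
    'ana="#date_undated-inferred-from-document-content" '
    'calendar="masonic-anno-lucis">24th day of the 3d month, '
    "in the year of light 5865</date></dateline>",
]

def calendar_dates(xml_text: str) -> list[tuple]:
    """(original calendar, Gregorian @when) for each <date> inside a dateline."""
    root = ET.fromstring(xml_text)
    return [(d.get("calendar"), d.get("when")) for d in root.iter("date")]

def anno_lucis_to_ad(year_of_light: int) -> int:
    """Anno Lucis (Masonic 'year of light') runs 4,000 years ahead of the CE year."""
    return year_of_light - 4000

for snippet in SNIPPETS:
    print(calendar_dates(snippet))
print(anno_lucis_to_ad(5865))  # 1865
```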
-
Each `date` element we touched has received an `@ana` attribute alerting editors to the reason for, or source of, the date/date range, in order to maintain an analytical history behind machine-readable date assignment. Examples include:
- `#date_apparent-typo-based-on-document-content`
- `#date_apparent-typo-based-on-document-scan`
- `#date_apparent-typo-based-on-outside-research`
- `#date_editorial-correction`
- `#date_imprecise-inferred-from-date-rules`
- `#date_imprecise-inferred-from-document-content`
- `#date_imprecise-inferred-from-document-scan`
- `#date_imprecise-inferred-from-document-content-and-sibling-dates`
- `#date_imprecise-inferred-from-outside-research`
- `#date_imprecise-inferred-from-sibling-dates`
- `#date_undated-inferred-from-chapter-heading`
- `#date_undated-inferred-from-document-content`
- `#date_undated-inferred-from-document-content-and-sibling-dates`
- `#date_undated-inferred-from-document-head`
- `#date_undated-inferred-from-document-scan`
- `#date_undated-inferred-from-sibling-dates`
- `#date_undated-inferred-from-outside-research`
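The values above follow a visible naming pattern:
- the prefix `#date_`;
- a status (`undated`, `imprecise`, `apparent-typo`, `editorial-correction`);
- usually the evidence, after `-inferred-from-` or `-based-on-`, with `-and-` joining multiple sources.

The Python sketch below assumes that pattern holds for every value. The pattern is inferred from this list, not from a published specification. The sketch splits a value into its parts, which makes it easy to tally how dates were established. `ANA_PATTERN` and `parse_ana` are hypothetical names.

```python
import re
from collections import Counter

# Assumed convention: "#date_" + status + optional "-inferred-from-"/"-based-on-" + basis.
ANA_PATTERN = re.compile(
    r"^#date_(?P<status>.+?)(?:-(?:inferred-from|based-on)-(?P<basis>.+))?$"
)

def parse_ana(value: str) -> tuple[str, list[str]]:
    """Split an @ana pointer into (status, list of evidence sources)."""
    match = ANA_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized @ana value: {value!r}")
    basis = match.group("basis")
    return match.group("status"), basis.split("-and-") if basis else []

print(parse_ana("#date_undated-inferred-from-document-content-and-sibling-dates"))
# ('undated', ['document-content', 'sibling-dates'])
print(parse_ana("#date_editorial-correction"))
# ('editorial-correction', [])

# Tally statuses across a made-up batch of annotated dates.
batch = [
    "#date_undated-inferred-from-sibling-dates",
    "#date_imprecise-inferred-from-date-rules",
    "#date_undated-inferred-from-document-head",
]
print(Counter(parse_ana(v)[0] for v in batch))
```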
-
These dates/date ranges and their `@ana` reasoning can be revised or updated as needed.
-
Where needed, we combined `placeName` and `date` to apply the appropriate time zone adjustments. We relied on https://www.timeanddate.com/ to identify the appropriate historical time zones, which have shifted greatly throughout the FRUS publication span.
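Once the historical UTC offset for a place and date is known, attaching it is mechanical. The following is a minimal Python sketch using the standard-library `datetime` module. The offset is passed in by hand (for instance, after checking timeanddate.com). The `localize` helper is hypothetical, and the example timestamps are invented.

```python
from datetime import datetime, timedelta, timezone

def localize(local_iso: str, utc_offset_hours: float) -> tuple[str, str]:
    """Attach a known UTC offset to a naive local dateTime; also return it in UTC.

    The offset must be supplied by the caller (e.g. after checking a historical
    time zone reference for that place and date); nothing is looked up here.
    """
    offset = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromisoformat(local_iso).replace(tzinfo=offset)
    return local.isoformat(), local.astimezone(timezone.utc).isoformat()

# Invented example: a message dated 9:00 p.m. local time at a place then on UTC+9.
print(localize("1945-09-02T21:00:00", 9))
# ('1945-09-02T21:00:00+09:00', '1945-09-02T12:00:00+00:00')
```

Fixed offsets keep the sketch independent of the IANA time zone database. Where that database is installed, `zoneinfo.ZoneInfo` would be the more natural choice.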
-
From there, we took the values of the manually established `@when`, `@from`/`@to`, or `@notBefore`/`@notAfter` attributes to derive a minimum dateTime (`div/@frus:doc-dateTime-min`) and a maximum dateTime (`div/@frus:doc-dateTime-max`) for each document. The search prototype operates on the `div/@frus:doc-dateTime-min` to `div/@frus:doc-dateTime-max` range.
-
Documents should appear in chronological order, sorted by the first day of their known or estimated date/date range. Where the time of day is not known, a document is sorted as if dated 12:00 a.m. (midnight) on that day.
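The min/max derivation and the midnight sorting rule can be sketched together in Python. This is an illustration under stated assumptions, not the project's code:
- each `<date>` is modeled as a dict of its attributes;
- only full `YYYY-MM-DD` dates or dateTimes are handled;
- date-only values and offset-less dateTimes are treated as UTC;
- a date-only upper bound is pushed to 23:59:59.

`to_bound`, `bounds`, and `doc_range` are hypothetical names.

```python
from datetime import date, datetime, time, timezone

def to_bound(value: str, upper: bool) -> datetime:
    """Parse a TEI date or dateTime string into a UTC-aware datetime.

    Assumptions for this sketch: date-only values are read as UTC, start at
    00:00 when used as a lower bound, and end at 23:59:59 as an upper bound.
    """
    if "T" in value:
        parsed = datetime.fromisoformat(value)
        if parsed.tzinfo is None:  # no offset given: assume UTC
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    clock = time(23, 59, 59) if upper else time(0, 0)
    return datetime.combine(date.fromisoformat(value), clock, tzinfo=timezone.utc)

def bounds(attrs: dict[str, str]) -> tuple[datetime, datetime]:
    """Earliest and latest instant for one <date>, given its attributes."""
    if "when" in attrs:
        return to_bound(attrs["when"], False), to_bound(attrs["when"], True)
    if "from" in attrs:
        return to_bound(attrs["from"], False), to_bound(attrs["to"], True)
    return to_bound(attrs["notBefore"], False), to_bound(attrs["notAfter"], True)

def doc_range(dates: list[dict[str, str]]) -> tuple[datetime, datetime]:
    """Analogue of div/@frus:doc-dateTime-min and div/@frus:doc-dateTime-max."""
    pairs = [bounds(attrs) for attrs in dates]
    return min(p[0] for p in pairs), max(p[1] for p in pairs)

# Invented sample documents, each listing the attributes of its <date> elements.
docs = {
    "doc-a": [{"notBefore": "1976-04-01", "notAfter": "1976-04-30"}],
    "doc-b": [{"when": "1976-04-01T09:30:00-05:00"}],
    "doc-c": [{"from": "1976-03-30", "to": "1976-04-02"}],
}
ranges = {doc_id: doc_range(dates) for doc_id, dates in docs.items()}

# Chronological order: sort on the minimum, so an untimed date counts as midnight.
for doc_id in sorted(ranges, key=lambda k: ranges[k][0]):
    start, end = ranges[doc_id]
    print(doc_id, start.isoformat(), end.isoformat())
```

Sorting on the minimum alone means a month-long range such as doc-a sorts at the start of its month, ahead of a timed document from later that same day.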
(For more on completed work and future development, please visit Issue Tracking)