EDTF (Extended Date Time Format) - UVicLibrary/Vault GitHub Wiki
EDTF, or Extended Date/Time Format, is a standardized way of expressing dates and times as text strings. From the Library of Congress website:

> The Extended Date/Time Format (EDTF) was created by the Library of Congress with the participation and support of the bibliographic community as well as communities with related interests. It defines features to be supported in a date/time string, features considered useful for a wide variety of applications.
At the time of writing, Hyku (out of the box) does not support EDTF. We have integrated EDTF into Vault using the edtf Ruby gem, created by Sylvester Keil.
For example, Vault saves dates in EDTF notation (e.g. `["1913/1921"]`) but displays them to users in humanized form (see image below). This conversion is handled by the edtf gem.
Human-readable label on a work's display page
We've also added the ability to sort works by title and date created on the following pages:
- public collection pages
- search results (sort results in ascending/descending order by title/date)
- search results within a collection
Dates are saved in Solr according to the ISO 8601 specification (see "Working with Dates" in the Solr Reference Guide) and can specify year, month, day, and time (UTC) for sorting and faceting purposes. Solr's notation requires a month, day, and time even if the person entering the metadata specified no month or day precision.
General principle: we sort according to the earliest possible specified date. For example, `1981` would be saved in Solr as `1981-01-01T00:00:00Z`, or midnight on Jan. 1, 1981.
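A minimal plain-Ruby sketch of the "earliest possible date" rule, assuming a simple `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` string with a non-negative year. The method name `earliest_solr_datetime` is hypothetical; Vault itself parses dates with the edtf gem rather than splitting strings.

```ruby
require 'date'

# Hypothetical helper (not part of Vault): pad a partial date to the earliest
# possible instant and format it as a Solr (ISO 8601, UTC) datetime string.
# Assumes "YYYY", "YYYY-MM" or "YYYY-MM-DD" with a non-negative year.
def earliest_solr_datetime(date_string)
  year, month, day = date_string.split('-').map(&:to_i)
  # A missing month or day defaults to 1 (January / the 1st of the month)
  date = Date.new(year, month || 1, day || 1)
  date.strftime('%Y-%m-%dT00:00:00Z')
end

puts earliest_solr_datetime('1981')       # 1981-01-01T00:00:00Z
puts earliest_solr_datetime('1981-04')    # 1981-04-01T00:00:00Z
puts earliest_solr_datetime('1981-04-15') # 1981-04-15T00:00:00Z
```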
For single dates, as opposed to date ranges or intervals, we convert the first value in `date_created_tesim` to a Solr datetime and copy it to the `year_sort_dtsi` and `year_sort_dtsim` fields.
If the date is an interval, we sort on the earliest date in the interval, in ascending or descending order. This earliest date is saved in `year_sort_dtsi`. However, to filter search results with facets, we also save every year in the interval into an array, `year_sort_dtsim` (see the examples in the table below).
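The idea of one facet value per year can be sketched in plain Ruby as follows. This is a simplified, hypothetical helper (`yearly_facet_values`), not Vault's `solrize`: it keeps the interval's start date as the first value and adds January 1st of each following year, so it does not reproduce every row of the table below (some rows there keep the exact end date).

```ruby
require 'date'

# Hypothetical sketch (not Vault's solrize): expand an interval into one Solr
# datetime per calendar year it touches. The first value is the interval's
# start date (the sort key); later values are January 1st of each following year.
def yearly_facet_values(start_date, end_date)
  first = start_date.strftime('%Y-%m-%dT00:00:00Z')
  following = ((start_date.year + 1)..end_date.year).map do |year|
    Date.new(year, 1, 1).strftime('%Y-%m-%dT00:00:00Z')
  end
  [first] + following
end

values = yearly_facet_values(Date.new(1981, 1, 1), Date.new(1982, 12, 31))
puts values.inspect # ["1981-01-01T00:00:00Z", "1982-01-01T00:00:00Z"]
```

The first element plays the role of `year_sort_dtsi`; the whole array plays the role of `year_sort_dtsim`.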
Note: Single quotes mean that these values are strings.
`date_created_tesim` is entered by a user. Based on that value, Vault generates `year_sort_dtsi` and `year_sort_dtsim` when it (re)indexes a work.
| Description | date_created_tesim | year_sort_dtsi | year_sort_dtsim |
|---|---|---|---|
| 1. Two days within the same year | [ '1981-04-01/1981-05-02' ] | '1981-04-01T00:00:00Z' | [ '1981-04-01T00:00:00Z' ] |
| 2. Interval that straddles 2 years; no month or day precision | [ '1981/1982' ] | '1981-01-01T00:00:00Z' | [ '1981-01-01T00:00:00Z', '1982-01-01T00:00:00Z' ] |
| 3. Interval that straddles 2 years with day precision | [ '1995-12-01/1996-03-30' ] | '1995-12-01T00:00:00Z' | [ '1995-12-01T00:00:00Z', '1996-03-30T00:00:00Z' ] |
| 4. Interval that straddles multiple years with day precision | [ '1901-02-04/1910-12-09' ] | '1901-01-01T00:00:00Z' | [ '1901-01-01T00:00:00Z', '1902-01-01T00:00:00Z', '1903-01-01T00:00:00Z'... '1910-12-09T00:00:00Z' ] |
Sorted in ascending order (oldest to newest): 4, 2, 1, 3
Sorted in descending order (newest to oldest): 3, 1, 2, 4
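Solr sorts `_dtsi` fields as dates, but because every value uses the same fixed-width ISO 8601 format, a plain string sort yields the same chronological order. This sketch uses that property to derive the row order from the `year_sort_dtsi` values in the table; the `rows` hash is just a copy of those values.

```ruby
# year_sort_dtsi values copied from rows 1-4 of the table
rows = {
  1 => '1981-04-01T00:00:00Z',
  2 => '1981-01-01T00:00:00Z',
  3 => '1995-12-01T00:00:00Z',
  4 => '1901-01-01T00:00:00Z'
}

# Fixed-width ISO 8601 strings compare lexicographically in chronological
# order, so sorting the strings gives the same result as sorting the dates.
ascending  = rows.sort_by { |_row, date| date }.map(&:first)
descending = ascending.reverse

puts ascending.inspect  # [4, 2, 1, 3]
puts descending.inspect # [3, 1, 2, 4]
```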
Unspecified digits (`X`) are replaced with 0, so `19XX` sorts as midnight on Jan. 1, 1900 and `195X` sorts as midnight on Jan. 1, 1950.
All other notation (`#`, `~`, `?`, `%`) is ignored, so `1950%` sorts as midnight on Jan. 1, 1950.
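These two rules amount to a string transformation applied before the date is parsed. A minimal sketch, with a hypothetical method name `normalize_for_sort`:

```ruby
# Hypothetical helper: apply the sorting rules above to an EDTF string
# before parsing. Unspecified digits (X) become 0, and the qualifier
# characters #, ~, ? and % are dropped.
def normalize_for_sort(edtf_string)
  edtf_string.delete('#~?%').tr('X', '0')
end

puts normalize_for_sort('19XX')       # 1900
puts normalize_for_sort('195X')       # 1950
puts normalize_for_sort('1950%')      # 1950
puts normalize_for_sort('1981~/1982') # 1981/1982
```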
Hyku saves almost every metadata field as a multiple, i.e. an array with one or more string values (see below). However, Solr can't sort on a multivalued field like `_tesim`. To get around this, we've modified Vault to create three new indexed fields: `title_sort_ssi`, `year_sort_dtsi`, and `year_sort_dtsim` (the last is multivalued and is used for faceting rather than sorting).
Note that these extra fields show up only in the Solr interface, not in the Rails console.
- `controllers/catalog_controller` - Blacklight configuration for sort fields
- `indexers/work_indexer` - where Vault actually creates and indexes those fields
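For orientation, Blacklight registers sort options in the catalog controller with `config.add_sort_field`. The fragment below is an illustrative sketch, not Vault's actual `catalog_controller`: the labels are assumptions, and a real Hyrax/Hyku controller includes more modules and configuration than shown.

```ruby
# Illustrative config fragment only; requires a Rails app with Blacklight.
class CatalogController < ApplicationController
  include Blacklight::Catalog

  configure_blacklight do |config|
    # Sort on the single-valued fields created by the work indexer
    config.add_sort_field 'title_sort_ssi asc', label: 'Title (A-Z)'
    config.add_sort_field 'title_sort_ssi desc', label: 'Title (Z-A)'
    config.add_sort_field 'year_sort_dtsi asc', label: 'Date (oldest first)'
    config.add_sort_field 'year_sort_dtsi desc', label: 'Date (newest first)'
  end
end
```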
Solr sorts alphabetically on `title_sort_ssi`, which is a duplicate of the first value in `title_tesim`. `title_sort_ssi` is a dynamic field with a single string value.
In `work_indexer#generate_solr_document`:

```ruby
solr_doc['title_sort_ssi'] = object.title.first unless object.title.first.nil?
```
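The effect of that line can be simulated outside Rails. In this sketch, `Work` is a hypothetical `Struct` standing in for a Hyku work (whose `title` is an array of strings) and each hash stands in for a Solr document. Assuming Solr's default string-field ordering, which, like Ruby's `String#<=>`, is case-sensitive, uppercase titles sort before lowercase ones.

```ruby
# Hypothetical stand-in for a work: like Hyku, title is an array of strings
Work = Struct.new(:id, :title)

works = [
  Work.new('a', ['banana split']),
  Work.new('b', ['Zebra crossing']),
  Work.new('c', ['Apple pie']),
  Work.new('d', []) # no title: the guard skips title_sort_ssi
]

# Same guard as the indexer line: copy the first title into a
# single-valued field, skipping works with no title
solr_docs = works.map do |work|
  doc = { 'id' => work.id }
  doc['title_sort_ssi'] = work.title.first unless work.title.first.nil?
  doc
end

# Sort only the documents that have a sort key; String#<=> compares
# bytes, so the comparison is case-sensitive
sortable = solr_docs.select { |doc| doc.key?('title_sort_ssi') }
sorted_titles = sortable.sort_by { |doc| doc['title_sort_ssi'] }.map { |doc| doc['title_sort_ssi'] }
puts sorted_titles.inspect # ["Apple pie", "Zebra crossing", "banana split"]
```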
First we handle the special characters and check whether `date_created_tesim` is a single date or an interval.

```ruby
if solr_doc['date_created_tesim']
  date = Date.edtf(solr_doc['date_created_tesim'].first.gsub(/~|#/,'').gsub('X','0')) # Account for special characters; see https://github.com/UVicLibrary/Vault/issues/36
  if date.class == EDTF::Interval
    solr_doc['year_sort_dtsim'] = solrize(date)
    solr_doc['year_sort_dtsi'] = solrize(date).first
  else # date.class == Date
    solr_doc['year_sort_dtsim'] = solr_string(date)
    solr_doc['year_sort_dtsi'] = solr_string(date)
  end
end
```
Two functions in `work_indexer` do the heavy lifting:
- `solr_string` converts an EDTF date to a Solr-formatted datetime string.
- `solrize` converts an `EDTF::Interval` into an array of Solr-formatted datetime strings (see the table above); its first element, the start of the interval, is used for `year_sort_dtsi`.
```ruby
# Returns formatted string with time set to midnight; e.g. Wed, 01 Jan 1913 => "1913-01-01T00:00:00Z"
# https://lucene.apache.org/solr/guide/7_7/working-with-dates.html
def solr_string(edtf_date)
  date_time = edtf_date.beginning_of_day.to_s.split(" ") - ["UTC"] # => ["1913-01-01", "00:00:00"]
  "#{date_time[0]}T#{date_time[1]}Z"
end
```
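`beginning_of_day` comes from ActiveSupport. Because a `Date` carries no time of day, the same string can also be produced with `Date#strftime` alone, without ActiveSupport or string splitting. This is a possible plain-Ruby variant, sketched under that assumption; `plain_solr_string` is a hypothetical name, not Vault's code.

```ruby
require 'date'

# Hypothetical plain-Ruby variant of solr_string (not Vault's code):
# a Date has no time component, so midnight UTC is written out directly.
def plain_solr_string(date)
  date.strftime('%Y-%m-%dT00:00:00Z')
end

puts plain_solr_string(Date.new(1913, 1, 1)) # 1913-01-01T00:00:00Z
```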