Changes in Version 4.0 - psu-libraries/scholarsphere GitHub Wiki

In the migration from version 3 to version 4, there will be some changes to metadata and access policies.

Metadata

All of the metadata from works will be migrated, as well as most of the metadata from collections; however, no
additional metadata from file sets will be migrated beyond what is outlined below.

Property         Works        Collections
Title
Subtitle
Description
Identifier
Rights           see below    see below
Visibility
Embargoes        see below    see below
Creators         see below    see below
NOID
Keyword
Resource Type    see below
Contributor
Publisher
Subject
Language
Based Near
Related URL
Source
Date Published   see below    see below
Date Created     see below    see below
DOI
Permissions
Depositor

Cardinality

Most multi-valued fields will remain multi-valued, with a few exceptions:

  • any resource with multiple titles will be rejected, prompting the original resource to be updated
    • this is a safety catch, since multiple titles are currently allowed in version 3
  • multiple descriptions are concatenated into one field, separated by line breaks
  • multiple published dates are concatenated into one field, separated by commas
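
The cardinality rules above can be sketched as a small function. This is only an illustration, not the migration code itself: the function name, the dictionary layout, the field keys, and the exact comma separator (", ") are all assumptions.

```python
from typing import Dict, List


class MultipleTitlesError(ValueError):
    """Raised when a version 3 resource carries more than one title."""


def collapse_fields(resource: Dict[str, List[str]]) -> Dict[str, str]:
    """Collapse the multi-valued fields that become single-valued in version 4.

    `resource` is a hypothetical dict of field name -> list of values.
    """
    titles = resource.get("title", [])
    if len(titles) > 1:
        # Safety catch: the original resource must be fixed before it migrates.
        raise MultipleTitlesError(f"expected one title, found {len(titles)}")

    return {
        "title": titles[0] if titles else "",
        # Descriptions are joined with line breaks.
        "description": "\n".join(resource.get("description", [])),
        # Published dates are joined with commas (", " is an assumed separator).
        "published_date": ", ".join(resource.get("date_published", [])),
    }
```

A resource with two titles raises MultipleTitlesError instead of silently picking one, which mirrors the "denied until updated" behavior described in the list.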

File Sets

  • titles will not be migrated, only the name of the original file
  • versions will not be migrated, only the most recent one
  • NOIDs will be migrated so that any existing links to file sets will redirect to their works
  • permissions will not be migrated
  • depositors will not be migrated

Work Type and Resource Type

The work type is a new field and will be mapped from the existing resource type. Work types map to resource types 1:1, so any existing resource type in version 3 will map to a work type in version 4.

For those version 3 works that have multiple resource types, we will map these to a single work type based on a mapping procedure defined in https://github.com/psu-stewardship/scholarsphere-4/issues/148.

Creators and Depositors

Both creators and depositors are mapped to Actor records in version 4.0. Creators who have current credentials in Penn State's identity management system will have their migrated records updated with this information. All matching is done via the access account.

For depositors with missing access account information, a depositors report is generated and the missing information is supplied where possible. For more information, see Updating Depositors.

NOIDs

NOIDs for works, collections, and file sets are all migrated, but only for the purpose of supporting redirects from legacy URLs. The NOIDs themselves are not displayed or managed.
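
As a rough sketch of how a legacy NOID might be resolved for a redirect, consider the lookup below. The index structure, the LegacyRecord type, and the function name are hypothetical; the only behavior taken from this page is that file set NOIDs redirect to their containing work.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class LegacyRecord(NamedTuple):
    """Hypothetical index entry for one migrated version 3 resource."""
    kind: str                               # "work", "collection", or "file_set"
    new_id: Optional[str] = None            # version 4 identifier, if any
    parent_work_noid: Optional[str] = None  # set only for file sets


def resolve_noid(noid: Optional[str],
                 index: Dict[str, LegacyRecord]) -> Optional[Tuple[str, str]]:
    """Return (kind, new_id) of the resource a legacy NOID should redirect to."""
    record = index.get(noid)
    if record is None:
        return None  # unknown NOID: nothing to redirect to
    if record.kind == "file_set":
        # Links to file sets redirect to the work that contains them.
        return resolve_noid(record.parent_work_noid, index)
    return (record.kind, record.new_id)
```

The NOID is used only as a lookup key here, which matches the statement that NOIDs are kept for redirects but not displayed or managed.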

Embargoes

Embargoes are migrated, but only for works. If a work or any of its file sets has an embargo, only the latest embargo date is used. For example, suppose a work has two files: the work has an embargo until January 1, 2025, whereas one of the files has an embargo until March 2, 2025. The migrated embargo is March 2, 2025, and it applies to the entire work and all of its files.
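
The "latest embargo wins" rule can be sketched as follows, using the dates from the example above. The function name and signature are assumptions made for illustration.

```python
from datetime import date
from typing import Iterable, Optional


def migrated_embargo(work_embargo: Optional[date],
                     file_embargoes: Iterable[Optional[date]]) -> Optional[date]:
    """Pick the single embargo date that is applied to the migrated work."""
    # Ignore the work or any file that has no embargo at all.
    candidates = [d for d in [work_embargo, *file_embargoes] if d is not None]
    # The latest date wins; with no embargoes, the work is not embargoed.
    return max(candidates) if candidates else None


release = migrated_embargo(date(2025, 1, 1), [date(2025, 3, 2), None])
print(release)  # 2025-03-02
```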

Rights

Rights are not migrated for collections. Any work with multiple rights values will be changed to have only one. For more information, see https://github.com/psu-stewardship/scholarsphere-4/issues/331.

Dates

In order to preserve the original creation dates from version 3, each resource's creation date will be migrated to a new field. These fields are as follows:

  • GenericWork: date_uploaded
  • FileSet: date_uploaded
  • Collection: create_date

Each of these fields will be migrated to the deposited_at field of the Work, FileResource, and Collection objects respectively. Rails will still apply the created_at and updated_at timestamps as it normally does, and any new resources will also have their deposited_at field set to the current time.

In the search and display interfaces, any references to a "creation date" or "deposit date" will always refer to the database's deposited_at field.
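
The date mapping described in this section can be sketched as a lookup from the version 3 type to its source field. The helper name, the dictionary-based record, and the use of None for new resources are assumptions; only the field names come from the list above.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Version 3 type -> field holding its original creation date.
LEGACY_DATE_FIELD = {
    "GenericWork": "date_uploaded",
    "FileSet": "date_uploaded",
    "Collection": "create_date",
}


def deposited_at(legacy_type: Optional[str], record: Dict[str, Any]) -> datetime:
    """Return the value to store in deposited_at for a resource.

    Migrated resources keep their original date; new resources
    (legacy_type is None) get the current time.
    """
    field = LEGACY_DATE_FIELD.get(legacy_type)
    if field is not None and record.get(field) is not None:
        return record[field]
    return datetime.now(timezone.utc)
```

Keeping deposited_at separate from the framework-managed timestamps means the migration run itself does not overwrite the date users see as the deposit date.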

Access Policies

With the recent adoption of open access at Penn State, private or restricted works will not be allowed in version 4 of Scholarsphere. Any work that has restricted visibility, or is otherwise not viewable by the public or the Penn State community, will be migrated to a draft state so that it can be updated for open access.

To that end, any metadata for a published work will always be viewable by the public. Only access to the files will be restricted based on a few factors.

Visibility

The concept of "visibility" is dropped in favor of "access."

Penn State Access

Works that are designated as Penn State access will have their files restricted to registered users of the Penn State community.

Embargoed Access

When a work is embargoed, only its files are restricted until the embargo is lifted. The depositor and anyone granted "edit" access via access controls will be able to view or download the files.
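
Taken together, the file access rules might look like the sketch below. The Work class, the access labels, and the is_penn_state flag are hypothetical, and the sketch assumes an embargo is lifted on its end date; metadata is not checked because it is always public for published works.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Work:
    access: str                      # "open" or "penn_state" (assumed labels)
    depositor: str                   # access account of the depositor
    editors: Set[str] = field(default_factory=set)
    embargoed_until: Optional[date] = None


def can_download(work: Work, user: Optional[str],
                 is_penn_state: bool, today: date) -> bool:
    """Decide whether `user` may view or download the files of a work."""
    # The depositor and anyone with edit access can always reach the files.
    if user is not None and (user == work.depositor or user in work.editors):
        return True
    # During an embargo, files are closed to everyone else.
    if work.embargoed_until is not None and today < work.embargoed_until:
        return False
    # Penn State access limits files to registered Penn State users.
    if work.access == "penn_state":
        return is_penn_state
    return True
```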

Collections

Note that while permissions are migrated for collections, any collection in version 4 is always open access since it is only metadata. Migrated permissions in version 4 will only be relevant for determining edit access. Permissions on collections have never been, nor will be, applied to their contained works. Works have their own permissions, separate from those of any collection they are in.