Updating Depositors - psu-libraries/scholarsphere GitHub Wiki

To migrate resources from version 3 to version 4 properly, each depositor must be correctly identified. Every resource in Scholarsphere was created by a person affiliated with Penn State, and each of these people has an access id as well as a name.

Missing Depositors

In some cases, there are depositors who have a version 3 Agent record but no associated User record in the Scholarsphere 3 database. To find them, get a list of depositors from Solr and check for any missing user records:

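users = {}

# Collect every distinct depositor login indexed in Solr.
ActiveFedora::SolrService.query('depositor_ssim:*', fl: ['depositor_ssim'], rows: 100000).each do |hit|
  users[hit['depositor_ssim'].first] = nil
end

# Look up the User record for each login; logins without one stay nil.
users.each_key { |login| users[login] = User.find_by(login: login) }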

Create new user records for those depositors that don't have one:

# Create a User for each depositor login that has no record yet.
users.select { |_login, record| record.nil? }.each_key do |login|
  u = User.create(login: login, email: login)
  u.populate_attributes
end
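The check for missing logins can also be tried outside the Rails console. The following is a minimal sketch in plain Ruby, assuming the Solr hits have already been reduced to an array of login strings and the existing users to an array of logins. The helper name `missing_logins` and the sample ids are hypothetical and not part of Scholarsphere.

```ruby
# Hypothetical helper, not part of Scholarsphere: given the depositor logins
# pulled from Solr and the logins that already have User records, return the
# logins that still need a User record.
def missing_logins(depositor_logins, existing_logins)
  # uniq drops repeated depositors; array difference removes known users.
  depositor_logins.uniq - existing_logins
end

# Made-up sample data standing in for Solr hits and the users table.
solr_logins = ['abc123', 'xyz456', 'abc123', 'def789']
user_logins = ['abc123']

to_create = missing_logins(solr_logins, user_logins)
puts to_create.inspect
```

In the console, the returned logins would be the ones passed to User.create as shown above.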

Depositors Report

All depositors should be linked to their corresponding Agent record. This should be done via their access account. In some cases, there is a depositor id that will NOT have an Agent record. These are proxy deposit-type situations and are acceptable. However, there are some cases where the depositor is the actual creator Agent, and they need to be linked via their Penn State id, if they aren't already. This happens if the user omits or removes their Penn State id from the creators form when adding a work.

As of April 24, 2020, all of the depositors with missing links between their User and Agent records have been updated. When running a migration, we need to re-run the depositors report to ensure there aren't any new depositors that need this link made to their Agent record.

To run the report, go to the console of the jobs server:

report = Scholarsphere::Migration::DepositorsReport.new

It will take a few moments to generate the report for all depositors. Alternatively, if you only need to check a specific list of depositors, you can supply one:

report = Scholarsphere::Migration::DepositorsReport.new('abc123', 'xyx456')

Once complete, you can see which users do not have a linked Agent record:

report.missing

This may also take a few moments while it searches for the Agents that might match a given depositor. Check how many depositors are missing their links with report.missing.count. You can then save the report to JSON and examine the missing agents:

report.missing_depositors_report

Copy the resulting missing_depositors.json file to your computer and look through the results. Depositors that are proxies for other users will have a name completely different from the names associated with the works or collections they have deposited. However, if a user has an agent whose name is identical or very similar to their user name, then it is likely that their agent record needs to be updated with their Penn State id.
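The name comparison described above can be roughed out in code as a first pass before reviewing entries by hand. This is only a sketch: the entry layout (a depositor name plus a list of creator names), the helper names, and the two-shared-tokens threshold are all assumptions for illustration, so check the actual keys in missing_depositors.json before adapting it.

```ruby
# Hypothetical triage helpers. The entry layout used below is an assumption,
# not the documented structure of missing_depositors.json.

# Reduce a name to a sorted list of lowercase word tokens so that
# "Smith, Jane" and "Jane Smith" compare as equal.
def name_tokens(name)
  name.downcase.scan(/\p{L}+/).sort
end

# Treat a creator as a likely match when it shares at least two tokens with
# the depositor name (for example, first and last name), ignoring order and
# punctuation. The threshold of two is an arbitrary starting point.
def likely_self_deposit?(depositor_name, creator_names)
  depositor = name_tokens(depositor_name)
  creator_names.any? do |creator|
    (depositor & name_tokens(creator)).length >= 2
  end
end

# Made-up sample entries.
entries = [
  { name: 'Jane Q. Smith', creators: ['Smith, Jane', 'Doe, John'] },
  { name: 'Pat Proxy', creators: ['Doe, John'] }
]

entries.each do |entry|
  match = likely_self_deposit?(entry[:name], entry[:creators])
  puts "#{entry[:name]}: #{match ? 'check agent' : 'probably proxy'}"
end
```

A heuristic like this only narrows the list. Shared surnames or initials can produce false matches, so each flagged depositor still needs a manual look.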

As of Oct. 29, 2020, there are only 27 "missing" depositors, but these are acceptable because either we can't tell which creator is associated with the Penn State access id, or it was a proxy deposit and the proxy depositor was never indicated as such.

Updating an Agent

After running the depositors report, you can update any agents as needed. This will link an agent with their associated user record. In some cases, you may need to merge multiple agents into one user record.

To update a given Agent with its Penn State id:

Agent.find(id).update(psu_id: psu_id)

Merging Agents

If there are two Agent records for a single depositor, you can merge the two by moving any aliases from the other record over to the principal one.

# Reassign every alias of the duplicate agent to the principal agent.
def merge_agents(principal_id, other_id)
  Agent.find(other_id).alias_ids.each do |id|
    Alias.find(id).update(agent_id: principal_id)
  end
end

To merge the two records:

merge_agents(agent_id, duplicate_agent_id)

Then update the principal agent with the Penn State id, if needed:

Agent.find(agent_id).update(psu_id: psu_id)

There is no need to remove the other Agent record; it will simply not be migrated.
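To make the effect of a merge concrete, here is a small plain-Ruby sketch that mimics it in memory. `AgentStub`, `AliasStub`, and `merge_alias_stubs` are hypothetical stand-ins that carry only the fields used in this section. They are not the real ActiveRecord models, and the ids are made up.

```ruby
# Plain-Ruby stand-ins for the Agent and Alias records. They only mimic the
# fields used in this section and do not touch a database.
AgentStub = Struct.new(:id, :psu_id)
AliasStub = Struct.new(:id, :agent_id)

# In-memory mirror of merge_agents: repoint every alias of the duplicate
# agent at the principal agent. Returns the number of aliases moved.
def merge_alias_stubs(aliases, principal_id, other_id)
  moved = aliases.select { |record| record.agent_id == other_id }
  moved.each { |record| record.agent_id = principal_id }
  moved.length
end

principal = AgentStub.new(1, nil)
duplicate = AgentStub.new(2, nil)
aliases = [AliasStub.new(10, 1), AliasStub.new(11, 2), AliasStub.new(12, 2)]

moved_count = merge_alias_stubs(aliases, principal.id, duplicate.id)
principal.psu_id = 'abc123' # made-up access id

remaining = aliases.count { |record| record.agent_id == duplicate.id }
puts "moved #{moved_count}, duplicate now has #{remaining} aliases"
```

After the merge, all three aliases point at the principal agent and the duplicate is left with none.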