Guide for new POD contributors - pod4lib/aggregator GitHub Wiki

This guide is aimed at new contributors and provides a high-level overview of the initial process of contributing data to POD. More detailed documentation is found elsewhere in this wiki and is linked where appropriate.

Organization account creation and management

Each organization making data contributions to POD must have an organization account created by a POD administrator. At least one account owner must be identified for each organization. The account owner can then invite and manage other users for the organization, as well as manage other aspects of the organization's account, such as creating and deleting access tokens for use with the POD API.

Testing data contributions

Marking a stream as the default indicates to consumers that the stream contains the current, complete set of your data. Do not mark a stream as default until it contains a complete set of data from your organization. When getting started as a POD contributor, it is often desirable to create a new test stream that is not the default. You can contribute data to this stream for testing purposes without confusing data consumers. A test stream also helps POD administrators evaluate and identify any data quality and compatibility concerns before a complete set of records is contributed.

Data format recommendations

While POD accepts both MARC binary and MARCXML, compressed or uncompressed, we recommend and prefer gzipped MARCXML or gzipped MARC binary. All contributed records must be valid MARC binary or valid MARCXML and encoded as UTF-8.
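As a minimal sketch of preparing a file in the preferred gzipped form, the Python below compresses an existing export and runs a rough UTF-8 check. The function names `gzip_file` and `is_utf8` and the example file name are hypothetical, not part of POD. The check only confirms that the bytes decode as UTF-8; it does not validate the MARC structure itself.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Write a gzipped copy of src next to it and return the new path."""
    dest = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def is_utf8(path: Path) -> bool:
    """Rough check: True if the whole file decodes as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `gzip_file(Path("records.xml"))` would produce `records.xml.gz` alongside the original file. Reading the whole file into memory in `is_utf8` is fine for a sketch; a very large export may call for a streaming check instead.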

Delete files must be valid MARC binary (gzipped or not), MARCXML (gzipped or not), or a plain text, newline-delimited list of 001 ids. Plain text delete files must not be compressed. See more details on the Data requirements page.
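A plain text delete file is simple enough to generate directly. The sketch below writes one 001 id per line to an uncompressed UTF-8 file; `write_delete_file` and the example file name are hypothetical, and how you collect the ids from your own system is assumed to happen elsewhere.

```python
from pathlib import Path
from typing import Iterable


def write_delete_file(record_ids: Iterable[str], dest: Path) -> int:
    """Write one 001 id per line, uncompressed; return the number written."""
    count = 0
    # newline="\n" keeps line endings consistent regardless of platform.
    with open(dest, "w", encoding="utf-8", newline="\n") as out:
        for record_id in record_ids:
            record_id = record_id.strip()
            if not record_id:
                continue  # skip blank entries
            out.write(record_id + "\n")
            count += 1
    return count
```

Calling `write_delete_file(["a123", "a456"], Path("deletes.txt"))` would produce a two-line file with one id per line.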

Creating and contributing to a production data stream

Create a new non-default stream and send a complete set of your records to this stream. We generally prefer to receive files with no more than about 200,000 records per file, but this is not an enforced limit. Once the stream contains a complete set of records, begin sending incremental adds, updates, and deletes to it. There is no strict rule about the frequency of updates, but most contributors send their updates once daily, and this is generally preferred. POD calculates deltas for data consumers once per day, so there is no benefit to sending updates more frequently.
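One way to stay near the suggested file size is to split an export into batches before writing files. The sketch below is a generic batching helper, not a POD utility; the name `batched_records` is hypothetical, and the default of 200,000 simply mirrors the soft preference stated above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_records(records: Iterable[T], size: int = 200_000) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each yielded batch can then be written to its own file, for instance by numbering the output files with `enumerate(batched_records(records), start=1)`. The records themselves can be any objects your MARC tooling produces.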

Setting a stream as the default

Once your stream contains a complete set of records and incremental updates are being added to it successfully, contact a POD administrator. They will mark your stream as the default and start a job to generate the initial full dump of your records for data consumers. After the full dump has been generated and the stream is marked as the default, a delta covering the updates received since the previous delta is calculated automatically once each day.