Parse Copyright Notices - kellnerd/musicbrainz-bookmarklets Wiki

Description

The userscript provides a text parser in the release relationship editor where you can paste credits or load the contents of the release annotation. It tries to extract copyright and legal information from the text input and helps the user create relationships for it.

Relationships will be added at release level by default. Additionally, you can create phonographic copyright relationships at recording level by ticking the checkboxes of the desired recordings.

The parser generally assumes that a copyright holder is a label entity, unless the name of the copyright holder matches one of the release artists. It automatically opens the appropriate auto-complete dialog for all unknown names and asks the user to select or create the correct entity. Confirming the dialog creates the relationship and lets the parser continue with the next credit (cancelling the dialog skips the creation of a relationship for the current credit).
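The label-versus-artist rule described above can be sketched as follows. This is only an illustration: `guessEntityType` is a hypothetical helper name, and the case-insensitive, whitespace-trimmed comparison is an assumption rather than the userscript's documented matching behavior.

```javascript
// Hypothetical helper: decide which entity type to search for.
// Assumption: names are compared case-insensitively after trimming.
function guessEntityType(holderName, releaseArtistNames) {
  const normalize = (name) => name.trim().toLowerCase();
  const artists = new Set(releaseArtistNames.map(normalize));
  // Default to a label unless the holder is one of the release artists.
  return artists.has(normalize(holderName)) ? 'artist' : 'label';
}
```

For example, `guessEntityType('Example Band', ['Example Band'])` yields `'artist'`, while any name that is not among the release artists falls back to `'label'`.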

Once the user has selected a match for a given name, the userscript caches the MBID of this match and will not ask the user to match the same name again. If an incorrect entity has been selected at some point, the CTRL key lets the user bypass the cache and force a new search, which overwrites the old cache entry.

Successfully parsed credit lines will be appended to the edit note. Optionally, they can also be removed from the input so that only the skipped lines remain.
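A minimal sketch of such a name cache, assuming entries are keyed by entity type and name. The class name `NameCache`, the synchronous `askUser` callback and the `bypassCache` flag (standing in for the CTRL key) are all hypothetical and not taken from the userscript's code.

```javascript
// Hypothetical cache which maps (entity type, name) to an MBID.
class NameCache {
  constructor() {
    this.entries = new Map();
  }

  // askUser(entityType, name) stands in for the auto-complete dialog and
  // returns the selected MBID, or null if the user cancelled the dialog.
  resolve(entityType, name, askUser, bypassCache = false) {
    const key = `${entityType}:${name}`;
    if (!bypassCache && this.entries.has(key)) {
      return this.entries.get(key); // known name, no dialog needed
    }
    const mbid = askUser(entityType, name);
    if (mbid) {
      this.entries.set(key, mbid); // overwrites a previous (wrong) entry
    }
    return mbid;
  }
}
```

In this sketch a cancelled dialog leaves an existing cache entry untouched, which is a design assumption.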

Supported notice formats and relationship types

The userscript performs the following steps for all copyright notices:

  1. Parser: Recognizes specific patterns in copyright notices and tries to extract the type, the name of the copyright holder and the optional year (or multiple years) from the given text input. This step neither differentiates between release and recording level relationships nor cares whether the name belongs to a label or an artist.

  2. Mapper: Decides whether the copyright holder is an artist or a label and maps the type to an internal relationship type ID. It then fills the relationship dialog with this ID, the credited name and the year (skipped if there are multiple unspecific years) and waits for the user to select the correct target entity. If the cache already contains an entry for the correct entity type and the given name, the dialog can be confirmed automatically.
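The parser step can be illustrated with a deliberately simplified sketch. The pattern below is not the userscript's actual regular expression: it assumes notices of the form "symbol, optional years, name" with © or ℗ as the only type markers, and the type labels are plain strings rather than internal relationship type IDs. The names `NOTICE_PATTERN`, `parseCopyrightNotice` and `relationshipYear` are hypothetical.

```javascript
// Simplified illustrative pattern, NOT the userscript's real expression.
// Groups: (1) type symbol, (2) optional list of years, (3) holder name
// up to the first comma or full stop.
const NOTICE_PATTERN = /(©|℗)\s*(\d{4}(?:\s*[,&]\s*\d{4})*)?\s*([^,.]+)/;

function parseCopyrightNotice(line) {
  const match = line.match(NOTICE_PATTERN);
  if (!match) return null;
  const [, symbol, years, holder] = match;
  return {
    type: symbol === '℗' ? 'phonographic copyright' : 'copyright',
    years: years ? years.match(/\d{4}/g).map(Number) : [],
    name: holder.trim(),
  };
}

// Mapper detail: only a single year is used for the relationship.
// Assumption: any list of several years counts as "unspecific".
function relationshipYear(years) {
  return years.length === 1 ? years[0] : null;
}
```

Under these assumptions, `parseCopyrightNotice('℗ 2019, 2020 Example Records.')` produces the type `'phonographic copyright'`, the years `[2019, 2020]` and the name `'Example Records'`, and `relationshipYear` returns `null` for it because two years were given.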

The following formats and relationship types are supported:

After these types, the parser expects the name of the copyright holder(s), which can be either labels or artists. Multiple names have to be separated by slashes. The parser extracts the entire text until it reaches a terminator, which can be the end of the line, a comma or a full stop. Therefore, the following special cases have to be taken into account, too:
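As a rough illustration of the rule above (cut at the first terminator, then split by slashes), consider the following sketch. `extractHolderNames` is a hypothetical function, not the userscript's implementation, and it deliberately ignores all special cases.

```javascript
// Naive sketch: take the text up to the first terminator
// (comma, full stop or end of line), then split it by slashes.
function extractHolderNames(text) {
  const untilTerminator = text.split(/[,.\n]/, 1)[0];
  return untilTerminator
    .split('/')
    .map((name) => name.trim())
    .filter(Boolean); // drop empty fragments
}
```

With this naive rule, `'Example Records / Another Label, all rights reserved'` gives two names, but `'Example Music Inc. / Another Label'` is cut off after `'Example Music Inc'`, which shows why names containing terminators need special treatment.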

If you want to know the exact details about the parser, have a look at the underlying regular expressions. You can find lots of tools which can explain them to you (e.g. https://regex101.com) or you can study this beautiful railroad diagram representation (where I have combined and annotated the expressions).

Collection of unsupported copyright notice formats

Entries for formats that previously caused issues but are now supported have been ticked off in this list and added to the test cases.

The major problem is that the userscript has to reliably detect the end of the copyright holder's name. In the easy cases this is just a comma or a full stop, but we also need special handling for company suffixes that follow a comma and/or contain dots themselves.

Version 2022.1.11 now detects "Inc." and "Ltd." (also without trailing dot), "LLC", "LLP", and " under " (for "X under exclusive license to Y") in addition to comma and full stop. Please let me know if you find more patterns which end the name of a copyright holder.
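One possible way to express these rules in a single expression is sketched below. It is an assumption-laden illustration, not the pattern used since version 2022.1.11: `HOLDER_NAME_PATTERN` and `findHolderNameWithSuffix` are hypothetical names, and the expression only knows the suffixes and terminators listed above.

```javascript
// Illustrative only: a lazily matched name, optionally followed by a
// company suffix (which may itself be preceded by a comma), which must
// end before " under ", a comma, a full stop or the end of the input.
const HOLDER_NAME_PATTERN =
  /^(.+?(?:,?\s+(?:Inc\.?|Ltd\.?|LLC|LLP))?)(?=\s+under\s|[,.]|$)/;

function findHolderNameWithSuffix(text) {
  const match = text.trim().match(HOLDER_NAME_PATTERN);
  return match ? match[1].trim() : null;
}
```

In this sketch, `'Example Music, Inc. under exclusive license to Another Label'` results in `'Example Music, Inc.'`, whereas a name with other embedded dots (such as initials) would still be cut at the first full stop.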

I should probably add some customization options to ignore certain special characters (terminators and split symbols) in artist and label names. Maybe a checkbox "Try to split multiple names" (enabled by default) or a text input "Split names by" (default: "[/&]") and a text input "Name terminators" (default: ",|\.| under ") would be a good solution.
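Such options could look roughly like the following sketch, which turns the two text inputs into regular expressions and uses the defaults suggested above. Everything here is hypothetical (the function `createNameSplitter` and the option names `trySplit`, `splitBy` and `terminators`); it is one possible design, not an existing feature.

```javascript
// Hypothetical factory: builds a name extractor from user options.
function createNameSplitter({
  trySplit = true,              // checkbox "Try to split multiple names"
  splitBy = '[/&]',             // text input "Split names by"
  terminators = ',|\\.| under ', // text input "Name terminators"
} = {}) {
  const splitPattern = new RegExp(splitBy);
  const terminatorPattern = new RegExp(terminators);
  return (text) => {
    // Keep only the text before the first terminator.
    const untilTerminator = text.split(terminatorPattern, 1)[0];
    const names = trySplit
      ? untilTerminator.split(splitPattern)
      : [untilTerminator];
    return names.map((name) => name.trim()).filter(Boolean);
  };
}
```

With the defaults, a band name like "Simon & Garfunkel" would be split into two names, while `createNameSplitter({ trySplit: false })` keeps it intact. User-supplied patterns with capturing groups would add the captured text to the result of `split`, so a real implementation would need to validate the input.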

Types

Company suffixes

Other critical suffixes which could possibly occur (i.e. containing dots or prefixed by a comma):

Name terminators

Other formats