Shot Detection
- Inputs
- Output Formats
- MGMs in AMP
- Notes on Use
- Use Cases and Example Workflows
- Evaluating Shot Detection MGMs
- AMP JSON Output
Shot detection is a method for finding transitions in video content and dividing the video into discrete structural temporal units. (This is distinct from scene detection, which attempts to identify single events composed of multiple shots. Scenes are harder to detect than shots because they are often defined by the semantics of the content.) Automated shot detection can support a number of purposes, such as copyright review (by providing keyframes from each shot that show secondary content or potential rightsholders, like art on walls or performers) and detecting scene changes in theater productions. Depending on the tool, detectable transition types may include dissolve (fade out/in), cut, pan, zoom in, and zoom out.
Inputs
Video file
Output Formats
- amp_shots: Start and end timecodes for all shot transitions detected in AMP JSON format (see below).
- contact_sheets: Image of middle-frame thumbnails with labels and start and end timecodes for all shots detected.
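Each contact sheet thumbnail is drawn from roughly the middle frame of its shot. A minimal sketch of computing those midpoint times from shot boundaries (a hypothetical helper for illustration, not AMP's actual implementation):

```python
def shot_midpoints(shots):
    """Return the midpoint time (in seconds) of each (start, end) shot.

    A contact sheet thumbnail would be extracted at roughly this time.
    """
    return [(start + end) / 2 for start, end in shots]

# Shots as (start, end) in seconds, e.g. from a shot detection MGM.
shots = [(0.0, 10.89), (10.89, 19.4), (19.4, 45.35)]
midpoints = shot_midpoints(shots)
```

A thumbnail could then be extracted at each midpoint with a tool such as ffmpeg (roughly `ffmpeg -ss <midpoint> -i video.mp4 -frames:v 1 out.png`).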
MGMs in AMP
PySceneDetect
PySceneDetect is an open-source application that offers two options for detecting transitions in video: content, which looks for changes in the content of frames, and threshold, which compares the average intensity of frames against a set intensity level to detect transitions. The AMP implementation uses the content algorithm.
Parameters:
- Threshold: Sensitivity threshold of the shot detection.
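Conceptually, content-style detection flags a cut wherever the frame-to-frame change exceeds the sensitivity threshold. A toy NumPy sketch of that idea (illustrative only, not PySceneDetect's actual algorithm):

```python
import numpy as np

def detect_cuts(frames, threshold=30.0):
    """Flag a cut wherever the mean absolute pixel difference between
    consecutive frames exceeds `threshold` (content-style detection)."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean()
        if diff > threshold:
            cuts.append(i)  # frame index where the new shot begins
    return cuts

# Three dark frames followed by three bright frames: one cut at frame 3.
dark = np.zeros((4, 4), dtype=np.uint8)
bright = np.full((4, 4), 200, dtype=np.uint8)
cuts = detect_cuts([dark, dark, dark, bright, bright, bright])
```

A lower threshold makes the detector more sensitive, producing more (and possibly spurious) cuts; a higher threshold can miss gradual transitions such as dissolves.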
Azure Video Indexer
Azure Video Indexer is a proprietary video intelligence platform from Microsoft. Shot and scene detection is included as part of this platform. The AMP implementation uses the shot detection feature only.
Notes on Use
- Azure Video Indexer can also perform scene detection, but it is not implemented in AMP because it performed very poorly during evaluation.
- Each of the tools has at least one of the outputs checked by default. Checked outputs will display the workflow step and output in the dashboard when "Show Relevant Results Only" is turned on. See Tips for Creating a Workflow for an explanation of what each output option means.
Use Cases and Example Workflows
Use case 1: Copyright review
A rights specialist is reviewing a collection of videos that the archive would like to put online. The archive has the rights to the videos, but the specialist needs to check for any rights-protected content within the video, such as art on walls or performers who might have performance rights. The specialist could review the videos more efficiently if they could skim through a contact sheet of thumbnails showing all of the content within view of each shot instead of scrubbing through the entire video. They send each video through a shot detection MGM, which detects the shots and outputs a contact sheet for each video showing a thumbnail for each shot along with timecodes, in case the specialist needs to reference the video.
Notes:
- In this example, the rights specialist is using Azure for shot detection. Because Azure rolls all of its video tools into one service, the rights specialist must add the Azure Video Indexer step first and then add Azure Shot Detection to convert the shot transition data from Azure Video Indexer into AMP JSON for further conversion. The Azure Artifact OCR JSON box is unchecked because we do not need the additional VOCR file produced by Azure for this workflow.
- The Contact Sheet by Shot step converts the shots in the AMP JSON format into contact sheet images.
Use case 2: Processing a new collection
A collection manager is processing a new collection which includes 3 boxes of unreviewed U-matic tapes. The tapes have been digitized and the CM needs to review and process the digital files. Based on their knowledge of the collection, they suspect that the tapes contain a lot of gaps and unused portions. They would like to be able to efficiently review the tapes' content without having to scrub empty tape, so they would like to review contact sheets that show individual shots. They send the files through shot detection, which detects these empty segments of video as their own thumbnails and includes the timecodes, so that the CM can easily skip over these segments and even cut them out for access copies.
Notes:
- The collection manager is using the PySceneDetect shot detection MGM, which can take the video file input directly. (Even though it has "scene" in the name, it is only detecting shots.)
- The Contact Sheet by Shot step converts the shots in the AMP JSON format into contact sheet images.
Evaluating Shot Detection MGMs
There is one test for evaluating AMP's shot detection MGMs: Precision/Recall of Shots.
Precision/Recall of Shots
This test takes as input a structured list of timestamp ranges representing the start and end of each shot and compares it to the shot detection MGM output to find true positives, false positives, and false negatives and to calculate precision, recall, and F1, allowing a buffer of a set number of seconds before and after each transition.
Parameters
Analysis threshold: the number of seconds of buffer (float) for counting a true positive (a match between the ground truth and MGM output). For example, a 2-second threshold will consider a GT and MGM segment a match if both the start and end times of each fall within 2 seconds of the other.
Scores generated
- Total GT shots
- Total MGM shots
- Count of true positives
- Count of false negatives
- Count of false positives
- Precision
- Recall
- F1
- Accuracy
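The matching and scoring described above can be sketched as follows (a simplified illustration of the evaluation logic, not AMP's actual implementation; it assumes each GT shot matches at most one MGM shot):

```python
def score_shots(gt, mgm, threshold=2.0):
    """Match ground-truth and MGM shots, counting a true positive when both
    start and end times fall within `threshold` seconds of each other."""
    matched = set()
    tp = 0
    for g_start, g_end in gt:
        for i, (m_start, m_end) in enumerate(mgm):
            if i in matched:
                continue
            if abs(g_start - m_start) <= threshold and abs(g_end - m_end) <= threshold:
                matched.add(i)
                tp += 1
                break
    fn = len(gt) - tp   # GT shots with no matching MGM shot
    fp = len(mgm) - tp  # MGM shots with no matching GT shot
    precision = tp / len(mgm) if mgm else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"tp": tp, "fn": fn, "fp": fp,
            "precision": precision, "recall": recall, "f1": f1}

# Synthetic shots in seconds: the middle GT shot has no close MGM match.
gt = [(0, 40), (40, 50), (50, 65)]
mgm = [(0, 40.5), (41, 56), (49.5, 64)]
scores = score_shots(gt, mgm, threshold=2.0)
```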
Output comparison
This test outputs a table with the ground truth start and end time codes and label for each shot alongside the time codes and labels for the MGM output. Time codes for true positives are listed on the same row, while time codes for false positives and false negatives are listed on separate rows. Reviewing this comparison can help you see where in the video the MGM was incorrect and decide how important these errors are to your use case. It can also be helpful to include the type of shot transition (ex. cut, dissolve, zoom in/out, pan) as a separate column in your ground truth, so you can see whether the MGM identified certain transitions more easily than others and determine if the MGM will be appropriate for your use case or collection.
Example:
comparison | gt_start | gt_end | start | end | transition_type (optional) |
---|---|---|---|---|---|
true positive | 0:00:00 | 0:00:40 | 0:00:00 | 0:00:40 | cut |
false negative | 0:00:40 | 0:00:50 | | | dissolve |
false positive | | | 0:00:40 | 0:00:56 | |
false positive | | | 0:00:56 | 0:00:58 | |
true positive | 0:00:50 | 0:01:05 | 0:00:59 | 0:01:06 | cut |
true positive | 0:01:05 | 0:01:17 | 0:01:06 | 0:01:18 | zoom in |
Creating Ground Truth
Create a CSV with a minimum of two columns: start and end. Values for start and end should be recorded as hh:mm:ss or in seconds (with decimals). For best results, each shot should start where the previous one ends (ex. if a shot ends at 00:45:12, the next shot should start at 00:45:12). Optionally, include the type of shot transition (ex. cut, pan, zoom in/out, dissolve) as a separate column to see how well the MGM handles different types of transitions.
Example:
start | end | transition_type |
---|---|---|
0:00:00 | 0:00:40 | cut |
0:00:40 | 0:00:50 | dissolve |
0:00:50 | 0:01:05 | cut |
0:01:05 | 0:01:17 | zoom in |
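Since ground truth times may be written as hh:mm:ss or as plain seconds, a small hypothetical helper for reading such a CSV and normalizing both forms to seconds might look like:

```python
import csv
import io

def to_seconds(value):
    """Convert 'hh:mm:ss' (or 'mm:ss', or plain seconds) to float seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# A ground truth CSV like the example above (transition_type is optional).
csv_text = """start,end,transition_type
0:00:00,0:00:40,cut
0:00:40,0:00:50,dissolve
0:00:50,0:01:05,cut
0:01:05,0:01:17,zoom in
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))
shots = [(to_seconds(r["start"]), to_seconds(r["end"])) for r in rows]
```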
Sample Evaluation Use Cases
Use case 1: Copyright review
A rights specialist is reviewing a collection of videos that the archive would like to put online. The archive has the rights to the videos, but the specialist needs to check for any rights-protected content within the video, such as art on walls or performers who might have performance rights. The specialist could review the videos more efficiently if they could skim through a contact sheet of thumbnails showing all of the content within view of each shot instead of scrubbing through the entire video. They send each video through the shot detection MGM, which detects the shots and outputs a contact sheet for each video showing a thumbnail for each shot along with timecodes, in case the specialist needs to reference the video.
Success measures
Shot detection correctly detects as many transitions as possible.
Key metrics
- High recall (As many correct transitions are detected as possible. False positives may clutter contact sheet results with more information than necessary, but will not negatively affect the reviewer's ability to see true positives.)
Qualitative measures
- Review false negatives to see what types of transitions are being missed most frequently. Does one tool handle this type of transition better than others?
Use case 2: Processing a new collection
A collection manager is processing a new collection which includes 3 boxes of unreviewed U-matic tapes. The tapes have been digitized and the CM needs to review and process the digital files. Based on their knowledge of the collection, they suspect that the tapes contain a lot of gaps and unused portions. They would like to be able to efficiently review the tapes' content without having to scrub empty tape, so they would like to review contact sheets that show individual shots. They send the files through shot detection, which detects these empty segments of video as their own thumbnails and includes the time codes, so that the CM can easily skip over these segments and even cut them out for access copies.
Success measures
Shot detection correctly detects transitions before and after dead air.
Key metrics
- High recall (As many correct transitions detected as possible)
Qualitative measures
- Review results to see if transitions before and after empty tape are being detected.
AMP JSON Output
Element | Datatype | Obligation | Definition |
---|---|---|---|
media | object | required | Wrapper for metadata about the source media file. |
media.filename | string | required | Filename of the source file. |
media.duration | string | required | The duration of the source file. |
shots | array | required | The list of shots in the video. |
shots[*] | object | optional | A shot in a video. |
shots[*].type | string | required | The type of shot, "scene" or "shot". |
shots[*].start | string | required | The start time in seconds (s.fff). |
shots[*].end | string | required | The end time in seconds (s.fff). |
Sample Output
{
"media": {
"filename": "myvideo.mp4",
"duration": "45.35"
},
"shots": [
{
"type": "scene",
"start": "0.0",
"end": "45.35"
},
{
"type": "shot",
"start": "0.0",
"end": "10.89"
},
{
"type": "shot",
"start": "10.89",
"end": "19.4"
},
{
"type": "shot",
"start": "19.4",
"end": "45.35"
}
]
}
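A brief sketch of consuming AMP JSON output like the sample above. Note that, per the schema, start and end are strings, so they need conversion before any arithmetic (the variable names here are illustrative, not part of AMP):

```python
import json

amp_json = """
{ "media": {"filename": "myvideo.mp4", "duration": "45.35"},
  "shots": [
    {"type": "scene", "start": "0.0", "end": "45.35"},
    {"type": "shot", "start": "0.0", "end": "10.89"},
    {"type": "shot", "start": "10.89", "end": "19.4"},
    {"type": "shot", "start": "19.4", "end": "45.35"}
  ]
}
"""

data = json.loads(amp_json)
# Keep only elements typed "shot" and convert the string times to floats.
shots = [(float(s["start"]), float(s["end"]))
         for s in data["shots"] if s["type"] == "shot"]
duration = float(data["media"]["duration"])
```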
Document generated by Confluence on Feb 25, 2025 10:39