Shot Detection
- Inputs
- Output Formats
- MGMs in AMP
- Notes on Use
- Use Cases and Example Workflows
- Evaluating Shot Detection MGMs
- AMP JSON Output
Shot detection is a method for finding transitions in video content and dividing the video into discrete structural temporal units. (This is distinct from scene detection, which attempts to identify single events composed of multiple shots. Scenes are harder to detect than shots because they are often defined by the semantics of the content.) Automated shot detection can support a number of purposes, such as copyright review (by providing keyframes from each shot that show secondary content or potential rightsholders, like art on walls or performers) and detecting scene changes in theater productions. Depending on the tool, detectable transition types may include dissolve (fade out/in), cut, pan, zoom in, and zoom out.
Inputs
Video file
Output Formats
- amp_shots: Start and end timecodes for all shot transitions detected in AMP JSON format (see below).
- contact_sheets: Image of middle-frame thumbnails with labels and start and end timecodes for all shots detected.
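Each contact sheet thumbnail is drawn from roughly the middle frame of its shot. A minimal sketch of computing those midpoint times from shot boundaries (a hypothetical helper for illustration, not AMP's actual implementation):

```python
def shot_midpoints(shots):
    """Return the midpoint time (in seconds) of each (start, end) shot.

    A contact sheet thumbnail would be extracted at roughly this time.
    """
    return [(start + end) / 2 for start, end in shots]

# Shots as (start, end) in seconds, e.g. from a shot detection MGM.
shots = [(0.0, 10.89), (10.89, 19.4), (19.4, 45.35)]
midpoints = shot_midpoints(shots)
```

A thumbnail could then be extracted at each midpoint with a tool such as ffmpeg (roughly `ffmpeg -ss <midpoint> -i video.mp4 -frames:v 1 out.png`).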
MGMs in AMP
PySceneDetect
PySceneDetect is an open-source application that offers two options for detecting transitions in video: content, which looks for changes in the content of frames, and threshold, which compares the average intensity of frames against a set intensity level to detect transitions. The AMP implementation uses the content algorithm.
Parameters:
- Threshold: Sensitivity threshold of the shot detection.
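Conceptually, content-style detection flags a cut wherever the frame-to-frame change exceeds the sensitivity threshold. A toy NumPy sketch of that idea (illustrative only, not PySceneDetect's actual algorithm):

```python
import numpy as np

def detect_cuts(frames, threshold=30.0):
    """Flag a cut wherever the mean absolute pixel difference between
    consecutive frames exceeds `threshold` (content-style detection)."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean()
        if diff > threshold:
            cuts.append(i)  # frame index where the new shot begins
    return cuts

# Three dark frames followed by three bright frames: one cut at frame 3.
dark = np.zeros((4, 4), dtype=np.uint8)
bright = np.full((4, 4), 200, dtype=np.uint8)
cuts = detect_cuts([dark, dark, dark, bright, bright, bright])
```

A lower threshold makes the detector more sensitive, producing more (and possibly spurious) cuts; a higher threshold can miss gradual transitions such as dissolves.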
Azure Video Indexer
Azure Video Indexer is a proprietary video intelligence platform from Microsoft. Shot and scene detection is included as part of this platform. The AMP implementation uses the shot detection feature only.
Notes on Use
- Azure Video Indexer can also perform scene detection, but it is not implemented in AMP because it performed very poorly during evaluation.
- Each of the tools has at least one of the outputs checked by default. Checked outputs will display the workflow step and output in the dashboard when "Show Relevant Results Only" is turned on. See Tips for Creating a Workflow for an explanation of what each output option means.
Use Cases and Example Workflows
Use case 1: Copyright review
A rights specialist is reviewing a collection of videos that the archive would like to put online. The archive has the rights to the videos, but the specialist needs to check for any rights-protected content within the video, such as art on walls or performers who might have performance rights. The specialist could review the videos more efficiently if they could skim through a contact sheet of thumbnails showing all of the content within view of each shot instead of scrubbing through the entire video. They send each video through a shot detection MGM, which detects the shots and outputs a contact sheet for each video showing a thumbnail for each shot along with timecodes, in case the specialist needs to reference the video.
Notes:
- In this example, the rights specialist is using Azure for shot detection. Because Azure rolls all of its video tools into one service, the rights specialist must add the Azure Video Indexer step first and then add Azure Shot Detection to convert the shot transition data from Azure Video Indexer into AMP JSON for further conversion. The Azure Artifact OCR JSON box is unchecked because we do not need the additional VOCR file produced by Azure for this workflow.
- The Contact Sheet by Shot step converts the shots in the AMP JSON format into contact sheet images.
Use case 2: Processing a new collection
A collection manager is processing a new collection which includes 3 boxes of unreviewed U-matic tapes. The tapes have been digitized and the CM needs to review and process the digital files. Based on their knowledge of the collection, they suspect that the tapes contain a lot of gaps and unused portions. They would like to be able to efficiently review the tapes' content without having to scrub empty tape, so they would like to review contact sheets that show individual shots. They send the files through shot detection, which detects these empty segments of video as their own thumbnails and includes the timecodes, so that the CM can easily skip over these segments and even cut them out for access copies.
Notes:
- The collection manager is using the PySceneDetect shot detection MGM, which can take the video file input directly. (Even though it has "scene" in the name, it is only detecting shots.)
- The Contact Sheet by Shot step converts the shots in the AMP JSON format into contact sheet images.
Evaluating Shot Detection MGMs
There is one test for evaluating AMP's shot detection MGMs: Precision/Recall of Shots.
Precision/Recall of Shots
This test takes as input a structured list of timestamp ranges representing the start and end of each shot and compares it to the shot detection MGM output to find true positives, false positives, and false negatives and to calculate precision, recall, and F1, allowing a buffer of a set number of seconds before and after each transition.
Parameters
Analysis threshold: the number of seconds of buffer (float) for counting a true positive (a match between the ground truth and MGM output). For example, a 2-second threshold will consider a GT and MGM segment a match if both the start and end times of each fall within 2 seconds of the other.
Scores generated
- Total GT shots
- Total MGM shots
- Count of true positives
- Count of false negatives
- Count of false positives
- Precision
- Recall
- F1
- Accuracy
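The matching and scoring described above can be sketched as follows (a simplified illustration of the evaluation logic, not AMP's actual implementation; it assumes each GT shot matches at most one MGM shot):

```python
def score_shots(gt, mgm, threshold=2.0):
    """Match ground-truth and MGM shots, counting a true positive when both
    start and end times fall within `threshold` seconds of each other."""
    matched = set()
    tp = 0
    for g_start, g_end in gt:
        for i, (m_start, m_end) in enumerate(mgm):
            if i in matched:
                continue
            if abs(g_start - m_start) <= threshold and abs(g_end - m_end) <= threshold:
                matched.add(i)
                tp += 1
                break
    fn = len(gt) - tp   # GT shots with no matching MGM shot
    fp = len(mgm) - tp  # MGM shots with no matching GT shot
    precision = tp / len(mgm) if mgm else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"tp": tp, "fn": fn, "fp": fp,
            "precision": precision, "recall": recall, "f1": f1}

# Synthetic shots in seconds: the middle GT shot has no close MGM match.
gt = [(0, 40), (40, 50), (50, 65)]
mgm = [(0, 40.5), (41, 56), (49.5, 64)]
scores = score_shots(gt, mgm, threshold=2.0)
```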
Output comparison
This test outputs a table with the ground truth start and end time codes and label for each shot alongside the time codes and labels for the MGM output. Time codes for true positives are listed on the same row, while time codes for false positives and false negatives are listed on separate rows. Reviewing this comparison can help you see where in the video the MGM was incorrect and decide how important these errors are to your use case. It can also be helpful to include the type of shot transition (ex. cut, dissolve, zoom in/out, pan) as a separate column in your ground truth, so you can see whether the MGM identified certain transitions more easily than others and determine if the MGM will be appropriate for your use case or collection.
Example:
comparison | gt_start | gt_end | start | end | transition_type (optional) |
---|---|---|---|---|---|
true positive | 0:00:00 | 0:00:40 | 0:00:00 | 0:00:40 | cut |
false negative | 0:00:40 | 0:00:50 | | | dissolve |
false positive | | | 0:00:40 | 0:00:56 | |
false positive | | | 0:00:56 | 0:00:58 | |
true positive | 0:00:50 | 0:01:05 | 0:00:59 | 0:01:06 | cut |
true positive | 0:01:05 | 0:01:17 | 0:01:06 | 0:01:18 | zoom in |
Creating Ground Truth
Create a CSV with a minimum of two columns: start and end. Values for start and end should be recorded as hh:mm:ss or in seconds (with decimals). For best results, each shot should start where the previous one ends (ex. if a shot ends at 00:45:12, the next shot should start at 00:45:12). Optionally, include the type of shot transition (ex. cut, pan, zoom in/out, dissolve) as a separate column to see how well the MGM handles different types of transitions.
Example:
start | end | transition_type |
---|---|---|
0:00:00 | 0:00:40 | cut |
0:00:40 | 0:00:50 | dissolve |
0:00:50 | 0:01:05 | cut |
0:01:05 | 0:01:17 | zoom in |
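Since ground truth times may be written as hh:mm:ss or as plain seconds, a small hypothetical helper for reading such a CSV and normalizing both forms to seconds might look like:

```python
import csv
import io

def to_seconds(value):
    """Convert 'hh:mm:ss' (or 'mm:ss', or plain seconds) to float seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# A ground truth CSV like the example above (transition_type is optional).
csv_text = """start,end,transition_type
0:00:00,0:00:40,cut
0:00:40,0:00:50,dissolve
0:00:50,0:01:05,cut
0:01:05,0:01:17,zoom in
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))
shots = [(to_seconds(r["start"]), to_seconds(r["end"])) for r in rows]
```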
Sample Evaluation Use Cases
Use case 1: Copyright review
A rights specialist is reviewing a collection of videos that the archive would like to put online. The archive has the rights to the videos, but the specialist needs to check for any rights-protected content within the video, such as art on walls or performers who might have performance rights. The specialist could review the videos more efficiently if they could skim through a contact sheet of thumbnails showing all of the content within view of each shot instead of scrubbing through the entire video. They send each video through the shot detection MGM, which detects the shots and outputs a contact sheet for each video showing a thumbnail for each shot along with timecodes, in case the specialist needs to reference the video.
Success measures
Shot detection correctly detects as many transitions as possible.
Key metrics
- High recall (As many correct transitions are detected as possible. False positives may clutter contact sheet results with more information than necessary, but will not negatively affect the reviewer's ability to see true positives.)
Qualitative measures
- Review false negatives to see what types of transitions are being missed most frequently. Does one tool handle this type of transition better than others?
Use case 2: Processing a new collection
A collection manager is processing a new collection which includes 3 boxes of unreviewed U-matic tapes. The tapes have been digitized and the CM needs to review and process the digital files. Based on their knowledge of the collection, they suspect that the tapes contain a lot of gaps and unused portions. They would like to be able to efficiently review the tapes' content without having to scrub empty tape, so they would like to review contact sheets that show individual shots. They send the files through shot detection, which detects these empty segments of video as their own thumbnails and includes the time codes, so that the CM can easily skip over these segments and even cut them out for access copies.
Success measures
Shot detection correctly detects transitions before and after dead air.
Key metrics
- High recall (As many correct transitions detected as possible)
Qualitative measures
- Review results to see if transitions before and after empty tape are being detected.
AMP JSON Output
Element | Datatype | Obligation | Definition |
---|---|---|---|
media | object | required | Wrapper for metadata about the source media file. |
media.filename | string | required | Filename of the source file. |
media.duration | string | required | The duration of the source file. |
shots | array | required | The list of shots in the video. |
shots[*] | object | optional | A shot in a video. |
shots[*].type | string | required | The type of shot, "scene" or "shot". |
shots[*].start | string | required | The start time in seconds (s.fff). |
shots[*].end | string | required | The end time in seconds (s.fff). |
Sample Output
{
"media": {
"filename": "myvideo.mp4",
"duration": "45.35"
},
"shots": [
{
"type": "scene",
"start": "0.0",
"end": "45.35"
},
{
"type": "shot",
"start": "0.0",
"end": "10.89"
},
{
"type": "shot",
"start": "10.89",
"end": "19.4"
},
{
"type": "shot",
"start": "19.4",
"end": "45.35"
}
]
}
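A brief sketch of consuming AMP JSON output like the sample above. Note that, per the schema, start and end are strings, so they need conversion before any arithmetic (the variable names here are illustrative, not part of AMP):

```python
import json

amp_json = """
{ "media": {"filename": "myvideo.mp4", "duration": "45.35"},
  "shots": [
    {"type": "scene", "start": "0.0", "end": "45.35"},
    {"type": "shot", "start": "0.0", "end": "10.89"},
    {"type": "shot", "start": "10.89", "end": "19.4"},
    {"type": "shot", "start": "19.4", "end": "45.35"}
  ]
}
"""

data = json.loads(amp_json)
# Keep only elements typed "shot" and convert the string times to floats.
shots = [(float(s["start"]), float(s["end"]))
         for s in data["shots"] if s["type"] == "shot"]
duration = float(data["media"]["duration"])
```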
Document generated by Confluence on Feb 25, 2025 10:39