
Facial Recognition

Facial recognition is the detection and identification of human faces in a video based on a model trained with images of known individuals.

Inputs

Video file

Output Formats

  • amp_faces: Face labels and timestamps of all instances of recognized faces in AMP JSON format (see below)
  • contact_sheet: Image of thumbnails taken from the middle frame of each recognized face instance, with labels and timecodes for all instances of recognized faces.

MGMs in AMP

Dlib Face Recognition (Python) 

Dlib Face Recognition is an open-source Python library for detecting and manipulating faces in images. The AMP implementation allows a user to submit training images of the faces they wish to find in videos; Face Recognition then detects and labels similar faces, returning timestamps, bounding coordinates, and labels for each face it finds. Face Recognition only attempts to recognize and label the faces submitted by the user for training in a given workflow.
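
As a rough illustration of the underlying approach (a sketch using the face_recognition package, not the AMP code itself), a known face can be encoded from a training photo and then compared against every face found in a frame extracted from the video. The file paths and the label below are placeholders.

import face_recognition

# Encode one training photo of a known person (paths and label are hypothetical).
known_image = face_recognition.load_image_file("training/Herman B. Wells/photo1.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Detect and encode every face in a frame extracted from the video.
frame = face_recognition.load_image_file("frame_at_625s.jpg")
locations = face_recognition.face_locations(frame)
encodings = face_recognition.face_encodings(frame, locations)

# Label any face that matches the known encoding, keeping its bounding box.
for (top, right, bottom, left), encoding in zip(locations, encodings):
    if face_recognition.compare_faces([known_encoding], encoding)[0]:
        print("Herman B. Wells", {"xmin": left, "ymin": top, "xmax": right, "ymax": bottom})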

Parameters: 

  • Reuse Previous Training Results: A flag indicating whether to reuse previous training results from the same training photos, if they exist.
  • Face Match Tolerance: Tolerance level when matching faces; a lower value means a stricter match (see the sketch below).
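
To make the tolerance parameter concrete, the sketch below (an illustration, not the AMP implementation) compares the distance between two face encodings against two tolerance values; a face counts as a match only when the distance is at or below the tolerance, so a lower value is stricter. File paths are placeholders.

import face_recognition

known = face_recognition.face_encodings(
    face_recognition.load_image_file("training/Herman B. Wells/photo1.jpg"))[0]
candidate = face_recognition.face_encodings(
    face_recognition.load_image_file("frame_at_625s.jpg"))[0]

# Distance between encodings; smaller means more similar.
distance = face_recognition.face_distance([known], candidate)[0]
print("match at tolerance 0.4 (strict):", distance <= 0.4)
print("match at tolerance 0.6 (looser):", distance <= 0.6)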

Notes on Use

  • Before running facial recognition, you must upload a zip file of face images for the people you want to find as a supplemental file. For each individual you want to identify, include a folder containing a variety of images of the person's face (making sure no other faces are in the images), from different angles and ages if possible. Name this folder with the label you want to display in the results. Compress all of the individuals' folders into a zip file and upload it as a supplemental file (category: face) at the collection or item level. (More details on uploading supplemental files are on the About Supplemental Files page.) A sketch of one way to assemble this zip file appears after this list.
  • Each of the tools has at least one of the outputs checked by default. Checked outputs will display the workflow step and output in the dashboard when "Show Relevant Results Only" is turned on. See Tips for Creating a Workflow for an explanation of what each output option means.
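
One way to assemble the training zip, assuming a local directory with one sub-folder of photos per person (directory and file names here are hypothetical), is sketched below; the resulting faces.zip is what gets uploaded as the supplemental file.

from pathlib import Path
import zipfile

# "training_faces/" holds one folder per person, named with the desired label,
# e.g. training_faces/Herman B. Wells/photo1.jpg (hypothetical paths).
source = Path("training_faces")

with zipfile.ZipFile("faces.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for image in sorted(source.rglob("*")):
        if image.is_file():
            # Store paths relative to the top level so each person's folder
            # (the label) sits at the root of the zip.
            zf.write(image, image.relative_to(source))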

Use Cases and Example Workflows

Use case 1: Archival description

An archivist is processing a collection and wants to add controlled access points at the item level within the finding aid for key people from the collection who appear in videos. It is not important where in the video people appear; the archivist just wants to know whether they appear at some point. They collect images of each person they wish to find and add them to the collection as supplemental files to be used for training in the workflow. They run the workflow containing the Face Recognition MGM and then view the generated contact sheets to find the names of people appearing in each video, confirming the MGM was correct by looking at the faces in the thumbnail frames. They spot-check a few of the videos with contact sheets, then add the names to the finding aid.

Notes:

  • The archivist uploads a zip file of faces for the people they want to identify as a supplementary file (one folder of faces per person within the zip file, named for how they would like the faces to be labeled).
  • The workflow includes the Input Supplement step to pull this supplementary file associated with the collection or item into the workflow.

Use case 2: Indexing people in a video

A metadata specialist wants to index appearances of key people in a collection of videos, so they can add them as index points to the digital object for users to navigate to in the video player. After submitting images of faces to train the Face Recognition tool on, they run the videos through the MGM workflow. They download CSV files for the videos, which include time codes and labels for each face detected, then upload them to their video platform as index points.

Notes:

  • The metadata specialist uploads a zip file of faces for the people they want to identify as a supplementary file (one folder of faces per person within the zip file, named for how they would like the faces to be labeled).
  • CSV output for facial recognition is not yet available in AMP. The above workflow will produce output in AMP JSON format (see below); a sketch of how that JSON could be flattened to CSV follows this list.
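
Until CSV output is available, the AMP JSON output could be flattened into a simple CSV of timestamps and labels with a short script along these lines (a sketch, not an AMP feature; file names are placeholders). It reads only the documented "frames", "start", and "name" fields.

import csv
import json

with open("amp_faces.json") as f:
    data = json.load(f)

with open("face_index.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["start", "name"])
    for frame in data["frames"]:
        for obj in frame["objects"]:
            writer.writerow([frame["start"], obj["name"]])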

AMP JSON Output

Summary: 

| Element | Datatype | Obligation | Definition |
| --- | --- | --- | --- |
| media | object | required | Wrapper for metadata about the source media file. |
| media.filename | string | required | Filename of the source file. |
| media.duration | string | required | The duration of the source file. |
| media.frameRate | number | required | The frame rate of the video, in FPS. |
| media.numFrames | number | required | The number of frames in the video. |
| media.resolution | object | required | Resolution of the video. |
| media.resolution.width | number | required | Width of the frame, in pixels. |
| media.resolution.height | number | required | Height of the frame, in pixels. |
| frames | array | required | List of frames containing identified faces. |
| frames[*] | object | optional | A frame containing an identified face. |
| frames[*].start | string (s.fff) | required | Time of the frame, in seconds. |
| frames[*].objects | list | required | List of bounding boxes in the frame containing identified faces. |
| frames[*].objects[*] | object | required | A bounding box in the frame containing an identified face. |
| frames[*].objects[*].name | string | required | The name of the face within the bounding box. |
| frames[*].objects[*].score | object | optional | A confidence or relevance score for the face. |
| frames[*].objects[*].score.type | string (confidence \| relevance) | required | The type of score, confidence or relevance. |
| frames[*].objects[*].score.value | number | required | The score value, typically a number in the range of 0-1. |
| frames[*].objects[*].vertices | object | optional | The top left (xmin, ymin) and bottom right (xmax, ymax) relative bounding coordinates. |
| frames[*].objects[*].vertices.xmin | number | required | The top left x coordinate. |
| frames[*].objects[*].vertices.ymin | number | required | The top left y coordinate. |
| frames[*].objects[*].vertices.xmax | number | required | The bottom right x coordinate. |
| frames[*].objects[*].vertices.ymax | number | required | The bottom right y coordinate. |

Sample Output

{
    "media": {
        "filename": "myfile.mov",
        "duration": "8334.335",
        "frameRate": 30.000,
        "numFrames": 1547,
        "resolution": {
            "width": 654,
            "height": 486
        }
    },
    "frames": [
        {
            "start": "625.024",
            "objects": [
                {
                    "name": "Herman B. Wells",
                    "score": {
                        "type": "confidence",
                        "scoreValue": 0.9903119
                    },
                    "vertices": {
                        "xmin": 219,
                        "ymin": 21,
                        "xmax": 340,
                        "ymax": 53
                    }
                }
            ]
        }
    ]
}
