
Facial Recognition

Facial recognition is the detection and identification of human faces in a video based on a model trained with images of known individuals.

Inputs

Video file

Output Formats

  • amp_faces: Face labels and timestamps of all instances of recognized faces in AMP JSON format (see below)
  • contact_sheet: Image of thumbnails taken from the middle frame of each recognized face instance, with labels and timecodes for all instances of recognized faces.

MGMs in AMP

Dlib Face Recognition (Python) 

Dlib Face Recognition is an open-source Python library for detecting and manipulating faces in images. The AMP implementation allows a user to submit training images of the faces they wish to find in videos; Face Recognition then detects and labels similar faces, returning timestamps, bounding coordinates, and labels for each face it finds. Face Recognition only attempts to recognize and label the faces submitted by the user for training in a given workflow.
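
As a rough illustration of the underlying approach (a sketch using the face_recognition package, not the AMP code itself), a known face can be encoded from a training photo and then compared against every face found in a frame extracted from the video. The file paths and the label below are placeholders.

import face_recognition

# Encode one training photo of a known person (paths and label are hypothetical).
known_image = face_recognition.load_image_file("training/Herman B. Wells/photo1.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Detect and encode every face in a frame extracted from the video.
frame = face_recognition.load_image_file("frame_at_625s.jpg")
locations = face_recognition.face_locations(frame)
encodings = face_recognition.face_encodings(frame, locations)

# Label any face that matches the known encoding, keeping its bounding box.
for (top, right, bottom, left), encoding in zip(locations, encodings):
    if face_recognition.compare_faces([known_encoding], encoding)[0]:
        print("Herman B. Wells", {"xmin": left, "ymin": top, "xmax": right, "ymax": bottom})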

Parameters: 

  • Reuse Previous Training Results: A flag indicating whether to reuse previous training results from the same training photos, if they exist.
  • Face Match Tolerance: Tolerance level when matching faces; a lower value means a stricter match (see the sketch below).
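
To make the tolerance parameter concrete, the sketch below (an illustration, not the AMP implementation) compares the distance between two face encodings against two tolerance values; a face counts as a match only when the distance is at or below the tolerance, so a lower value is stricter. File paths are placeholders.

import face_recognition

known = face_recognition.face_encodings(
    face_recognition.load_image_file("training/Herman B. Wells/photo1.jpg"))[0]
candidate = face_recognition.face_encodings(
    face_recognition.load_image_file("frame_at_625s.jpg"))[0]

# Distance between encodings; smaller means more similar.
distance = face_recognition.face_distance([known], candidate)[0]
print("match at tolerance 0.4 (strict):", distance <= 0.4)
print("match at tolerance 0.6 (looser):", distance <= 0.6)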

Notes on Use

  • Before running facial recognition, you must upload a zip file of face images for the people you want to find as a supplemental file. For each individual you want to identify, include a folder containing a variety of images of the person's face (making sure no other faces are in the images), from different angles and ages if possible. Name this folder with the label you want to display in the results. Compress all of the individuals' folders into a zip file and upload it as a supplemental file (category: face) at the collection or item level. (More details on uploading supplemental files are on the About Supplemental Files page.) A sketch of one way to assemble this zip file appears after this list.
  • Each of the tools has at least one of the outputs checked by default. Checked outputs will display the workflow step and output in the dashboard when "Show Relevant Results Only" is turned on. See Tips for Creating a Workflow for an explanation of what each output option means.
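
One way to assemble the training zip, assuming a local directory with one sub-folder of photos per person (directory and file names here are hypothetical), is sketched below; the resulting faces.zip is what gets uploaded as the supplemental file.

from pathlib import Path
import zipfile

# "training_faces/" holds one folder per person, named with the desired label,
# e.g. training_faces/Herman B. Wells/photo1.jpg (hypothetical paths).
source = Path("training_faces")

with zipfile.ZipFile("faces.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for image in sorted(source.rglob("*")):
        if image.is_file():
            # Store paths relative to the top level so each person's folder
            # (the label) sits at the root of the zip.
            zf.write(image, image.relative_to(source))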

Use Cases and Example Workflows

Use case 1: Archival description

An archivist is processing a collection and wants to add controlled access points at the item level within the finding aid for key people from the collection who appear in videos. It is not important where in the video people appear; the archivist just wants to know whether they appear at some point. They collect images of each person they wish to find and add them to the collection as supplemental files to be used for training in the workflow. They run the workflow containing the Face Recognition MGM and then view the generated contact sheets to find the names of people appearing in each video, confirming the MGM was correct by looking at the faces in the thumbnail frames. They spot-check a few of the videos with contact sheets, then add the names to the finding aid.

Notes:

  • The archivist uploads a zip file of faces for the people they want to identify as a supplementary file (one folder of faces per person within the zip file, named for how they would like the faces to be labeled).
  • The workflow includes the Input Supplement step to pull this supplementary file associated with the collection or item into the workflow.

Use case 2: Indexing people in a video

A metadata specialist wants to index appearances of key people in a collection of videos, so they can add them as index points to the digital object for users to navigate to in the video player. After submitting images of faces to train the Face Recognition tool on, they run the videos through the MGM workflow. They download CSV files for the videos, which include time codes and labels for each face detected, then upload them to their video platform as index points.

Notes:

  • The metadata specialist uploads a zip file of faces for the people they want to identify as a supplementary file (one folder of faces per person within the zip file, named for how they would like the faces to be labeled).
  • CSV output for facial recognition is not yet available in AMP. The above workflow will produce output in AMP JSON format (see below); a sketch of how that JSON could be flattened to CSV follows this list.
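
Until CSV output is available, the AMP JSON output could be flattened into a simple CSV of timestamps and labels with a short script along these lines (a sketch, not an AMP feature; file names are placeholders). It reads only the documented "frames", "start", and "name" fields.

import csv
import json

with open("amp_faces.json") as f:
    data = json.load(f)

with open("face_index.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["start", "name"])
    for frame in data["frames"]:
        for obj in frame["objects"]:
            writer.writerow([frame["start"], obj["name"]])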

AMP JSON Output

Summary: 

| Element | Datatype | Obligation | Definition |
| --- | --- | --- | --- |
| media | object | required | Wrapper for metadata about the source media file. |
| media.filename | string | required | Filename of the source file. |
| media.duration | string | required | The duration of the source file. |
| media.frameRate | number | required | The frame rate of the video, in FPS. |
| media.numFrames | number | required | The number of frames in the video. |
| media.resolution | object | required | Resolution of the video. |
| media.resolution.width | number | required | Width of the frame, in pixels. |
| media.resolution.height | number | required | Height of the frame, in pixels. |
| frames | array | required | List of frames containing identified faces. |
| frames[*] | object | optional | A frame containing an identified face. |
| frames[*].start | string (s.fff) | required | Time of the frame, in seconds. |
| frames[*].objects | list | required | List of bounding boxes in the frame containing identified faces. |
| frames[*].objects[*] | object | required | A bounding box in the frame containing an identified face. |
| frames[*].objects[*].name | string | required | The name of the face within the bounding box. |
| frames[*].objects[*].score | object | optional | A confidence or relevance score for the face. |
| frames[*].objects[*].score.type | string (confidence \| relevance) | required | The type of score, confidence or relevance. |
| frames[*].objects[*].score.value | number | required | The score value, typically a number in the range of 0-1. |
| frames[*].objects[*].vertices | object | optional | The top left (xmin, ymin) and bottom right (xmax, ymax) relative bounding coordinates. |
| frames[*].objects[*].vertices.xmin | number | required | The top left x coordinate. |
| frames[*].objects[*].vertices.ymin | number | required | The top left y coordinate. |
| frames[*].objects[*].vertices.xmax | number | required | The bottom right x coordinate. |
| frames[*].objects[*].vertices.ymax | number | required | The bottom right y coordinate. |

Sample Output

{
    "media": {
        "filename": "myfile.mov",
        "duration": "8334.335",
        "frameRate": 30.000,
        "numFrames": 1547,
        "resolution": {
            "width": 654,
            "height": 486
        }
    },
    "frames": [
        {
            "start": "625.024",
            "objects": [
                {
                    "name": "Herman B. Wells",
                    "score": {
                        "type": "confidence",
                        "scoreValue": 0.9903119
                    },
                    "vertices": {
                        "xmin": 219,
                        "ymin": 21,
                        "xmax": 340,
                        "ymax": 53
                    }
                }
            ]
        }
    ]
}
