AWS Comprehend - AudiovisualMetadataPlatform/amp_documentation GitHub Wiki
- About
- The AWS Comprehend adapter is a Python-based tool that takes speech-to-text JSON as input and produces a list of entities. Each entity contains a start, end, type, label, and score.
- Source Code
- galaxy/tools/aws/aws_comprehend.xml
Tool configuration detailing tool execution, input file, output file, and labeling.
- galaxy/tools/aws/aws_comprehend.py
Python script to call AWS Comprehend via API and conform the JSON to the schema.
- galaxy/tools/amp_json_schema/entity_extraction_schema.py
Set of classes representing the entity extraction schema.
- Dependencies
- Boto3
- An AWS bucket
- An AWS IAM role for Comprehend capable of put/get/list operations on the supplied S3 bucket. See the following documentation: https://docs.aws.amazon.com/comprehend/latest/dg/access-control-managing-permissions.html#auth-role-permissions
Example policy for the amp-dev-test bucket:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::amp-dev-test/*",
            "arn:aws:s3:::amp-dev-test"
          ]
        }
      ]
    }
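When a bucket other than amp-dev-test is used, the same policy can be generated rather than edited by hand. The following is a minimal sketch; the function name `build_bucket_policy` is hypothetical and not part of the AMP source, and the statement simply mirrors the policy listed under Dependencies.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Return an IAM policy document granting put/get/list on one S3 bucket.

    Hypothetical helper: it reproduces the policy from the Dependencies
    section with the bucket name substituted.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    # Object-level actions apply to keys inside the bucket.
                    f"arn:aws:s3:::{bucket_name}/*",
                    # s3:ListBucket applies to the bucket itself.
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy("amp-dev-test"), indent=2))
```

The printed JSON can be pasted into the IAM console when attaching the policy to the Comprehend role.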
- Installation:
$ pip install boto3
- Running the tool
- The tool can be invoked from the Galaxy UI like any other tool. The user needs to supply input data in the form of standardized speech-to-text output.
- Parameters
- $input_file: the speech-to-text output.
- $json_file: the output JSON file.
- $bucketName: the AWS bucket to store input and output files. For testing purposes, the bucket amp-dev-test can be used.
- $dataAccessRoleArn: the IAM role allowing Comprehend to access S3.
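The bucket and role parameters suggest that the script submits an asynchronous Comprehend job that reads from and writes to S3. The sketch below shows how such a request might be assembled with Boto3's `start_entities_detection_job`. This is an assumption about the approach, not a copy of aws_comprehend.py; the helper names, the `input_key` and `output_prefix` arguments, the job name format, and the default language code are all hypothetical.

```python
import uuid


def build_entities_job_request(bucket_name, input_key, output_prefix,
                               data_access_role_arn, language_code="en"):
    """Assemble keyword arguments for Comprehend's start_entities_detection_job.

    Hypothetical helper: the key layout inside the bucket is an assumption.
    """
    return {
        "JobName": f"amp-entities-{uuid.uuid4().hex[:8]}",
        "LanguageCode": language_code,
        "DataAccessRoleArn": data_access_role_arn,
        "InputDataConfig": {
            "S3Uri": f"s3://{bucket_name}/{input_key}",
            # Treat the whole uploaded transcript as a single document.
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        "OutputDataConfig": {"S3Uri": f"s3://{bucket_name}/{output_prefix}"},
    }


def start_entities_job(request, region_name=None):
    """Submit the job and return its JobId (requires AWS credentials)."""
    # Imported lazily so the request builder can be used without boto3.
    import boto3

    client = boto3.client("comprehend", region_name=region_name)
    response = client.start_entities_detection_job(**request)
    return response["JobId"]
```

Separating the request builder from the network call keeps the parameter mapping easy to inspect without AWS access.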
- Output
- JSON file conforming to the entity extraction schema documented at https://wiki.dlib.indiana.edu/display/AMP/MGM---Entity-Extraction
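As a quick sanity check on the output, the sketch below verifies that each entity carries the five fields named in the About section (start, end, type, label, score). It is only an illustration: the function name is hypothetical, the input is assumed to be a plain list of entity dictionaries, and the linked schema page together with entity_extraction_schema.py remains the authoritative definition.

```python
# Field names taken from the About section of this page.
REQUIRED_ENTITY_FIELDS = ("start", "end", "type", "label", "score")


def find_incomplete_entities(entities):
    """Return (index, missing_fields) pairs for entities lacking required fields.

    Hypothetical helper: `entities` is assumed to be a list of dicts.
    """
    problems = []
    for index, entity in enumerate(entities):
        missing = [name for name in REQUIRED_ENTITY_FIELDS if name not in entity]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty result means every entity has all five fields; it does not validate value types or the rest of the schema.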
Document generated by Confluence on Feb 25, 2025 10:39