Text Classification - HearstCorp/rover-wiki GitHub Wiki

Text Classification

Setup

This service depends on AlchemyAPI, a third-party API that we make calls to. You will need to add an environment variable, ALCHEMY_API_KEY, to your .env file.

For the key, please email [email protected] or [email protected]
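As a minimal sketch of how code might fail fast when the key is missing, assuming the .env file has already been loaded into the process environment (the helper name `get_alchemy_api_key` is hypothetical and not part of the service):

```python
import os


def get_alchemy_api_key(environ=os.environ) -> str:
    """Return ALCHEMY_API_KEY, or raise if it is unset or blank.

    Hypothetical helper: the real service may read the key differently.
    """
    key = environ.get("ALCHEMY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ALCHEMY_API_KEY is not set; add it to your .env file"
        )
    return key
```

Passing the environment as an argument keeps the check easy to exercise in tests without touching the real process environment.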

API

GET https://{domain}/v2/classifier?text={some text}

This is a separate endpoint that lets the frontend classify content while an author or editor is working on it. An example request:

GET https://{domain}/v2/classifier?text=The cat sat on the mat

JSON Response:

| Name | Type | Description |
| --- | --- | --- |
| tags | Array | A list of strings; each string is a valid edit theme, taken from a Category object title |
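Free text placed in a query string needs URL-encoding (spaces, `&`, and so on). The following is a small sketch of building the request URL from Python; the function name and the example domain are assumptions for illustration, and only the `/v2/classifier?text=` shape comes from this page.

```python
from urllib.parse import urlencode


def build_classifier_url(domain: str, text: str) -> str:
    """Return the classifier URL with the text safely URL-encoded.

    Hypothetical helper; the domain is whatever host the service runs on.
    """
    query = urlencode({"text": text})  # spaces become "+", "&" becomes "%26"
    return f"https://{domain}/v2/classifier?{query}"


# "rover.example.com" is a placeholder domain, not a real deployment.
print(build_classifier_url("rover.example.com", "The cat sat on the mat"))
# prints https://rover.example.com/v2/classifier?text=The+cat+sat+on+the+mat
```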

If there is an error (for example, the API key is incorrectly configured or the text parameter is missing), it is returned like:

GET https://{domain}/v2/classifier?derp

JSON Response:

| Name | Type | Description |
| --- | --- | --- |
| error | String | A useful error message |
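A caller therefore has two response shapes to handle: one with a `tags` array and one with an `error` string. A minimal sketch of a parser follows, assuming the body is a JSON object with exactly these keys; `ClassifierError` and `parse_classifier_response` are hypothetical names, and the sample payloads are made up for illustration.

```python
import json


class ClassifierError(Exception):
    """Raised when the classifier response carries an "error" key."""


def parse_classifier_response(body: str) -> list:
    """Return the list of tags, or raise ClassifierError on an error payload."""
    payload = json.loads(body)
    if "error" in payload:
        raise ClassifierError(payload["error"])
    # Assumption: a missing "tags" key is treated as "no tags".
    return list(payload.get("tags", []))


# Made-up payloads; real tag values come from Category object titles.
print(parse_classifier_response('{"tags": ["Pets", "Home"]}'))
try:
    parse_classifier_response('{"error": "No text supplied"}')
except ClassifierError as exc:
    print("classifier failed:", exc)
```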

Manage.py command

There is a corresponding manage.py command that classifies all (or just recent) video content, using each video's title and description fields. It exists so that the classifier can run as a cron job.

Classify everything that doesn't have a classification:

$ ./manage.py classifyvideos

Classify everything unclassified that was added in the last x seconds:

$ ./manage.py classifyvideos --since 3600

Classify everything as above. In addition, if there are old videos that have classifications but fewer than 3 entries, try to add some more to them:

$ ./manage.py classifyvideos --augment

The --augment and --since options can be combined:

$ ./manage.py classifyvideos --augment --since 3600

A stronger version of --augment is --replace-all, which overwrites every existing classification. This is not recommended, as it also overwrites classifications that editors added by hand:

$ ./manage.py classifyvideos --replace-all

This can be used in conjunction with the --since option as well:

$ ./manage.py classifyvideos --replace-all --since 3600
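The way the flags interact can be summarised as a small decision function. This is only a sketch of the behaviour described above, not the command's actual implementation: the `Video` class, its fields, and `should_classify` are hypothetical, and it assumes that --since also limits which videos --augment and --replace-all touch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_TAGS = 3  # --augment targets videos with fewer than 3 classifications


@dataclass
class Video:
    """Hypothetical stand-in for a video record."""
    age_seconds: int  # how long ago the video was added
    tags: List[str] = field(default_factory=list)


def should_classify(
    video: Video,
    since: Optional[int] = None,
    augment: bool = False,
    replace_all: bool = False,
) -> bool:
    """Decide whether a run with the given flags would process this video."""
    # --since: skip anything added more than `since` seconds ago (assumed
    # to apply to every mode).
    if since is not None and video.age_seconds > since:
        return False
    # --replace-all: overwrite regardless of existing classifications.
    if replace_all:
        return True
    # Default: only videos with no classification at all.
    if not video.tags:
        return True
    # --augment: also top up videos that have fewer than MIN_TAGS entries.
    return augment and len(video.tags) < MIN_TAGS
```

For example, under these assumptions `should_classify(Video(10, ["a"]), augment=True)` is true, while the same video without --augment is skipped because it already has a classification.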