Text Classification - HearstCorp/rover-wiki GitHub Wiki
Text Classification
Setup
This service is dependent upon AlchemyAPI, a third party API that we make calls to.
You will need to add an environment variable, ALCHEMY_API_KEY, to your .env file.
For the key, please email [email protected] or [email protected]
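As a minimal sketch of failing early when the key is missing, something like the following could be used. The helper name get_alchemy_api_key is hypothetical, and the sketch assumes the project's tooling has already loaded .env into the process environment; only the variable name ALCHEMY_API_KEY comes from this page.

```python
import os


def get_alchemy_api_key(environ=os.environ):
    """Return the AlchemyAPI key, or raise if it is missing or blank.

    Hypothetical helper: assumes .env has already been loaded into the
    process environment by whatever tooling the project uses.
    """
    key = environ.get("ALCHEMY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ALCHEMY_API_KEY is not set; add it to your .env file"
        )
    return key
```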
API
GET https://{domain}/v2/classifier?text={some text}
This is a separate endpoint that lets the frontend classify content while the author or editor is still working on it. Responses look like:
GET https://{domain}/v2/classifier?text=The cat sat on the mat
JSON Response:
Name | Type | Description |
---|---|---|
tags | Array | A list of strings; each string is a valid edit theme, taken from a Category object title |
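Because the text is passed as a query parameter, it has to be URL-encoded before the request is sent. A minimal sketch of building the request URL is shown here; the function name classifier_url and the domain example.com are placeholders, not part of the service.

```python
from urllib.parse import urlencode


def classifier_url(domain, text):
    """Build the GET URL for the classifier endpoint.

    urlencode percent-encodes the text and turns spaces into '+',
    which is the usual form encoding for query strings.
    """
    query = urlencode({"text": text})
    return f"https://{domain}/v2/classifier?{query}"


# Placeholder domain, for illustration only.
print(classifier_url("example.com", "The cat sat on the mat"))
```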
If there is an error, for example because the API key is incorrectly configured or the text parameter is missing, it is returned like:
GET https://{domain}/v2/classifier?derp
JSON Response:
Name | Type | Description |
---|---|---|
error | String | A useful error message |
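A caller has to tell the two documented response shapes apart. The sketch below assumes the body is a JSON object carrying either a tags key or an error key, as the tables above describe; the names ClassifierError and parse_classifier_response are hypothetical, and HTTP status codes are not covered because this page does not document them.

```python
import json


class ClassifierError(Exception):
    """Raised when the classifier response carries an 'error' field."""


def parse_classifier_response(body):
    """Return the list of tags, or raise ClassifierError on an error body.

    Assumes the body is a JSON object with either 'tags' or 'error'.
    """
    data = json.loads(body)
    if "error" in data:
        raise ClassifierError(data["error"])
    return list(data.get("tags", []))
```

The sample values used when trying this out (for instance a tags list such as ["pets"]) are invented; the real values depend on the Category titles configured in the system.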
Manage.py command
There is a corresponding manage.py command that classifies all (or just recent) video content, using each video's title and description fields. It exists so that the classifier can run as a cron job.
Classify everything that doesn't have a classification:
$ ./manage.py classifyvideos
Classify everything unclassified that was added in the last x seconds:
$ ./manage.py classifyvideos --since 3600
Classify everything as above. In addition, if there are old videos that have classifications but fewer than 3 entries, try to add more to them:
$ ./manage.py classifyvideos --augment
The --since and --augment options can be used in conjunction with each other:
$ ./manage.py classifyvideos --augment --since 3600
A stronger version of --augment is --replace-all, which overwrites everything. This isn't recommended, as it also overwrites classifications that editors added by hand:
$ ./manage.py classifyvideos --replace-all
This can be used in conjunction with the --since option as well:
$ ./manage.py classifyvideos --replace-all --since 3600
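The way the flags combine can be summarized as a small decision function. This is only a sketch of the behaviour described above, not the command's actual implementation: the function should_classify and its parameters are hypothetical, and it assumes that --since limits every mode (including --augment) to videos added within the window.

```python
from datetime import datetime, timedelta, timezone

MIN_TAGS = 3  # --augment targets videos with fewer than 3 entries


def should_classify(tags, added_at, now, since=None,
                    augment=False, replace_all=False):
    """Decide whether one video would be (re)classified.

    tags:     existing classifications on the video (may be empty)
    added_at: when the video was added
    since:    window in seconds, mirroring --since (None means no limit)
    """
    # Assumption: --since filters by age regardless of the other flags.
    if since is not None and added_at < now - timedelta(seconds=since):
        return False
    if replace_all:
        return True   # overwrites everything, including hand-made tags
    if not tags:
        return True   # default: only unclassified videos
    return bool(augment) and len(tags) < MIN_TAGS


now = datetime.now(timezone.utc)
two_days_ago = now - timedelta(days=2)
# An old, partly classified video is only picked up with --augment.
print(should_classify(["news"], two_days_ago, now))
print(should_classify(["news"], two_days_ago, now, augment=True))
```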