Development - GrafeasGroup/tor_worker GitHub Wiki
This section describes how to develop for the Celery applications of the tor bots. The source code it applies to lives in the celery-rewrite branch of each original bot's repository.
The following sections try to distill the official Celery documentation, which covers far more than we need. They show only what you need to know to develop certain aspects of the ToR rewrite using Celery.
All development
Since everything lives in the celery-rewrite branches of each repo (if there is no such branch, no work has been done yet for that repo), any pull requests should target the celery-rewrite branch instead of the usual master branch.
It goes without saying, but keep test coverage as high as makes sense. When submitting a pull request, cover the important areas of the code so that we know it works now and keeps working if we change implementation details. Again, all of this within reason: we can always change tests when the implementation is modified.
Tasks development
- `from a.b import c as foo` for a Celery task is just `foo = signature('a.b.c')`. Even if the task is in the same file as the method you're calling, still import it in this manner for testing purposes. More on that later.
- call all other tasks asynchronously only, so `foo.delay(arg1, arg2, arg3)` instead of `foo(arg1, arg2, arg3)`
- an argument named `self` is available in each task. Since it usually appears that tasks are standalone methods with decorators just before them, it's best to think of the decorator as transforming it into a derivative of the `Task` class in `task_base.py`. Thus, `self` becomes more familiar, as if the task's method is an instance method on the `Task` class.
- each task has a set of resources available via `self` by default. At time of writing, that includes an HTTP client (`requests.Session` instance), praw client, Redis client, and an instantiated `Config` object from `tor_core.config` (in the `celery-rewrite` branch), lazily instantiated only when it is used for the first time.
- the `@app.task()` decorator before every celery task will stay the same 98% of the time. Just copy-paste because you don't really need to know what this does. If it becomes relevant, feel free to look at the celery docs.
- Tasks should never return anything. Return values should be passed to calls to other tasks, or written out to other resources (praw instances, Redis clients, http calls, etc.)
- all tasks should be in a file named `tasks.py` in any given python module, for the purpose of automatically discovering all tasks (reasons why this is necessary given elsewhere)
Task permissions
- routing tasks to queues is done so a worker with certain capabilities (e.g., tor mod Reddit login) can handle tasks that require those abilities.
- routing is handled in each repository's `celeryconfig.py` file and combined across all of the bots through some voodoo magic in the `tor_worker` repository
- Most basic of permissions can be assumed by the `tor` repo's split of `tor.role_moderator.tasks` containing tasks that require permissions of some moderator, and `tor.role_anyone.tasks` containing tasks any anonymous user can execute. Any more detailed requirements of permissions (e.g., checking the inbox must only be by the tor user) can be specified as-needed.
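As a rough sketch of what such routing might look like in a `celeryconfig.py`, Celery's `task_routes` setting can map task-name glob patterns to queues. The queue names `moderator` and `anyone` below are assumptions made for illustration, and `queue_for` is a hypothetical helper that only mimics the glob matching; it is not Celery's own router.

```python
from fnmatch import fnmatchcase

# Celery setting: route tasks to queues by task-name pattern.
# Queue names here are hypothetical.
task_routes = {
    "tor.role_moderator.tasks.*": {"queue": "moderator"},
    "tor.role_anyone.tasks.*": {"queue": "anyone"},
}


def queue_for(task_name):
    """Illustrative lookup: return the queue a task name would map to."""
    for pattern, route in task_routes.items():
        if fnmatchcase(task_name, pattern):
            return route["queue"]
    return None  # no explicit route; Celery would use its default queue
```

A worker logged in as a moderator would then consume the `moderator` queue, while any worker could consume `anyone`.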
Operations for running (Celery) bots
With some recent(ish) changes, Celery workers should run with default settings, just like the legacy commands they replace. The tor repo will provide a tor-moderator command, which starts a worker that processes the queues for the anonymous, ToR user, and moderator roles. tor-apprentice will do the same for the tor_ocr repository, as will tor-archivist for the tor_archivist repo.
Adding to or modifying queued items, along with other operational tasks on the running bots, can be handled with commands provided by the tor_core repository. At the time of writing, these commands have yet to be written.
Any change to any one bot's tasks will require a restart of all bots in order to pick up those changes. All bots need to be aware of all available tasks; otherwise a worker may treat a queued task as unknown and discard it. (Note: This is purely theoretical and needs to be tested. For all we know things might work out fine if only one bot is restarted.)
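One way a command like tor-moderator could start such a worker is sketched below. This is an assumption about the shape of the entry point, not the actual implementation: the function names and the queue names are hypothetical, and `app` stands for whatever Celery application the bots share. It relies on Celery's `worker_main` method and the worker's `--queues` option.

```python
def build_worker_argv(queues, loglevel="INFO"):
    """Build the argument list for a worker consuming the given queues."""
    return [
        "worker",
        f"--loglevel={loglevel}",
        f"--queues={','.join(queues)}",
    ]


def start_worker(app, queues):
    """Start a Celery worker on `app` (blocks until the worker exits)."""
    app.worker_main(argv=build_worker_argv(queues))


# Hypothetical queue names for the three roles a moderator worker serves.
MODERATOR_QUEUES = ["anyone", "tor_user", "moderator"]
```

A console script could then call `start_worker(app, MODERATOR_QUEUES)` so that operators never have to pass queue options by hand.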