Development - GrafeasGroup/tor_worker GitHub Wiki

This section describes how to develop for the Celery applications of the ToR bots. The relevant source code lives in the celery-rewrite branch of each original bot's repository.

The following sections distill the official Celery documentation, which covers far more than is needed here, down to what you need to know to develop specific parts of the ToR rewrite using Celery.

All development

Since everything lives in the celery-rewrite branches of each repo (if there is no such branch, no work has been done yet for that repo), any pull requests should target the celery-rewrite branch instead of the usual master branch.

It goes without saying, but keep test coverage as high as makes sense. When submitting a pull request, cover the important areas of the code so we know it works now and continues to work if we change implementation details. All of this is within reason: we can always change tests when the implementation is modified.

Tasks development

  • the Celery equivalent of from a.b import c as foo is foo = signature('a.b.c'). Even if the task is in the same file as the method you're calling it from, still import it this way for testing purposes. More on that later.
  • call all other tasks asynchronously only, so foo.delay(arg1, arg2, arg3) instead of foo(arg1, arg2, arg3)
  • an argument named self is available in each task. Tasks look like standalone functions with a decorator just before them, but it's best to think of the decorator as transforming each function into a derivative of the Task class in task_base.py. That makes self more familiar: the task's function behaves like an instance method on the Task class.
  • each task has a set of resources available via self by default. At time of writing, that includes an HTTP client (requests.Session instance), praw client, Redis client, and an instantiated Config object from tor_core.config (in the celery-rewrite branch), lazily instantiated only when it is used for the first time.
  • the @app.task() decorator before every celery task will stay the same 98% of the time. Just copy-paste because you don’t really need to know what this does. If it becomes relevant, feel free to look at the celery docs.
  • Tasks should never return anything. Return values should be passed to calls to other tasks, or written out to other resources (praw instances, Redis clients, http calls, etc.)
  • all tasks should be in a file named tasks.py in any given python module, for the purpose of automatically discovering all tasks (reasons why this is necessary given elsewhere)

Task permissions

  • routing tasks to queues is done so a worker with certain capabilities (e.g., tor mod Reddit login) can handle tasks that require those abilities.
  • routing is handled in each repository’s celeryconfig.py file and combined across all of the bots through some voodoo magic in the tor_worker repository
  • the most basic permissions are expressed by the tor repo's split between tor.role_moderator.tasks, which contains tasks that require moderator permissions, and tor.role_anyone.tasks, which contains tasks any anonymous user can execute. More detailed permission requirements (e.g., only the tor user may check the inbox) can be specified as needed.
  • more on this to come
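
As a rough sketch of what role-based routing in a celeryconfig.py could look like, the fragment below maps the two task modules to queues with Celery's task_routes setting, which accepts glob patterns as keys. The queue names "moderator" and "anyone", the example task names, and the queue_for helper are hypothetical. The helper only imitates the glob lookup for illustration and is not part of Celery's API.

```python
from fnmatch import fnmatchcase

# Celery setting: route tasks to queues by glob pattern on the task name.
# Queue names here are assumptions, not the project's actual names.
task_routes = {
    "tor.role_moderator.tasks.*": {"queue": "moderator"},
    "tor.role_anyone.tasks.*": {"queue": "anyone"},
}


def queue_for(task_name, routes=task_routes, default="celery"):
    """Illustrative helper: show which queue a task name would land in.

    Falls back to "celery", the name of Celery's default queue.
    """
    for pattern, route in routes.items():
        if fnmatchcase(task_name, pattern):
            return route["queue"]
    return default


# Hypothetical task names, for demonstration only.
print(queue_for("tor.role_moderator.tasks.remove_post"))
print(queue_for("tor.role_anyone.tasks.process_comment"))
```

A worker that holds the moderator login would then consume the moderator queue, while a worker without those credentials would consume only the anyone queue.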

Operations for running (Celery) bots

With some recent(ish) changes, Celery workers should run with default settings, just like the legacy commands they replace. The tor repo will provide a tor-moderator command that starts a worker processing the queues for the anonymous role, the ToR user, and the moderator role. Likewise, tor-apprentice will do the same for the tor_ocr repository, and tor-archivist for the tor_archivist repository.

Adding or modifying queued items, along with other operational tasks on the running bots, can be handled with commands provided by the tor_core repository. At the time of writing, those commands are yet to be written.

Any change to any one bot's tasks will require a restart of all bots in order to pick up those changes. This is because every bot needs to be aware of every available task; otherwise a worker may treat a queued task as unknown and discard it. (Note: This is purely theoretical and needs to be tested. For all we know, things might work out fine if only one bot is restarted.)