Task Manager's task implementation - Yicong-Huang/Wildfires GitHub Wiki
This page shows how to construct a task for the Task Manager to run.
Description
Typically, a runnable (task) is supported by these three parts:
- crawler - crawls data from the selected website
- extractor - extracts the information you need from the downloaded data
- dumper - puts the information into the database
These parts can be altered if the task is for a different purpose, e.g. classification.
In the task file itself, there is usually only a `run` function, which the Task Manager can call.
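To make the structure above concrete, here is a minimal sketch of a task that wires the three parts together behind a single `run` method. The class name `ExampleTask`, the constructor parameters, and the return value are hypothetical and chosen for illustration only; the actual base classes and signatures in the repository may differ.

```python
from typing import Any, Callable, Iterable


class ExampleTask:
    """Hypothetical task: combines a crawler, an extractor, and a dumper."""

    def __init__(
        self,
        crawler: Callable[[], Iterable[Any]],
        extractor: Callable[[Any], Any],
        dumper: Callable[[Any], None],
    ) -> None:
        self.crawler = crawler      # yields raw items from the data source
        self.extractor = extractor  # turns one raw item into a record
        self.dumper = dumper        # stores one record (e.g. in a database)

    def run(self) -> int:
        """Entry point a task manager would call; returns the number of records dumped."""
        count = 0
        for raw in self.crawler():
            record = self.extractor(raw)
            self.dumper(record)
            count += 1
        return count


if __name__ == "__main__":
    # Stand-ins for a real crawler and database, for demonstration only.
    stored = []
    task = ExampleTask(
        crawler=lambda: ["a,1", "b,2"],
        extractor=lambda raw: tuple(raw.split(",")),
        dumper=stored.append,
    )
    print(task.run(), stored)
```

Passing the three parts in as arguments keeps `run` short and makes it easy to swap one part out, for example replacing the dumper with a classifier for a different kind of task.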
Implementation
Normally, you run the pipeline file by file.
For each file you want from the website:
- first, crawl that file (using `wget` or `request`)
- then extract and dump it
- finally, delete that file to save space
- move on to the next file you are going to get
Remember to add logging so that any exceptions raised in the task you are working on are caught and recorded.
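The file-by-file loop, including the logging advice, could be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function names `download` and `run` and their parameters are hypothetical, and the standard library's `urllib.request.urlretrieve` stands in for `wget`.

```python
import logging
import os
import tempfile
import urllib.request
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


def download(url: str, path: str) -> None:
    """Default crawl step (assumption): fetch one file with the standard library."""
    urllib.request.urlretrieve(url, path)


def run(
    urls: Iterable[str],
    extract: Callable[[str], Any],
    dump: Callable[[Any], None],
    crawl: Callable[[str, str], None] = download,
) -> int:
    """Process files one at a time; returns how many were handled successfully."""
    done = 0
    for url in urls:
        # Reserve a temporary path for this one file.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            crawl(url, path)     # 1. crawl the file
            dump(extract(path))  # 2. extract and dump it
            done += 1
        except Exception:
            # Log the full traceback and continue with the next file.
            logger.exception("Failed to process %s", url)
        finally:
            # 3. delete the file to save space, even after a failure
            if os.path.exists(path):
                os.remove(path)
    return done
```

Deleting the file in a `finally` block means a failed download or extraction does not leave files behind, and catching the exception inside the loop lets the remaining files still be processed.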