Other useful tools - Titousensei/sisyphus GitHub Wiki

Pusher keeps a counter for every action and every key-related access, and prints them to the log at the end of the push. For example:

[Pusher] close: ActionIfKeyMap{[id, hash]?match -> OutputFile{[id, url, title, hash] -> "products_same.gz" +60,985,506 rows} 60,985,506 used, 25 warnings
[Pusher] close: ActionIfKey{[urlhash]?found -> BreakAfter"Duplicate New Urls" 2,390,849 used}
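One way to picture these counters is as a wrapper that increments a count each time an action uses a row, with one report line per action when the push closes. The following is a minimal sketch under that assumption. The `Action` interface, `CountingAction`, `MiniPusher`, and the report format are all hypothetical illustrations and are not Sisyphus's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical action interface (NOT the Sisyphus API): apply() returns true
// when the action "used" the row, e.g. when a key lookup found a match.
interface Action {
    boolean apply(String[] row);
    String describe();
}

// Wraps any Action and counts how many rows it actually used.
class CountingAction implements Action {
    private final Action inner;
    private long used = 0;

    CountingAction(Action inner) { this.inner = inner; }

    @Override
    public boolean apply(String[] row) {
        boolean result = inner.apply(row);
        if (result) used++;
        return result;
    }

    @Override
    public String describe() { return inner.describe(); }

    long used() { return used; }

    // Summary line loosely modeled on the log above; Locale.ROOT gives "," grouping.
    String report() {
        return String.format(Locale.ROOT, "close: %s %,d used", describe(), used);
    }
}

// Hypothetical driver: pushes every row through every action, then reports.
class MiniPusher {
    static List<String> push(List<String[]> rows, List<CountingAction> actions) {
        for (String[] row : rows) {
            for (CountingAction a : actions) a.apply(row);
        }
        List<String> reports = new ArrayList<>();
        for (CountingAction a : actions) reports.add("[Pusher] " + a.report());
        return reports;
    }
}
```

Keeping the counting in a wrapper means individual actions need no bookkeeping code of their own.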

Pusher has several debugging options:

  • debug(): print every row after all the actions have been applied to it
  • debug(n), with n > 0: print only the first n rows
  • debug(-1): print one row at each progress marker (every 1M rows)
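The three modes boil down to a rule for deciding whether a given row is printed. A minimal sketch, assuming rows are numbered from 1, that `debug()` with no argument behaves like "print everything" (modeled here as `n == 0`), and that the progress interval is exactly 1,000,000 rows. `DebugPolicy` and its `ALL` constant are hypothetical, not part of Pusher.

```java
// Hypothetical helper mirroring the documented debug() semantics (not Pusher code).
final class DebugPolicy {
    static final int ALL = 0;                          // assumption: stands in for debug()
    static final long PROGRESS_INTERVAL = 1_000_000L;  // one progress marker every 1M rows

    private final int n;

    DebugPolicy(int n) {
        if (n < -1) throw new IllegalArgumentException("n must be >= -1");
        this.n = n;
    }

    // rowNumber is 1-based: the first row pushed is row 1.
    boolean shouldPrint(long rowNumber) {
        if (n == ALL) return true;                     // debug(): every row
        if (n > 0) return rowNumber <= n;              // debug(n): first n rows only
        return rowNumber % PROGRESS_INTERVAL == 0;     // debug(-1): one row per marker
    }
}
```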

Pusher also has a built-in profiler() command that measures the time taken by each action and prints a summary at the end. This is mostly useful when several CPU-intensive Modifiers are used. Note that processing is slower while profiling is enabled (roughly 2x to 5x).
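A per-action profiler can be sketched by timing each action call with `System.nanoTime()` and accumulating totals per action name. This is a hypothetical illustration (`Profiler`, `time()`, and `summary()` are invented names), not Pusher's implementation. Timing every call adds work on every row, so a profiler built this way also slows processing down.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical per-action profiler: accumulates elapsed nanoseconds and call counts.
final class Profiler {
    private final Map<String, Long> totals = new LinkedHashMap<>();
    private final Map<String, Long> calls = new LinkedHashMap<>();

    // Runs the action and charges its elapsed time to the given name.
    void time(String name, Runnable action) {
        long start = System.nanoTime();
        try {
            action.run();
        } finally {
            long elapsed = System.nanoTime() - start;
            totals.merge(name, elapsed, Long::sum);
            calls.merge(name, 1L, Long::sum);
        }
    }

    long totalNanos(String name) { return totals.getOrDefault(name, 0L); }

    long callCount(String name) { return calls.getOrDefault(name, 0L); }

    // One line per action, in first-seen order: total milliseconds and call count.
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            sb.append(String.format(Locale.ROOT, "%s: %.3f ms over %d calls%n",
                    e.getKey(), e.getValue() / 1e6, calls.get(e.getKey())));
        }
        return sb.toString();
    }
}
```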

The class IOUtil provides utility methods for file management:

  • assertSpace(), assertDir(), assertFile(): before processing starts, check that there is enough disk space and that the required files and directories are present.
  • rename(): rename a file or directory; throws an exception if the destination exists.
  • replace(): rename a file or directory; if the destination exists, it is deleted or backed up first.
  • deleteDirectory(): recursively delete a directory.
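Similar checks and moves can be written with the standard `java.nio.file` API. The sketch below is not IOUtil's source: `FileTools` is a hypothetical class whose method names mirror the list above, while the signatures, the `IllegalStateException` failures, and backing up to a `.bak` suffix in `replace()` are assumptions.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical stand-ins for IOUtil-style helpers, built on java.nio.file.
final class FileTools {
    private FileTools() {}

    // Fails fast if the file store holding 'dir' has fewer than minBytes usable.
    static void assertSpace(Path dir, long minBytes) throws IOException {
        long usable = Files.getFileStore(dir).getUsableSpace();
        if (usable < minBytes)
            throw new IllegalStateException("Not enough space in " + dir + ": " + usable + " bytes");
    }

    static void assertDir(Path dir) {
        if (!Files.isDirectory(dir)) throw new IllegalStateException("Missing directory: " + dir);
    }

    static void assertFile(Path file) {
        if (!Files.isRegularFile(file)) throw new IllegalStateException("Missing file: " + file);
    }

    // Without REPLACE_EXISTING, Files.move throws FileAlreadyExistsException if dst exists.
    static void rename(Path src, Path dst) throws IOException {
        Files.move(src, dst);
    }

    // Moves an existing dst aside to dst + ".bak" (backup) or deletes it, then renames.
    static void replace(Path src, Path dst, boolean backup) throws IOException {
        if (Files.exists(dst)) {
            if (backup) {
                Path bak = dst.resolveSibling(dst.getFileName() + ".bak");
                Files.move(dst, bak, StandardCopyOption.REPLACE_EXISTING);
            } else {
                deleteDirectory(dst); // also works when dst is a plain file
            }
        }
        Files.move(src, dst);
    }

    // Recursively deletes a tree, removing files before their parent directories.
    static void deleteDirectory(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) throw exc;
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Calling the assert methods at the top of a job makes a missing input or a full disk fail immediately, rather than hours into a long push.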

Finally, the class BackgroundShell allows a Java program to run shell commands, either directly or in a background thread. The command's stdout and stderr can be captured and printed to the log as a single block when the shell finishes.
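A comparable pattern can be built on the standard `ProcessBuilder` API: run the command through `sh -c`, merge stderr into stdout, and collect everything so it can be logged as one block once the process exits. This minimal sketch assumes a POSIX `sh` is available; `ShellRunner`, `run()`, `runInBackground()`, and `logBlock()` are hypothetical names, not the BackgroundShell API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

// Hypothetical runner illustrating the pattern; not Sisyphus's BackgroundShell.
final class ShellRunner {
    // Exit code plus combined stdout/stderr of a finished command.
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    // Runs the command synchronously in the calling thread.
    static Result run(String command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", command);
        pb.redirectErrorStream(true); // interleave stderr into stdout
        Process p = pb.start();
        String output;
        try (InputStream in = p.getInputStream()) {
            // Drain output before waitFor() so a full pipe buffer cannot block the child.
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        return new Result(p.waitFor(), output);
    }

    // Runs the command on a pool thread; the future completes when it exits.
    static CompletableFuture<Result> runInBackground(String command) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return run(command);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        });
    }

    // Prints the captured output as a single block, e.g. into a job log.
    static void logBlock(String command, Result r) {
        System.out.println("[shell] " + command + " (exit " + r.exitCode + ")\n" + r.output);
    }
}
```

For example, `ShellRunner.logBlock("ls", ShellRunner.runInBackground("ls").join())` would start the listing on another thread and print its output in one piece once it finishes.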
