Archiving - ericaltendorf/plotman GitHub Wiki

Introduction

Many of us have multiple drives that we want to fill with plots. If we're lucky, we are even adding to the pile to keep our plotters busy. The dst drive must be picked hours ahead of time, when the chia plot process is launched. The more plotting activity and drive swapping there is, the harder it becomes to predict which plot process should target which drive. When the prediction is wrong, you end up having to intervene manually and move plots to unjam the stuck plotting processes.

Plotman's archiving operates on completed plots to avoid the need to predict the future. Destination drives are selected so that they all get filled. Specific plots are chosen in an effort to make better use of the available bandwidth of the dst drives. Configuration is also provided to substantially reduce IO contention at the receiving end of the transfer.

Most users will want to either configure archiving or provide their own plot distribution mechanism. Regardless of whether the dst paths are dedicated drives or configured to be the same as the tmp drives, they are used as a buffer between plot creation and archiving.

In the past, archiving required rsyncd and ssh, which made it cumbersome to use with local plot storage drives. That mode is still available and well suited for remote storage, but you can also set up local archiving in as little as four lines of YAML. Usually, the only required external setup is mounting the drives in the expected manner and having an rsync client installed.

Archiving is configured by selecting a target definition and specifying the parameters it requires. Builtin target definitions are provided for the common rsyncd target for remote archiving as well as a new local rsync target. For local rsync, the only required parameter is the path to a directory that contains the mount points of the plot storage drives. You can also write your own target definition in your configuration file if you want to adjust one of the builtins or develop an entirely custom transfer mechanism. Each target definition is composed of two activities: drive identification and the actual file transfer. Each activity is defined by a script, which can be written in the language of your choice.

Instructions and comments are based on a standard Ubuntu Server installation. Commands and file locations may differ for other Linux distributions.

Local path archiving

We will start with the simple setup first. This may be used for any locally accessible path, including directly mounted internal or external drives as well as network mounts via NFS, SMB, or other means. Network mounts are not explicitly recommended, but are mentioned for completeness.

archiving:
  target: local_rsync
  env:
    site_root: /mnt/farm

This selects the builtin target definition named local_rsync and configures its site_root parameter to refer to /mnt/farm. In this configuration, candidate drives would be mounted inside the specified directory, such as at /mnt/farm/plots1 and /mnt/farm/plots2. Since all drives in that directory are considered, it is generally best not to use /mnt itself or /media/username, because they are general-purpose mount points that will often contain other mounted drives. rsync will be used to transfer the completed plots to their final resting places using local paths. No rsyncd server is required, and any rsyncd server that is configured will not be used.
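To make the idea of "candidate drives" concrete, here is a minimal Python approximation of the drive identification step for a local site_root. It is a sketch, not plotman's code (the builtin target uses a shell script), and it assumes each drive is mounted exactly one level below site_root. The helper name local_disk_space is hypothetical.

```python
from __future__ import annotations

import os
import shutil


def local_disk_space(site_root: str) -> list[str]:
    """Report mount points directly inside site_root as 'path:free_bytes' lines."""
    lines = []
    for entry in sorted(os.listdir(site_root)):
        path = os.path.join(site_root, entry)
        # Only real mount points count; plain directories (e.g. an unmounted
        # /mnt/farm/plots3 that would silently fill the root disk) are skipped.
        if os.path.ismount(path):
            # disk_usage().free is the space available to unprivileged users,
            # comparable to the "Available" column of df.
            free = shutil.disk_usage(path).free
            lines.append(f"{path}:{free}")
    return lines
```

Calling local_disk_space("/mnt/farm") on a machine set up as described would return one line per mounted drive. Skipping non-mount directories is a deliberate choice: it keeps plots from landing on the system disk when a drive fails to mount.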

Remote rsyncd archiving

Setting up remote archiving is a bit more involved. Although the configuration file format and the code behind it have changed, the overall functionality is the same as what plotman has historically provided, only more configurable. The details of rsyncd and ssh setup are covered later. Here is a basic setup.

archiving:
  target: rsyncd
  env:
    site_root: /mnt/farm
    user: username
    host: plot.storage.ip
    rsync_port: 12000
    site: sites

This selects the builtin target definition named rsyncd. ssh will be used to connect to plot.storage.ip as the user username to check which drives are mounted inside /mnt/farm and how much space they have available. Once a plot is available and a target drive has been selected, rsync will be used to connect to rsyncd on port 12000 of the remote system to transfer the plot. The site parameter is the name of the rsyncd module that exports site_root, so it must match a module defined in the Plot Storage's rsyncd.conf.
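The transfer step amounts to turning a destination reported by the disk space check into an rsync daemon URL. The sketch below shows one plausible mapping, assuming that site names the module whose path is site_root, so a drive's path relative to site_root becomes its path inside the module. The helper name rsync_url is hypothetical and this is not plotman's implementation.

```python
import os


def rsync_url(user: str, host: str, rsync_port: int, site: str,
              site_root: str, destination: str) -> str:
    """Map a destination path on the Plot Storage onto an rsync daemon URL."""
    # Path of the drive relative to the module root, e.g. "plots2".
    relative = os.path.relpath(destination, site_root)
    # Refuse destinations outside the exported directory (or the root itself).
    if relative == os.curdir or relative == os.pardir or relative.startswith(os.pardir + os.sep):
        raise ValueError(f"{destination} is not inside {site_root}")
    return f"rsync://{user}@{host}:{rsync_port}/{site}/{relative}"
```

With the configuration above, rsync_url("username", "plot.storage.ip", 12000, "sites", "/mnt/farm", "/mnt/farm/plots2") yields rsync://username@plot.storage.ip:12000/sites/plots2.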

Custom archiving setups

You can define your own target definitions in your plotman.yaml configuration file. The two required scripts can be written either inline in the configuration or as external scripts you maintain separately. For example, you could duplicate the builtin local_rsync target definition as follows. This is meant as an illustration only; presumably you would only do this if you planned to modify it in some way. You can define multiple target definitions, though presently only one can be selected and used at a time.

archiving:
  target: my_target
  env:
    site_root: /mnt/farm
  target_definitions:
    my_target:
      env:
        command: rsync
        options: --preallocate --remove-source-files --skip-compress plot --whole-file
        site_root: null
      disk_space_script: |
        #!/bin/bash
        df -BK | grep " ${site_root}/" | awk '{ gsub(/K$/,"",$4); print $6 ":" $4*1024 }'
      transfer_script: |
        #!/bin/bash
        "${command}" ${options} "${source}" "${destination}"
      transfer_process_name: "{command}"
      transfer_process_argument_prefix: "{site_root}"

The my_target: env: section defines parameters that will be made available to the scripts as environment variables. For each parameter you either provide a default string value or specify null to make the parameter mandatory. For example, there is no sensible default for site_root, so it is null here and the user must specify it in the archiving: env: section.
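The null-means-mandatory rule can be pictured as a merge in which user values override the target's defaults and any value still missing is an error. This is an illustrative sketch of the documented behavior, assuming YAML null arrives as Python None; resolve_env is a hypothetical name, not a plotman function.

```python
from __future__ import annotations

from typing import Optional


def resolve_env(defaults: dict[str, Optional[str]],
                user_env: dict[str, str]) -> dict[str, str]:
    """Merge archiving: env: values over a target definition's env defaults."""
    merged = {**defaults, **user_env}
    # A remaining None (YAML null) means a mandatory parameter was not supplied.
    missing = sorted(name for name, value in merged.items() if value is None)
    if missing:
        raise ValueError(f"missing required archiving env values: {', '.join(missing)}")
    # Environment variables are strings, so normalize every value.
    return {name: str(value) for name, value in merged.items()}
```

For the my_target example, resolve_env({"command": "rsync", "site_root": None}, {"site_root": "/mnt/farm"}) succeeds, while omitting site_root raises an error naming it.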

The output of the disk space script must consist of a single line per disk, with the path and the available byte count separated by a colon. If you are writing your own custom disk space script, you can select the candidate directories in any way you like.

/mnt/farm/plots1:94148112384
/mnt/farm/plots2:39723638784
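A consumer of this output only needs to split each line on its last colon. The sketch below parses the format and then applies a simple "most free space that still fits one plot" rule. That selection rule is an illustrative assumption and is not necessarily the policy plotman uses. Both function names are hypothetical.

```python
from __future__ import annotations


def parse_disk_space(output: str) -> dict[str, int]:
    """Parse 'path:free_bytes' lines emitted by a disk space script."""
    result = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # rpartition splits on the last colon, so the byte count is always the tail.
        path, sep, free = line.rpartition(":")
        if not sep:
            raise ValueError(f"malformed disk space line: {line!r}")
        result[path] = int(free)
    return result


def pick_destination(free_by_path: dict[str, int], plot_size: int) -> str | None:
    """Return the drive with the most free space that can still hold one plot."""
    candidates = {p: f for p, f in free_by_path.items() if f >= plot_size}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

With the sample output above, neither drive has room for a full k32 plot (roughly 101 GiB), so a rule like this would select nothing and the plot would wait on its dst drive.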

The transfer script is provided two extra environment variables: source is the absolute path to the plot that needs to be transferred, and destination is one of the paths reported by the disk space script, /mnt/farm/plots2 for example.
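A transfer script therefore sees the target parameters plus source and destination in its environment. One way to launch such a script from Python is sketched below. Inheriting the parent environment (so the script can find rsync on PATH) and the helper names transfer_env and run_transfer are assumptions for illustration, not documented plotman behavior.

```python
from __future__ import annotations

import os
import subprocess


def transfer_env(params: dict[str, str], source: str, destination: str) -> dict[str, str]:
    """Environment for a transfer script: target parameters plus source/destination."""
    env = dict(os.environ)  # assumption: keep PATH etc. from the parent process
    env.update(params)
    env["source"] = source
    env["destination"] = destination
    return env


def run_transfer(script_path: str, env: dict[str, str]) -> None:
    """Run an executable transfer script (with a #! line); raise if it fails."""
    subprocess.run([script_path], env=env, check=True)
```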

transfer_process_name is used as the first filter when discovering existing archive transfer processes. It should be written as a Python format string. Names to be interpolated will match the environment variables defined as parameters. transfer_process_argument_prefix is the second filter. We will scan the arguments of any process matching transfer_process_name to see if any arguments start with the specified prefix. If both requirements are satisfied, we consider that an active archival transfer.
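The two-stage filter can be sketched as a pure function over one process's argument list. This sketch assumes the process name is compared against the basename of argv[0], which may differ from how plotman actually reads process names; is_archive_transfer is a hypothetical name.

```python
from __future__ import annotations

import os


def is_archive_transfer(cmdline: list[str], params: dict[str, str],
                        name_template: str, prefix_template: str) -> bool:
    """Apply the process-name and argument-prefix filters to one command line."""
    if not cmdline:
        return False
    # Both settings are Python format strings filled in from the env parameters.
    expected_name = name_template.format(**params)
    prefix = prefix_template.format(**params)
    # Assumption: the process name is the basename of argv[0].
    if os.path.basename(cmdline[0]) != expected_name:
        return False
    return any(arg.startswith(prefix) for arg in cmdline[1:])
```

With transfer_process_name "{command}" and prefix "{site_root}", an rsync process copying a plot to /mnt/farm/plots2 matches, while an unrelated rsync backup does not. On Linux, the argument list of a running process can be read from /proc/<pid>/cmdline, where arguments are NUL-separated.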

If you prefer, you can maintain the scripts as separate files and specify their paths as follows.

archiving:
  target: my_target
  env:
    site_root: /mnt/farm
  target_definitions:
    my_target:
      env:
        command: rsync
        options: --preallocate --remove-source-files --skip-compress plot --whole-file
        site_root: null
      disk_space_path: /some/where/disk_space
      transfer_path: /some/where/transfer
      transfer_process_name: "{command}"
      transfer_process_argument_prefix: "{site_root}"

Machine set up

There are two main pieces to plotting: creating the plots and getting them to where you want them to be farmed. In some cases both happen on the same machine; in other cases there are one or more dedicated plotters with a separate farmer. We will start by setting up the Plot Storage and then configure everything on the Plotter.

1) Plot Storage set up

On your Plot Storage machine, make sure that all the storage drives are mounted, the rsync daemon is running, and SSH is set up to accept incoming connections from your Plotter.

Mount drives

Archiving expects there to be a directory that contains the mounts for the drives you want to archive to and no other drives mounted there. /mnt itself is unlikely to be a good choice since it is a standard place to mount anything. In this tutorial the following mount points will be used:

/mnt/farm/plots1
/mnt/farm/plots2

Set up rsync daemon (rsyncd)

If you are using the rsyncd target definition described above, or a similar ssh/rsyncd custom setup, then you will need to configure and run the rsync daemon on the Plot Storage system.

Install rsync (Ubuntu Server already comes with it installed)
Create /etc/rsyncd.conf:

lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

# must match rsync_port in the archiving: env: section of plotman.yaml
port = 12000

# rsync module name
[chia]
    # Path with your mounted drives
    path = /mnt/farm
    comment = Chia
    # Use the username that you log into Ubuntu with or create a new one
    uid = username
    # User group (by default same as username)
    gid = username
    read only = no
    list = yes
    # plotman does not work with rsyncd authentication, so leave these commented out
    #auth users = none
    #secrets file = none
    # since authentication is not used, only accept connections from the Plotter's IP
    hosts allow = plotter.ip.address

Start the rsync daemon with: sudo systemctl start rsync
To start the daemon automatically after a reboot, run: sudo systemctl enable rsync
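Once the daemon is running, you can check from the Plotter that it is reachable. The sketch below relies on the rsync daemon protocol, in which the server sends an "@RSYNCD:" greeting line with its protocol version as soon as a client connects. The helper name rsyncd_greeting and the default timeout are illustrative choices.

```python
from __future__ import annotations

import socket


def rsyncd_greeting(host: str, port: int = 12000, timeout: float = 5.0) -> str | None:
    """Return the daemon's greeting line, or None if no rsync daemon answered."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The rsync daemon speaks first, announcing its protocol version.
            data = sock.recv(128)
    except OSError:  # refused, unreachable, or timed out
        return None
    line = data.decode("ascii", errors="replace").strip()
    return line if line.startswith("@RSYNCD:") else None
```

A None result from rsyncd_greeting("plot.storage.ip") usually points at a firewall, a wrong port, or a daemon that is not running.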

Clarification about 'path' variable in rsyncd.conf

The path variable is the directory in which all your storage drives are mounted. In our example, two drives are mounted in /mnt/farm, namely plots1 and plots2.

If you only have a single drive mounted for archiving (e.g. /media/username/plots), your path should not point to /media/username/plots, but rather to /media/username. Keep in mind that any other mount points in that directory will also be considered for archiving. This is a good reason to use dedicated mount points instead of default locations.

SSH configuration

Make sure you configure ssh in a way that you can connect from your Plotter without having to use a password or keyfile passphrase.

The best way to do this is to create an SSH key without a passphrase on the Plotter and copy the public key to your Plot Storage.

2) Plotter set up

Update your plotman.yaml

Update the archiving: section of your plotman.yaml (see Default Configuration File) to use the rsyncd target. If the Plotter and the Plot Storage are the same machine, you can set host: localhost, although the local_rsync target is simpler in that case. In our example the config would look as follows:

archiving:
  target: rsyncd
  env:
    site_root: /mnt/farm        # Path where your storage drives are mounted (same as path in rsyncd.conf)
    user: username              # Username that can ssh into your Plot Storage
    host: plot.storage.ip       # IP address or hostname of your Plot Storage, localhost if local
    rsync_port: 12000           # Same as port in rsyncd.conf
    site: chia                  # Module name specified in the Plot Storage's rsyncd.conf
    # optional: override the rsync options, e.g. to set a bandwidth limit in KB/s
    # options: --bwlimit=100000 --preallocate --remove-source-files --skip-compress plot --whole-file

Test your setup / trouble shooting

Before starting plotman, you should make sure that both SSH and rsync are set up correctly. If you can't successfully run the tests below, plotman's archiving will not work.

Make sure SSH is set up correctly

The machines should be set up in such a way that you can SSH from your Plotter to your Plot Storage without having to enter a password. In order to do this you should use an SSH public/private key pair that doesn't require entering a passphrase.

To test if you set this up correctly, type the following command on your Plotter: ssh username@plot.storage.ip df -aBK | grep /mnt/farm/

The command above should give you a list of all the mounted drives of your Plot Storage. If it doesn't, or if it asks for a password or a passphrase then SSH is not set up as required by plotman and archiving will not work.
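The same check can be scripted so that it fails instead of hanging on a prompt. The sketch below adds the standard OpenSSH option BatchMode=yes, which makes ssh give up rather than ask for a password or passphrase, and reproduces the grep step in Python. The helper names are hypothetical, and mount paths containing spaces are not handled.

```python
from __future__ import annotations

import subprocess


def ssh_df_command(user: str, host: str) -> list[str]:
    """ssh invocation that fails instead of prompting for a password or passphrase."""
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "df", "-aBK"]


def mounts_under(df_output: str, site_root: str) -> list[str]:
    """Mount points (last df column) below site_root, like `grep /mnt/farm/`."""
    prefix = site_root.rstrip("/") + "/"
    mounts = []
    for line in df_output.splitlines()[1:]:  # skip the df header row
        fields = line.split()
        if fields and fields[-1].startswith(prefix):
            mounts.append(fields[-1])
    return mounts


def check_ssh(user: str, host: str, site_root: str) -> list[str]:
    """Run the remote df; raises CalledProcessError if ssh needs interaction."""
    result = subprocess.run(ssh_df_command(user, host),
                            capture_output=True, text=True, check=True)
    return mounts_under(result.stdout, site_root)
```

An empty list from check_ssh("username", "plot.storage.ip", "/mnt/farm") means the login works but no drives are mounted where archiving expects them.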

To tunnel rsync through ssh, rsync should have -e ssh or --rsh ssh in the rsync options:

rsync -Pe ssh testfile.test rsync://username@plot.storage.ip:12000/chia/plots1
rsync -P --rsh ssh testfile.test rsync://username@plot.storage.ip:12000/chia/plots1

Make sure rsync works

Create a testfile on your Plotter using echo "testing" > testfile.test
Enter the following on your Plotter: rsync -P testfile.test rsync://username@plot.storage.ip:12000/chia/plots1
Check your Plot Storage and make sure testfile.test exists in /mnt/farm/plots1

Manually run plotman's rsync command

If both of the tests above pass but archiving still doesn't work, you can run plotman's rsync command by hand and inspect its output in your console.

Start plotman interactive
Locate the rsync line in the Log section at the bottom of the screen, e.g.:

05-03 08:37:46 Starting archive: rsync --bwlimit=80000 --remove-source-files -P /mnt/dst1/plot-k32-2021-05-03-01-50-b4271f88a74b36b516c242151e00fdda20e3f31ce1f8624465bf05a195009ecd.plot rsync://username@plot.storage.ip:12000/chia/plots2

Copy everything after "05-03 08:37:46 Starting archive: ". In our example that would be:

rsync --bwlimit=80000 --remove-source-files -P /mnt/dst1/plot-k32-2021-05-03-01-50-b4271f88a74b36b516c242151e00fdda20e3f31ce1f8624465bf05a195009ecd.plot rsync://username@plot.storage.ip:12000/chia/plots2

Run the command in your terminal and use the output for finding any errors you may have in your configuration.
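Copying the command by hand is easy to get wrong with such long plot file names. A small helper, hypothetical and assuming the "Starting archive:" log format shown above, can split the command out of the log line instead:

```python
from __future__ import annotations

import shlex

MARKER = "Starting archive: "


def command_from_log(line: str) -> list[str]:
    """Split the rsync command out of a plotman 'Starting archive:' log line."""
    _, sep, command = line.partition(MARKER)
    if not sep:
        raise ValueError("not a 'Starting archive:' log line")
    # shlex.split honours shell quoting, should any arguments be quoted.
    return shlex.split(command)
```

The resulting list can be passed to subprocess.run to rerun the transfer. Because the command includes --remove-source-files, a successful run deletes the plot from the dst drive, just as the archiver would.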

Disable archiving in plotman

To disable archiving, comment out the entire archiving: section in your ~/.config/plotman/plotman.yaml. Users should either use archiving or provide their own plot distribution mechanism. The dst directories are not intended to be the final storage location for plots.