2 using the storageloader - thebeansgroup/snowplow GitHub Wiki
## 1. Overview

Running the StorageLoader is very straightforward - please review the command-line options in the next section.
## 2. Command-line options

Invoke StorageLoader using Bundler's `bundle exec` syntax:

$ bundle exec bin/snowplow-storage-loader

Note the `bin/` sub-folder, and that the `bundle exec` command will only work when you are inside the `storage-loader` folder.

The command-line options for StorageLoader look like this:

    Usage: snowplow-storage-loader [options]

    Specific options:
        -c, --config CONFIG                               configuration file
        -i, --include compupdate,vacuum                   include optional work step(s)
        -s, --skip download|delete,load,analyze,archive   skip work step(s)

    Common options:
        -h, --help                                        Show this message
        -v, --version                                     Show version

A note on the `--include` option: this includes optional work steps which are otherwise not run. `--include vacuum` runs a `VACUUM` operation on the table following the load. `--include compupdate` runs the load, then determines the best compression encoding to use for each of the fields in your Redshift event table, using the `:comprows:` setting for the sample size. For more information on Amazon’s comprows functionality, see the Redshift documentation.

A note on the `--skip` option: this skips the work steps listed. So for example `--skip download,load` would only run the final archive step. This is useful if you have an error in your load and need to re-run only part of it.

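As a rough illustration of how `--include` and `--skip` combine with `--config`, the sketch below assembles a StorageLoader command line and prints it without running anything. The helper name `build_loader_cmd` is hypothetical and not part of StorageLoader; only the flags themselves come from the usage text above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with StorageLoader): prints the command
# line for a given config file plus optional --include / --skip step lists.
# Usage: build_loader_cmd CONFIG [INCLUDE_STEPS] [SKIP_STEPS]
build_loader_cmd() {
  config="${1:-}"
  include="${2:-}"
  skip="${3:-}"

  cmd="bundle exec bin/snowplow-storage-loader --config $config"

  # Only append each option when a step list was actually supplied.
  if [ -n "$include" ]; then
    cmd="$cmd --include $include"
  fi
  if [ -n "$skip" ]; then
    cmd="$cmd --skip $skip"
  fi

  printf '%s\n' "$cmd"
}

# Print the command for re-running after a failed load, skipping two steps.
build_loader_cmd my-config.yml "" download,load

# Print the command for a load followed by the optional VACUUM step.
build_loader_cmd my-config.yml vacuum
```

Printing the command first makes it easy to check the step lists before running the real thing from inside the `storage-loader` folder.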
As per the above, running StorageLoader is a matter of populating your configuration file (let's call it `my-config.yml` for this example) and then invoking StorageLoader from inside the `storage-loader` folder like so:

$ bundle exec bin/snowplow-storage-loader --config my-config.yml

StorageLoader depends on Snowplow's Infobright Ruby Loader, which in turn uses the `locate` shell command. If your shell complains that this is missing, you can install it separately.

To install and configure `locate` on Debian/Ubuntu:

$ sudo apt-get install mlocate
$ sudo updatedb

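Before running StorageLoader, it may help to confirm that `locate` is actually on your `PATH`. The following is a minimal sketch using the standard `command -v` shell builtin; the `have_cmd` helper name is an assumption made for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command can be found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd locate; then
  echo "locate is available"
else
  # On Debian/Ubuntu the mlocate package provides locate (see above).
  echo "locate is missing: install it, then run 'sudo updatedb'"
fi
```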
All done? Then schedule the StorageLoader to regularly migrate new data into your data store (e.g. Infobright or Redshift).
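One possible way to schedule the run is a small wrapper script called from cron. The sketch below is only an illustration: the function name `run_storage_loader`, the `DRY_RUN` switch, the paths and the crontab schedule are all assumptions, not part of Snowplow. It changes into the `storage-loader` folder first, because `bundle exec` only works from there.

```shell
#!/bin/sh
# Hypothetical wrapper for a scheduler such as cron.
# Usage: run_storage_loader LOADER_DIR CONFIG
# Set DRY_RUN=1 to print the command instead of executing it.
run_storage_loader() {
  loader_dir="$1"
  config="$2"
  (
    # Run in a subshell so the caller's working directory is untouched.
    cd "$loader_dir" || exit 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run in $loader_dir: bundle exec bin/snowplow-storage-loader --config $config"
    else
      bundle exec bin/snowplow-storage-loader --config "$config"
    fi
  )
}

# Example crontab entry (assumed script path and schedule, daily at 04:00):
# 0 4 * * * /path/to/run-storage-loader.sh >> /path/to/storage-loader.log 2>&1
```

Trying the wrapper with `DRY_RUN=1` first is a cheap way to confirm the folder and config path before handing it to the scheduler.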