LogstashHarvestTechnology: Logstash Harvest Component

Overview

The Logstash Harvester enables users to configure and launch Logstash jobs from v2 nodes.

Installation

Installation is normally handled by the RPM install.
The Logstash Harvester also requires 2 Logstash plugins before it will work: the S3 input and the WebHDFS output. These can be installed by navigating to your Logstash folder (e.g. /opt/logstash/) and running these 2 commands:
bin/plugin install logstash-input-s3
bin/plugin install logstash-output-webhdfs
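You can then confirm both plugins installed correctly by listing the installed plugins (a quick sanity check, assuming a Logstash version that ships the same bin/plugin tool used above):
bin/plugin list | grep -E 'logstash-input-s3|logstash-output-webhdfs'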

Security considerations

Currently the Logstash harvester is not safe for non-admin users on secure clusters. It will shortly be integrated with RBAC, but in the meantime access to this harvest engine should be restricted to admin users (by restricting the read rights of the uploaded JAR).

Logging

The Logstash harvester has the following Aleph2 log types:

  • subsystem: "LogstashHarvestService", command: "test_output", level: INFO+ or DEBUG+ (depending on whether the debug_verbosity flag below is set) - when running tests, each line of the Logstash log is retrieved after the test ends. The log format is simply the native Logstash log object converted to JSON.
    • (Note that all Logstash logging is discarded in normal mode; it is simply too verbose. Even in just test mode it can generate GBs of log data, and some work is required to cap the logging generated.)
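For illustration only, a single retrieved log line might look something like the following once converted to JSON; the exact field set is whatever the installed Logstash version emits, so treat this record as hypothetical:

{ "timestamp": "2016-04-01T12:00:00.000Z", "level": "INFO", "message": "Pipeline main started" }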

Bucket configuration

The logstash harvester has a simple configuration:

{
   "script": string,
   "output_override": string, (optional)
   "debug_verbosity": boolean, (optional)
   "write_settings_override" : WriteSettings (optional)
}

Logstash configuration (script):

Here is where you specify your Logstash input and filter blocks. You do not need to specify an output block; one is added automatically to ensure your output gets sent to HDFS or Elasticsearch (see output_override below). Use \n to separate lines. See the example at the end of the page for how to take advantage of the v1 JS block in the source editor to make submission easier, and the sketch immediately below for what the script itself might contain.
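For illustration, a minimal script is shown unescaped below (in the bucket JSON the line breaks become \n); the S3 bucket name and the filter contents are hypothetical placeholders:

input {
  s3 {
    bucket => "my-example-bucket"
    region => "us-east-1"
  }
}
filter {
  # placeholder filter - tag each record before output
  mutate {
    add_field => { "harvested_by" => "aleph2_logstash" }
  }
}

Note there is no output block: the harvester appends one for you based on output_override.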

Output Override (output_override):

Defaults to "hdfs", i.e. Logstash results are output to HDFS; it can currently be overridden to "elasticsearch" to output results to Elasticsearch instead.

Debug Verbosity (debug_verbosity):

For tests only; defaults to false. If true, the --debug command line flag is added when starting Logstash tests; otherwise --verbose is added. See https://www.elastic.co/guide/en/logstash/current/command-line-flags.html for more information.
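For illustration only (the exact invocation is internal to the harvester), a test run with debug_verbosity set to true corresponds roughly to launching:

/opt/logstash-infinite/logstash/bin/logstash agent -f <generated test config> --debug

using the default binary_path from the global configuration below, with --verbose substituted for --debug when the flag is false.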

Write Settings Override (write_settings_override):

This overrides the default flush specs on files; the WriteSettings object is documented here: WriteSettings. Only 2 of the fields are used:

  • "batch_max_objects" - we segment to the next file once we reach this many objects; defaults to 33554432
  • "batch_flush_interval" - how many seconds between flushes; defaults to 300

During testing, "batch_flush_interval" is overridden to 10s in an attempt to finish the test quickly (as you generally only need small numbers of results to test on). Example config:

{ "batch_flush_interval": 25, "batch_max_objects": 3000 }

Global configuration

In most cases it will not be necessary to apply global configuration, but the following fields are configurable per shared library:

{
   "base_dir" : string, //defaults to /opt/logstash-infinite/
   "working_dir" : string, //defaults to /opt/logstash-infinite/logstash/
   "master_config_dir" : string, //defaults to /opt/logstash-infinite/logstash.conf.d/
   "slave_config_dir" : string, //defaults to /opt/logstash-infinite/dist.logstash.conf.d/
   "binary_path" : string, //defaults to /opt/logstash-infinite/logstash/bin/logstash
   "restart_file" : string, //defaults to /opt/logstash-infinite/RESTART_LOGSTASH
   "hadoop_mount_root" : string, //defaults to /opt/hadoop-fileshare/app/aleph2
   "non_admin_inputs" : string, //defaults to collectd,drupal_dblog,gelf,gemfire,imap,irc,lumberjack,s3,snmptrap,sqs,syslog,twitter,udp,xmpp,zenoss
   "non_admin_filters" : string, //defaults to advisor,alter,anonymize,checksum,cidr,cipher,clone,collate,csv,date,dns,drop,elapsed,extractnumbers,fingerprint,geoip,gelfify,grep,grok,grokdiscovery,l18n,json,json_encode,kv,metaevent,metrics,multiline,mutate,noop,prune,punct,railsparallelrequest,range,sleep,split,sumnumbers,syslog_pri,throttle,translate,unique,urldecode,useragent,uuid,wms,wmts,xml
   "non_admin_outputs" : string  //defaults to ""
} 

Where:

  • base_dir: TODO
  • working_dir: TODO
  • master_config_dir: TODO
  • slave_config_dir: TODO
  • binary_path: TODO
  • restart_file: TODO
  • hadoop_mount_root: TODO
  • non_admin_inputs: TODO
  • non_admin_filters: TODO
  • non_admin_outputs: TODO
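For illustration, a hypothetical per-shared-library override that relocates the install directory and allows non-admin users to use the stdout output could look like this (all omitted fields keep the defaults shown above):

{
   "base_dir": "/opt/logstash-custom/",
   "non_admin_outputs": "stdout"
}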

Example config (using v1 JS block):

                "harvest_configs": [
                    {
                        "config": {
                            "debug_verbosity": true,
                            "output_override": "elasticsearch",
                            "script": "$$SCRIPT_logstash$$",
                            "write_settings_override":{
                                 "batch_flush_interval":25,
                                 "batch_max_objects":3000
                            }
                        },
                        "enabled": true,
                        "library_ids_or_names": [],
                        "name": "harvester_1"
                    }
                ],
                "scripting": {
                    "logstash": {
                        "script": "input {\n  s3 {\n//s3 fields here\n }\n}\nfilter\n{\n//filter fields here\n}",
                        "separator_regex": "//ALEPH2_MODULE-.*"
                    },
                    "sub_prefix": "$$SCRIPT_",
                    "sub_suffix": "$$"
                },
               //...rest of bucket