Running Distributed Mode - garyrussell/spring-xd GitHub Wiki
The Spring XD distributed runtime (DIRT) supports distribution of processing tasks across multiple nodes. See Getting Started for information on running Spring XD as a single node.
The XD distributed runtime architecture consists of the following distributed components:
-
Admin - Manages Stream and Job deployments and other end user operations and provides REST services to access runtime state, system metrics, and analytics
-
Container - Hosts deployed Modules (stream processing tasks) and batch jobs
-
ZooKeeper - Provides all runtime information for the XD cluster. Tracks running containers, in which containers modules and jobs are deployed, stream definitions, deployment manifests, and the like, see XD Distributed Runtime for an overview on how XD uses ZooKeeper.
-
Spring Batch Job Repository Database - An RDBMS is required for jobs. The XD distribution comes with HSQLDB, but this is not appropriate for a production installation. XD supports any JDBC compliant database.
-
A Message Broker - Used for data transport. XD data transport is designed to be pluggable. Currently XD supports Rabbit MQ and Redis for messaging during stream and job processing, and Kafka for messaging during stream processing only. Please note that support for job processing using Kafka as transport is not currently available, and is forthcoming by Spring XD 1.1.0.RELEASE. A production installation must configure one of these transport options. At this time, Rabbit MQ is recommended for production scenarios, as Redis support is not intended to be used for production use, and Kafka support in XD is still under development and scheduled to be fully available by Spring XD 1.1.0.RELEASE. In either case, a separate server must be running to provide the messaging middleware.
-
Analytics Repository - XD currently uses Redis to store the counters and gauges provided Analytics
In addition, XD provides a Command Line Interface (CLI), XD Shell as well as a web application, XD-UI to interact with the XD runtime.
The XD distribution provides shell scripts to start its runtime components under the xd directory of the XD installation:
Whether you are running _xd-admin, xd-container or even xd-singlenode you can always get help by typing the command followed by --help. For example:
xd/bin/xd-admin --help
_____ __ _______
/ ___| (-) \ \ / / _ \
\ `--. _ __ _ __ _ _ __ __ _ \ V /| | | |
`--. \ '_ \| '__| | '_ \ / _` | / ^ \| | | |
/\__/ / |_) | | | | | | | (_| | / / \ \ |/ /
\____/| .__/|_| |_|_| |_|\__, | \/ \/___/
| | __/ |
|_| |___/
{appversion} eXtreme Data
Started : AdminServerApplication
Documentation: https://github.com/spring-projects/spring-xd/wiki
Usage:
--analytics [redis] : How to persist analytics such as counters and gauges
--help (-?, -h) : Show this help screen
--httpPort <httpPort> : Http port for the REST API server
--mgmtPort <mgmtPort> : The port for the management server
-
analytics - The data store that will be used to store the analytics data. The default is redis
-
help - Displays help for the command args. Help information may be accessed with a -? or -h.
-
httpPort - The http port for the REST API server. Defaults to 9393.
-
mgmtPort - The port for the management server. Defaults to the admin server port.
Also, note that it is recommended to use fixed http port for XDAdmin(s). This makes it easy to know the admin server addresses the REST clients (shell, webUI) can point to. If a random port is chosen (with server.port or $PORT set to 0), then one needs to go through the log and find which port admin server’s tomcat starts at.
-
analytics - How to persist analytics such as counters and gauges. The default is redis
-
groups - The assigned group membership for this container as a comma delimited list
-
hadoopDistro - The Hadoop distribution to be used for HDFS access. HDFS is not available if not set.
-
help - Displays help for the command args. Help information may be accessed with a -? or -h.
-
mgmtPort - The port for the management server. Defaults to the container server port.
The distributed runtime requires an RDBMS. The XD distrubution comes with an HSQLDB in memory database for testing purposes, but an alternate is expected. To start HSQLDB:
$ cd hsqldb/bin
$ ./hsqldb-serverTo configure XD to connect to a different RDBMS, have a look at xd/config/servers.yml in the spring:datasource section for details. Note that spring.batch.initializer.enabled is set to true by default which will initialize the Spring Batch schema if it is not already set up. However, if those tables have already been created, they will be unaffected.
If the provided schemas are customized, other values may need to be customized. In the xd/config/servers.yml the following block exposes database specific values for the batch job repository.
spring:
batch:
isolationLevel: ISOLATION_SERIALIZABLE # (1)
clobType: # (2)
dbType: # (3)
maxVarcharLength: 2500 # (4)
tablePrefix: BATCH_ # (5)
validateTransactionState: true # (6)
initializer:
enabled: false # (7)-
Transaction isolation level for the job repository.
-
A special handler for large objects. The default is usually fine, except for some (usually older) versions of Oracle. The default is determined from the data base type.
-
Used to determine what id incremented to use. The default is usually fine, except when the type returned by the datasource should be overridden (GemfireXD for example).
-
Configures how large the maximum message can be stored in a VARCHAR type field.
-
Prefix for repository tables.
-
Flag to determine whether to check for an existing transaction when a JobExecution is created. Defaults to true because it is usually a mistake, and leads to problems with restartability and also to deadlocks in multi-threaded steps.
-
Flag that indicates if the database tables should be created on startup.
Currently XD does not ship with ZooKeeper. At the time of this writing, the compliant version is 3.4.6 and you can download it from here. Please refer to the ZooKeeper Getting Started Guide for more information. A ZooKeeper ensemble consisting of at least three members is recommended for production installations, but a single server is all that is needed to have XD up and running.
You can configure the root path in Zookeeper where an XD cluster’s top level nodes will be created. This allows you to run multiple independent clusters of XD that share a single ZK instance. Add the following to servers.yml to configure. You can also set as an environment variable, system property in the standard manner.
Additionally, various time related settings may be optionally configured for ZooKeeper:
-
sessionTimeout - session timeout in milliseconds
-
connectionTimeout - connection timeout in milliseconds
-
initialRetryWait - initial amount of time to wait between retries after a failed connection (uses the Apache Curator ExponentialBackoffRetry)
-
retryMaxAttempts - maximum number of times to retry after a failed connection (uses the Apache Curator ExponentialBackoffRetry)
zk:
namespace: xd
client:
connect: localhost:2181
sessionTimeout: 60000
connectionTimeout: 30000
initialRetryWait: 1000
retryMaxAttempts: 3
Redis is the default transport when running in distributed mode.
If you already have a running instance of Redis it can be used for Spring XD. By default Spring XD will try to use a Redis instance running on localhost using port 6379. You can change that in the servers.yml file residing in the config/ directory.
If you don’t have a pre-existing installation of Redis, you can use the Spring XD provided instance (For Linux and Mac) which is included in the .zip download. If you are installing using brew or rpm you should install Redis using those installers or download the source tarball and compile Redis yourself. If you used the .zip download then inside the Spring XD installation directory (spring-xd) do:
$ cd redis/bin
$ ./install-redisThis will compile the Redis source tar and add the Redis executables under redis/bin:
-
redis-check-dump
-
redis-sentinel
-
redis-benchmark
-
redis-cli
-
redis-server
You are now ready to start Redis by executing
$ ./redis-server|
Tip
|
For further information on installing Redis in general, please checkout the Redis Quick Start guide. If you are using Mac OS, you can also install Redis via Homebrew |
Presently, Spring XD does not ship Windows binaries for Redis (See XD-151). However, Microsoft is actively working on supporting Redis on Windows. You can download Windows Redis binaries from:
If you try to run Spring XD and Redis is NOT running, you will see the following exception:
11:26:37,830 ERROR main launcher.RedisContainerLauncher:85 - Unable to connect to Redis on localhost:6379; nested exception is com.lambdaworks.redis.RedisException: Unable to connect Redis does not seem to be running. Did you install and start Redis? Please see the Getting Started section of the guide for instructions.
$ redis-serverYou should see something like this:
[35142] 01 May 14:36:28.939 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
[35142] 01 May 14:36:28.940 * Max number of open files set to 10032
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 2.6.12 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in stand alone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 35142
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
[35142] 01 May 14:36:28.941 # Server started, Redis version 2.6.12
[35142] 01 May 14:36:28.941 * The server is now ready to accept connections on port 6379
If you already have a running instance of RabbitMQ it can be used for Spring XD. By default Spring XD will try to use a Rabbit instance running on localhost using port 5672. The default account credentials of guest/guest are assumed. You can change that in the servers.yml file residing in the config/ directory.
If you don’t have a RabbitMQ installation already, head over to http://www.rabbitmq.com and follow the instructions. Packages are provided for Windows, Mac and various flavor of unix/linux.
Start the RabbitMQ broker by running the rabbitmq-server script:
$ rabbitmq-serverYou should see something similar to this:
RabbitMQ 3.3.0. Copyright (C) 2007-2013 GoPivotal, Inc. ## ## Licensed under the MPL. See http://www.rabbitmq.com/ ## ## ########## Logs: /usr/local/var/log/rabbitmq/[email protected] ###### ## /usr/local/var/log/rabbitmq/[email protected] ########## Starting broker... completed with 10 plugins.
Spring XD consists of two servers
-
XDAdmin - controls deployment of modules into containers
-
XDContainer - executes modules
You can start the xd-container and xd-admin servers individually as follows:
xd/bin>$ ./xd-admin
xd/bin>$ ./xd-containerSpring XD uses data transport for sending data from the output of one module to the input of the next module. In general, this requires remote transport between container nodes. The Admin server also uses the data bus to launch batch jobs by sending a message to the job’s launch channel. Since the same transport must be shared by the Admin and all Containers, the transport configuration is centrally configured in xd/config/servers.yml.
The default transport is redis. Open servers.yml with a text editor and you will see the transport configuration near the top. To change the transport, you can uncomment this section and change the transport to rabbit or any other supported transport. Any changes to the transport configuration must be replicated to every XD node in the cluster.
|
Note
|
XD singlenode also supports a --transport command line argument, useful for testing streams under alternate transports. |
#xd: # transport: redis
|
Note
|
If you have multiple XD instances running share a single RabbitMQ server for transport, you may encounter issues if each system contains streams of the same name. We recommend using a different RabbitMQ virtual host for each system. Update the |
By default, the xd-container will store Analytics data in redis. At the time of writing, this is the only supported option (when running in distributed mode). Use the --analytics option to specify another backing store for Analytics data.
xd/bin>$ ./xd-container --analytics redisThere are additional configuration options available for these scripts:
To specify the location of the Spring XD install other than the default configured in the script
export XD_HOME=<Specific XD install directory>To specify the http port of the XDAdmin server,
xd/bin>$ ./xd-admin --httpPort <httpPort>The XDContainer nodes by default start up with server.port 0 (which means they will scan for an available HTTP port). You can disable the HTTP endpoints for the XDContainer by setting server.port=-1. Note that in this case HTTP source support will not work in a PaaS environment because typically it would require XD to bind to a specific port. Both the XDAdmin and XDContainer processes bind to server.port $PORT (i.e. an environment variable if one is available, as is typical in a PaaS).
Spring XD supports the following Hadoop distributions:
-
hadoop25 - Apache Hadoop 2.5.2
-
hadoop26 - Apache Hadoop 2.6.0 (default)
-
phd21 - Pivotal HD 2.1 and 2.0
-
cdh5 - Cloudera CDH 5.3.0
-
hdp22 - Hortonworks Data Platform 2.2
To specify the distribution libraries to use for Hadoop client connections, use the option
--hadoopDistro for the xd-container and xd-shell commands:
xd/bin>$ ./xd-shell --hadoopDistro <distribution>
xd/bin>$ ./xd-admin
xd/bin>$ ./xd-container --hadoopDistro <distribution>Pass in the --help option to see other configuration properties.