# Consul Tutorial
Consul is a great fit for service discovery, which elastic cloud services need and which is an essential ingredient of microservices.
We will set up multiple server agents and client agents and walk you through some basics of Consul. We will show you how to use the HTTP API and the Java API.
Before we get started, let's cover some Consul basics. If you already know (or sort of know) what Consul is, or if you have read our slide deck or our other article on Consul (Microservice Service Discovery with Consul), you can skip the next section.
## What is Consul?
Consul provides service discovery, health monitoring, and config services for microservice architectures.
With service discovery you can look up services, which are organized in the topology of your datacenters. Consul uses client agents and Raft to provide a consistent view of services, and it uses the same mechanism to provide a consistent view of configuration. In other words, Consul gives your microservices a replicated view of the service topology and its configuration, and it can monitor and change that topology based on the health of individual nodes.
Consul provides scalable, distributed health checks. It does only minimal datacenter-to-datacenter communication, so each datacenter has its own Consul cluster. Consul provides a domain model for managing the topology of datacenters, server nodes, and the services running on those nodes, along with their configuration and current health status.
Consul is like combining the features of a DNS server, a consistent key/value store like etcd, the service discovery features of ZooKeeper, and health monitoring like Nagios, all rolled up into one consistent system. Essentially, Consul is all the bits you need for a coherent domain service model that provides service discovery, health checks, replicated config, service topology, and health status. Consul also provides a nice REST interface and a web UI to see your service topology and distributed service config. Consul organizes your services in a Service Catalog and then provides DNS and REST/HTTP/JSON interfaces to it.
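For example, with a stock agent on its default ports (HTTP on 8500, DNS on 8600; the cluster we build below remaps the HTTP ports and disables DNS), you could look up a service either way. This is just a sketch of the two interfaces, with myservice standing in for a service we register later:

```bash
# DNS interface: resolve the healthy instances of a service as SRV records
dig @127.0.0.1 -p 8600 myservice.service.consul SRV

# HTTP interface: the same information as JSON from the Service Catalog
curl http://127.0.0.1:8500/v1/catalog/service/myservice
```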
To use Consul you start up an agent process. The Consul agent is a long-running daemon that runs on every member of the Consul cluster, in either server mode or client mode. Client agents run on every physical server or virtual machine that hosts services, and they use gossip and RPC calls to stay in sync with the Consul servers.
A client (a Consul agent running in client mode) forwards requests to a server (a Consul agent running in server mode). Clients are mostly stateless; they use LAN gossip to communicate changes to the server nodes. A server agent is like a client agent but with more responsibilities. The Consul servers use the Raft quorum mechanism to elect a leader, and they maintain cluster state such as the Service Catalog. The leader manages a consistent view of config key/value pairs and of service health and topology. Consul servers also handle WAN gossip to other datacenters, forward queries to the leader, and forward queries to other datacenters.
A Datacenter is fairly obvious. It is anything that allows for fast communication between nodes, with as few or no hops, little or no routing, and in short: high speed communication. This could be an Amazon EC2 availability zone, a networking environment like a subnet, or any private, low latency, high bandwidth network. You can decide exactly what datacenter means for your organization.
Consul server nodes use consensus to agree on who the leader is. They use a transactional finite state machine (FSM) to keep all server nodes in lock step for critical tasks, built on the usual Raft concepts: a replicated log, the FSM, a peer set (who gets the replicated log), a quorum (a majority of peers agree), committed entries, and a leader. The end result is a consistent view of configuration and of live services and their topology.
Consul is built on top of Serf. Serf implements a full gossip protocol and provides membership, failure detection, and event broadcast mechanisms; it supplies the clustering layer. Consul uses LAN gossip between clients and servers in the same local network or datacenter, and WAN gossip between servers only.
With Consul you run three to five servers per datacenter, and you use WAN gossip to communicate over the Internet or other wide area networks. Consul also provides an RPC (remote procedure call) interface: a request/response mechanism that lets a client ask an agent to, for example, join or leave a cluster.
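The consul command-line tool is a thin wrapper over this RPC interface to the local agent. Assuming an agent running on its default RPC address (the agents we configure below remap it), the basics look like this:

```bash
# Ask the local agent who is in the cluster
consul members

# Tell the local agent to join an existing cluster member, or to leave gracefully
consul join 127.0.0.1:9301
consul leave
```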
Each Consul client agent maintains its own set of service and check registrations and health information, and it keeps its health checks and local state up to date. Local agent state (on the client node) is different from catalog state (on the server nodes). The local agent notifies the catalog of changes, and updates are synced right away from the client agent to the Service Catalog, which is stored on the cluster of servers.
The local agents (agents running in client mode) check that their view of the world matches the catalog and, if it does not, the catalog is updated. For example, when a client registers a new service check with a client agent, the client agent notifies a Consul server so the catalog records that the check exists. When a check is removed from the local client agent, it is removed from the catalog held by the cluster of Consul servers. If an agent runs health checks and a status changes, the update is sent to the catalog. The agent is the authority for its own node (client or server) and for the services that exist on that node; the agent's scope is the server or virtual machine running it, while the catalog's scope is a whole datacenter, local area network, or EC2 availability zone. Changes are synchronized as they happen and also periodically, every one to ten minutes depending on the size of the cluster.
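You can see the two scopes directly in the HTTP API: the agent endpoints answer from local agent state, while the catalog endpoints answer from the datacenter-wide catalog. A quick sketch, assuming an agent listening on 8500:

```bash
# Local agent state: services and checks registered with this agent only
curl http://localhost:8500/v1/agent/services

# Catalog state: the datacenter-wide view held by the server cluster
curl http://localhost:8500/v1/catalog/services
```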
Consul provides the following REST endpoints for interacting with Consul:
- kv - key value
- agent - api for dealing with agent
- catalog - dealing with datacenter catalog
- health - show health checks
- sessions - group operations and manage consistent view
- events - fire fast gossip based events
- acl - setup access control lists and security for Consul
- status - check status
Each endpoint provides operations that take JSON bodies or request params and deliver JSON responses. In addition to configuring Consul programmatically via REST, you can use JSON config files stored locally to bootstrap config and service topology.
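For example, the kv endpoint exposes the replicated key/value store with plain HTTP verbs (note that values come back base64 encoded inside the JSON response):

```bash
# Write a config value
curl -X PUT -d 'db.host=10.0.0.5' http://localhost:8500/v1/kv/myservice/config/db

# Read it back (the Value field is base64 encoded)
curl http://localhost:8500/v1/kv/myservice/config/db

# Delete it
curl -X DELETE http://localhost:8500/v1/kv/myservice/config/db
```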
Consul provides several styles of health check. The most common is the dead man switch, or time-to-live (TTL) check: the service has to dial in with a status update within every TTL period. Another is the HTTP check, which pings a URL every N seconds; an HTTP status of 200-299 means PASS, 429 means WARN, and any other status means FAIL. Lastly, you can have the client agent (or server agent) run a script on an interval: if the process exits with 0 the check passes, an exit code of 1 is a warning, and any other exit code is a FAIL.
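As a rough sketch, here is what the three styles look like when registered through the agent's check registration endpoint (the field names follow the check registration format of this Consul era; the check names and script path are just placeholders):

```bash
# TTL ("dead man switch"): the service must hit the pass endpoint before the TTL expires
curl -X PUT -d '{"ID": "web-ttl", "Name": "web TTL", "TTL": "15s"}' \
    http://localhost:8500/v1/agent/check/register

# HTTP check: the agent GETs the URL every interval; 2xx = PASS, 429 = WARN, anything else = FAIL
curl -X PUT -d '{"ID": "web-http", "Name": "web HTTP", "HTTP": "http://localhost:8080/health", "Interval": "10s"}' \
    http://localhost:8500/v1/agent/check/register

# Script check: the agent runs the script every interval; exit 0 = PASS, 1 = WARN, anything else = FAIL
curl -X PUT -d '{"ID": "disk", "Name": "disk space", "Script": "/usr/local/bin/check_disk.sh", "Interval": "30s"}' \
    http://localhost:8500/v1/agent/check/register
```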
## Setting up a local Consul cluster for testing
You can test Consul with one node, but to exercise Consul's Raft consensus we will set up four agents: three servers and one client.
A Consul agent runs in server mode or client mode. Server agents maintain the state of the cluster; you want three to five of them per datacenter, and the count should be odd to facilitate leader election. Client agents run on the machines whose services you want to monitor and report back to the Consul servers.
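The odd number comes straight from Raft quorum arithmetic: a cluster of N servers needs floor(N/2) + 1 of them up to elect a leader and commit writes, so 3 servers tolerate 1 failure and 5 tolerate 2, while 4 servers still only tolerate 1.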
The goal is to make sure you have a complete handle on how to recover a cluster, the difference between Ctrl-C (graceful shutdown) and kill -9 ("I have fallen and can't get up"), and when you need to bootstrap and when you do not.
To reduce the number of command-line arguments, I will use a config file to start up each server.
Here is the server1 config.
#### Server 1 configuration: server1.json
```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul1",
    "log_level": "INFO",
    "node_name": "server1",
    "server": true,
    "bootstrap" : true,
    "ports" : {
        "dns" : -1,
        "http" : 9500,
        "rpc" : 9400,
        "serf_lan" : 9301,
        "serf_wan" : 9302,
        "server" : 9300
    }
}
```
When you are starting up a new server cluster, you typically put one of the servers in bootstrap mode. This tells Consul that this server is allowed to elect itself as leader. It is like asking Dick Cheney to be on the VP selection committee and he nominates himself: he was in bootstrap mode.
Consul servers are boring if they have no one to talk to so let's add two more server config files to the mix.
```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul2",
    "log_level": "INFO",
    "node_name": "server2",
    "server": true,
    "ports" : {
        "dns" : -1,
        "http" : 10500,
        "rpc" : 10400,
        "serf_lan" : 10301,
        "serf_wan" : 10302,
        "server" : 10300
    },
    "start_join" : ["127.0.0.1:9301", "127.0.0.1:11301"]
}
```
Notice that server 2 points to servers 1 and 3 in the start_join config option.
```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul3",
    "log_level": "INFO",
    "node_name": "server3",
    "server": true,
    "ports" : {
        "dns" : -1,
        "http" : 11500,
        "rpc" : 11400,
        "serf_lan" : 11301,
        "serf_wan" : 11302,
        "server" : 11300
    },
    "start_join" : ["127.0.0.1:9301", "127.0.0.1:10301"]
}
```
And server 3 points to servers 1 and 2. We are assigning different ports so we can run them all on one box; you would not need to remap ports if they were on separate machines. In production you would never run the Consul servers on the same box anyway, since that would defeat the purpose of replicating for reliability.
To start up the three servers, use the following command lines.

```bash
consul agent -config-file=server1.json -ui-dir=/opt/consul/web
```
We installed the web consul files for the UI in /opt/consul/web. You can download the UI from Consul UI.
```bash
consul agent -config-file=server2.json
consul agent -config-file=server3.json
```
Put each of those command lines in its own shell script, so you have the following files:
```bash
$ tree
.
├── server1.json
├── server1.sh
├── server2.json
├── server2.sh
├── server3.json
└── server3.sh
```
Run chmod +x on the .sh files so you can run them. Then run them.
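The tutorial does not list the .sh wrappers themselves, but given the command lines above and the GOMAXPROCS warning you will see in the logs, each one is presumably just along these lines (server1.sh additionally passes the -ui-dir option):

```bash
# server2.sh (assumed contents)
export GOMAXPROCS=10
consul agent -config-file=server2.json
```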
The log for server 1 should look like this:
```bash
$ ./server1.sh
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'server1'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 127.0.0.1 (HTTP: 9500, HTTPS: -1, DNS: -1, RPC: 9400)
Cluster Addr: 10.0.0.162 (LAN: 9301, WAN: 9302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2015/04/14 10:43:04 [INFO] raft: Node at 10.0.0.162:9300 [Follower] entering Follower state
2015/04/14 10:43:04 [INFO] serf: EventMemberJoin: server1 10.0.0.162
2015/04/14 10:43:04 [INFO] serf: EventMemberJoin: server1.dc1 10.0.0.162
2015/04/14 10:43:04 [ERR] agent: failed to sync remote state: No cluster leader
2015/04/14 10:43:04 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
2015/04/14 10:43:04 [INFO] consul: adding server server1.dc1 (Addr: 10.0.0.162:9300) (DC: dc1)
2015/04/14 10:43:05 [WARN] raft: Heartbeat timeout reached, starting election
2015/04/14 10:43:05 [INFO] raft: Node at 10.0.0.162:9300 [Candidate] entering Candidate state
2015/04/14 10:43:05 [INFO] raft: Election won. Tally: 1
2015/04/14 10:43:05 [INFO] raft: Node at 10.0.0.162:9300 [Leader] entering Leader state
2015/04/14 10:43:05 [INFO] consul: cluster leadership acquired
2015/04/14 10:43:05 [INFO] consul: New leader elected: server1
2015/04/14 10:43:05 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2015/04/14 10:43:05 [INFO] consul: member 'server1' joined, marking health alive
2015/04/14 10:43:07 [INFO] agent: Synced service 'consul'
```
It is warning us that we should not start server 1 in bootstrap mode unless we know what we are doing. Since we are just learning consul, let's leave it as such.
Then it does a Dick Cheney: it says it could not find a leader, so it makes itself the leader.
Now start up server 2 in another terminal window. Back in the server 1 log, you should see:
```bash
2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
2015/04/14 10:45:56 [INFO] raft: Added peer 10.0.0.162:10300, starting replication
2015/04/14 10:45:56 [WARN] raft: AppendEntries to 10.0.0.162:10300 rejected, sending older logs (next: 1)
2015/04/14 10:45:56 [INFO] raft: pipelining replication to peer 10.0.0.162:10300
2015/04/14 10:45:56 [INFO] consul: member 'server2' joined, marking health alive
```
It sees server 2 and marks it as alive. Whoot!
Going back to server 2's output we get
```bash
$ ./server2.sh
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
Join completed. Synced with 1 initial agents
==> Consul agent running!
Node name: 'server2'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 10500, HTTPS: -1, DNS: -1, RPC: 10400)
Cluster Addr: 10.0.0.162 (LAN: 10301, WAN: 10302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2015/04/14 10:45:56 [INFO] raft: Node at 10.0.0.162:10300 [Follower] entering Follower state
2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2.dc1 10.0.0.162
2015/04/14 10:45:56 [INFO] agent: (LAN) joining: [127.0.0.1:9301 127.0.0.1:11301]
2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
2015/04/14 10:45:56 [INFO] consul: adding server server2.dc1 (Addr: 10.0.0.162:10300) (DC: dc1)
2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server1 10.0.0.162
2015/04/14 10:45:56 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
2015/04/14 10:45:56 [INFO] agent: (LAN) joined: 1 Err: <nil>
2015/04/14 10:45:56 [ERR] agent: failed to sync remote state: No cluster leader
2015/04/14 10:45:56 [WARN] raft: Failed to get previous log: 6 log not found (last: 0)
2015/04/14 10:46:20 [INFO] agent: Synced service 'consul'
```
It is a bit cryptic. We get some error messages about there being no leader. Then it says it synced. Whoot!
Server 3's startup is a little smoother because the other two servers are already alive!
```bash
$ ./server3.sh
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
Join completed. Synced with 2 initial agents
==> Consul agent running!
Node name: 'server3'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 11500, HTTPS: -1, DNS: -1, RPC: 11400)
Cluster Addr: 10.0.0.162 (LAN: 11301, WAN: 11302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3 10.0.0.162
2015/04/14 10:48:45 [INFO] raft: Node at 10.0.0.162:11300 [Follower] entering Follower state
2015/04/14 10:48:45 [INFO] consul: adding server server3 (Addr: 10.0.0.162:11300) (DC: dc1)
2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3.dc1 10.0.0.162
2015/04/14 10:48:45 [INFO] agent: (LAN) joining: [127.0.0.1:9301 127.0.0.1:10301]
2015/04/14 10:48:45 [INFO] consul: adding server server3.dc1 (Addr: 10.0.0.162:11300) (DC: dc1)
2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server1 10.0.0.162
2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server2 10.0.0.162
2015/04/14 10:48:45 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
2015/04/14 10:48:45 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
2015/04/14 10:48:45 [INFO] agent: (LAN) joined: 2 Err: <nil>
2015/04/14 10:48:45 [ERR] agent: failed to sync remote state: No cluster leader
2015/04/14 10:48:45 [WARN] raft: Failed to get previous log: 12 log not found (last: 0)
2015/04/14 10:49:07 [INFO] agent: Synced service 'consul'
```
Now all three servers are up and their state is synced.
Back in the server 1 log, we can see both of the other servers joining and being marked alive:
```bash
2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
2015/04/14 10:45:56 [INFO] raft: Added peer 10.0.0.162:10300, starting replication
2015/04/14 10:45:56 [WARN] raft: AppendEntries to 10.0.0.162:10300 rejected, sending older logs (next: 1)
2015/04/14 10:45:56 [INFO] raft: pipelining replication to peer 10.0.0.162:10300
2015/04/14 10:45:56 [INFO] consul: member 'server2' joined, marking health alive
2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3 10.0.0.162
2015/04/14 10:48:45 [INFO] consul: adding server server3 (Addr: 10.0.0.162:11300) (DC: dc1)
2015/04/14 10:48:45 [INFO] raft: Added peer 10.0.0.162:11300, starting replication
2015/04/14 10:48:45 [INFO] consul: member 'server3' joined, marking health alive
2015/04/14 10:48:45 [WARN] raft: AppendEntries to 10.0.0.162:11300 rejected, sending older logs (next: 1)
2015/04/14 10:48:45 [INFO] raft: pipelining replication to peer 10.0.0.162:11300
```
All is well. We have three healthy nodes.
If we go to the server 2 terminal and shut it down with Ctrl-C, it shuts down gracefully.
Now watch the server 1 output: it keeps trying to reconnect to server 2.
Now start server 2 up again. Notice how it reconnects, and then look at the logs for server 1 and server 3.
Now do the same with server 3: shut it down and watch the logs of server 1 and server 2.
#### Server 3 log on reconnect

```bash
2015/04/14 11:01:53 [WARN] raft: Failed to get previous log: 23 log not found (last: 21)
2015/04/14 11:01:53 [INFO] raft: Removed ourself, transitioning to follower
2015/04/14 11:02:09 [INFO] agent: Synced service 'consul'
```

When you see a line like `Failed to get previous log: 23 log not found (last: 21)`, it means the server could not find that log entry, so it has to catch up and replicate everything after index 21. Consul keeps a version number of its data called an index; as servers come back online, they look at their last index and ask for everything that happened after it so they can sync changes.
You can shut down any two servers and then bring them back up; leadership will move to the server that stayed up. Do not shut down all three, though, because then it gets a bit trickier to get the cluster bootstrapped again.
Each server maintains its own state. Look at the files in each server's data folder.
```bash
$ pwd
/opt/consul1
$ tree
.
├── checkpoint-signature
├── raft
│   ├── mdb
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── peers.json
│   └── snapshots
├── serf
│   ├── local.snapshot
│   └── remote.snapshot
└── tmp
    └── state506459124
        ├── data.mdb
        └── lock.mdb

6 directories, 8 files
```
Look at the snapshot files and at the contents of peers.json.
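In this version of Consul, peers.json should just be a JSON array of the Raft peer addresses, something like this (your IP will differ):

```bash
$ cat /opt/consul1/raft/peers.json
["10.0.0.162:9300","10.0.0.162:10300","10.0.0.162:11300"]
```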
Now shut down all three with Ctrl-C, then start them up again. They will not be able to elect a leader, because Ctrl-C triggers a graceful leave: each server removes itself from the peer set on the way out, so on restart nobody has the peers it needs to win an election. The fix is to delete the peers.json files (and, as we will see below, bootstrap one of the servers again).
I went ahead and changed the configuration files so that the servers retry joining each other when they are restarted.
```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul1",
    "log_level": "INFO",
    "node_name": "server1",
    "server": true,
    "ports" : {
        "dns" : -1,
        "http" : 9500,
        "rpc" : 9400,
        "serf_lan" : 9301,
        "serf_wan" : 9302,
        "server" : 9300
    },
    "retry_join" : [
        "127.0.0.1:9301",
        "127.0.0.1:10301",
        "127.0.0.1:11301"
    ]
}
```

```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul2",
    "log_level": "INFO",
    "node_name": "server2",
    "server": true,
    "ports" : {
        "dns" : -1,
        "http" : 10500,
        "rpc" : 10400,
        "serf_lan" : 10301,
        "serf_wan" : 10302,
        "server" : 10300
    },
    "retry_join" : [
        "127.0.0.1:9301",
        "127.0.0.1:10301",
        "127.0.0.1:11301"
    ]
}
```

```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul3",
    "log_level": "INFO",
    "node_name": "server3",
    "server": true,
    "ports" : {
        "dns" : -1,
        "http" : 11500,
        "rpc" : 11400,
        "serf_lan" : 11301,
        "serf_wan" : 11302,
        "server" : 11300
    },
    "retry_join" : [
        "127.0.0.1:9301",
        "127.0.0.1:10301",
        "127.0.0.1:11301"
    ]
}
```
In order to start up clean, you have to delete all of the peers.json files and start one of the servers in bootstrap mode, like we had server1.json set up before.
```bash
$ pwd
/opt
$ find . -name "peers.json"
./consul1/raft/peers.json
./consul2/raft/peers.json
./consul3/raft/peers.json
$ find . -name "peers.json" | xargs rm
```
Now restart them.
If none of the servers starts up in bootstrap mode, you will get this all day long:
```bash
2015/04/14 13:02:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
2015/04/14 13:02:32 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
```
I have another set of bootstrap JSON files, one per server, each with its own startup script. You need to delete the peers.json files and then start up one of the servers in bootstrap mode using these scripts. See the discussion at Consul issue 526, and then re-read the outage recovery guide.
```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul1",
    "log_level": "INFO",
    "node_name": "server1",
    "server": true,
    "bootstrap": true,
    "ports" : {
        "dns" : -1,
        "http" : 9500,
        "rpc" : 9400,
        "serf_lan" : 9301,
        "serf_wan" : 9302,
        "server" : 9300
    }
}
```

```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul2",
    "log_level": "INFO",
    "node_name": "server2",
    "server": true,
    "bootstrap": true,
    "ports" : {
        "dns" : -1,
        "http" : 10500,
        "rpc" : 10400,
        "serf_lan" : 10301,
        "serf_wan" : 10302,
        "server" : 10300
    }
}
```

```json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consul3",
    "log_level": "INFO",
    "node_name": "server3",
    "server": true,
    "bootstrap": true,
    "ports" : {
        "dns" : -1,
        "http" : 11500,
        "rpc" : 11400,
        "serf_lan" : 11301,
        "serf_wan" : 11302,
        "server" : 11300
    }
}
```
```bash
$ cat server1boot.sh
export GOMAXPROCS=10
consul agent -config-file=server1boot.json \
    -retry-interval=3s \
    -ui-dir=/opt/consul/web
```
There is a startup script per server.
Remember to delete the peers.json file.
```bash
$ pwd
/opt
$ find . -name "peers.json" | xargs rm
```
Now you are set. Pick any server, run it in bootstrap mode.
```bash
$ ./server3boot.sh
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'server3'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 127.0.0.1 (HTTP: 11500, HTTPS: -1, DNS: -1, RPC: 11400)
Cluster Addr: 10.0.0.162 (LAN: 11301, WAN: 11302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
....
```
```bash
$ ./server2.sh
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'server2'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 10500, HTTPS: -1, DNS: -1, RPC: 10400)
Cluster Addr: 10.0.0.162 (LAN: 10301, WAN: 10302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
...
```
```bash
$ ./server1.sh
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'server1'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 9500, HTTPS: -1, DNS: -1, RPC: 9400)
Cluster Addr: 10.0.0.162 (LAN: 9301, WAN: 9302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
```
To force a failure, pick a server process and kill -9 it.

If you kill all of the servers at once with kill -9, as in:

```bash
$ pkill -9 consul
```

then you do not have to bootstrap any of them, because a hard kill is not a graceful leave, so each server's peer set still lists the others. You can just run:

```bash
./server1.sh
./server2.sh
./server3.sh
```
## Setting up a client to test our local cluster
The agent is the center of the Consul world; one must run on every node that is part of a Consul cluster. There are two types of agents: clients and servers. The server agents are the information hub: they store the data for the cluster and replicate it to the other server nodes. The client agents are lightweight instances that sit on every box hosting services and rely on the server agents for most of their state.
To start up consul in client mode we will use the following config file.
```bash
$ cat client1.json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consulclient",
    "log_level": "INFO",
    "node_name": "client1",
    "server": false,
    "ports" : {
        "dns" : -1,
        "http" : 8500,
        "rpc" : 8400,
        "serf_lan" : 8301,
        "serf_wan" : 8302,
        "server" : 8300
    },
    "start_join" : [
        "127.0.0.1:9301",
        "127.0.0.1:10301",
        "127.0.0.1:11301"
    ]
}
```
You will notice that `"server"` is false, which puts this agent in client mode. The client is addressable on the 8xxx ports (Consul's defaults). We specified where to find the servers using the start_join key.
Now when we start up Consul, we just specify the client1.json file:

```bash
consul agent -config-file=client1.json
```
Now we can get info about our cluster.
```bash
$ consul info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 0c7ca91c
version = 0.5.0
consul:
known_servers = 3
server = false
runtime:
arch = amd64
cpu_count = 8
goroutines = 34
max_procs = 1
os = darwin
version = go1.4.2
serf_lan:
encrypted = false
event_queue = 0
event_time = 6
failed = 0
intent_queue = 0
left = 0
member_time = 42
members = 4
query_queue = 0
query_time = 1
```
We can list the members in this cluster:
```bash
$ consul members
Node     Address           Status  Type    Build  Protocol
client1  10.0.0.162:8301   alive   client  0.5.0  2
server2  10.0.0.162:10301  alive   server  0.5.0  2
server1  10.0.0.162:9301   alive   server  0.5.0  2
server3  10.0.0.162:11301  alive   server  0.5.0  2
```
You can also use the HTTP interface to see what members are in the cluster.
```bash
$ curl http://localhost:8500/v1/agent/members
[
{
"Name": "server2",
"Addr": "10.0.0.162",
"Port": 10301,
"Tags": {
"build": "0.5.0:0c7ca91c",
"dc": "dc1",
"port": "10300",
"role": "consul",
"vsn": "2",
"vsn_max": "2",
"vsn_min": "1"
},
"Status": 1,
"ProtocolMin": 1,
"ProtocolMax": 2,
"ProtocolCur": 2,
"DelegateMin": 2,
"DelegateMax": 4,
"DelegateCur": 4
},
{
"Name": "server1",
"Addr": "10.0.0.162",
"Port": 9301,
"Tags": {
"build": "0.5.0:0c7ca91c",
"dc": "dc1",
"port": "9300",
"role": "consul",
"vsn": "2",
"vsn_max": "2",
"vsn_min": "1"
},
"Status": 1,
"ProtocolMin": 1,
"ProtocolMax": 2,
"ProtocolCur": 2,
"DelegateMin": 2,
"DelegateMax": 4,
"DelegateCur": 4
},
{
"Name": "server3",
"Addr": "10.0.0.162",
"Port": 11301,
"Tags": {
"build": "0.5.0:0c7ca91c",
"dc": "dc1",
"port": "11300",
"role": "consul",
"vsn": "2",
"vsn_max": "2",
"vsn_min": "1"
},
"Status": 1,
"ProtocolMin": 1,
"ProtocolMax": 2,
"ProtocolCur": 2,
"DelegateMin": 2,
"DelegateMax": 4,
"DelegateCur": 4
},
{
"Name": "client1",
"Addr": "10.0.0.162",
"Port": 8301,
"Tags": {
"build": "0.5.0:0c7ca91c",
"dc": "dc1",
"role": "node",
"vsn": "2",
"vsn_max": "2",
"vsn_min": "1"
},
"Status": 1,
"ProtocolMin": 1,
"ProtocolMax": 2,
"ProtocolCur": 2,
"DelegateMin": 2,
"DelegateMax": 4,
"DelegateCur": 4
}
]
```
You can try out other agent HTTP calls by looking at the Agent HTTP API.
You can use the HTTP API from any client or server:
```bash
$ curl http://localhost:8500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:9500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:10500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:10500/v1/catalog/nodes
[{"Node":"client1","Address":"10.0.0.162"},
 {"Node":"server1","Address":"10.0.0.162"},
 {"Node":"server2","Address":"10.0.0.162"},
 {"Node":"server3","Address":"10.0.0.162"}]

$ curl http://localhost:8500/v1/catalog/services
{"consul":[]}
```
```bash
$ cat register_service.json
{
    "ID": "myservice1",
    "Name": "myservice",
    "Address": "127.0.0.1",
    "Port": 8080,
    "Check": {
        "Interval": "10s",
        "TTL": "15s"
    }
}

$ curl --upload-file register_service.json \
    http://localhost:8500/v1/agent/service/register
```
The above registers a new service called myservice. `Name` is the name of the service, while `ID` identifies a specific instance of that service. The check we installed expects the service to check in with the agent at least every 15 seconds (the TTL).
Once you register the service, then you can see it from the agent as follows:
```bash
$ curl http://localhost:8500/v1/agent/services
{
    "myservice1": {
        "ID": "myservice1",
        "Service": "myservice",
        "Tags": null,
        "Address": "127.0.0.1",
        "Port": 8080
    }
}
```
To check this service's health, we can use the health endpoint:
```bash
$ curl http://localhost:8500/v1/health/service/myservice
[
{
"Node": {
"Node": "client1",
"Address": "10.0.0.162"
},
"Service": {
"ID": "myservice1",
"Service": "myservice",
"Tags": null,
"Address": "127.0.0.1",
"Port": 8080
},
"Checks": [
{
"Node": "client1",
"CheckID": "service:myservice1",
"Name": "Service 'myservice' check",
"Status": "critical",
"Notes": "",
"Output": "TTL expired",
"ServiceID": "myservice1",
"ServiceName": "myservice"
},
{
"Node": "client1",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": ""
}
]
}
]
```
Here we can see that the health status is critical because the TTL expired.
To tell Consul that our fictional service is passing, we need to send this at least every 15 seconds:
```bash
$ curl http://localhost:8500/v1/agent/check/pass/service:myservice1
$ curl http://localhost:8500/v1/health/service/myservice
[
{
"Node": {
"Node": "client1",
"Address": "10.0.0.162"
},
"Service": {
"ID": "myservice1",
"Service": "myservice",
"Tags": null,
"Address": "127.0.0.1",
"Port": 8080
},
"Checks": [
{
"Node": "client1",
"CheckID": "service:myservice1",
"Name": "Service 'myservice' check",
"Status": "passing",
"Notes": "",
"Output": "",
"ServiceID": "myservice1",
"ServiceName": "myservice"
},
{
"Node": "client1",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": ""
}
]
}
]
```
Notice that our status went from `"Status": "critical"` to `"Status": "passing"`.
You can also mark a service as warn or critical using an HTTP call.
```bash
$ curl http://localhost:8500/v1/agent/check/warn/service:myservice1
$ curl http://localhost:8500/v1/agent/check/fail/service:myservice1
```
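During testing you can keep the check green with a trivial loop (just a sketch; a real service would make this call from its own heartbeat code, as we do from Java below):

```bash
# Hit the pass endpoint every 10 seconds so the 15s TTL never expires
while true; do
    curl -s http://localhost:8500/v1/agent/check/pass/service:myservice1 > /dev/null
    sleep 10
done
```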
Note that you can also query the health status from any node; all nodes in the cluster, servers and clients alike, know where the myservice instances are running.
```bash
$ curl http://localhost:9500/v1/health/service/myservice
```
Let's start up another client and install a service in it. We created another client startup script.
```bash
$ cat client2.json
{
    "datacenter": "dc1",
    "data_dir": "/opt/consulclient2",
    "log_level": "INFO",
    "node_name": "client2",
    "server": false,
    "ports" : {
        "dns" : -1,
        "http" : 7500,
        "rpc" : 7400,
        "serf_lan" : 7301,
        "serf_wan" : 7302,
        "server" : 7300
    },
    "start_join" : [
        "127.0.0.1:9301",
        "127.0.0.1:10301",
        "127.0.0.1:11301"
    ]
}
```

```bash
$ cat client2.sh
consul agent -config-file=client2.json
```
Then we will create another register service json file.
```bash
$ cat register_service2.json
{
    "ID": "myservice2",
    "Name": "myservice",
    "Address": "127.0.0.1",
    "Port": 9090,
    "Check": {
        "Interval": "10s",
        "TTL": "15s"
    }
}
```
#### Running the service registration against the client2 agent

```bash
curl --upload-file register_service2.json \
    http://localhost:7500/v1/agent/service/register
```

#### Make both services healthy

```bash
curl http://localhost:8500/v1/agent/check/pass/service:myservice1
curl http://localhost:7500/v1/agent/check/pass/service:myservice2
```

#### Query them

```bash
curl http://localhost:9500/v1/health/service/myservice
[
{
"Node": {
"Node": "client2",
"Address": "10.0.0.162"
},
"Service": {
"ID": "myservice2",
"Service": "myservice",
"Tags": null,
"Address": "127.0.0.1",
"Port": 9090
},
"Checks": [
{
"Node": "client2",
"CheckID": "service:myservice2",
"Name": "Service 'myservice' check",
"Status": "passing",
"Notes": "",
"Output": "",
"ServiceID": "myservice2",
"ServiceName": "myservice"
},
{
"Node": "client2",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": ""
}
]
},
{
"Node": {
"Node": "client1",
"Address": "10.0.0.162"
},
"Service": {
"ID": "myservice1",
"Service": "myservice",
"Tags": null,
"Address": "127.0.0.1",
"Port": 8080
},
"Checks": [
{
"Node": "client1",
"CheckID": "service:myservice1",
"Name": "Service 'myservice' check",
"Status": "passing",
"Notes": "",
"Output": "",
"ServiceID": "myservice1",
"ServiceName": "myservice"
},
{
"Node": "client1",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": ""
}
]
}
]
```
## Accessing Consul from Java
Consul has language bindings for Python, Ruby, Go, and more; see the Consul downloads page for a complete list. There are two for Java: Consul Client and Consul API. QBit, the microservice lib for Java, has one as well, which is a fork of Consul Client.
QBit adds a simplified interface to Consul demonstrated as follows:
```java
package io.advantageous.examples;

import io.advantageous.boon.core.Sys;
import io.advantageous.consul.discovery.ConsulServiceDiscoveryBuilder;
import io.advantageous.qbit.service.discovery.ServiceDefinition;
import io.advantageous.qbit.service.discovery.ServiceDiscovery;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ServiceDiscoveryMain {

    public static void main(String... args) throws Exception {

        final ConsulServiceDiscoveryBuilder consulServiceDiscoveryBuilder =
                ConsulServiceDiscoveryBuilder.consulServiceDiscoveryBuilder();

        /* Attach to agent 2. */
        final ServiceDiscovery clientAgent2 = consulServiceDiscoveryBuilder.setConsulPort(7500).build();

        /* Attach to agent 1. */
        final ServiceDiscovery clientAgent1 = consulServiceDiscoveryBuilder.setConsulPort(8500).build();

        /* Start up service discovery. */
        clientAgent2.start();
        clientAgent1.start();

        /* Check in the services using their ids. */
        clientAgent1.checkInOk("myservice1");
        clientAgent2.checkInOk("myservice2");

        final ExecutorService executorService = Executors.newSingleThreadExecutor();

        executorService.submit(() -> {
            /* Check the services in occasionally, and get a list of healthy nodes. */
            for (int index = 0; index < 100; index++) {
                System.out.println("Checking in....");

                /* Check in myservice1. */
                clientAgent1.checkInOk("myservice1");

                /* Check in myservice2. */
                clientAgent2.checkInOk("myservice2");
                Sys.sleep(100);

                /* Get a list of only the healthy nodes. */
                List<ServiceDefinition> serviceDefinitions = clientAgent1.loadServices("myservice");
                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 1: " + serviceDefinition));

                serviceDefinitions = clientAgent2.loadServices("myservice");
                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 2: " + serviceDefinition));

                Sys.sleep(10_000);
            }

            /* After 100 * 10 seconds stop. */
            clientAgent2.stop();
            clientAgent1.stop();
        });
    }
}
```
The above connects to client agent 2 (`clientAgent2`) and client agent 1 (`clientAgent1`). We then use the service discovery objects to check the services in using their ids (`clientAgent1.checkInOk("myservice1")`). Every ten seconds we check the services in again (we have 15 seconds before the TTL expires), and for good measure we load the available services (`serviceDefinitions = clientAgent1.loadServices("myservice")`) and print them to the console. This simplified interface makes it easy to register services with Consul and do periodic check-ins.
Here is the Gradle build script for the above:

```groovy
apply plugin: 'java'
apply plugin: 'application'

sourceCompatibility = 1.8
version = '1.0'
mainClassName = "io.advantageous.examples.Main"

repositories {
    mavenLocal()
    mavenCentral()
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.11'
    compile group: 'io.advantageous.qbit', name: 'qbit-vertx', version: '0.7.3-SNAPSHOT'
    compile group: 'io.advantageous.qbit', name: 'qbit-consul-client', version: '0.7.3-SNAPSHOT'
}
```
To learn more about the build script and QBit see QBit Restful Microservices.
Let's unregister those services with curl and then register them with our Java service discovery API.
```bash
$ curl http://localhost:7500/v1/agent/service/deregister/myservice2
$ curl http://localhost:8500/v1/agent/service/deregister/myservice1
```
When you unregister a service, you also unregister its check.
```bash
2015/04/14 17:09:05 [INFO] agent: Deregistered service 'myservice2'
2015/04/14 17:09:05 [INFO] agent: Deregistered check 'service:myservice2'
```
Now we will register our services with the ServiceDiscovery API.
```java
/* Attach to agent 2. */
final ServiceDiscovery clientAgent2 = consulServiceDiscoveryBuilder.setConsulPort(7500).build();

/* Attach to agent 1. */
final ServiceDiscovery clientAgent1 = consulServiceDiscoveryBuilder.setConsulPort(8500).build();

/* Start up service discovery. */
clientAgent2.start();
clientAgent1.start();

/* Register the services. */
clientAgent1.registerWithIdAndTimeToLive(
        "myservice", "myservice1", 8080, 10);

clientAgent2.registerWithIdAndTimeToLive(
        "myservice", "myservice2", 9090, 10);

....

// The rest of the code as before.
```
The method `registerWithIdAndTimeToLive(..)` lets us register the service and set its TTL in one call. You can see that the services are in fact registered and deemed healthy by using the Consul web UI: go to http://localhost:9500/ui/#/dc1/services/myservice.
QBit does a long poll against Consul to get the service info and then updates its view of the services when they change. This way one can tolerate the failure of a microservice: a call could fail if a supplier is not available, and the client, which could itself be another upstream service, has to respond to this as gracefully as possible. You could implement a circuit breaker, but you still want to know when a downstream service, or a service in a server pool, goes down. This is where Consul and its health monitoring come into play.
Since microservices can fail at any time without notice, it's important to quickly detect failures and either restore the service, talk to another service in the same peer group, or provide some degraded response while your automation kicks in to recover the failed downstream microservices. Consul helps by providing near-real-time health monitoring of microservices; the TTL check is just one of the ways.
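Under the covers, that long poll is a standard Consul blocking query: every response carries an X-Consul-Index header, and passing that index back along with a wait parameter makes the call block until something changes (or the wait times out). Roughly:

```bash
# First call returns immediately; note the X-Consul-Index response header
curl -i http://localhost:8500/v1/health/service/myservice

# Blocking query: hangs for up to 30s, returning early only if the service's health info changes.
# Replace 42 with the X-Consul-Index value from the previous response.
curl "http://localhost:8500/v1/health/service/myservice?index=42&wait=30s"
```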
Let's demonstrate this by splitting our Java example out into two main methods, one for each microservice.
```java
public class MyService1Main {

    public static void main(String... args) throws Exception {

        final ConsulServiceDiscoveryBuilder consulServiceDiscoveryBuilder =
                ConsulServiceDiscoveryBuilder.consulServiceDiscoveryBuilder();

        /* Attach to agent 1. */
        final ServiceDiscovery clientAgent1 = consulServiceDiscoveryBuilder.setConsulPort(8500).build();

        /* Start up service discovery. */
        clientAgent1.start();

        /* Register the service. */
        clientAgent1.registerWithIdAndTimeToLive("myservice", "myservice1", 8080, 10);

        /* Check in the service using its id. */
        clientAgent1.checkInOk("myservice1");

        final ExecutorService executorService = Executors.newSingleThreadExecutor();

        executorService.submit(() -> {
            /* Check the service in occasionally, and get a list of healthy nodes. */
            for (int index = 0; index < 100; index++) {
                System.out.println("Checking in....");

                /* Check in myservice1. */
                clientAgent1.checkInOk("myservice1");
                Sys.sleep(100);

                /* Get a list of only the healthy nodes. */
                List<ServiceDefinition> serviceDefinitions = clientAgent1.loadServices("myservice");
                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 1: " + serviceDefinition));

                Sys.sleep(10_000);
            }

            /* After 100 * 10 seconds stop. */
            clientAgent1.stop();
        });
    }
}
```
```java
public class MyService2Main {

    public static void main(String... args) throws Exception {

        final ConsulServiceDiscoveryBuilder consulServiceDiscoveryBuilder =
                ConsulServiceDiscoveryBuilder.consulServiceDiscoveryBuilder();

        /* Attach to agent 2. */
        final ServiceDiscovery clientAgent2 = consulServiceDiscoveryBuilder.setConsulPort(7500).build();

        /* Start up service discovery. */
        clientAgent2.start();

        /* Register the service. */
        clientAgent2.registerWithIdAndTimeToLive("myservice", "myservice2", 9090, 10);

        /* Check in the service using its id. */
        clientAgent2.checkInOk("myservice2");

        final ExecutorService executorService = Executors.newSingleThreadExecutor();

        executorService.submit(() -> {
            /* Check the service in occasionally, and get a list of healthy nodes. */
            for (int index = 0; index < 100; index++) {
                System.out.println("Checking in....");

                /* Check in myservice2. */
                clientAgent2.checkInOk("myservice2");
                Sys.sleep(100);

                /* Get a list of only the healthy nodes. */
                List<ServiceDefinition> serviceDefinitions = clientAgent2.loadServices("myservice");
                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 2: " + serviceDefinition));

                Sys.sleep(10_000);
            }

            /* After 100 * 10 seconds stop. */
            clientAgent2.stop();
        });
    }
}
```
Now run both; you should see this in the output:
```bash
Checking in....
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
Checking in....
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
Checking in....
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
...
Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
```
Now kill the first process. You should see its service disappear from the list, because it can no longer check in with Consul before its TTL expires.
```bash
Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
```
Let's leave off there for now. Next time we will talk about `ServicePool` and `ServicePoolListener`, which let you listen for changes to the services in a particular service pool.
# Conclusion

We covered how to set up a Consul cluster, how to recover it if it gets into a situation where Raft cannot elect a new leader, and how cluster nodes recover on their own. We covered the theory behind how Consul works, then worked with the HTTP interface, and finally used a Java API to register services and perform health check-ins.
Next time we will cover `ServicePool`s and `ServicePoolListener`s, so we can easily be notified when a service is added or removed. We will also implement an HTTP check and a script check, and show how to incorporate a `ServiceDiscovery` and a `ServicePool` into a QBit (the Java microservice lib for Microservice Architecture) service to detect when a service goes down.
Reactive Programming, Java Microservices, Rick Hightower
Java Microservices Architecture
- [Microservice Service Discovery with Consul](http://www.mammatustech.com/Microservice-Service-Discovery-with-Consul)
- [Reactive Microservices](http://www.mammatustech.com/reactive-microservices)
- [High Speed Microservices](http://www.mammatustech.com/high-speed-microservices)