# Consul Tutorial

Consul is a great fit for service discovery, which is a requirement for elastic cloud services and an essential ingredient of microservices.

We will set up multiple server agents and client agents and walk you through some basics of Consul. We will show you how to use the HTTP API and the Java API.

Before we get started, let's cover some Consul basics. If you already know (or sort of know) what Consul is, or if you have read the slide deck on Consul or the other article we wrote about it (Microservice Service Discovery with Consul), you can skip the next section.

##What is Consul?

Consul provides service discovery, health monitoring, and configuration services for microservice architectures.

With service discovery you can look up services, which are organized according to the topology of your datacenters. Consul uses client agents and Raft to provide a consistent view of services, and it uses the same mechanism to provide a consistent view of configuration. In effect, Consul gives your microservices a replicated view of the service topology and its configuration, and it can monitor and change that topology based on the health of individual nodes.

Consul provides scalable, distributed health checks. Consul keeps datacenter-to-datacenter communication to a minimum, so each datacenter runs its own Consul cluster. Consul provides a domain model for managing the topology of datacenters, server nodes, and the services running on those nodes, along with their configuration and current health status.

Consul is like a DNS server, plus a consistent key/value store like etcd, plus the service discovery features of ZooKeeper, plus health monitoring like Nagios, all rolled into one consistent system. Essentially, Consul is all the bits you need for a coherent domain service model: service discovery, health monitoring, replicated configuration, and service topology with health status. Consul also provides a nice REST interface and a web UI to see your service topology and distributed service config. Consul organizes your services in a Service Catalog and then provides DNS and REST/HTTP/JSON interfaces to it.

To use Consul you start an agent process. The Consul agent is a long-running daemon that runs on every member of a Consul cluster. The agent can run in server mode or client mode. A client agent runs on every physical server or OS virtual machine (if that makes more sense) that hosts services. Clients use gossip and RPC calls to stay in sync with Consul.

A client (a consul agent running in client mode) forwards requests to a server (a consul agent running in server mode). Clients are mostly stateless; they use LAN gossip to communicate changes to the server nodes. A server (a consul agent running in server mode) is like a client agent but with more responsibilities. The consul servers use the Raft quorum mechanism to determine who is the leader, and they maintain cluster state such as the Service Catalog. The leader manages a consistent view of config key/value pairs, service health, and topology. Consul servers also handle WAN gossip to other datacenters, forward queries to the leader, and forward queries to other datacenters.

A Datacenter is fairly obvious. It is anything that allows for fast communication between nodes, with as few or no hops, little or no routing, and in short: high speed communication. This could be an Amazon EC2 availability zone, a networking environment like a subnet, or any private, low latency, high bandwidth network. You can decide exactly what datacenter means for your organization.

Consul server nodes use consensus to agree on who the leader is. They use a replicated log and a transactional finite state machine (FSM) to make sure all server nodes stay in lock step on critical tasks. The moving parts are a replicated log, an FSM, a peer set (who receives the replicated log), a quorum (a majority of peers must agree), committed entries, and a leader. The end result is a consistent view of configuration, of live services, and of their topology.

Consul is built on top of Serf. Serf implements a gossip protocol that provides membership, failure detection, and event broadcast mechanisms, and it provides the clustering for the server nodes. Consul uses LAN gossip between clients and servers in the same local network or datacenter, and WAN gossip between servers only.

With Consul you run three to five servers per datacenter and use WAN gossip to communicate over the Internet or other wide area networks. Consul also provides an RPC (remote procedure call) mechanism, a request/response channel that lets a client ask a server to, for example, join or leave a cluster.

Consul client agents maintain their own set of service and check registrations as well as health information. Clients also update health checks and local state. Local agent state (on the client node) is different from catalog state (on the server nodes): the local agent notifies the Catalog of changes, and updates are synced right away from the client agent to the services Catalog, which is stored in the triad of servers.

The local agents (agents running in client mode) check that their view of the world matches the catalog, and if not, the catalog is updated. For example, when a client registers a new service check with a client agent, the client agent notifies the consul servers so the catalog records that the check exists. When a check is removed from the local client agent, it is removed from the catalog held by the cluster of consul servers. If the agent runs health checks and a status changes, the update is sent to the catalog. The agent is the authority for its node (client or server) and for the services that exist on that node; the agent's scope is the server or virtual machine running it. The catalog's scope is a single datacenter, local area network, or EC2 availability zone. Changes are synchronized as they happen and also periodically, every one to ten minutes depending on the size of the cluster.

Consul provides the following REST endpoints for interacting with Consul:

* kv - key/value store
* agent - API for working with the local agent
* catalog - dealing with the datacenter catalog
* health - show health checks
* sessions - group operations and manage a consistent view
* events - fire fast gossip-based events
* acl - set up access control lists and security for Consul
* status - check cluster status

Each endpoint provides operations that take JSON bodies and/or request parameters and return JSON responses. In addition to configuring Consul programmatically via REST, you can use local JSON config files to bootstrap configuration and service topology.
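For example, here is a quick, minimal pass at the kv endpoint against an agent listening on the default HTTP port 8500 (the key name config/app/greeting is just an example):

```bash
# Store a value under a key (returns true on success).
curl -X PUT --data 'Hello Consul' \
  http://localhost:8500/v1/kv/config/app/greeting

# Read it back; the Value field in the JSON response is base64 encoded.
curl http://localhost:8500/v1/kv/config/app/greeting

# Delete the key.
curl -X DELETE http://localhost:8500/v1/kv/config/app/greeting
```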

Consul supports several kinds of health checks. The most common is the dead man's switch, or time-to-live (TTL) check: the service has to dial in with a status update every X time period or the check fails. Another is the HTTP check, which hits a URL every N seconds; an HTTP status code in the 200-299 range means PASS, 429 means WARN, and any other status code is FAIL. Lastly, you can have the client agent (or server agent) run a script on an interval: if the process exits with 0 the check passes, an exit code of 1 is a warning, and any other exit code is a FAIL.
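As a rough sketch of what these checks look like when registered through the agent HTTP API (the check IDs, script path, and health URL below are made up for illustration; the HTTP check type requires Consul 0.5 or later):

```bash
# TTL ("dead man switch") check: something must hit the pass endpoint within 15s.
curl -X PUT --data '{"ID": "web-ttl", "Name": "Web TTL check", "TTL": "15s"}' \
  http://localhost:8500/v1/agent/check/register

# HTTP check: Consul hits the URL every 10s; 2xx = PASS, 429 = WARN, anything else = FAIL.
curl -X PUT --data '{"ID": "web-http", "Name": "Web HTTP check", "HTTP": "http://localhost:8080/health", "Interval": "10s"}' \
  http://localhost:8500/v1/agent/check/register

# Script check: the agent runs the script every 30s; exit 0 = PASS, 1 = WARN, anything else = FAIL.
curl -X PUT --data '{"ID": "disk", "Name": "Disk space", "Script": "/usr/local/bin/check_disk.sh", "Interval": "30s"}' \
  http://localhost:8500/v1/agent/check/register
```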

##Setting up a local consul cluster for testing.

You can test Consul with a single node, but to exercise Consul's Raft consensus we will set up four agents: three servers and one client.

A consul agent runs in server mode or client mode. Server agents maintain the state of the cluster. You want three to five server agents per datacenter, and it should be an odd number to facilitate leader election. Client agents run on the machines hosting the services you want to monitor and report back to the consul servers.

The goal is to make sure you have a complete handle on how to recover, on the difference between ctrl-C (graceful shutdown) and kill -9 ("I have fallen and can't get up"), and on when you do and do not need to bootstrap.

To reduce the amount of command line arguments, I will use a config file to start up the servers.

Here is the server1 config.

####Server 1 configuration server1.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul1",
  "log_level": "INFO",
  "node_name": "server1",
  "server": true,
  "bootstrap" : true,
  "ports" : {

    "dns" : -1,
    "http" : 9500,
    "rpc" : 9400,
    "serf_lan" : 9301,
    "serf_wan" : 9302,
    "server" : 9300
  }
}

When you are starting up a new server cluster, you typically put one of the servers in bootstrap mode. This tells consul that this server is allowed to elect itself as leader. It is like asking Dick Cheney to be on the VP selection committee and he nominates himself. He was in bootstrap mode.

Consul servers are boring if they have no one to talk to so let's add two more server config files to the mix.

Server 2 config server2.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul2",
  "log_level": "INFO",
  "node_name": "server2",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 10500,
    "rpc" : 10400,
    "serf_lan" : 10301,
    "serf_wan" : 10302,
    "server" : 10300
  },
  "start_join" : ["127.0.0.1:9301", "127.0.0.1:11301"]
}

Notice that server 2 points to servers 1 and 3 in the start_join config option.

Server 3 config server3.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul3",
  "log_level": "INFO",
  "node_name": "server3",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 11500,
    "rpc" : 11400,
    "serf_lan" : 11301,
    "serf_wan" : 11302,
    "server" : 11300
  },
  "start_join" : ["127.0.0.1:9301", "127.0.0.1:10301"]
}

And server 3 points to servers 1 and 2. We are assigning different ports so we can run all three on one box; you would not need to do that if they ran on separate machines. In production you would never run the servers on the same box, since that defeats the purpose of replicating for reliability.

To startup the three servers use the following command lines.

Server 1 start script server1.sh

consul agent -config-file=server1.json  -ui-dir=/opt/consul/web

We installed the web consul files for the UI in /opt/consul/web. You can download the UI from Consul UI.

Server 2 start script server2.sh

consul agent -config-file=server2.json

Server 3 start script server3.sh

consul agent -config-file=server3.json

Go ahead and start up the servers.

You should have the following files.

$ tree
.
├── server1.json
├── server1.sh
├── server2.json
├── server2.sh
├── server3.json
└── server3.sh

Run chmod +x on the .sh files so you can run them. Then run them.
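Something like this, starting each server in its own terminal window:

```bash
chmod +x server1.sh server2.sh server3.sh

# Run each of these in a separate terminal.
./server1.sh
./server2.sh
./server3.sh
```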

The log for server 1 should look like this:

Server 1 startup

$ ./server1.sh 
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server1'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 127.0.0.1 (HTTP: 9500, HTTPS: -1, DNS: -1, RPC: 9400)
      Cluster Addr: 10.0.0.162 (LAN: 9301, WAN: 9302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/04/14 10:43:04 [INFO] raft: Node at 10.0.0.162:9300 [Follower] entering Follower state
    2015/04/14 10:43:04 [INFO] serf: EventMemberJoin: server1 10.0.0.162
    2015/04/14 10:43:04 [INFO] serf: EventMemberJoin: server1.dc1 10.0.0.162
    2015/04/14 10:43:04 [ERR] agent: failed to sync remote state: No cluster leader
    2015/04/14 10:43:04 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:43:04 [INFO] consul: adding server server1.dc1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:43:05 [WARN] raft: Heartbeat timeout reached, starting election
    2015/04/14 10:43:05 [INFO] raft: Node at 10.0.0.162:9300 [Candidate] entering Candidate state
    2015/04/14 10:43:05 [INFO] raft: Election won. Tally: 1
    2015/04/14 10:43:05 [INFO] raft: Node at 10.0.0.162:9300 [Leader] entering Leader state
    2015/04/14 10:43:05 [INFO] consul: cluster leadership acquired
    2015/04/14 10:43:05 [INFO] consul: New leader elected: server1
    2015/04/14 10:43:05 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2015/04/14 10:43:05 [INFO] consul: member 'server1' joined, marking health alive
    2015/04/14 10:43:07 [INFO] agent: Synced service 'consul'

It warns us that we should not start server 1 in bootstrap mode unless we know what we are doing. Since we are just learning consul, let's leave it as is.

Then it does a Dick Cheney: it says it could not find a leader, so it elects itself leader.

Now we start up server 2 in another terminal window.

Server 1 terminal output after starting up server 2

 2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] raft: Added peer 10.0.0.162:10300, starting replication
    2015/04/14 10:45:56 [WARN] raft: AppendEntries to 10.0.0.162:10300 rejected, sending older logs (next: 1)
    2015/04/14 10:45:56 [INFO] raft: pipelining replication to peer 10.0.0.162:10300
    2015/04/14 10:45:56 [INFO] consul: member 'server2' joined, marking health alive

It sees server 2 and marks it as alive. Whoot!

Going back to server 2's output we get

Server 2 output on startup

$ ./server2.sh 
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
    Join completed. Synced with 1 initial agents
==> Consul agent running!
         Node name: 'server2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 10500, HTTPS: -1, DNS: -1, RPC: 10400)
      Cluster Addr: 10.0.0.162 (LAN: 10301, WAN: 10302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/04/14 10:45:56 [INFO] raft: Node at 10.0.0.162:10300 [Follower] entering Follower state
    2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2.dc1 10.0.0.162
    2015/04/14 10:45:56 [INFO] agent: (LAN) joining: [127.0.0.1:9301 127.0.0.1:11301]
    2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] consul: adding server server2.dc1 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server1 10.0.0.162
    2015/04/14 10:45:56 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2015/04/14 10:45:56 [ERR] agent: failed to sync remote state: No cluster leader
    2015/04/14 10:45:56 [WARN] raft: Failed to get previous log: 6 log not found (last: 0)
    2015/04/14 10:46:20 [INFO] agent: Synced service 'consul'

It is a bit cryptic. We get some error messages about there being no leader. Then it says it synced. Whoot!

Server 3 startup is a little smoother because the other two servers are already alive!

$ ./server3.sh 
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
    Join completed. Synced with 2 initial agents
==> Consul agent running!
         Node name: 'server3'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 11500, HTTPS: -1, DNS: -1, RPC: 11400)
      Cluster Addr: 10.0.0.162 (LAN: 11301, WAN: 11302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3 10.0.0.162
    2015/04/14 10:48:45 [INFO] raft: Node at 10.0.0.162:11300 [Follower] entering Follower state
    2015/04/14 10:48:45 [INFO] consul: adding server server3 (Addr: 10.0.0.162:11300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3.dc1 10.0.0.162
    2015/04/14 10:48:45 [INFO] agent: (LAN) joining: [127.0.0.1:9301 127.0.0.1:10301]
    2015/04/14 10:48:45 [INFO] consul: adding server server3.dc1 (Addr: 10.0.0.162:11300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server1 10.0.0.162
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:48:45 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] agent: (LAN) joined: 2 Err: <nil>
    2015/04/14 10:48:45 [ERR] agent: failed to sync remote state: No cluster leader
    2015/04/14 10:48:45 [WARN] raft: Failed to get previous log: 12 log not found (last: 0)
     2015/04/14 10:49:07 [INFO] agent: Synced service 'consul'
Now all three servers are up and their state is synced.

Now in the server 1 log we have messages showing that both servers joined and were marked alive:

```bash
    2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] raft: Added peer 10.0.0.162:10300, starting replication
    2015/04/14 10:45:56 [WARN] raft: AppendEntries to 10.0.0.162:10300 rejected, sending older logs (next: 1)
    2015/04/14 10:45:56 [INFO] raft: pipelining replication to peer 10.0.0.162:10300
    2015/04/14 10:45:56 [INFO] consul: member 'server2' joined, marking health alive
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3 10.0.0.162
    2015/04/14 10:48:45 [INFO] consul: adding server server3 (Addr: 10.0.0.162:11300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] raft: Added peer 10.0.0.162:11300, starting replication
    2015/04/14 10:48:45 [INFO] consul: member 'server3' joined, marking health alive
    2015/04/14 10:48:45 [WARN] raft: AppendEntries to 10.0.0.162:11300 rejected, sending older logs (next: 1)
    2015/04/14 10:48:45 [INFO] raft: pipelining replication to peer 10.0.0.162:11300
```

All is well. We have three healthy nodes.

If we go to server2 terminal and shut it down with control-C, it shuts down gracefully.

Now go to server 1 output and watch. It keeps trying to reconnect to server 2.

Now start up server 2 again. Notice how it reconnects, and then look at the logs for server1 and server3.

Now do the same with server 3: shut it down, watch the logs of server 1 and server 2, and then start it back up.

When you see a log line like this on reconnect:

#### Server 3 log on reconnect
```bash
2015/04/14 11:01:53 [WARN] raft: Failed to get previous log: 23 log not found (last: 21)
```

it means the server could not find that log entry locally, so it has to catch up and replicate the data after index 21. Consul keeps a version number of the data called an index, and as servers come back online they look at their own index and ask for everything that happened after it so they can sync changes.

    2015/04/14 11:01:53 [INFO] raft: Removed ourself, transitioning to follower
    2015/04/14 11:02:09 [INFO] agent: Synced service 'consul'

You can shut down any two servers and then bring them back up; leadership will move to the server that stayed up. Do not shut down all three, though. Getting them bootstrapped again is a bit trickier, as we will see.
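While you do this, you can check which server currently holds leadership through the status endpoints of any server agent that is still up (here using server 1's HTTP port from our configs):

```bash
# Returns the Raft address of the current leader, e.g. "10.0.0.162:9300".
curl http://localhost:9500/v1/status/leader

# Returns the full set of Raft peers.
curl http://localhost:9500/v1/status/peers
```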

Each server maintains its own state. Look at the files in each server's data folder.

$ pwd
/opt/consul1

$ tree
.
├── checkpoint-signature
├── raft
│   ├── mdb
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── peers.json
│   └── snapshots
├── serf
│   ├── local.snapshot
│   └── remote.snapshot
└── tmp
    └── state506459124
        ├── data.mdb
        └── lock.mdb

6 directories, 8 files

Look at the snapshot files and the contents of peers.json.
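peers.json should simply be a JSON array of the Raft (server port) addresses of the peers; something along these lines, with your own addresses:

```bash
$ cat /opt/consul1/raft/peers.json
["10.0.0.162:9300","10.0.0.162:10300","10.0.0.162:11300"]
```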

Now shut down all three with ctrl-C, then start them up again. They will not be able to elect a leader. The reason is that ctrl-C triggers a graceful leave, which removes each server from the peer set, so when all three come back there is no quorum left to elect a leader. The fix (covered below) is to delete the peers.json files and bootstrap one of the servers so the cluster can form again.

I went ahead and changed the configuration files so the servers can be restarted.

server1.json with no bootstrap and a retry server list

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul1",
  "log_level": "INFO",
  "node_name": "server1",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 9500,
    "rpc" : 9400,
    "serf_lan" : 9301,
    "serf_wan" : 9302,
    "server" : 9300
  },

  "retry_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"]

}

server2.json with retry server list

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul2",
  "log_level": "INFO",
  "node_name": "server2",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 10500,
    "rpc" : 10400,
    "serf_lan" : 10301,
    "serf_wan" : 10302,
    "server" : 10300
  },


  "retry_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
  ]
}

server3.json with retry server list

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul3",
  "log_level": "INFO",
  "node_name": "server3",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 11500,
    "rpc" : 11400,
    "serf_lan" : 11301,
    "serf_wan" : 11302,
    "server" : 11300
  },
  "retry_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
  ]

}

In order to start up clean, you have to delete all of the peers.json files and start one of the servers in bootstrap mode, like we had server1.json set up before.

Starting up all three servers after a graceful shutdown by deleting peer files

$ pwd
/opt

$ find . -name "peers.json" 
./consul1/raft/peers.json
./consul2/raft/peers.json
./consul3/raft/peers.json

$ find . -name "peers.json" | xargs rm

Now restart them.

Unless one of the servers starts up in bootstrap mode, you will get this all day long.

Unable to elect a leader

    2015/04/14 13:02:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
    2015/04/14 13:02:32 [ERR] agent: failed to sync remote state: rpc error: No cluster leader

I have another set of bootstrap JSON files, one per server, each with a startup script. You need to delete the peers.json files and then start one of the servers in bootstrap mode using these scripts. See the discussion at issue 526, and then re-read outage recovery.

bootstrap json file server1boot.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul1",
  "log_level": "INFO",
  "node_name": "server1",
  "server": true,
  "bootstrap": true,
  "ports" : {

    "dns" : -1,
    "http" : 9500,
    "rpc" : 9400,
    "serf_lan" : 9301,
    "serf_wan" : 9302,
    "server" : 9300
  }
}

bootstrap json file server2boot.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul2",
  "log_level": "INFO",
  "node_name": "server2",
  "server": true,
  "bootstrap": true,
  "ports" : {

    "dns" : -1,
    "http" : 10500,
    "rpc" : 10400,
    "serf_lan" : 10301,
    "serf_wan" : 10302,
    "server" : 10300
  }
}

bootstrap json file server3boot.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul3",
  "log_level": "INFO",
  "node_name": "server3",
  "server": true,
  "bootstrap": true,
  "ports" : {

    "dns" : -1,
    "http" : 11500,
    "rpc" : 11400,
    "serf_lan" : 11301,
    "serf_wan" : 11302,
    "server" : 11300
  }
}

Server 1 bootstrap starter

$ cat server1boot.sh
export GOMAXPROCS=10
consul agent -config-file=server1boot.json \
  -retry-interval=3s \
  -ui-dir=/opt/consul/web

There is a startup script per server.

Remember to delete the peers.json file.

Deleting peers

$ pwd
/opt

$ find . -name "peers.json" | xargs rm

Now you are set. Pick any server, run it in bootstrap mode.

Running server 3 in bootstrap mode

$ ./server3boot.sh 
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server3'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 127.0.0.1 (HTTP: 11500, HTTPS: -1, DNS: -1, RPC: 11400)
      Cluster Addr: 10.0.0.162 (LAN: 11301, WAN: 11302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

....
$ ./server2.sh 
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 10500, HTTPS: -1, DNS: -1, RPC: 10400)
      Cluster Addr: 10.0.0.162 (LAN: 10301, WAN: 10302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

...
$ ./server1.sh 
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server1'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 9500, HTTPS: -1, DNS: -1, RPC: 9400)
      Cluster Addr: 10.0.0.162 (LAN: 9301, WAN: 9302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

To force a failure, pick a server process and kill -9 it.

Also if you kill all of the servers at once with kill -9 as in:

Killing all of the servers

$ pkill -9 consul

You do not have to bootstrap any of them.

Thus you can run:

Starting up servers that died versus were shut down gracefully

./server1.sh 
./server2.sh 
./server3.sh 

##Setting up a client to test our local cluster

The agent is the center of the Consul world. An agent must run on every node that is part of a Consul cluster, and there are two types of agents: clients and servers. The server agents are the information hub; they store the data for the cluster and replicate it among the server nodes. The client agents sit on every box hosting services; they are lightweight instances that rely on the server agents for most of their state.

To start up consul in client mode we will use the following config file.

Client consul mode config file client1.json

$ cat client1.json 

{
  "datacenter": "dc1",
  "data_dir": "/opt/consulclient",
  "log_level": "INFO",
  "node_name": "client1",
  "server": false,
  "ports" : {

    "dns" : -1,
    "http" : 8500,
    "rpc" : 8400,
    "serf_lan" : 8301,
    "serf_wan" : 8302,
    "server" : 8300
  },

  "start_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
    ]
}

You will notice that server is set to false, which puts the agent in client mode. The client is addressable on the 8xxx ports. We specified where to find the servers using the start_join key.

Client consul mode startup script client1.sh

consul agent -config-file=client1.json 

Now when we start up consul, we just specify the client1.json file.
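If you ever start a client without start_join, or want to join a running agent to the cluster later, you can also ask the agent to join over the HTTP API. A sketch, pointing our client at server 1's serf LAN port (newer Consul versions require a PUT here; older ones also accepted a plain GET):

```bash
# Ask the local client agent to (LAN) join the cluster via server 1.
curl -X PUT http://localhost:8500/v1/agent/join/127.0.0.1:9301
```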

Getting information about our node and cluster

Now we can get info about our cluster.

$ consul info

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = 0c7ca91c
	version = 0.5.0
consul:
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 34
	max_procs = 1
	os = darwin
	version = go1.4.2
serf_lan:
	encrypted = false
	event_queue = 0
	event_time = 6
	failed = 0
	intent_queue = 0
	left = 0
	member_time = 42
	members = 4
	query_queue = 0
	query_time = 1

We can list the members in this cluster:

Listing the members in our cluster

$ consul members
Node     Address           Status  Type    Build  Protocol
client1  10.0.0.162:8301   alive   client  0.5.0  2
server2  10.0.0.162:10301  alive   server  0.5.0  2
server1  10.0.0.162:9301   alive   server  0.5.0  2
server3  10.0.0.162:11301  alive   server  0.5.0  2

You can also use the HTTP interface to see what members are in the cluster.

Using curl to get list of members

$ curl http://localhost:8500/v1/agent/members

Output using /v1/agent/members http call

[
    {
        "Name": "server2",
        "Addr": "10.0.0.162",
        "Port": 10301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "port": "10300",
            "role": "consul",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    },
    {
        "Name": "server1",
        "Addr": "10.0.0.162",
        "Port": 9301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "port": "9300",
            "role": "consul",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    },
    {
        "Name": "server3",
        "Addr": "10.0.0.162",
        "Port": 11301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "port": "11300",
            "role": "consul",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    },
    {
        "Name": "client1",
        "Addr": "10.0.0.162",
        "Port": 8301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "role": "node",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    }
]

You can try out other agent HTTP calls by looking at the Agent HTTP API.
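Two more agent calls that are handy while poking around, run against any agent's HTTP port:

```bash
# Dump the local agent's own configuration and runtime info.
curl http://localhost:8500/v1/agent/self

# List the checks registered with the local agent.
curl http://localhost:8500/v1/agent/checks
```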

You can use the HTTP API from any client or server:

Using HTTP API

$ curl http://localhost:8500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:9500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:10500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:10500/v1/catalog/nodes
[{"Node":"client1","Address":"10.0.0.162"},
{"Node":"server1","Address":"10.0.0.162"},
{"Node":"server2","Address":"10.0.0.162"},
{"Node":"server3","Address":"10.0.0.162"}]

$ curl http://localhost:8500/v1/catalog/services
{"consul":[]}

Register a new service

Register a new service with bash

$ curl --upload-file register_service.json \
  http://localhost:8500/v1/agent/service/register

register_service.json

{
  "ID": "myservice1",
  "Name": "myservice",
  "Address": "127.0.0.1",
  "Port": 8080,
  "Check": {
    "Interval": "10s",
    "TTL": "15s"
  }
}

The above registers a new service called myservice. Name is the name of the service, while ID identifies a specific instance of that service. The check we installed is a TTL check, so the service has to check in at least every 15 seconds or it will be marked critical.
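Note that curl --upload-file issues an HTTP PUT, so an equivalent way to send the same registration is:

```bash
curl -X PUT --data @register_service.json \
  http://localhost:8500/v1/agent/service/register
```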

Once you register the service, then you can see it from the agent as follows:

Seeing the service we just registered

$ curl http://localhost:8500/v1/agent/services

Seeing the service we just registered

{
    "myservice1": {
        "ID": "myservice1",
        "Service": "myservice",
        "Tags": null,
        "Address": "127.0.0.1",
        "Port": 8080
    }
}

To check this service's health, we can use this endpoint:

$ curl http://localhost:8500/v1/health/service/myservice
[
    {
        "Node": {
            "Node": "client1",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice1",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 8080
        },
        "Checks": [
            {
                "Node": "client1",
                "CheckID": "service:myservice1",
                "Name": "Service 'myservice' check",
                "Status": "critical",
                "Notes": "",
                "Output": "TTL expired",
                "ServiceID": "myservice1",
                "ServiceName": "myservice"
            },
            {
                "Node": "client1",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    }
]

Here we can see that the health status is critical because the TTL expired before the service checked in.

To tell consul that our fictional service is passing, we need to send it this before the 15-second TTL expires (and keep sending it at least that often):

Sending a TTL check

$ curl http://localhost:8500/v1/agent/check/pass/service:myservice1
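A real service would do this from its own code; for experimenting at the shell, a throwaway loop like this keeps the check passing until you stop it (stop it with ctrl-C and watch the status flip back to critical once the TTL expires):

```bash
# Heartbeat loop: check in well inside the 15-second TTL window.
while true; do
  curl -s http://localhost:8500/v1/agent/check/pass/service:myservice1 > /dev/null
  sleep 10
done
```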

Checking to see if our service is healthy

$ curl http://localhost:8500/v1/health/service/myservice

Checking to see if our service is healthy output

[
    {
        "Node": {
            "Node": "client1",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice1",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 8080
        },
        "Checks": [
            {
                "Node": "client1",
                "CheckID": "service:myservice1",
                "Name": "Service 'myservice' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "myservice1",
                "ServiceName": "myservice"
            },
            {
                "Node": "client1",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    }
]

Notice that our status went from "Status": "critical" to "Status": "passing".

You can also mark a service as warn or critical using an HTTP call.

Marking status as warn or fail (critical)

$ curl http://localhost:8500/v1/agent/check/warn/service:myservice1

$ curl http://localhost:8500/v1/agent/check/fail/service:myservice1

Note that you can also query the health status from any node; all nodes in the cluster, both servers and clients, know where myservice instances are running.

Query for health works from any agent node

$ curl http://localhost:9500/v1/health/service/myservice

Let's start up another client and install a service in it. We created another client startup script.

client2 startup

$ cat client2.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consulclient2",
  "log_level": "INFO",
  "node_name": "client2",
  "server": false,
  "ports" : {

    "dns" : -1,
    "http" : 7500,
    "rpc" : 7400,
    "serf_lan" : 7301,
    "serf_wan" : 7302,
    "server" : 7300
  },

  "start_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
    ]
}

$ cat client2.sh
consul agent -config-file=client2.json  

Then we will create another register service json file.

register json file for myservice2

$ cat register_service2.json 
{
  "ID": "myservice2",
  "Name": "myservice",
  "Address": "127.0.0.1",
  "Port": 9090,
  "Check": {
    "Interval": "10s",
    "TTL": "15s"
  }
}

#### Running service registry against client2 agent
```bash
curl --upload-file register_service2.json \
  http://localhost:7500/v1/agent/service/register
```

####Make both services healthy

curl http://localhost:8500/v1/agent/check/pass/service:myservice1
curl http://localhost:7500/v1/agent/check/pass/service:myservice2

####Query them

curl http://localhost:9500/v1/health/service/myservice

Output of querying both services

[
    {
        "Node": {
            "Node": "client2",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice2",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 9090
        },
        "Checks": [
            {
                "Node": "client2",
                "CheckID": "service:myservice2",
                "Name": "Service 'myservice' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "myservice2",
                "ServiceName": "myservice"
            },
            {
                "Node": "client2",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    },
    {
        "Node": {
            "Node": "client1",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice1",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 8080
        },
        "Checks": [
            {
                "Node": "client1",
                "CheckID": "service:myservice1",
                "Name": "Service 'myservice' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "myservice1",
                "ServiceName": "myservice"
            },
            {
                "Node": "client1",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    }
]

##Accessing Consul from Java

Consul has client libraries for many languages, including Python, Ruby, and Go; see the Consul download page for a complete list of language bindings. There are two for Java: Consul Client and Consul API. QBit, the microservice lib for Java, has one as well, which is a fork of Consul Client.

QBit adds a simplified interface to Consul demonstrated as follows:

Using the QBit ServiceDiscovery interface

package io.advantageous.examples;

import io.advantageous.boon.core.Sys;
import io.advantageous.consul.discovery.ConsulServiceDiscoveryBuilder;
import io.advantageous.qbit.service.discovery.ServiceDefinition;
import io.advantageous.qbit.service.discovery.ServiceDiscovery;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;


public class ServiceDiscoveryMain {


    public static void main(String... args) throws Exception {

        final ConsulServiceDiscoveryBuilder consulServiceDiscoveryBuilder =
                ConsulServiceDiscoveryBuilder.consulServiceDiscoveryBuilder();


        /* Attach to agent 2. */
        final ServiceDiscovery clientAgent2 = consulServiceDiscoveryBuilder.setConsulPort(7500).build();
        /* Attach to agent 1. */
        final ServiceDiscovery clientAgent1 = consulServiceDiscoveryBuilder.setConsulPort(8500).build();

        /* Start up service discovery. */
        clientAgent2.start();
        clientAgent1.start();

        /* Check in the services using their ids. */
        clientAgent1.checkInOk("myservice1");
        clientAgent2.checkInOk("myservice2");

        final ExecutorService executorService = Executors.newSingleThreadExecutor();

        executorService.submit(() -> {

            /* Check the services in occasionally and get a list of healthy nodes. */
            for (int index = 0; index < 100; index++) {
                System.out.println("Checking in....");

                /* Check in myservice1. */
                clientAgent1.checkInOk("myservice1");


                /* Check in myservice2. */
                clientAgent2.checkInOk("myservice2");
                Sys.sleep(100);


                /* Get a list of only the healthy nodes. */
                List<ServiceDefinition> serviceDefinitions = clientAgent1.loadServices("myservice");

                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 1: " + serviceDefinition));

                serviceDefinitions = clientAgent2.loadServices("myservice");

                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 2: " + serviceDefinition));

                Sys.sleep(10_000);

            }

            /* After 100 * 10 seconds stop. */
            clientAgent2.stop();
            clientAgent1.stop();
        });



    }

}

The above connects to client agent 2 (clientAgent2) and client agent 1 (clientAgent1). We then use the service discovery objects to check the services in by their ids (clientAgent1.checkInOk("myservice1")). Then every ten seconds we check the services in again (we have 15 seconds before the TTL expires). For good measure we load the available services (serviceDefinitions = clientAgent1.loadServices("myservice")) and print them to the console. This simplified interface makes it easy to register services with Consul and do periodic check-ins.

Here is the gradle build script for the above:

Gradle build script

apply plugin: 'java'
apply plugin:'application'

sourceCompatibility = 1.8
version = '1.0'
mainClassName = "io.advantageous.examples.Main"

repositories {
    mavenLocal()
    mavenCentral()
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.11'
    compile group: 'io.advantageous.qbit', name: 'qbit-vertx', version: '0.7.3-SNAPSHOT'
    compile group: 'io.advantageous.qbit', name: 'qbit-consul-client', version: '0.7.3-SNAPSHOT'
}

To learn more about the build script and QBit see QBit Restful Microservices.

Let's unregister those services with curl and then register them with our Java service discovery API.

Unregister the services

$ curl http://localhost:7500/v1/agent/service/deregister/myservice2
$ curl http://localhost:8500/v1/agent/service/deregister/myservice1

When you unregister a service, you also unregister its check.

Output from client2

    2015/04/14 17:09:05 [INFO] agent: Deregistered service 'myservice2'
    2015/04/14 17:09:05 [INFO] agent: Deregistered check 'service:myservice2'

Now we will register our services with the ServiceDiscovery API.

Registering services with the QBit Microservice Java lib for Consul

        /* Attach to agent 2. */
        final ServiceDiscovery clientAgent2 = consulServiceDiscoveryBuilder.setConsulPort(7500).build();
        
        /* Attach to agent 1. */
        final ServiceDiscovery clientAgent1 = consulServiceDiscoveryBuilder.setConsulPort(8500).build();



        /* Start up service discovery. */
        clientAgent2.start();
        clientAgent1.start();


        /* Register the services. */
        clientAgent1.registerWithIdAndTimeToLive(
            "myservice", "myservice1", 8080, 10);
        clientAgent2.registerWithIdAndTimeToLive(
                  "myservice", "myservice2", 9090, 10);
                  
     ....
     //The rest of the code as before. 

The method clientAgent2.registerWithIdAndTimeToLive(..) allows us to register the service and set its TTL in one call.

You can see that the services are in fact registered and deemed healthy. Use the Consul web UI to see the myservice services by going to http://localhost:9500/ui/#/dc1/services/myservice.
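You can confirm the same thing from the catalog endpoint of any agent, which lists every registered instance of myservice along with its address and port:

```bash
curl http://localhost:9500/v1/catalog/service/myservice
```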

QBit long-polls Consul to get the service info and updates its view of the services when they change. This helps you tolerate the failure of a microservice. A call could fail if a supplier is not available, and the client, which could itself be another upstream service, has to respond to this as gracefully as possible. You could implement a circuit breaker, but you will still want to know when a downstream service or a node in a server pool goes down. This is where Consul and its health monitoring come into play.
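Under the hood, long polling against Consul is done with blocking queries: take the X-Consul-Index header from a response and pass it back as the index query parameter along with a wait time, and Consul holds the request open until something changes or the wait expires. A rough sketch against the health endpoint (the index value 42 is a placeholder; use whatever the previous response returned):

```bash
# First request: note the X-Consul-Index response header.
curl -i http://localhost:8500/v1/health/service/myservice

# Blocking follow-up: returns as soon as something changes, or after ~30s if nothing did.
curl "http://localhost:8500/v1/health/service/myservice?index=42&wait=30s"
```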

Since microservices can fail at any time without notice, it is important to detect the failure quickly and restore the service, talk to another service in the same peer group, or provide some degraded response while your automation recovers the failed downstream microservices. Consul helps by providing near real-time health monitoring of microservices; the TTL check is just one of the ways.

Let's demonstrate this by splitting our Java example out into two main methods, one for each microservice.

MyService1Main.java and MyService2Main.java

public class MyService1Main {


    public static void main(String... args) throws Exception {

        final ConsulServiceDiscoveryBuilder consulServiceDiscoveryBuilder =
                ConsulServiceDiscoveryBuilder.consulServiceDiscoveryBuilder();


        /* Attach to agent 1. */
        final ServiceDiscovery clientAgent1 = consulServiceDiscoveryBuilder.setConsulPort(8500).build();



        /* Start up service discovery. */
        clientAgent1.start();


        /* Register the services. */
        clientAgent1.registerWithIdAndTimeToLive("myservice", "myservice1", 8080, 10);

        /* Check in the services using their ids. */
        clientAgent1.checkInOk("myservice1");


        final ExecutorService executorService = Executors.newSingleThreadExecutor();

        executorService.submit(() -> {

            /* Check the service in occasionally and get a list of healthy nodes. */
            for (int index = 0; index < 100; index++) {
                System.out.println("Checking in....");

                /* Check in myservice1. */
                clientAgent1.checkInOk("myservice1");


                Sys.sleep(100);


                /* Get a list of only the healthy nodes. */
                List<ServiceDefinition> serviceDefinitions = clientAgent1.loadServices("myservice");

                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 1: " + serviceDefinition));


                Sys.sleep(10_000);

            }

            /* After 100 * 10 seconds stop. */
            clientAgent1.stop();
        });

    }

...
...
...

public class MyService2Main {

    public static void main(String... args) throws Exception {

        final ConsulServiceDiscoveryBuilder consulServiceDiscoveryBuilder =
                ConsulServiceDiscoveryBuilder.consulServiceDiscoveryBuilder();


        /* Attach to agent 2. */
        final ServiceDiscovery clientAgent2 = consulServiceDiscoveryBuilder.setConsulPort(7500).build();



        /* Start up service discovery. */
        clientAgent2.start();


        /* Register the services. */
        clientAgent2.registerWithIdAndTimeToLive("myservice", "myservice2", 9090, 10);

        /* Check in the services using their ids. */
        clientAgent2.checkInOk("myservice2");


        final ExecutorService executorService = Executors.newSingleThreadExecutor();

        executorService.submit(() -> {

            /* Check the service in occasionally and get a list of healthy nodes. */
            for (int index = 0; index < 100; index++) {
                System.out.println("Checking in....");


                /* Check in myservice2. */
                clientAgent2.checkInOk("myservice2");
                Sys.sleep(100);


                /* Get a list of only the healthy nodes. */
                List<ServiceDefinition> serviceDefinitions = clientAgent2.loadServices("myservice");

                serviceDefinitions.forEach(serviceDefinition
                        -> System.out.println("FROM client agent 2: " + serviceDefinition));

                Sys.sleep(10_000);

            }

            /* After 100 * 10 seconds stop. */
            clientAgent2.stop();
        });



    }
}

Now run both; you should see output like this:

Output from Java applications


Checking in....
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
Checking in....
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
Checking in....
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 1: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}

...


Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice1', name='myservice', host='10.0.0.162', port=8080}
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}

Now kill the first process. You should see its service disappear from the healthy list, because it can no longer check in with Consul.

Output after killing first process.

Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}
Checking in....
FROM client agent 2: ServiceDefinition{status=PASS, id='myservice2', name='myservice', host='10.0.0.162', port=9090}

Let's leave off there for now. Next time we will talk about ServicePool and ServicePoolListener to listen for changes to services for a particular service pool.

#Conclusion

We covered how to set up a cluster and how to recover it if Raft cannot elect a new leader. We covered how cluster nodes recover, and the theory behind how Consul works. Then we worked with the HTTP API, and finally used a Java API to register services and perform health checks.

Next time we will cover ServicePools and ServicePoolListeners so we can easily be notified when a service is added or removed. We will also implement an HTTP check and a script check, and show how to incorporate ServiceDiscovery and a ServicePool into a service built with QBit, the Java microservice lib for microservice architecture, to detect when a service goes down.

Reactive Programming, Java Microservices, Rick Hightower

High-speed microservices consulting firm and authors of QBit with lots of experience with Vertx - Mammatus Technology

Highly recommended consulting and training firm who specializes in microservices architecture and mobile development that are already very familiar with QBit and Vertx as well as iOS and Android - About Objects

Java Microservices Architecture

[Microservice Service Discovery with Consul](http://www.mammatustech.com/Microservice-Service-Discovery-with-Consul)

[Reactive Microservices](http://www.mammatustech.com/reactive-microservices)

[High Speed Microservices](http://www.mammatustech.com/high-speed-microservices)

Java Microservices Consulting
