Cluster Creation - woveon/wovtools GitHub Wiki


We use kops to create and manage the Kubernetes cluster, using the WovTools configuration management environment to set it up.

DNS Zone and Certificate Creation

Do this first since it takes time for the certificate to be created.

  • Create the DNS zone. In AWS, use Route53.
  • Create a certificate for the zone: AWS > Certificate Manager > Request a Certificate. Use a wildcard in the certificate (ex. *.example.com) so it covers all subdomains.
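
If you prefer the AWS CLI over the console, the following sketch does roughly the same thing (example.com, us-east-1, and the caller reference are placeholders; substitute your own domain and region):

# create the hosted zone for your domain
aws route53 create-hosted-zone --name example.com --caller-reference "$(date +%s)"
# request a wildcard certificate and validate it via DNS
aws acm request-certificate --domain-name "*.example.com" \
    --subject-alternative-names "example.com" \
    --validation-method DNS --region us-east-1

DNS validation requires adding the CNAME records ACM gives you to the hosted zone before the certificate will be issued.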

Secrets Configuration

Add an entry to your secrets JSON:

{
  "cluster" : {
    "CLUSTERwov-aws-va-grape" : {
      "dns" : {
        "domain"     : "alywan.com",
        "hostedZone" : "Z1NR42SJ9ZADVC"
      },
      "rds" : {
        "subnet1" : "172.31.196.32/28",
        "subnet2" : "172.31.196.48/28"
      },
      "region" : "us-east-1",
      "zone" : {
        "primary"   : "c",
        "secondary" : "d"
      },
      "master" : {
        "count" : 1,
        "zones" : "us-east-1c",
        "size"  : "t2.medium"
      },
      "node" : {
        "description": "smallest config: single zone, 2 medium nodes",
        "count" : 2,
        "zones" : "us-east-1c",
        "size"  : "t2.medium"
      }
    }
  }
}

Create the Cluster

The wov-cluster create CLUSTER script will use kops, and your configuration above, to create your cluster for you. This will take 10+ minutes as it creates and validates the cluster.

wov-cluster create CLUSTERNAME
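
wov-cluster handles the validation for you, but if you want to watch progress directly you can poll kops yourself (a sketch; assumes the kops state bucket and cluster name from your configuration):

export KOPS_STATE_STORE=s3://kops.example.com     # your kops state bucket (placeholder)
kops validate cluster --name CLUSTERNAME          # repeat until all nodes report Ready
kubectl get nodes --show-labels                   # double-check from the kubectl side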

Configure Cluster

There are several changes to a cluster that make it much nicer to manage and use. The following commands install:

  • Helm and Tiller (across all RBAC roles)
  • Peer connection (and subnets) from Default VPC where databases reside. This enables databases to persist in the default VPC and link to the clusters.
wov-cluster config helm
wov-cluster config dbnet
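
For reference, the Helm/Tiller portion amounts to roughly the following (a sketch, not the exact wov-cluster implementation; it gives Tiller cluster-admin, matching the "across all RBAC roles" note above):

kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller-cluster-admin \
    --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --upgrade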

Bastion

Reachability (SSH and Port Forwarding)

  • Check to see if your connection to bastion is enabled via port forwarding: wov-bastion-connection --check
  • Create an entry in ~/.ssh/config for your bastion. You will need entries to reach your database, and possibly your master and worker nodes. The following example opens local port 65440 and forwards it to port 5432 (the default Postgres port) on the database in AWS:
Host wov-aws-va-grape-bastion
  Hostname bastion.wov-aws-va-grape.alywan.com
  User admin
  IdentityFile ~/.ssh/wov/wov-aws-va-grape.pem
  ServerAliveInterval 180
  # 65440 -db, 65441 - master, 6544X - node X
  LocalForward 65440 wov-aws-va-grape-alywan-dev-db.cipwuxrfsmqo.us-east-1.rds.amazonaws.com:5432
  • Test
    • ssh login: ssh {CLUSTER}-bastion <-- from ~/.ssh/config file
    • psql into your database, by ssh tunneling through bastion ex. PGPASSWORD={PASSWORD} psql -p {LOCALPORT} -h localhost -U postgres -d woveon
  • ssh to the Kubernetes master node (through its load balancer): ssh -i ~/.ssh/id_rsa admin@{MASTER-LOAD-BALANCER-HOSTNAME}
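
Putting the pieces together, a typical session looks something like this (uses the example host alias and local port above; the password is a placeholder):

ssh -f -N wov-aws-va-grape-bastion                                       # start the port forwards in the background
PGPASSWORD={PASSWORD} psql -p 65440 -h localhost -U postgres -d woveon   # reach the db through the tunnel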

Bastion Configuration and Lockdown

  • VPN Whitelist - A good idea at this point is to set the security group of your bastion host to listen only to your company's VPN IPs (see the CLI sketch after this list). For AWS, go to EC2 > Instances > find your instance and select its security group, then its inbound rules. Since we use kops, and kops creates a load balancer for the bastion, change the load balancer's security group inbound rules.
  • Tools and Updates - Connect to your bastion host by first enabling the ssh port forwarding and then connecting.
    • Update software: sudo apt-get update
    • dnsutils (nslookup): sudo apt-get install dnsutils
    • psql client: sudo apt-get install postgresql-client
    • Update the bastion's /etc/ssh/sshd_config for SSH keep-alive:
TCPKeepAlive no
ClientAliveInterval 30
ClientAliveCountMax 240
    • restart the sshd service: `sudo systemctl restart sshd.service`
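
The VPN whitelist step above can also be scripted; a rough sketch (the security group ID is a placeholder, and the CIDR is one of the example VPN blocks listed at the bottom of this page):

# drop the open SSH rule on the bastion ELB security group, then allow only your VPN egress IPs
aws ec2 revoke-security-group-ingress    --group-id sg-BASTIONELB --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-BASTIONELB --protocol tcp --port 22 --cidr 104.222.154.0/27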

Ingress

Run wov-cluster-dns --ingress to create the ingress via helm and to create the AWS Route53 DNS entry for WOV_www_api_url to point to the load balancer that is created.

DNS Route

You need a CNAME DNS rule that points to the ELB (the one created by stable/nginx-ingress) for the nodes. This should have been created by wov-cluster-dns --ingress, but it can be created/updated as needed with wov-cluster-dns --created-dns. This DNS entry is for WOV_www_api_url (i.e. api-STAGE.DOMAIN, ex. api-dev.woveon.com).
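
wov-cluster-dns normally manages this record, but for reference it can be upserted by hand; a sketch with placeholder hosted zone ID, record name, and ELB hostname:

aws route53 change-resource-record-sets --hosted-zone-id HOSTEDZONEID --change-batch '{
  "Changes": [ { "Action": "UPSERT", "ResourceRecordSet": {
    "Name": "api-STAGE.DOMAIN", "Type": "CNAME", "TTL": 300,
    "ResourceRecords": [ { "Value": "INGRESS-ELB-HOSTNAME.us-east-1.elb.amazonaws.com" } ] } } ] }'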

Troubleshooting

  • Use wov-env --health (or --health-no-tls | --health-tls to force http/https) to check known health paths (in secrets : .MICROSERVICE.healthpath)
  • Test system via curl:
    • curl https://api-STAGE.DNS/MICROSERVICE/VERSION/pub/health
  • How networking/DNS/Services/Pods should work (see the quick-check sketch after this list)
    • DNS entry (api-STAGE.DOMAIN) goes to load balancer (which was created by nginx-ingress).
    • nginx listens on that load balancer and sends to services
    • service goes to pods
      • services select pods because pods have a 'label' that service 'selects' for
      • kubectl get endpoints should show pod ips
      • kubectl get svc -o wide should show service ips
      • kubectl get pods -o wide should show pod ips
      • service targetPort is port of the Pod the service points to
  • Check your nginx access/error logs
    • kubectl logs -f ing-WOV_NS-nginx-ingress-controller-XXXX
  • If you are having troubles, run without https:
    • Restart your ingress: helm install stable/nginx-ingress --name ing-WOV_NS --set rbac.create=true (and restart DNS route)
    • You should be able to curl into your services: curl http://api-STAGE.DNS/MS/v2/pub/health
  • Fiddle with services
    • edit it directly: kubectl edit svc SERVICENAME
    • redeploy it: wov-deploy --dev
    • edit template: wov-ed k8s/SERVICE.yaml.wov
    • review cluster's file: wov-ed -ccl k8s/SERVICE.yaml
  • Open a shell on your pod
    • kubectl exec -it PODNAME -- sh
    • install curl: apk add --update curl
    • check: curl localhost:80/route/to/test, ex. curl localhost:80/MICROSERVICE/VERSION/pub/health
  • https://kubernetes-on-aws.readthedocs.io/en/latest/user-guide/ingress.html#ingress
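
A quick way to walk that chain end to end (a sketch; the names are placeholders following the conventions above):

nslookup api-STAGE.DOMAIN                   # should resolve to the nginx-ingress ELB
kubectl get ingress                         # rules nginx is serving
kubectl get svc -o wide                     # service IPs / ports
kubectl get endpoints                       # should list the pod IPs behind each service
kubectl get pods -o wide --show-labels      # pod IPs and the labels the services select on
curl -k https://api-STAGE.DOMAIN/MICROSERVICE/VERSION/pub/health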

Dashboard UI

  • Dashboard
    • start it: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
    • run it: kubectl proxy
    • access it: http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
    • use token from tiller (since we gave it full RBAC control): kubectl -n kube-system describe secret tiller-token-[TAB]
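
If tab completion is not available (e.g. in a script), one way to pull that token is the following (assumes the tiller service account from the Helm step exists):

kubectl -n kube-system describe secret \
  $(kubectl -n kube-system get secret | grep tiller-token | awk '{print $1}')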

Todo - Things that I probably need to incorporate

spec:
  ...
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
          "Resource": ["*"]
        }
      ]
    master: |
      [
        {
          "Effect": "Allow",
          "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
          "Resource": ["*"]
        }
      ]
  docker:
    logDriver: awslogs
    logOpt:
    - awslogs-region=eu-central-1
    - awslogs-group=k8s
  • set the key so update can work (kops complains about AWS needing this): kops create secret --name wov-aws-va-dog.woveon.com sshpublickey admin -i ~/.ssh/wov/wov-aws-va-dog_pub

  • Then update cluster: kops update cluster CLUSTER

  • if it passes, kops update cluster wov-aws-va-dog.woveon.com --yes

  • (SKIPPED) Elastic Filesystem Server (EFS) install

    • NOTE: this is the way to do it... but I skipped this as it was failing too hard. Instead, I used node affinity to force the apihal pod onto the node in the zone with the volume.
    • Useful EFS documentation
    • More doc
    • Add a rule to the VPC's security group: NFS TCP 2049 10.0.0.0/16
      • NOTE: apparently EFS sits outside the VPC's default security group, so the default rule does not work; this rule lets in everything within the VPC's CIDR block.
    • VPC - make sure DNS resolution and DNS hostnames are enabled.
    • you can't test-mount from inside pods, so create a test EC2 instance to check mounting
    • NOTE: the EFS should be in the default VPC and peered in, but to save time it is just kept inside the cluster VPC

kops (old, use wov-cluster now to do this for you)

We created the cluster with kops. It is closer to kubectl in feel and much more incremental, and it is much faster than kube-aws. We would like to use AWS's own tool, but it is $100 a month and a pain to use; we may switch to it in the future since it handles upgrades and such.

Create a cluster

  • Kops Directions for AWS

  • Create and setup the S3 bucket kops uses

    • aws s3api create-bucket --bucket kops.woveon.com --region us-east-1
    • aws s3api put-bucket-versioning --bucket kops.woveon.com --region us-east-1 --versioning-configuration Status=Enabled
  • determine some names and set up shell env vars

# example
export KOPS_STATE_STORE=s3://kops.woveon.com
export CLUSTER=wov-aws-va-dog    # ?is woveon.com needed on the end of this?
export ZONE=us-east-1
  • Create the VPC wov-aws-va-dog in AWS

    • 10.0.0.0/16
    • edit (turn on) dns options (use actions button) for DNS resolution and DNS hostnames so you can mount NFS filesystems across peered connections.
  • Create a key for the cluster

    • create an RSA key to use for the machines in this cluster
      • I keep these keys in '~/.ssh/wov' since that is the 'application' part of wovtools
      • make private key
        • openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out ~/.ssh/wov/{CLUSTER}.pem
      • make public key
        • openssl rsa -in {CLUSTER}.pem -outform PEM -pubout -out {CLUSTER}_pub.pem
    • Add to EC2 key pairs
      • Click 'Import'
      • Copy/Paste the ~/.ssh/wov/{CLUSTER}_pub.pem file data (NOT THE '-----' lines)
    • use the name, which should be for your cluster: ex. wov-aws-va-rabbit
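
The import can also be done from the CLI instead of the console; a sketch under the key-naming convention above (ssh-keygen derives an OpenSSH-format public key from the private key, which should work with a reasonably recent OpenSSH, and EC2 accepts that format):

ssh-keygen -y -f ~/.ssh/wov/wov-aws-va-rabbit.pem > ~/.ssh/wov/wov-aws-va-rabbit.pub
aws ec2 import-key-pair --key-name wov-aws-va-rabbit \
    --public-key-material fileb://$HOME/.ssh/wov/wov-aws-va-rabbit.pub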
  • Create the cluster with Kops

  kops create cluster \
    --ssh-public-key ~/.ssh/wov/wov-aws-va-dog_pub \
    --vpc vpc-0f4ca3ea63d5971a0 \
    --cloud aws \
    --name=wov-aws-va-dog \
    --master-count=1 \
    --master-zones us-east-1c \
    --zones us-east-1c,us-east-1d \
    --node-count 2 \
    --dns-zone woveon.com \
    --node-size "t2.medium" \
    --master-size "t2.medium" \
    --topology private



# Old flags (no longer used):
#   --node-security-groups sg-03189dc0c42233172 \
#   --networking weave \
#   --cloud-labels "Team=Dev,Owner=John Doe"
  • Add bastion host and update
kops create instancegroup bastions --role Bastion --subnet utility-us-east-1c
kops update cluster wov-aws-va-dog.woveon.com         # check changes
kops update cluster wov-aws-va-dog.woveon.com --yes   # 'commit' if update is ok
kops rolling-update cluster                           # may be needed
  • add existing security group
    • first, create the security group with vpn access in the VPC
    • set the incoming rules to VPN
    • name it bastion-elbvpn.$NAME ex. bastion-elbvpn.wov-aws-va-dog.woveon.com
    • edit the cluster to use it kops edit cluster wov-aws-va-dog.woveon.com
  loadBalancer:
    type: Public
#    additionalSecurityGroups:
#    - sg-<X>
  • Update bastion:
    • kops edit ig bastions --name wov-aws-va-dog.woveon.com
    • increase the SSH idle timeout (see idleTimeoutSeconds below)
    • Add bastion ELB security group to limit incoming SSH origins (VPN)
spec:
  ...
  topology:
    bastion:
      idleTimeoutSeconds: 1200
  ...
  loadBalancer:
    type: Public
    additionalSecurityGroups:
    - sg-<X>
  • Enable pings from bastion to nodes

    • for each node, find its security group (ex. nodes.wov-aws-va-dog.woveon.com) and add a rule allowing ALL ICMP traffic from the bastion security group (see the sketch below)
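
A rough CLI equivalent (the security group IDs are placeholders for the nodes and bastion groups):

aws ec2 authorize-security-group-ingress --group-id sg-NODES \
    --ip-permissions 'IpProtocol=icmp,FromPort=-1,ToPort=-1,UserIdGroupPairs=[{GroupId=sg-BASTION}]'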
  • useful commands

    • list nodes: kubectl get nodes --show-labels
  • Install Helm w/ RBAC

    • install helm and install a tiller user across all RBAC roles... because tiller RBAC is a pita.
    • NOTE: CLUSTERPROJECT is wcd clusters
brew upgrade kubernetes-helm
kubectl apply -f ${CLUSTERPROJECT}/tiller.yaml
helm init --service-account tiller --upgrade
  • Update the bastion's /etc/ssh/sshd_config for SSH keep-alive
TCPKeepAlive no
ClientAliveInterval 30
ClientAliveCountMax 240
  • restart sshd service

    • sudo systemctl restart sshd.service
  • Config Cluster Context

    • add the cluster to your Kubectl configuration
    • NOTE: kops does some of this. just verify that it did so correctly
  • How to connect to cluster?

    • peer the default VPC to the Kubernetes cluster's VPC

      • this enables them to connect, since they are in different VPCs and you want them to be private by default
      • also, this means the databases have a life outside the life of the cluster which is what you want
      • ensure the "Accepter" can resolve DNS (so select Peer connection and drop-down select edit dns)
    • ok, this is tricky

    • in VPC, select 'Peering Connections' on sidebar and press 'Create Peering Connection' up top

      • named : peer-wov-aws-va-rds-dog
      • local VPC (set to default VPC)
      • another VPC (set to cluster VPC): my account, this region, select VPC (accepter)
      • create
      • select the 'Actions' button to 'Accept' the peer connection (i.e. the connection is from you to you, so you are the one that accepts)
    • Connect subnets of requester VPC to the peering connection via their route tables

      • find every subnet in the database subnet group
        • find every route table for these subnet groups (if you're lucky, they use the same route table)
        • add a route to each subnet in the cluster (ex. utility-us-east-1[cd], us-east-1[cd]) via the peering connection (pcx-X)
        • ex. 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24, 10.0.4.0/24 were the four CIDR blocks of those four subnets
    • Connect subnets of cluster VPC to the peering connection via their route tables

      • find the subnets of the cluster VPC (ex. four of them: C/D x public/private)
        • find every route table for these
        • add a route to each subnet in the default VPC (ex. private-subnet-for-rds-[cd])
        • ex. add 172.31.196.[0|16]/28 to the route tables of utility-us-east-1[cd] and us-east-1[cd]
    • ssh into your bastion and ping your RDS instance
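
The same peering setup can be sketched with the CLI (the VPC IDs, route table IDs, peering connection ID, and CIDRs are placeholders; repeat the create-route calls for every route table / CIDR pair described above):

aws ec2 create-vpc-peering-connection --vpc-id vpc-DEFAULT --peer-vpc-id vpc-CLUSTER
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-XXXX
aws ec2 modify-vpc-peering-connection-options --vpc-peering-connection-id pcx-XXXX \
    --accepter-peering-connection-options AllowDnsResolutionFromRemoteVpc=true
# default VPC route tables -> cluster CIDR
aws ec2 create-route --route-table-id rtb-DEFAULT --destination-cidr-block 10.0.0.0/16 \
    --vpc-peering-connection-id pcx-XXXX
# cluster VPC route tables -> each RDS subnet CIDR
aws ec2 create-route --route-table-id rtb-CLUSTER --destination-cidr-block 172.31.196.0/28 \
    --vpc-peering-connection-id pcx-XXXX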

  • VPN Only access - limit what can touch your ELB connection to your bastion

    • EC2 -> Load Balancers -> Select your Bastion ELB
    • find the security group (ex. bastion-elb.wov-aws-va-dog.woveon.com)
    • add inbound rules to SSH
Ex. these limit the inbound IPs to the NordVPN servers with the given numbers; rotate these over time.
104.222.154.0/27  | NordVPN #1960-1965 .13-18
185.244.215.9/32  | NordVPN #1710
185.245.86.107/32 | NordVPN #2452