One of the most powerful features of Enterprise PKS is its desired state management for Kubernetes clusters. This capability is provided in part by BOSH. A simple node failure, such as a failed kubelet agent or a power issue, can be recovered automatically by PKS. You can simulate this recovery by powering off a worker node in vSphere. I wanted to push the limits of PKS by manually deleting a worker node to see what happens. Before I begin, I have to provide a caution:
Caution: DON’T MANUALLY DELETE ANY NODES MANAGED BY PKS. DELETING THE MASTER NODES MAY RESULT IN DATA LOSS.
Enterprise PKS automatically replaces worker nodes that have failed as part of its desired state management. Enterprise PKS is a full platform management suite for Kubernetes-based workloads, so operators should not manually modify Kubernetes constructs inside vSphere. While testing the desired state management capabilities of Enterprise PKS, we ran into a slight problem: manually deleting a worker node creates a situation where Enterprise PKS is unable to recover without manual intervention.
Start with a healthy three-node cluster:
root@cli-vm:~/PKS-Lab# kubectl get nodes
NAME                                  STATUS  ROLES   AGE   VERSION
55b8512f-7469-4562-90c1-e4f133cd333a  Ready   <none>  19m   v1.12.4
9c8f3f5c-c9d8-478d-9784-a13b3a128dbe  Ready   <none>  11m   v1.12.4
c14736b9-2b54-484c-b783-a79453e28804  Ready   <none>  166m  v1.12.4
We located a worker node in vSphere, powered it off, and deleted it after confirming twice that we wanted to take this action against a virtual machine managed by BOSH. Inside Kubernetes there is now a problem:
root@cli-vm:~/PKS-Lab# kubectl get nodes
NAME                                  STATUS  ROLES   AGE  VERSION
55b8512f-7469-4562-90c1-e4f133cd333a  Ready   <none>  21m  v1.12.4
9c8f3f5c-c9d8-478d-9784-a13b3a128dbe  Ready   <none>  13m  v1.12.4
We are missing a node. Normally we would expect BOSH to deploy a replacement node after the five-minute timeout. In this case BOSH will not recreate the node no matter how long you wait. The failure to recover automatically occurs because each worker node has a persistent volume attached. When BOSH replaces a powered-off worker node, it detaches the persistent storage volume before deleting the virtual machine, then mounts the detached volume on the new node. Kubernetes worker nodes do not require the persistent volume; it is an artifact of how BOSH operates. Because the manual delete removed the volume along with the virtual machine, BOSH will not recreate the node out of concern for data loss on the persistent volume. You can safely deploy a new worker node manually using BOSH commands. If you remove the storage from the powered-off worker before you delete it, BOSH will automatically deploy a new worker node.
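A quick way to spot this condition is to compare the number of Ready nodes against the number you expect. The following is a minimal sketch, not part of PKS: the helper names count_ready_nodes and check_node_count are hypothetical, and the expected count of 3 is simply the cluster size used in this article.

```shell
#!/usr/bin/env bash
# Count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes` output on stdin; a header row is ignored
# because its second column is "STATUS", not "Ready".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Print a one-line verdict for an expected node count (hypothetical helper).
# Returns non-zero when fewer nodes than expected are Ready.
check_node_count() {
  local expected="$1" ready
  ready="$(count_ready_nodes)"
  if [ "$ready" -lt "$expected" ]; then
    echo "DEGRADED: $ready of $expected nodes Ready"
    return 1
  fi
  echo "OK: $ready of $expected nodes Ready"
}

# Example (assumes kubectl already points at the cluster):
#   kubectl get nodes --no-headers | check_node_count 3
```

Nodes in a combined state such as Ready,SchedulingDisabled are deliberately not counted as Ready in this sketch.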
Process to manually redeploy a deleted worker node
Since BOSH is responsible for the desired state management of the cluster, you use BOSH commands to recover from the deleted node and its deleted volume.
Gather the BOSH Uaa Admin User Credentials
- Log in to Opsman via the web console
- Click on the BOSH tile
- Click on the credentials tab
- Locate the Uaa Admin User Credentials
- Click on get credentials
- Copy the password section; in my case it’s HYmb4WAuvnWuGLzmAFoSTlrSv4_Qj4Vk
Resolve using the Opsman virtual machine and BOSH commands
- Use ssh to log in to the Opsman virtual machine as the user ubuntu
- Create a new alias for the environment using the following command on a single line (replace the IP address with the IP address or DNS name of your own BOSH Director):
bosh alias-env pks -e 172.31.0.2 --ca-cert /var/tempest/workspaces/default/root_ca_certificate
Using environment '172.31.0.2' as anonymous user

Name      p-bosh
UUID      ee537142-1370-4fee-a6c2-741c0cf66fdf
Version   268.2.1 (00000000)
CPI       vsphere_cpi
Features  compiled_package_cache: disabled
          config_server: enabled
          local_dns: enabled
          power_dns: disabled
          snapshots: disabled
User      (not logged in)

Succeeded
- Use BOSH and the alias to log in to the PKS environment with the username admin and the password from the Uaa Admin User Credentials:
bosh -e pks login
Email (): admin
Password ():

Successfully authenticated with UAA

Succeeded
- List the current deployments with the deployments command and identify your failed deployment (you need the service instance name):
ubuntu@opsman-corp-local:~$ bosh -e pks deployments
Using environment '172.31.0.2' as user 'admin' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Name                                                   Release(s)                               Stemcell(s)                                      Team(s)
harbor-container-registry-99b2c77d387b6caae53b         bosh-dns/1.10.0                          bosh-vsphere-esxi-ubuntu-xenial-go_agent/97.52   -
                                                       harbor-container-registry/1.6.3-build.3
pivotal-container-service-bf45f9e2177d5da24998         backup-and-restore-sdk/1.8.0             bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.15  -
                                                       bosh-dns/1.10.0
                                                       bpm/0.13.0
                                                       cf-mysql/36.14.0
                                                       cfcr-etcd/1.8.0
                                                       docker/33.0.2
                                                       harbor-container-registry/1.6.3-build.3
                                                       kubo/0.25.8
                                                       kubo-service-adapter/1.3.0-build.129
                                                       nsx-cf-cni/126.96.36.19993410
                                                       on-demand-service-broker/0.24.0
                                                       pks-api/1.3.0-build.129
                                                       pks-helpers/50.0.0
                                                       pks-nsx-t/1.19.0
                                                       pks-telemetry/2.0.0-build.113
                                                       pks-vrli/0.7.0
                                                       sink-resources-release/0.1.15
                                                       syslog/11.4.0
                                                       uaa/64.0
                                                       wavefront-proxy/0.9.0
service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e  bosh-dns/1.10.0                          bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.15  pivotal-container-service-bf45f9e2177d5da24998
                                                       bpm/0.13.0
                                                       cfcr-etcd/1.8.0
                                                       docker/33.0.2
                                                       harbor-container-registry/1.6.3-build.3
                                                       kubo/0.25.8
                                                       nsx-cf-cni/188.8.131.5293410
                                                       pks-helpers/50.0.0
                                                       pks-nsx-t/1.19.0
                                                       pks-telemetry/2.0.0-build.113
                                                       pks-vrli/0.7.0
                                                       sink-resources-release/0.1.15
                                                       syslog/11.4.0
                                                       wavefront-proxy/0.9.0
- There are three deployments listed on my system (PKS management, Harbor, and the PKS cluster). We will be using service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e, which is the PKS cluster with the deleted node
- Review the virtual machines involved in the service instance:
ubuntu@opsman-corp-local:~$ bosh -e pks -d service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e vms
Using environment '172.31.0.2' as user 'admin' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 6913. Done

Deployment 'service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e'

Instance                                     Process State  AZ        IPs             VM CID                                   VM Type  Active
master/e24ccbc1-8b3b-460c-9162-7199d4d67674  running        PKS-COMP  184.108.40.206  vm-2ca9f83c-8d80-4e92-a1d5-ff0b3446c624  medium   true
worker/22be3ec4-7eae-4370-b6cc-d59bd7071f01  running        PKS-COMP  220.127.116.11  vm-bad946f5-5b51-40f4-acd4-29bcf3ad7e6a  medium   true
worker/35026d4b-fb24-4b05-8f33-a71dbebf03e7  running        PKS-COMP  18.104.22.168   vm-b54970b7-1984-4d74-9285-48d28f308c0b  medium   true
- BOSH is aware of three nodes in total: one master and two workers. Our expected state is three worker nodes
- Running a BOSH consistency check (cck) allows us to clean out the metadata for the missing virtual machine and persistent disk:
ubuntu@opsman-corp-local:~$ bosh -e pks -d service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e cck
Using environment '172.31.0.2' as user 'admin' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Using deployment 'service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e'

Task 6920

Task 6920 | 19:26:07 | Scanning 4 VMs: Checking VM states (00:00:18)
Task 6920 | 19:26:25 | Scanning 4 VMs: 3 OK, 0 unresponsive, 1 missing, 0 unbound (00:00:00)
Task 6920 | 19:26:25 | Scanning 4 persistent disks: Looking for inactive disks (00:00:38)
Task 6920 | 19:27:03 | Scanning 4 persistent disks: 3 OK, 1 missing, 0 inactive, 0 mount-info mismatch (00:00:00)

Task 6920 Started  Tue May 21 19:26:07 UTC 2019
Task 6920 Finished Tue May 21 19:27:03 UTC 2019
Task 6920 Duration 00:00:56
Task 6920 done

#   Type          Description
48  missing_vm    VM for 'worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2 (2)' missing.
49  missing_disk  Disk 'disk-0500a7de-10e2-414c-8b19-091147c58a98' (worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2, 102400M) is missing

2 problems

1: Skip for now
2: Recreate VM without waiting for processes to start
3: Recreate VM and wait for processes to start
4: Delete VM reference
VM for 'worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2 (2)' missing. (1): 4

1: Skip for now
2: Delete disk reference (DANGEROUS!)
Disk 'disk-0500a7de-10e2-414c-8b19-091147c58a98' (worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2, 102400M) is missing (1): 2

Continue? [yN]: y

Task 6928

Task 6928 | 19:29:49 | Applying problem resolutions: VM for 'worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2 (2)' missing. (missing_vm 13): Delete VM reference (00:00:00)
Task 6928 | 19:29:49 | Applying problem resolutions: Disk 'disk-0500a7de-10e2-414c-8b19-091147c58a98' (worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2, 102400M) is missing (missing_disk 6): Delete disk reference (DANGEROUS!) (00:00:07)

Task 6928 Started  Tue May 21 19:29:49 UTC 2019
Task 6928 Finished Tue May 21 19:29:56 UTC 2019
Task 6928 Duration 00:00:07
Task 6928 done
- The process requires that we delete the references to both the missing worker node (option 4) and the missing disk (option 2). Notice the big warning about data loss when deleting the disk reference. In this case we are only deleting BOSH metadata, because the volume is already gone.
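Before answering the cck prompts, it can help to see at a glance which problem types the scan found. The sketch below tallies the rows of the problem table shown above; summarize_cck_problems is a hypothetical helper name, and it assumes the report keeps the same layout as in this article (a numeric id, then a type such as missing_vm or missing_disk).

```shell
#!/usr/bin/env bash
# Tally the problem table printed by a BOSH consistency check.
# Reads the cck report on stdin and prints "<type> <count>" per problem type.
# A row counts only if its first field is a number and its second field
# looks like a problem type (lowercase words joined by underscores), so
# menu lines like "1: Skip for now" and "2 problems" are ignored.
summarize_cck_problems() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[a-z]+(_[a-z]+)+$/ { count[$2]++ }
       END { for (t in count) print t, count[t] }' | sort
}

# Example (assumption: the bosh CLI supports a report-only mode via
# `cck --report`; confirm with `bosh cck --help` on your version):
#   bosh -e pks -d service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e cck --report \
#     | summarize_cck_problems
```

A missing_vm paired with a missing_disk for the same worker is the signature of the manual delete described here.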
Once the BOSH metadata is removed, BOSH will automatically deploy a new worker node and join it to the cluster. Enterprise PKS is flexible enough to handle the normal operational tasks of managing and scaling Kubernetes in the enterprise while ensuring you don’t lose data.
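To confirm that the replacement worker has arrived, you can poll the vms listing until the expected number of workers is running. This is a rough sketch under stated assumptions: wait_for_workers and count_running_workers are hypothetical helpers, the column layout matches the vms output shown earlier (instance first, process state second), and the attempt count and delay are arbitrary values you would tune.

```shell
#!/usr/bin/env bash
# Count worker instances reported as running.
# Reads `bosh vms` output on stdin.
count_running_workers() {
  awk '$1 ~ /^worker\// && $2 == "running" { n++ } END { print n + 0 }'
}

# Poll until the expected number of workers is running or attempts run out.
# Usage: wait_for_workers EXPECTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_workers() {
  local expected="$1" attempts="$2" delay="$3" i=1 running=0
  shift 3
  while [ "$i" -le "$attempts" ]; do
    # Run the listing command passed by the caller and count its workers.
    running="$("$@" | count_running_workers)"
    if [ "$running" -ge "$expected" ]; then
      echo "recovered: $running workers running"
      return 0
    fi
    # Sleep only between attempts, not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "timed out: $running of $expected workers running"
  return 1
}

# Example: check every 30 seconds, up to 20 times, for three workers:
#   wait_for_workers 3 20 30 \
#     bosh -e pks -d service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e vms
```

Once this reports recovery, kubectl get nodes should again list three Ready nodes.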
Thanks to Matt Cowger from Pivotal for helping with the recovery process.