Products
Issue/Introduction
This article provides information on resolving the vCLS health issues, so that DRS functions correctly in the cluster.
Symptoms:
Starting with vSphere 7.0 Update 1, vSphere DRS for a cluster depends on the health of vSphere Cluster Services (vCLS). vCLS configures a quorum of vCLS system VMs on the cluster. These VMs are necessary to maintain the health of the cluster services. If vCLS health is impacted because these VMs are unavailable in a cluster, vSphere DRS will not function in that cluster until the vCLS VMs are brought back up.
The operations listed below could fail if performed while DRS is not functional. Note also that on a new DRS-enabled cluster these operations are not available until the first vCLS VM is deployed and powered on in that cluster.
A new workload VM placement/power-on.
Host selection for a VM that is migrated from another cluster/host within the vCenter.
A migrated VM could be powered on on a host that was not selected by DRS.
Placing a host into maintenance mode might get stuck if it has any powered-on VM.
Invocation of DRS APIs such as ClusterComputeResource.placeVm() and ClusterComputeResource.enterMaintenanceMode() will fail with InvalidState.
Configuration of Workload Management, Supervisor Cluster and Tanzu Kubernetes Cluster will fail.
vSphere DRS functionality is impacted when vSphere Cluster Services is in an unhealthy state caused by the unavailability of vSphere Cluster Service VMs. vSphere Cluster Service VMs are required to maintain the health of vSphere DRS.
For more information, see vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1.
Cause
A user has powered off or deleted vCLS VMs from a DRS-enabled cluster
vCLS VM deployment failed
vCLS VM power-on failed
vCLS was disabled on a cluster using Retreat Mode
HA was unable to fail over vCLS VMs upon host or storage failure
Resolution
Additional Information
Scenarios with a resolution where vCLS VMs deployment could fail:
Not enough free resources in the cluster - Requires 400 MHz of CPU, 400 MB of memory and 2 GB of storage space on a cluster with more than 3 hosts. For more information on the resource requirements for these VMs, see the vCLS VM Resource Allocation section of the vSphere Resource Management Guide. vCLS reserves slots equal to the quorum size of the VMs + 1 per cluster, so the cluster needs this much spare capacity for the vCLS VMs to deploy successfully.
Deployment failures in 1 node and 2 node vSAN clusters - vCLS VMs fail to deploy on a 1 or 2 node vSAN cluster with the error: Can't provision VM for ClusterAgent due to lack of suitable datastore. vCLS uses the datastore default policy for datastore selection, so if vSAN is the only available datastore within the cluster, the default policy requires a 3 node vSAN cluster and the deployment of these VMs will fail in a smaller cluster. If a 2 node vSAN cluster has a witness node, then deployment of the vCLS VMs succeeds. The workaround is to increase the size of the vSAN cluster or to change the datastore default policy.
Orphaned VM cases - If there are orphaned vCLS VMs in the vCenter Server because of disconnected and reconnected hosts, deployment of new vCLS VMs in such a cluster after adding the host might fail. The suggested workaround is to clean up any stale/orphaned vCLS VMs from the inventory.
Not enough free resources in the cluster.
Power-on of disconnected/orphaned vCLS VMs could fail - If there are orphaned vCLS VMs in vCenter because of disconnected and reconnected hosts, power-on of such orphaned VMs could fail because they are disconnected. The workaround is to manually delete these VMs so that new vCLS VMs are deployed automatically on properly connected hosts/datastores.
Power-on failure due to changes to the configuration of the VMs - If a user changes the configuration of a vCLS VM, power-on of that VM could fail. Users are not supposed to change any configuration of these VMs.
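The deployment scenarios above can be expressed as a simple pre-check. The following is a minimal sketch only: ClusterSnapshot and deployment_blockers are hypothetical names, the fields are filled in by hand rather than read from any vSphere API, and the thresholds are the figures quoted above for a cluster with more than 3 hosts.

```python
from dataclasses import dataclass
from typing import List

# Figures quoted in this article for a cluster with more than 3 hosts.
REQUIRED_CPU_MHZ = 400
REQUIRED_MEMORY_MB = 400
REQUIRED_STORAGE_GB = 2


@dataclass
class ClusterSnapshot:
    """Hypothetical, hand-filled view of a cluster (not a vSphere API type)."""
    free_cpu_mhz: int
    free_memory_mb: int
    free_storage_gb: float
    vsan_is_only_datastore: bool
    vsan_node_count: int
    has_vsan_witness: bool
    orphaned_vcls_vms: int


def deployment_blockers(c: ClusterSnapshot) -> List[str]:
    """Return the deployment-failure scenarios from this article that apply."""
    blockers = []
    if (c.free_cpu_mhz < REQUIRED_CPU_MHZ
            or c.free_memory_mb < REQUIRED_MEMORY_MB
            or c.free_storage_gb < REQUIRED_STORAGE_GB):
        blockers.append("not enough free resources in the cluster")
    # A 2 node vSAN cluster with a witness is the documented exception.
    two_node_with_witness = c.vsan_node_count == 2 and c.has_vsan_witness
    if (c.vsan_is_only_datastore and c.vsan_node_count < 3
            and not two_node_with_witness):
        blockers.append("vSAN default policy cannot be satisfied")
    if c.orphaned_vcls_vms > 0:
        blockers.append("orphaned vCLS VMs present in inventory")
    return blockers
```

An empty list means none of the three documented scenarios applies; it does not guarantee that deployment will succeed.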
https://knowledge.broadcom.com/external/article?legacyId=79892
https://knowledge.broadcom.com/external/article/312147
Products
Issue/Introduction
vSphere Cluster Services (vCLS) is a new feature in vSphere 7.0 Update 1. This feature ensures cluster services such as vSphere DRS and vSphere HA are all available to maintain the resources and health of the workloads running in the clusters.
In vSphere 7.0 Update 1, VMware has released a platform/framework that allows cluster services to run independently of the vCenter Server instance availability. In this release, vCenter Server is still required for running cluster services such as vSphere DRS, vSphere HA, etc.
Note: vSphere DRS depends on the health of the vSphere Cluster Services starting with vSphere 7.0 Update 1.
In vSphere 8.0 U3, VMware has released a newer version of vCLS known as Embedded vCLS, which will be used when both vCenter and ESXi are updated to 8.0 U3. The original version of vCLS (prior to 8.0 U3) will be referred to as External vCLS.
Environment
VMware vCenter Server 7.0.x
VMware vCenter Server 8.0.x
ESXi 7.0.x
ESXi 8.0.x
Resolution
vSphere Cluster Services
vCLS is a mandatory feature which is deployed on each vSphere cluster when vCenter Server is upgraded to 7.0 Update 1 or after a fresh deployment of vSphere 7.0 Update 1. The ESXi hosts can be of any older version that is compatible with vCenter Server 7.0 Update 1. For more information, see the vSphere Cluster Services (vCLS) section of the vSphere Resource Management Guide.
As explained in the documentation, there will be 1 to 3 vCLS VMs running on each vSphere cluster depending on the size of the cluster. vSphere DRS in a DRS-enabled cluster depends on the availability of at least 1 vCLS VM. Unlike your workload/application VMs, vCLS VMs should be treated as system VMs. Do not perform any operations on these VMs unless guided by VMware support or unless the operation is explicitly listed as supported in the documentation.
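As a rough illustration of the sizing rule, the sketch below assumes the mapping commonly documented for External vCLS in the vSphere Resource Management Guide (one VM per host, capped at three). That mapping is an assumption here, since this article only states "1 to 3", and Embedded vCLS may use different counts. Both function names are hypothetical.

```python
def expected_vcls_vm_count(host_count: int) -> int:
    """Assumed External vCLS rule: one VM per host, capped at three."""
    if host_count < 1:
        raise ValueError("a cluster needs at least one host")
    return min(host_count, 3)


def drs_can_function(running_vcls_vms: int) -> bool:
    """Per this article, DRS needs at least one vCLS VM available."""
    return running_vcls_vms >= 1
```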
There is no way to disable vCLS on a vSphere cluster and still have vSphere DRS remain functional on that cluster. However, should it be necessary, you can disable vCLS on a cluster by following the Retreat Mode steps, but this will impact some of the cluster services for that cluster.
Reference: How to Disable vCLS on a Cluster via Retreat Mode
This feature has two revisions. The first, introduced in vSphere 7.0 Update 1, is known as "External vCLS". It will be deprecated in future versions of vSphere. The second, introduced in vSphere 8.0 Update 3, is known as "Embedded vCLS". These versions serve the same overall purpose, but have different runtimes, leading to differences in behaviors and supported operations.
Size of the vCLS VMs
vSphere Cluster Service VMs are very small VMs compared to workload VMs. Each consumes 1 vCPU, 128 MB of memory and about 500 MB of storage. The table below shows the specification of these VMs:
| Memory | 128 MB |
| Memory Reservation | 100 MB |
| Swap Size | 256 MB |
| CPU | 1 |
| CPU Reservation | 100 MHz |
| Hard Disk | 2 GB |
| Ethernet Adapter | 0 (It is a No NIC VM) |
| VMDK Size | ~245 MB (thin disk) |
| Storage Space | ~480 MB (thin disk) |
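The per-VM reservations in the table line up with the cluster-level figures quoted elsewhere in this article (400 MHz of CPU and 400 MB of memory, with vCLS reserving quorum size + 1 slots). The following worked example is a sketch that assumes a quorum size of 3; the function name is hypothetical.

```python
# Per-VM reservations taken from the specification table above.
CPU_RESERVATION_MHZ = 100
MEMORY_RESERVATION_MB = 100


def reserved_capacity(quorum_size: int) -> dict:
    """Capacity reserved by vCLS: (quorum size + 1) slots per cluster."""
    slots = quorum_size + 1
    return {
        "slots": slots,
        "cpu_mhz": slots * CPU_RESERVATION_MHZ,
        "memory_mb": slots * MEMORY_RESERVATION_MB,
    }


# Assumed quorum size of 3 (the maximum number of vCLS VMs per cluster).
print(reserved_capacity(3))
# {'slots': 4, 'cpu_mhz': 400, 'memory_mb': 400}
```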
vCLS During Infrastructure Maintenance
Cluster compute maintenance (more details here - Automatic power-off of vCLS VMs during maintenance mode)
When there is only 1 host - vCLS VMs will be automatically powered-off when the single host cluster is put into Maintenance Mode, thus maintenance workflow is not blocked.
When there are 2 or more hosts - In a vSphere cluster where there is more than 1 host, and the host being considered for maintenance has running vCLS VMs, then vCLS VMs will be migrated to other hosts if there are free resources and if they have storage connectivity (shared storage). If these VMs cannot be migrated for the lack of free available resource on other hosts or if these VMs are placed in a local datastore, then these VMs will be powered off automatically to give preference to the host Maintenance Mode operation. As stated before, vSphere DRS for a cluster will not be functional where there is not at least 1 vCLS VM running in that cluster.
If you are decommissioning a cluster, then you have to put all the hosts into Maintenance Mode prior to deleting the cluster for proper clean-up of vCLS VMs. If you delete the cluster without placing the hosts in Maintenance Mode, there will be stale vCLS VMs running inside the hosts causing issues when these hosts with running VMs are added back to a new cluster.
Disconnect Host - When a host is disconnected, vCLS VMs are not cleaned up from that host because the host is not reachable. New vCLS VMs will not be created on the other hosts of the cluster, as it is not clear how long the host will remain disconnected. When the disconnected host is connected back, the vCLS VM on that host is registered again in the vCenter inventory. If a disconnected host is removed from inventory, new vCLS VMs may be created on other hosts of the cluster if quorum is not reached.
Datastore maintenance. For more information, see Impact of vSphere Cluster Services on storage workflows
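The host maintenance behavior described above for vCLS VMs can be summarized as a small decision function. This is a sketch of the rules as stated in this article, not of any vCenter code; the enum and function names are hypothetical.

```python
from enum import Enum


class VclsAction(Enum):
    MIGRATE = "migrate to another host"
    POWER_OFF = "power off automatically"


def vcls_action_on_maintenance(host_count: int,
                               other_hosts_have_free_resources: bool,
                               vm_on_shared_storage: bool) -> VclsAction:
    """What happens to a vCLS VM when its host enters Maintenance Mode."""
    # Single-host cluster: there is nowhere to migrate to.
    if host_count <= 1:
        return VclsAction.POWER_OFF
    # Multi-host cluster: migrate only if capacity and shared storage allow.
    if other_hosts_have_free_resources and vm_on_shared_storage:
        return VclsAction.MIGRATE
    # Otherwise the host Maintenance Mode operation takes preference.
    return VclsAction.POWER_OFF
```

In every power-off case, DRS stops functioning for the cluster if no other vCLS VM remains running.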
Other VMware Product Interop
SRM - Planned migration
SRM 8.3.1 is not supported with vSphere 7.0 update.
VMware Aria Operations
Capacity reclaim - The capacity optimization workflow of vRealize Operations Manager might detect vCLS VMs as idle VMs and include them in its recommendations for reclaiming capacity. If vCLS VMs are deleted as part of the reclaim workflow, the vCLS service will recreate them. The vCLS status for that cluster might briefly turn unhealthy if DRS runs before the VMs are brought back up. For more information, see the Reclaim section of the VMware Aria Operations Documentation. The recommended option is to exclude these VMs from the capacity reclaim workflow. These VMs can be identified by their names (vCLS) or by looking at additional properties as explained in the documentation.
Cross cluster services - vRealize Operations Manager Workload Placement (WLP) workflows might be impacted if DRS is not functional, due to unhealthy vCLS, on the cluster where WLP is recommending the placement of workloads.
vSAN
vRealize Automation
vCLS should not impact any partner workflows such as backup or monitoring. Since these VMs are managed by vCLS, there is no reason to configure backup for them; restoring them from backup during a recovery operation is unnecessary and might fail. These VMs can be identified using APIs as listed under the “Identifying vCLS VMs” section.
Products/solutions without any interop issues
VMware Cloud Foundation - Cloud Builder and SDDC Manager are not impacted; the vRA, vROps and vSAN impact is addressed above
NSX Data Center for vSphere
NSX-T Data Center for vSphere
vCPP
vCD
vCDA
VxRail
Horizon Enterprise
Partners Impact
vCLS should not impact any partner workflows such as backup or monitoring. Since these VMs are managed by vCLS, there is no reason to configure backup for them; restoring them from backup during a recovery operation is unnecessary and might fail. These VMs can be differentiated via API using the following additional property:
vm.config.extraConfig["HDCS.agent"] = "true"
This is the most reliable way to identify a general vCLS VM. A previous version of this article also included using the VM's "ManagedByInfo" to identify it as vCLS. This was not incorrect, but it only works for External vCLS, as Embedded vCLS uses a different value. For more extensive information on identifying vCLS VMs and differentiating their type, refer to Script Identification for Embedded vCLS has Changed Identifiers Including ManagedByInfo.
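As an illustration of the extraConfig check, the sketch below tests a list of key/value entries for the HDCS.agent property. OptionValue here is a local stand-in with key and value attributes, which is how extraConfig entries are typically exposed by vSphere SDKs; it is not imported from any SDK. Treating the value case-insensitively is an assumption, as this article only shows "true".

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OptionValue:
    """Local stand-in for one extraConfig entry (key/value pair)."""
    key: str
    value: str


def is_vcls_vm(extra_config: Iterable[OptionValue]) -> bool:
    """True if extraConfig contains HDCS.agent set to "true"."""
    for option in extra_config:
        # Case-insensitive comparison is an assumption, not documented here.
        if option.key == "HDCS.agent" and str(option.value).lower() == "true":
            return True
    return False
```

A backup or monitoring integration could use such a check to skip vCLS VMs when building its list of VMs to process.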
Attachments
https://knowledge.broadcom.com/external/article?legacyId=91890
Products
Issue/Introduction
There is no way to disable vCLS on a vSphere cluster and still have vSphere DRS remain functional on that cluster.
However, should it be necessary, you can disable vCLS on a cluster by following the Retreat Mode steps below, but this will impact some of the cluster services for that cluster.
Impact/Risks:
Note: Retreat Mode should be used with extra caution and should be used only for the purposes mentioned in this document. Below are the details of the impacted cluster services due to the enablement of Retreat Mode on a cluster:
vSphere DRS will not function on that cluster if DRS is enabled for that cluster. That means workloads running inside that cluster are not load-balanced, hence will not be migrated to different hosts within the cluster when the current host running that VM is running out of resources. When a user wants to take down a host for maintenance, running VMs will not be automatically migrated to other hosts within that cluster.
vSphere HA will not perform optimal placement during a host failure scenario as HA depends on DRS for placement recommendations. HA will still power-on the VMs but these VMs might be powered on in a less optimal host.
Environment
VMware vCenter Server 7.0.x
Resolution
Retreat Mode Steps
Note: Starting in vSphere 7.0 U3o and 8.0 U2, entering Retreat Mode is available as a Cluster setting within the vCenter Server UI.
vSphere 7.0 U3o/8.0 U2 and Later
Log into vCenter's HTML5 client
In Hosts and Clusters inventory, select a cluster.
Click on the Configure tab.
Under vSphere Cluster Services, select General.
In the top right, click on EDIT VCLS MODE.
In the Edit vCLS Mode pop up window, click on the second radio option Retreat Mode.
Click OK.
For Versions Prior to vSphere 7.0 U3o and 8.0 U2, Using the vSphere Client
Log in to the vSphere Client.
Navigate to the cluster on which vCLS should be disabled. Copy the cluster domain id from the URL of the browser. It should be similar to 'domain-c<number>', not the entire string.
Note: You only need to copy the domain-c<number> part of the URL. For example, when you navigate to the cluster in the vSphere Client, the URL will be similar to this: https://<fqdn-of-vCenter-server>/ui/app/cluster;nav=h/urn:vmomi:ClusterComputeResource:domain-c1006:ce4a7b9f-768c-2222-3333-############/summary. You only need to copy domain-c1006 to use in the steps below.
Using other values, for example the cluster UUID, or a combination of the cluster ID and the UUID, will result in vpxd failing to start when you next restart it. Therefore, be careful to use only the ID domain-c<number>.
If you have already added the wrong value by accident, causing vpxd to no longer start, you can remove the vCLS Retreat Mode settings from the vpxd.cfg configuration file. Take a backup of vpxd.cfg, then run the following command:
# sed '/<vcls>/,/<\/vcls>/d' -i /etc/vmware-vpx/vpxd.cfg
This will remove all Retreat Mode settings from all of the clusters in this vCenter, but it will allow vpxd to start again.
Navigate to the vCenter Server and then to Configure tab.
Click the Advanced Settings section, then click the Edit Settings button.
Add a new entry with name = config.vcls.clusters.domain-c<number>.enabled and value = False.
Note: True and False are case insensitive, so any case of these two values should be accepted.
Click Save.
The vCLS monitoring service will initiate the clean-up of vCLS VMs, and VM deletion tasks will start to appear.
If this cluster has DRS enabled, then it will not be functional and additional warning will be displayed in the cluster summary. DRS will be disabled until vCLS is re-enabled on this cluster.
To remove Retreat Mode from the cluster, change the value to True in step 5 above.
Note: True and False are case insensitive, so any case of these two values should be accepted.
Once you configure Retreat Mode on a cluster, the entry for the cluster will stay in the vCenter Advanced Settings. There is no way to delete this entry from the vSphere Client, but keeping it causes no issues.
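Because using anything other than domain-c<number> can prevent vpxd from starting, it may help to extract and validate the ID programmatically before typing it into Advanced Settings. The helpers below are a hypothetical sketch: they only parse the browser URL and build the setting name described in the steps above, and they do not talk to vCenter.

```python
import re

# Matches the cluster ID inside the vSphere Client URL, stopping before the UUID.
_ID_IN_URL = re.compile(r"ClusterComputeResource:(domain-c\d+)")
_VALID_ID = re.compile(r"domain-c\d+")


def cluster_domain_id(client_url: str) -> str:
    """Extract only the domain-c<number> part from a vSphere Client URL."""
    match = _ID_IN_URL.search(client_url)
    if match is None:
        raise ValueError("no ClusterComputeResource ID found in URL")
    return match.group(1)


def retreat_mode_setting(domain_id: str, vcls_enabled: bool) -> tuple:
    """Build the (name, value) pair for the vCenter advanced setting."""
    # Reject UUIDs or ID:UUID combinations, which the article warns against.
    if not _VALID_ID.fullmatch(domain_id):
        raise ValueError("expected an ID of the form domain-c<number>")
    name = f"config.vcls.clusters.{domain_id}.enabled"
    return name, "True" if vcls_enabled else "False"
```

For the example URL in step 2, cluster_domain_id returns domain-c1006, and retreat_mode_setting("domain-c1006", False) yields the name and value that put the cluster into Retreat Mode.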
Using APIs/CLIs
Use the attached retreatModeConfiguration.py script to configure retreat mode on multiple clusters on the VC.
Usage: python retreatModeConfiguration.py -r disable or python retreatModeConfiguration.py -r enable
Identifying vCLS VMs
In the vSphere Client UI, vCLS VMs are named vCLS (<number>), where the number field is auto-generated. All vCLS VMs in a datacenter are visible in the VMs and Templates tab of the vSphere Client, inside a VMs and Templates folder named vCLS.
If you click on the summary of these VMs, you will see a banner which reads vSphere Cluster Service VM is required to maintain the health of vSphere Cluster Services. Power state and resource of this VM is managed by vSphere Cluster Services, along with a Learn More link which takes you to the KB article.
Using vSphere Managed Object Browser (MOB)
Identifying all the vCLS VMs for a given datacenter
Sample MOB query examples:
Replace the IP address and moid with those of a vCLS VM in these sample queries:
https://<IP address>/mob/?moid=vm-1004&doPath=config.managedBy
https://<IP address>/mob/?moid=vm-1004&doPath=config.extraConfig%5b%22HDCS.agent%22%5d
Replace the IP address and moid with those of a VM folder named vCLS in this sample query:
https://<IP address>/mob/?moid=group-v16
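The sample MOB queries follow a fixed pattern, so they can be generated rather than edited by hand. The helper below is a hypothetical sketch that only builds the URL string, percent-encoding the brackets and quotes in doPath; it does not send any request.

```python
from urllib.parse import quote


def mob_url(host: str, moid: str, do_path: str = "") -> str:
    """Build a Managed Object Browser URL for a given object and property."""
    url = f"https://{host}/mob/?moid={quote(moid, safe='')}"
    if do_path:
        # Brackets and double quotes in the property path must be encoded.
        url += f"&doPath={quote(do_path, safe='')}"
    return url


# Hypothetical host name and object IDs, mirroring the samples above.
print(mob_url("vc.example.com", "vm-1004", "config.managedBy"))
print(mob_url("vc.example.com", "vm-1004", 'config.extraConfig["HDCS.agent"]'))
print(mob_url("vc.example.com", "group-v16"))
```

Python emits uppercase hex digits (%5B, %22, %5D), which are equivalent to the lowercase forms shown in the sample queries.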