vRA with Cloud-init and Static Networking

Overview

This article covers the static network assignment construct in vRA Cloud/8 Cloud Assembly blueprints and the impact it currently has on provisioning vSphere VMs.

This topic causes a lot of head scratching. If you use static address assignment (i.e. asking vRA to assign an IP address to a VM from one of its own address pools and set it up as a static entry in the guest), you generally end up breaking any cloud-init commands you also want to run.

Why Is There An Issue?

To appreciate why there is an issue you need to understand what the static assignment construct in Cloud Assembly does. Any time you add “assignment: static” to a vSphere VM spec within a blueprint's YAML, you are telling Cloud Assembly to generate a dynamic vSphere customization spec and pass it to vCenter, which in turn loads it into the guest VM. The spec is then processed via VMware Tools.
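
To make that concrete, here is a minimal sketch of where the setting sits in a blueprint. The resource names and the image and flavor values are hypothetical placeholders, not taken from my environment, so adjust them to match your own mappings.

```yaml
# Sketch only: resource names, image and flavor are placeholder assumptions.
resources:
  Cloud_vSphere_Network_1:
    type: Cloud.vSphere.Network
    properties:
      networkType: existing
  Cloud_vSphere_Machine_1:
    type: Cloud.vSphere.Machine
    properties:
      image: centos76        # hypothetical image mapping name
      flavor: small          # hypothetical flavor mapping name
      networks:
        - network: '${resource.Cloud_vSphere_Network_1.id}'
          # This is the line that triggers the dynamic customization spec
          assignment: static
```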

When a vSphere customization spec is applied to a VM, the final operation is to reboot the machine. You can see this natively within vSphere by cloning a VM using a spec from your vCenter's inventory and watching what happens.

When you add cloud-init into the mix alongside an auto-generated customization spec, you end up with two mechanisms that are not aware of one another. vCenter processes the dynamic spec generated by Cloud Assembly, sets the IP details and then reboots the machine. At the same time, cloud-init is executing its commands and can be interrupted by the automatic reboot. Because cloud-init only executes the commands passed to it via Cloud Assembly on first boot, there is no recovery when the VM restarts. The result is a VM that has the correct identity but none of the customization (i.e. app config, disk config etc.) you were hoping for.

Using “assignment: dynamic” does not have this issue; however, DHCP usually does not fit most customer requirements.

How Do We Fix This?

Well, there are several approaches.

  • Use “assignment: static” together with the Ansible integration and let Ansible handle everything you wanted to do in cloud-init
  • As above, except via Puppet
  • Use “assignment: static” with a modified template configuration AND leverage cloud-init to handle the identity/networking configuration

It is this third approach that I cover in this article, using a CentOS 7.6 template machine. One warning: some OS variants may behave differently, so test first before settling on cloud-init as your approach.

Modifying The Template

Just asking cloud-init to handle the network configuration is not enough. If we only add settings to the cloudConfig section of the blueprint YAML, we still end up in the same scenario, with vSphere restarting the VM before cloud-init has finished. To combat this we need to tell cloud-init not to allow any vSphere customization to run. We do this by editing “/etc/cloud/cloud.cfg” and flipping the “disable_vmware_customization” attribute to “true”.

We also need to tell cloud-init not to go looking for network configuration to apply from any of the cloud-init configuration files on the machine (or OVF settings).

This may seem counterintuitive but it actually makes sense. Cloud-init applies the network configuration from the datasource once, on first boot (the datasource here is OVF, as that is the standard vRA uses). This configuration is provided dynamically and is not provided again once the machine reboots. On each start-up cloud-init looks for that configuration, and for any other config file that might contain one (in /etc/cloud/cloud.cfg.d), and will not find it unless your template specifically contains additional config files. If no cloud-init network config is found and no disable option is specified, cloud-init defaults to its fallback behaviour, which is to use DHCP. By specifying the “disable” option we tell cloud-init not to touch the network on each subsequent start-up, which allows the guest OS to keep using the config that was applied on first run.
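
As a rough sketch, the two template changes would look something like the fragment below in /etc/cloud/cloud.cfg. Since we are flipping “disable_vmware_customization”, it should already be present in the file, so edit it in place rather than adding a duplicate key. The “network” stanza is the usual cloud-init way of expressing the “disable” option, but check it against the cloud-init version in your own template.

```yaml
# /etc/cloud/cloud.cfg (fragment, not the whole file)

# Stop the vSphere customization spec from being applied to the guest
disable_vmware_customization: true

# Do not look for or regenerate network config on subsequent boots;
# without this cloud-init falls back to DHCP when no config is found
network:
  config: disabled
```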

As usual, cloud-init needs to be cleaned (“sudo cloud-init clean”) and its logs cleared out before shutting down the VM and converting it back to a template. If you don’t do this, cloud-init will not execute when a new machine is built from the template and the logs will be very messy!

Modifying The Blueprint

Now that the template has been amended we can turn our attention to the blueprint in Cloud Assembly. Specifically, the cloudConfig section needs to be amended.

Because the cloud.cfg update means none of the identity customization will be done by vSphere, all of the commands to do it must be added to cloudConfig for cloud-init to process. You could do this by adding a series of OS-level commands to the generic runcmd module section, or you can use the specific cloud-init module intended for each job. Here we are going to use specific cloud-init modules.

First is setting the hostname of each VM provisioned. In this example I am taking the name of the resource auto-assigned by vRA, using both the hostname and fqdn modules to populate the VM configuration. You could equally specify an input here instead if your request normally prompts the user for a hostname.
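
A minimal sketch of that hostname configuration follows. The domain suffix is a made-up placeholder, and I am assuming the “${self.resourceName}” binding resolves to the vRA-assigned machine name in your version, so verify that before relying on it.

```yaml
cloudConfig: |
  #cloud-config
  # Short hostname taken from the name vRA assigned to the resource
  # (the self.resourceName binding is an assumption, check your vRA version)
  hostname: ${self.resourceName}
  # example.local is a placeholder; replace it with your own DNS suffix
  fqdn: ${self.resourceName}.example.local
```

If your request form prompts for a hostname, the same two keys could take an input binding instead of the resource name.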

Next I’m adding the address and gateway options using the “network” module. Specifying the version number is mandatory because the network module supports more than one configuration format, and the version number controls which one is used (e.g. netplan-style configuration requires version 2).

The config is matched to the interface by using the “name” attribute. In my template the first adapter is called “ens192” so that’s what I am specifying here. All other details are then added using the “subnets” attribute, containing the type of assignment, address and gateway attributes.

The values for the attributes are set by using resource bindings to access the configuration of the relevant deployed machine and the network it is attached to. Note that a VM might have more than one interface, so it is necessary to specify the interface number in square brackets.
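
Putting those pieces together, the network section might look something like the sketch below. The resource names and the exact binding property names (“networks[0].address” on the machine and “gateway” on the network) are assumptions for illustration, so confirm them against what your vRA version exposes. Depending on your network you may also need a “netmask” entry alongside the address.

```yaml
cloudConfig: |
  #cloud-config
  network:
    version: 1                  # selects the version 1 config format
    config:
      - type: physical
        name: ens192            # first adapter name in my template
        subnets:
          - type: static
            # [0] selects the first interface on the machine;
            # resource and property names here are assumptions
            address: ${resource.Cloud_vSphere_Machine_1.networks[0].address}
            gateway: ${resource.Cloud_vSphere_Network_1.gateway}
```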

The Result

Following all these updates my VM now comes up with the right hostname.

The IP address and gateway details are consistent across the IP range covered by the network, the address shown in the deployments view and the details visible within the guest OS. (Note that my DHCP range on this network is 192.168.110.220-240.)

In addition, my other cloud-init commands have also executed, in this case partitioning /dev/sdb, formatting /dev/sdb1 with XFS and mounting it to /mnt/data.

It is well worth reading the cloud-init documentation (https://cloudinit.readthedocs.io/en/latest/) to see what else you can do!