Deploying a production-ready Azure Kubernetes Service (AKS) cluster with PSAksDeployment

02/04/2019 | 11-minute read

Introducing PSAksDeployment

Azure Kubernetes Service (AKS) makes provisioning Kubernetes clusters very easy.
Unfortunately, the examples we can find out there, be it in official documentation or blog posts, are more “Hello World!” than “real world”.

Deploying a production-ready Kubernetes cluster requires additional components and considerations, such as:

  • Monitoring
  • Kubectl configuration
  • How to deploy resources (Helm and Tiller)
  • Routing requests from the outside world to services in the cluster (ingress controller)
  • Issuing and managing TLS certificates for HTTPS endpoints

PSAksDeployment aims to bridge the gap between a “Hello World!” AKS cluster and a cluster on which we can run production apps.

It is an opinionated implementation, in the sense that:

  • The monitoring solution is Azure Monitor (with Log Analytics)
  • The ingress controller is NGINX Ingress Controller
  • Management Kubernetes resources are deployed into a “management” namespace
  • The solution to manage TLS certificates is cert-manager (with Let’s Encrypt)
  • The ingress controller’s TLS certificate is propagated to other namespaces (including namespaces created later) by a custom tool, secret-propagator, so that ingresses in any namespace can use it

Installing PSAksDeployment

To use PSAksDeployment, you need:

This means it is only supported on Windows at this point in time. I may make it work with PowerShell Core if there is a need for it.

PSAksDeployment is available on the PowerShell Gallery, so installing it is as easy as:

Install-Module -Name 'PSAksDeployment' -Repository 'PSGallery'

PSAksDeployment relies on a few extra tools under the hood:

So PSAksDeployment provides the Install-PSAksPrerequisites command, which downloads and installs these tools (if they are not already installed) into the folder specified by the InstallationFolder parameter, like so:

Install-PSAksPrerequisites -InstallationFolder 'C:\Tools'

If the specified installation folder is not in the PATH environment variable, it also takes care of adding it.
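To make the PATH behaviour concrete, here is a minimal sketch of the check-then-append logic, assuming Windows-style entries separated by semicolons. Add-FolderToPathValue is a hypothetical helper written for illustration, not a PSAksDeployment command; it only computes the new value, and the module's actual implementation may differ.

```powershell
# Hypothetical helper (not part of PSAksDeployment): returns a PATH value that
# contains $Folder, appending it only if it is not already listed.
function Add-FolderToPathValue {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$PathValue,

        [Parameter(Mandatory)]
        [string]$Folder
    )

    # Windows separates PATH entries with ';' (this sketch targets Windows only)
    $Entries = @($PathValue -split ';' | Where-Object { $_ })

    # Compare without trailing slashes; -eq is case-insensitive, like Windows paths
    $Normalized = $Folder.TrimEnd('\', '/')
    $AlreadyPresent = $Entries | Where-Object { $_.TrimEnd('\', '/') -eq $Normalized }

    if ($AlreadyPresent) {
        return $PathValue
    }
    return ($Entries + $Folder) -join ';'
}
```

To persist the result for the current user, the returned value could be passed to [Environment]::SetEnvironmentVariable('Path', $NewPath, 'User').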

Note:

Be patient: the file downloads may take a while.

As soon as Install-PSAksPrerequisites completes, we can start deploying stuff.

Deploying an AKS Cluster

This is where the command Invoke-PSAksDeployment comes in.

It deploys the following:

It primarily acts as an input validation and orchestration layer. Under the hood, most of the work is done by Terraform.

Given the scope of what it does, Invoke-PSAksDeployment takes a large number of parameters:

PS C:\> Import-Module -Name 'PSAksDeployment'
PS C:\> (Get-Command -Name 'Invoke-PSAksDeployment').Parameters.Keys
ServicePrincipalID
ServicePrincipalSecret
AzureTenantID
Subscription
ClusterName
ClusterLocation
LogAnalyticsWorkspaceLocation
KubernetesVersion
NodeCount
NodeSize
OSDiskSizeGB
MaxPodsPerNode
Environment
LetsEncryptEmail
TerraformOutputFolder
ConfigPath
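As a side note, the Parameters collection returned by Get-Command carries more than names: each entry's Attributes include a ParameterAttribute whose Mandatory property answers one of the questions below. The hypothetical Get-MandatoryParameterName function sketches this using only built-in PowerShell metadata. It does not reveal default values, which still require Get-Help or the configuration file.

```powershell
# Hypothetical helper: lists the parameters of a command that are mandatory
# in at least one of its parameter sets, using PowerShell's own metadata.
function Get-MandatoryParameterName {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$CommandName
    )

    $Command = Get-Command -Name $CommandName -ErrorAction Stop
    foreach ($Parameter in $Command.Parameters.Values) {
        # Each [Parameter()] declaration surfaces as a ParameterAttribute
        $MandatoryAttribute = $Parameter.Attributes | Where-Object {
            $_ -is [System.Management.Automation.ParameterAttribute] -and $_.Mandatory
        }
        if ($MandatoryAttribute) {
            $Parameter.Name
        }
    }
}
```

Once the module is imported, Get-MandatoryParameterName -CommandName 'Invoke-PSAksDeployment' should list its mandatory parameters. Note that a parameter counts as mandatory here if it is mandatory in at least one parameter set.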

You may be wondering:

What are these for?
Which ones are mandatory?
What is the default value?
What are the possible values?

We can get this information from the cmdlet's help, for example:

PS C:\> Get-Help 'Invoke-PSAksDeployment' -Parameter 'ServicePrincipalID'

-ServicePrincipalID <String>
    The application ID of the Service Principal used by Terraform (and the AKS cluster) to access Azure.

    Required?                    true
    Position?                    1
    Default value
    Accept pipeline input?       false
    Accept wildcard characters?  false

But doing this for each parameter is tedious, so there is another way to specify all the inputs to Invoke-PSAksDeployment: a configuration file.

New-PSAksDeploymentConfig scaffolds this configuration file, with helpful information for each parameter:

  • a description
  • the data type
  • valid values
  • the default value

Here is an example:

PS C:\> $ScaffoldParams = @{
>>     ServicePrincipalID     = '29x1ecd3-190f-42c9-8660-088f69d121zn'
>>     ServicePrincipalSecret = 'tsWpRr6/YCxNyh8efMvjWbe5JoOiOw03xR1o9S5CLhZ='
>>     AzureTenantID          = '96v3b174-9c1p-4a5e-9177-18c3bccc87cb'
>>     Subscription           = 'DevOps'
>>     ClusterLocation        = 'North Europe'
>>     Path                   = '.\my-k8s-prod.psd1'
>> }
PS C:\> New-PSAksDeploymentConfig @ScaffoldParams

We still need to provide a few pieces of information, because the tool uses them to connect to Azure and fetch more information, such as:

  • Subscriptions the specified Azure Service Principal has access to
  • Azure regions where AKS is available
  • Kubernetes versions available in the specified region
  • Azure regions where a Log Analytics workspace can be provisioned

Here is what the generated file looks like:

@{
    <#
    The name of the Azure subscription where the AKS instance (and other Azure resources) will be deployed.
    Type : String
    Valid values : "DevOps", "ANY OTHER SUBSCRIPTION WHICH CAN BE ACCESSED BY THE SERVICE PRINCIPAL"
    #>
    Subscription = ""

    <#
    The Azure region where the AKS cluster (and other Azure resources) will be deployed.
    Type : String
    Valid values : "East US", "West Europe", "Central US", "Canada Central", "Canada East", "UK South", "West US", "West US 2", "Australia East", "North Europe", "Japan East", "East US 2", "Southeast Asia", "UK West", "South India", "East Asia"
    #>
    ClusterLocation = ""

    <#
    The Azure region where the Log Analytics workspace will be deployed.
    This might not be possible to provision the Log Analytics workspace in the same region as the AKS cluster, because Log Analytics is available in a limited set of regions.
    Type : String
    Valid values : "East US", "West Europe", "Southeast Asia", "Australia Southeast", "West Central US", "Japan East", "UK South", "Central India", "Canada Central", "West US 2"
    #>
    LogAnalyticsWorkspaceLocation = ""

    <#
    The application ID of the Service Principal used by Terraform (and the AKS cluster) to access Azure.
    Type : String
    Valid values : 
    #>
    ServicePrincipalID = ""

    <#
    The password of the Service Principal used by Terraform (and the AKS cluster) to access Azure.
    Type : String
    Valid values : 
    #>
    ServicePrincipalSecret = ""

    <#
    The ID of the Azure AD tenant where the Terraform Service Principal (and the target subscription) live.
    Type : String
    Valid values : 
    #>
    AzureTenantID = ""

    <#
    The name of the AKS cluster.
    The name of the resource group and the cluster DNS prefix are derived from this value.
    Type : String
    Valid values : The name can contain only letters, numbers, and hyphens. The name must start with a letter and must end with a letter or a number.
    #>
    ClusterName = ""

    <#
    The version of Kubernetes software running in the AKS Cluster.
    Type : String
    Valid values : "1.8.14", "1.8.15", "1.9.10", "1.9.11", "1.10.8", "1.10.9", "1.11.4", "1.11.5"
    #>
    KubernetesVersion = "1.11.5"

    <#
    The number of worker nodes in the AKS cluster.
    Type : Int32
    Valid values : Between 1 and 100.
    #>
    NodeCount = 3

    <#
    The VM size for the AKS cluster nodes.
    This is more descriptive version of Azure VM sizes, it follows a naming convention as :
    {VM Family}_{Number of vCPUs}_{Number of GB of RAM}
    Type : String
    Valid values : "B_2vCPU_8GB", "B_4vCPU_16GB", "D_2vCPU_8GB", "D_4vCPU_16GB", "D_8vCPU_32GB", "E_2vCPU_16GB", "E_4vCPU_32GB", "F_2vCPU_4GB", "F_4vCPU_8GB", "DS_2vCPU_7GB", "DS_4vCPU_14GB"
    #>
    NodeSize = "D_2vCPU_8GB"

    <#
    The OS disk size (GB) for the cluster nodes.
    If set to 0, the default osDisk size for the specified vmSize is applied.
    Type : Int32
    Valid values : Between 0 and 1024.
    #>
    OSDiskSizeGB = 30

  ...
}

(Output cut for brevity)
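The NodeSize convention described in the generated file, {VM Family}_{Number of vCPUs}_{Number of GB of RAM}, is regular enough to decode mechanically. Here is a minimal sketch, ConvertFrom-PSAksNodeSize, a hypothetical helper that is not part of the module. It only parses the friendly name and does not map it back to the underlying Azure VM size.

```powershell
# Hypothetical helper: decodes the NodeSize convention described in the generated
# configuration file: {VM Family}_{Number of vCPUs}_{Number of GB of RAM}
function ConvertFrom-PSAksNodeSize {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$NodeSize
    )

    # Family = letters, then "<n>vCPU", then "<n>GB"
    if ($NodeSize -match '^(?<Family>[A-Za-z]+)_(?<VCpu>\d+)vCPU_(?<RamGB>\d+)GB$') {
        return [PSCustomObject]@{
            Family = $Matches['Family']
            VCpu   = [int]$Matches['VCpu']
            RamGB  = [int]$Matches['RamGB']
        }
    }
    throw "'$NodeSize' does not follow the {VM Family}_{vCPUs}_{GB of RAM} convention."
}
```

For instance, dividing RamGB by VCpu across the valid values shows which families are memory-oriented (E sizes: 8 GB per vCPU) and which are compute-oriented (F sizes: 2 GB per vCPU).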

As we can see in the generated file, settings that have a default value are already set to it; the others are set to an empty string.
We can fill in and adjust the values to our needs, and when the file is ready, we pass it to Invoke-PSAksDeployment, like so:

PS C:\> Invoke-PSAksDeployment -ConfigPath '.\my-k8s-prod.psd1'
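Before starting a deployment that can take half an hour, it can be worth catching obvious mistakes in the file. The sketch below, a hypothetical Test-PSAksConfigFile function (not part of the module), loads the file with Import-PowerShellDataFile (PowerShell 5.0 or later), reports string settings still left empty, and checks ClusterName against the naming rule quoted in the generated file. Invoke-PSAksDeployment performs its own input validation, so this only illustrates the idea.

```powershell
# Hypothetical helper: performs a few sanity checks on a PSAksDeployment
# configuration file before it is passed to Invoke-PSAksDeployment.
function Test-PSAksConfigFile {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # .psd1 files are hashtable literals (comments are allowed)
    $Config = Import-PowerShellDataFile -Path $Path
    $Problems = [System.Collections.Generic.List[string]]::new()

    # The scaffolded file uses "" for settings without a default value
    foreach ($Key in $Config.Keys) {
        if ($Config[$Key] -is [string] -and [string]::IsNullOrWhiteSpace($Config[$Key])) {
            $Problems.Add("$Key is empty")
        }
    }

    # Rule quoted in the generated file: letters, numbers and hyphens,
    # starting with a letter and ending with a letter or a number
    $ClusterName = $Config['ClusterName']
    if (-not [string]::IsNullOrWhiteSpace($ClusterName) -and
        $ClusterName -notmatch '^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?$') {
        $Problems.Add("ClusterName '$ClusterName' does not follow the naming rule")
    }

    # One string per problem; no output means no problem was found
    $Problems
}
```

Running Test-PSAksConfigFile -Path '.\my-k8s-prod.psd1' and getting no output would mean the file passed these basic checks.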

While the deployment is in progress, there is quite a lot of logging written to the console, some of which will look very familiar to those who use Terraform.

The overall deployment duration depends on many variables, some of which pertain to the Azure infrastructure/platform.
That being said, in my experience, it takes between 20 and 40 minutes.

When it completes, we can take a look at the deployed resources in the Azure portal, but what we see in the resource group (my-k8s-prod-rg in this case) is somewhat misleading:

(Screenshot: the my-k8s-prod-rg resource group in the Azure portal)

AKS creates another resource group (MC_my-k8s-prod-rg_my-k8s-prod_northeurope in this case), which contains the infrastructure resources associated with the cluster, such as the Kubernetes node VMs, virtual network, load balancer, and storage:

(Screenshot: the infrastructure resource group created by AKS)
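The names of both resource groups can be predicted from the inputs. Going by the example above, the target resource group is the cluster name with an -rg suffix, and AKS names its infrastructure group MC_{resource group}_{cluster name}_{location}, with the location in its short form (North Europe becomes northeurope). The hypothetical helper below encodes those two observations; the lower-case, no-spaces location conversion is an assumption that appears to hold for the region names listed in the configuration file, not an official mapping.

```powershell
# Hypothetical helper: predicts the two resource group names for a cluster.
# The "-rg" suffix is taken from this post's example; "MC_<rg>_<cluster>_<location>"
# is the AKS naming convention for the infrastructure group. The location
# conversion (lower-case, spaces removed) is an assumption.
function Get-PSAksResourceGroupName {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$ClusterName,

        [Parameter(Mandatory)]
        [string]$ClusterLocation
    )

    $ResourceGroup      = '{0}-rg' -f $ClusterName
    $LocationName       = ($ClusterLocation -replace '\s', '').ToLowerInvariant()
    $InfraResourceGroup = 'MC_{0}_{1}_{2}' -f $ResourceGroup, $ClusterName, $LocationName

    [PSCustomObject]@{
        ResourceGroup      = $ResourceGroup
        InfraResourceGroup = $InfraResourceGroup
    }
}
```

Knowing both names is handy before tearing a cluster down, for example to list what each group contains with Get-AzResource -ResourceGroupName from the Az module.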

Also, the usual Kubernetes management tools are ready to work with our new cluster.
For example, we can use our trusty kubectl to list the deployments in the “management” namespace:

PS C:\> kubectl get deployment -n management
NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
cert-manager                    1         1         1            1           33m
nginx-ingress-controller        2         2         2            2           40m
nginx-ingress-default-backend   1         1         1            1           40m
secret-propagator               1         1         1            1           32m

Or we can use helm to list all Helm releases:

PS C:\> helm ls
NAME                 REVISION    UPDATED                     STATUS      CHART                   APP VERSION   NAMESPACE
cert-manager         1           Mon Jan 28 10:35:08 2019    DEPLOYED    cert-manager-v0.5.2     v0.5.2        management
cluster-issuer       1           Mon Jan 28 10:35:18 2019    DEPLOYED    cluster-issuer-0.1.0    1.0           default
nginx-ingress        1           Mon Jan 28 10:28:41 2019    DEPLOYED    nginx-ingress-1.1.5     0.21.0        management
secret-propagator    1           Mon Jan 28 10:36:32 2019    DEPLOYED    secret-propagator-0.1.0 1.0           default
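Building on the kubectl example above, a scripted readiness check can compare desired and available replicas from kubectl's JSON output instead of eyeballing the table. Get-NotReadyDeployment is a hypothetical helper; it relies only on the standard spec.replicas and status.availableReplicas fields of Deployment objects.

```powershell
# Hypothetical helper: returns the names of deployments whose available replicas
# are below the desired count, given "kubectl get deployment -o json" output.
function Get-NotReadyDeployment {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Json
    )

    $List = $Json | ConvertFrom-Json
    foreach ($Item in $List.items) {
        # status.availableReplicas is omitted when no replica is available; [int]$null is 0
        $Desired   = [int]$Item.spec.replicas
        $Available = [int]$Item.status.availableReplicas
        if ($Available -lt $Desired) {
            $Item.metadata.name
        }
    }
}
```

Typical usage would be to pipe kubectl get deployment -n management -o json through Out-String and pass the result as -Json; no output means every deployment has all its replicas available.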

Deleting an AKS Cluster

An AKS cluster deployed with Invoke-PSAksDeployment may later need to be deprovisioned.

In that case, the Remove-PSAksDeployment cmdlet automates tearing down the Azure Kubernetes Service instance and all associated resources, so that they stop incurring Azure charges.

Warning:

This deletes all resources in both resource groups: the target resource group and the infrastructure resource group created by AKS.
Keep this in mind, especially if any resources were added to these resource groups outside of PSAksDeployment.

Here is an example:

PS C:\> $DestroyParams = @{
>>     ServicePrincipalID     = '29x1ecd3-190f-42c9-8660-088f69d121zn'
>>     ServicePrincipalSecret = 'tsWpRr6/YCxNyh8efMvjWbe5JoOiOw03xR1o9S5CLhZ='
>>     AzureTenantID          = '96v3b174-9c1p-4a5e-9177-18c3bccc87cb'
>>     Subscription           = 'DevOps'
>>     ClusterName            = 'my-k8s-prod'
>> }
PS C:\> Remove-PSAksDeployment @DestroyParams

Zooming In On a Few Terraform Resources

Now that we know how to use PSAksDeployment, let’s look at a few sections of the Terraform configuration that are worth explaining.

resource "null_resource" "helm_init" {
  provisioner "local-exec" {
    command = "helm init --wait --replicas ${var.tiller_replica_count} --tiller-namespace kube-system --service-account=${kubernetes_service_account.tiller.metadata.0.name}"
  }

  depends_on = ["kubernetes_cluster_role_binding.tiller"]
}

This resource simply runs a command on the local machine.
The helm init command installs and configures Tiller (the server-side component of Helm) in the Kubernetes cluster.

After this command completes, we can use Helm against our Kubernetes cluster to deploy components and applications via Helm charts.
Helm charts are essentially packages describing which Kubernetes resources to deploy (pods, services, etc.) and how to deploy them. We use Helm charts in the subsequent steps.
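For readability, here is the same helm init invocation assembled as a PowerShell argument array, roughly what the local-exec provisioner runs. Get-HelmInitArgument is hypothetical, and the values used with it below (2 replicas, a service account named tiller) are illustrative assumptions, since the post does not show the Terraform variable values. These flags belong to Helm 2: --wait blocks until Tiller is running, --replicas sets the number of Tiller pods, and --service-account makes Tiller run under the account granted permissions by the tiller cluster role binding.

```powershell
# Hypothetical helper: builds the argument list of the Helm 2 "helm init" command
# run by the null_resource.helm_init provisioner.
function Get-HelmInitArgument {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [int]$TillerReplicaCount,

        [Parameter(Mandatory)]
        [string]$ServiceAccountName,

        [string]$TillerNamespace = 'kube-system'
    )

    @(
        'init'
        '--wait'                                  # block until Tiller is running
        '--replicas', $TillerReplicaCount         # number of Tiller pods
        '--tiller-namespace', $TillerNamespace    # namespace Tiller is installed into
        "--service-account=$ServiceAccountName"   # account Tiller runs as
    )
}
```

The array can be splatted to the native executable, for example $HelmArgs = Get-HelmInitArgument -TillerReplicaCount 2 -ServiceAccountName 'tiller' followed by & helm @HelmArgs.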

resource "helm_release" "nginx_ingress" {
  name      = "nginx-ingress"
  chart     = "stable/nginx-ingress"
  namespace = "${kubernetes_namespace.management.metadata.0.name}"

  // Giving Azure 10min to create a load-balancer and assign the Public IP to it
  timeout    = "600"
  depends_on = ["null_resource.helm_init"]

  values = [<<EOF
  controller:
    replicaCount: ${var.ingressctrl_replica_count}
    service:
      loadBalancerIP: "${var.ingressctrl_ip_address}"
EOF
  ]
}

The helm_release Terraform resource deploys Helm charts into Kubernetes.

Here, the Helm chart being deployed is “nginx-ingress”.
The NGINX Ingress Controller is a popular solution to manage access to services running inside the cluster from the outside world.

The loadBalancerIP value is interesting.
The chart exposes the controller through a Kubernetes service of type LoadBalancer, which makes Kubernetes ask the underlying cloud provider (Azure, in this case) to provision a load balancer. The loadBalancerIP value tells Kubernetes to attach that load balancer to a specific IP address instead of a dynamically allocated one.

This IP address comes from an Azure Public IP resource created in an earlier step.

Provisioning the Azure load balancer sometimes takes more than 5 minutes (the default Helm timeout), which is why we set timeout to 600 seconds (10 minutes).
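One way to watch this step from the outside is to check whether the controller's service has picked up the expected address in status.loadBalancer.ingress. The sketch below, Test-ServiceLoadBalancerIP, is a hypothetical helper that parses kubectl's JSON output for a single service. The service name nginx-ingress-controller used in the usage note is an assumption based on the release name and the deployment listing shown earlier.

```powershell
# Hypothetical helper: checks whether a Kubernetes service (as returned by
# "kubectl get service <name> -o json") has been assigned the expected
# load-balancer IP address.
function Test-ServiceLoadBalancerIP {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Json,

        [Parameter(Mandatory)]
        [string]$ExpectedIP
    )

    $Service = $Json | ConvertFrom-Json

    # While the cloud load balancer is being provisioned, status.loadBalancer is empty
    $Assigned = @($Service.status.loadBalancer.ingress) |
        Where-Object { $_ -and $_.ip -eq $ExpectedIP }

    return [bool]$Assigned
}
```

A caller could run kubectl get service nginx-ingress-controller -n management -o json, pipe it through Out-String, and call this function in a loop, sleeping between attempts and giving up after 600 seconds to mirror the Terraform timeout.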

resource "helm_release" "cert_manager" {
  name       = "cert-manager"
  chart      = "stable/cert-manager"
  // Since v0.6.0, cert-manager Helm chart doesn't provide
  // a good way of installing the cert-manager CRDs
  version    = "v0.5.2"
  namespace  = "${kubernetes_namespace.management.metadata.0.name}"
  timeout    = "540"
  depends_on = ["helm_release.nginx_ingress"]

  values = [<<EOF
  ingressShim:
    defaultIssuerName: letsencrypt-${var.letsencrypt_environment}
    defaultIssuerKind: ClusterIssuer
EOF
  ]
}

This time, we deploy the cert-manager Helm chart.
cert-manager is a very cool tool that automates the issuance and renewal of the TLS certificates needed by HTTPS-based services. The resulting certificates are stored in Kubernetes as secret resources.

cert-manager extends Kubernetes with custom resources such as Certificate, Issuer, and ClusterIssuer.
Since v0.6.0, these CustomResourceDefinitions (CRDs) are no longer shipped with the Helm chart, which means we would need to apply a separate YAML manifest before installing the cert-manager chart. Besides, the URL of this manifest varies with the cert-manager version.

So, as a (hopefully temporary) workaround, we pin the cert-manager Helm chart to the latest version that ships with the CRDs: v0.5.2.
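Whichever way the CRDs get installed, a quick check that they are present before creating Certificate or ClusterIssuer objects can save some head-scratching. This hypothetical helper parses the output of kubectl get crd -o json. The expected names assume the certmanager.k8s.io API group used by the cert-manager 0.5.x releases and would need adjusting for other versions.

```powershell
# Hypothetical helper: lists the cert-manager CRDs missing from the output of
# "kubectl get crd -o json". The names below assume cert-manager 0.5.x,
# whose resources live in the certmanager.k8s.io API group.
function Get-MissingCertManagerCrd {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Json
    )

    $Required = @(
        'certificates.certmanager.k8s.io'
        'issuers.certmanager.k8s.io'
        'clusterissuers.certmanager.k8s.io'
    )

    # Collect the names of every CRD currently registered in the cluster
    $Installed = @(($Json | ConvertFrom-Json).items | ForEach-Object { $_.metadata.name })

    # Emit each required CRD that is not installed
    $Required | Where-Object { $Installed -notcontains $_ }
}
```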

resource "helm_release" "cluster_issuer" {
  name       = "cluster-issuer"
  chart      = "..\\..\\Assets\\cluster-issuer"
  depends_on = ["helm_release.cert_manager"]

  values = [<<EOF
  email: ${var.letsencrypt_email_address}
  environment: ${var.letsencrypt_environment}
EOF
  ]
}

Again, we deploy a Helm chart, but this time from a local directory rather than the stable repository.
This custom chart creates a cluster-wide issuer (ClusterIssuer) resource for cert-manager.
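The chart's templates are not reproduced in this post, but a ClusterIssuer for Let's Encrypt generally boils down to an ACME server URL, a contact email, and a secret holding the ACME account key. The sketch below renders such a manifest from the two values passed to the chart. It is an illustration under assumptions, not the contents of the project's cluster-issuer chart: the function name, the staging/prod environment values, and the secret name are hypothetical, and the layout follows the certmanager.k8s.io/v1alpha1 format of cert-manager 0.5.x with the HTTP-01 challenge. The issuer name matches the letsencrypt-<environment> default issuer configured for cert-manager above, and the two server URLs are Let's Encrypt's ACME v2 staging and production endpoints.

```powershell
# Hypothetical sketch: renders a Let's Encrypt ClusterIssuer manifest in the
# certmanager.k8s.io/v1alpha1 format used by cert-manager 0.5.x.
# The 'staging'/'prod' values and the secret name are assumptions.
function New-LetsEncryptClusterIssuerYaml {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Email,

        [Parameter(Mandatory)]
        [ValidateSet('staging', 'prod')]
        [string]$Environment
    )

    # Kubernetes names must be lower-case; ValidateSet accepts any casing
    $EnvName = $Environment.ToLowerInvariant()

    # Let's Encrypt ACME v2 directory URLs
    $Servers = @{
        staging = 'https://acme-staging-v02.api.letsencrypt.org/directory'
        prod    = 'https://acme-v02.api.letsencrypt.org/directory'
    }

    @"
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-$($EnvName)
spec:
  acme:
    server: $($Servers[$EnvName])
    email: $Email
    privateKeySecretRef:
      name: letsencrypt-$($EnvName)-account-key
    http01: {}
"@
}
```

The resulting string could be piped to kubectl apply -f - to create the issuer.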

cert-manager can talk to different certificate authorities, but in this case, we configure it to talk to Let’s Encrypt. Let’s Encrypt is easy to use, free, and suitable for production certificates.

The caveat is that Let’s Encrypt certificates are only valid for 90 days.
This is not a problem here, because cert-manager renews them automatically.
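The arithmetic behind that renewal is straightforward. Assuming a 30-day renewal window (a common cert-manager default that can be configured per Certificate, so treat it as an assumption here), a certificate is renewed roughly two months after issuance. Get-CertificateRenewalDate is a hypothetical helper used only to illustrate the calculation.

```powershell
# Hypothetical helper: computes when a certificate expires and when it falls due
# for renewal. 90 days is Let's Encrypt's validity period; the 30-day renewal
# window is an assumption about cert-manager's configuration.
function Get-CertificateRenewalDate {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [datetime]$NotBefore,

        [int]$ValidityDays = 90,

        [int]$RenewBeforeDays = 30
    )

    $NotAfter = $NotBefore.AddDays($ValidityDays)
    [PSCustomObject]@{
        NotAfter = $NotAfter
        RenewAt  = $NotAfter.AddDays(-$RenewBeforeDays)
    }
}
```

For a certificate issued on 2019-01-28 (the date of the Helm releases listed earlier), it returns a NotAfter of 2019-04-28 and a RenewAt of 2019-03-29.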

That’s pretty much it for now.
For more information about PSAksDeployment and a dive into the code, head over to the project on GitHub.
If you have any questions, remarks, or feature requests, feel free to open an issue.
