This document provides an overview of how to deploy a SQL Server 2019 Big Data Cluster on Kubernetes. It discusses setting up infrastructure with Ubuntu templates, installing Kubespray to manage the Kubernetes cluster lifecycle, and using azdata to deploy the Big Data Cluster. Key steps include creating an Ansible inventory, configuring storage with labels and profiles, and deploying the cluster. The document also offers tips on sizing, upgrades, and next steps like load balancing and monitoring.
2. Who Am I?
▪ SQL Server Solution Architect at Pure Storage
▪ SQL Server user for 20 years
▪ Was heavily involved in the SQL Server 2019 EAP
▪ Co-author of the Microsoft workshop:
Big Data Clusters: From Bare Metal to Kubernetes
3. Why This Session?
“I’d like to deploy a Big Data Cluster, are there any gotchas I need to be aware of?”
Most orgs are familiar with Windows and VMware as platforms; Kubernetes and Linux, not so much.
5. What We Will End Up With
Cluster Build Host
K8s Master 1
K8s Master 2
K8s Worker 1
K8s Worker 2
K8s Worker 3
kubespray, ansible, git, kubectl and azdata
Kubernetes cluster
SQL Server 2019
Big Data Cluster running
on the three worker
nodes
3 node etcd cluster
10. Template Creation – ISO
▪ Get the Ubuntu 16.04 AMD64 server image:
https://releases.ubuntu.com/16.04/
▪ Upload the image to your VMware ISO datastore
▪ Create a virtual machine with a DVD drive that boots from this ISO
▪ Next up: creating an Ubuntu guest
12. Kernel Update Gotcha
sudo apt-get install --install-recommends linux-generic-hwe-16.04 -y
DO THIS ON EACH NODE HOST BEFORE YOU CREATE YOUR
KUBERNETES CLUSTER, OTHERWISE YOU WILL BREAK YOUR CLUSTER
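Once a node host has been rebooted onto the HWE kernel, a quick check confirms the running version; the exact string will vary, the bill of materials at the end of this deck used 4.15.0-118-generic:

```shell
# Print the running kernel release; after the HWE install and a reboot
# this should report a 4.15-series kernel rather than the stock 4.4
uname -r
```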
13. Post Seed VM Creation Steps
▪ sudo apt-get update
▪ sudo apt-get install yamllint
▪ sudo reboot
▪ VMware vCenter -> virtual machine -> Template -> Convert to Template
14. Infrastructure Build Out From The Template
[Diagram: Ubuntu VM Template cloned to Cluster Build Host, K8s Master 1, K8s Master 2, K8s Worker 1, K8s Worker 2, K8s Worker 3]
As we create each host, we need to do two things:
▪ Give each host a unique name
▪ Give each host a unique IP address
Tip: we could do this with Terraform and the VMware provider (very popular)
16. iSCSI Gotcha
▪ If you are using an iSCSI-based storage solution and cloned virtual machines . . .
▪ The InitiatorName value in /etc/iscsi/initiatorname.iscsi needs to be unique on each node host
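A minimal sketch of generating a unique name on a cloned node; the IQN prefix and hash-based suffix here are assumptions for illustration, and open-iscsi also ships /sbin/iscsi-iname for exactly this purpose:

```shell
# Derive a unique initiator suffix from the (already unique) host name
suffix=$(hostname | md5sum | cut -c1-12)
echo "InitiatorName=iqn.1993-08.org.debian:01:${suffix}"
# On a real node host you would then write it out and restart iscsid (requires root):
#   echo "InitiatorName=iqn.1993-08.org.debian:01:${suffix}" | sudo tee /etc/iscsi/initiatorname.iscsi
#   sudo systemctl restart iscsid
```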
17. IP Address Configuration
1. Get the name of your network adapter; it should be prefixed with ens
For iSCSI storage you will need two adapters – here we just have the one
18. IP Address Configuration
2. Edit the network configuration file /etc/network/interfaces (Ubuntu 16.04 uses ifupdown rather than netplan)

auto <primary network interface>
iface <primary network interface> inet static
    address <ip address>
    netmask <netmask>
    gateway <gateway ip address>

auto <secondary network interface>
iface <secondary network interface> inet static
    address <ip address>
    netmask <netmask>
    dns-nameservers <ip address>

A secondary NIC is required if iSCSI storage is used
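Filled in, a single-NIC configuration might look like the following; the interface name ens160 and all of the addresses are assumptions for illustration:

```text
# /etc/network/interfaces – hypothetical values
auto lo
iface lo inet loopback

auto ens160
iface ens160 inet static
    address 192.168.1.101
    netmask 255.255.255.0
    gateway 192.168.1.1
    dns-nameservers 192.168.1.10
```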
20. Kubespray – What Is It?
▪ A tool based on Ansible playbooks and kubeadm for managing a Kubernetes cluster’s life cycle:
▪ Cluster creation
▪ Cluster removal
▪ Upgrading a cluster
▪ Adding nodes
▪ Removing nodes
▪ Rebuilding master nodes
▪ Etc . . .
24. Kubespray – Create An Ansible Inventory
▪ cp -r kubespray/inventory/sample kubespray/inventory/<cluster name>
▪ Edit the inventory.ini file
▪ Inventory file path:
kubespray/inventory/<cluster name>/inventory.ini
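With the original slide’s screenshot unavailable, a hypothetical inventory.ini for the six-host build in this deck might look like this; the host names and addresses are assumptions, and the group names match Kubespray releases of this era (the three etcd members give the 3-node etcd cluster shown earlier):

```ini
[all]
k8s-master-1 ansible_host=192.168.1.101 ip=192.168.1.101
k8s-master-2 ansible_host=192.168.1.102 ip=192.168.1.102
k8s-worker-1 ansible_host=192.168.1.103 ip=192.168.1.103
k8s-worker-2 ansible_host=192.168.1.104 ip=192.168.1.104
k8s-worker-3 ansible_host=192.168.1.105 ip=192.168.1.105

[kube-master]
k8s-master-1
k8s-master-2

[etcd]
k8s-master-1
k8s-master-2
k8s-worker-1

[kube-node]
k8s-worker-1
k8s-worker-2
k8s-worker-3

[k8s-cluster:children]
kube-master
kube-node
```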
25. Kubespray – Configure ssh Connectivity
The following commands are all run on the server hosting Ansible:
▪ ssh-keygen
▪ Carry out the following for each node host:
ssh-copy-id <username>@<hostname>
▪ ssh-agent /bin/bash
▪ ssh-add ~/.ssh/id_rsa
▪ Test ssh connectivity from the Ansible server:
ansible -i inventory/<cluster name>/inventory.ini all -m ping
26. Storing ssh Passphrases With keychain
On the server you intend to run Kubespray from:
▪ sudo apt install keychain
▪ Add the following two lines to your .bashrc file, ~cadkin/.bashrc in my case:
/usr/bin/keychain $HOME/.ssh/id_rsa
source $HOME/.keychain/$HOSTNAME-sh
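One step the slides imply but do not show: with the inventory in place and ssh working, the cluster itself is created by running Kubespray’s cluster.yml playbook from the build host. A sketch, since the exact flags can vary between Kubespray releases:

```shell
# Run from the kubespray directory; --become is needed because
# kubeadm and package installation require root on the node hosts
ansible-playbook -i inventory/<cluster name>/inventory.ini \
    --become --become-user=root cluster.yml
```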
30. Post Deployment Steps
▪ Install kubectl on the Kubespray server:
sudo snap install kubectl --classic
▪ Create a directory on the Kubespray server to hold the context:
mkdir ~/.kube
▪ ssh onto a master node (admin.conf only resides on master node hosts) and run:
sudo chmod 755 /etc/kubernetes/admin.conf
▪ Back on the Kubespray server, copy the file down:
scp <username>@<hostname>:/etc/kubernetes/admin.conf ~/.kube/config
▪ ssh back onto the master node you copied admin.conf from and restore its permissions:
sudo chmod 600 /etc/kubernetes/admin.conf
31. Some Quick Post Cluster Creation Sanity Checks
▪ Check the health of the cluster nodes:
kubectl get nodes -o wide
▪ Check the health of the system pods:
kubectl get po -n kube-system
32. A Word On Storage
▪ We need a storage plugin that supports persistent volumes
▪ Never ever use ephemeral storage in production
▪ Free options:
Portworx Essentials
VMware Cloud Native Storage
33. Check That You Have A Storage Plugin Installed
kubectl get sc
34. Perform A Simple Test
test-pvc.yaml file contents:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
spec:
  storageClassName: <storage class>
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

And then . . .
kubectl apply -f test-pvc.yaml
kubectl get pvc
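The manifest can also be generated from the shell, substituting in whichever storage class `kubectl get sc` reported; pure-block here is simply the class name used in the profile examples later in this deck:

```shell
# Write test-pvc.yaml with the storage class parameterised
SC=pure-block
cat > test-pvc.yaml <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
spec:
  storageClassName: ${SC}
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
EOF
```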
36. Sizing Your Cluster
“Can you give me a reference architecture for the infrastructure I need for a Big Data Cluster?”
“What you need really depends on your workload, but . . .”
37. Storage Gotchas
Persistent volume extension
As of CU6, persistent volumes (PVs) cannot be resized
through either azdata or Azure Data Studio
Pro tip: size PVs upfront to allow for data growth
39. Working With Configuration Profiles
▪ Create a profile
azdata bdc config init --path ca-bdc-kubeadm-dev-test --source kubeadm-dev-test
▪ Specify the storage class for data
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json
--json-values "$.spec.storage.data.className=pure-block"
▪ Specify the size for data persistent volumes
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json
--json-values "$.spec.storage.data.size=10Gi"
▪ Specify the storage class for logs
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json
--json-values "$.spec.storage.logs.className=pure-block"
▪ Specify the size for log persistent volumes
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json
--json-values "$.spec.storage.logs.size=5Gi"
40. Configuring The HDFS Replication Factor
▪ By default, data is replicated three times
▪ If the storage platform has built-in resilience, e.g. erasure coding, the replication factor can be lowered:
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/bdc.json
--json-values "$.spec.services.hdfs.settings={\"hdfs-site.dfs.replication\":\"1\"}"
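Another step the deck implies but never shows: once the profile has been customised, the Big Data Cluster itself is deployed with azdata bdc create. A sketch; the admin credentials are placeholders and are read by azdata from environment variables:

```shell
# Deployment credentials are supplied via environment variables
export AZDATA_USERNAME=admin
export AZDATA_PASSWORD='<password>'

# Deploy using the customised configuration profile
azdata bdc create --config-profile ca-bdc-kubeadm-dev-test --accept-eula yes
```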
44. We’ve Covered The Basics – Where To Next?
▪ Load balancer installation and configuration – MetalLB is the easiest option
▪ Deploying the Kubernetes dashboard in a secure manner
▪ Backup and recovery
▪ Using production profiles which include HA and active directory integration
▪ Kubernetes cluster upgrades
▪ Monitoring a Kubernetes cluster via its built-in Prometheus exporter
45. Bill Of Materials
Component Version
VMware vSphere 6.7
Linux distribution Ubuntu server edition 16.04.7 LTS
Linux kernel 4.15.0-118-generic
Kubernetes 1.19.1
SQL Server 2019 Big Data Cluster CU6
Kubernetes storage plugin Pure Service Orchestrator 6.0.2