Scaling Fabric with Kubernetes

A Fabric cluster is deployed in Kubernetes, where each pod represents a Fabric node.

Kubernetes can automatically scale the number of pods in a Deployment, ReplicaSet, or StatefulSet, based on observed metrics such as CPU utilization, by using its Horizontal Pod Autoscaling (HPA) mechanism.

The HorizontalPodAutoscaler (HPA) controller calculates the desired number of replicas from the ratio of the current metric value to the desired value: desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )].
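
For example, with 2 Fabric pods currently averaging 120% CPU utilization against a target of 80%, the controller computes ceil[2 * ( 120 / 80 )] = 3 and scales the workload to 3 pods; the figures here are illustrative only.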

This article deals with K8S auto-scaling concepts as they apply to Fabric; it does not refer to a specific or custom K8S autoscaler.

Prerequisites

  1. Kubernetes Version: Ensure deployment on Kubernetes version 1.18 or later, and verify version compatibility in the official Kubernetes documentation.
  2. Metrics Server: Deploy the Metrics Server in the cluster; the HPA relies on it to gather resource metrics from pods and nodes (see the example commands after this list).
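
As a reference, the Metrics Server is typically installed from the manifest published by the metrics-server project and verified with kubectl top; the manifest URL and namespace below follow the project's standard installation and may differ in your environment:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get deployment metrics-server -n kube-system
kubectl top pods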

Scaling Strategies for Fabric Pods

The approach for effectively scaling Fabric pods varies by use case, for example:

  • For loads stemming from a high number of Web Service (WS) calls, it is advisable to employ a Load Balancer (LB) that directs traffic to all Fabric pods; see the Service sketch after this list.
  • When the load is driven by the number of Fabric jobs, jobs are redistributed as new nodes join the cluster.
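
As a minimal sketch of the LB approach, a Kubernetes Service of type LoadBalancer can spread incoming WS traffic across all Fabric pods. The Service name (fabric-lb), the selector (app: fabric) and the ports below are assumptions for illustration and must match the labels and listening port of your actual Fabric pods:

apiVersion: v1
kind: Service
metadata:
    name: fabric-lb          # assumed name, for illustration only
    labels:
        app: fabric
spec:
    type: LoadBalancer
    selector:
        app: fabric          # assumed pod label; must match the Fabric pods
    ports:
        - name: ws
          port: 80           # external port exposed by the LB
          targetPort: 3213   # assumed Fabric WS port; adjust to your deployment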

Example

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
    labels:
        tenant: tenant
        space: space
        app: fabric
    name: fabric-hpa
spec:
    scaleTargetRef:               # the workload to scale: the Fabric StatefulSet
        apiVersion: apps/v1
        kind: StatefulSet
        name: fabric-stateful-sets
    minReplicas: 1                # lower bound on the number of Fabric pods
    maxReplicas: 3                # upper bound on the number of Fabric pods
    targetCPUUtilizationPercentage: 80   # scale out when average CPU utilization exceeds 80%
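
On clusters where the autoscaling/v2 API is available (stable from Kubernetes 1.23; earlier versions expose it as autoscaling/v2beta2), the same autoscaler can be written with an explicit metrics section, which also allows tuning of scaling behavior. The manifest below is a sketch that reuses the StatefulSet name from the example above; the 5-minute scale-down stabilization window is an illustrative choice, not a Fabric requirement:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: fabric-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: StatefulSet
        name: fabric-stateful-sets
    minReplicas: 1
    maxReplicas: 3
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 80   # same 80% CPU target as the v1 example
    behavior:
        scaleDown:
            stabilizationWindowSeconds: 300   # wait 5 minutes of stable metrics before scaling down

The manifest can be applied with kubectl apply -f and the autoscaler's current status observed with kubectl get hpa fabric-hpa.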

For more information about an advanced setup, read here.

Read here about scaling Fabric on-prem, in bare-metal or virtual machine environments.
