This documentation outlines the procedures and code examples for scaling applications in production using the Kubernetes Python client. The focus is on configuring the Horizontal Pod Autoscaler (HPA) and managing resource metrics for effective scaling.

Prerequisites

Ensure that the Kubernetes Python client is installed. The Dockerfile below builds a Jupyter image with the client installed from the GitHub repository; for reproducible production builds, pinning a released version with pip (pip install kubernetes==<version>) is generally preferable:

FROM nbgallery/jupyter-alpine:latest 

RUN pip install git+https://github.com/kubernetes-client/python.git

ENTRYPOINT ["/sbin/tini", "--"]
CMD ["jupyter", "notebook", "--ip=0.0.0.0"]

Step 1: Defining the Horizontal Pod Autoscaler (HPA)

To initiate scaling based on resource metrics, you need to create an HPA object. This allows Kubernetes to automatically scale the number of pods based on observed CPU utilization or other selected metrics.

HPA Configuration Example

The following is an example of how to define an HPA using the Python client:

from kubernetes import client, config

# Load the Kubernetes configuration (use config.load_incluster_config()
# when running inside a pod)
config.load_kube_config()

# Create an instance of the autoscaling API class. The autoscaling/v2 API
# is GA since Kubernetes 1.23; the older v2beta2 API (and its V2beta2*
# models in the Python client) was removed in Kubernetes 1.26.
api_instance = client.AutoscalingV2Api()

# Define the HPA settings
hpa = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="example-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="example-deployment"
        ),
        min_replicas=1,
        max_replicas=10,
        metrics=[client.V2MetricSpec(
            type="Resource",
            resource=client.V2ResourceMetricSource(
                name="cpu",
                target=client.V2MetricTarget(
                    type="Utilization",
                    average_utilization=50  # Target 50% average CPU utilization
                )
            )
        )]
    )
)

# Create the HPA in the Kubernetes cluster
api_instance.create_namespaced_horizontal_pod_autoscaler(
    namespace="default",
    body=hpa
)

In this example, the HPA manages example-deployment, keeping the replica count between 1 and 10 based on observed CPU utilization.
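For comparison, the manifest that the client serializes and submits for this HPA looks roughly like the following (using the GA autoscaling/v2 API):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Seeing the manifest side by side can help when translating existing YAML into Python client model objects.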

Step 2: Monitoring Metrics for Scaling

Kubernetes uses metrics defined in its API to determine when to scale pods. The most common metrics used are CPU and memory, which you can set using the ResourceMetricSource as shown earlier.
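The core of the controller's decision is a simple ratio: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). A minimal sketch of that calculation (the function name here is illustrative, not part of the client API):

```python
import math

def desired_replicas(current_replicas, current_value, target_value):
    """Approximate the HPA scaling formula:
    desired = ceil(current * (observed metric / target metric))."""
    return math.ceil(current_replicas * current_value / target_value)

# 4 pods averaging 75% CPU against a 50% target -> scale up to 6
print(desired_replicas(4, 75, 50))  # 6

# 4 pods averaging 25% CPU against a 50% target -> scale down to 2
print(desired_replicas(4, 25, 50))  # 2
```

The result is still clamped to the min_replicas/max_replicas bounds, and small deviations from the target are ignored by a configurable tolerance, so real scaling decisions are slightly more conservative than this sketch.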

Metrics Specification

The following code shows how to use a ContainerResourceMetricSource and a PodsMetricSource. Note that each metric source must be wrapped in a MetricSpec whose type field matches it, and that a container resource metric must name the specific container it measures.

# Example of a container resource metric (measured per container, not per pod)
container_metric = client.V2MetricSpec(
    type="ContainerResource",
    container_resource=client.V2ContainerResourceMetricSource(
        name="memory",
        container="example-container",  # the container to measure in each pod
        target=client.V2MetricTarget(
            type="Utilization",
            average_utilization=80  # Target 80% average memory utilization
        )
    )
)

# Define a metric for pod-specific (custom) metrics, served by a metrics adapter
pods_metric = client.V2MetricSpec(
    type="Pods",
    pods=client.V2PodsMetricSource(
        metric=client.V2MetricIdentifier(
            name="transactions-processed"
        ),
        target=client.V2MetricTarget(
            type="AverageValue",
            average_value="100"  # Target an average of 100 per pod
        )
    )
)

# Example of creating an HPA with these metrics
hpa_with_custom_metrics = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="example-custom-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="example-deployment"
        ),
        min_replicas=1,
        max_replicas=10,
        metrics=[
            container_metric,
            pods_metric,
        ]
    )
)

# Create the custom HPA
api_instance.create_namespaced_horizontal_pod_autoscaler(
    namespace="default",
    body=hpa_with_custom_metrics
)

Step 3: Managing HPA Behavior

Kubernetes allows further customization of scaling behavior through the HPA's behavior field, which controls how aggressively or conservatively the controller adds and removes pods.

Customized HPA Behavior Example

You can define behaviors for scaling up and down as follows:

behavior = client.V2HorizontalPodAutoscalerBehavior(
    scale_up=client.V2HPAScalingRules(
        stabilization_window_seconds=30,
        policies=[
            client.V2HPAScalingPolicy(
                type="Pods",
                value=2,           # add at most 2 pods...
                period_seconds=60  # ...per 60-second window
            )
        ]
    ),
    scale_down=client.V2HPAScalingRules(
        stabilization_window_seconds=30,
        policies=[
            client.V2HPAScalingPolicy(
                type="Pods",
                value=1,           # remove at most 1 pod...
                period_seconds=60  # ...per 60-second window
            )
        ]
    )
)

# Adding behavior to the HPA. With no metrics specified, the HPA
# defaults to targeting 80% average CPU utilization.
hpa_with_behavior = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="example-hpa-with-behavior"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="example-deployment"
        ),
        min_replicas=1,
        max_replicas=10,
        behavior=behavior,
    )
)

# Create the HPA with customized behavior
api_instance.create_namespaced_horizontal_pod_autoscaler(
    namespace="default",
    body=hpa_with_behavior
)
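The effect of a scaling policy can be pictured as a cap on how far the replica count may move in each policy period. A rough sketch of the scale-up rule above (illustrative only, not how the controller is implemented):

```python
def capped_scale_up(current, desired, pods_per_period=2, max_replicas=10):
    """Limit scale-up to at most `pods_per_period` new pods per policy
    period, never exceeding the HPA's max_replicas."""
    return min(desired, current + pods_per_period, max_replicas)

# A spike asking for 9 pods from 3 is applied gradually: +2 per period
replicas = 3
while replicas < 9:
    replicas = capped_scale_up(replicas, 9)
    print(replicas)  # 5, then 7, then 9
```

This is why a "Pods, value=2, period 60s" policy makes scale-up ramp over several minutes rather than jumping straight to the computed desired count.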

Conclusion

Scaling applications in production with Kubernetes can be efficiently managed using the Kubernetes Python client. By understanding and utilizing HPA along with resource metrics, developers can automate scaling based on real-time application needs. The provided examples serve as a foundation for integrating these practices within applications.
