This documentation outlines the procedures and code examples for scaling applications in production using the Kubernetes Python client. The focus is on configuring the Horizontal Pod Autoscaler (HPA) and managing resource metrics for effective scaling.
Prerequisites
Ensure the Kubernetes Python client is installed. One option is to bake it into a container image with a Dockerfile like the one below. The examples in this guide use the stable autoscaling/v2 API, which requires Kubernetes 1.23 or later and a correspondingly recent client release:
FROM nbgallery/jupyter-alpine:latest
RUN pip install git+https://github.com/kubernetes-client/python.git
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["jupyter", "notebook", "--ip=0.0.0.0"]
Step 1: Defining the Horizontal Pod Autoscaler (HPA)
To initiate scaling based on resource metrics, you need to create an HPA object. This allows Kubernetes to automatically scale the number of pods based on observed CPU usage or other select metrics.
HPA Configuration Example
The following is an example of how to define an HPA using the Python client:
from kubernetes import client, config

# Load the Kubernetes configuration (kubeconfig for local use;
# use config.load_incluster_config() when running inside a pod)
config.load_kube_config()

# Create an instance of the autoscaling/v2 API class
api_instance = client.AutoscalingV2Api()
# Define the HPA settings
hpa = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="example-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="example-deployment"
        ),
        min_replicas=1,
        max_replicas=10,
        metrics=[client.V2MetricSpec(
            type="Resource",
            resource=client.V2ResourceMetricSource(
                name="cpu",
                target=client.V2MetricTarget(
                    type="Utilization",
                    average_utilization=50  # Target 50% average CPU utilization
                )
            )
        )]
    )
)

# Create the HPA in the Kubernetes cluster
api_instance.create_namespaced_horizontal_pod_autoscaler(
    namespace="default",
    body=hpa
)
In this example, the HPA manages example-deployment, adjusting its pod count between 1 and 10 replicas to keep average CPU utilization near the 50% target.
Step 2: Monitoring Metrics for Scaling
Kubernetes decides when to scale by comparing observed metrics against the targets declared in the HPA spec. The most common metrics are CPU and memory, configured through the V2ResourceMetricSource shown earlier.
Metrics Specification
The following code illustrates V2ContainerResourceMetricSource, which targets the resource usage of one named container within the pod, and V2PodsMetricSource, which targets a custom metric averaged across pods. Each source must be wrapped in a V2MetricSpec whose type field names the populated source.
# Container-level resource metric; `container` names the specific
# container inside the pod to measure (a required field)
resource_metric = client.V2ContainerResourceMetricSource(
    name="memory",
    container="app",  # placeholder: use your container's name
    target=client.V2MetricTarget(
        type="Utilization",
        average_utilization=80  # Target 80% average memory utilization
    )
)

# Pod-level custom metric, served by a custom metrics adapter
# such as prometheus-adapter
pods_metric = client.V2PodsMetricSource(
    metric=client.V2MetricIdentifier(
        name="transactions-processed"
    ),
    target=client.V2MetricTarget(
        type="AverageValue",
        average_value="100"  # Target an average of 100 per pod
    )
)
# Create an HPA that combines the container and pod metrics
hpa_with_custom_metrics = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="example-custom-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="example-deployment"
        ),
        min_replicas=1,
        max_replicas=10,
        # Each source is wrapped in a V2MetricSpec whose `type`
        # matches the populated field
        metrics=[
            client.V2MetricSpec(
                type="ContainerResource",
                container_resource=resource_metric
            ),
            client.V2MetricSpec(
                type="Pods",
                pods=pods_metric
            ),
        ]
    )
)

# Create the custom HPA
api_instance.create_namespaced_horizontal_pod_autoscaler(
    namespace="default",
    body=hpa_with_custom_metrics
)
Step 3: Managing HPA Behavior
Kubernetes allows further customization of scaling behavior through the spec.behavior field, which controls how aggressive or conservative scaling is in each direction via per-direction stabilization windows and rate-limiting policies.
Customized HPA Behavior Example
You can define behaviors for scaling up and down as follows:
behavior = client.V2HorizontalPodAutoscalerBehavior(
    scale_up=client.V2HPAScalingRules(
        stabilization_window_seconds=30,
        policies=[
            client.V2HPAScalingPolicy(
                type="Pods",
                value=2,           # Add at most 2 pods...
                period_seconds=60  # ...per 60-second window
            )
        ]
    ),
    scale_down=client.V2HPAScalingRules(
        stabilization_window_seconds=30,
        policies=[
            client.V2HPAScalingPolicy(
                type="Pods",
                value=1,           # Remove at most 1 pod...
                period_seconds=60  # ...per 60-second window
            )
        ]
    )
)
# Adding the behavior to an HPA
hpa_with_behavior = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="example-hpa-with-behavior"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="example-deployment"
        ),
        min_replicas=1,
        max_replicas=10,
        behavior=behavior,
        # metrics omitted for brevity; without them the HPA defaults
        # to targeting 80% average CPU utilization
    )
)

# Create the HPA with customized behavior
api_instance.create_namespaced_horizontal_pod_autoscaler(
    namespace="default",
    body=hpa_with_behavior
)
Conclusion
Scaling applications in production with Kubernetes can be efficiently managed using the Kubernetes Python client. By understanding and utilizing HPA along with resource metrics, developers can automate scaling based on real-time application needs. The provided examples serve as a foundation for integrating these practices within applications.
References
- Resource Metrics in Kubernetes: V2ResourceMetricSource
- HPA Specification: V2HorizontalPodAutoscalerSpec
- HPA Behavior configuration: V2HorizontalPodAutoscalerBehavior