failed to garbage collect required amount of images. Wanted to free 473842483 bytes, but freed 0 bytes

Issue Summary: Kubernetes Garbage Collection Failing to Free Disk Space

Issue Details

  • Title: Failed to garbage collect required amount of images
  • Reported By: samuela
  • Date: December 8, 2018
  • Kubernetes Version: 1.10.7 (Client and Server)
  • Environment: GKE; nodes using Container-Optimized OS; minimum disk size set to 10 GB

Description

The kubelet on GKE nodes has repeatedly failed to free the required disk space through image garbage collection (GC). The kubelet logs show it trying to free hundreds of megabytes (473842483 bytes in the instance quoted in the title) and reclaiming nothing. The recurring error message has the form:

  • “failed to garbage collect required amount of images. Wanted to free X bytes, but freed 0 bytes.”

This has resulted in the eviction of several pods due to disk pressure, as observed in the eviction warnings and kubelet events.
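
When triaging kubelet logs for this error, it can help to extract the requested and freed byte counts from each occurrence. The following is a minimal sketch, assuming the message format quoted above; `parse_gc_failure` is a hypothetical helper name and not part of any Kubernetes tooling.

```python
import re

# Pattern for the kubelet message quoted above. The log prefix (timestamp,
# source file) varies between setups, so we search instead of matching from
# the start of the line.
GC_FAILURE = re.compile(
    r"Wanted to free (?P<wanted>\d+) bytes, but freed (?P<freed>\d+) bytes"
)


def parse_gc_failure(line):
    """Return (wanted_bytes, freed_bytes), or None if the line does not match."""
    m = GC_FAILURE.search(line)
    if m is None:
        return None
    return int(m.group("wanted")), int(m.group("freed"))


line = ("failed to garbage collect required amount of images. "
        "Wanted to free 473842483 bytes, but freed 0 bytes")
wanted, freed = parse_gc_failure(line)
print(f"wanted {wanted / 1e6:.1f} MB, freed {freed} bytes")
```

Collecting these pairs over time shows whether the requested amount keeps growing while the freed amount stays at zero.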

Expected Behavior

  • Image GC should successfully reclaim disk space, or the system should prevent scheduling new pods on nodes with insufficient disk space.

Reproduction Steps

  1. Repeatedly deploy and delete pods on a node until the node comes under disk pressure.
  2. Monitor the kubelet’s garbage collection attempts and related events.
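
The monitoring step can be partly automated by filtering cluster events for the failure message. This is a sketch under stated assumptions: it expects the JSON text produced by `kubectl get events --all-namespaces -o json`, matches on the message text from this report rather than on an event reason, and uses the hypothetical helper name `gc_failure_events`. The sample input is hand-written for illustration and is not real cluster output.

```python
import json


def gc_failure_events(events_json):
    """Yield (object_name, message) for events reporting an image GC failure.

    `events_json` is the JSON text of an event list, as printed by
    `kubectl get events --all-namespaces -o json`.
    """
    for item in json.loads(events_json).get("items", []):
        message = item.get("message", "")
        if "failed to garbage collect required amount of images" in message:
            name = item.get("involvedObject", {}).get("name", "<unknown>")
            yield name, message


# Illustrative input only: a hand-written stand-in for real kubectl output.
sample = json.dumps({
    "items": [
        {"involvedObject": {"kind": "Node", "name": "example-node"},
         "message": "failed to garbage collect required amount of images. "
                    "Wanted to free 473842483 bytes, but freed 0 bytes"},
        {"involvedObject": {"kind": "Pod", "name": "example-pod"},
         "message": "Started container"},
    ]
})

for name, message in gc_failure_events(sample):
    print(name, "->", message)
```

Against a live cluster, the same function could be fed the output of the `kubectl` command above, for example by reading it from standard input.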

Additional Context

  • The issue appears reproducible in other environments: users on GKE, AWS, and AKS report similar garbage collection failures and disk pressure incidents.
  • Cordoning and draining nodes temporarily relieves pressure, but the root cause remains unresolved.
  • The problem may relate to misconfigured GC thresholds, bugs in the kubelet’s image management logic, or undersized node disks.
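
To make the threshold hypothesis concrete: kubelet image GC is governed by a high and a low disk-usage threshold (`--image-gc-high-threshold` and `--image-gc-low-threshold`, which default to 85 and 80 percent). The sketch below is a simplified model of how a target such as "Wanted to free X bytes" can arise from those two settings. It is not the kubelet's actual code, `bytes_to_free` is a hypothetical name, and the disk figures are invented for illustration.

```python
def bytes_to_free(capacity, available, high_pct=85, low_pct=80):
    """Simplified model of the image GC target, in bytes.

    Returns 0 while disk usage is below the high threshold. Otherwise returns
    the number of bytes that must be freed to bring usage down to the low
    threshold.
    """
    usage_pct = 100 - (available * 100) // capacity
    if usage_pct < high_pct:
        return 0
    # Free space that would correspond to usage exactly at the low threshold.
    target_available = capacity * (100 - low_pct) // 100
    return max(target_available - available, 0)


GB = 10**9
# Hypothetical node: a 10 GB disk (the minimum size mentioned above)
# with 1.5 GB currently free, i.e. 85% used.
capacity, available = 10 * GB, 15 * GB // 10
print(bytes_to_free(capacity, available))
```

Under this model the target depends only on disk usage, not on what is using the disk. One plausible reading of "freed 0 bytes" is therefore that the space is held by images still in use or by non-image data such as logs and volumes, which image GC cannot remove; this is a hypothesis consistent with the report, not a confirmed root cause.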

Recent Activity

  • Users are encouraged to investigate kubelet logs for insights.
  • Contributors have been assigned and discussions continue regarding root causes and possible solutions.

Labels

  • kind/bug
  • sig/node
  • good first issue
  • help wanted
  • triage/accepted

This issue remains active, with ongoing efforts to investigate and resolve the image GC failures on GKE. Advanced developers are welcome to contribute to finding a solution, considering the overall impact on Kubernetes reliability and performance.