Quick Facts
- Category: Education & Careers
- Published: 2026-05-04 00:18:09
Kubernetes v1.36 has promoted to beta the ability to modify container resource requests and limits within the pod template of a suspended Job. Originally introduced as an alpha feature in v1.35, this capability lets queue controllers and cluster administrators fine-tune CPU, memory, GPU, and extended resource specifications on a Job while it remains suspended, either before it first starts or before it resumes execution. This addresses a critical need for dynamic resource allocation in batch and machine learning workflows, where optimal resource requirements often depend on real-time cluster conditions.
What is the mutable pod resources feature for suspended Jobs?
This feature allows you to update the resources.requests and resources.limits fields in the pod template of a Job that is currently suspended (i.e., spec.suspend: true). Previously, these fields were immutable once the Job was created. Now, during suspension, you can modify CPU, memory, GPU, and any extended resource values. After updating, you can resume the Job with the adjusted resource specifications. This is particularly useful for queue-based systems that need to adapt resource allocation based on current cluster capacity, priority levels, or hardware availability.
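As a concrete sketch, a Job created in the suspended state might look like the manifest below (the name, image, and resource values are illustrative, not taken from the release notes):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job          # illustrative name
spec:
  suspend: true               # while true, the resource fields stay mutable
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # illustrative image
        resources:
          requests:
            cpu: "8"
            memory: 32Gi
          limits:
            cpu: "8"
            memory: 32Gi
```

Because spec.suspend is true, the Job controller creates no Pods yet, which leaves a window in which the resources section can still be patched.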
Why is this feature important for batch and machine learning workloads?
Batch and ML workloads often have resource requirements that aren't precisely known at Job creation time. Optimal allocation depends on factors like current cluster utilization, queue priorities, and the availability of specialized hardware such as GPUs. Before this feature, if a queue controller determined a suspended Job needed different resources, the only recourse was to delete and recreate the Job, losing metadata, status, and history. This feature eliminates that costly workaround. It also allows a Job created by a CronJob to progress slowly with reduced resources under heavy load, rather than failing entirely. This flexibility improves cluster efficiency and workload success rates.
How does the mutable pod resources feature work technically?
The Kubernetes API server relaxes the immutability constraint on the pod template's resource fields specifically for Jobs that are in a suspended state. No new API types or objects were introduced; the existing batch/v1.Job and pod template structures accommodate the change through a relaxed validation rule. When a Job is suspended (spec.suspend: true), clients can patch the spec.template.spec.containers[*].resources section directly. The update is validated to ensure the new values are within acceptable ranges and that no other immutable fields are altered. Once the resources are updated, resuming the Job (by setting spec.suspend: false) triggers the creation of new Pods with the newly adjusted resource requests and limits.
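Assuming a suspended Job named training-job with a container named trainer (both names are illustrative), the update can be expressed as a strategic merge patch that touches only the resource fields; the kubectl invocation is shown as a comment and assumes cluster access:

```yaml
# Apply while the Job is suspended, e.g.:
#   kubectl patch job training-job --type=strategic --patch-file resources-patch.yaml
spec:
  template:
    spec:
      containers:
      - name: trainer            # strategic merge matches containers by name
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
          limits:
            cpu: "4"
            memory: 16Gi
```

Resuming afterwards is a second small patch that sets spec.suspend to false, at which point new Pods pick up the adjusted values.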
Can you show a concrete example of adjusting resources on a suspended Job?
Consider an ML training Job initially requesting 4 GPUs. When this Job is created with suspend: true, a queue controller like Kueue can inspect the cluster and find only 2 GPUs available. Using the mutable pod resources feature, the controller updates the Job's pod template: it changes example-hardware-vendor.com/gpu from "4" to "2" and correspondingly reduces CPU from "8" to "4" and memory from "32Gi" to "16Gi". After the patch, the controller sets spec.suspend: false, and the Job creates new Pods with the scaled-down resources. The entire process happens without deleting the Job, preserving its metadata, annotations, and history, and the updated Job manifest reflects the new resource specifications immediately.
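The two-step adjustment described here can be sketched as a pair of patches; example-hardware-vendor.com/gpu matches the extended resource name used in this example, while the Job and container names are assumptions:

```yaml
# Step 1: while suspended, scale the resource values down, e.g.:
#   kubectl patch job ml-training --type=strategic --patch-file gpu-patch.yaml
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: 16Gi
            example-hardware-vendor.com/gpu: "2"
---
# Step 2: resume the Job; new Pods use the reduced resources:
#   kubectl patch job ml-training --type=merge -p '{"spec":{"suspend":false}}'
spec:
  suspend: false
```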
Does this feature affect CronJob instances that are suspended?
Yes, indirectly. A CronJob creates new Job objects according to its schedule. If its spec.jobTemplate sets suspend: true, each automatically created Job starts suspended, and a queue controller can adjust that Job's resources before resuming it. This allows the system to handle overloaded clusters gracefully: instead of failing to run a Job because resources are insufficient, the controller can reduce the resource requests so the Job runs slower but still progresses. This behavior is particularly valuable for long-running batch processes or data pipelines with flexible resource needs.
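A sketch of a CronJob whose Jobs start suspended, so a controller can tune them before resuming (the schedule, names, and values are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report        # illustrative name
spec:
  schedule: "0 2 * * *"       # illustrative schedule
  jobTemplate:
    spec:
      suspend: true           # each created Job starts suspended
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: registry.example.com/report:latest   # illustrative image
            resources:
              requests:
                cpu: "2"
                memory: 4Gi
```

A queue controller watching for suspended Jobs can then lower these requests under load before setting suspend: false on each created Job.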
What are the benefits for queue controllers like Kueue?
Queue controllers that manage batch workloads (e.g., Kueue, Volcano) gain significant flexibility. Previously, when they determined a suspended Job needed different resources, they had to delete and recreate it—an expensive operation that lost all associated context. With mutable pod resources, controllers can simply patch the resource fields while the Job remains suspended. This preserves Job identity, status, and any external references (e.g., monitoring dashboards, alerts). It also reduces API server load by avoiding delete/create cycles. For large clusters managing thousands of batch Jobs, this improvement can substantially reduce overhead and improve response times when adjusting to changing resource availability.
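As a rough sketch of the controller side, the patch body can be built programmatically before applying it. Here build_resource_patch is a hypothetical helper, and the commented client calls assume the official kubernetes Python client plus cluster access:

```python
# Sketch of how a queue controller might patch a suspended Job's resources.
# build_resource_patch is a hypothetical helper, not a Kubernetes API.

def build_resource_patch(container_name, cpu, memory, extra=None):
    """Build a strategic-merge patch that updates only resource fields."""
    requests = {"cpu": cpu, "memory": memory}
    if extra:
        # e.g. {"example-hardware-vendor.com/gpu": "2"} for extended resources
        requests.update(extra)
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # strategic merge matches containers by name
                            "name": container_name,
                            "resources": {
                                "requests": requests,
                                "limits": dict(requests),
                            },
                        }
                    ]
                }
            }
        }
    }

patch = build_resource_patch(
    "trainer", "4", "16Gi", {"example-hardware-vendor.com/gpu": "2"}
)

# With cluster access (an assumption here), the patch would be applied
# roughly as follows using the official Python client:
#   from kubernetes import client, config
#   config.load_kube_config()
#   client.BatchV1Api().patch_namespaced_job("training-job", "default", patch)
```

Because the Job object itself is never deleted, its UID, labels, and status history survive the adjustment, which is the property the delete-and-recreate workaround could not offer.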
Are there any limitations or considerations to keep in mind?
While powerful, the feature has a few caveats. First, modifications are only allowed while the Job is suspended; once it is resumed (suspend: false), the pod template resources become immutable again. Second, the relaxation applies only to the resources fields of the containers in the pod template; there is no additional granularity beyond what those fields already offer. Third, the feature does not allow changing other immutable fields (e.g., container image, command, args) at the same time; only the resource values may change. Administrators should also ensure that updated resource requests do not exceed cluster capacity or violate any ResourceQuota constraints. Finally, the capability is controlled by the JobMutablePodResources feature gate: it is beta in v1.36, while on v1.35 clusters, where it was alpha, the gate must be enabled explicitly.
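Where the gate is not on by default, it is enabled through the standard feature-gates mechanism on the control-plane components. The excerpt below is a sketch for a kubeadm-managed cluster; the exact field shape varies across kubeadm API versions and deployment methods:

```yaml
# kubeadm ClusterConfiguration excerpt (sketch; field shape varies by version):
apiServer:
  extraArgs:
    feature-gates: "JobMutablePodResources=true"
```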