Cluster Operations
Node Lifecycle and Capacity Stewardship
Scheduling pressure, disruption budgets, and drain choreography for teams that own both nodes and tenant expectations.
- Duration: 3 weeks
- Format: Hybrid studio days
- Tuition (informational): 1,180,000 KRW
Operators learn to read scheduler decisions, justify resource profiles, and coordinate drains with workload SLOs. The syllabus covers mixed-architecture fleets and GPU reservations, but stays at the level Kubernetes itself exposes rather than going into vendor-specific drivers.
What is included
- Taints, tolerations, and topology spread constraints in paired labs
- PriorityClass tradeoffs with fair sharing examples
- PodDisruptionBudget (PDB) authoring against real microservice graphs
- Metrics signals that precede eviction storms
- Node Problem Detector patterns without vendor lock-in
- Hands-on drain sequencing with staged traffic shifts
- Written post-lab decision memo template
Outcomes
- Schedule a maintenance window with defensible PDB coverage
- Tune a resource profile with measurable latency guardrails
- Communicate node health signals to service owners
Lead instructor
Jonas Iyer
Spent six years on regional bare-metal fleets before moving to cloud-agnostic training.
Participant notes
- “The PDB lab used our own service graph template, surprisingly close to reality.”
— Eun · 4/5 · Google
Common questions
Will we cover autoscaling?
Limitations?
Prerequisite?
Refund rules live under Returns & Refunds. No payments are processed on this marketing site.