Cluster Operations

Node Lifecycle and Capacity Stewardship

Scheduling pressure, disruption budgets, and drain choreography for teams that own both nodes and tenant expectations.

Duration: 3 weeks
Format: Hybrid studio days
Tuition (informational): 1,180,000 KRW

Operators learn to read scheduler decisions, justify resource profiles, and coordinate drains with workload SLOs. The syllabus spends time on mixed-arch fleets and GPU reservations without diving into vendor-specific drivers beyond what Kubernetes surfaces.

What is included

Taints, tolerations, and topology spread constraints in paired labs
PriorityClass tradeoffs with fair-sharing examples
PodDisruptionBudget authoring against real microservice graphs
Metrics signals that precede eviction storms
Node Problem Detector patterns without vendor lock-in
Hands-on drain sequencing with staged traffic shifts
Written post-lab decision memo template

Outcomes

Schedule a maintenance window with defensible PDB coverage
Tune a resource profile with measurable latency guardrails
Communicate node health signals to service owners

Lead instructor

Jonas Iyer

Spent six years on regional bare-metal fleets before moving to cloud-agnostic training.

Participant notes

“The PDB lab used our own service graph template—surprisingly close to reality.”

— Eun · 4/5 · Google

Common questions

Will we cover autoscaling?

Horizontal Pod Autoscaling, yes. Cluster Autoscaler is covered only at the conceptual level unless your cohort requests a deep dive during week-three office hours.

Limitations?

We do not install proprietary node agents; the labs work with upstream signals only.

Prerequisites?

Comfort with kubectl and YAML manifests; prior on-call exposure helps but is not mandatory.

Refund rules live under Returns & Refunds. No payments are processed on this marketing site.

Schedule a call