Senior IaaS / Kubernetes Platform Engineer

Remote | $115k–$196k | Senior | Full-time | Posted 2 months ago | Quality 9.1/10
Kubernetes, IaaS, Ceph, Terraform, Ansible, GitOps, Networking

What You Will Do

Kubernetes Platform Engineering (primary focus, 40%)
  • Design, build, and operate a multi-tenant Kubernetes platform using Cluster API (CAPI) with bare-metal providers (Metal3/Sidero).
  • Implement hard multi-tenancy using vCluster (Loft Labs) or similar technology, providing isolated Kubernetes API servers per tenant.
  • Deploy and manage KubeVirt for VM orchestration within Kubernetes, including CPU pinning, NUMA awareness, and HugePages configuration.
  • Implement GitOps-driven infrastructure using ArgoCD or Flux as the single source of truth for all cluster configurations.
  • Deploy and manage Policy-as-Code using Kyverno or OPA Gatekeeper for admission control, resource quotas, and security policies.
  • Build self-service capabilities using Crossplane or similar Kubernetes-native infrastructure provisioning tools.

Storage Engineering (20%)
  • Operate and optimize Ceph distributed storage clusters (currently 1 PiB raw, 149 OSDs, Quincy 17.2.5).
  • Manage Rook-Ceph operator deployments at scale on modern Kubernetes (v1.28+).
  • Implement storage tiering: Ceph for bulk storage, local NVMe for high-IOPS workloads, LINSTOR/DRBD or TopoLVM for ultra-fast replicated storage.
  • Design and implement per-VM / per-tenant I/O isolation on shared Ceph clusters.
  • Manage CDI (Containerized Data Importer) for VM image lifecycle in KubeVirt environments.

Networking (15%)
  • Deploy and manage overlay networks that provide pod networking, micro-segmentation, and WireGuard/IPsec encryption.
  • Implement Cluster Mesh for multi-datacenter pod-to-pod connectivity.
  • Configure Multus CNI and SR-IOV for multi-NIC VM support in KubeVirt.
  • Work with physical network infrastructure: Juniper switches (JunOS), BGP (eBGP/iBGP), EVPN/VXLAN, VLANs.
  • Maintain IPsec site-to-site connectivity between datacenters.

Reliability and Operations (15%)
  • Practice SRE discipline: define and maintain SLOs with error budgets, and implement proactive capacity management with 6–12 month forecasting.
  • Design and execute chaos engineering experiments to validate system resilience.
  • Participate in the on-call rotation for IaaS infrastructure (OpenNebula, Ceph, networking).
  • Write and maintain runbooks, disaster recovery plan (DRP) documentation, and postmortem analyses.
  • Drive proactive improvement: identify reliability risks, performance bottlenecks, and toil, then propose and implement solutions without waiting for incidents.

Infrastructure as Code and Automation (10%)
  • Develop and maintain Terraform/OpenTofu modules for multi-cloud infrastructure provisioning.
  • Write Ansible playbooks for bare-metal server configuration and fleet management.
  • Automate infrastructure lifecycle: PXE

Requirements

  • Proven experience in Kubernetes platform engineering and IaaS.
  • Strong understanding of cloud infrastructure, networking, and storage solutions.
  • Experience with GitOps practices and tools (ArgoCD, Flux).
  • Familiarity with Ceph and distributed storage management.
  • Proficiency in Terraform and Ansible for automation and infrastructure as code.
  • Ability to work independently and collaboratively in a remote team environment.

What We Offer

  • Competitive salary ranging from $115,000 to $195,500 USD.
  • Fully remote work environment.
  • Supportive team culture focused on collaboration and success.
  • Opportunities for professional growth and development.

Before you apply

  • Legitimate employers never ask you to pay anything to apply or get hired.
  • Never share seed phrases or private keys. No real job needs them.
  • Do not install software ("test tasks", "trading tools", "video call clients") sent during hiring.
  • Check that the application page's domain really belongs to CloudLinux.