Cloud Infrastructure Engineer

Remote | $135k–$240k | Senior | Full-time | Posted 2 months ago | Quality 9.2/10
Kubernetes, Terraform, AWS, GCP, Prometheus, Grafana, OpenTelemetry, Helm, GitOps, ArgoCD, Istio, Claude Code
  • Architect and operate scalable, self-healing infrastructure leveraging Kubernetes, Terraform, and cloud-native tools across multi-region deployments.
  • Drive AI enablement across engineering — ensuring repos, tooling, and workflows are optimized for agentic development with tools like Claude Code, Cursor, and Codex.
  • Build AI-powered infrastructure tooling and automation (e.g., automated K8s upgrades, IaC plan analysis, cost optimization advisors, MCP servers, n8n workflows).
  • Build and maintain internal developer platform (IDP) capabilities for self-service deployments, observability, and reliability.
  • Develop observability frameworks using Prometheus and Grafana for metrics, dashboards, and alerting.
  • Lead incident management with blameless post-mortems; define and enforce SLIs, SLOs, and error budgets across services.
  • Design and manage multi-cloud, multi-region network architecture: VPC design, IPAM, DNS (Cloudflare), cross-cloud connectivity, security groups, and edge-proxy/Istio gateway configuration.
  • Collaborate with security teams to embed compliance into infrastructure, including IaC scanning and runtime protection.
  • Provide technical leadership and mentorship to elevate the team’s operational capabilities.
  • 5+ years as an Infrastructure Engineer focused on reliability (SRE, Production Engineer, Platform Engineer).
  • Experience driving company-wide reliability efforts, including SLO frameworks and error budget policies.
  • Strong proficiency with observability stacks: OpenTelemetry, Prometheus/Grafana.
  • Deep experience with cloud infrastructure (AWS/GCP), Kubernetes, and multi-region architectures.
  • Skilled with Terraform, Helm, and GitOps workflows (e.g., ArgoCD) with an automation-first mindset.
  • Experience leveraging agentic development tools (Claude Code, Cursor, Codex) and workflow automation (n8n) to accelerate IaC and build internal tooling is a strong plus.
  • Solid networking fundamentals: VPC design, DNS, IPAM, security groups, and cross-cloud connectivity. Service mesh experience (e.g., Istio) is a plus.
  • Calm and effective incident responder with a focus on systemic improvement.
  • Strong cross-functional communicator across SRE, security, and product engineering.
  • Blockchain infrastructure, distributed systems, or high-throughput RPC experience — not required but a plus.
  • Medical, Dental, & Vision
  • Gym Reimbursement
  • Home Office Build-out Budget
  • In-Office Group Meals
  • Wellbeing & Mental Health Perks
  • Learning & Development Stipend
  • Company Sponsored Conferences & Events
  • HSA and FSA Plans
  • Fertility Benefits
  • Competitive compensation, including base salary as well as equity
  • Comprehensive medical, dental, and vision coverage
  • 401(k) and unlimited flexible time off

Before you apply

  • Legitimate employers never ask you to pay anything to apply or get hired.
  • Never share seed phrases or private keys. No real job needs them.
  • Do not install software ("test tasks", "trading tools", "video call clients") sent during hiring.
  • Check that the application page's domain really belongs to Alchemy.