Remote
$135k–$240k
senior
2 months ago
full-time
- Architect and operate scalable, self-healing infrastructure leveraging Kubernetes, Terraform, and cloud-native tools across multi-region deployments.
- Drive AI enablement across engineering — ensuring repos, tooling, and workflows are optimized for agentic development with tools like Claude Code, Cursor, and Codex.
- Build AI-powered infrastructure tooling and automation (e.g., automated K8s upgrades, IaC plan analysis, cost optimization advisors, MCP servers, n8n workflows).
- Build and maintain internal developer platform (IDP) capabilities for self-service deployments, observability, and reliability.
- Develop observability frameworks using Prometheus and Grafana for metrics, dashboards, and alerting.
- Lead incident management with blameless post-mortems; define and enforce SLIs, SLOs, and error budgets across services.
- Design and manage multi-cloud, multi-region network architecture: VPC design, IPAM, DNS (Cloudflare), cross-cloud connectivity, security groups, and edge-proxy/Istio gateway configuration.
- Collaborate with security teams to embed compliance into infrastructure, including IaC scanning and runtime protection.
- Provide technical leadership and mentorship to elevate the team’s operational capabilities.
- 5+ years as an Infrastructure Engineer focused on reliability (SRE, Production Engineer, Platform Engineer).
- Experience driving company-wide reliability efforts, including SLO frameworks and error budget policies.
- Strong proficiency with observability stacks: OpenTelemetry, Prometheus/Grafana.
- Deep experience with cloud infrastructure (AWS/GCP), Kubernetes, and multi-region architectures.
- Skilled with Terraform, Helm, and GitOps workflows (e.g., ArgoCD) with an automation-first mindset.
- Experience leveraging agentic development tools (Claude Code, Cursor, Codex) and workflow automation (n8n) to accelerate IaC and build internal tooling is a strong plus.
- Solid networking fundamentals: VPC design, DNS, IPAM, security groups, and cross-cloud connectivity. Service mesh experience (e.g., Istio) is a plus.
- Calm and effective incident responder with a focus on systemic improvement.
- Strong cross-functional communicator across SRE, security, and product engineering.
- Blockchain infrastructure, distributed systems, or high-throughput RPC experience — not required but a plus.
- Medical, Dental, & Vision
- Gym Reimbursement
- Home Office Build-out Budget
- In-Office Group Meals
- Wellbeing & Mental Health Perks
- Learning & Development Stipend
- Company Sponsored Conferences & Events
- HSA and FSA Plans
- Fertility Benefits
- Competitive compensation, including base salary as well as equity
- Comprehensive medical, dental, and vision coverage
- 401(k) and unlimited flexible time off