Role in brief
Coinbase seeks a Senior Software Engineer for its Compute Platform team to design and operate critical Kubernetes infrastructure. The role involves building automation and developer tooling, improving platform reliability, and integrating AI-driven processes. It suits candidates with substantial Kubernetes, AWS/GCP, and CNCF ecosystem experience who can drive operational excellence.
About the role
This role focuses on the core compute platform, specifically Kubernetes, within Coinbase. The engineer will be responsible for the entire lifecycle of Kubernetes cluster management tooling and automation, from design to operation, ensuring the platform is reliable and self-healing at scale. A key aspect involves improving how other engineers interact with Kubernetes through new tooling and workflows, with an emphasis on integrating AI to boost productivity.
The position also involves delivering new compute capabilities for service owners, such as job scheduling, deployment strategies, and automated resource right-sizing. A significant part of the work is dedicated to operational excellence, which includes automating routine tasks, reducing the burden of on-call duties, and continuously improving platform observability and incident response mechanisms. This ensures the infrastructure meets high standards for performance and security.
Success in this role means contributing to a highly reliable and efficient compute platform that supports Coinbase's mission. It requires partnering with security, reliability, and observability teams to maintain top-tier standards. The ideal candidate will leverage their experience with distributed infrastructure systems to diagnose complex issues, drive root-cause resolutions, and apply AI tools to enhance infrastructure workflows and developer productivity.
The annual base salary for this position ranges from $186,065 to $218,900 USD, excluding equity and bonus.
Skills that matter here
- Kubernetes: This role involves designing, building, and operating Kubernetes cluster management tooling and automation, and improving engineer interaction with Kubernetes.
- AWS: Hands-on experience with AWS infrastructure services (e.g., EC2, EKS, IAM, VPC) in a production environment at scale. The posting asks for AWS and/or GCP, not necessarily both.
- GCP: Hands-on experience with GCP infrastructure services in a production environment at scale. The posting asks for AWS and/or GCP, not necessarily both.
- CNCF: Hands-on experience with the CNCF ecosystem, including tools like Helm, Prometheus, ArgoCD, and Envoy, is necessary for solving infrastructure problems.
- Prometheus: Named in the posting as an example CNCF tool; candidates should have a track record of applying tools like it to real infrastructure problems.
- ArgoCD: Named in the posting as an example CNCF tool; candidates should have a track record of applying tools like it to real infrastructure problems.
Who this role suits
- You have a background in building and operating large-scale distributed infrastructure systems.
- You are driven to automate processes and reduce operational toil, always seeking efficiency.
- You are adept at diagnosing complex system failures and pursuing root-cause resolutions.
- You are comfortable integrating new technologies, including AI, to improve engineering workflows and productivity.
From the employer
- Own the design, build, and operation of Kubernetes cluster management tooling and automation that keeps our compute platform reliable and self-healing at scale.
- Build developer-facing tooling and workflows that improve how engineers across Coinbase interact with Kubernetes, with a heavy emphasis on integrating AI-driven processes and support.
- Deliver net-new compute capabilities for service owners, including one-off jobs, cron scheduling, deployment strategies, EFS support, and automated right-sizing.
- Drive operational excellence by automating toil, reducing on-call burden, and continuously improving platform observability and incident response.
- Partner with Security, Reliability, and Observability teams to ensure the compute platform meets Coinbase's standards for security, uptime, and performance.
- 5+ years of software engineering experience, including 3+ years building and operating Kubernetes or similar compute orchestration systems (e.g., Mesos, Nomad, ECS).
- Hands-on experience with AWS and/or GCP infrastructure services (e.g., EC2, EKS, IAM, VPC, networking) in a production environment at scale.
- Demonstrated ability to design, implement, and operate distributed infrastructure systems, including diagnosing complex failures and driving them to root-cause resolution.
- Hands-on experience with the CNCF ecosystem (e.g., Helm, Prometheus, ArgoCD, Envoy) and a track record of applying these tools to solve real infrastructure problems.
- Proven ability to apply AI tooling to infrastructure workflows, improving automation, developer productivity, or operational efficiency.
- Utilizes generative AI responsibly, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.
- Base salary varies by location (see range below). Total compensation may also include equity and bonus eligibility, and benefits (medical, dental, vision, 401(k)).
- Annual base salary range (excluding equity and bonus): $186,065 to $218,900 USD.
Questions about this role
What is the remote work policy for this role?
This is a remote-first position, but not remote-only. Quarterly in-person working sessions called “surges” are expected.
What level of seniority is expected for this position?
This is a senior-level software engineering role requiring 5+ years of software engineering experience, including 3+ years building and operating Kubernetes or similar compute orchestration systems (e.g., Mesos, Nomad, ECS).
What are the core technical skills required?
Key technical skills include Kubernetes, AWS and/or GCP infrastructure services, and hands-on experience with the CNCF ecosystem, including tools like Helm, Prometheus, ArgoCD, and Envoy.