Senior Site Reliability Engineer

Remote, $82k–$112k, senior, English B2, 2 months ago, full-time, quality 8.6/10

Kubernetes, ArgoCD, Terraform, Crossplane, Grafana, Go, Python, Linux, AI tools

Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.
Work closely with product teams on crucial initiatives such as production deployments, release management, and incident handling, aiming for seamless operations.
Offer technical expertise and input to support the continual adoption and modernization of our platform and infrastructure.
Build and deploy AI-powered tooling (autonomous coding agents, LLM-assisted CI/CD, automated incident triage) that makes the engineering org more productive. Think: sandboxed environments where agents can write, test, and verify code without human babysitting.
Foster a culture of continuous learning and improvement, encouraging constructive review and adaptation processes.

Kubernetes expertise, with a strong understanding of its core concepts and the ability to manage and maintain clusters.
Expertise with modern cloud-native tools, e.g. ArgoCD for GitOps, Terraform/Crossplane for IaC, and the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) for observability.
3-5 years of experience using Infrastructure as Code and cloud provisioning tools - Must
3-5 years of experience with development and scripting in languages such as Go, Python, or similar - Must
Proficient in both written and spoken English, with exceptional communication abilities.
Expertise in Linux environments, containerization, and cloud technologies.
Comprehensive knowledge of production management concepts for distributed systems.
3-5 years in operational roles overseeing production environments.
AI fluency. You use AI coding tools daily and have opinions about what works. More importantly, you can build and deploy LLM-powered developer tooling and autonomous agents, not just consume them. We want someone who thinks about how to make an entire engineering team more productive with AI.
Networking knowledge: bonus points for service mesh experience, platform engineering and cross-cloud networking.
Familiarity with the Ethereum ecosystem, staking, and blockchain technologies - Advantage.
