Senior DevOps/Site Reliability Engineer

Remote · Full-time · Senior · $160k–$218k
Kubernetes, AWS, Terraform, Ansible, CI/CD, GitHub Actions, Jenkins, GitLab CI, Aurora, RDS, Elasticsearch, Grafana
  • Identify performance and scalability bottlenecks across our multi-cluster EKS environment on AWS, then propose and execute improvements.
  • Measure system health, scalability and performance metrics and identify areas for improvement.
  • Deploy services and troubleshoot production issues day to day, using code to solve broad operational challenges across Limit Break's infrastructure and platform.
  • Work with the wider engineering team to provide the most production-like environment possible for both manual and automated testing.
  • Define SLOs, SLIs, monitoring, alerting and incident response practices, and continuously improve our observability stack (Grafana, Thanos, Loki) so it is ready for worldwide scale.
  • 5+ years of experience in SRE, DevOps or systems engineering.
  • Strong background in Kubernetes, including operating multiple EKS clusters in production.
  • Extensive experience in Terraform and Ansible.
  • CI/CD and automation experience with tools such as GitHub Actions, Jenkins, or GitLab CI.
  • Solid background in AWS, including experience with Aurora, RDS (MySQL/SQL), and networking.
  • Ability to participate in an on-call rotation.
  • Effective communication skills to clearly explain your reasoning and thought process.
  • Excellent collaboration skills to work closely with product engineers and product owners.
  • Experience implementing in-house monitoring and observability infrastructure (e.g. Grafana, Thanos, Loki, or equivalents).
  • Experience implementing the Elasticsearch stack or an equivalent solution to capture logs from all environments.
  • Experience with Cloudflare, CDN technologies, and edge/perimeter networking.
  • Exposure to cloud security and perimeter tooling such as Wiz (or equivalent CSPM/vulnerability detection), AWS GuardDuty, Cloudflare Zero Trust, and secrets management platforms.
  • Experience addressing vulnerabilities: comfortable finding issues, digging into root causes and driving remediation.
  • Implement tooling to monitor and protect the environment in real time.
