Senior DevOps/Site Reliability Engineer
Remote
$160k–$218k
senior
2 months ago
full-time
- Identify performance and scalability bottlenecks across our multi-cluster EKS environment on AWS, then propose and execute improvements.
- Measure system health, scalability and performance metrics, and identify areas for improvement.
- Deploy services and troubleshoot production issues day to day, using code to solve broad operational challenges across the Limit Break Infrastructure and Platform.
- Work with the wider engineering team to determine how we can provide the most production-like environment possible for both manual and automated testing.
- Define SLOs, SLIs, monitoring, alerting and incident response practices, and continuously improve our observability stack (Grafana, Thanos, Loki) so it is ready for worldwide scale.
- 5+ years of experience in SRE, DevOps or systems engineering.
- Strong background in Kubernetes, including operating multiple EKS clusters in production.
- Extensive experience in Terraform and Ansible.
- CI/CD and automation experience with tools such as GitHub Actions, Jenkins, or GitLab CI.
- Solid background in AWS, including experience with Aurora, RDS (MySQL/SQL), and networking.
- Ability to participate in an on-call rotation.
- Effective communication skills to clearly explain your reasoning and thought process.
- Excellent collaboration skills to work closely with product engineers and product owners.
- Experience implementing in-house monitoring and observability infrastructure (e.g. Grafana, Thanos, Loki, or equivalents).
- Experience implementing the Elasticsearch stack or an equivalent solution for capturing logs from all environments.
- Experience with Cloudflare, CDN technologies, and edge/perimeter networking.
- Exposure to cloud security and perimeter tooling such as Wiz (or an equivalent CSPM/vulnerability detection tool), AWS GuardDuty, Cloudflare Zero Trust, and secrets management platforms.
- Experience addressing vulnerabilities: comfortable finding issues, digging deep to the root cause, and driving remediation.
- Implement tools to monitor and protect the environment in real time.
Similar jobs
Senior Site Reliability Engineer
Manychat · Remote
$88k–$130k
5 days ago
(Senior) DevOps Engineer (f/m/d)
adjoe · Remote
$90k–$135k
13 days ago
Senior DevOps Engineer
TradingView · Remote
$80k–$130k
13 days ago
Senior Site Reliability Engineer, Workforce Identity
Coinbase · Remote
$112k–$188k
23 days ago
Senior Site Reliability Engineer, Core AI Infrastructure
Coinbase · Remote
$112k–$188k
23 days ago
Senior DevOps Engineer
Incode · Remote
$115k–$196k
23 days ago