Senior Site Reliability Engineer

Remote · Senior · Full-time · $88k–$130k

Role in brief

Manychat is hiring a Senior Site Reliability Engineer to manage and improve its cloud infrastructure, with a focus on AWS and Kubernetes. The role covers platform reliability, infrastructure automation, and CI/CD pipeline improvements. It suits an experienced SRE with strong Linux, Kubernetes, and AWS skills who can partner with development teams to keep systems stable.

Linux · AWS · Kubernetes · Terraform · Ansible · Python · GitHub Actions · Prometheus · Grafana

About the role

As a Senior Site Reliability Engineer at Manychat, you will be responsible for the stability and performance of the company's cloud infrastructure. This involves maintaining and hardening AWS services such as EC2, ALB/NLB, WAF, IAM, and CloudWatch, and evolving the EKS clusters that power Manychat's Python-based AI services. A key part of the role is migrating existing services to Kubernetes with Terraform and Helm, codifying infrastructure in Terraform, and automating host-level configuration with Ansible.

The role also includes building and improving CI/CD pipelines with GitHub Actions and owning observability with Prometheus and Grafana, including alerting and on-call readiness. You will support infrastructure hygiene such as OS-level patching, certificate management, and WAF rule maintenance. Success in this position means partnering closely with other engineers to promote best practices and improve the overall reliability of the platform.

Manychat helps creators build automations on platforms such as Instagram, Messenger, WhatsApp, and TikTok to grow engagement and sales. The company has a diverse team of more than 350 people and serves over 1.5 million customers worldwide. This role is a chance to contribute to a widely used platform as part of a distributed team.

The salary for this position ranges from $88,000 to $130,000 USD.

Skills that matter here

  • Linux: This role requires at least five years of experience managing Linux in production, specifically Ubuntu and Amazon Linux.
  • AWS: You will maintain and harden various AWS infrastructure components, including EC2, ALB/NLB, WAF, IAM, and CloudWatch.
  • Kubernetes: The position involves operating and evolving EKS clusters, migrating services to Kubernetes, and debugging Python workloads within containers.
  • Terraform: You will use Terraform to codify infrastructure and, together with Helm, to migrate existing services to Kubernetes.
  • Ansible: This role uses Ansible for host-level automation; strong Ansible skills beyond the basics are listed as a nice-to-have rather than a requirement.
  • Python: You should be comfortable running and debugging Python workloads within containerized environments.

Who this role suits

  • Someone with at least five years of experience managing Linux in a production setting.
  • An individual who excels at explaining complex infrastructure concepts clearly to developers.
  • A professional who is proactive in codifying infrastructure and improving CI/CD processes.
  • A person who understands networking, IAM, and cloud security best practices.

From the employer

What You’ll Do

  • Maintain and harden AWS infrastructure (EC2, ALB/NLB, WAF, IAM, CloudWatch)
  • Operate and evolve our EKS clusters powering Python-based AI services
  • Migrate existing services to Kubernetes using Terraform and Helm
  • Codify infrastructure with Terraform and manage host-level automation via Ansible
  • Build and improve CI/CD pipelines with GitHub Actions
  • Own observability efforts: Prometheus, Grafana, alerting, and on-call readiness
  • Support OS-level patching, certs, WAF rules, and general infra hygiene
  • Partner with engineers to guide best practices and drive platform reliability
  • Create clean, maintainable infrastructure documentation and playbooks
  • Occasionally support rare off-hours incidents (don’t worry, really rare)

To Shine in This Role

  • 5+ years of experience managing Linux in production (Ubuntu, Amazon Linux)
  • Strong experience with Kubernetes (ideally EKS), Helm, and Terraform
  • Comfort with running and debugging Python workloads in containers
  • Solid understanding of networking, IAM, and cloud security best practices
  • Hands-on Nginx experience (Ingress and reverse proxy setups)
  • Excellent communication skills; you can explain complex infra to devs clearly

Nice to Have Skills

  • Strong Ansible skills beyond the basics
  • PostgreSQL or Amazon RDS tuning and operations experience
  • Deep understanding of observability tools (Prometheus, Grafana, Loki, etc.)
  • Familiarity with PHP production environments
  • Experience with TDD, CI/CD best practices, and agile development
  • Any previous SRE-like exposure such as building resilience, automation, or incident tooling

What We Offer

  • Hybrid onboarding to start work remotely and relocation support for you and your family.
  • Comprehensive health insurance for both you and your family.
  • Professional development budget for conference tickets, online courses, and other relevant resources to help you grow.
  • Flexible benefits package to tailor perks that matter most for you.
  • Hybrid work and generous leave options to prioritize your work-life balance.
  • In-office perks, including free meals and snacks.
  • Company-funded sport activities, annual offsites and team-building events.

Questions about this role

What is the remote work policy for this role?

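The listing is marked remote, and onboarding can start remotely. However, the benefits also mention hybrid work, in-office perks, and relocation support for you and your family, so confirm the expected working arrangement with Manychat before applying.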

What level of seniority is this position?

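This is a senior-level role. The listing asks for at least five years of experience managing Linux in production, along with strong Kubernetes, Helm, and Terraform skills; prior SRE-style work is listed as a plus.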

What are the key technical skills required for this job?

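The core requirements are production Linux (Ubuntu, Amazon Linux), Kubernetes (ideally EKS), Helm, Terraform, hands-on Nginx (Ingress and reverse proxy setups), networking, IAM, and cloud security, plus comfort running and debugging Python workloads in containers. Day-to-day work also involves AWS, GitHub Actions, Prometheus, and Grafana, while strong Ansible skills are a nice-to-have.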

