Site Reliability Engineer

Remote | $98k–$162k | Mid-level | Full-time | Posted 1 month ago | Quality 8.5/10

Role in brief

Alpaca, a fintech company providing brokerage infrastructure, seeks a Site Reliability Engineer. This role focuses on maintaining the reliability and operability of their platform, working with cloud infrastructure, Kubernetes, and PostgreSQL. Candidates with experience in SRE, DevOps, or platform engineering, and strong skills in incident response and database management, should apply.

Incident Response, SRE, Linux, Python, Kubernetes, Cloud Networking, PostgreSQL, Performance Tuning, Observability, Security Compliance, Cloud Infrastructure, Brokerage Platform

About the role

This role involves ensuring the stability and performance of Alpaca's brokerage platform. The engineer will be responsible for day-to-day production operations, including on-call duty, incident response, and post-mortems, as well as defining and refining SLIs/SLOs and error budgets. A key part of the job is strengthening observability across metrics, logs, traces, and alerting.

The position requires shipping infrastructure through code using a GitOps workflow, managing both cloud resources and Kubernetes workloads. A significant focus is on PostgreSQL, involving performance tuning, schema and migration reviews, online migrations on large tables, and high availability and disaster recovery (HA/DR). The engineer will also look after change data capture (CDC) pipelines.

Success in this role means contributing to a reliable, observable, and operable platform as the company scales. The engineer will mentor others on reliability and database fundamentals through code and design reviews, fostering a culture of operational excellence within the team. This position is integral to supporting Alpaca's mission of expanding access to financial services globally.

The annual salary for this position ranges from $98,000 to $162,000.

Skills that matter here

  • Incident Response: The role requires managing production incidents, debugging issues under pressure, and conducting post-mortems to drive improvements.
  • Kubernetes: The engineer will operate production services on Kubernetes and manage Kubernetes workloads through infrastructure as code.
  • PostgreSQL: This position calls for solid working knowledge of PostgreSQL in production, covering performance tuning, schema management, online migrations, and high availability.
  • Cloud Networking: Understanding cloud networking fundamentals like VPCs, routing, and load balancing is necessary for debugging connectivity issues.
  • GitOps: The role involves shipping infrastructure as code using a GitOps workflow for both cloud resources and Kubernetes.
  • Go or Python: At least working proficiency in Go or Python is required.

Who this role suits

  • A person with at least four years of experience in SRE, DevOps, or platform engineering, who has owned production operations.
  • Someone who is calm and structured during incidents, capable of debugging complex systems under pressure.
  • An individual with a genuine interest in databases, particularly PostgreSQL, and a desire to grow their DBA expertise.
  • A candidate comfortable mentoring other engineers on reliability and database fundamentals.

From the employer

Your Role:

As a Site Reliability Engineer at Alpaca, you'll help keep our brokerage platform reliable, observable, and operable as we grow - working across our cloud infrastructure, Kubernetes platform, observability stack, messaging layer, and data layer.

Things You Get To Do:

  • Operate production day-to-day - on-call, incident response, postmortems, and the follow-ups that actually close the loop.
  • Own reliability practice - define and refine SLIs/SLOs and error budgets, and help product teams live within them.
  • Strengthen our observability across metrics, logs, traces, and alerting.
  • Ship infrastructure through code in a GitOps workflow - cloud resources and Kubernetes workloads alike.
  • Look after PostgreSQL: performance tuning, schema and migration review, online migrations on large tables, HA/DR, and CDC pipelines.
  • Mentor engineers on reliability and database fundamentals through code review, design review, and pairing.

Who You Are (must-haves):

  • 4+ years in SRE, DevOps, Platform/Infrastructure, or backend engineering with significant production operations ownership.
  • Hands-on experience operating production services on Kubernetes, and shipping infrastructure as code in a GitOps workflow.
  • Solid working knowledge of PostgreSQL in production — query plans, pg_stat_*, indexing and schema trade-offs, and what a safe online migration looks like on a non-trivial table.
  • Cloud networking fundamentals (VPCs, routing, L4/L7 load balancing, DNS, TLS) and comfort debugging cross-service connectivity.
  • Comfortable with a modern observability stack and proficient with Linux at the operator level.
  • Practiced in incident response - calm under pressure, structured debugging, postmortems that drive change.
  • At least working proficiency in Go or Python, plus strong written and verbal communication.
  • Genuine interest in databases and in growing your PostgreSQL/DBA expertise.

How We Take Care of You:

  • Competitive Salary & Stock Options
  • Health Benefits
  • New Hire Home-Office Setup: One-time USD $500
  • Monthly Stipend: USD $150 per month via a Brex Card.

Alpaca is proud to be an equal opportunity workplace dedicated to pursuing and hiring a diverse workforce.

Questions about this role

What is the seniority level for this role?

This is a mid-level position.

What is the remote work policy for this role?

This is a fully remote position.

What programming languages are required?

Working proficiency in Go or Python is required for this role.

Before you apply

  • Legitimate employers never ask you to pay anything to apply or get hired.
  • Never share seed phrases or private keys. No real job needs them.
  • Do not install software ("test tasks", "trading tools", "video call clients") sent during hiring.
  • Check that the application page's domain really belongs to Alpaca.