Role in brief
Alpaca, a financial services company, seeks a Senior AI Platform Engineer to build and maintain AI infrastructure. This role involves designing execution environments, creating reusable platform services, and establishing AI governance. Candidates with experience in agentic AI systems, cloud-native technologies, and Google Cloud Platform should apply.
About the role
This role focuses on developing and maintaining the core infrastructure that supports AI workflows across Alpaca. The engineer will be responsible for creating the connector and service integration layers, and for designing secure execution environments for agentic AI systems. This includes implementing isolation boundaries and access controls to ensure safe and scalable AI usage.
A key aspect of this position involves building reusable platform services, self-service templates, and "golden paths" to streamline AI development for various teams. The goal is to productize the onboarding process for both technical and non-technical users so that setup works reliably without manual intervention or tribal knowledge. The role also turns exception handling into repeatable paths, reducing the company's dependence on individual heroics.
The successful candidate will also define and enforce technical standards for AI agent execution, evaluation loops, and deployment. This includes partnering with Security and IT to ship deployable patterns for higher-risk AI capabilities. The role owns the AI governance layer, which covers access controls, audit trails, approval criteria, and deployment boundaries for agentic workflows, and sets the reliability, observability, and operational bar for AI-specific infrastructure.
The salary for this position ranges from $105,000 to $175,000 USD annually.
Skills that matter here
- Infrastructure as Code: This skill is used to manage and provision the underlying infrastructure for AI systems.
- Google Cloud Platform: Direct experience with GCP is required because Alpaca runs on Google Cloud.
- Kubernetes: This skill is used for orchestrating containers and managing cloud-native systems within the AI infrastructure.
- APIs: Strong experience with APIs is necessary for building the connector and service integration layers that power AI workflows.
- AI Governance: This skill is crucial for implementing access controls, audit trails, and approval criteria for agentic workflows.
- Agentic AI systems: Hands-on experience with building agentic AI systems, such as LLM-powered workflows and tool-calling agents, is a core requirement for this role.
Who this role suits
- A person who has spent at least eight years in software, platform, or infrastructure engineering roles.
- Someone with direct, hands-on experience building agentic AI systems, not just classical machine learning or data pipelines.
- An individual comfortable navigating both fast-paced, low-process environments and more structured, compliance-aware settings.
- A candidate who prioritizes simplification, standardization, and operational reliability over complex, one-off solutions.
From the employer
Your Role
- Own the connector and service integration layer that powers AI workflows across the company.
- Design and ship execution environments for agents and higher-autonomy AI workflows, including isolation boundaries and access controls.
- Build reusable platform services, golden paths, and self-service templates that reduce setup friction for teams building on AI.
- Productize onboarding so it works reliably for both developers and non-developers without depending on manual intervention or tribal knowledge.
- Define and enforce technical standards for agent execution, evaluation loops, and deployment.
- Partner with Security and IT to ship deployable patterns for higher-risk AI capabilities.
- Own the AI governance layer: access controls, audit trails, approval criteria, and deployment boundaries for agentic workflows.
- Set the reliability, observability, and operational bar for AI-specific infrastructure.
- Act as the technical escalation point when onboarding or platform issues block rollout.
- Reduce the company's dependence on individual heroics by turning exception handling into repeatable paths.
Who You Are (Must-Haves)
- 8+ years in software, platform, infrastructure, or adjacent engineering roles.
- Hands-on experience building agentic AI systems: LLM-powered workflows, tool-calling agents, evaluation loops, or autonomous execution — using frameworks like the Claude SDK, Google Agent Development Kit (ADK), LangGraph, or similar. Not classical ML or data pipelines.
- Direct experience with GCP. We run on Google Cloud and you should be comfortable there.
- Strong experience with APIs, auth, OAuth, secrets, CLI tooling, and deployment patterns.
- Cloud-native systems experience with containers, orchestration (Kubernetes), and infrastructure-as-code.
- Experience implementing AI governance controls: access boundaries, audit logging, approval workflows, and safe deployment standards for higher-autonomy systems.
- Comfortable operating in both fast-moving, low-process environments and more structured, compliance-aware ones. You know when to move fast and when to slow down.
- Strong bias toward simplification, standardization, and operational reliability over clever one-off solutions.
- Excellent communication skills with the ability to work across engineering, security, and non-technical stakeholders.
How We Take Care of You
- Competitive Salary & Stock Options
- Health Benefits
- New Hire Home-Office Setup: One-time USD $500
- Monthly Stipend: USD $150 per month via a Brex Card.
Questions about this role
What is the remote work policy for this role?
This is a fully remote position.
What level of seniority is expected for this position?
This is a senior-level role. The employer asks for 8+ years in software, platform, infrastructure, or adjacent engineering roles.
What are the key technical skills required for this role?
Key technical skills include Infrastructure as Code, Google Cloud Platform, Kubernetes, APIs, Security standards, AI Governance, Agentic AI systems, and LLM workflows.