Role in brief
Risk Labs seeks a Senior LLM Systems Engineer to enhance the accuracy, performance, and resilience of their oracle automation stack. This role involves improving LLM-driven components, focusing on prompt engineering, evaluation, and operational quality. Candidates with strong software engineering experience in Python or TypeScript and hands-on experience building production LLM systems should apply.
About the role
This Senior LLM Systems Engineer role at Risk Labs focuses on the core components of its oracle automation stack that rely on large language models. The primary goal is to ensure these systems are accurate, performant, and resilient, and that they maintain high operational quality. This involves improving how models interpret prediction market rules, evidence, and oracle outcomes, which directly affects the reliability of the system's decisions.
The person in this role will take ownership of several key areas, including enhancing LLM accuracy through better prompts, model selection, and structured outputs, as well as developing robust evaluation methods. They will also work on system performance to reduce latency and cost while maintaining decision quality. A significant part of the role involves building resilience into the systems, designing solutions for validation, retries, and handling ambiguous or adversarial inputs.
Success in this position means continuously improving the system's ability to make correct decisions, ensuring efficient operation, and building robust defenses against potential failures. This involves developing comprehensive evaluation frameworks, monitoring tools, and debugging procedures to proactively identify and resolve issues. The role also extends to refining agent orchestration and tool usage across various internal and external data sources.
The salary for this role ranges from $100,000 to $200,000 USD, with payment available in stablecoins or fiat.
Skills that matter here
- Python: The role requires professional software engineering experience in Python, TypeScript, or a similar production language for building and improving production systems.
- TypeScript: TypeScript is named alongside Python as a language in which that production experience can be demonstrated.
- LLM APIs: Hands-on experience building production systems that use LLMs, agents, retrieval, or structured outputs is required.
- Postgres: Experience with Postgres is beneficial for managing databases within the system architecture.
- GCP: Familiarity with Google Cloud Platform (GCP) and its services like Cloud Run is helpful for infrastructure management.
- Solidity: Experience with Solidity is a nice-to-have, particularly for those familiar with DeFi protocols or crypto infrastructure.
Who this role suits
- You have a strong background in software engineering and have specifically built production systems leveraging LLMs.
- You are meticulous about system correctness and can design robust evaluation and monitoring frameworks for AI systems.
- You are adept at debugging complex issues across various technical components, from APIs to model outputs.
- You demonstrate high agency and ownership, coupled with clear written communication skills.
From the employer
What You'll Own:
- LLM Accuracy: improve prompts, model selection, tool usage, structured outputs, retrieval, and evaluation coverage so the system gets more decisions right over time.
- System Performance: reduce latency, token usage, and cost while preserving decision quality and operational reliability.
- Resilience: design validation, retries, fallbacks, uncertainty handling, and human review paths for ambiguous, adversarial, incomplete, or conflicting inputs.
- Evaluation and Monitoring: build datasets, regression tests, dashboards, traces, and review loops that make model quality visible and prevent repeated failures.
- Agent and Tooling Architecture: improve agent orchestration and tool use across internal services, APIs, search workflows, databases, and external data sources.
- Production Operations: help debug live issues, investigate regressions, improve runbooks, and reduce repeated operator friction.
Skills & Experience
Required
- 3+ years of professional software engineering experience in Python, TypeScript, or similar production languages.
- Hands-on experience building production systems that use LLMs, agents, retrieval, structured outputs, or model-powered workflows.
- Experience designing evaluations, test datasets, regression checks, quality metrics, or manual review loops for AI systems.
- Strong debugging ability across APIs, databases, queues, logs, model outputs, and external data sources.
- Practical understanding of prompt engineering, tool calling, structured output validation, retrieval, and common LLM failure modes.
- Ability to reason carefully about correctness in uncertain or adversarial environments.
- High agency, strong ownership, and clear written communication.
Nice to Have
- Experience with oracle systems, prediction markets, DeFi protocols, or other crypto infrastructure.
- Experience with UMA, optimistic oracle mechanisms, Polymarket, or similar systems.
- Experience building agentic systems that use tools, search, browser automation, APIs, or database queries.
- Experience with LLM tracing, model monitoring, evaluation frameworks, or AI observability tools.
- Experience optimizing model cost and latency at scale.
- Experience with Postgres, data pipelines, queue-based systems, background jobs, or event-driven architectures.
- Familiarity with blockchain operational constraints, especially RPC limits, indexing, event logs, finality, and chain-specific behavior.
- Experience with GCP, Cloud Run, GitHub Actions, Terraform, or similar infrastructure.
Compensation and Benefits
- Pay packages include competitive salaries and meaningful long-term equity participation.
- Salaries for this role range from $100k to $200k (USD).
- Payment is available in stablecoins or fiat.
- A culture built on philosophies that show we care: vacation when you need it, family care, and training and development (just to name a few).
- 100% remote, which means we encourage you to create the work environment that you thrive in.
- At least two team-wide offsites a year.
Questions about this role
What is the remote work policy for this role?
This is a fully remote position, allowing candidates to create their preferred work environment.
What level of seniority is expected for this position?
This is a senior-level role. The listing requires 3+ years of professional software engineering experience in Python, TypeScript, or similar production languages.
What are the core technical skills required?
Candidates need professional software engineering experience in Python, TypeScript, or a similar production language, plus hands-on experience building production systems that use LLMs, agents, retrieval, structured outputs, or model-powered workflows.