Role in brief
IEX, a financial technology company, seeks a Senior Systems Reliability Engineer to manage and automate the technical operations of its trading platform. This role involves building monitoring tools, troubleshooting system issues, and guiding other teams on reliability best practices. Candidates with strong Linux, Python, and automation experience in distributed systems environments should apply.
About the role
This role focuses on ensuring the smooth and reliable operation of a trading platform. Key responsibilities include participating in the engineering process from design to management, creating tools for monitoring and automation, and actively engaging in change management and learning reviews. The position requires troubleshooting issues across various layers, including hardware, software, applications, and networks, to maintain system integrity and performance.
A core aspect of this position is translating customer needs and product usage into specific operational reliability targets. The Senior Systems Reliability Engineer will also educate and mentor other teams within the company on best practices for operational reliability, application functionality, and system performance. This involves guiding development, market operations, business development, and subscriber teams.
Success in this role means consistently optimizing processes and maintaining high availability for the trading platform. This involves proactive monitoring, efficient incident management, and continuous improvement of system configurations and policies. The engineer will contribute to a robust and transparent capital markets environment by ensuring the underlying technology is stable and performant.
The annual base salary for this position ranges from $150,000 to $225,000, with eligibility for bonus and equity.
Skills that matter here
- Linux: The role requires hands-on experience with Linux for managing and troubleshooting the trading platform's infrastructure.
- Python: Python is used for coding automation scripts and building tools to enhance system reliability.
- Bash: Bash scripting is essential for automating tasks and managing system operations efficiently.
- Git: Git is used for version control of code and configurations related to system automation and tools.
- Ansible: Ansible or similar configuration management tools are critical for automating infrastructure provisioning and management.
- Java: Familiarity with Java applications may be necessary for troubleshooting across the full technology stack.
Who this role suits
- A candidate who thrives on optimizing complex, distributed systems and enjoys building tools to enhance operational efficiency.
- Someone who is proactive in identifying potential issues and skilled in troubleshooting across various technology layers.
- An individual who enjoys mentoring others and guiding different teams on best practices for system reliability.
- A person with a strong background in host-side networking and packet-level analysis.
From the employer
What you will do:
- Take responsibility for the technical operations of our trading platform.
- Participate in the engineering process as we design, build, and manage our systems.
- Build tools to monitor and automate processes.
- Be a core contributor in our change management and learning review processes.
- Troubleshoot issues across the whole stack - hardware, software, application, and network.
- Document current and future configuration processes and policies.
- Translate customer needs and projected product utilization to operational reliability targets.
- Guide other functions (e.g., development, market operations, business development, subscribers) on reliability techniques, application, and system functionality.
- Educate and mentor the team and the wider company on operational best practices.
Your background:
- Automation experience with Ansible or a similar configuration management tool
- Hands-on experience with Linux, Python, Bash, and Git
- Experience supporting large, complex, and distributed systems
- Host-side networking
- Packet-level understanding of network traffic and working experience troubleshooting with packet captures
- TCP/IP stack, routes
- Multicast
- General familiarity with data center workflows and with working alongside data center personnel to implement changes
- Hardware familiarity
- Arista and Cisco switches and their CLIs
- Corvil
- Solarflare, Mellanox
Why you should apply:
- Comprehensive [Benefits](https://iex.io/careers/)
- Unlimited PTO
- 100% coverage for medical, dental, and vision
- New hire stock equity (RSUs)
- 401(k) employer match
- OneMedical membership
- 16 weeks paid parental leave
- Flexible workplace
- Employer charity match
- Learning stipend
- Commuter benefits
- Jump Start onboarding program
- Cross-departmental internal mentor program
- Friendly and inclusive workplace culture
The starting annual base pay for this position, which is based in California or Oregon, is between $150,000 and $225,000. The actual base pay depends on many factors, such as training, transferable skills, work experience, business needs, and market demands. The annual base pay range is subject to change and may be modified in the future. This role is eligible for bonus and equity.
Questions about this role
What is the remote work policy for this position?
This is a remote position for candidates located in California or Oregon.
What level of seniority is expected for this role?
This is a senior-level position, indicating a need for substantial experience and expertise.
What are the core technical skills required for this role?
Key technical skills include Linux, Python, Bash, Git, and automation experience with tools like Ansible, along with knowledge of host-side networking and troubleshooting distributed systems.