Role in brief
Wynd Labs is building Grass, a network that makes public web data accessible for AI by rewarding users for sharing bandwidth. This Data Engineer role involves designing, building, and optimizing data pipelines and ETL workflows. It suits a proactive engineer with strong SQL and cloud data platform experience who thrives in an autonomous, fast-paced startup environment.
About the role
This role focuses on developing and maintaining the data infrastructure for Grass, a project that aims to democratize access to web data for AI. You will be responsible for creating scalable data pipelines, managing ETL/ELT processes, and integrating database systems. The work involves transforming raw data into structured formats for analysis and ensuring data quality and flow consistency.
The team operates with a flat organizational structure, emphasizing autonomy and direct responsibility. Success in this role means consistently delivering high-quality data solutions, proactively identifying and resolving issues, and contributing to a system that fairly distributes rewards based on data contributions. The environment values initiative and a strong sense of urgency.
You will collaborate with data scientists and analysts to ensure data accessibility and usability. This includes automating workflows and infrastructure setup with tools such as Apache Airflow and Terraform, working with cloud platforms (AWS, GCP, Azure), and implementing best practices for data governance and security. Staying current with data engineering trends is also part of the position.
The listed salary range for this position is $100,000 to $140,000 USD, in addition to an equity package.
Skills that matter here
- SQL: Advanced proficiency in SQL is required for optimizing complex queries and working with database systems such as Redshift, Snowflake, or similar cloud-based solutions.
- Apache Airflow: You will use Apache Airflow, or similar tools, for automating data workflows and building and managing data pipelines.
- AWS Glue: Hands-on experience with AWS Glue, or similar tools, is expected for building and managing data pipelines and ETL processes.
- ETL: A solid understanding of ETL processes and best practices is essential for transforming raw data into structured formats.
- Terraform: You will use Terraform, or similar tools, for automating infrastructure setup and managing data ecosystems.
- Python: Knowledge of Python, Scala, or Java is required for pipeline orchestration and data manipulation.
Who this role suits
- Someone who thrives on autonomy and takes initiative in a lean team setting.
- An individual driven by ambitious goals and a sense of urgency to deliver results.
- A problem-solver with strong analytical skills who can troubleshoot and resolve data flow issues.
- A collaborative team player with strong communication skills to work across different functions.
From the employer
- Designing, building, and optimizing scalable data pipelines to process and integrate data from various sources in real time or in batches.
- Developing and managing ETL/ELT workflows to transform raw data into structured formats for analysis and reporting.
- Integrating and configuring database infrastructure, ensuring performance, scalability, and data security.
- Automating data workflows and infrastructure setup using tools like Apache Airflow, Terraform, or similar.
- Collaborating with data scientists, analysts, and other stakeholders to ensure efficient data accessibility and usability.
- Monitoring, troubleshooting, and improving the performance of data pipelines and infrastructure to ensure data quality and flow consistency.
- Working with cloud infrastructure (AWS, GCP, Azure) to manage databases, storage, and compute resources efficiently.
- Implementing best practices for data governance, data security, and disaster recovery in all infrastructure designs.
- Staying current with the latest trends and technologies in data engineering, pipeline automation, and infrastructure as code.
- Bachelor’s degree in Computer Science, Information Systems, Data Engineering, or a related technical field.
- Extensive experience with database systems such as Redshift, Snowflake, or similar cloud-based solutions.
- Advanced proficiency in SQL and experience with optimizing complex queries for performance.
- Hands-on experience with building and managing data pipelines using tools such as Apache Airflow, AWS Glue, or similar technologies.
- Solid understanding of ETL (Extract, Transform, Load) processes and best practices for data integration.
- Experience with infrastructure automation tools (e.g., Terraform, CloudFormation) for managing data ecosystems.
- Knowledge of programming languages such as Python, Scala, or Java for pipeline orchestration and data manipulation.
- Strong analytical and problem-solving skills, with an ability to troubleshoot and resolve data flow issues.
- Familiarity with containerization (e.g., Docker) and orchestration (e.g., Kubernetes) technologies for data infrastructure deployment.
- Collaborative team player with strong communication skills to work with cross-functional teams.
- Opportunity to work at the forefront of developing a web-scale crawler and knowledge graph that allows ordinary people to participate in the process and share in the benefits of AI development.
- A lean team culture built around a very ambitious goal: improving access to public web data and distributing the value of AI to the people.
- Competitive salary and equity package.
Questions about this role
What is the remote work policy for this role?
This is a fully remote position.
What kind of experience is required?
Candidates should have extensive experience with database systems like Redshift or Snowflake, advanced SQL proficiency, and hands-on experience building and managing data pipelines with tools like Apache Airflow or AWS Glue. A Bachelor’s degree in a technical field is also required.
What is the salary range for this position?
The salary range for this role is $100,000 to $140,000 USD, plus an equity package.