This position requires a detail-oriented data engineer who can independently architect and implement data pipelines, while also serving as a trusted technical partner in client engagements and stakeholder meetings. You’ll work hands-on with PySpark, Airflow, Python, and SQL, driving end-to-end data migration and platform modernization efforts across Azure and AWS.
In addition to technical execution, you’ll contribute to sprint planning, backlog prioritization, and continuous integration/deployment of data infrastructure. This is a senior-level individual contributor role with direct visibility across engineering, product, and client delivery functions.
Key Responsibilities
- Lead design and development of enterprise-grade data pipelines and cloud data migration architectures.
- Build scalable, maintainable ETL/ELT pipelines using Apache Airflow, PySpark, and modern data services.
- Write efficient, modular, and well-tested Python code, grounded in clean architecture and performance principles.
- Develop and optimize complex SQL queries across diverse relational and analytical databases.
- Contribute to and uphold standards for data modeling, data governance, and pipeline performance.
- Own the implementation of CI/CD pipelines (e.g., GitHub Actions, Azure DevOps, Jenkins) to enable reliable deployment of data workflows and infrastructure.
- Embed unit testing, integration testing, and monitoring in all stages of the data pipeline lifecycle.
- Participate actively in Agile ceremonies: sprint planning, daily stand-ups, retrospectives, and backlog grooming.
- Collaborate directly with clients, stakeholders, and cross-functional teams to translate business needs into scalable technical solutions.
- Act as a technical authority within the team, leading architectural decisions and contributing to internal best practices and documentation.
Requirements
Required Qualifications
- 4+ years of hands-on experience in data engineering, with proven success delivering complex data solutions in production environments.
- Expert-level programming skills in Python, including a deep understanding of OOP, performance tuning, and testing strategies.
- Advanced SQL skills: complex joins, CTEs, window functions, indexing, and query optimization.
- Strong experience with Apache Airflow, PySpark, and distributed data processing.
- Proficiency in architecting and delivering data solutions on Microsoft Azure, Amazon Web Services (AWS), or both.
- Demonstrated CI/CD experience for data pipelines and infrastructure (infrastructure-as-code and workflow deployments).
- Hands-on experience with Agile frameworks (Scrum, Kanban) and collaboration tools (Jira, Confluence, Git, etc.).
- Comfortable interfacing directly with clients, product owners, and non-technical stakeholders.
- Experience in regulated industries such as Healthcare or Financial Services, with an understanding of privacy and compliance best practices (HIPAA, SOC 2, etc.).
Preferred Qualifications
- Familiarity with Snowflake, Databricks, or other modern data warehouse platforms.
- Experience with MLOps pipelines and tools such as MLflow or PyTorch, and with cloud-based ML model delivery.
- Exposure to ETL platforms like Apache NiFi, Talend, or Informatica.
- Proficiency with dbt (data build tool) for modular SQL transformations.
- Strong communication skills with a proven ability to provide mentorship, support knowledge sharing, and document engineering decisions.
- Experience with data visualization and supporting analytics teams through well-structured data marts or APIs.