We are seeking a talented and experienced Senior Data Engineer to join our Catalog Management department. In this role, you will design, develop, and maintain robust, scalable data pipelines and infrastructure on Google Cloud Platform (GCP). You will work closely with data scientists, analysts, and other engineers to ensure data is readily available, reliable, and optimized for a range of analytical and operational needs. A commitment to building automated testing into our data solutions is essential. The ideal candidate has a strong background in Java development, Apache Spark, and GCP services, and a passion for building high-quality data systems.
Responsibilities:
- Data Pipeline Development: Design, develop, and maintain efficient, scalable data pipelines using Apache Spark (primarily with Java), Apache Beam, or Kubeflow to ingest, process, and transform large datasets from a variety of sources (a minimal pipeline sketch follows this list).
- GCP Infrastructure Management: Build, configure, and manage data infrastructure components on GCP, including BigQuery, Dataflow, Dataproc, Cloud Storage, Pub/Sub, and Cloud Functions.
- API Development and Maintenance: Develop and maintain RESTful APIs using Spring Boot to provide secure, reliable access to processed data and data services (see the endpoint sketch after this list).
- Data Modeling and Design: Design and implement optimized data models for analytical and operational use cases, balancing performance, scalability, and data integrity.
- Data Quality Assurance: Implement comprehensive data quality checks and monitoring systems to ensure data accuracy, consistency, and reliability throughout the data lifecycle.
- Test Automation: Develop and maintain automated unit, integration, and end-to-end tests for data pipelines and APIs to ensure code quality and prevent regressions.
- Performance Optimization and Monitoring: Proactively monitor system performance, reliability, and scalability. Analyze performance metrics (CPU, memory, network) to identify bottlenecks, maintain system health, and control costs.
- Collaboration and Communication: Collaborate effectively with data scientists, analysts, product managers, architects, and other engineers to understand data requirements, translate them into technical designs, and deliver effective data solutions.
- Documentation: Create and maintain clear, comprehensive, up-to-date documentation for data pipelines, infrastructure, and APIs, including design specifications, operational procedures, and troubleshooting guides.
- CI/CD Implementation: Implement and maintain robust CI/CD pipelines for automated deployment of data solutions, enabling rapid, reliable releases.
- Production Support and Incident Management: Provide timely, effective support for production systems, including incident management, root cause analysis, and resolution.
- Continuous Learning: Stay current with trends and technologies in data engineering, GCP, and related fields, and proactively identify opportunities to improve existing systems and processes.
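To give candidates a feel for the pipeline work described above, here is a minimal, illustrative Spark-on-Java sketch; it is not this team's actual code. It reads raw JSON from Cloud Storage, applies a basic quality filter, and appends to BigQuery via the spark-bigquery connector. All bucket, dataset, table, and column names are hypothetical placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class CatalogIngestJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("catalog-ingest")
                .getOrCreate();

        // Read raw JSON records exported to a Cloud Storage bucket
        // (placeholder path, not a real project resource).
        Dataset<Row> raw = spark.read().json("gs://example-bucket/catalog/raw/*.json");

        // Basic quality gate: drop records missing a product identifier.
        Dataset<Row> clean = raw.filter(col("product_id").isNotNull());

        // Append to BigQuery via the spark-bigquery connector, which must
        // be on the classpath; table and staging bucket are placeholders.
        clean.write()
                .format("bigquery")
                .option("table", "example_dataset.catalog_clean")
                .option("temporaryGcsBucket", "example-temp-bucket")
                .mode("append")
                .save();

        spark.stop();
    }
}
```

A production job would add schema enforcement, richer quality checks, and the automated tests this role emphasizes.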
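Likewise, the API bullet above refers to Spring Boot services over processed data. The sketch below is illustrative only: the controller, service interface, and record are hypothetical types invented for this example, not an existing codebase.

```java
import java.util.Optional;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Illustrative data type for a processed catalog record.
record CatalogItem(String id, String name) {}

// Illustrative service layer backed by, e.g., a BigQuery query.
interface CatalogService {
    Optional<CatalogItem> findItem(String id);
}

@RestController
@RequestMapping("/api/v1/catalog")
public class CatalogController {

    private final CatalogService catalogService;

    public CatalogController(CatalogService catalogService) {
        this.catalogService = catalogService;
    }

    // Returns one catalog item, or 404 if the id is unknown.
    @GetMapping("/items/{id}")
    public ResponseEntity<CatalogItem> getItem(@PathVariable String id) {
        return catalogService.findItem(id)
                .map(ResponseEntity::ok)
                .orElse(ResponseEntity.notFound().build());
    }
}
```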
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 4-6 years of experience in data engineering or a related role.
- Strong proficiency in Java programming.
- Extensive experience with Apache Spark for data processing.
- Solid experience with Google Cloud Platform (GCP) services, including BigQuery, Dataflow, Dataproc, Cloud Storage, and Pub/Sub.
- Experience developing RESTful APIs using Spring Boot.
- Experience with test automation frameworks (e.g., JUnit, Mockito, REST Assured).
- Experience with CI/CD pipelines (e.g., Jenkins, GitLab CI, Cloud Build).
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
Preferred Qualifications:
- Experience with other data processing technologies (e.g., Apache Beam, Flink).
- Experience with infrastructure-as-code tools (e.g., Terraform, Cloud Deployment Manager).
- Experience with data visualization tools (e.g., Tableau, Looker).
- Experience with containerization technologies (e.g., Docker, Kubernetes).
- Understanding of AI/GenAI concepts and their data requirements.
- Experience building data pipelines to support AI/ML models.
- Strong expertise in API testing tools (e.g., Postman).
- Solid experience in performance testing using JMeter.
Technical Skills:
- Strong proficiency in Java programming, including functional programming concepts for scripting and automation.
- Solid understanding of cloud platforms, with a strong preference for Google Cloud Platform (GCP).
- Proven experience with modern test automation frameworks (e.g., JUnit, Mockito); a short test sketch follows this list.
- Familiarity with system performance monitoring and analysis (CPU, memory, network).
- Experience with monitoring and support best practices.
- Strong debugging and troubleshooting skills to identify and resolve complex technical issues.
- Strong analytical, problem-solving, and communication skills.
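To make the testing expectations concrete, here is a minimal JUnit 5 + Mockito sketch. The repository interface and lookup helper are hypothetical stand-ins invented for this example; the point is the pattern of mocking a dependency and asserting on the unit under test.

```java
import java.util.Optional;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

class CatalogLookupTest {

    // Illustrative dependency to be mocked (placeholder, not a real API).
    interface CatalogRepository {
        Optional<String> findNameById(String id);
    }

    // Illustrative unit under test: falls back to a default display name.
    static String displayName(CatalogRepository repo, String id) {
        return repo.findNameById(id).orElse("unknown item");
    }

    @Test
    void returnsStoredNameWhenPresent() {
        CatalogRepository repo = mock(CatalogRepository.class);
        when(repo.findNameById("sku-1")).thenReturn(Optional.of("Widget"));

        assertEquals("Widget", displayName(repo, "sku-1"));
    }

    @Test
    void fallsBackWhenIdIsUnknown() {
        CatalogRepository repo = mock(CatalogRepository.class);
        when(repo.findNameById("sku-404")).thenReturn(Optional.empty());

        assertEquals("unknown item", displayName(repo, "sku-404"));
    }
}
```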