Job Description
As an experienced data engineer, you will enable data frameworks that automate domain-specific data and analytics workflows. You will incorporate GenAI into data engineering workflows, applications, or internal data and analytics tools, and optimize ETL workflows to improve data processing efficiency and reliability. You will ensure data integrity and performance in big data workflows, and build frameworks that automate test suites and streamline data operations. You will also develop and deploy AI-powered support tools for developers, such as code suggestion bots, that help identify vulnerabilities and ensure policy compliance. The role further involves integrating data pipelines into the broader ML Operations (MLOps) process, automating code review processes, and creating technical specifications, instrumentation specs, and API contracts.

Qualifications Required:
- Bachelor's or Master's degree in Computer Science or a related technical field, or equivalent experience
- 4+ years of experience in designing, developing, and deploying data engineering or analytics engineering solutions
- Strong proficiency in SQL, Scala, Python, or Java, with hands-on experience in data pipeline tools (e.g., Apache Spark, Kafka, Airflow), CI/CD practices, and version control
- Familiarity with cloud platforms (AWS, Azure, GCP), big data technologies, and data analytics tools like Snowflake, Databricks, and Tableau
- Familiarity with RAG-LLM solutions, GenAI models, APIs, and prompt engineering
- Expertise in CI/CD tools like Jenkins and GitHub Actions
- Strong analytical skills to optimize developer workflows

Preferred Qualifications:
- Expertise in building and refining large-scale data pipelines, as well as developing tools and frameworks for data platforms
- Hands-on experience with big data technologies such as distributed querying (Trino), real-time analytics (OLAP), near-real-time data processing (NRT), and decentralized data architecture (Apache Mesh)
- Experience enabling ML pipelines, including automating the data flow for feature engineering, model retraining, monitoring model performance in production, drift detection, and ensuring scalability
- Familiarity with GenAI concepts like Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), prompt engineering, vector embeddings, and LLM fine-tuning
- Familiarity with observability tools like DataDog, Prometheus, and Grafana
- Expertise in building user journey workflow and test suite automation
- Familiarity with data governance, security protocols, and compliance
- Proven ability to work independently, escalate blockers, and propose thoughtful, long-term solutions
- Demonstrates sound judgment, applies technical principles to complex projects, evaluates solutions, and proposes new ideas and process improvements
- Seeks new opportunities for growth, demonstrates a thorough understanding of technical concepts, exercises independence in problem-solving, and delivers impactful results at the team level