Posted:2 months ago|
Platform:
Work from Office
Full Time
Job Title: Product Engineer - Big Data Location: Mumbai Experience: 3 - 8 Yrs Job Summary: As a Product Engineer - Big Data , you will be responsible for designing, building, and optimizing large-scale data processing pipelines using cutting-edge Big Data technologies. Collaborating with cross-functional teams including data scientists, analysts, and product managers you will ensure data is easily accessible, secure, and reliable. Your role will focus on delivering high-quality, scalable solutions for data storage, ingestion, and analysis, while driving continuous improvements throughout the data lifecycle. Key Responsibilities: ETL Pipeline Development Optimization: Design and implement complex end-to-end ETL pipelines to handle large-scale data ingestion and processing. Utilize AWS services like EMR, Glue, S3, MSK (Managed Streaming for Kafka), DMS (Database Migration Service), Athena, and EC2 to streamline data workflows, ensuring high availability and reliability. Big Data Processing: Develop and optimize real-time and batch data processing systems using Apache Flink, PySpark, and Apache Kafka . Focus on fault tolerance, scalability, and performance. Work with Apache Hudi for managing datasets and enabling incremental data processing. Data Modeling Warehousing: Design and implement data warehouse solutions that support both analytical and operational use cases. Model complex datasets into optimized structures for high performance, easy access, and query efficiency for internal stakeholders. Cloud Infrastructure Development: Build scalable cloud-based data infrastructure leveraging AWS tools. Ensure data pipelines are resilient and adaptable to changes in data volume and variety, while optimizing costs and maximizing efficiency using Managed Apache Airflow for orchestration and EC2 for compute resources. Data Analysis Insights: Collaborate with business teams and data scientists to understand data needs and deliver high-quality datasets. Conduct in-depth analysis to derive insights from the data, identifying key trends, patterns, and anomalies to drive business decisions. Present findings in a clear, actionable format. Real-time Batch Data Integration: Enable seamless integration of real-time streaming and batch data from systems like AWS MSK . Ensure consistency in data ingestion and processing across various formats and sources, providing a unified view of the data ecosystem. CI/CD Automation: Use Jenkins to establish and maintain continuous integration and delivery pipelines. Implement automated testing and deployment workflows, ensuring smooth integration of new features and updates into production environments. Data Security Compliance: Collaborate with security teams to ensure data pipelines comply with organizational and regulatory standards such as GDPR, HIPAA , or other relevant frameworks. Implement data governance practices to ensure integrity, security, and traceability throughout the data lifecycle. Collaboration Cross-Functional Work: Partner with engineers, data scientists, product managers, and business stakeholders to understand data requirements and deliver scalable solutions. Participate in agile teams, sprint planning, and architectural discussions. Troubleshooting Performance Tuning: Identify and resolve performance bottlenecks in data pipelines. Ensure optimal performance through proactive monitoring, tuning, and applying best practices for data ingestion and storage. Skills Qualifications: Must-Have Skills: AWS Expertise: Hands-on experience with core AWS services related to Big Data, including EMR, Managed Apache Airflow, Glue, S3, DMS, MSK, Athena, and EC2 . Strong understanding of cloud-native data architecture. Big Data Technologies: Proficiency in PySpark and SQL for data transformations and analysis. Experience with large-scale data processing frameworks like Apache Flink and Apache Kafka . Data Frameworks: Strong knowledge of Apache Hudi for data lake operations, including CDC (Change Data Capture) and incremental data processing. Database Modeling Data Warehousing: Expertise in designing scalable data models for both OLAP and OLTP systems. In-depth understanding of data warehousing best practices. ETL Pipeline Development: Proven experience in building robust, scalable ETL pipelines for processing real-time and batch data across platforms. Data Analysis Insights: Strong problem-solving skills with a data-driven approach to decision-making. Ability to conduct complex data analysis to extract actionable business insights. CI/CD Automation: Basic to intermediate knowledge of CI/CD pipelines using Jenkins or similar tools to automate deployment and monitoring of data pipelines.
UST
Upload Resume
Drag or click to upload
Your data is secure with us, protected by advanced encryption.
My Connections UST
18.0 - 20.0 Lacs P.A.
0.5 - 0.6 Lacs P.A.
Chennai, Tamil Nadu, India
6.0 - 10.0 Lacs P.A.
Chennai, Tamil Nadu, India
7.0 - 10.0 Lacs P.A.
Bengaluru / Bangalore, Karnataka, India
3.0 - 7.0 Lacs P.A.
Hyderabad / Secunderabad, Telangana, Telangana, India
3.0 - 7.0 Lacs P.A.
Delhi, Delhi, India
3.0 - 7.0 Lacs P.A.
Noida, Uttar Pradesh, India
3.0 - 9.5 Lacs P.A.
Gurgaon / Gurugram, Haryana, India
7.0 - 14.0 Lacs P.A.
Noida, Uttar Pradesh, India
7.0 - 14.0 Lacs P.A.