Data Engineer with Machine Learning Specialization

5 - 10 years

15 - 30 Lacs

Bengaluru

Posted: 3 days ago | Platform: Naukri


Skills Required

PySpark, Data Engineering, Machine Learning, GCP, Python, Airflow, Hadoop, Databricks, Hive, Scala, Dataflow, Spark, ETL

Work Mode

Remote

Job Type

Full Time

Job Description

Job Requirement for Offshore Data Engineer (with ML expertise)

Work Mode: Remote
Base Location: Bengaluru
Experience: 5+ years

Technical Skills & Expertise

PySpark & Apache Spark: Extensive experience with PySpark and Spark for big data processing and transformation. Strong understanding of Spark architecture, optimization techniques, and performance tuning. Ability to run Spark jobs in distributed computing environments such as Databricks.

Data Mining & Transformation: Hands-on experience designing and implementing data mining workflows. Expertise in data transformation processes, including ETL (Extract, Transform, Load) pipelines. Experience with large-scale data ingestion, aggregation, and cleaning.

Programming Languages (Python & Scala): Proficiency in Python for data engineering tasks, including libraries such as Pandas and NumPy. Scala proficiency is preferred for Spark job development.

Big Data Concepts: In-depth knowledge of big data frameworks and paradigms, such as distributed file systems, parallel computing, and data partitioning.

Big Data Technologies (Cassandra & Hadoop): Experience with NoSQL databases such as Cassandra and distributed storage systems such as Hadoop.

Data Warehousing Tools: Proficiency with Hive for data warehousing and querying.

ETL Tools: Experience with Apache Beam and other ETL tools for large-scale data workflows.

Cloud Technologies (GCP): Expertise in Google Cloud Platform (GCP), including core services such as Cloud Storage, BigQuery, and Dataflow. Experience with Dataflow jobs for batch and stream processing. Familiarity with Airflow (Cloud Composer on GCP) for task scheduling and workflow orchestration.

Machine Learning & AI:
GenAI Experience: Familiarity with Generative AI and its applications in ML pipelines.
ML Model Development: Knowledge of basic ML model building using tools such as Pandas and NumPy, with visualization in Matplotlib.
MLOps Pipeline: Experience managing end-to-end MLOps pipelines for deploying models to production, particularly Large Language Model (LLM) deployments.
RAG Architecture: Understanding of, and experience building, pipelines based on Retrieval-Augmented Generation (RAG) architecture to improve model performance and output.

Tech Stack: Spark, PySpark, Python, Scala, GCP Dataflow, Cloud Composer (Airflow), ETL, Databricks, Hadoop, Hive, GenAI, basic ML modeling, MLOps, LLM deployment, RAG
