Posted: 1 day ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

Job Title: Data Scientist – Data Engineering & ETL Pipeline Specialist


About the Role

We are looking for a Data Scientist with strong Data Engineering expertise to own the entire data lifecycle — from large-scale data acquisition (web crawling/scraping) to processing, transformation, and storage in analytics-ready formats.

This role is ideal for someone who thrives at building scalable, distributed data systems and optimizing pipelines for performance, reliability, and cost efficiency.
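To make the acquisition → transformation → storage lifecycle concrete, here is a minimal end-to-end sketch using only the Python standard library. The HTML snippet, field names, and the use of SQLite as a stand-in for a warehouse are all invented for illustration; a production pipeline would run distributed crawlers and load into S3/Athena or similar.

```python
import sqlite3
from html.parser import HTMLParser

# Extract: pull product names/prices out of a (hypothetical) scraped page.
class PriceParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self.rows.append({"name": data.strip()})
        elif self._field == "price":
            self.rows[-1]["price"] = data.strip()
        self._field = None

html = '<li><span class="name">Widget</span><span class="price">9.99</span></li>'
parser = PriceParser()
parser.feed(html)

# Transform: coerce types into an analytics-ready record.
records = [{"name": r["name"], "price_usd": float(r["price"])} for r in parser.rows]

# Load: persist to a queryable store (SQLite stands in for a warehouse here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (name TEXT, price_usd REAL)")
db.executemany("INSERT INTO products VALUES (:name, :price_usd)", records)
print(db.execute("SELECT name, price_usd FROM products").fetchall())
# → [('Widget', 9.99)]
```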


Key Responsibilities

  • Data Acquisition: Architect and implement scalable web crawling/scraping systems for diverse verticals (e-commerce, social, technology, etc.).
  • ETL/ELT Development: Design, build, and maintain real-time and batch data pipelines for ingestion, transformation, and loading into analytics platforms.
  • Data Processing: Clean, enrich, and transform raw data into structured formats ready for machine learning and business intelligence.
  • Data Storage: Manage and optimize data in data lakes, data warehouses, and OLTP/OLAP systems.
  • System Observability: Implement robust logging and monitoring solutions for seamless system health checks and log analysis.
  • Collaboration & Mentorship: Work closely with cross-functional teams and mentor junior developers in data engineering best practices.
  • Cost Optimization: Continuously optimize infrastructure cost while maintaining high performance and reliability.
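As a flavour of what the Data Processing responsibility involves in practice, here is a toy cleaning-and-enrichment step. The record shape and field names are invented; in a real pipeline this logic would run inside Spark or a similar engine.

```python
from datetime import datetime, timezone

# Raw scraped records: strings everywhere, whitespace, missing values.
RAW = [
    {"sku": " A-1 ", "price": "12.50", "ts": "1700000000"},
    {"sku": "A-2", "price": "", "ts": "1700000100"},  # missing price -> dropped
]

def clean(record):
    """Normalise a raw record; return None if it fails validation."""
    if not record["price"]:
        return None
    return {
        "sku": record["sku"].strip(),
        "price_usd": float(record["price"]),
        # Enrich: resolve the epoch timestamp to an ISO-8601 UTC string.
        "scraped_at": datetime.fromtimestamp(
            int(record["ts"]), tz=timezone.utc
        ).isoformat(),
    }

cleaned = [r for r in (clean(r) for r in RAW) if r is not None]
```

Validation failures are dropped here for brevity; a production job would route them to a dead-letter store for inspection rather than discarding them.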


Core Competencies

  • Large-scale web data extraction (crawling/scraping) using custom & distributed systems.
  • Real-time and batch processing with Kafka, Spark, and Storm.
  • Data storage in MongoDB, HDFS, S3, Athena, and other warehouse/lake environments.
  • Strong foundation in ETL pipeline design and optimization.
  • Hands-on experience with distributed microservices architecture.
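The real-time vs. batch distinction above can be illustrated with a tumbling-window aggregation, the kind of stateful computation that Kafka/Spark/Storm pipelines perform at scale. The event stream here is invented and processed in-memory purely as a sketch.

```python
from collections import defaultdict

# Events as (epoch_seconds, key); in production these would arrive from Kafka.
events = [(0, "click"), (3, "click"), (7, "view"), (11, "click"), (13, "view")]

def tumbling_counts(events, window_s=10):
    """Count events per key in fixed, non-overlapping time windows."""
    out = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        out[ts // window_s * window_s][key] += 1
    return {w: dict(counts) for w, counts in sorted(out.items())}

print(tumbling_counts(events))
# → {0: {'click': 2, 'view': 1}, 10: {'click': 1, 'view': 1}}
```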


Tech Stack & Tools

  • Languages & Frameworks: Python, Apache Storm, Apache Spark, Apache Hadoop
  • Message Queues & Caches: Kafka Cluster, Redis Cluster
  • Databases & Storage: MongoDB, HDFS, Amazon S3, Athena
  • Cloud Platforms: AWS, Azure, DigitalOcean
  • Others: Distributed Systems, GitHub/Bitbucket, JIRA, Kibana


Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Science, or related field.
  • 3+ years of professional experience in Data Engineering, ETL/ELT pipeline development, and large-scale web data acquisition.
  • Strong problem-solving, debugging, and optimization skills.
  • Excellent communication and collaboration abilities.
