Create and maintain optimal data pipeline architecture; assemble large, complex data sets that meet functional and non-functional requirements.
Design the right schema to support the functional requirements and consumption patterns.
Design and build production data pipelines from ingestion to consumption.
Build the data marts and data warehouses required for optimal extraction, transformation, and loading of data from a wide variety of data sources.
Create the preprocessing and postprocessing needed for various forms of data for training, retraining, and inference ingestion, as required.
Create data visualization and business intelligence tools that give stakeholders and data scientists the business and solution insights they need.
Identify, design, and implement internal process improvements: automating manual data processes, optimizing data delivery, etc.
Ensure our data is separated and secure across national boundaries through multiple data centers and AWS regions.
You should have a bachelor's or master's degree in Computer Science, Information Technology, or another quantitative field.
You should have at least 5 years of experience as a data engineer supporting large data transformation initiatives related to machine learning, with experience in building and optimizing pipelines and data sets.
Strong analytic skills related to working with unstructured datasets.

Must-have Programming Skills:

Significant programming experience in Python and Spark is a must.
Good hands-on experience in SQL, including writing analytical queries and window functions.
Good hands-on experience in creating external tables, partitioning, and working with Parquet files.
3-5 years of solid experience in Big Data technologies is a must.
Data engineering experience using AWS core services (Lambda, Glue, EMR, and Redshift).
Knowledge of Python and PySpark is an absolute must.

Desired Candidate Profile

Implementation of AWS CloudFormation scripts.
PySpark
Python
AWS Glue jobs using PySpark.
Exposure to AWS services and networking.
Experience with AWS cloud services (EC2, EMR, RDS, Redshift, S3, Athena) and familiarity with various AWS log formats.
Experience with object-oriented and functional scripting languages: Python, PySpark, Java, C++, etc.
Experience with DBeaver, AWS Glue ETL, AWS Glue crawlers, AWS Lambda, the Glue Data Catalog, and AWS Glue Studio.

Good-to-have Skill Sets:

Experience with big data tools: Hadoop, Spark, Kafka, etc.
Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
Experience with stream-processing systems: Storm, Spark Streaming, etc.
You should be a good team player, committed to the success of the team and the overall project.
