Posted: 1 day ago | Work from Office | Full Time
Role Proficiency:
This role requires proficiency in data pipeline development, including coding and testing pipelines that ingest, wrangle, transform and join data from various sources. Must be skilled in ETL tools such as Informatica, Glue, Databricks and DataProc, with coding expertise in Python, PySpark and SQL. Works independently and has a deep understanding of data warehousing solutions, including Snowflake, BigQuery, Lakehouse and Delta Lake. Capable of calculating costs and understanding performance issues related to data solutions.
Additional Comments:
As a Data Engineer, you will be responsible for designing, building and maintaining the data pipelines and infrastructure that support the client Data Science team. You will work with Google Cloud Platform (GCP) tools and technologies to collect, store and process data from various sources and formats, and write scripts to transform and load data for use in machine learning (ML) pipelines. You will optimize data access and performance for ML applications, ensure data quality and availability for analytics/ML teams, and collaborate with data scientists, analysts, engineers and other stakeholders to deliver data solutions that enable data-driven decision making and innovation.

Key Responsibilities

As a Data Engineer for the Data Science team, you will:
- Architect, build and maintain scalable, reliable and secure data pipelines and infrastructure using GCP tools such as BigQuery, Dataflow, Dataproc, Pub/Sub and Cloud Storage.
- Write scripts in languages such as Python and SQL to transform and load data for use in ML pipelines.
- Optimize data access and performance for ML applications using techniques such as partitioning, indexing and caching.
- Ensure data quality and availability for analytics/ML teams using tools such as Data Catalog, Data Studio and Data Quality.
- Monitor, troubleshoot and debug data issues and errors using tools such as Stackdriver, Cloud Logging and Cloud Monitoring.
- Document and maintain data pipeline and infrastructure specifications, standards and best practices using tools such as Git and Jupyter.
- Collaborate with data scientists, analysts, engineers and other stakeholders to understand data requirements, provide data solutions and support data-driven projects and initiatives.
- Support generative AI projects by providing data for training, testing and evaluation of generative models such as Gemini and Claude 3.
- Implement data augmentation, data anonymization and data synthesis techniques to enhance data quality and diversity for generative AI projects.

Qualifications

To be successful in this role, you will need:
- A Bachelor's degree in Computer Science, Engineering, Mathematics, Statistics or a related field.
- At least 6 years of experience in data engineering, data warehousing, data integration or a related field. Experience in a health care-related field or working with health care data is preferred.
- Proficiency in GCP tools and technologies for data engineering, such as BigQuery, Dataflow, Dataproc, Pub/Sub and Cloud Storage, or their equivalents on other cloud platforms.
- Proficiency in scripting languages such as Python and SQL for data transformation and loading.
- Knowledge of data modeling, data quality, data governance and data security principles and practices.
- Knowledge of ML concepts, frameworks and tools such as TensorFlow, Keras and Scikit-learn.
- Knowledge of generative AI concepts, frameworks and tools.
- Excellent communication, collaboration and problem-solving skills.
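To give a flavor of the transform-and-anonymize work described above, here is a minimal, purely illustrative Python sketch: it parses raw CSV records, replaces an identifier with a stable one-way hash (a simple anonymization technique), and casts a measurement column for downstream ML use. The field names (`patient_id`, `measurement`) and the salt value are hypothetical, not taken from the posting; a real pipeline would run such logic inside Dataflow or a similar GCP service rather than in-process.

```python
import csv
import hashlib
import io


def anonymize(value: str, salt: str = "demo-salt") -> str:
    """Replace an identifier with a stable, irreversible hash (illustrative only)."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]


def transform_rows(raw_csv: str) -> list:
    """Parse raw CSV text, anonymize the ID column, and cast the measurement."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            "patient_key": anonymize(row["patient_id"]),  # no raw ID leaves the pipeline
            "measurement": float(row["measurement"]),     # cast for numeric downstream use
        })
    return rows


if __name__ == "__main__":
    raw = "patient_id,measurement\nP001,98.6\nP002,101.2\n"
    print(transform_rows(raw))
```

Because the hash is deterministic, the same source ID always maps to the same key, so joins across extracts still work while raw identifiers stay out of the analytics layer.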
Required Skills: GCP, Machine Learning, Architect
UST
Locations: Chennai, Bengaluru, Thiruvananthapuram