HCLTech | Posted: Just now | Platform: Hybrid | Full Time
Job Description:
As a Data Engineer on our Large Language Model project, you will play a crucial role in designing, implementing, and maintaining the data infrastructure. Your expertise will be instrumental in ensuring the efficient flow of data, enabling seamless integration with various components, and optimizing data processing pipelines.

Experience: 5+ years of relevant experience in data engineering roles.

Key Responsibilities:
- Data Pipeline Development - Design, develop, and maintain scalable, efficient data pipelines to support the training and deployment of large language models. Implement ETL processes to extract, transform, and load diverse datasets into formats suitable for model training (an illustrative sketch follows this description).
- Data Integration - Collaborate with cross-functional teams, including data scientists and software engineers, to integrate data sources and ensure the availability of relevant, high-quality data. Implement solutions for real-time data processing and integration to keep model development agile.
- Data Quality Assurance - Establish and maintain robust data quality checks and validation processes to ensure the accuracy and consistency of datasets. Troubleshoot data quality issues, identify root causes, and implement corrective measures.
- Infrastructure Management - Work closely with DevOps and IT teams to manage and optimize the data storage infrastructure, ensuring scalability and performance. Implement best practices for data security, access control, and compliance with data governance policies.
- Performance Optimization - Identify bottlenecks and inefficiencies in data processing pipelines and implement optimizations to improve overall system performance. Continuously monitor system performance metrics and make proactive adjustments as needed.

Skills & Tools:
- Programming Languages - Proficiency in languages such as Python for building robust data processing applications.
- Big Data Technologies - Experience with distributed computing frameworks such as Apache Spark, and with platforms and tooling like Databricks and dbt, for large-scale data processing.
- Database Systems - In-depth knowledge of relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra), as well as vector databases.
- Data Warehousing - Familiarity with data warehousing solutions such as Amazon Redshift, Google BigQuery, or Snowflake.
- ETL Tools - Hands-on experience with ETL and orchestration tools such as Apache NiFi, Talend, or Apache Airflow.
- Cloud Services - Experience with cloud platforms such as AWS, Azure, or Google Cloud for deploying and managing data infrastructure.
- Problem Solving - Analytical mindset with a proactive approach to identifying and solving complex data engineering challenges.

Knowledge of NLP will be an added advantage.
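The posting describes the core pipeline work in prose; the sketch below makes the extract-transform-validate-load loop concrete. It is a minimal, illustrative example in Python (the language the posting names), not HCLTech's actual stack: every path, column name, and quality check is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical locations; a real pipeline would read from object storage
# or a warehouse and take these as parameters from the orchestrator.
RAW_PATH = "raw/conversations.csv"
CLEAN_PATH = "clean/conversations.parquet"


def extract(path: str) -> pd.DataFrame:
    """Extract: load the raw dataset into a DataFrame."""
    return pd.read_csv(path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: normalize text and drop rows unusable for LLM training."""
    df = df.dropna(subset=["text"])             # remove missing records
    df["text"] = df["text"].str.strip()         # normalize whitespace
    df = df[df["text"].str.len() > 0]           # drop now-empty strings
    return df.drop_duplicates(subset=["text"])  # dedupe exact repeats


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Quality gate: fail loudly rather than load bad data downstream."""
    if df.empty:
        raise ValueError("no rows survived transformation")
    if df["text"].duplicated().any():
        raise ValueError("duplicates remain after dedup step")
    return df


def load(df: pd.DataFrame, path: str) -> None:
    """Load: write a columnar file suitable for downstream training jobs."""
    df.to_parquet(path, index=False)


if __name__ == "__main__":
    load(validate(transform(extract(RAW_PATH))), CLEAN_PATH)
```

Since Apache Airflow appears among the listed ETL tools, the same steps might be wired into a DAG roughly as follows. This assumes the recent Airflow 2.x TaskFlow API; the task bodies are stubs, and the DAG id, schedule, and paths are again hypothetical.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def llm_training_etl():
    @task
    def extract() -> str:
        # Pull the raw dataset and return its (hypothetical) location.
        return "raw/conversations.csv"

    @task
    def transform_and_validate(raw_path: str) -> str:
        # Clean, dedupe, and run the quality gates; return the curated path.
        return "clean/conversations.parquet"

    @task
    def load(clean_path: str) -> None:
        # Hand the curated dataset off to the training data store.
        pass

    load(transform_and_validate(extract()))


llm_training_etl()
```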