Posted: 1 day ago
Work Mode: On-site
Employment Type: Part Time
Role Proficiency:
This role requires proficiency in data pipeline development, including coding and testing pipelines that ingest, wrangle, transform, and join data from various sources. Must be skilled in ETL tools such as Informatica, Glue, Databricks, and DataProc, with coding expertise in Python, PySpark, and SQL. Works independently and has a deep understanding of data warehousing solutions, including Snowflake, BigQuery, Lakehouse, and Delta Lake. Capable of calculating costs and understanding performance issues related to data solutions.
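The kind of pipeline this paragraph describes can be made concrete with a short example. Below is a minimal PySpark sketch of an ingest/wrangle/join/transform job; the bucket paths, column names, and schemas are hypothetical and only illustrate the pattern, not any actual UST pipeline.

```python
# Illustrative sketch only: paths, columns, and schemas are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Ingest: read raw data from two hypothetical sources.
orders = spark.read.option("header", True).csv("s3://raw-bucket/orders/")
customers = spark.read.parquet("s3://raw-bucket/customers/")

# Wrangle: normalize types and drop malformed rows.
orders = (
    orders.withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .dropna(subset=["order_id", "customer_id"])
)

# Join and transform: enrich orders with customer attributes,
# then aggregate daily revenue per region.
daily_revenue = (
    orders.join(customers, on="customer_id", how="left")
    .groupBy(F.to_date("order_ts").alias("order_date"), "region")
    .agg(F.sum("amount").alias("revenue"))
)

# Load: write the curated result back as partitioned Parquet.
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://curated-bucket/daily_revenue/"
)
```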
Outputs Expected:
Code Development, Documentation, Configuration, Testing, Domain Relevance, Project Management, Defect Management, Estimation, Knowledge Management, Release Management, Design Contribution, Customer Interface, Team Management, and Certifications.
Additional Comments:
Data Engineering Role Summary:
Skilled Data Engineer with strong Python programming skills and experience building scalable data pipelines across cloud environments. The candidate should have a good understanding of ML pipelines and basic exposure to GenAI solutioning. This role will support large-scale AI/ML and GenAI initiatives by ensuring high-quality, contextual, and real-time data availability.

Key Responsibilities:
• Design, build, and maintain robust, scalable ETL/ELT data pipelines in AWS/Azure environments.
• Develop and optimize data workflows using PySpark, SQL, and Airflow (a minimal DAG sketch appears after this section).
• Work closely with AI/ML teams to support training pipelines and GenAI solution deployments.
• Integrate data with vector databases such as ChromaDB or Pinecone for RAG-based pipelines (see the ChromaDB sketch after this section).
• Collaborate with solution architects and GenAI leads to ensure reliable, real-time data availability for agentic AI and automation solutions.
• Support data quality, validation, and profiling processes.

Key Skills & Technology Areas:
• Programming & Data Processing: Python (4–6 years), PySpark, Pandas, NumPy
• Data Engineering & Pipelines: Apache Airflow, AWS Glue, Azure Data Factory, Databricks
• Cloud Platforms: AWS (S3, Lambda, Glue), Azure (ADF, Synapse), GCP (optional)
• Databases: SQL/NoSQL, Postgres, DynamoDB, vector databases (ChromaDB, Pinecone) – preferred
• ML/GenAI Exposure (basic): hands-on with Pandas and scikit-learn; knowledge of RAG pipelines and GenAI concepts
• Data Modeling: star/snowflake schema, data normalization, dimensional modeling
• Version Control & CI/CD: Git, Jenkins, or similar tools for pipeline deployment

Other Requirements:
• Strong problem-solving and analytical skills
• Flexibility to work on fast-paced, cross-functional priorities
• Experience collaborating with AI/ML or GenAI teams is a plus
• Good communication and a collaborative, team-first mindset
• Experience in Telecom, E-Commerce, or Enterprise IT Operations is a plus
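For the Airflow responsibility above, here is a minimal sketch of a daily ETL DAG, assuming Airflow 2.4+ (which introduced the `schedule` argument); the DAG id, schedule, and task bodies are hypothetical placeholders rather than anything specified in the posting.

```python
# Illustrative sketch only: DAG id, schedule, and task logic are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull raw data from a source system (placeholder)."""
    print("extracting...")


def transform():
    """Clean and reshape the extracted data (placeholder)."""
    print("transforming...")


def load():
    """Write the transformed data to the warehouse (placeholder)."""
    print("loading...")


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three stages strictly in order.
    extract_task >> transform_task >> load_task
```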
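Similarly, the vector-database bullet can be illustrated with ChromaDB. The sketch below indexes a few documents and retrieves the closest match for a question, which is the retrieval half of a RAG pipeline; the collection name, documents, and query are invented for illustration.

```python
# Illustrative sketch only: collection name, documents, and query are hypothetical.
import chromadb

# In-memory client; chromadb.PersistentClient(path=...) would persist to disk.
client = chromadb.Client()

# Chroma applies its default embedding function unless one is supplied.
collection = client.get_or_create_collection(name="support_docs")

# Index documents with ids and metadata, as an ingestion pipeline
# feeding a RAG application might.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Customers can reset a router from the self-service admin console.",
        "Billing disputes are handled by the accounts team within 5 days.",
    ],
    metadatas=[{"source": "kb"}, {"source": "kb"}],
)

# Retrieve the closest document for a user question; in a full RAG
# pipeline the hits would be passed to an LLM as grounding context.
results = collection.query(query_texts=["How do I reset my router?"], n_results=1)
print(results["documents"])
```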
Skills: ETL, Big Data, PySpark, SQL
UST Global
Location: Thiruvananthapuram (Trivandrum), Kerala, India
Experience: Not specified
Salary: Not disclosed