Job Description
We are looking for a skilled and adaptable Data Engineer proficient in PySpark, Apache Spark, and Databricks, with experience in analytics, data modeling, and Generative AI/Agentic AI solutions. This role suits engineers who work at the intersection of data engineering, AI systems, and business insight, and who want to contribute to impactful client programs.

Responsibilities:

- Design, build, and optimize distributed data pipelines using PySpark, Apache Spark, and Databricks for both analytics and AI workloads.
- Support RAG pipelines, embedding generation, and data pre-processing for LLM applications.
- Create and maintain interactive dashboards and BI reports in tools such as Power BI, Tableau, or Looker for business stakeholders and consultants.
- Conduct ad hoc data analysis to enable data-driven decision-making and rapid insight generation.
- Develop and maintain robust data warehouse schemas and star/snowflake models, and support data lake architecture.
- Integrate with and support LLM agent frameworks such as LangChain, LlamaIndex, Haystack, or CrewAI for intelligent workflow automation.
- Monitor data pipelines and ensure cost optimization and scalability in cloud environments (Azure/AWS/GCP).
- Collaborate with cross-functional teams, including AI scientists, analysts, and business teams, to drive use-case delivery.
- Maintain robust data governance, lineage, and metadata management practices using tools such as Azure Purview or DataHub.