Let's do this. Let's change the world. We are looking for a highly motivated, expert Principal Data Engineer to own the design and development of complex data pipelines, solutions, and frameworks. The ideal candidate will be responsible for designing, developing, and optimizing data pipelines, data integration frameworks, and metadata-driven architectures that enable seamless data access and analytics. This role requires deep expertise in big data processing, distributed computing, data modeling, and governance frameworks to support self-service analytics, AI-driven insights, and enterprise-wide data management.
Roles & Responsibilities:
- Architect and maintain robust, scalable data pipelines using Databricks, Spark, and Delta Lake, enabling efficient batch and real-time processing.
- Lead efforts to evaluate, adopt, and integrate emerging technologies and tools that enhance productivity, scalability, and data delivery capabilities.
- Drive performance optimization efforts, including Spark tuning, resource utilization, job scheduling, and query improvements.
- Identify and implement innovative solutions that streamline data ingestion, transformation, lineage tracking, and platform observability.
- Build frameworks for metadata-driven data engineering, enabling reusability and consistency across pipelines.
- Foster a culture of technical excellence, experimentation, and continuous improvement within the data engineering team.
- Collaborate with platform, architecture, analytics, and governance teams to align platform enhancements with enterprise data strategy.
- Define and uphold SLOs, monitoring standards, and data quality KPIs for production pipelines and infrastructure.
- Partner with cross-functional teams to translate business needs into scalable, governed data products.
- Mentor engineers across the team, promoting knowledge sharing and adoption of modern engineering patterns and tools.
- Collaborate with cross-functional teams, including data architects, business analysts, and DevOps teams, to align data engineering strategies with enterprise goals.
- Stay up to date with emerging data technologies and best practices, ensuring continuous improvement of Enterprise Data Fabric architectures.
Must-Have Skills:
- Hands-on experience with data engineering technologies such as Databricks, PySpark, Spark SQL, Apache Spark, AWS, Python, and SQL, along with Scaled Agile methodologies.
- Proficiency in workflow orchestration and performance tuning for big data processing.
- Strong understanding of AWS services.
- Experience with Data Fabric, Data Mesh, or similar enterprise-wide data architectures.
- Ability to quickly learn, adapt, and apply new technologies.
- Strong problem-solving and analytical skills.
- Excellent communication and teamwork skills.
- Experience with the Scaled Agile Framework (SAFe), Agile delivery practices, and DevOps practices.
Good-to-Have Skills:
- Deep expertise in the Biotech and Pharma industries.
- Experience writing APIs to make data available to consumers.
- Experience with SQL/NoSQL databases and vector databases for large language models.
- Experience with data modeling and performance tuning for both OLAP and OLTP databases.
- Experience with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD (Jenkins, Maven, etc.), automated unit testing, and DevOps.
Education and Professional Certifications:
- 12 to 17 years of experience in Computer Science, IT, or a related field.
- AWS Certified Data Engineer preferred.
- Databricks Certification preferred.
- Scaled Agile (SAFe) certification preferred.
Soft Skills:
- Excellent analytical and troubleshooting skills.
- Strong verbal and written communication skills.
- Ability to work effectively with global, virtual teams.
- High degree of initiative and self-motivation.
- Ability to manage multiple priorities successfully.
- Team-oriented, with a focus on achieving team goals.
- Ability to learn quickly, stay organized, and be detail-oriented.
- Strong presentation and public speaking skills.
EQUAL OPPORTUNITY STATEMENT