Job Summary
- We're looking for a Principal Cloud Engineer with a strong foundation in multi-cloud and multi-region deployment, data architecture, distributed systems, and modern cloud-native platforms to architect, build, and maintain the intelligent infrastructure and systems that power our AI, GenAI, and data-intensive workloads.
- You'll work closely with cross-functional teams, including data scientists, ML and software engineers, and product managers, and play a key role in designing a highly scalable platform that manages the lifecycle of data pipelines, APIs, real-time streaming, and agentic GenAI workflows while enabling federated data architectures.
- The ideal candidate has a strong background in building and maintaining scalable AI and data platforms, optimizing workflows, and ensuring the reliability and performance of data platform systems.
Responsibilities
Cloud Architecture & Engineering
- Deep expertise in designing, implementing, and managing architectures across multiple cloud platforms (e.g., AWS, Azure, GCP)
- Proven experience in architecting hybrid and multi-cloud solutions, including interconnectivity, security, workload placement, and DR strategies
- Strong knowledge of cloud-native services (e.g., serverless, containers, managed databases, storage, networking)
- Experience with enterprise-grade IAM, security controls, and compliance frameworks across cloud environments
AI & GenAI Platform Integration
- Integrate LLM APIs (OpenAI, Gemini, Claude, etc.) into platform workflows for intelligent automation and enhanced user experience
- Build and orchestrate multi-agent systems using frameworks like CrewAI, LangGraph, or AutoGen for use cases such as pipeline debugging, code generation, and MLOps
- Experience developing and integrating GenAI applications using MCP (Model Context Protocol) and orchestrating LLM-powered workflows (e.g., summarization, document Q&A, chatbot assistants, and intelligent data exploration)
- Hands-on expertise building and optimizing vector search and RAG pipelines using tools like Weaviate, Pinecone, or FAISS to support embedding-based retrieval and real-time semantic search across structured and unstructured datasets
Engineering Enablement
- Create extensible CLIs, SDKs, and blueprints to simplify onboarding, accelerate development, and standardize best practices
- Streamline onboarding, documentation, and platform implementation & support using GenAI and conversational interfaces
- Collaborate across teams to enforce cost, reliability, and security standards within platform blueprints
- Partner with engineering teams to introduce platform enhancements, observability, and cost optimization techniques
- Foster a culture of ownership, continuous learning, and innovation
Automation, IaC, CI/CD
- Mastery of Infrastructure as Code (IaC) tools, especially Terraform, Terragrunt, and CloudFormation / ARM / Deployment Manager
- Experience building and managing cloud automation frameworks (e.g., using Python, Go, or Bash for orchestration and tooling)
- Hands-on experience with CI/CD pipelines (e.g., GitHub Actions) for cloud resource deployments
- Expertise in implementing policy-as-code and compliance-as-code (e.g., Open Policy Agent, Sentinel)
Security, Governance & Cost
- Strong background in implementing cloud security best practices (e.g., network segmentation, encryption, secrets management, key management)
- Experience with multi-account / multi-subscription / multi-project governance models, including landing zones, service control policies, and organizational structures
- Ability to design for cost optimization, tagging strategies, and usage monitoring across cloud providers
Monitoring & Operations
- Familiarity with cloud monitoring, logging, and observability tools (e.g., CloudWatch, Azure Monitor, GCP Operations Suite, Datadog, Prometheus)
- Experience with incident management and building self-healing cloud architectures
Platform & Cloud Engineering
- Develop and maintain real-time and batch data pipelines using tools like Airflow, dbt, Dataform, and Dataflow/Spark
- Design and develop event-driven architectures using Apache Kafka, Google Cloud Pub/Sub, or equivalent messaging systems
- Build and expose high-performance data APIs and microservices to support downstream applications, ML workflows, and GenAI agents
- Architect and manage multi-cloud and hybrid cloud platforms (e.g., GCP, AWS, Azure) optimized for AI, ML, and real-time data processing workloads
- Build reusable frameworks and infrastructure-as-code (IaC) using Terraform, Kubernetes, and CI/CD to drive self-service and automation
- Ensure platform scalability, resilience, and cost efficiency through modern practices like GitOps, observability, and chaos engineering
Leadership & Collaboration
- Experience leading cloud architecture reviews, defining standards, and mentoring engineering teams
- Ability to work cross-functionally with security, networking, application, and data teams to deliver integrated cloud solutions
- Strong communication skills to engage stakeholders at various levels, from engineering to executives
Qualifications
- 15+ years of hands-on experience in platform or data engineering, cloud architecture, multi-cloud and multi-region deployment and architecture, or AI engineering roles
- Strong programming background in Java, Python, SQL, and one or more other general-purpose languages
- Deep knowledge of data modeling, distributed systems, and API design in production environments
- Proficiency in designing and managing Kubernetes, serverless workloads, and streaming systems (Kafka, Pub/Sub, Flink, Spark)
- Experience with metadata management, data catalogs, data quality enforcement, and semantic modeling, including automated integration with the data platform
- Proven experience building scalable, efficient data pipelines for structured and unstructured data
- Experience with GenAI/LLM frameworks and tools for orchestration and workflow automation
- Experience with RAG pipelines, vector databases, and embedding-based search
- Familiarity with observability tools (Prometheus, Grafana, OpenTelemetry) and strong debugging skills across the stack
- Experience with ML platforms (MLflow, Vertex AI, Kubeflow) and AI/ML observability tools
- Prior implementation of data mesh or data fabric in a large-scale enterprise
- Experience with Looker Modeler, LookML, or semantic modeling layers
Preferred Certifications
- AWS Certified Solutions Architect - Professional
- Google Professional Cloud Architect
- Microsoft Certified: Azure Solutions Architect Expert
- HashiCorp Certified: Terraform Associate
- Other relevant certifications (CKA, CKS, CISSP cloud concentration) are a plus
Why You'll Love This Role
- Drive technical leadership across AI-native data platforms, automation systems, and self-service tools
- Collaborate across teams to shape the next generation of intelligent platforms in the enterprise
- Work with a high-energy, mission-driven team that embraces innovation, open-source, and experimentation