
127 Kustomize Jobs

JobPe aggregates job listings for easy access; applications are submitted directly on the original job portal.

5.0 - 10.0 years

7 - 17 Lacs

mumbai

Work from Office

- An understanding of product development methodologies and microservices architecture.
- Hands-on experience with at least two major cloud providers (AWS, GCP, Azure); multi-cloud experience is a strong advantage.
- Expertise in designing, implementing, and managing cloud architectures with a focus on scalability, security, and resilience.
- Understanding of and experience with cloud fundamentals (Networking, IAM, Compute) and managed services (DB, Storage, GKE/EKS, KMS).
- Hands-on experience with cloud architecture design and setup.
- An in-depth understanding of Infrastructure as Code tools such as Terraform and Helm is a must.
- Practical experience deploying, maintaining, and scaling applications on Kubernetes clusters using Helm charts or Kustomize.
- Hands-on experience with CI/CD tools such as GitLab CI, Jenkins, or GitHub Actions; GitOps tools such as ArgoCD or FluxCD are a must.
- Experience with monitoring and logging tools such as Prometheus, Grafana, and the Elastic Stack.
- Experience working with PaaS is a plus.
- Experience deploying to on-prem data centres, and experience with k3s OSS / OpenShift / Rancher Kubernetes clusters, is a plus.

What are we looking for:
- Learn, architect, and build with the skills and technologies highlighted above
- Product-oriented delivery
- Design, build, and operate cloud architecture and the DevOps pipeline
- Build on open source technologies
- Collaboration with teams across 5 products
- GitOps philosophy
- DevSecOps mindset: a highly secure platform
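For readers unfamiliar with the Kustomize base/overlay model these roles reference, here is a minimal sketch: Python stdlib scaffolds a hypothetical base plus a prod overlay and renders it. All names and paths are illustrative, and it assumes `kubectl` (with built-in Kustomize) is on PATH.

```python
# Scaffold a toy Kustomize layout (base + prod overlay) and render it.
import subprocess
import textwrap
from pathlib import Path

root = Path("demo-app")
base, overlay = root / "base", root / "overlays" / "prod"

files = {
    base / "deployment.yaml": """\
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: web
        spec:
          replicas: 1
          selector:
            matchLabels: {app: web}
          template:
            metadata:
              labels: {app: web}
            spec:
              containers:
              - name: web
                image: nginx:1.27
        """,
    base / "kustomization.yaml": """\
        resources:
        - deployment.yaml
        """,
    overlay / "kustomization.yaml": """\
        resources:
        - ../../base
        patches:
        - patch: |-
            - op: replace
              path: /spec/replicas
              value: 3
          target:
            kind: Deployment
            name: web
        """,
}

for path, body in files.items():
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(textwrap.dedent(body))

# Render the prod overlay (equivalent to `kustomize build overlays/prod`).
print(subprocess.run(
    ["kubectl", "kustomize", str(overlay)],
    capture_output=True, text=True, check=True,
).stdout)
```

The overlay never copies the base; it only patches what differs (here, replica count), which is the property interviewers for these roles usually probe.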

Posted 1 day ago

Apply

0 years

0 Lacs

pune, maharashtra, india

On-site

SailPoint is the leader in identity security for the cloud enterprise. Our identity security solutions secure and enable thousands of companies worldwide, giving our customers unmatched visibility into the entirety of their digital workforce, ensuring workers have the right access to do their job – no more, no less. Built on a foundation of AI and ML, our Identity Security Cloud Platform delivers the right level of access to the right identities and resources at the right time—matching the scale, velocity, and changing needs of today's cloud-oriented, modern enterprise.

About the role:
Want to be on a team full of results-driven individuals who are constantly seeking to innovate? Want to make an impact? At SailPoint, our Data Platform team does just that. SailPoint is seeking a Senior Data/Software Engineer to help build robust data ingestion and processing systems to power our data platform. We are looking for well-rounded engineers who are passionate about building and delivering reliable, scalable data pipelines. This is a unique opportunity to build something from scratch while having the backing of an organization that has the muscle to take it to market quickly, with a very satisfied customer base.

Responsibilities:
- Spearhead the design and implementation of ELT processes, especially focused on extracting data from and loading data into various endpoints, including RDBMS, NoSQL databases, and data warehouses.
- Develop and maintain scalable data pipelines for both stream and batch processing, leveraging JVM-based languages and frameworks.
- Collaborate with cross-functional teams to understand diverse data sources and environment contexts, ensuring seamless integration into our data ecosystem.
- Utilize the AWS service stack wherever possible to implement lean design solutions for data storage, data integration, and data streaming problems.
- Develop and maintain workflow orchestration using tools like Apache Airflow.
- Stay abreast of emerging technologies in the data engineering space, proactively incorporating them into our ETL processes.
- Thrive in an environment with ambiguity, demonstrating adaptability and problem-solving skills.

Qualifications:
- BS in computer science or a related field.
- 5+ years of experience in data engineering or a related field.
- Demonstrated system-design experience orchestrating ELT processes targeting data.
- Must be willing to work 4 overlapping hours with US time zones; will work closely with US-based managers and engineers.
- Hands-on experience with at least one streaming or batch processing framework, such as Flink or Spark.
- Hands-on experience with containerization platforms such as Docker and container orchestration tools like Kubernetes.
- Proficiency in the AWS service stack.
- Experience with DBT, Kafka, Jenkins, and Snowflake.
- Experience leveraging tools such as Kustomize, Helm, and Terraform for implementing infrastructure as code.
- Strong interest in staying ahead of new technologies in the data engineering space.
- Comfortable working in ambiguous team situations, showcasing adaptability and drive in solving novel problems in the data engineering space.

Preferred:
- Experience with AWS
- Experience with Continuous Delivery
- Experience instrumenting code for gathering production performance metrics
- Experience working with a data catalog tool (e.g., Atlan / Alation)

What success looks like in the role:

Within the first 30 days you will:
- Onboard into your new role, get familiar with our product offering and technology, proactively meet peers and stakeholders, and set up your test and development environment.
- Seek to deeply understand business problems or common engineering challenges and propose software architecture designs to solve them elegantly by abstracting useful common patterns.

By 90 days:
- Proactively collaborate on, discuss, debate, and refine ideas, problem statements, and software designs with different (sometimes many) stakeholders, architects, and members of your team.
- Take a committed approach to prototyping and co-implementing systems alongside less experienced engineers on your team—there's no room for ivory towers here.

By 6 months:
- Collaborate with Product Management and the Engineering Lead to estimate and deliver small to medium complexity features more independently.
- Occasionally serve as a debugging and implementation expert during escalations of systems issues that have evaded the ability of less experienced engineers to solve in a timely manner.
- Share support of critical team systems by participating in calls with customers, learning the characteristics of currently running systems, and participating in improvements.

SailPoint is an equal opportunity employer and we welcome everyone to our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
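The Airflow orchestration work described above typically looks like a DAG of extract/load/transform tasks. The following is an illustrative sketch only (not SailPoint's actual pipeline), written against Airflow 2's TaskFlow API with made-up task bodies:

```python
# Toy ELT DAG: extract from a source, land raw rows, transform in-warehouse.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2025, 1, 1), catchup=False)
def identity_events_elt():
    @task
    def extract() -> list[dict]:
        # Pull a batch from a hypothetical source API or RDBMS.
        return [{"user": "u1", "action": "login"}, {"user": "u2", "action": "grant"}]

    @task
    def load(rows: list[dict]) -> str:
        # Land raw rows in a staging table (stubbed here).
        print(f"loaded {len(rows)} rows into staging.identity_events")
        return "staging.identity_events"

    @task
    def transform(staging_table: str) -> None:
        # In ELT, the transform runs inside the warehouse, e.g. via SQL.
        print(f"CREATE TABLE marts.events AS SELECT * FROM {staging_table}")

    transform(load(extract()))


identity_events_elt()
```

The ELT-vs-ETL distinction the posting draws is visible here: raw data is loaded first and transformed where it lands, rather than being transformed in flight.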

Posted 3 days ago

Apply

10.0 years

0 Lacs

hyderabad, telangana, india

On-site

Skill: GCP Cloud Infra Architect/Lead
Experience: 10+ years
Location: Hyderabad
Work Mode: Hybrid
Notice Period: Immediate / 1 week

Qualifications:
- 10+ years of experience as an Infrastructure Engineer or in a similar role.
- Extensive experience with Google Cloud Platform (GCP) and/or Amazon Web Services (AWS).
- Proven ability to architect for scale, availability, and high-performance workloads.
- Deep knowledge of Infrastructure as Code (IaC) with Terraform.
- Strong experience with Kubernetes and related tools (Helm, Kustomize).
- Solid understanding of CI/CD pipelines and deployment strategies.
- Experience with security, audit, and compliance best practices.
- Excellent problem-solving and analytical skills.
- Strong communication and interpersonal skills, with the ability to engage both technical and non-technical stakeholders.
- Experience in technical leadership and mentoring.
- Experience with client relationship management and project planning.

Certifications:
- Relevant certifications (e.g., Certified Kubernetes Administrator, Google Cloud Certified Professional Cloud Architect, Google Cloud networking and security certifications).
- Software development experience (e.g., Terraform, Python).
- Experience with machine learning infrastructure.

Education:
- Bachelor's degree in Computer Science, a related field, or equivalent experience.

Regards,
Sandeep Kumar
sandeep.vinaganti@quesscorp.com
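A small taste of the GCP automation this role implies: listing every GKE cluster in a project and printing its versions, the raw material for an upgrade or compliance audit. A hedged sketch assuming `pip install google-cloud-container`, application-default credentials, and a placeholder project ID:

```python
# Inventory all GKE clusters in a project across every region and zone.
from google.cloud import container_v1


def audit_gke_clusters(project_id: str) -> None:
    client = container_v1.ClusterManagerClient()
    # "-" as the location means all locations (regions and zones).
    response = client.list_clusters(parent=f"projects/{project_id}/locations/-")
    for cluster in response.clusters:
        print(f"{cluster.name:<30} {cluster.location:<15} "
              f"master={cluster.current_master_version} "
              f"nodes={cluster.current_node_version}")


if __name__ == "__main__":
    audit_gke_clusters("my-gcp-project")  # placeholder project ID
```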

Posted 3 days ago

Apply

5.0 - 7.0 years

17 - 18 Lacs

mumbai

Work from Office

- An understanding of product development methodologies and microservices architecture.
- Hands-on experience with at least two major cloud providers (AWS, GCP, Azure); multi-cloud experience is a strong advantage.
- Expertise in designing, implementing, and managing cloud architectures with a focus on scalability, security, and resilience.
- Understanding of and experience with cloud fundamentals (Networking, IAM, Compute) and managed services (DB, Storage, GKE/EKS, KMS).
- Hands-on experience with cloud architecture design and setup.
- An in-depth understanding of Infrastructure as Code tools such as Terraform and Helm is a must.
- Practical experience deploying, maintaining, and scaling applications on Kubernetes clusters using Helm charts or Kustomize.
- Hands-on experience with CI/CD tools such as GitLab CI, Jenkins, or GitHub Actions; GitOps tools such as ArgoCD or FluxCD are a must.
- Experience with monitoring and logging tools such as Prometheus, Grafana, and the Elastic Stack.
- Experience working with PaaS is a plus.
- Experience deploying to on-prem data centres, and experience with k3s OSS / OpenShift / Rancher Kubernetes clusters, is a plus.

What are we looking for:
- Learn, architect, and build with the skills and technologies highlighted above
- Product-oriented delivery
- Design, build, and operate cloud architecture and the DevOps pipeline
- Build on open source technologies
- Collaboration with teams across 5 products
- GitOps philosophy
- DevSecOps mindset: a highly secure platform
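On the monitoring side of this stack, Prometheus exposes a standard HTTP query API that operational tooling is usually built on. A hedged example (the server URL is a placeholder; assumes `pip install requests`) that lists scrape targets Prometheus currently sees as down:

```python
# Query Prometheus's HTTP API for unhealthy scrape targets.
import requests

PROM_URL = "http://prometheus.example.internal:9090"  # placeholder address


def instances_down() -> list[str]:
    # `up == 0` matches every target whose last scrape failed.
    resp = requests.get(
        f"{PROM_URL}/api/v1/query", params={"query": "up == 0"}, timeout=10
    )
    resp.raise_for_status()
    return [r["metric"].get("instance", "<unknown>")
            for r in resp.json()["data"]["result"]]


if __name__ == "__main__":
    for instance in instances_down():
        print("DOWN:", instance)
```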

Posted 5 days ago

Apply

6.0 - 8.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Title: Senior DevOps Engineer
Reporting to: Senior Director, Product Development
Location: Bengaluru (Bangalore)

Opportunity:

Responsibilities:

Infrastructure Development & Integration
- Design, implement, and manage cloud-native infrastructure (AWS, Azure, GCP) to support healthcare platforms, AI agents, and clinical applications.
- Build and maintain scalable CI/CD pipelines to enable rapid and reliable delivery of software, data pipelines, and AI/ML models.
- Design and manage Kubernetes (K8s) clusters for container orchestration, workload scaling, and high availability, with integrated monitoring to ensure cluster health and performance.
- Implement Kubernetes-native tools (Helm, Kustomize, ArgoCD) for deployment automation and environment management, ensuring observability through monitoring dashboards and alerts.
- Collaborate with Staff Engineers/Architects to align infrastructure with enterprise goals for scalability, reliability, and performance, leveraging monitoring insights to inform architectural decisions.

System Optimization & Reliability
- Implement and maintain comprehensive monitoring, logging, and alerting mechanisms (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, AWS CloudTrail) to ensure real-time visibility into system performance, resource utilization, and potential incidents, and to enable proactive incident response.
- Ensure data pipeline workflows (ETL/ELT, real-time streaming, batch processing) are observable, reliable, and auditable.
- Support observability and monitoring of GenAI pipelines, embeddings, vector databases, and agentic AI workflows.
- Proactively analyze monitoring data to identify bottlenecks, predict failures, and drive continuous improvement in system reliability.

Compliance & Security
- Support audit trails and compliance reporting through automated DevSecOps practices.
- Implement security controls for LLM-based applications, AI agents, and healthcare data pipelines, including prompt injection prevention, API rate limiting, and data governance.

Collaboration & Agile Practices
- Partner closely with software engineers, data engineers, AI/ML engineers, and product managers to deliver integrated, secure, and scalable solutions.
- Contribute to agile development processes, including sprint planning, stand-ups, and retrospectives.
- Mentor junior engineers and share best practices in cloud-native infrastructure, CI/CD, Kubernetes, and automation.

Innovation & Technical Expertise
- Stay informed about emerging DevOps practices, cloud-native architectures, MLOps/LLMOps, and data engineering tools.
- Prototype and evaluate new frameworks and tools to enhance infrastructure for data pipelines, GenAI, and agentic AI applications.
- Advocate for best practices in infrastructure design, focusing on modularity, maintainability, and scalability.

Requirements

Education & Experience
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical discipline.
- 6+ years of experience in DevOps, Site Reliability Engineering, or related roles, with at least 5 years building cloud-native infrastructure.
- Proven track record of managing production-grade Kubernetes clusters and cloud infrastructure in regulated environments.
- Experience supporting GenAI/LLM applications (e.g., OpenAI, Hugging Face, LangChain) and vector databases (e.g., Pinecone, Weaviate, FAISS).
- Hands-on experience supporting data pipeline products using ETL/ELT frameworks (Apache Airflow, dbt, Prefect) and streaming systems (Kafka, Spark, Flink).
- Experience deploying AI agents and orchestrating agent workflows in production environments.

Technical Proficiency
- Expertise in Kubernetes (K8s) for orchestration, scaling, and managing containerized applications.
- Strong proficiency in containerization (Docker) and Kubernetes ecosystem tools (Helm, ArgoCD, Istio/Linkerd for service mesh).
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, or Pulumi).
- Proficiency with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD, Spinnaker).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, and AWS CloudTrail), including setting up dashboards, alerts, and custom metrics for cloud-native and AI systems.
- Good to have: knowledge of healthcare data standards (FHIR, HL7) and secure deployment practices for AI/ML and data pipelines.

Professional Skills
- Strong problem-solving skills with a focus on reliability, scalability, and security.
- Excellent collaboration and communication skills across cross-functional teams.
- Proactive, detail-oriented, and committed to technical excellence in a fast-paced healthcare environment.

About Get Well:
Now part of the SAI Group family, Get Well is redefining digital patient engagement by putting patients in control of their personalized healthcare journeys, both inside and outside the hospital. Get Well combines high-tech AI navigation with high-touch care experiences, driving patient activation, loyalty, and outcomes while reducing the cost of care. For almost 25 years, Get Well has served more than 10 million patients per year across over 1,000 hospitals and clinical partner sites, working to use longitudinal data analytics to better serve patients and clinicians. AI innovator SAI Group, led by Chairman Romesh Wadhwani, is the lead growth investor in Get Well. Get Well's award-winning solutions were recognized again in 2024 by KLAS Research and AVIA Marketplace. Learn more at Get Well and follow us on LinkedIn and Twitter.

Get Well is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age or veteran status.

About SAI Group:
SAIGroup commits $1 billion in capital, an advanced AI platform that currently processes 300M+ patients, and a 4,000+ global employee base to solving enterprise AI and high-priority healthcare problems. SAIGroup - Growing companies with advanced AI: https://www.cnbc.com/2023/12/08/75-year-old-tech-mogul-betting-1-billion-of-his-fortune-on-ai-future.html. Bio of our Chairman Dr. Romesh Wadhwani: Team - SAIGroup (informal at Romesh Wadhwani - Wikipedia). TIME Magazine recently recognized Chairman Romesh Wadhwani as one of the Top 100 AI leaders in the world: Romesh and Sunil Wadhwani: The 100 Most Influential People in AI 2023 | TIME
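The "custom metrics, dashboards, and alerts" work above starts on the application side. A minimal sketch using the official `prometheus_client` library to expose a counter and a latency histogram (metric names and the fake workload are illustrative):

```python
# Expose custom Prometheus metrics from a toy request handler.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["route"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")


@LATENCY.time()  # records each call's duration into the histogram
def handle_request(route: str) -> None:
    REQUESTS.labels(route=route).inc()
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work


if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        handle_request("/health")
```

Prometheus then scrapes the `/metrics` endpoint, and Grafana dashboards or alert rules are built over the resulting series.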

Posted 5 days ago

Apply

5.0 - 7.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Title: DevOps Engineer
Reporting to: Senior Director, Product Development
Location: Bengaluru (Bangalore)

Opportunity:

Responsibilities:

Infrastructure Development & Integration
- Design, implement, and manage cloud-native infrastructure (AWS, Azure, GCP) to support healthcare platforms, AI agents, and clinical applications.
- Build and maintain scalable CI/CD pipelines to enable rapid and reliable delivery of software, data pipelines, and AI/ML models.
- Design and manage Kubernetes (K8s) clusters for container orchestration, workload scaling, and high availability, with integrated monitoring to ensure cluster health and performance.
- Implement Kubernetes-native tools (Helm, Kustomize, ArgoCD) for deployment automation and environment management, ensuring observability through monitoring dashboards and alerts.
- Collaborate with Staff Engineers/Architects to align infrastructure with enterprise goals for scalability, reliability, and performance, leveraging monitoring insights to inform architectural decisions.

System Optimization & Reliability
- Implement and maintain comprehensive monitoring, logging, and alerting mechanisms (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, AWS CloudTrail) to ensure real-time visibility into system performance, resource utilization, and potential incidents, and to enable proactive incident response.
- Ensure data pipeline workflows (ETL/ELT, real-time streaming, batch processing) are observable, reliable, and auditable.
- Support observability and monitoring of GenAI pipelines, embeddings, vector databases, and agentic AI workflows.
- Proactively analyze monitoring data to identify bottlenecks, predict failures, and drive continuous improvement in system reliability.

Compliance & Security
- Support audit trails and compliance reporting through automated DevSecOps practices.
- Implement security controls for LLM-based applications, AI agents, and healthcare data pipelines, including prompt injection prevention, API rate limiting, and data governance.

Collaboration & Agile Practices
- Partner closely with software engineers, data engineers, AI/ML engineers, and product managers to deliver integrated, secure, and scalable solutions.
- Contribute to agile development processes, including sprint planning, stand-ups, and retrospectives.
- Mentor junior engineers and share best practices in cloud-native infrastructure, CI/CD, Kubernetes, and automation.

Innovation & Technical Expertise
- Stay informed about emerging DevOps practices, cloud-native architectures, MLOps/LLMOps, and data engineering tools.
- Prototype and evaluate new frameworks and tools to enhance infrastructure for data pipelines, GenAI, and agentic AI applications.
- Advocate for best practices in infrastructure design, focusing on modularity, maintainability, and scalability.

Requirements

Education & Experience
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical discipline.
- 5+ years of experience in DevOps, Site Reliability Engineering, or related roles, with at least 3 years building cloud-native infrastructure.
- Proven track record of managing production-grade Kubernetes clusters and cloud infrastructure in regulated environments.
- Experience supporting GenAI/LLM applications (e.g., OpenAI, Hugging Face, LangChain) and vector databases (e.g., Pinecone, Weaviate, FAISS).
- Hands-on experience supporting data pipeline products using ETL/ELT frameworks (Apache Airflow, dbt, Prefect) and streaming systems (Kafka, Spark, Flink).
- Experience deploying AI agents and orchestrating agent workflows in production environments.

Technical Proficiency
- Expertise in Kubernetes (K8s) for orchestration, scaling, and managing containerized applications.
- Strong proficiency in containerization (Docker) and Kubernetes ecosystem tools (Helm, ArgoCD, Istio/Linkerd for service mesh).
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, or Pulumi).
- Proficiency with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD, Spinnaker).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, and AWS CloudTrail), including setting up dashboards, alerts, and custom metrics for cloud-native and AI systems.
- Good to have: knowledge of healthcare data standards (FHIR, HL7) and secure deployment practices for AI/ML and data pipelines.

Professional Skills
- Strong problem-solving skills with a focus on reliability, scalability, and security.
- Excellent collaboration and communication skills across cross-functional teams.
- Proactive, detail-oriented, and committed to technical excellence in a fast-paced healthcare environment.

About Get Well:
Now part of the SAI Group family, Get Well is redefining digital patient engagement by putting patients in control of their personalized healthcare journeys, both inside and outside the hospital. Get Well combines high-tech AI navigation with high-touch care experiences, driving patient activation, loyalty, and outcomes while reducing the cost of care. For almost 25 years, Get Well has served more than 10 million patients per year across over 1,000 hospitals and clinical partner sites, working to use longitudinal data analytics to better serve patients and clinicians. AI innovator SAI Group, led by Chairman Romesh Wadhwani, is the lead growth investor in Get Well. Get Well's award-winning solutions were recognized again in 2024 by KLAS Research and AVIA Marketplace. Learn more at Get Well and follow us on LinkedIn and Twitter.

Get Well is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age or veteran status.

About SAI Group:
SAIGroup commits $1 billion in capital, an advanced AI platform that currently processes 300M+ patients, and a 4,000+ global employee base to solving enterprise AI and high-priority healthcare problems. SAIGroup - Growing companies with advanced AI: https://www.cnbc.com/2023/12/08/75-year-old-tech-mogul-betting-1-billion-of-his-fortune-on-ai-future.html. Bio of our Chairman Dr. Romesh Wadhwani: Team - SAIGroup (informal at Romesh Wadhwani - Wikipedia). TIME Magazine recently recognized Chairman Romesh Wadhwani as one of the Top 100 AI leaders in the world: Romesh and Sunil Wadhwani: The 100 Most Influential People in AI 2023 | TIME
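For the streaming systems this posting names, the consumer loop is the basic building block. A hedged sketch using the `kafka-python` library; the broker address and topic are placeholders:

```python
# Minimal Kafka consumer: read JSON events from a topic and print them.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "patient-events",                          # placeholder topic
    bootstrap_servers="broker.internal:9092",  # placeholder broker
    group_id="observability-demo",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # A real pipeline would validate, enrich, and forward the event here.
    print(f"partition={message.partition} offset={message.offset} event={event}")
```

The `group_id` is what gives Kafka its horizontal-scaling story: add more consumers with the same group and partitions are rebalanced across them.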

Posted 5 days ago

Apply

10.0 - 20.0 years

40 - 90 Lacs

chennai

Hybrid

Short Description:
We are seeking a highly skilled and passionate GKE Platform Engineer to join our growing team. This role is ideal for someone with deep experience managing Google Kubernetes Engine (GKE) platforms at scale, particularly with enterprise-level workloads on Google Cloud Platform (GCP). As part of a dynamic team, you will design, develop, and optimize Kubernetes-based solutions, using tools like GitHub Actions, ACM, KCC, and workload identity to provide high-quality platform services to developers. You will drive CI/CD pipelines across multiple lifecycle stages, manage GKE environments at scale, and enhance the developer experience on the platform. You should have a strong developer-experience mindset, focused on creating reliable, scalable, and efficient infrastructure to support developer needs. This is a fast-paced environment where collaboration across teams is key to delivering impactful results.

Responsibilities:
- GKE Platform Management at Scale: Manage and optimize large-scale GKE environments in a multi-cloud and hybrid-cloud context, ensuring the platform is highly available, scalable, and secure.
- CI/CD Pipeline Development: Build and maintain CI/CD pipelines using tools like GitHub Actions to automate deployment workflows across the GKE platform. Ensure smooth integration and delivery of services throughout their lifecycle.
- Enterprise GKE Management: Leverage advanced features of GKE such as Anthos Config Management (ACM) and Kubernetes Config Connector (KCC) to manage GKE clusters efficiently at enterprise scale.
- Workload Identity & Security: Implement workload identity and security best practices to ensure secure access and management of GKE workloads.
- Custom Operators & Controllers: Develop custom operators and controllers for GKE, automating the deployment and management of custom services to enhance the developer experience on the platform.
- Developer Experience Focus: Maintain a developer-first mindset to create an intuitive, reliable, and easy-to-use platform for developers. Collaborate with development teams to ensure seamless integration with the GKE platform.
- GKE Deployment Pipelines: Provide guidelines and best practices for GKE deployment pipelines, leveraging tools like Kustomize and Helm to manage and deploy GKE configurations effectively. Ensure pipelines are optimized for scalability, security, and repeatability.
- Zero Trust Model: Ensure GKE clusters operate effectively within a Zero Trust security model. Maintain a strong understanding of Zero Trust principles, including identity and access management, network segmentation, and workload authentication.
- Ingress Patterns: Design and manage multi-cluster and multi-regional ingress patterns to ensure seamless traffic management and high availability across geographically distributed Kubernetes clusters.
- Deep Troubleshooting & Support: Provide deep troubleshooting knowledge and support to help developers pinpoint issues across the GKE platform, focusing on debugging complex Kubernetes issues, application failures, and performance bottlenecks. Utilize diagnostic tools and debugging techniques to resolve critical platform-related issues.
- Observability & Logging Tools: Implement and maintain observability across GKE clusters using monitoring, logging, and alerting tools like Prometheus, Dynatrace, and Splunk. Ensure proper logging and metrics are in place to enable developers to effectively monitor and diagnose issues within their applications.
- Platform Automation & Integration: Automate platform management tasks, such as scaling, upgrading, and patching, using tools like Terraform, Helm, and GKE APIs.
- Continuous Improvement & Learning: Stay up to date with the latest trends and advancements in Kubernetes, GKE, and Google Cloud services to continuously improve platform capabilities.

Qualifications:

Experience:
- 8+ years of overall experience in cloud platform engineering, infrastructure management, and enterprise-scale operations.
- 5+ years of hands-on experience with Google Cloud Platform (GCP), including designing, deploying, and managing cloud infrastructure and services.
- 5+ years of experience specifically with Google Kubernetes Engine (GKE), managing large-scale, production-grade clusters in enterprise environments.
- Experience deploying, scaling, and maintaining GKE clusters in production environments.
- Hands-on experience with CI/CD practices and automation tools like GitHub Actions.
- Proven track record of building and managing GKE platforms in a fast-paced, dynamic environment.
- Experience developing custom Kubernetes operators and controllers for managing complex workloads.
- Deep troubleshooting knowledge: strong ability to troubleshoot complex platform issues, with expertise in diagnosing problems across the entire GKE stack.

Technical Skills:

Must have:
- Google Cloud Platform (GCP): extensive hands-on experience with GCP, particularly Kubernetes Engine (GKE), Cloud Storage, Cloud Pub/Sub, Cloud Logging, and Cloud Monitoring.
- Kubernetes (GKE) at scale: expertise in managing large-scale GKE clusters, including security configurations, networking, and workload management.
- CI/CD automation: strong experience with CI/CD pipeline automation tools, particularly GitHub Actions, for building, testing, and deploying applications.
- Kubernetes operators & controllers: ability to develop custom Kubernetes operators and controllers to automate and manage applications on GKE.
- Workload identity & security: solid understanding of Kubernetes workload identity and access management (IAM) best practices, including integration with GCP identity and Google Cloud IAM.
- Anthos & ACM: hands-on experience with Anthos Config Management (ACM) and Kubernetes Config Connector (KCC) to manage and govern GKE clusters and workloads at scale.
- Infrastructure as Code (IaC): experience with tools like Terraform to manage GKE infrastructure and cloud resources.
- Helm & Kustomize: experience using Helm and Kustomize for packaging, deploying, and managing Kubernetes resources efficiently; ability to create reusable and scalable Kubernetes deployment templates.
- Observability & logging tools: experience with observability tools such as Prometheus, Dynatrace, and Splunk to monitor and log GKE performance, providing developers with actionable insights for troubleshooting.

Nice to have:
- Zero Trust security model: strong understanding of implementing and maintaining security in a Zero Trust model for GKE, including workload authentication, identity management, and network security.
- Ingress patterns: experience designing and managing multi-cluster and multi-regional ingress in Kubernetes to ensure fault tolerance, traffic management, and high availability.
- Familiarity with Open Policy Agent (OPA) for policy enforcement in Kubernetes environments.

Education & Certification:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Relevant GCP certifications, such as Google Cloud Certified Professional Cloud Architect or Google Cloud Certified Professional Cloud Developer.

Soft Skills:
- Collaboration: strong ability to work with cross-functional teams to ensure platform solutions meet development and operational needs.
- Problem-solving: excellent problem-solving skills with a focus on troubleshooting and performance optimization.
- Communication: strong written and verbal communication skills; able to communicate effectively with both technical and non-technical teams.
- Initiative & ownership: ability to take ownership of platform projects, driving them from conception to deployment with minimal supervision.
- Adaptability: willingness to learn new technologies and adjust to evolving business needs.
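The custom operators and controllers this role emphasizes all reduce to the same control-loop idea: watch cluster state and reconcile it toward a desired spec. A toy sketch with the official `kubernetes` Python client (assumes `pip install kubernetes` and a reachable kubeconfig; this has none of the caching, retries, or leader election a production operator needs):

```python
# Toy controller: watch Deployments in a namespace and report readiness.
from kubernetes import client, config, watch


def run_controller(namespace: str = "default") -> None:
    config.load_kube_config()  # in-cluster code would use load_incluster_config()
    apps = client.AppsV1Api()
    for event in watch.Watch().stream(apps.list_namespaced_deployment, namespace):
        dep = event["object"]
        desired = dep.spec.replicas or 0
        ready = dep.status.ready_replicas or 0
        print(f"{event['type']:<8} {dep.metadata.name}: {ready}/{desired} ready")
        # A real controller would compare desired vs. observed state here
        # and issue API calls to reconcile the difference.


if __name__ == "__main__":
    run_controller()
```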

Posted 5 days ago

Apply

5.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Title: DevOps Engineer
Reporting to: Senior Director, Product Development
Location: Bengaluru (Bangalore)

Opportunity:

Responsibilities:

Infrastructure Development & Integration
- Design, implement, and manage cloud-native infrastructure (AWS, Azure, GCP) to support healthcare platforms, AI agents, and clinical applications.
- Build and maintain scalable CI/CD pipelines to enable rapid and reliable delivery of software, data pipelines, and AI/ML models.
- Design and manage Kubernetes (K8s) clusters for container orchestration, workload scaling, and high availability, with integrated monitoring to ensure cluster health and performance.
- Implement Kubernetes-native tools (Helm, Kustomize, ArgoCD) for deployment automation and environment management, ensuring observability through monitoring dashboards and alerts.
- Collaborate with Staff Engineers/Architects to align infrastructure with enterprise goals for scalability, reliability, and performance, leveraging monitoring insights to inform architectural decisions.

System Optimization & Reliability
- Implement and maintain comprehensive monitoring, logging, and alerting mechanisms (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, AWS CloudTrail) to ensure real-time visibility into system performance, resource utilization, and potential incidents, and to enable proactive incident response.
- Ensure data pipeline workflows (ETL/ELT, real-time streaming, batch processing) are observable, reliable, and auditable.
- Support observability and monitoring of GenAI pipelines, embeddings, vector databases, and agentic AI workflows.
- Proactively analyze monitoring data to identify bottlenecks, predict failures, and drive continuous improvement in system reliability.

Compliance & Security
- Support audit trails and compliance reporting through automated DevSecOps practices.
- Implement security controls for LLM-based applications, AI agents, and healthcare data pipelines, including prompt injection prevention, API rate limiting, and data governance.

Collaboration & Agile Practices
- Partner closely with software engineers, data engineers, AI/ML engineers, and product managers to deliver integrated, secure, and scalable solutions.
- Contribute to agile development processes, including sprint planning, stand-ups, and retrospectives.
- Mentor junior engineers and share best practices in cloud-native infrastructure, CI/CD, Kubernetes, and automation.

Innovation & Technical Expertise
- Stay informed about emerging DevOps practices, cloud-native architectures, MLOps/LLMOps, and data engineering tools.
- Prototype and evaluate new frameworks and tools to enhance infrastructure for data pipelines, GenAI, and agentic AI applications.
- Advocate for best practices in infrastructure design, focusing on modularity, maintainability, and scalability.

Requirements

Education & Experience
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical discipline.
- 5+ years of experience in DevOps, Site Reliability Engineering, or related roles, with at least 3 years building cloud-native infrastructure.
- Proven track record of managing production-grade Kubernetes clusters and cloud infrastructure in regulated environments.
- Experience supporting GenAI/LLM applications (e.g., OpenAI, Hugging Face, LangChain) and vector databases (e.g., Pinecone, Weaviate, FAISS).
- Hands-on experience supporting data pipeline products using ETL/ELT frameworks (Apache Airflow, dbt, Prefect) and streaming systems (Kafka, Spark, Flink).
- Experience deploying AI agents and orchestrating agent workflows in production environments.

Technical Proficiency
- Expertise in Kubernetes (K8s) for orchestration, scaling, and managing containerized applications.
- Strong proficiency in containerization (Docker) and Kubernetes ecosystem tools (Helm, ArgoCD, Istio/Linkerd for service mesh).
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, or Pulumi).
- Proficiency with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD, Spinnaker).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, and AWS CloudTrail), including setting up dashboards, alerts, and custom metrics for cloud-native and AI systems.
- Good to have: knowledge of healthcare data standards (FHIR, HL7) and secure deployment practices for AI/ML and data pipelines.

Professional Skills
- Strong problem-solving skills with a focus on reliability, scalability, and security.
- Excellent collaboration and communication skills across cross-functional teams.
- Proactive, detail-oriented, and committed to technical excellence in a fast-paced healthcare environment.

About Get Well:
Now part of the SAI Group family, Get Well is redefining digital patient engagement by putting patients in control of their personalized healthcare journeys, both inside and outside the hospital. Get Well combines high-tech AI navigation with high-touch care experiences, driving patient activation, loyalty, and outcomes while reducing the cost of care. For almost 25 years, Get Well has served more than 10 million patients per year across over 1,000 hospitals and clinical partner sites, working to use longitudinal data analytics to better serve patients and clinicians. AI innovator SAI Group, led by Chairman Romesh Wadhwani, is the lead growth investor in Get Well. Get Well's award-winning solutions were recognized again in 2024 by KLAS Research and AVIA Marketplace. Learn more at Get Well and follow us on LinkedIn and Twitter.

Get Well is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age or veteran status.

About SAI Group:
SAIGroup commits $1 billion in capital, an advanced AI platform that currently processes 300M+ patients, and a 4,000+ global employee base to solving enterprise AI and high-priority healthcare problems. SAIGroup - Growing companies with advanced AI: https://www.cnbc.com/2023/12/08/75-year-old-tech-mogul-betting-1-billion-of-his-fortune-on-ai-future.html. Bio of our Chairman Dr. Romesh Wadhwani: Team - SAIGroup (informal at Romesh Wadhwani - Wikipedia). TIME Magazine recently recognized Chairman Romesh Wadhwani as one of the Top 100 AI leaders in the world: Romesh and Sunil Wadhwani: The 100 Most Influential People in AI 2023 | TIME
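On the vector-database requirement above, the core operation is nearest-neighbour search over embeddings. A hedged in-process sketch with FAISS (real deployments would more likely use Pinecone, Weaviate, or a managed service); assumes `pip install faiss-cpu numpy`, and the data is random:

```python
# Exact L2 nearest-neighbour search over random embeddings with FAISS.
import faiss
import numpy as np

dim = 128
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((10_000, dim)).astype("float32")

index = faiss.IndexFlatL2(dim)  # brute-force exact search; fine at this scale
index.add(embeddings)

query = rng.standard_normal((1, dim)).astype("float32")
distances, ids = index.search(query, k=5)
print("nearest neighbour ids:", ids[0], "distances:", distances[0])
```

At larger scales one would swap `IndexFlatL2` for an approximate index (e.g., IVF or HNSW variants) to trade a little recall for much lower latency.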

Posted 6 days ago

Apply

5.0 years

1 - 5 Lacs

hyderabad

Remote

Software Engineer (MLOps)
Hyderabad, Telangana, India

Date posted: Sep 08, 2025
Job number: 1872301
Work site: Up to 50% work from home
Travel: 0-25%
Role type: Individual Contributor
Profession: Software Engineering
Discipline: Software Engineering
Employment type: Full-Time

Overview
Security represents the most critical priorities for our customers in a world awash in digital threats, regulatory scrutiny, and estate complexity. Microsoft Security aspires to make the world a safer place for all. We want to reshape security and empower every user, customer, and developer with a security cloud that protects them with end-to-end, simplified solutions. The Microsoft Security organization accelerates Microsoft's mission and bold ambitions to ensure that our company and industry is securing digital technology platforms, devices, and clouds in our customers' heterogeneous environments, as well as ensuring the security of our own internal estate. Our culture is centered on embracing a growth mindset, a theme of inspiring excellence, and encouraging teams and leaders to bring their best each day. In doing so, we create life-changing innovations that impact billions of lives around the world.

If you are passionate about cybersecurity, data engineering, and MLOps, the M365 Security Engineering team at Microsoft offers an exciting opportunity to work on advanced solutions that safeguard Microsoft services against evolving cyber threats. Our team is dedicated to building scalable, reliable, and secure data and ML infrastructure that powers mission-critical AI-driven detections across Microsoft's cloud ecosystem. We value diversity, deep collaboration, and technical excellence, bringing together engineers with expertise in large-scale software systems, security analysis, big data, and machine learning. You will join a group that thrives on analyzing billions of events and terabytes of data generated daily by Microsoft products and services (e.g., Azure, M365), uncovering evidence of suspicious activities and continuously improving detection capabilities. As part of our team, you will help ensure that critical security components and detection pipelines are present, robust, and up to date throughout the infrastructure. You will collaborate with Data Scientists, Security Researchers, and Platform Engineers to deliver seamless, secure, and innovative data workflows, driving the future of AI-powered cybersecurity at Microsoft.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Qualifications
- 5-8 years' experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems.
- Hands-on experience with data architectures, ETL pipelines, and feature engineering for ML workflows.
- Expertise in data mining, data storage, and Extract-Transform-Load (ETL) processes.
- Experience with DevOps practices, CI/CD pipelines, and infrastructure automation using tools like Bicep, Terraform, ARM, or similar.
- Familiarity with distributed systems, containerization (Kubernetes, Docker), and orchestration frameworks.
- Solid understanding of a scripting language, preferably Python; Scala and PySpark are well suited.
- Familiarity with Azure ML and data analytics tools such as Synapse.
- Experience with DevOps practices and managing CI/CD pipelines.
- Experience with MLOps tooling for model governance, monitoring, and compliance.

Preferred Qualifications:
- Bachelor's degree in computer science, Mathematics, or a related field.
- 5-8 years' experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems.
- 5-8 years' experience managing and scaling distributed systems, from bare metal to Kubernetes, including deep knowledge across the full stack.
- 5+ years building and deploying containerized applications with Kubernetes and Helm/Kustomize.
- Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell, with proven experience automating operational tasks, including health checks, alerting, and observability for data and ML systems.
- Demonstrated success in troubleshooting and supporting critical production systems, and in managing CI/CD pipelines and release automation.

Responsibilities
- Collaborate with Data Scientists and Security Researchers to extract, transform, and load (ETL) data for exploratory analysis and ML experiments.
- Build and maintain scalable feature engineering pipelines, leveraging POC code and stakeholder inputs.
- Ensure data quality, integrity, and readiness for ML model development.
- Develop robust, scalable, and maintainable ML training and inference pipelines.
- Standardize components using SDKs/frameworks for reusability and consistency.
- Implement model monitoring (accuracy, false positives, etc.) and enable continuous improvement.
- Conduct performance, scalability, and reliability testing of ML pipelines.
- Automate infrastructure provisioning (e.g., using Bicep, Terraform, ARM) and manage CI/CD pipelines for ML workflows.
- Deploy ML pipelines to staging and production environments, provisioning dependencies (linked services, storage, Spark pools/clusters).
- Implement service monitoring, incident response, and operational health checks for data and ML systems.
- Develop self-service tooling to streamline productivity for developers and researchers.
- Work closely with AI researchers, platform engineers, and application developers to deliver seamless, secure data workflows.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
- Industry leading healthcare
- Educational resources
- Discounts on products and services
- Savings and investments
- Maternity and paternity leave
- Generous time away
- Giving programs
- Opportunities to network and connect

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
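The feature-engineering pipelines described above often reduce to PySpark aggregations over raw event logs. An illustrative sketch (not Microsoft's pipeline): paths and column names are hypothetical, and the storage URLs are placeholders in Azure's abfss scheme:

```python
# Aggregate raw security events into per-user features with PySpark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-etl-sketch").getOrCreate()

# Placeholder input path; any JSON event source would do.
events = spark.read.json("abfss://raw@example.dfs.core.windows.net/events/")

features = (
    events
    .where(F.col("event_type").isNotNull())
    .groupBy("user_id")
    .agg(
        F.count("*").alias("event_count"),
        F.countDistinct("source_ip").alias("distinct_ips"),
        F.max("timestamp").alias("last_seen"),
    )
)

# Placeholder output path for the downstream ML training job.
features.write.mode("overwrite").parquet(
    "abfss://features@example.dfs.core.windows.net/user_features/"
)
```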

Posted 1 week ago

Apply

5.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Title: DevOps Engineer
Reporting to: Senior Director, Product Development
Location: Bengaluru (Bangalore)

Opportunity:

Responsibilities:

Infrastructure Development & Integration
- Design, implement, and manage cloud-native infrastructure (AWS, Azure, GCP) to support healthcare platforms, AI agents, and clinical applications.
- Build and maintain scalable CI/CD pipelines to enable rapid and reliable delivery of software, data pipelines, and AI/ML models.
- Design and manage Kubernetes (K8s) clusters for container orchestration, workload scaling, and high availability, with integrated monitoring to ensure cluster health and performance.
- Implement Kubernetes-native tools (Helm, Kustomize, ArgoCD) for deployment automation and environment management, ensuring observability through monitoring dashboards and alerts.
- Collaborate with Staff Engineers/Architects to align infrastructure with enterprise goals for scalability, reliability, and performance, leveraging monitoring insights to inform architectural decisions.

System Optimization & Reliability
- Implement and maintain comprehensive monitoring, logging, and alerting mechanisms (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, AWS CloudTrail) to ensure real-time visibility into system performance, resource utilization, and potential incidents, and to enable proactive incident response.
- Ensure data pipeline workflows (ETL/ELT, real-time streaming, batch processing) are observable, reliable, and auditable.
- Support observability and monitoring of GenAI pipelines, embeddings, vector databases, and agentic AI workflows.
- Proactively analyze monitoring data to identify bottlenecks, predict failures, and drive continuous improvement in system reliability.

Compliance & Security
- Support audit trails and compliance reporting through automated DevSecOps practices.
- Implement security controls for LLM-based applications, AI agents, and healthcare data pipelines, including prompt injection prevention, API rate limiting, and data governance.

Collaboration & Agile Practices
- Partner closely with software engineers, data engineers, AI/ML engineers, and product managers to deliver integrated, secure, and scalable solutions.
- Contribute to agile development processes, including sprint planning, stand-ups, and retrospectives.
- Mentor junior engineers and share best practices in cloud-native infrastructure, CI/CD, Kubernetes, and automation.

Innovation & Technical Expertise
- Stay informed about emerging DevOps practices, cloud-native architectures, MLOps/LLMOps, and data engineering tools.
- Prototype and evaluate new frameworks and tools to enhance infrastructure for data pipelines, GenAI, and agentic AI applications.
- Advocate for best practices in infrastructure design, focusing on modularity, maintainability, and scalability.

Requirements

Education & Experience
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical discipline.
- 5+ years of experience in DevOps, Site Reliability Engineering, or related roles, with at least 3 years building cloud-native infrastructure.
- Proven track record of managing production-grade Kubernetes clusters and cloud infrastructure in regulated environments.
- Experience supporting GenAI/LLM applications (e.g., OpenAI, Hugging Face, LangChain) and vector databases (e.g., Pinecone, Weaviate, FAISS).
- Hands-on experience supporting data pipeline products using ETL/ELT frameworks (Apache Airflow, dbt, Prefect) and streaming systems (Kafka, Spark, Flink).
- Experience deploying AI agents and orchestrating agent workflows in production environments.

Technical Proficiency
- Expertise in Kubernetes (K8s) for orchestration, scaling, and managing containerized applications.
- Strong proficiency in containerization (Docker) and Kubernetes ecosystem tools (Helm, ArgoCD, Istio/Linkerd for service mesh).
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, or Pulumi).
- Proficiency with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD, Spinnaker).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog, AWS CloudWatch, and AWS CloudTrail), including setting up dashboards, alerts, and custom metrics for cloud-native and AI systems.
- Good to have: knowledge of healthcare data standards (FHIR, HL7) and secure deployment practices for AI/ML and data pipelines.

Professional Skills
- Strong problem-solving skills with a focus on reliability, scalability, and security.
- Excellent collaboration and communication skills across cross-functional teams.
- Proactive, detail-oriented, and committed to technical excellence in a fast-paced healthcare environment.

About Get Well:
Now part of the SAI Group family, Get Well is redefining digital patient engagement by putting patients in control of their personalized healthcare journeys, both inside and outside the hospital. Get Well combines high-tech AI navigation with high-touch care experiences, driving patient activation, loyalty, and outcomes while reducing the cost of care. For almost 25 years, Get Well has served more than 10 million patients per year across over 1,000 hospitals and clinical partner sites, working to use longitudinal data analytics to better serve patients and clinicians. AI innovator SAI Group, led by Chairman Romesh Wadhwani, is the lead growth investor in Get Well. Get Well's award-winning solutions were recognized again in 2024 by KLAS Research and AVIA Marketplace. Learn more at Get Well and follow us on LinkedIn and Twitter.

Get Well is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age or veteran status.

About SAI Group:
SAIGroup commits $1 billion in capital, an advanced AI platform that currently processes 300M+ patients, and a 4,000+ global employee base to solving enterprise AI and high-priority healthcare problems. SAIGroup - Growing companies with advanced AI: https://www.cnbc.com/2023/12/08/75-year-old-tech-mogul-betting-1-billion-of-his-fortune-on-ai-future.html. Bio of our Chairman Dr. Romesh Wadhwani: Team - SAIGroup (informal at Romesh Wadhwani - Wikipedia). TIME Magazine recently recognized Chairman Romesh Wadhwani as one of the Top 100 AI leaders in the world: Romesh and Sunil Wadhwani: The 100 Most Influential People in AI 2023 | TIME
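Prefect, one of the orchestration frameworks named above, expresses the same pipeline shape as Airflow but with plain decorated functions. A hedged sketch against Prefect 2.x's `@flow`/`@task` API; the task bodies are stubs:

```python
# Minimal Prefect flow: extract -> transform -> load with stubbed tasks.
from prefect import flow, task


@task(retries=2)  # Prefect re-runs a failed task automatically
def extract() -> list[dict]:
    return [{"patient_id": 1, "vitals": "ok"}]  # stand-in for a source system


@task
def transform(rows: list[dict]) -> list[dict]:
    return [{**row, "validated": True} for row in rows]


@task
def load(rows: list[dict]) -> None:
    print(f"would load {len(rows)} rows into the warehouse")


@flow(log_prints=True)
def nightly_pipeline() -> None:
    load(transform(extract()))


if __name__ == "__main__":
    nightly_pipeline()
```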

Posted 1 week ago

Apply

5.0 years

0 Lacs

hyderabad, telangana, india

On-site

Security represents the most critical priorities for our customers in a world awash in digital threats, regulatory scrutiny, and estate complexity. Microsoft Security aspires to make the world a safer place for all. We want to reshape security and empower every user, customer, and developer with a security cloud that protects them with end-to-end, simplified solutions. The Microsoft Security organization accelerates Microsoft’s mission and bold ambitions to ensure that our company and industry are securing digital technology platforms, devices, and clouds in our customers’ heterogeneous environments, as well as ensuring the security of our own internal estate. Our culture is centered on embracing a growth mindset, a theme of inspiring excellence, and encouraging teams and leaders to bring their best each day. In doing so, we create life-changing innovations that impact billions of lives around the world. If you are passionate about cybersecurity, data engineering, and MLOps, the M365 Security Engineering team at Microsoft offers an exciting opportunity to work on advanced solutions that safeguard Microsoft services against evolving cyber threats. Our team is dedicated to building scalable, reliable, and secure data and ML infrastructure that powers mission-critical AI-driven detections across Microsoft’s cloud ecosystem. We value diversity, deep collaboration, and technical excellence, bringing together engineers with expertise in large-scale software systems, security analysis, big data, and machine learning. You will join a group that thrives on analyzing billions of events and terabytes of data generated daily by Microsoft products and services (e.g., Azure, M365), uncovering evidence of suspicious activities and continuously improving detection capabilities. As part of our team, you will help ensure that critical security components and detection pipelines are present, robust, and up to date throughout the infrastructure. You will collaborate with Data Scientists, Security Researchers, and Platform Engineers to deliver seamless, secure, and innovative data workflows, driving the future of AI-powered cybersecurity at Microsoft. Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day. Responsibilities: Collaborate with Data Scientists and Security Researchers to extract, transform, and load (ETL) data for exploratory analysis and ML experiments. Build and maintain scalable feature engineering pipelines, leveraging POC code and stakeholder inputs. Ensure data quality, integrity, and readiness for ML model development. Develop robust, scalable, and maintainable ML training and inference pipelines. Standardize components using SDKs/frameworks for reusability and consistency. Implement model monitoring (accuracy, false positives, etc.) and enable continuous improvement. Conduct performance, scalability, and reliability testing of ML pipelines. Automate infrastructure provisioning (e.g., using Bicep, Terraform, ARM) and manage CI/CD pipelines for ML workflows. Deploy ML pipelines to staging and production environments, provisioning dependencies (linked services, storage, Spark pools/clusters). Implement service monitoring, incident response, and operational health checks for data and ML systems. Develop self-service tooling to streamline productivity for developers and researchers. Work closely with AI researchers, platform engineers, and application developers to deliver seamless, secure data workflows. Qualifications: 5-8 years' experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems. Hands-on experience with data architectures, ETL pipelines, and feature engineering for ML workflows. Expertise in data mining, data storage, and Extract-Transform-Load (ETL) processes. Experience with DevOps practices, CI/CD pipelines, and infrastructure automation using tools like Bicep, Terraform, ARM, or similar. Familiarity with distributed systems, containerization (Kubernetes, Docker), and orchestration frameworks. Solid understanding of a scripting language, preferably Python; Scala and PySpark are especially well suited. Familiarity with Azure ML/Data Analytics tools such as Synapse. Experience with DevOps practices and managing CI/CD pipelines. Experience with MLOps tooling for model governance, monitoring, and compliance. Preferred Qualifications: Bachelor's degree in Computer Science, Mathematics, or a related field. 5-8 years' experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems. 5-8 years' experience managing and scaling distributed systems, from bare metal to Kubernetes, including deep knowledge across the full stack. 5+ years building and deploying containerized applications with Kubernetes and Helm/Kustomize. Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell, with proven experience in automating operational tasks, including health checks, alerting, and observability for data and ML systems. Demonstrated success in troubleshooting and supporting critical production systems and in managing CI/CD pipelines and release automation. Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
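
For readers gauging the feature-engineering work this posting describes, a minimal PySpark sketch; the sign-in-event schema and column names are illustrative assumptions, not taken from the role:

```python
# Minimal sketch of a feature-engineering step an ML ETL pipeline might run.
# The event schema is hypothetical; a real pipeline would read from storage
# (e.g., Parquet/Delta) rather than an inline DataFrame.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feature-engineering-sketch").getOrCreate()

events = spark.createDataFrame(
    [("user1", "203.0.113.7", 3), ("user1", "203.0.113.9", 1), ("user2", "198.51.100.2", 9)],
    ["user_id", "source_ip", "failed_logins"],
)

# Aggregate per-user features that a detection model could consume.
features = events.groupBy("user_id").agg(
    F.countDistinct("source_ip").alias("distinct_ips"),
    F.sum("failed_logins").alias("total_failed_logins"),
)
features.show()
```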

Posted 1 week ago

Apply

5.0 - 10.0 years

7 - 17 Lacs

mumbai

Work from Office

An understanding of product development methodologies and microservices architecture. Hands-on experience with at least two major cloud providers (AWS, GCP, Azure); multi-cloud experience is a strong advantage. Expertise in designing, implementing, and managing cloud architectures focusing on scalability, security, and resilience. Understanding and experience with cloud fundamentals like Networking, IAM, and Compute, and Managed Services like DB, Storage, GKE/EKS, and KMS. Hands-on experience with cloud architecture design & setup. An in-depth understanding of Infrastructure as Code tools like Terraform and Helm is a must. Practical experience in deploying, maintaining, and scaling applications on Kubernetes clusters using Helm Charts or Kustomize. Hands-on experience with CI/CD tools like GitLab CI, Jenkins, or GitHub Actions. GitOps tools like ArgoCD and FluxCD are a must. Experience with Monitoring and Logging tools like Prometheus, Grafana, and Elastic Stack. Experience working with PaaS and deploying to on-prem data centres is a plus. Experience with k3s OSS / OpenShift / Rancher Kubernetes clusters is a plus. What are we looking for: the ability to learn, architect & build the skills & technologies highlighted above; product-oriented delivery; designing, building, and operating cloud architecture & DevOps pipelines; building on open-source technologies; collaboration with teams across 5 products; a GitOps philosophy; and a DevSecOps mindset for a highly secure platform.
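
For context on the Kustomize requirement, a minimal sketch that writes an overlay's kustomization.yaml from Python and renders it with kubectl; the paths, image name, and replica count are illustrative assumptions:

```python
# Minimal sketch: generate a Kustomize overlay and render it without applying.
import os
import subprocess
import yaml  # PyYAML

kustomization = {
    "apiVersion": "kustomize.config.k8s.io/v1beta1",
    "kind": "Kustomization",
    "resources": ["../../base"],  # the overlay points at a shared base
    "namespace": "prod",
    "images": [{"name": "example-app", "newTag": "1.4.2"}],
    "replicas": [{"name": "example-app", "count": 3}],
}

os.makedirs("overlays/prod", exist_ok=True)
with open("overlays/prod/kustomization.yaml", "w") as f:
    yaml.safe_dump(kustomization, f, sort_keys=False)

# Render final manifests (equivalent to `kustomize build`) for review.
rendered = subprocess.run(
    ["kubectl", "kustomize", "overlays/prod"],
    capture_output=True, text=True, check=True,
)
print(rendered.stdout)
```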

Posted 1 week ago

Apply

5.0 - 9.0 years

15 - 19 Lacs

mumbai

Remote

Key Responsibilities: Strong understanding of product development methodologies and microservices architecture. Hands-on experience with at least two major cloud providers (AWS, GCP, Azure); multi-cloud expertise is a plus. Expertise in designing, implementing, and managing scalable, secure, and resilient cloud architectures. Knowledge of cloud fundamentals: Networking, IAM, Compute, and Managed Services (DB, Storage, GKE/EKS, KMS). Hands-on experience in cloud architecture design & setup. Proficiency with Infrastructure as Code (IaC) tools: Terraform, Helm (must-have), Kustomize. Experience deploying, maintaining, and scaling apps on Kubernetes clusters. Hands-on with CI/CD tools (GitLab CI, Jenkins, GitHub Actions) and GitOps tools (ArgoCD, FluxCD; must-have). Strong knowledge of Monitoring & Logging tools: Prometheus, Grafana, Elastic Stack. Experience with PaaS and on-prem data center deployments. Exposure to k3s OSS / OpenShift / Rancher Kubernetes clusters is a plus. What We're Looking For: Ability to learn, architect & build. Hands-on expertise in the skills & technologies above. Product-oriented delivery mindset. Experience in designing, building, and operating cloud architecture & DevOps pipelines. Strong background in open-source technologies. Collaboration across multiple product teams (5+). Adherence to GitOps philosophy. DevSecOps mindset for building highly secure platforms.

Posted 1 week ago

Apply

0.0 - 8.0 years

0 Lacs

hyderabad, telangana

Remote

Software Engineer (MLOps), Hyderabad, Telangana, India. Date posted: Sep 08, 2025. Job number: 1872301. Work site: Up to 50% work from home. Travel: 0-25%. Role type: Individual Contributor. Profession: Software Engineering. Discipline: Software Engineering. Employment type: Full-Time. Overview: Security represents the most critical priorities for our customers in a world awash in digital threats, regulatory scrutiny, and estate complexity. Microsoft Security aspires to make the world a safer place for all. We want to reshape security and empower every user, customer, and developer with a security cloud that protects them with end-to-end, simplified solutions. The Microsoft Security organization accelerates Microsoft’s mission and bold ambitions to ensure that our company and industry are securing digital technology platforms, devices, and clouds in our customers’ heterogeneous environments, as well as ensuring the security of our own internal estate. Our culture is centered on embracing a growth mindset, a theme of inspiring excellence, and encouraging teams and leaders to bring their best each day. In doing so, we create life-changing innovations that impact billions of lives around the world. If you are passionate about cybersecurity, data engineering, and MLOps, the M365 Security Engineering team at Microsoft offers an exciting opportunity to work on advanced solutions that safeguard Microsoft services against evolving cyber threats. Our team is dedicated to building scalable, reliable, and secure data and ML infrastructure that powers mission-critical AI-driven detections across Microsoft’s cloud ecosystem. We value diversity, deep collaboration, and technical excellence, bringing together engineers with expertise in large-scale software systems, security analysis, big data, and machine learning. You will join a group that thrives on analyzing billions of events and terabytes of data generated daily by Microsoft products and services (e.g., Azure, M365), uncovering evidence of suspicious activities and continuously improving detection capabilities. As part of our team, you will help ensure that critical security components and detection pipelines are present, robust, and up to date throughout the infrastructure. You will collaborate with Data Scientists, Security Researchers, and Platform Engineers to deliver seamless, secure, and innovative data workflows, driving the future of AI-powered cybersecurity at Microsoft. Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day. Qualifications: 5-8 years' experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems. Hands-on experience with data architectures, ETL pipelines, and feature engineering for ML workflows. Expertise in data mining, data storage, and Extract-Transform-Load (ETL) processes. Experience with DevOps practices, CI/CD pipelines, and infrastructure automation using tools like Bicep, Terraform, ARM, or similar. Familiarity with distributed systems, containerization (Kubernetes, Docker), and orchestration frameworks. Solid understanding of a scripting language, preferably Python; Scala and PySpark are especially well suited. Familiarity with Azure ML/Data Analytics tools such as Synapse. Experience with DevOps practices and managing CI/CD pipelines. Experience with MLOps tooling for model governance, monitoring, and compliance. Preferred Qualifications: Bachelor's degree in Computer Science, Mathematics, or a related field. 5-8 years' experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems. 5-8 years' experience managing and scaling distributed systems, from bare metal to Kubernetes, including deep knowledge across the full stack. 5+ years building and deploying containerized applications with Kubernetes and Helm/Kustomize. Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell, with proven experience in automating operational tasks, including health checks, alerting, and observability for data and ML systems. Demonstrated success in troubleshooting and supporting critical production systems and in managing CI/CD pipelines and release automation. Responsibilities: Collaborate with Data Scientists and Security Researchers to extract, transform, and load (ETL) data for exploratory analysis and ML experiments. Build and maintain scalable feature engineering pipelines, leveraging POC code and stakeholder inputs. Ensure data quality, integrity, and readiness for ML model development. Develop robust, scalable, and maintainable ML training and inference pipelines. Standardize components using SDKs/frameworks for reusability and consistency. Implement model monitoring (accuracy, false positives, etc.) and enable continuous improvement. Conduct performance, scalability, and reliability testing of ML pipelines. Automate infrastructure provisioning (e.g., using Bicep, Terraform, ARM) and manage CI/CD pipelines for ML workflows. Deploy ML pipelines to staging and production environments, provisioning dependencies (linked services, storage, Spark pools/clusters). Implement service monitoring, incident response, and operational health checks for data and ML systems. Develop self-service tooling to streamline productivity for developers and researchers. Work closely with AI researchers, platform engineers, and application developers to deliver seamless, secure data workflows. Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work: industry-leading healthcare; educational resources; discounts on products and services; savings and investments; maternity and paternity leave; generous time away; giving programs; opportunities to network and connect. Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Posted 1 week ago

Apply

5.0 - 8.0 years

17 - 20 Lacs

mumbai

Work from Office

An understanding of product development methodologies and microservices architecture. Hands-on experience with at least two major cloud providers (AWS, GCP, Azure); multi-cloud experience is a strong advantage. Expertise in designing, implementing, and managing cloud architectures focusing on scalability, security, and resilience. Understanding and experience with cloud fundamentals like Networking, IAM, and Compute, and Managed Services like DB, Storage, GKE/EKS, and KMS. Hands-on experience with cloud architecture design & setup. An in-depth understanding of Infrastructure as Code tools like Terraform and Helm is a must. Practical experience in deploying, maintaining, and scaling applications on Kubernetes clusters using Helm Charts or Kustomize. Hands-on experience with CI/CD tools like GitLab CI, Jenkins, or GitHub Actions. GitOps tools like ArgoCD and FluxCD are a must. Experience with Monitoring and Logging tools like Prometheus, Grafana, and Elastic Stack. Experience working with PaaS and deploying to on-prem data centres is a plus. Experience with k3s OSS / OpenShift / Rancher Kubernetes clusters is a plus. What are we looking for: the ability to learn, architect & build the skills & technologies highlighted above; product-oriented delivery; designing, building, and operating cloud architecture & DevOps pipelines; building on open-source technologies; collaboration with teams across 5 products; a GitOps philosophy; and a DevSecOps mindset for a highly secure platform.

Posted 1 week ago

Apply

0 years

0 Lacs

ahmedabad, gujarat, india

On-site

Job Purpose To ensure the reliability, performance, and resilience of our systems by managing Windows and Linux servers, SQL Server, .NET applications, and Azure services, while bridging development and operations teams to foster a culture of reliability. Who you are: ● Lead incident management processes, carry out on-call duties, and effectively use incident management tools to manage and mitigate critical incidents. ● Administer and fine-tune server infrastructures across both Windows and Linux platforms, ensuring high availability and performance. ● Scale and maintain applications and services within a .NET framework, complemented by robust SQL Server management. ● Deploy web applications leveraging Angular, ensuring fluid interaction with backend services orchestrated in both Windows and Linux environments. ● Monitor and manage containerized workloads using Kubernetes, including Azure Kubernetes Services, across heterogeneous operating systems. ● Maintain Azure compute resources and Azure Load Balancers, and facilitate networking services for optimal performance and reliability. ● Manage proxy servers and load balancers such as NGINX and HAProxy to enhance application delivery in a mixed server environment. ● Support Elasticsearch clusters, SQL Server Reporting Services (SSRS), and SQL Server Integration Services (SSIS). ● Construct and manage comprehensive log aggregation strategies to support proactive monitoring and swift issue resolution. ● Define and track SLOs, SLAs, and SLIs, applying them to continuously refine service reliability. ● Utilize advanced monitoring systems to detect and preempt disruptions, ensuring system robustness. ● Use infrastructure as code and automation tools, including Terraform, Helm, and Ansible, to manage the infrastructure lifecycle effectively. ● Work with diverse teams to embed reliability principles throughout the software development life cycle and to conduct insightful, blameless post-mortems. What will excite us: ● Demonstrable experience in SRE or similar roles, with responsibilities spanning both Windows and Linux server ecosystems. ● Working knowledge of Microsoft SQL Server, Windows Server, .NET, MVC, Web API, and C#, along with a robust understanding of Linux server administration. ● Experience in web-based, distributed, microservices-based applications, particularly with MVC, Web API, Angular, and Node.js frameworks. ● Technical knowledge of the following languages: C#, T-SQL, TypeScript, Python, Bash scripting, HCL. ● Expertise in Kubernetes orchestration, inclusive of Azure Kubernetes Services, in a mixed OS environment. ● Experience with Azure cloud services, particularly Azure Compute and Azure Load Balancers. ● In-depth knowledge of NGINX and HAProxy, especially within a dual-platform context. ● Familiarity with Elasticsearch, SSRS, and SSIS across both Windows and Linux systems. ● Solid understanding of log aggregation, SLOs, SLAs, SLIs, and monitoring systems. ● Competence with infrastructure as code and automation tools (Terraform, Helm, Ansible, Kustomize). ● Strong incident management skills and experience with relevant tools. ● Excellent analytical, problem-solving, communication, and collaboration skills. ● Experience with git-based source control. ● Experience with scripting languages. ● Understanding and experience in implementing continuous integration and continuous deployment pipelines. Location: Ahmedabad (Work from Office)
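
For the SLI and monitoring work described above, a minimal sketch of a Prometheus-instrumented availability probe; the endpoint URL, port, and probe interval are illustrative assumptions:

```python
# Minimal sketch: expose a simple availability SLI as Prometheus metrics,
# the kind of probe that feeds SLO/SLI tracking.
import time
import urllib.request
from prometheus_client import Gauge, start_http_server

UP = Gauge("app_health_up", "1 if the health endpoint responded OK, else 0")
LATENCY = Gauge("app_health_latency_seconds", "Health endpoint response time")

def probe(url: str = "http://localhost:8080/health") -> None:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            UP.set(1 if resp.status == 200 else 0)
    except OSError:  # covers connection errors and timeouts
        UP.set(0)
    LATENCY.set(time.monotonic() - start)

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes :9100/metrics
    while True:
        probe()
        time.sleep(15)
```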

Posted 1 week ago

Apply

2.0 - 6.0 years

0 Lacs

thiruvananthapuram, kerala

On-site

We are looking for a skilled and proactive DevOps Engineer to join our team. The ideal candidate should possess a strong foundation in cloud-native solutions and microservices, focusing on automation, infrastructure as code (IaC), and continuous improvement. As a DevOps Engineer, you will be responsible for managing infrastructure architecture and non-functional requirements across various projects, ensuring high standards in security, operational efficiency, and cost management. Your key responsibilities will include owning infrastructure architecture and non-functional requirements for a set of projects, designing and integrating cloud architecture with cloud-native solutions, implementing common infrastructure best practices emphasizing security, operational efficiency, and cost efficiency, demonstrating strong knowledge of microservices-based architecture, and understanding Kubernetes and Docker. You should also have experience in SecOps practices, developing CI/CD pipelines for faster builds with quality and security automation, enabling observability within the platform, and building & deploying cloud IaC in AWS using tools like Crossplane, Terraform, or CloudFormation. To be successful in this role, you should have at least 2 years of hands-on experience in DevOps and possess technical skills such as proficiency in containerization (Docker) and container orchestration (Kubernetes), strong scripting and automation abilities, and familiarity with DevOps tools and CI/CD processes, especially in Agile environments. Extensive hands-on experience with AWS, including deploying and managing infrastructure through IaC (Terraform, CloudFormation), and proficiency in configuration management tools like Ansible and Kustomize are essential. Additional skills like knowledge of microservices architecture, observability best practices, and SecOps tools will be beneficial. Your primary skills should include a good understanding of scripting and automation, exposure to Linux and Windows-based environments, experience in DevOps engineering with automation using tools like Ansible and Kustomize, familiarity with CI/CD processes and Agile development, and proficiency in toolchains like containerization, container orchestration, CI/CD, and SecOps. Communication skills, teamwork, attention to detail, problem-solving abilities, learning agility, and effective prioritization are key attributes we are looking for in a candidate. If you possess excellent communication skills, are self-motivated, proactive, quick to learn new technologies, and can effectively manage multiple projects within deadlines, you are the right fit for this role. Join our team and contribute to our mission of operational excellence, customer-centricity, and continuous learning in a collaborative environment. (ref:hirist.tech)
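
As one concrete angle on the cloud IaC requirement, a minimal sketch that emits a tiny CloudFormation template from Python; the resource, bucket name, and file name are illustrative, and a real template would flow through the CI/CD pipeline described above:

```python
# Minimal sketch: emit a small CloudFormation template (an S3 bucket) as JSON.
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Example bucket managed as code",
    "Resources": {
        "ArtifactBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "BucketName": "example-artifacts-bucket",  # illustrative
                "Tags": [{"Key": "managed-by", "Value": "iac"}],
            },
        }
    },
}

with open("bucket.template.json", "w") as f:
    json.dump(template, f, indent=2)
```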

Posted 1 week ago

Apply

25.0 years

0 Lacs

ahmedabad, gujarat, india

On-site

eInfochips (An Arrow Company): eInfochips, an Arrow company (a $27.9B, NASDAQ-listed (ARW) business ranked #154 on the Fortune list), is a leading global provider of product engineering and semiconductor design services. With a 25+ year proven track record and a team of over 2,500 engineers, eInfochips has been instrumental in developing over 500 products with 40M deployments in 140 countries. The company’s service offerings include Silicon Engineering, Embedded Engineering, Hardware Engineering & Digital Engineering services. eInfochips services 7 of the top 10 semiconductor companies and is recognized by NASSCOM, Zinnov and Gartner as a leading semiconductor service provider. What we are looking for: 5 to 8 years' experience in DevOps, with a strong focus on automation, cloud infrastructure, and CI/CD practices. Terraform: Advanced knowledge of Terraform, with experience in writing, testing, and deploying modules. AWS: Extensive experience with AWS services (EC2, S3, RDS, Lambda, VPC, etc.) and best practices in cloud architecture. Docker & Kubernetes: Proven experience in containerization with Docker and orchestration with Kubernetes in production environments. CI/CD: Strong understanding of CI/CD processes, with hands-on experience in CircleCI or similar tools. Scripting: Proficient in Python and Linux shell scripting for automation and process improvement. Monitoring & Logging: Experience with Datadog or similar tools for monitoring and alerting in large-scale environments. Version Control: Proficient with Git, including branching, merging, and collaborative workflows. Configuration Management: Experience with Kustomize or similar tools for managing Kubernetes configurations. Work Location: Ahmedabad/Pune/Bangalore/Hyderabad. Shift timing (rotational): S1: 6 AM to 2:30 PM IST; S2: 2 PM to 10:30 PM IST; S3: 10 PM to 6:30 AM IST. Interested candidates can share resumes on arti.bhimani1@einfochips.com
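
For the Terraform and scripting requirements combined, a minimal sketch of driving `terraform plan` from Python; the working directory is an assumption, while the exit-code convention comes from Terraform's documented `-detailed-exitcode` flag (0 = no changes, 1 = error, 2 = changes pending):

```python
# Minimal sketch: run a Terraform plan and act on its exit code.
import subprocess
import sys

def plan(workdir: str = "./infra") -> int:
    subprocess.run(["terraform", "init", "-input=false"], cwd=workdir, check=True)
    result = subprocess.run(
        ["terraform", "plan", "-input=false", "-detailed-exitcode"],
        cwd=workdir,  # no check=True: exit code 2 is a valid outcome
    )
    return result.returncode

if __name__ == "__main__":
    code = plan()
    if code == 2:
        print("Changes detected; review before apply.")
    elif code == 1:
        sys.exit("terraform plan failed")
    else:
        print("Infrastructure matches configuration.")
```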

Posted 2 weeks ago

Apply

3.0 - 7.0 years

0 Lacs

haryana

On-site

We are seeking a skilled MLOps Engineer with expertise in deploying and managing machine learning models using cloud-native CI/CD pipelines, FastAPI, and Kubernetes, excluding Docker. The ideal candidate will have a strong background in scalable model serving, API development, and infrastructure automation on the cloud using native container alternatives or pre-built images. Responsibilities will include designing, developing, and maintaining CI/CD pipelines for ML model training, testing, and deployment on cloud platforms such as Azure, AWS, and GCP. You will be tasked with creating REST APIs using FastAPI for model inference and data services, as well as deploying and orchestrating microservices and ML workloads on Kubernetes clusters like EKS, AKS, GKE, or on-prem K8s. It will be essential to implement model monitoring, logging, and version control without Docker-based containers, using alternatives such as Singularity, Buildah, or cloud-native container orchestration. Automation of deployment pipelines using tools like GitHub Actions, GitLab CI, Jenkins, and Azure DevOps will also be part of your role. Additionally, you will manage secrets, configurations, and infrastructure using Kubernetes Secrets, ConfigMaps, Helm, or Kustomize, while collaborating closely with Data Scientists and Backend Engineers to integrate ML models with APIs and UIs. Your responsibilities will also include optimizing the performance, scalability, and reliability of ML services in production. The ideal candidate should possess strong experience with Kubernetes, including deployment, scaling, Helm, and Kustomize. A deep understanding of CI/CD tools like Jenkins, GitHub Actions, GitLab CI/CD, or Azure DevOps is required. Proficiency in FastAPI for high-performance ML/REST APIs is essential, along with experience in cloud platforms like AWS, GCP, or Azure for ML pipeline orchestration. Familiarity with non-Docker containerization or deployment tools such as Singularity, Podman, or OCI-compliant methods is preferred. Strong Python skills and familiarity with ML libraries and model serialization (e.g., Pickle, ONNX, TorchServe) are also necessary, as well as a good understanding of DevOps principles, GitOps, and IaC (Terraform or similar). Preferred qualifications include experience with Kubeflow, MLflow, or similar tools, along with familiarity with model monitoring tools like Prometheus, Grafana, or Seldon Core. An understanding of security and compliance in production ML systems is advantageous. A Bachelor's or Master's degree in Computer Science, Engineering, or a related field is preferred. This is a full-time, permanent position in the Technology, Information, and Internet industry.
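
Since the role centers on FastAPI inference endpoints, a minimal sketch; the model and request schema are placeholders, not from the posting:

```python
# Minimal sketch: a FastAPI inference endpoint of the kind this role deploys.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    values: list[float]

# Stand-in for a real model loaded at startup (e.g., from Pickle or ONNX).
def predict(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0

@app.post("/predict")
def predict_endpoint(features: Features) -> dict:
    return {"score": predict(features.values)}
```

Run locally with `uvicorn main:app` (assuming the file is named main.py) and POST JSON like `{"values": [0.1, 0.9]}` to /predict.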

Posted 2 weeks ago

Apply

0 years

0 Lacs

gurugram, haryana, india

On-site

CI/CD (Continuous Integration/Delivery/Deployment) The core requirements for the job include the following. Tools: Jenkins, GitHub Actions, GitLab CI, CircleCI, ArgoCD, Spinnaker. Concepts: Pipeline design (build, test, deploy), Blue-green / canary deployments, Rollbacks and artifact versioning, GitOps practices. Infrastructure As Code (IaC) Tools: Terraform, Pulumi, AWS CloudFormation, Ansible, Helm. Skills: Writing modular IaC code. Secret and state management. Policy enforcement (OPA, Sentinel). DRY patterns and IaC testing (e.g., Terratest). Cloud Platforms Platforms: AWS, Azure, GCP, OCI. Skills: VPC/networking setup, IAM policies, Managed services (EKS, GKE, AKS, RDS, Lambda), Billing, cost control, tagging governance, Cloud automation with CLI/SDKs. Containerization And Orchestration Tools: Docker, Podman, Kubernetes, OpenShift. Skills: Dockerfile optimization, multi-stage builds, Helm charts, Kustomize, K8s RBAC, admission controllers, pod security policies, Service mesh (Istio, Linkerd). Security And Compliance Tools: HashiCorp Vault, AWS Secrets Manager, Aqua, Snyk. Practices: Image scanning and runtime protection, Least privilege access models, Network policies, TLS enforcement, Audit logging, and compliance automation. Observability And Monitoring Tools: Prometheus, Grafana, ELK stack, Datadog, New Relic. Skills: Metrics, tracing, log aggregation, alerting thresholds and SLOs, Distributed tracing (Jaeger, OpenTelemetry). Reliability And Resilience Engineering Concepts and Tools: SRE practices, error budgets, Chaos engineering (Gremlin, LitmusChaos), Auto-scaling, self-healing infrastructure, Service Level Objectives (SLO/SLI). Platform Engineering (DevEx Focused) Tools: Backstage, Internal Developer Portals, Terraform Cloud. Practices: Golden paths and reusable blueprints, Self-service pipelines, Developer onboarding automation, Platform as a Product mindset. Source Control And Collaboration Tools: Git, Bitbucket, GitHub, GitLab. Practices: Branching strategies (Git Flow, trunk-based), Code reviews, merge policies, commit signing, and DCO enforcement. Scripting And Automation Languages: Bash, Python, Go, PowerShell. Skills: Writing CLI tools, Cron jobs and job runners, ChatOps and automation bots (Slack, MS Teams). This job was posted by Bhavya Chauhan from CloudTechner.
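
The blue-green / canary concepts above lend themselves to a small worked example. A minimal sketch, assuming a hypothetical health endpoint, sample count, and failure budget, of the promote-or-rollback gate a canary pipeline stage might run:

```python
# Minimal sketch: poll a canary's health endpoint and decide promote vs. rollback.
import urllib.request

def canary_healthy(url: str, samples: int = 20, max_failures: int = 1) -> bool:
    failures = 0
    for _ in range(samples):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status != 200:
                    failures += 1
        except OSError:  # connection errors and timeouts count as failures
            failures += 1
    return failures <= max_failures

if __name__ == "__main__":
    if canary_healthy("http://canary.internal/healthz"):  # hypothetical URL
        print("Promote canary to full rollout.")
    else:
        print("Roll back: canary exceeded failure budget.")
```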

Posted 2 weeks ago

Apply

8.0 - 12.0 years

0 Lacs

karnataka

On-site

As an experienced DevOps Engineer joining our development team, you will play a crucial role in the evolution of our Platform Orchestration product. Your expertise will be utilized to work on software incorporating cutting-edge technologies and integration frameworks. At our organization, we prioritize staff training, investment, and career growth, ensuring you have the opportunity to enhance your skills and experience through exposure to various software validation techniques and industry-standard engineering processes. Your contributions will include building and maintaining CI/CD pipelines for multi-tenant deployments using Jenkins and GitOps practices, managing Kubernetes infrastructure (specifically AWS EKS), Helm charts, and service mesh configurations (Istio). You will utilize tools like kubectl, Lens, or other dashboards for real-time workload inspection and troubleshooting (see the sketch below). Evaluating the security, stability, compatibility, scalability, interoperability, monitorability, resilience, and performance of our software will be a key responsibility. Supporting development and QA teams with code merge, build, install, and deployment environments, you will ensure continuous improvement of the software automation pipeline to enhance build and integration efficiency. Additionally, overseeing and maintaining the health of software repositories and build tools, ensuring successful and continuous software builds, will be part of your role. Verifying final software release configurations against specifications, architecture, and documentation, as well as performing fulfillment and release activities for timely and reliable deployments, will also fall within your purview. To thrive in this role, we are seeking candidates with a Bachelor's or Master's degree in Computer Science, Engineering, or a related field, coupled with 8-12 years of hands-on experience in DevOps or SRE roles for cloud-native Java-based platforms. Deep knowledge of AWS cloud services, including EKS, IAM, CloudWatch, S3, and Secrets Manager, is essential. Your expertise with Kubernetes, Helm, ConfigMaps, Secrets, and Kustomize, along with experience in authoring and maintaining Jenkins pipelines integrated with security and quality scanning tools, will be beneficial. Proficiency in scripting/programming languages such as Ruby, Groovy, and Java is desired, as well as experience with infrastructure provisioning tools like Docker and CloudFormation. In return, we offer an inclusive culture that reflects our core values, providing you with the opportunity to make an impact, develop professionally, and participate in valuable learning experiences. You will benefit from highly competitive compensation, benefits, and rewards programs that recognize and encourage your best work every day. Our engaging work environment promotes work/life balance, offers employee resource groups, and hosts social events to foster interaction and camaraderie. Join us in shaping the future of our Platform Orchestration product and grow your skills in a supportive and dynamic team environment.
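
A minimal sketch of scripted pod triage for the workload-inspection duties mentioned above; the namespace and the healthy/unhealthy decision rule are illustrative assumptions:

```python
# Minimal sketch: parse `kubectl get pods -o json` and flag pods
# that are not in a Running or Succeeded phase.
import json
import subprocess

def unhealthy_pods(namespace: str = "default") -> list[str]:
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    pods = json.loads(out.stdout)["items"]
    return [
        p["metadata"]["name"]
        for p in pods
        if p["status"].get("phase") not in ("Running", "Succeeded")
    ]

if __name__ == "__main__":
    for name in unhealthy_pods():
        print(f"Needs attention: {name}")
```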

Posted 2 weeks ago

Apply

0 years

0 Lacs

delhi cantonment, delhi, india

On-site

Role Overview As a Senior DevOps Engineer, you will own, operate, and continuously optimize Linux-based platforms and Kubernetes environments, ensuring high availability, scalability, and security. You will lead a team of engineers, delegate complex tasks, oversee project execution, and set technical direction across virtualization, storage, security, and automation. This is a leadership-driven role requiring both hands-on expertise and strategic thinking. Responsibilities: Leadership & Ownership: Plan, delegate, and track tasks for the DevOps team. Oversee sprint/release cycles, drive Agile ceremonies, and ensure timely delivery with documentation. Act as the escalation point for technical and operational challenges. Platform Engineering: Administer and optimize Linux (RHEL/CentOS) servers with kernel tuning and patching. Manage VMware/Red Hat virtualization (HA/DR, templates, vSwitch, resource pools). Provision and manage storage (block, file, object); design quota strategies. Database & Middleware: Operate SQL (MySQL/PostgreSQL) and NoSQL (MongoDB/Redis) clusters. Perform query optimization, clustering, and backup/restore strategies. Deploy and manage CMS (Drupal, WordPress) across multi-stage environments. Containers & Orchestration: Lead Kubernetes operations (installations, upgrades, ingress, RBAC, CNI policies). Manage Docker image registries, CI/CD pipelines, and container hygiene. Automation & Observability: Build advanced automation scripts (Bash/Python) for deployment and remediation. Implement observability with metrics, tracing, alerts, and SLOs (see the sketch below). Benchmark and fine-tune performance across compute, storage, and network. Security & Compliance: Drive infrastructure hardening (CIS benchmarks, firewalls, TLS/PKI). Conduct VA/PT remediation, patch cycles, and compliance reporting. Cloud & Distributed Systems: Operate OpenStack and Ceph clusters (setup, scaling, troubleshooting). Ensure cloud-native resilience, HA/DR, and cost optimization. Skills & Experience: Expert Linux administration with kernel-level knowledge. Strong virtualization (VMware/Red Hat) management experience. Kubernetes mastery (Helm, Kustomize, ingress, multi-node clusters). SQL and NoSQL database administration and optimization. Advanced scripting (Bash; Python optional) for infrastructure automation. Strong observability skills: logs, tracing, metrics, and performance benchmarking. Proven leadership in Agile/Scrum-driven environments. Certifications: RHCE, RHCSA; Certified Kubernetes Administrator (CKA); VMware VCP; OpenStack / Ceph certifications; Security certifications (CompTIA Security+, CISSP Associate). (ref:hirist.tech)
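
The SLO work this role owns rests on simple error-budget arithmetic; a minimal sketch, where the SLO target and the consumed downtime are illustrative numbers:

```python
# Minimal sketch: a 99.9% availability SLO over a 30-day window leaves a
# fixed budget of allowable downtime.
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    return (1 - slo) * window_days * 24 * 60

budget = error_budget_minutes(0.999)  # ~43.2 minutes per 30 days
consumed = 12.5                       # observed downtime so far (illustrative)
print(f"Budget: {budget:.1f} min, remaining: {budget - consumed:.1f} min")
```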

Posted 2 weeks ago

Apply

6.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Profile Description We’re seeking someone to join our team as a (Director) Python/Cloud Developer who has a passion for embracing cutting-edge technology, fostering innovation, and a penchant for automation. Enterprise Technology & Services (ETS) delivers shared technology services for Morgan Stanley supporting all business applications and end users. ETS provides capabilities for all stages of Morgan Stanley’s software development lifecycle, enabling productive coding, functional and integration testing, application releases, and ongoing monitoring and support for over 3,000 production applications. ETS also delivers all workplace technologies (desktop, mobile, voice, video, productivity, intranet/internet) in integrated configurations that boost the personal productivity of employees. Application and end user functions are delivered on a scalable, secure, and reliable infrastructure composed of seamlessly integrated datacenter, network, compute, cloud, storage, and database functions. Enterprise Computing (EC) Enterprise Computing is a department of Core Infrastructure that provides a collection of products and services that make up the Morgan Stanley core compute infrastructure, hosting platforms and supporting application services. Solutions are delivered across the infrastructure stack from foundational hardware platforms, storage, server operating systems and virtualization up through the hosting of common web, database, middleware and reporting services. Cloud & Infrastructure Engineering This is an Associate-level position that manages and optimizes technical infrastructure and ensures the seamless operation of IT systems to support business needs effectively. Morgan Stanley is an industry leader in financial services, known for mobilizing capital to help governments, corporations, institutions, and individuals around the world achieve their financial goals. At Morgan Stanley India, we support the Firm’s global businesses, with critical presence across Institutional Securities, Wealth Management, and Investment Management, as well as in the Firm’s infrastructure functions of Technology, Operations, Finance, Risk Management, Legal and Corporate & Enterprise Services. Morgan Stanley has been rooted in India since 1993, with campuses in both Mumbai and Bengaluru. We empower our multi-faceted and talented teams to advance their careers and make a global impact on the business. For those who show passion and grit in their work, there’s ample opportunity to move across the businesses. Interested in joining a team that’s eager to create, innovate and make an impact on the world? Read on… What You’ll Do In The Role: The Cloud Platform Engineer will be part of a team of engineers which works on automation and configuration as code for Public Cloud Kubernetes & foundational architecture related to connectivity across multiple Cloud Service Providers. They should have a strong background in infrastructure and Public Cloud technologies. They will be a part of the global team and will be responsible for setting up & also connecting complex, multi-tier applications from on-prem to the Public Cloud. They will be closely working with Product Management and Vendors to develop and deploy Cloud services to meet customer expectations. What You’ll Bring To The Role: At least 6 years’ relevant experience would generally be expected to find the skills required for this role.
Experience in Kubernetes and container-based technologies. Experience with any of the following cloud service providers: GCP, Azure, or AWS. Sound experience with Infrastructure as Code (Terraform, etc.). Strong development skills in Python or Golang. Sound experience with a scripting language such as shell scripting. Experience in the development of projects in a distributed enterprise environment. Experience setting up a new development project using modern tools and practices, including git, GitHub, GitHub Actions, Jenkins, test-driven development, and continuous integration in a Linux-based environment. Sound knowledge of DevOps, infrastructure, and cloud computing. Ability to mentor and develop more junior programmers, including participating in constructive code reviews. Desired: Experience working in AWS, Azure or GCP. Working with teams using scrum, kanban or other agile practices. Familiarity with the Kubernetes ecosystem and its tools, e.g., Helm, Kustomize, OPA, FluxCD. Proficiency with standard Linux command line and debugging tools. Experience working with RESTful APIs, especially to manage and configure compute and storage infrastructure. Knowledge of how to write comprehensive unit tests, including the mocking of external utilities and APIs (see the sketch after this posting). What You Can Expect From Morgan Stanley: We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 89 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren’t just beliefs, they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you’ll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There’s also ample opportunity to move about the business for those who show passion and grit in their work. To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser. Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.
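
On the unit-testing requirement above (mocking external utilities and APIs), a minimal sketch using only the standard library; the client function and URL are hypothetical:

```python
# Minimal sketch: unit-test code that calls a REST API by mocking the HTTP layer.
import json
import unittest
from unittest.mock import patch, MagicMock
import urllib.request

def get_instance_state(instance_id: str) -> str:
    # Hypothetical internal API; the test below never performs a real request.
    with urllib.request.urlopen(f"http://cloud.internal/instances/{instance_id}") as resp:
        return json.loads(resp.read())["state"]

class TestGetInstanceState(unittest.TestCase):
    @patch("urllib.request.urlopen")
    def test_returns_state(self, mock_urlopen):
        fake = MagicMock()
        fake.read.return_value = b'{"state": "running"}'
        mock_urlopen.return_value.__enter__.return_value = fake
        self.assertEqual(get_instance_state("i-123"), "running")

if __name__ == "__main__":
    unittest.main()
```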

Posted 2 weeks ago

Apply

0 years

0 Lacs

gurgaon, haryana, india

On-site

CI/CD (Continuous Integration/Delivery/Deployment) The core requirements for the job include the following: Tools: Jenkins, GitHub Actions, GitLab CI, CircleCI, ArgoCD, Spinnaker. Concepts: Pipeline design (build, test, deploy), Blue-green / canary deployments, Rollbacks and artifact versioning, GitOps practices. Infrastructure As Code (IaC) Tools: Terraform, Pulumi, AWS CloudFormation, Ansible, Helm. Skills: Writing modular IaC code. Secret and state management. Policy enforcement (OPA, Sentinel). DRY patterns and IaC testing (e.g., Terratest). Cloud Platforms Platforms: AWS, Azure, GCP, OCI. Skills: VPC/networking setup, IAM policies, Managed services (EKS, GKE, AKS, RDS, Lambda), Billing, cost control, tagging governance, Cloud automation with CLI/SDKs. Containerization And Orchestration Tools: Docker, Podman, Kubernetes, OpenShift. Skills: Dockerfile optimization, multi-stage builds, Helm charts, Kustomize, K8s RBAC, admission controllers, pod security policies, Service mesh (Istio, Linkerd). Security And Compliance Tools: HashiCorp Vault, AWS Secrets Manager, Aqua, Snyk. Practices: Image scanning and runtime protection, Least privilege access models, Network policies, TLS enforcement, Audit logging, and compliance automation. Observability And Monitoring Tools: Prometheus, Grafana, ELK stack, Datadog, New Relic. Skills: Metrics, tracing, log aggregation, alerting thresholds and SLOs, Distributed tracing (Jaeger, OpenTelemetry). Reliability And Resilience Engineering Concepts and Tools: SRE practices, error budgets, Chaos engineering (Gremlin, LitmusChaos), Auto-scaling, self-healing infrastructure, Service Level Objectives (SLO/SLI). Platform Engineering (DevEx Focused) Tools: Backstage, Internal Developer Portals, Terraform Cloud. Practices: Golden paths and reusable blueprints, Self-service pipelines, Developer onboarding automation, Platform as a Product mindset. Source Control And Collaboration Tools: Git, Bitbucket, GitHub, GitLab. Practices: Branching strategies (Git Flow, trunk-based), Code reviews, merge policies, commit signing, and DCO enforcement. Scripting And Automation Languages: Bash, Python, Go, PowerShell. Skills: Writing CLI tools, Cron jobs and job runners, ChatOps and automation bots (Slack, MS Teams). This job was posted by Bhavya Chauhan from CloudTechner.

Posted 2 weeks ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies