Senior Machine Learning Lead - LLM Avalara Technologies

8.0 - 13.0 years

12 - 16 Lacs

Pune

Work from Office

What You'll Do
We are looking for an experienced Machine Learning Engineer with a background in software development and a deep enthusiasm for solving complex problems. You will lead a dynamic team dedicated to designing and implementing a large language model framework to power diverse applications across Avalara. Your responsibilities as a Senior Technical Lead will span the entire development lifecycle, including conceptualization, prototyping, and delivery of the LLM platform features. You will report to the Senior Manager, Software Engineering.
What Your Responsibilities Will Be
You have a blend of technical skills in AI and Machine Learning, especially with LLMs, and a deep-seated understanding of software development practices. You'll work with a team to ensure our systems are scalable, performant, and accurate. We are looking for engineers who can think quickly and have a background in implementation. Your responsibilities will include:
- Build on top of the foundational framework for supporting Large Language Model applications at Avalara.
- Experience with LLMs such as GPT, Claude, Llama, and other Bedrock models.
- Leverage best practices in software development, including Continuous Integration/Continuous Deployment (CI/CD), with appropriate functional and unit testing in place.
- Inspire creativity by researching and applying the latest technologies and methodologies in machine learning and software development.
- Write, review, and maintain high-quality code that meets industry standards, contributing to the project's.
- Lead code review sessions, ensuring good code quality and documentation.
- Mentor junior engineers, encouraging a culture of collaboration.
- Proficiency in developing and debugging software with a preference for Python, though familiarity with additional programming languages is valued and encouraged.
What You'll Need to be Successful
- 8+ years of experience building Machine Learning models and deploying them in production environments as part of creating solutions to complex customer problems.
- Bachelor's degree with computer science exposure.
- Proficiency working in cloud computing environments (AWS, Azure, GCP), Machine Learning frameworks, and software development best practices.
- With technological innovations in AI & ML (esp. GenAI).
- Expertise in design patterns, data structures, distributed systems, and experience with cloud technologies.
- Good analytical, design, and debugging skills.
Technologies you will work with: Python, LLMs, MLFlow, Docker, Kubernetes, Terraform, AWS, GitLab, Postgres, Prometheus, Grafana

Posted 2 weeks ago


Observability Engineer - SRE DevOps HCLTech

5.0 - 8.0 years

18 - 20 Lacs

Noida, Madurai, Chennai

Hybrid

1. Expertise in Observability/SRE tools, platforms, and standards, including ELK Stack, Grafana, Prometheus, Loki, VictoriaMetrics, and Telegraf.
2. Familiarity with modern logging frameworks and best practices: OpenTelemetry, Kafka, etc.
3. Experience with data visualization tools like Grafana and Kibana to create informative and actionable dashboards, reports, and alerts.
4. Proficiency in scripting languages like Python, Bash, or PowerShell is valuable for automating data collection, analysis, and visualization processes.
5. Good to have: experience with the monitoring tools SCOM and OpenSearch.

Posted 2 weeks ago


Application Developer Accenture

15.0 - 20.0 years

5 - 9 Lacs

Chennai

Work from Office

Project Role: Application Developer
Project Role Description: Design, build and configure applications to meet business process and application requirements.
Must have skills: Spring Boot
Good to have skills: NA
Minimum 5 year(s) of experience is required
Educational Qualification: 15 years full time education
Summary: As an Application Developer, you will design, build, and configure applications to meet business process and application requirements. A typical day involves collaborating with team members to understand project needs, developing application features, and ensuring that the applications are aligned with business objectives. You will also engage in problem-solving discussions and contribute to the overall success of the projects by implementing effective solutions.
Roles & Responsibilities:
- Expected to be an SME.
- Collaborate and manage the team to perform.
- Responsible for team decisions.
- Engage with multiple teams and contribute on key decisions.
- Provide solutions to problems for their immediate team and across multiple teams.
- Facilitate knowledge sharing sessions to enhance team capabilities.
- Monitor project progress and ensure timely delivery of application features.
Professional & Technical Skills:
- DS & Algo, Java 17/Java EE, Spring Boot, CI/CD
- Web services using RESTful, Spring framework, caching techniques, PostgreSQL SQL, JUnit for testing, and containerization with Kubernetes/Docker. Airflow, GCP, Spark, Kafka
- Hands-on experience building alerting/monitoring/logging for microservices using frameworks like Open Observe/Splunk, Grafana, Prometheus
Additional Information:
- The candidate should have minimum 5 years of experience in Spring Boot.
- This position is based in Chennai.
- A 15 years full time education is required.
Qualification: 15 years full time education

Posted 2 weeks ago


Devops Engineer Exathought

7.0 - 9.0 years

12 - 20 Lacs

Bengaluru

Work from Office

• This is a contract position for 6 months to 1 year, based in the Electronic City office, for one of our clients.
• Design, implement, and manage scalable infrastructure solutions in Kubernetes to ensure optimal performance and reliability of services.
• Monitor and manage Kubernetes clusters, focusing on service availability, scaling, and resource optimization to meet SLA requirements.
• Automate scaling (up and down) of services using tools like Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler.
• Develop and maintain CI/CD pipelines for automated deployment, testing, and delivery of infrastructure and services.
• Set up, configure, manage, and monitor self-hosted services such as MQTT, Kafka, Redis, databases, and Nginx within Kubernetes clusters.
• Implement robust alerting and monitoring solutions using tools like Prometheus, Grafana, and Loki (for log aggregation) to ensure continuous observability of infrastructure and services.
• Handle the deployment, maintenance, and upgrades of both stateful and stateless services across development, staging, and production environments.
• Optimize Kubernetes workloads for cost efficiency, reliability, and performance.
• Design and implement log aggregation solutions using Loki and its tech stack, enabling efficient centralized log management across environments.
• Collaborate with cross-functional teams to troubleshoot and resolve infrastructure issues while adhering to SLA and operational requirements.
• Ensure compliance with IT security standards and successfully pass IT security assessments and penetration tests.
• Maintain high availability and performance of production systems by proactively managing scalability, disaster recovery, and incident response.

Posted 2 weeks ago


Application Developer Accenture

15.0 - 20.0 years

5 - 9 Lacs

Coimbatore

Work from Office

Project Role: Application Developer
Project Role Description: Design, build and configure applications to meet business process and application requirements.
Must have skills: Spring Boot
Good to have skills: NA
Minimum 5 year(s) of experience is required
Educational Qualification: 15 years full time education
Summary: As an Application Developer, you will design, build, and configure applications to meet business process and application requirements. A typical day involves collaborating with team members to understand project needs, developing application features, and ensuring that the applications are aligned with business objectives. You will also engage in problem-solving discussions and contribute to the overall success of the projects by implementing effective solutions.
Roles & Responsibilities:
- Expected to be an SME, collaborate and manage the team to perform.
- Responsible for team decisions.
- Engage with multiple teams and contribute on key decisions.
- Provide solutions to problems for their immediate team and across multiple teams.
- Facilitate knowledge sharing sessions to enhance team capabilities.
- Monitor project progress and ensure timely delivery of application features.
Professional & Technical Skills:
- DS & Algo, Java 17/Java EE, Spring Boot, CI/CD
- Web services using RESTful, Spring framework, caching techniques, PostgreSQL SQL, JUnit for testing, and containerization with Kubernetes/Docker. Airflow, GCP, Spark, Kafka
- Hands-on experience building alerting/monitoring/logging for microservices using frameworks like Open Observe/Splunk, Grafana, Prometheus
Additional Information:
- The candidate should have minimum 5 years of experience in Spring Boot.
- This position is based at our Coimbatore office.
- A 15 years full time education is required.
Qualification: 15 years full time education

Posted 2 weeks ago


Application Support Lead Network People Services Technologies

8.0 - 13.0 years

20 - 25 Lacs

Noida

Work from Office

Who we are and what do we do
Innovation in every byte
India has witnessed a journey of Innovation in Digital Payments and today it leads the world with over 45% of the Global digital transaction volume. At NPST, we believe that our decade long journey has carved an opportunity for building future roadmap for the world to follow. We are determined to contribute immensely to nation's growth story with our vision to provide digital technology across financial value chain and our mission to create leadership position in digital payment space. Founded in 2013, NPST is a leading fintech firm in India, part of the Make in India initiative and listed on BSE and National Stock Exchange. We specialize in Digital Payments operating as Technology Service Provider to Regulated entities and providing Payment Platform to Industry empowered by payment processing engine, Financial Super app, Risk Intelligence engine and digital merchant solution. While we drive 3% of global digital transaction volume for over 100+ clients, we aim to increase our market share by 5X in next five years through innovation and industry first initiatives.
What will you do
The ideal candidate will have deep expertise in application support, IT operations, incident/problem management, and hands-on experience in enterprise tools and technologies. The person will play a key role in ensuring high availability, security, and performance of our business-critical applications.
Job Responsibilities
1. Leadership & Team Management
- Lead and mentor a team of application support engineers.
- Define KPIs, monitor team performance, and conduct regular reviews.
- Establish and continuously improve support processes and best practices.
- Act as a point of escalation for critical application incidents.
2. Application Monitoring & Support
- Manage the end-to-end support lifecycle for web-based and backend applications.
- Ensure application uptime and response SLAs are met.
- Coordinate with developers, infrastructure, and security teams for efficient resolution.
3. Incident, Problem & Change Management
- Handle critical incidents and ensure root cause analysis and permanent resolution.
- Drive preventive measures and post-incident reviews.
- Participate in change advisory boards and ensure minimal downtime during releases.
4. Documentation & Compliance
- Maintain SOPs, runbooks, and knowledge base for all supported applications.
- Ensure compliance with internal audit, IT security policies, and external standards like ISO 27001.
What are we looking for:
Technical Skills:
- Operating Systems: Linux (RHEL/CentOS/Ubuntu), Windows Server
- Databases: SQL Server, MySQL, Oracle
- Monitoring Tools: Zabbix, Grafana, Prometheus, Nagios
- Ticketing & ITSM Tools: ServiceNow, JIRA, Freshservice
- Middleware: Apache, Nginx, WebLogic, Tomcat
- Scripting: Shell scripting, PowerShell, basic Python (preferred)
- Version Control/Deployment Tools: Git, Jenkins, Ansible
- Security Awareness: Understanding of patch management, encryption, firewalls, and access controls
Soft Skills:
- Strong analytical and troubleshooting skills.
- Effective communication with technical and non-technical stakeholders.
- Ability to work under pressure and manage multiple priorities.
- Strong ownership and customer-centric approach.
Education Qualification - Bachelor's degree in software engineering or computer science.
Experience: 8-12 years
Industry - IT/Software/BFSI/Banking/Fintech
Work arrangement: 5 days working from office
Location: Noida/Bengaluru
What do we offer:
- An organization where we strongly believe in one organization, one goal.
- A fun workplace which compels us to challenge ourselves and aim higher.
- A team that strongly believes in collaboration and celebrating success together.
- Benefits that resonate 'We Care'.
If this opportunity excites you, we invite you to apply and contribute to our success story. If your resume is shortlisted, you will hear back from us.

Posted 2 weeks ago


Cloud Platform Engineer - Specialist Accenture

8.0 - 12.0 years

10 - 15 Lacs

Kochi

Work from Office

Job Title - Cloud Platform Engineer Specialist ACS Song
Management Level: Level 9 Specialist
Location: Kochi, Coimbatore, Trivandrum
Must have skills: AWS, Terraform
Good to have skills: Hybrid Cloud
Experience: 8-12 years of experience is required
Educational Qualification: Graduation (Accurate educational details should capture)
Job Summary
Within our Cloud Platforms & Managed Services Solution Line, we apply an agile approach to provide true on-demand cloud platforms. We implement and operate secure cloud and hybrid global infrastructures using automation techniques for our clients' business-critical application landscape. As a Cloud Platform Engineer, you are responsible for implementing cloud and hybrid global infrastructures using infrastructure-as-code.
Roles and Responsibilities
- Implement Cloud and Hybrid Infrastructures using Infrastructure-as-Code.
- Automate Provisioning and Maintenance for streamlined operations.
- Design and Estimate Infrastructure with an emphasis on observability and security.
- Establish CI/CD Pipelines for seamless application deployment.
- Ensure Data Integrity and Security through robust mechanisms.
- Implement Backup and Recovery Procedures for data protection.
- Build Self-Service Systems for enhanced developer autonomy.
- Collaborate with Development and Operations Teams for platform optimization.
Professional and Technical Skills
- Customer-Focused Communicator adept at engaging cross-functional teams.
- Cloud Infrastructure Expert in AWS, Azure, or GCP.
- Proficient in Infrastructure as Code with tools like Terraform.
- Experienced in Container Orchestration (Kubernetes, OpenShift, Docker Swarm).
- Skilled in Observability Tools like Prometheus and Grafana, competent in Log Aggregation tools (Loki, ELK, Graylog), and familiar with Tracing Systems such as Tempo.
- CI/CD and GitOps Savvy with potential knowledge of Argo CD or Flux.
- Automation Proficiency in Bash and high-level languages (Python, Golang).
- Linux, Networking, and Database Knowledge for robust infrastructure management.
- Hybrid Cloud Experience a plus.
Additional Information
About Our Company | Accenture
Qualification
Experience: 3-5 years of experience is required
Educational Qualification: Graduation (Accurate educational details should capture)

Posted 2 weeks ago


Kubernetes Engineer (On site only) Ajmera Infotech

3.0 - 8.0 years

9 - 19 Lacs

Hyderabad, Ahmedabad, Bengaluru

Work from Office

Kubernetes Engineer
Build bulletproof infrastructure for regulated industries
At Ajmera Infotech, we're building planet-scale software for NYSE-listed clients with a 120+ strong engineering team. Our work powers mission-critical systems in HIPAA, FDA, and SOC2-compliant domains where failure is not an option.
Why You'll Love It
- Own production-grade Kubernetes deployments at real scale
- Drive TDD-first DevOps in CI/CD environments
- Work in a compliance-first org (HIPAA, FDA, SOC2) with code-first values
- Collaborate with top-tier engineers in multi-cloud deployments
- Career growth via mentorship, deep-tech projects, and leadership tracks
Key Responsibilities
- Design, deploy, and manage resilient Kubernetes clusters (k8s/k3s)
- Automate workload orchestration using Ansible or custom scripting
- Integrate Kubernetes deeply into CI/CD pipelines
- Tune infrastructure for performance, scalability, and regulatory reliability
- Support secure multi-tenant environments and compliance needs (e.g., HIPAA/FDA)
Must-Have Skills
- 3-8 years of hands-on experience in production Kubernetes environments
- Expert-level knowledge of containerization with Docker
- Proven experience with CI/CD integration for k8s
- Automation via Ansible, shell scripting, or similar tools
- Infrastructure performance tuning within Kubernetes clusters
Nice-to-Have Skills
- Multi-cloud cluster management (AWS/GCP/Azure)
- Helm, ArgoCD, or Flux for deployment and GitOps
- Service mesh, ingress controllers, and pod security policies

Posted 2 weeks ago


Senior Site Reliability Engineer (On site Only) Ajmera Infotech

3.0 - 8.0 years

7 - 17 Lacs

Hyderabad, Ahmedabad, Bengaluru

Work from Office

Sr. Site Reliability Engineer - Keep Planet-Scale Systems Reliable, Secure, and Fast
At Ajmera Infotech, we build planet-scale platforms for NYSE-listed clients, from HIPAA-compliant health systems to FDA-regulated software that simply cannot fail. Our 120+ elite engineers design, deploy, and safeguard mission-critical infrastructure trusted by millions.
Why You'll Love It
- Dev-first SRE culture: automation, CI/CD, zero-toil mindset
- TDD, monitoring, and observability baked in, not bolted on
- Code-first reliability: script, ship, and scale with real ownership
- Mentorship-driven growth, with exposure to regulated industries (HIPAA, FDA, SOC2)
- End-to-end impact: own infra across Dev and Ops
Key Responsibilities
- Architect and manage scalable, secure Kubernetes clusters (k8s/k3s) in production
- Develop scripts in Python, PowerShell, and Bash to automate infrastructure operations
- Optimize performance, availability, and cost across cloud environments
- Design and enforce CI/CD pipelines using Jenkins, Bamboo, GitHub Actions
- Implement log monitoring and proactive alerting systems
- Integrate and tune observability tools like Prometheus and Grafana
- Support both development and operations pipelines for continuous delivery
- Manage infrastructure components including Artifactory, Nginx, Apache, IIS
- Drive compliance-readiness across HIPAA, FDA, ISO, SOC2
Must-Have Skills
- 3-8 years in SRE or infrastructure engineering roles
- Kubernetes (k8s/k3s) production experience
- Scripting: Python, PowerShell, Bash
- CI/CD tools: Jenkins, Bamboo, GitHub Actions
- Experience with log monitoring, alerting, and observability stacks
- Cross-functional pipeline support (Dev + Ops)
- Tooling: Artifactory, Nginx, Apache, IIS
- Performance, availability, and cost-efficiency tuning
Nice-to-Have Skills
- Background in regulated environments (HIPAA, FDA, ISO, SOC2)
- Multi-OS platform experience
- Integration of Prometheus, Grafana, or similar observability platforms

Posted 2 weeks ago


Software Engineer NetApp

3.0 - 5.0 years

15 - 27 Lacs

Bengaluru

Work from Office

Job Summary
The NetApp Keystone team is responsible for cutting-edge technologies that enable NetApp's pay-as-you-go offering. Keystone helps customers manage data on-prem or in the cloud and receive invoices that are charged on a subscription basis. As an engineer in NetApp's Keystone organization, you will execute our most challenging and complex projects. You will be responsible for decomposing complex product requirements into simple solutions and for understanding system interdependencies, limitations, and engineering best practices.
Job Requirements
- Strong knowledge of the Go programming language, paradigms, constructs, and idioms
- Knowledge of various Go frameworks and tools
- Years of experience working with the Go programming language
- Strong written and communication skills with proven fluency in English
- Familiarity with database technologies such as NoSQL, Prometheus and MongoDB
- Hands-on experience with code versioning tools like Git
- Passionate about learning new tools, languages, philosophies, and workflows
- Working with generated code and code generation techniques
- Working with document databases and Golang ORM libraries
- Knowledge of programming methodologies - Object Oriented/Functional/Design Patterns
- Knowledge of software development methodologies - SCRUM/AGILE/LEAN
- Knowledge of software deployment - Docker/Kubernetes
Education
Minimum of 2 to 4 years of experience required, with a B.Tech or M.Tech background.

Posted 2 weeks ago


Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Mandya

Work from Office

Position Summary
F5 Inc. is actively seeking an exceptional Sr Principal Software Engineer (Individual Contributor) to play a pivotal role in our SRE Operations team for the groundbreaking F5XC Product. Are you an SRE Operations specialist with automation in your DNA? Do you thrive in fast-paced SaaS environments where
Why This Role is Unique:
Our SaaS is hybrid, running across public cloud and a global network of 50+ PoPs, delivering terabits of capacity. Our infrastructure spans cloud-native services and physical networking gear (routers, switches, firewalls), creating a uniquely challenging and exciting observability landscape. The Analytics & Observability platform will have deep reach across these layers, ensuring reliability, security, and performance at a massive scale.
What You'll Do:
Be the Force Behind Observability & Stability
- Drive end-to-end Observability (Logs, Metrics, and Alerts) across our hybrid SaaS stack, spanning cloud, edge, and physical network devices.
- Take ownership of Alerting strategy, cutting through noise while ensuring actionable, high-fidelity alerts.
- Implement intelligent automation to reduce operational toil and enhance real-time visibility.
Own & Automate Operations
- Design, build, and manage automation for self-healing infrastructure across cloud + global PoPs.
- Develop automation for Kubernetes, ArgoCD, Helm Charts, Golang-based services, AWS, GCP, Terraform.
- Improve networking observability, ensuring our routers, switches, and firewalls are monitored at scale.
- Continuously eliminate manual ops work through automation and platform improvements.
Lead Incident Response & Operational Excellence
- Participate in on-call rotations, ensuring rapid incident response across our cloud + edge stack.
- Drive incident response automation, reducing MTTR and increasing system resilience.
- Ensure security, compliance, and best practices in observability & automation.
Collaborate & Mentor
- Work closely with application teams, network engineers, and SREs to improve reliability and performance.
- Mentor junior engineers, fostering a culture of automation-first thinking and deep observability.
What Makes You a Great Fit?
- Deep expertise in Logs, Metrics, and Alerting, with a strong focus on Alerting automation.
- Experience in hybrid SaaS environments spanning cloud-native and global infrastructure.
- Strong background in Kubernetes, Infrastructure-as-Code (Terraform), Golang, AWS/GCP, and networking observability.
- Proven track record of eliminating toil and improving operational efficiency through automation.
- Passion for deep observability, networking-scale analytics, and automation at the edge.
If you love solving reliability challenges at global scale, automating everything, and working in a hybrid cloud + networking environment, we want to talk to you!
The About The Role is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
Must-Have:
- Observability & Alerting Expertise: Strong experience with Logs, Metrics, and Alerts, with a focus on high-fidelity alerting and automation.
- Automation & Infrastructure as Code: Deep knowledge of Terraform, ArgoCD, Helm, Kubernetes, and Golang for automation.
- Cloud & Hybrid SaaS Experience: Hands-on experience managing cloud-native (AWS/GCP) and edge infrastructure.
- Incident Response & Reliability Engineering: Strong on-call experience, with a track record of reducing MTTR through automation.
- Kubernetes Mastery: Hands-on experience deploying, managing, and troubleshooting Kubernetes in production environments.
Nice-to-Have:
- Networking & Edge Observability: Familiarity with monitoring routers, switches, and firewalls in a global PoP environment.
- Data & Analytics in Observability: Experience with time-series databases (Prometheus, Grafana, OpenTelemetry, etc.).
- Security & Compliance Awareness: Understanding of secure-by-design principles for monitoring & alerting.
- Mentorship & Collaboration: Ability to mentor junior engineers and work cross-functionally with SREs, application teams, and network engineers.
- High Availability Disaster Recovery: Experience with HA/DR and Migration.
Qualifications
Typically, it requires at least 18 years of related experience with a bachelor's degree, 15 years and a master's degree, or a PhD with 12 years' experience; or equivalent experience. Excellent organizational agility and communication skills throughout the organization.
Environment
- Empowered Work Culture: Experience an environment that values autonomy, fostering a culture where creativity and ownership are encouraged.
- Continuous Learning: Benefit from the mentorship of experienced professionals with solid backgrounds across diverse domains, supporting your professional growth.
- Team Cohesion: Join a collaborative and supportive team where you'll feel at home from day one, contributing to a positive and inspiring workplace.
F5 Networks, Inc. is an equal opportunity employer and strongly supports diversity in the workplace.

Posted 2 weeks ago


Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Mysuru

Work from Office

Position Summary
F5 Inc. is actively seeking an exceptional Sr Principal Software Engineer (Individual Contributor) to play a pivotal role in our SRE Operations team for the groundbreaking F5XC Product. Are you an SRE Operations specialist with automation in your DNA? Do you thrive in fast-paced SaaS environments where
Why This Role is Unique:
Our SaaS is hybrid, running across public cloud and a global network of 50+ PoPs, delivering terabits of capacity. Our infrastructure spans cloud-native services and physical networking gear (routers, switches, firewalls), creating a uniquely challenging and exciting observability landscape. The Analytics & Observability platform will have deep reach across these layers, ensuring reliability, security, and performance at a massive scale.
What You'll Do:
Be the Force Behind Observability & Stability
- Drive end-to-end Observability (Logs, Metrics, and Alerts) across our hybrid SaaS stack, spanning cloud, edge, and physical network devices.
- Take ownership of Alerting strategy, cutting through noise while ensuring actionable, high-fidelity alerts.
- Implement intelligent automation to reduce operational toil and enhance real-time visibility.
Own & Automate Operations
- Design, build, and manage automation for self-healing infrastructure across cloud + global PoPs.
- Develop automation for Kubernetes, ArgoCD, Helm Charts, Golang-based services, AWS, GCP, Terraform.
- Improve networking observability, ensuring our routers, switches, and firewalls are monitored at scale.
- Continuously eliminate manual ops work through automation and platform improvements.
Lead Incident Response & Operational Excellence
- Participate in on-call rotations, ensuring rapid incident response across our cloud + edge stack.
- Drive incident response automation, reducing MTTR and increasing system resilience.
- Ensure security, compliance, and best practices in observability & automation.
Collaborate & Mentor
- Work closely with application teams, network engineers, and SREs to improve reliability and performance.
- Mentor junior engineers, fostering a culture of automation-first thinking and deep observability.
What Makes You a Great Fit?
- Deep expertise in Logs, Metrics, and Alerting, with a strong focus on Alerting automation.
- Experience in hybrid SaaS environments spanning cloud-native and global infrastructure.
- Strong background in Kubernetes, Infrastructure-as-Code (Terraform), Golang, AWS/GCP, and networking observability.
- Proven track record of eliminating toil and improving operational efficiency through automation.
- Passion for deep observability, networking-scale analytics, and automation at the edge.
If you love solving reliability challenges at global scale, automating everything, and working in a hybrid cloud + networking environment, we want to talk to you!
The About The Role is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
Must-Have:
- Observability & Alerting Expertise: Strong experience with Logs, Metrics, and Alerts, with a focus on high-fidelity alerting and automation.
- Automation & Infrastructure as Code: Deep knowledge of Terraform, ArgoCD, Helm, Kubernetes, and Golang for automation.
- Cloud & Hybrid SaaS Experience: Hands-on experience managing cloud-native (AWS/GCP) and edge infrastructure.
- Incident Response & Reliability Engineering: Strong on-call experience, with a track record of reducing MTTR through automation.
- Kubernetes Mastery: Hands-on experience deploying, managing, and troubleshooting Kubernetes in production environments.
Nice-to-Have:
- Networking & Edge Observability: Familiarity with monitoring routers, switches, and firewalls in a global PoP environment.
- Data & Analytics in Observability: Experience with time-series databases (Prometheus, Grafana, OpenTelemetry, etc.).
- Security & Compliance Awareness: Understanding of secure-by-design principles for monitoring & alerting.
- Mentorship & Collaboration: Ability to mentor junior engineers and work cross-functionally with SREs, application teams, and network engineers.
- High Availability Disaster Recovery: Experience with HA/DR and Migration.
Qualifications
Typically, it requires at least 18 years of related experience with a bachelor's degree, 15 years and a master's degree, or a PhD with 12 years' experience; or equivalent experience. Excellent organizational agility and communication skills throughout the organization.
Environment
- Empowered Work Culture: Experience an environment that values autonomy, fostering a culture where creativity and ownership are encouraged.
- Continuous Learning: Benefit from the mentorship of experienced professionals with solid backgrounds across diverse domains, supporting your professional growth.
- Team Cohesion: Join a collaborative and supportive team where you'll feel at home from day one, contributing to a positive and inspiring workplace.
F5 Networks, Inc. is an equal opportunity employer and strongly supports diversity in the workplace.

Posted 2 weeks ago


Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Hassan

Work from Office

Position Summary: F5 Inc. is actively seeking an exceptional Sr Principal Software Engineer (Individual Contributor) to play a pivotal role in our SRE Operations team for the groundbreaking F5XC Product. Are you an SRE Operations specialist with automation in your DNA? Do you thrive in fast-paced SaaS environments where...
Why This Role is Unique: Our SaaS is hybrid, running across public cloud and a global network of 50+ PoPs and delivering terabits of capacity. Our infrastructure spans cloud-native services and physical networking gear (routers, switches, firewalls), creating a uniquely challenging and exciting observability landscape. The Analytics & Observability platform will have deep reach across these layers, ensuring reliability, security, and performance at massive scale.
What You'll Do:
Be the Force Behind Observability & Stability: Drive end-to-end Observability (Logs, Metrics, and Alerts) across our hybrid SaaS stack, spanning cloud, edge, and physical network devices. Take ownership of Alerting strategy, cutting through noise while ensuring actionable, high-fidelity alerts. Implement intelligent automation to reduce operational toil and enhance real-time visibility.
Own & Automate Operations: Design, build, and manage automation for self-healing infrastructure across cloud + global PoPs. Develop automation for Kubernetes, ArgoCD, Helm Charts, Golang-based services, AWS, GCP, and Terraform. Improve networking observability, ensuring our routers, switches, and firewalls are monitored at scale. Continuously eliminate manual ops work through automation and platform improvements.
Lead Incident Response & Operational Excellence: Participate in on-call rotations, ensuring rapid incident response across our cloud + edge stack. Drive incident response automation, reducing MTTR and increasing system resilience. Ensure security, compliance, and best practices in observability & automation.
Collaborate & Mentor: Work closely with application teams, network engineers, and SREs to improve reliability and performance. Mentor junior engineers, fostering a culture of automation-first thinking and deep observability.
What Makes You a Great Fit? Deep expertise in Logs, Metrics, and Alerting, with a strong focus on Alerting automation. Experience in hybrid SaaS environments spanning cloud-native and global infrastructure. Strong background in Kubernetes, Infrastructure-as-Code (Terraform), Golang, AWS/GCP, and networking observability. Proven track record of eliminating toil and improving operational efficiency through automation. Passion for deep observability, networking-scale analytics, and automation at the edge. If you love solving reliability challenges at global scale, automating everything, and working in a hybrid cloud + networking environment, we want to talk to you!
The About The Role is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
Must-Have:
Observability & Alerting Expertise: Strong experience with Logs, Metrics, and Alerts, with a focus on high-fidelity alerting and automation.
Automation & Infrastructure as Code: Deep knowledge of Terraform, ArgoCD, Helm, Kubernetes, and Golang for automation.
Cloud & Hybrid SaaS Experience: Hands-on experience managing cloud-native (AWS/GCP) and edge infrastructure.
Incident Response & Reliability Engineering: Strong on-call experience, with a track record of reducing MTTR through automation.
Kubernetes Mastery: Hands-on experience deploying, managing, and troubleshooting Kubernetes in production environments.
Nice-to-Have:
Networking & Edge Observability: Familiarity with monitoring routers, switches, and firewalls in a global PoP environment.
Data & Analytics in Observability: Experience with time-series databases and related observability tooling (Prometheus, Grafana, OpenTelemetry, etc.).
Security & Compliance Awareness: Understanding of secure-by-design principles for monitoring & alerting.
Mentorship & Collaboration: Ability to mentor junior engineers and work cross-functionally with SREs, application teams, and network engineers.
High Availability & Disaster Recovery: Experience with HA/DR and migration.
Qualifications: Typically, it requires at least 18 years of related experience with a bachelor's degree, 15 years and a master's degree, or a PhD with 12 years' experience; or equivalent experience. Excellent organizational agility and communication skills throughout the organization.
Environment: Empowered Work Culture: Experience an environment that values autonomy, fostering a culture where creativity and ownership are encouraged. Continuous Learning: Benefit from the mentorship of experienced professionals with solid backgrounds across diverse domains, supporting your professional growth. Team Cohesion: Join a collaborative and supportive team where you'll feel at home from day one, contributing to a positive and inspiring workplace.
F5 Networks, Inc. is an equal opportunity employer and strongly supports diversity in the workplace.

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Faridabad

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Ghaziabad

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Chittoor

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Gurugram

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Pune

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Nashik

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Navi Mumbai

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Thane

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Bengaluru

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Hyderabad

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Nizamabad

Work from Office

Posted 2 weeks ago

Sr. Principal Site Reliability Engineer F5

7.0 - 11.0 years

17 - 22 Lacs

Mumbai

Work from Office

Posted 2 weeks ago

1154 Prometheus Jobs - Page 12
