6.0 - 11.0 years
20 - 25 Lacs
Pune
Work from Office
PTC is looking for a hands-on engineer, experienced with site reliability and operations, for a leading CAD SaaS solution. As part of your job at PTC, you will: collaborate with multiple teams to monitor and observe their cloud-deployed services; implement automated pipelines for deployment into cloud environments; implement monitoring and observability solutions; handle incidents and changes; troubleshoot and resolve production issues; conduct post-mortems; and handle security incidents. Job requirements: proven experience working in Cloud DevOps and Site Reliability Engineering; ability to develop observability solutions using DataDog, or ELK, Prometheus and Grafana; great communication skills, written and verbal; strong hands-on skills to support security in cloud environments; experience and knowledge in cloud architecture reviews, SaaS processes and handling security incidents. Advantage: knowledge and experience with Azure.
Posted 1 month ago
2.0 - 5.0 years
3 - 7 Lacs
Bhubaneswar, Mumbai, Gurugram
Work from Office
Job Title: Infra Tech Support Practitioner. Locations: Gurgaon, Mumbai, Bengaluru, Bhubaneswar. Experience: 27+ Years. Education: 15 Years Full-Time. Job Summary: Seeking an Infra Tech Support Practitioner to manage L1 and L2 technical support, monitor infrastructure health, and ensure system stability across enterprise environments. Key Responsibilities: Provide remote/on-site support for systems and services. Troubleshoot L1 & L2 network/server issues. Monitor performance and resolve infrastructure bottlenecks. Collaborate with teams and contribute to process improvements. Ensure compliance with service models and SLAs. Required Skills: Expertise in Network Infrastructure and Site Reliability Engineering. Strong understanding of network protocols, server architecture, and system monitoring. Familiarity with cloud platforms and virtualization technologies.
Posted 1 month ago
2.0 - 7.0 years
4 - 9 Lacs
Hyderabad
Work from Office
Job Requirements Phenom People is looking for an experienced and motivated Product Manager to join our Product team in Hyderabad, Telangana, India. This is a full-time position. The Associate Product Manager or the Product Manager will be responsible for developing and managing the product roadmap, working with stakeholders to define product requirements, and managing the product life cycle. The ideal candidate will have a strong technical background and experience in product management. Responsibilities: - Develop and manage the product roadmap - Work with stakeholders to define product requirements - Manage the product life cycle - Monitor product performance and customer feedback - Identify and prioritize product features - Develop product pricing and positioning strategies - Create product marketing plans - Develop product launch plans - Analyze market trends and customer needs - Collaborate with engineering, design, and marketing teams Requirements: Must-Have: 2+ years of product management experience with at least 2 years in a technical or observability-related role. Strong understanding of APM concepts: distributed tracing, metrics aggregation, anomaly detection, alerting, root cause analysis. Familiarity with modern observability stacks: OpenTelemetry, Prometheus, Grafana, Jaeger, Zipkin, ELK/EFK, Datadog, New Relic, AppDynamics, etc. Exposure to cloud-native infrastructure: containers, Kubernetes, microservices architecture. Experience working with engineers on deeply technical systems and scalable backend architecture. Proficiency in creating technically detailed user stories and acceptance criteria. Strong problem-solving and analytical skills, with a bias for action and customer empathy. Nice-to-Have: Background in software engineering, DevOps, or site reliability engineering. Experience in building Technical products Understanding of telemetry pipelines, sampling strategies, and correlation between MELT signals. 
Familiarity with SLIs/SLOs, service maps, and incident response workflows. Knowledge of integration with CI/CD, synthetic monitoring, or real-user monitoring (RUM). We prefer candidates with these experiences: Experience in product management, having worked as PO or PM in a SaaS product organization. Experience working on integrations, APIs, etc. Experience collaborating with customers and internal business partners. Experience working with distributed / international teams. Experience with JIRA or equivalent product development management tools. Minimum Qualifications: 1 to 3 years of experience in product management as a Product Manager, Product Owner, or Associate Product Manager. Experience in HR Tech industry is a plus but not mandatory. Bachelor's degree or equivalent years of experience. MBA is highly desirable. Benefits: Competitive salary for a startup. Gain experience rapidly. Work directly with executive team. Fast-paced work environment.
Posted 1 month ago
6.0 - 11.0 years
8 - 13 Lacs
Pune
Work from Office
PTC is looking for a hands-on engineer, experienced with site reliability and operations, for a leading CAD SaaS solution. As part of your job at PTC, you will: collaborate with multiple teams to monitor and observe their cloud-deployed services; implement automated pipelines for deployment into cloud environments; implement monitoring and observability solutions; handle incidents and changes; troubleshoot and resolve production issues; conduct post-mortems; and handle security incidents. Job requirements: proven experience working in Cloud DevOps and Site Reliability Engineering; ability to develop observability solutions using DataDog, or ELK, Prometheus and Grafana; great communication skills, written and verbal; strong hands-on skills to support security in cloud environments; experience and knowledge in cloud architecture reviews, SaaS processes and handling security incidents. Advantage: knowledge and experience with Azure. Why PTC? Life at PTC is about more than working with today's most cutting-edge technologies to transform the physical world. It's about showing up as you are and working alongside some of today's most talented industry leaders to transform the world around you. If you share our passion for problem-solving through innovation, you'll likely become just as passionate about the PTC experience as we are. Are you ready to explore your next career move with us? Website: https://www.ptc.com LinkedIn: https://www.linkedin.com/company/ptcinc/ Facebook Page: https://www.facebook.com/ptc.inc/ Twitter Handle: @LifeatPTC @PTC Instagram: ptc_inc Hashtag: #lifeatPTC
We respect the privacy rights of individuals and are committed to handling Personal Information responsibly and in accordance with all applicable privacy and data protection laws. Review our Privacy Policy here.
Posted 1 month ago
10.0 - 15.0 years
30 - 45 Lacs
Bengaluru
Work from Office
Staff Reliability Engineer Our Mission SPAN is enabling electrification for all We are a mission-driven company designing, building, and deploying products that electrify the built environment, reduce carbon emissions, and slow the effects of climate change. Decarbonization is the process to reduce or remove greenhouse gas emissions, especially carbon dioxide, from entering our atmosphere. Electrification is the process of replacing fossil fuel appliances that run on gas or oil with all-electric upgrades for a cleaner way to power our lives. At SPAN, we believe in: Enabling homes and vehicles powered by clean energy Making electrification upgrades possible Building more resilient homes with reliable backup Designing a flexible and distributed electrical grid The Role We are seeking a Staff Reliability Engineer to join our SPAN engineering team. In this vital role, you will leverage your expertise to ensure the reliability and performance of our products through systematic analysis and testing. You will collaborate closely with cross-functional teams to design and implement reliability engineering processes, contributing to the overall quality and resilience of our energy management systems. Responsibilities: Review new product BOMs to ensure all electrical and mechanical component datasheet ratings meet or exceed the product operating conditions. Write test and reliability reports to summarize product performance outcomes as a result of reliability testing, pre-certification testing, and product performance testing. Inform product design through early component HALT and Sherlock simulations with the goal to mitigate reliability risks in final product. Develop and implement reliability testing plans, methodologies, and metrics to assess product performance and durability over time. Conduct Failure Mode Effects Analysis (FMEA) and root cause analysis to identify potential design issues and ensure corrective actions are taken. 
Collaborate with design engineers to evaluate product reliability through simulated environments and real-world data, providing feedback for design improvements. Participate in product development reviews and provide expertise on reliability criteria and best practices. Lead Failure Analyses (e.g. 8D) on issues discovered during testing in order to inform product design with recommended changes and improvements to the product hardware. Create and maintain reliability documentation, including reliability reports, plans, and data analyses, to support continuous improvement efforts and inform product reliability risks to internal support services. Code-driven analysis of publicly available datasets and secure fleet-level datasets to evaluate the covariance of weather conditions, home power usage, solar loading, and load distributions in order to define and test product mission profiles. Provide training and mentorship to junior engineers on reliability principles and practices. Work closely with manufacturing and quality assurance teams to ensure product consistency and reliability throughout the lifecycle. Potential Projects: Evaluating the reliability of new energy-efficient systems and components during development and production. Statistically analyze field performance data to identify trends and opportunities for product improvement. Collaborating with external partners on reliability testing and validation of new technologies. About You Required Qualifications You'd be a great fit for this role if you: Hold a Bachelors or Masters in Computer Science, Mechanical Engineering, Electrical Engineering, Robotics and Controls Engineering, or a related field. Have 9+ years of experience in reliability engineering, preferably in hardware products. 
Understand material degradation mechanisms in response to environmental and operating stresses, and know how to design experiments to obtain the related acceleration parameters for reliability lifetime modelling. Demonstrate proficiency in reliability analysis tools and methodologies, including FMEA, 8D reports and fault tree analysis. Exhibit strong analytical and problem-solving skills, with the ability to manage multiple projects effectively. Possess excellent communication and interpersonal skills to collaborate with cross-functional teams. Have flexibility and willingness to accommodate meetings with SPAN US-based colleagues. Have knowledge of statistical analysis tools and software for reliability data analysis. Bonus Qualifications: We'd love it if you also have: Familiarity with industry standards and practices for reliability, such as MIL-STD, ISO 9001, UL, ANSI, ASTM standards or similar. Experience in high and low voltage electro-mechanical systems and their reliability evaluation. Python experience in automated testing, live data acquisition, live data analysis, and automating system alerts. Life at SPAN: Our Bengaluru team plays a pivotal role in SPAN's continued growth and expansion. Together, we're driving engineering, product development, and operational excellence to shape the future of home energy solutions. As part of our team in India, you'll have the opportunity to collaborate closely with our teams in the US and across the globe. This international collaboration fosters innovation, learning, and growth, while helping us achieve our bold mission of electrifying homes and advancing clean energy solutions worldwide. Our in-office culture offers the chance for dynamic interactions and hands-on teamwork, making SPAN a truly collaborative environment where every team member’s contribution matters. Our climate-focused culture is driven by a team of forward-thinkers, engineers, and problem-solvers who push boundaries every day. 
Do mission-driven work: Every role at SPAN directly advances clean energy adoption. Bring powerful ideas to life: We encourage diverse ideas and perspectives to drive stronger products. Nurture an innovation-first mindset: We encourage big thinking and bold action. Deliver exceptional customer value: We value hard work, and the ability to deliver exceptional customer value. Benefits at SPAN India Generous paid leave Comprehensive Insurance & Health Benefits Centrally located office in Bengaluru with easy access to public transit, dining, and city amenities Interested in joining our team? Apply today and we’ll be in touch with the next steps!
Posted 1 month ago
8.0 - 13.0 years
30 - 35 Lacs
Hyderabad
Work from Office
Job Description: We are seeking a proactive and skilled Manager DevOps to design, implement, and maintain robust CI/CD pipelines and cloud infrastructure on Microsoft Azure. This role is essential to support the scalable, secure, and reliable deployment of AI services, including those built on platforms like Azure Databricks. You will collaborate closely with engineering, data science, and cloud teams to drive automation, performance, and system resilience. Key Responsibilities: Build and manage scalable, repeatable infrastructure using Terraform (IaC) across environments. Design cloud environments for performance, availability, and cost optimization using Azure best practices. Deploy infrastructure to support AI services and data pipelines on Azure and Databricks. Design, implement, and maintain CI/CD pipelines for microservices and AI workflows. Automate deployment processes using tools like Azure DevOps, GitHub Actions, or equivalent. Integrate quality gates, security checks, and automated testing into pipelines. Package and deploy containerized applications using Docker and Azure Container Apps. Manage images in Azure Container Registry (ACR) and orchestrate secure image lifecycle management. Leverage Azure Key Vault, Blob Storage, and Service Bus for secure and scalable operations. Implement and manage Azure Monitor for logging, telemetry, and proactive alerting. Ensure system reliability through effective monitoring, logging, and incident response setups. Maintain compliance and governance policies across environments. Qualifications: 8+ years of hands-on experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles. Strong expertise in Azure cloud services, Terraform, and Docker.
Posted 1 month ago
2.0 - 7.0 years
18 - 20 Lacs
Kolkata, Mumbai, New Delhi
Work from Office
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning. Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission-critical stack, with focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate clear understanding of automation and orchestration principles. Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.
Posted 1 month ago
8.0 - 13.0 years
10 - 14 Lacs
Hyderabad
Work from Office
We are seeking a proactive and skilled Manager DevOps to design, implement, and maintain robust CI/CD pipelines and cloud infrastructure on Microsoft Azure. This role is essential to support the scalable, secure, and reliable deployment of AI services, including those built on platforms like Azure Databricks. You will collaborate closely with engineering, data science, and cloud teams to drive automation, performance, and system resilience. Key Responsibilities: Build and manage scalable, repeatable infrastructure using Terraform (IaC) across environments. Design cloud environments for performance, availability, and cost optimization using Azure best practices. Deploy infrastructure to support AI services and data pipelines on Azure and Databricks. Design, implement, and maintain CI/CD pipelines for microservices and AI workflows. Automate deployment processes using tools like Azure DevOps, GitHub Actions, or equivalent. Integrate quality gates, security checks, and automated testing into pipelines. Package and deploy containerized applications using Docker and Azure Container Apps. Manage images in Azure Container Registry (ACR) and orchestrate secure image lifecycle management. Leverage Azure Key Vault, Blob Storage, and Service Bus for secure and scalable operations. Implement and manage Azure Monitor for logging, telemetry, and proactive alerting. Ensure system reliability through effective monitoring, logging, and incident response setups. Maintain compliance and governance policies across environments. 8+ years of hands-on experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles. Strong expertise in Azure cloud services, Terraform, and Docker. Solid understanding of
Posted 1 month ago
6.0 - 11.0 years
7 - 15 Lacs
Navi Mumbai
Work from Office
Position: Operation Engineer – Reliability. Location: Taloja – Navi Mumbai. Qualification: BE/B.Tech – Mechanical Engineering. Experience: 6-12 Years. Job Description: The Engineer - Reliability role is part of the Plant Maintenance dept. He/she will be responsible for ensuring the reliability of plant assets and the flawless planning and execution of multi-disciplinary projects to achieve time, quality and cost standards. He/she will be required to be vigilant about safety requirements; ensure reliable management of the equipment, including TPM methodology and implementation as per the Company’s defined Planner Role requirements; analyze failures in depth; suggest improvements to enhance equipment operating life; and implement best practices, TPM, etc. to enhance equipment life. The role also involves developing and implementing reliability improvement initiatives to enhance system performance, reduce downtime, and ensure the reliability of equipment and processes. He/she will be responsible for managing engineering projects from concept through completion, ensuring they meet technical, financial, and scheduling requirements; this covers project scope, technicalities and cash flow, co-ordination with respective departments, timely updating, forecasting and project reporting. Reliability: He/she will be responsible for supporting Maintenance and other teams and supervising Maintenance Work request details, work/material/man planning, work order approval and execution overview in line with the Company’s defined Planner Role and with Taloja requirements, as well as TPM practice and implementation. Oversee the PM plan, work orders, and execution and completion details entered in SAP; maintain the equipment maintenance history tracking tool; ensure on-time execution of 3rd party services; and ensure calibration and maintenance of reliability tools. Oversee and manage PM pillar KPIs like MTTR, MTBF, Costs, etc. 
Maintain CBM program integrity by ensuring equipment list selection based on ECR, monitoring parameters, data acquisition frequency, information retrieval from data, assist in decision making strategy, outsourced activities report introspection & cascade the recommendation to concerned departments. Carry out Walk-by inspection, facilitate & ensure activity sustainability. Ensure new PdM technologies deployment through proper training & expertise development. Ensure proper tracking of spares and inventory, develop methods & contribute to achieve the spares consumption reduction target, drive effort to identify critical spares & inventory, work towards indigenization & common spares identification for L5-01 & 02 with the teams. Conduct PM Matrix, PFMEA compilation for L5-01 & 02. Prepare Weekly highlights, safety lead indicators, Maintain PM history details in between for Bushing change. Work with team/s to Reduce equipment downtime including repetitive failures (sporadic & chronic failures), enhance reliability by facilitation of cross functional team’s involvement for root cause failure analysis tools, FMEA, explore, learn, & support in new tools & technology deployment. Attitude: Team work, Confident, Cost conscious, Proactive. Drive for Results. Excellent communication skills. Focus on development of self & others. Ability to plan complex tasks & resources, manage multiple priorities or projects at one time in a calm & dignified manner. Ability to influence & Lead with a sense of purpose. Flexible, support to other plants as necessary. Contact Person: Shankar Gurkha Contact Number: 9033440540
Posted 1 month ago
3.0 - 7.0 years
15 - 20 Lacs
Noida, Pune
Work from Office
The duties of a Site Reliability Engineer will be to support and maintain various Cloud Infrastructure Technology Tools in our hosted production/DR environments. He/she will be the subject matter expert for specific tool(s) or monitoring solution(s), and will be responsible for testing, verifying and implementing upgrades, patches and implementations. He/she will also partner with other services and/or service functions to investigate and/or improve monitoring solutions. May mentor one or more tools team members or provide training to other cross-functional teams as required. May motivate, develop, and manage performance of individuals and teams while on shift. May be assigned to produce regular and ad hoc management reports in a timely manner. Proficient in Splunk/ELK and Datadog. Experience with observability tools such as Prometheus/InfluxDB and Grafana. Possesses strong knowledge of at least one scripting language such as Python, Bash, PowerShell or any other relevant language. Design, develop, and maintain observability tools and infrastructure. Collaborate with other teams to ensure observability best practices are followed. Develop and maintain dashboards and alerts for monitoring system health. Troubleshoot and resolve issues related to observability tools and infrastructure. Bachelor's Degree in Information Systems, Computer Science or a related discipline, with relevant experience of 5-8 years.
Experience with enterprise software implementations for large-scale organizations. Extensive knowledge of technology trends prevalent in the market, such as SaaS, Cloud, Hosting Services and Application Management Services. Monitoring tools such as Grafana, Prometheus and Datadog. Experience in deployment of application and infrastructure clusters within a public cloud environment utilizing a Cloud Management Platform. Professional and positive, with outstanding customer-facing practices. Can-do attitude, willing to go the extra mile. Consistently follows up and follows through on delegated tasks and actions.
Posted 1 month ago
8.0 - 13.0 years
25 - 30 Lacs
Kolkata, Mumbai, New Delhi
Work from Office
The network operations center is responsible for repair and network availability across all of OCI's global network. The resources on this team will support the Tier 1 GNOC as an escalation resource, with additional focus on general network availability of the optical network. Required Experience: Infinera line system & optical line protection. Cisco line system & optical line protection. Arista or other datacenter switch experience. ZR tunable optics experience. Understanding of fiber routes, entrances, and diversity requirements. Understanding of striping, failure modeling, and traffic drain. Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission-critical stack, with focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate clear understanding of automation and orchestration principles. Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.
Posted 1 month ago
6.0 - 10.0 years
15 - 25 Lacs
Gurugram, Bengaluru
Hybrid
What you will be doing: The Site Reliability Engineer (SRE) operates and maintains production systems in the cloud. Their primary goal is to make sure the systems are up and running and provide the expected performance. This involves daily operations tasks of monitoring, deployment and incident management as well as strategic tasks like capacity planning, provisioning and continuous improvement of processes. A major part of the role is also designing for reliability, scalability and efficiency, and automating everyday system operations tasks. SREs work closely with technical support teams, application developers and DevOps engineers both on incident resolution and on long-term evolution of systems. Employees will primarily work on creating Terraform, Shell and Ansible scripts and will take part in application deployments using Azure Kubernetes Service. Employees will work with a cybersecurity client/company. Monitor production systems' health, usage, and performance using dashboards and monitoring tools. Track provisioned resources, infrastructure, and their configuration. Perform regular maintenance activities on databases, services, and infrastructure. Respond to alerts and incidents: investigate, resolve, or dispatch according to SLAs. Respond to emergencies: recover systems and restore services with minimal downtime. Coordinate with customer success and engineering teams on incident resolution. Perform postmortems after major incidents. Change management: perform rollouts, rollbacks, patching and configuration changes. Drive demand forecasting and capacity planning with engineering and customer success teams. Consider projected growth and demand spikes. Provision production resources according to capacity demands. Work with the engineering teams on the design and testing for reliability, scalability, performance, efficiency, and security. Track resource utilization and cost-efficiency of production services. What we're looking for: BSc/MSc or B.Tech degree in STEM, 3+ years of relevant industry experience. Technical skills: Terraform, Docker Swarm/K8s, Python, Unix/Linux Shell scripting, DevOps, GitHub Actions, Azure Active Directory, Azure Monitor & Log Analytics. Experience in integrating Grafana with Prometheus will be an added advantage. Strong verbal and written communication skills. Ability to perform on-call duties. Regards, Kajal Khatri Kajal@beanhr.com
Posted 1 month ago
3.0 - 6.0 years
7 - 10 Lacs
Hyderabad
Work from Office
Kubernetes Engineer. Location: Hyderabad, India (On-Site). Company: Turium AI - Pioneering Enterprise AI Solutions. About Turium AI: Turium AI is at the forefront of building intelligent, enterprise-grade AI solutions. We empower global organizations to transform their business operations through scalable and secure AI systems. As part of our continued growth, we're looking for a skilled Kubernetes Engineer to join our DevOps & Infrastructure team in Hyderabad. Role Overview: We are seeking a Kubernetes Engineer who will be responsible for building and maintaining scalable, resilient, and secure Kubernetes infrastructure to power our AI platforms and services. You will work closely with ML engineers, software developers, and DevOps to ensure smooth deployment and operation of AI workloads. Key Responsibilities: Design, deploy, and maintain Kubernetes clusters on cloud platforms (e.g., AWS EKS, Azure AKS, GCP GKE). Develop and maintain Helm charts, Kubernetes manifests, and custom controllers. Automate CI/CD pipelines with tools like GitHub Actions, ArgoCD, or Jenkins. Ensure high availability, performance, and security of containerized AI workloads. Implement monitoring, alerting, and logging using Prometheus, Grafana, ELK/EFK stack, or Loki. Support the ML/AI team with model deployments, auto-scaling, GPU scheduling, and inference infrastructure. Troubleshoot infrastructure and deployment issues across staging and production. Collaborate with security teams to ensure compliance and secure configurations (RBAC, network policies, secrets management). Required Qualifications: 3-6 years of experience in DevOps, Site Reliability Engineering, or Platform Engineering. Hands-on experience with Kubernetes in production environments. Proficient in Helm, Docker, GitOps, and IaC tools (Terraform, Pulumi preferred). Experience with at least one major cloud platform (AWS, GCP, or Azure). Solid scripting skills in Bash, Python, or Go. Strong understanding of networking, service meshes (e.g., Istio), 
and observability stacks Familiarity with container security best practices Preferred Qualifications Experience supporting AI/ML pipelines and GPU workloads in Kubernetes Knowledge of Kubeflow, MLflow, or other AI/ML orchestration frameworks CNCF certifications (CKA, CKAD) Experience with serverless tools, eBPF, or service mesh optimization MUST: Ability to work hours in Australian Eastern Daylight Time (AEDT)/Australian Eastern Standard Time (AEST) What We Offer Opportunity to work at the cutting edge of AI and enterprise infrastructure Competitive compensation with performance bonuses Work with a highly skilled and collaborative global team Flexible working hours and remote-friendly culture Career growth through exposure to AI, MLOps, and DevSecOps ecosystems Turium Inc. and Xaana Pty Ltd are equal opportunity employers. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Posted 1 month ago
5.0 - 10.0 years
7 - 12 Lacs
Hyderabad
Work from Office
It's fun to work in a company where people truly BELIEVE in what they're doing! We're committed to bringing passion and customer focus to the business. Job Description This role requires working from our local Hyderabad office 2-3x a week. INTRODUCTION: ABC Fitness is seeking an experienced Senior Test Engineer to ensure the quality and reliability of software products with our industry-leading fitness platform, Glofox, recently acquired by ABC Fitness Solutions (abcfitness.com), a leading provider of software for the fitness industry. ABC Fitness is on a mission to turn fitness visions into seamless reality. Glofox and ABC Fitness combined are even better positioned to boost performance and create a total fitness experience for members of clubs of all sizes, whether a multi-location chain, franchise or an independent gym. Our values: Best Life, One Team and Growth Mindset encourage us to come together as a team to achieve great work, live our best life at work and see challenges as opportunities. 
WHAT YOU'LL DO: Work closely with a cross-functional team to establish and evolve a whole team test approach Establish and facilitate quality related team practices such as 3 amigos type sessions, bug bashes, incident learning reviews, testability reviews and operability reviews Actively partake in discussions related to technical decisions Collaborate with your teammates to identify and automate the appropriate tests Collaborate with your teammates to exploratory test and uncover unexpected risks Work with the team to establish and maintain a fast, reliable pipeline that provides valuable feedback on every change as it moves towards production Collaborate with Design, Product and Customer Success to better understand our customers and manage customer incidents effectively WHAT YOU'LL NEED: 5+ years of experience in a software testing role (or an industry placement bootcamp could be considered) Experience with exploratory testing Experience in applying different techniques, tools and approaches based on context Experience in testing technically complex systems Strong problem-solving skills Excellent communication and influencing skills Passionately team-oriented and collaborative AND IT'S GREAT TO HAVE: Experience in establishing and managing an effective whole team test approach Knowledge of when and when not to use automation in testing Testing in Production Reliability Engineering Quality Engineering Security Testing Performance Testing Continuous Delivery Testing Community Involvement WHAT'S IN IT FOR YOU: Purpose led company with a Values focused culture - Best Life, One Team, Growth Mindset Time Off - competitive PTO plans with 15 Earned accrued leave, 12 days Sick leave, and 12 days Casual leave per year 11 Holidays plus 4 Days of Disconnect - once a quarter, we take a collective breather and enjoy a day off together around the globe. 
#oneteam Group Mediclaim insurance coverage of INR 500,000 for employee + spouse, 2 kids, and parents or parents-in-law, and including EAP counseling Life Insurance and Personal Accident Insurance Best Life Perk - we are committed to meeting you wherever you are in your fitness journey with a quarterly reimbursement Premium Calm App - enjoy tranquility with a Calm App subscription for you and up to 4 dependents over the age of 16 Support for working women with financial aid towards crèche facility, ensuring a safe and nurturing environment for their little ones while they focus on their careers We're committed to diversity and passion, and encourage you to apply, even if you don't demonstrate all the listed skillsets! ABC'S COMMITMENT TO DIVERSITY, EQUALITY, BELONGING AND INCLUSION: ABC is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We are intentional about creating an environment where employees, our clients and other stakeholders feel valued and inspired to reach their full potential and make authentic connections. We foster a workplace culture that embraces each person's diversity, including the extent to which they are similar or different. ABC leaders believe that an equitable and inclusive culture is not only the right thing to do, it is a business imperative. Read more about our commitment to diversity, equality, belonging and inclusion at abcfitness.com ABOUT ABC: ABC Fitness (abcfitness.com) is the premier provider of software and related services for the fitness industry and has built a reputation for excellence in support for clubs and their members. ABC is the trusted provider to boost performance and create a total fitness experience for over 41 million members of clubs of all sizes, whether a multi-location chain, franchise or an independent gym. 
Founded in 1981, ABC helps over 31,000 gyms and health clubs globally perform better and more profitably, offering a comprehensive SaaS club management solution that enables club operators to achieve optimal performance. ABC Fitness is a Thoma Bravo portfolio company, a private equity firm focused on investing in software and technology companies (thomabravo.com). #LI-HYBRID If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!
Posted 1 month ago
3.0 - 4.0 years
5 - 15 Lacs
Bengaluru
Work from Office
As our first Site Reliability Engineer (SRE), you’ll take ownership of the reliability, observability, and resilience of our systems across development, staging, and production. You’ll bring stability to our infrastructure, implement proactive monitoring, lead incident response, optimize costs, and collaborate cross-functionally with developers, QA, and security teams. This is a hands-on role with both strategic and tactical responsibilities, ideal for someone who thrives in early-stage environments. Key Responsibilities Monitoring & Observability Define and enforce monitoring standards across services (metrics, logs, traces). Consolidate and manage monitoring tools (Elastic, Sentry, Slack, Azure Monitor, etc.). Build actionable dashboards and configure alerting for RabbitMQ, APIs, databases, third-party integrations, and data pipelines on Databricks. Establish SLIs, SLOs, and error budgets to guide operational priorities. Incident Management & Response Implement on-call rotations and escalation policies. Develop and maintain incident response runbooks and post-incident reviews (RCAs). Reduce MTTR (Mean Time to Recovery) by automating detection and remediation where possible. Infrastructure & Reliability Engineering Own availability and scalability of our services on Microsoft Azure. Optimize performance and memory usage of services like RabbitMQ, APIs, and analytics pipelines running in Databricks. Build fault-tolerant systems: retries, backoff, circuit breakers, etc. Collaborate with developers to implement resilience patterns in the codebase. Cost Optimization & Efficiency Track, analyze, and report on cloud infrastructure costs. Configure budgets, alerts, and resource tagging to prevent surprises. Lead right-sizing and cleanup initiatives to remove unused or overprovisioned assets. Security & Compliance Collaboration Work with the security team to maintain infrastructure diagrams and data flow diagrams. 
Participate in threat modeling and define trust boundaries. Ensure systems and tooling are audit-ready for compliance (e.g., ISO 27001, GDPR, PDPA). Tooling & Automation Build internal tools to improve deployment reliability, diagnostics, and rollback safety. Implement and manage Infrastructure-as-Code using Terraform, Bicep, or similar. Improve CI/CD pipelines for safer and faster releases. Tech Stack You’ll Work With: Cloud: Microsoft Azure (App Services, VMs, Cosmos DB, Monitor, etc.) Monitoring & Logs: ELK, Sentry, Azure Monitor, Prometheus, Grafana Queueing: RabbitMQ, Kafka Languages: Node.js, Python (mostly reading/debugging) Infra as Code: Terraform, Bicep, GitHub Actions Requirements Must-Have 3+ years of experience in DevOps, SRE, or infrastructure engineering roles. Experience managing high-availability systems and debugging production issues under pressure. Proven track record with cloud infrastructure (Azure preferred) and observability tooling. Strong understanding of distributed systems, incident response, and cost management. Comfortable collaborating across functions, including developers, QA, and security. Nice-to-Have Experience with compliance/regulatory frameworks (ISO 27001, GDPR, etc.). Familiarity with customer engagement or loyalty platforms. Contributions to infra/tooling culture in an early-stage startup. What You’ll Get The opportunity to shape the reliability strategy of a fast-growing product from the ground up. A strong voice in infra design, tooling choices, and culture. A globally distributed, high-caliber team that’s customer-obsessed and product-driven.
Posted 1 month ago
3.0 - 6.0 years
6 - 10 Lacs
Hyderabad
Work from Office
We are looking for a Compute SRE to join our team and ensure our compute infrastructure's reliability, performance, and scalability. You will work on building and maintaining highly available systems that power our applications and services. Required Candidate profile 3+ years of experience in systems engineering or operations with a focus on SRE principles; operating systems (Linux, Windows); storage and backup systems; container orchestration platforms (Kubernetes, Docker)
Posted 1 month ago
18.0 - 23.0 years
18 - 22 Lacs
Dahej
Work from Office
Job Purpose Provides input to a Risk Management Plan that will anticipate reliability-related and non-reliability-related risks that could adversely impact plant operation. Provides technical support to production, maintenance management and technical personnel. PRINCIPAL ACCOUNTABILITIES Observes conduct of tests at supplier, plant or field locations to evaluate reliability factors such as cause of failure. Works with Project Engineering to ensure the reliability and maintainability of new and modified installations. The Reliability Engineer is responsible for adhering to the Life Cycle Asset Management (LCAM) process throughout the entire life cycle of new assets. Participates in the development of design and installation specifications along with commissioning plans. Participates in the development of criteria for and evaluation of equipment and technical MRO suppliers and technical maintenance service providers. Develops acceptance tests and inspection criteria. Participates in the final check out of new installations. This includes factory and site acceptance testing that will assure adherence to functional specifications. Guides efforts to ensure reliability and maintainability of equipment, processes, utilities, facilities, controls, and safety/security systems. Professionally and systematically defines, designs, develops, monitors and refines an Asset Maintenance Plan that includes: value-added preventive maintenance tasks; effective utilization of predictive and other non-destructive testing methodologies designed to identify and isolate inherent reliability problems. Provides input to a Risk Management Plan that will anticipate reliability-related and non-reliability-related risks that could adversely impact plant operation. Develops engineering solutions to repetitive failures and all other problems that adversely affect plant operations. 
These problems include capacity, quality, cost or regulatory compliance issues. Provides technical support to production, maintenance management and technical personnel. Applies value analysis to repair/replace, repair/redesign, and make/buy decisions. Ensures all activities under his/her control and supervision are compliant with all the laws of the land and statutory requirements. Conducts all operations of the function while ensuring social responsibility and accountability, following the company guidelines on the same. Is responsible for ethical operation under his/her control. Is responsible for prevention, detection and reporting of bribery and other forms of corruption, including breach of the code of conduct and other company regulations. Avoids all activity that could lead to or imply breach of the code of conduct, anti-bribery and anti-corruption rules, etc. Skills and academic qualifications Educational Qualifications Minimum Qualification - BE/B Tech in Mechanical Engineering Preferred Qualification - M.Tech in Reliability Engineering or equivalent Relevant and total experience Total years of experience required - 18 Relevant experience required - 10 years in Reliability
Posted 1 month ago
4.0 - 9.0 years
12 - 16 Lacs
Bengaluru
Work from Office
At Boeing, we innovate and collaborate to make the world a better place. We’re committed to fostering an environment for every teammate that’s welcoming, respectful and inclusive, with great opportunity for professional growth. Find your future with us. Overview As a leading global aerospace company, Boeing develops, manufactures and services commercial airplanes, defense products and space systems for customers in more than 150 countries. As a top U.S. exporter, the company leverages the talents of a global supplier base to advance economic opportunity, sustainability and community impact. Boeing’s team is committed to innovating for the future, leading with sustainability, and cultivating a culture based on the company’s core values of safety, quality and integrity. Technology for today and tomorrow The Boeing India Engineering & Technology Center (BIETC) is a 5500+ engineering workforce that contributes to global aerospace growth. Our engineers deliver cutting-edge R&D, innovation, and high-quality engineering work in global markets, and leverage new-age technologies such as AI/ML, IIoT, Cloud, Model-Based Engineering, and Additive Manufacturing, shaping the future of aerospace. People-driven culture At Boeing, we believe creativity and innovation thrives when every employee is trusted, empowered, and has the flexibility to choose, grow, learn, and explore. We offer variable arrangements depending upon business and customer needs, and professional pursuits that offer greater flexibility in the way our people work. We also believe that collaboration, frequent team engagements, and face-to-face meetings bring together different perspectives and thoughts – enabling every voice to be heard and every perspective to be respected. No matter where or how our teammates work, we are committed to positively shaping people’s careers and being thoughtful about employee wellbeing. 
Position Overview Boeing Test and Evaluation team is currently looking for one Associate Reliability Engineer to join their team in Bengaluru, KA. Test & Evaluation engineers at Boeing make sure that products at the world’s largest aerospace company continue to meet the highest standards. From quality and reliability to safety and performance, their expertise is vital to the concept, design and certifications of a wide variety of commercial and military systems. Boeing Test and Evaluation (BT&E) -India is an integral part of BT&E and is engaged in supporting lab and flight test for various programs. Position Responsibilities Reliability Engineer will refine various existing reliability tools as well as create new tools/processes for automating existing work making it more efficient and reliable. Examples include reliability analysis, automation of reliability group assignments, uncertainty analysis tool development, and statistical analysis of assets to determine possible advance warnings for groups of assets that could have suspect reliability. This role will be driving the Reliability Management Board for calibrated inventory to review the reliability reassessment recommendations coming from various sources like analytical tools, reports from the ground, feedback from calibration technicians and so on. This position would apply expertise in statistics and reliability to the field of predictive and preventive maintenance through data analysis. The candidate will also be responsible to perform risk assessments such as FMEA, FTA, leveraging RCM principles for Boeing’s production system. The candidate would analyze maintenance data from CMMS to identify trends in production system performance, life cycle cost modelling. Use results to develop maintenance strategies to optimize uptime, reliability and achieve business goal. Influences use of mathematical tools and processes. Forecasts mathematical needs and capabilities to address business requirements. 
Designs, codes, tests and maintains mathematical software. This position will also be responsible for coordinating and communicating regularly with experts in Boeing organizations around the world. Employer will not sponsor applicants for employment visa status. Basic Qualifications (Required Skills/Experience) A Bachelor’s degree or higher is required. Experience in DfR methodologies, with a strong focus on statistical and reliability modelling. Experience in reliability analysis of failure data such as Weibull, Exponential, PoF, Monte Carlo simulation. Proficient in Lifecycle data analysis, Cost modelling, Availability / Maintainability Modeling, with strong Reliability Engineering fundamentals Experience and knowledge in reliability management of metrology and measurement systems ensuring high reliability and maintainability. Strong background in reliability engineering methodologies: RCM, FMEA, FTA, and RCA Experience with predictive maintenance and prognostic health management techniques such as RUL estimation, condition monitoring alerts (AI and machine learning experience is a plus) Proficient with Reliability & Statistical analysis tools like Minitab, Mathematica, JMP, Tableau, Reliasoft Hands-on coding in Python/R/Matlab would be an added advantage Knowledge of failure modes of mechanical, electromechanical and electronic components Awareness of AS9100 or ISO9001 quality management system and ISO 17025 standard ASQ CRE certification will be an added advantage Preferred Qualifications (Desired Skills/Experience) Bachelor’s/ Master’s Degree Typical Education & Experience Education/experience typically acquired through advanced education (e.g. Bachelor) and typically 5 or more years' related work experience or an equivalent combination of education and experience (e.g. Master+4 years' related work experience, 5+ years' related work experience, etc.). Relocation This position does offer relocation based on candidate eligibility within INDIA. 
Applications for this position will be accepted until Jun. 06, 2025 Export Control This is not an Export Control position. Education Bachelor's Degree or Equivalent Required Relocation This position offers relocation based on candidate eligibility. Visa Sponsorship Employer will not sponsor applicants for employment visa status. Shift Not a Shift Worker (India) Equal Opportunity Employer: We have teams in more than 65 countries, and each person plays a role in helping us become one of the world’s most innovative, diverse and inclusive companies. We are proud members of the Valuable 500 and welcome applications from candidates with disabilities. Applicants are encouraged to share with our recruitment team any accommodations required during the recruitment process. Accommodations may include but are not limited to conducting interviews in accessible locations that accommodate mobility needs, encouraging candidates to bring and use any existing assistive technology such as screen readers and offering flexible interview formats such as virtual or phone interviews.
Posted 1 month ago
2.0 - 6.0 years
4 - 8 Lacs
Bengaluru
Work from Office
The NOC + SRE + DevOps Engineer will play a crucial role in maintaining and improving the reliability and performance of our services. This hybrid role encompasses monitoring and supporting our network operations, ensuring the reliability and scalability of our infrastructure, and enhancing our CI/CD processes. The ideal candidate will have a strong background in network operations, site reliability engineering, and DevOps practices with hands-on experience in Linux, AWS, Ansible, Terraform, and OpenStack. Key Responsibilities: Network Operations Center (NOC) Responsibilities: Monitor and manage the company's network infrastructure and servers 24/7. Respond to and troubleshoot network issues, outages, and performance bottlenecks. Ensure high availability and reliability of network services. Maintain and update network documentation and incident logs. Coordinate with stakeholders for issue resolution. Site Reliability Engineering (SRE) Responsibilities: Develop and implement strategies for maintaining system reliability, availability, and performance. Automate repetitive tasks to improve operational efficiency. Monitor system performance and reliability metrics, and proactively address potential issues. Conduct root cause analysis of incidents and implement preventive measures. Collaborate with development teams to ensure applications are designed for reliability and scalability. DevOps Responsibilities: Manage and optimize CI/CD pipelines for application deployment. Collaborate with development and operations teams to automate and streamline build, test, and deployment processes. Implement and maintain infrastructure as code (IaC) using tools like Terraform and Ansible. Ensure security and compliance of the deployment pipeline and infrastructure. Continuously evaluate and integrate new tools and technologies to improve the DevOps process. 
Qualifications: Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent experience). 5+ years of experience in network operations, site reliability engineering, and/or DevOps roles. Strong experience with Linux system administration. Proficiency in cloud platforms, particularly AWS. Hands-on experience with configuration management tools such as Ansible. Proficiency in infrastructure as code (IaC) tools such as Terraform.
Posted 1 month ago
2.0 - 7.0 years
4 - 9 Lacs
Chennai
Work from Office
Diverse Lynx is looking for an SRE to join our dynamic team and embark on a rewarding career journey. Ensuring system reliability: Collaborate with development teams to design and build scalable, reliable, and efficient systems. Monitoring and incident response: Implement and maintain monitoring and alerting systems to detect and respond to issues proactively. Participate in incident management and troubleshooting to minimize downtime and resolve issues quickly. Automation and tooling: Develop and maintain automation tools and frameworks to improve system provisioning, deployment, and maintenance processes. Capacity planning and scalability: Work closely with capacity planning teams to anticipate resource needs and scale systems to handle increasing traffic and workload demands. Performance optimization: Identify and address performance bottlenecks, optimize system components, and improve overall system performance. Security and compliance: Collaborate with security teams to implement and maintain secure systems, perform regular audits, and ensure compliance with relevant regulations and policies. Collaboration and documentation: Work cross-functionally with various teams including developers, operations, and QA to improve system reliability. Document processes, configurations, and troubleshooting guides.
Posted 1 month ago
5.0 - 10.0 years
12 - 22 Lacs
Bengaluru
Work from Office
Monitor and optimize plant operations across refining and LNG units. Develop simulation models, troubleshoot reliability threats, identify optimization opportunities, and ensure performance aligns with Integrity Operating Windows (IOWs). Required Candidate profile Chemical engineer with 3–10 years of experience in refinery/LNG operations. Skilled in process optimization, simulation, IOWs, reliability improvement, and collaboration with ops and business planning
Posted 2 months ago
3.0 - 8.0 years
20 - 35 Lacs
Bengaluru
Work from Office
Lead reliability studies, optimize maintenance strategies, conduct bad actor analysis, perform failure investigations, steward asset KPIs, and support reliability improvement initiatives across refinery and LNG facilities. Required Candidate profile BE/B.Tech in Mechanical, Chemical, or Materials Engineering with 3–10 yrs in reliability engineering, failure analysis, maintenance optimization; experience in refinery, LNG, or petrochemical sectors
Posted 2 months ago
8.0 - 13.0 years
30 - 45 Lacs
Bengaluru
Work from Office
Develop and optimize maintenance strategies, perform safeguard assessments, monitor instrument and electrical system reliability, manage SIL verifications, RIK assessments, and drive continuous improvement across LNG and refinery assets. Required Candidate profile BE/B.Tech in Instrumentation/Electrical/Electronics with 5–15 yrs in I&E reliability, SIL assessment, safeguard testing, SIS lifecycle management; experience in LNG, refinery, or petrochemical industry
Posted 2 months ago
8.0 - 13.0 years
30 - 45 Lacs
Bengaluru
Work from Office
Monitor and diagnose rotating equipment health, perform vibration analysis, optimize maintenance strategies, conduct root cause analysis, support turnaround planning, and enhance machinery reliability in LNG and refinery operations. Required Candidate profile 5–15 yrs in rotating equipment reliability, vibration diagnostics, RCA, turnaround support; ISO 18436 certification, System1 and thermal performance analysis experience preferred.
Posted 2 months ago
8.0 - 13.0 years
20 - 35 Lacs
Bengaluru
Work from Office
Manage CMMS master data, asset hierarchies, preventive maintenance plans, user roles, configuration changes, and digital tool support (JDE, Maximo, SAP) to ensure maintenance data quality and asset reliability in refinery and LNG assets. Required Candidate profile 5–15 years’ experience in CMMS systems (JDE mandatory), master data management, and maintenance workflows in complex industrial facilities (LNG, refinery, petrochemical).
Posted 2 months ago