14 System Reliability Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

2.0 - 6.0 years

0 Lacs

Raichur, Karnataka

On-site

This is a full-time Projects and Mechanical Maintenance role in the pharmaceutical industry at Shilpa Pharma Life Sciences Ltd, located in Raichur. You will oversee mechanical maintenance projects, conduct routine inspections, use computer-aided design (CAD) tools for maintenance planning, and ensure the efficiency and reliability of automotive and mechanical systems within the facility.

To excel in this position, you should hold a Bachelor's degree in Mechanical Engineering or a related field, with experience in mechanical engineering and maintenance. Proficiency in CAD is essential, along with skills in conducting inspections and ensuring system reliability. A solid understanding of automotive systems and their maintenance is required. You should also have excellent problem-solving abilities, attention to detail, strong organizational and time-management skills, and the ability to work both independently and as part of a team. Prior experience in the pharmaceutical industry is beneficial but not mandatory. If you are a dedicated professional with a background in mechanical engineering and maintenance, this role offers an opportunity to contribute to the operational success of Shilpa Pharma Life Sciences Ltd.

Posted 1 week ago

2.0 - 6.0 years

0 Lacs

Lucknow, Uttar Pradesh

On-site

The Technology Manager position based in Lucknow is a full-time role requiring a minimum of 5 years of experience in IT/Software Development, including at least 2 years in a managerial capacity. As Technology Manager, you will lead the tech team, oversee project planning and execution, and ensure the delivery of innovative technology solutions. The role involves managing a cross-functional team and calls for a strong technical background, exceptional project management skills, and the ability to mentor team members effectively.

Your key responsibilities will include leading and managing the tech team to ensure timely project delivery; supervising system architecture, software development, and IT infrastructure; collaborating with stakeholders to understand technical requirements; ensuring data security, scalability, and system reliability; and staying current with emerging technologies and implementing best practices.

To qualify for this position, you should hold a Bachelor's or Master's degree in Computer Science, IT, or a related field, have proven experience in software development and project management, and show strong leadership and communication skills. This is a full-time, on-site position with a morning day shift.

Posted 1 month ago

7.0 - 11.0 years

0 Lacs

Chennai, Tamil Nadu

On-site

The Tech Lead Quantitative Trading position in Chennai, India requires a candidate with over 7 years of experience. You will design and optimize scalable backend systems using Python and C++, oversee the deployment of real-time trading algorithms, manage cloud infrastructure, CI/CD pipelines, and API integrations, and lead and mentor a high-performing engineering team. You will also help lay the foundation for AI-driven trading innovations. The role demands strong leadership, software architecture expertise, and hands-on problem-solving skills to ensure the seamless execution and scalability of trading systems.

As part of your responsibilities, you will:
- Lead the end-to-end development of the trading platform, ensuring scalability, security, and high availability.
- Architect and optimize backend infrastructure for real-time algorithmic trading and large-scale data processing.
- Design and implement deployment pipelines and CI/CD workflows for efficient code integration.
- Introduce best practices for performance tuning, system reliability, and security.

In backend and data engineering, you will:
- Own the Python-based backend.
- Work on low-latency system design to support algorithmic trading strategies.
- Optimize storage solutions for handling large-scale financial data.
- Implement API-driven architectures, drawing on WebSocket and RESTful API knowledge, to integrate with brokers, third-party data sources, and trading systems.

Furthermore, you will be responsible for:
- Monitoring and troubleshooting live trading systems to minimize downtime.
- Handling broker communication during execution issues and API failures.
- Setting up automated monitoring, logging, and alerting for production stability.
- Leading, mentoring, and scaling a distributed engineering team.
- Defining tasks, setting deadlines, and managing workflow using Zoho Projects.
- Aligning team objectives with OKRs, driving execution, and fostering a strong engineering culture.
- Ensuring high performance and technical excellence.
- Managing cloud infrastructure to ensure high availability.
- Overseeing GitLab repositories and enforcing best practices for version control.
- Implementing robust CI/CD pipelines to accelerate deployment cycles.

Preferred qualifications include:
- 7+ years of hands-on experience in backend development with expertise in Python.
- Proven experience leading engineering teams and delivering complex projects.
- Strong knowledge of distributed systems, real-time data processing, and cloud computing.
- Experience with DevOps, CI/CD, and containerized environments.
- Familiarity with GitLab, AWS, and Linux-based cloud infrastructure.
- Bonus: knowledge of quantitative trading, financial markets, or algorithmic trading.

The ideal candidate is a backend expert with a passion for building scalable, high-performance systems; enjoys leading teams, mentoring engineers, and fostering a strong engineering culture; can balance hands-on coding with high-level architecture and leadership; thrives in a fast-paced, data-driven environment; and loves solving complex technical challenges.

Posted 1 month ago

8.0 - 12.0 years

0 Lacs

Karnataka

On-site

As a Principal Engineer - Site Reliability Engineering (SRE) within the Digital Business team at Sony LIV, you will play a crucial role in ensuring the availability, scalability, and performance of our OTT platform. With a global user base, we are dedicated to providing seamless, high-quality streaming experiences to our audience.

Your primary responsibility will be to design, build, and maintain a robust and scalable infrastructure that supports our OTT platform. Drawing on your extensive SRE experience and developer mindset, you will lead initiatives to enhance system reliability and operational efficiency. You will take full ownership of system operations, ensuring application and infrastructure reliability while demonstrating a strong support mindset to address critical incidents, even outside regular business hours. You will also collaborate closely with cross-functional teams to align goals and enhance operational excellence.

Key responsibilities include managing full system ownership, developing tools and automation to improve reliability, responding to critical system issues promptly, designing and managing infrastructure solutions, driving observability best practices, and continuously improving system reliability and performance.

To excel in this role, you should have at least 8 years of experience, a deep understanding of observability, and the ability to lead reliability initiatives across systems and teams. Strong technical proficiency in containers (Docker, Kubernetes), networking concepts, CDNs, infrastructure-as-code tools, cloud platforms, observability solutions, scripting/programming languages, and incident handling is essential. We are looking for a candidate with a passion for system reliability, scalability, and performance optimization, along with excellent communication, collaboration, and leadership skills. Your willingness to participate in a 24x7 on-call rotation and support critical systems during off-hours will be crucial for success in this role.

Join us at Sony Pictures Networks to be part of a dynamic team that is shaping the future of entertainment in India. With leading entertainment channels and a promising streaming platform like Sony LIV, we are committed to creating a diverse and inclusive workplace where you can thrive and make a meaningful impact.

Posted 1 month ago

8.0 - 12.0 years

0 Lacs

Maharashtra

On-site

The Chief Technical Officer (CTO) will oversee and manage the operation of the Bank's network infrastructure, enterprise applications, server infrastructure, storage systems, vendor/telco, data center/cloud infrastructure, and cybersecurity technologies, systems, and applications, reporting to the CEO/Board IT Committee. The desired candidate should have a strong background and hands-on experience in ServiceNow-based IT Operations Management (ITOM), with a specific focus on the banking/fin-tech sector. As a key member of the executive management team of the Bank, the CTO interacts with senior stakeholders and other members of the Bank.

**Operations:**
- Responsible for monitoring and alerting of IT systems to ensure early detection and correction of problems.
- Ensure proper management, maintenance, provisioning, and monitoring of all production environments.
- Collaborate with the department of Network & Security, Application Administration to manage change initiatives in a structured manner to ensure systems availability, performance, and reliability.

**DevOps:**
- Build and maintain continuous integration, testing, monitoring, and deployment using configuration management and other essential DevOps tools.
- Automate deployment of applications, system configurations, and security settings.
- Design, build, and improve tools and technologies that make up Continuous Integration (CI) and Continuous Delivery (CD) pipelines.

**Service Management:**
- Responsible for the ITIL functions of Service Management, Service Mapping, Change Management, Configuration Management, Problem Management, Incident Management, Service Level Management, Capacity Management, and Release Management.
- Work with the IT department and business units to develop, enforce, monitor, and maintain Service Level Agreements (SLAs) with vendors/service providers.
- Oversee timely handover of new services from delivery teams to Operations.

**System Reliability and Availability:**
- Partner with IT/business units to establish appropriate service level objectives for system reliability and availability, and implement technology and processes to achieve SLA objectives.

**Backup/Disaster Recovery:**
- Lead the development and testing of Disaster Recovery/Business Continuity plans and processes.
- Manage system maintenance/backups and ensure the integrity of service recovery.

**Supplier Management:**
- Manage relationships with, and the performance of, supplier/outsourcing partners to ensure KPIs and SLAs are being achieved.

**KPI:**
- Track and communicate the KPIs of owned processes and present operational metrics to the CEO/Board committee on a consistent basis.
- Lead and manage the internal teams to ensure all KPIs and SLAs are met or exceeded.

**Software Assets and Budget:**
- Manage spending within the area of responsibility to adhere to the budget.
- Prepare and manage the IT Operations budget.
- Verify software assets against license contracts, confirm hardware assets against actual inventory, and initiate and track corrective actions.
- Maintain a detailed inventory of software and hardware assets, and use that information to make the best decisions on IT-related asset purchases, storage, handling, and redistribution.

**Knowledge and Experience:**
- Demonstrated experience in designing/applying operating process expertise in settings such as IT Service Management, IT Operations Management, Network Operations Centers, and Data Centre operations.
- Experience in ITSM operations and policy development.
- Experience using tools such as AWS CloudWatch, ServiceNow, etc.

*To apply, send your resume to: recruitment@bmcbank.co.in*

Posted 1 month ago

3.0 - 7.0 years

0 Lacs

Indore, Madhya Pradesh

On-site

The Senior Associate Process Maintenance (Instrumentation) is responsible for the maintenance, calibration, troubleshooting, and repair of instrumentation and control systems used in manufacturing or process industries. You will perform preventive and corrective maintenance on instrumentation and control systems such as sensors, transmitters, PLCs, DCS, and SCADA, and troubleshoot and resolve issues related to these systems. Day-to-day work requires interpreting and working from P&IDs, loop diagrams, wiring schematics, and technical manuals. You will also assist with the installation and commissioning of new equipment and systems, keep accurate documentation of maintenance activities and equipment history, and ensure compliance with safety, health, and environmental regulations. The role includes collaborating with operations, engineering, and other maintenance teams to enhance system reliability, and providing technical guidance and training to junior technicians or associates.

Posted 1 month ago

4.0 - 8.0 years

0 Lacs

Mundra, Gujarat

On-site

You will be joining Adani Power Limited (APL), the largest private thermal power producer in India, as the Lead for DCS_PLC (Distributed Control System / Programmable Logic Controller). Your primary responsibility will be to ensure the continuous availability, reliability, and functionality of DCS and PLC systems. This includes conducting regular system backups, maintenance, troubleshooting, and coordinating with Original Equipment Manufacturers (OEMs) for system upgrades. Your role will also involve managing hardware and software resources, network integrity, and cybersecurity measures to prevent data loss and system vulnerabilities.

Your key responsibilities will include:
- Ensuring the availability and functioning of all control loops in auto mode, maintaining equipment protection reliability.
- Scheduling and executing regular backups of DCS and PLC systems to prevent data loss.
- Performing routine maintenance and troubleshooting of DCS and PLC hardware to minimize downtime.
- Monitoring system alarms daily and ensuring system health and redundancy.
- Coordinating with OEMs for annual maintenance activities and system upgrades.
- Managing the proper functioning of Operator Workstations (OWS) and Engineering Workstations (EWS).
- Maintaining backups and availability of historian systems for data integrity and recovery.
- Managing the availability of DCS and PLC hardware, software, and necessary spares for emergency replacements.
- Keeping DCS and PLC licenses up to date and managing renewals.
- Conducting patch updates of DCS and PLC software to address vulnerabilities and enhance performance.
- Upgrading DCS and PLC systems proactively to prevent obsolescence and maintain compatibility with new technologies.
- Ensuring compliance with cybersecurity policies and strengthening network security measures.
- Overseeing data transfer to third-party systems securely and reliably.
- Promoting safety through training and adherence to safety protocols.
- Implementing risk management practices and emergency response plans.
- Ensuring adherence to statutory compliances and regulations.
- Implementing Management of Change (MoC) protocols for safe modifications.
- Providing support for Root Cause Analysis (RCA), Failure Mode and Effects Analysis (FMEA), and Zero Forced Outage (ZFO) to enhance system reliability.
- Implementing field-related ZFO action items and AWMS for maintenance improvement.
- Driving digitization and automation initiatives to optimize operational efficiency.
- Identifying opportunities for automation and digitization enhancements through data analysis and system performance evaluation.

You should hold a Bachelor's degree in C&I, Electronics & Communication, or an equivalent field, along with at least 4 years of experience in industrial automation, specifically with DCS and PLC systems. Experience in power generation, petrochemical, oil and gas, or heavy industrial sectors is preferred.

Key stakeholders you will work closely with include internal teams from Environment & Sustainability, Techno Commercial, Operations & Maintenance, Security, Stores, Support Functions, ENDORSE, and ENOC, as well as external vendors.

Posted 1 month ago

5.0 - 10.0 years

1 - 2 Lacs

Thane

Work from Office

Role & responsibilities

We are seeking a skilled and proactive IT Administrator to join our global IT operations team. This role is based in India and will support our international offices in Asia and Africa, and the parent company in Germany. The ideal candidate will be responsible for maintaining and improving IT infrastructure across these locations and must be open to frequent international travel.

- Provide day-to-day IT support for users across multiple international offices.
- Maintain and troubleshoot network infrastructure, servers, and endpoint devices.
- Coordinate with local and global teams to ensure system reliability and security.
- Manage software deployments, updates, and license compliance.
- Support onboarding/offboarding processes and user access management.
- Travel to supported countries and the parent company in Germany for on-site support and project implementation.

- Proven experience as an IT Administrator or in a similar role.
- Strong knowledge of Windows systems, networking, and Azure cloud services.
- Excellent troubleshooting and communication skills.
- Ability to work independently and manage multiple priorities across time zones.
- Willingness and ability to travel internationally on a regular basis.
- Fluency in English (spoken and written).

Preferred candidate profile

- Bachelor's degree in information technology, computer science, or computer applications, specialized in network, cloud data.
- Experience working in multicultural and international environments.
- Familiarity with cybersecurity best practices.
- Certifications such as Microsoft Certified: Azure Administrator, or similar

Posted 1 month ago

2.0 - 4.0 years

2 - 4 Lacs

Hyderabad, Chennai, Bengaluru

Work from Office

Embedded Systems Engineer

Job Title: Embedded Systems Engineer
Location: Chennai, Hyderabad, Bangalore
Experience: 2-4 years

Overview: Focuses on developing software and firmware for embedded devices, integrating hardware and software for real-time applications.

Key Responsibilities:
- Write and optimize firmware for microcontrollers and SoCs.
- Interface with sensors, actuators, and communication modules.
- Debug and test embedded systems in lab environments.
- Ensure system reliability and performance.

Tools & Technologies:
- C/C++, Python, Assembly
- ARM Cortex, STM32, ESP32
- Keil, MPLAB, IAR Embedded Workbench

Career Path: Embedded Lead, Systems Architect, Embedded Solutions Manager

Posted 1 month ago

2.0 - 5.0 years

3 - 10 Lacs

Hyderabad, Telangana, India

On-site

- Building software and systems to manage platform infrastructure and applications
- Improving reliability, quality, and time-to-market
- Measurement and optimization of system performance
- Explore and evaluate new technologies and solutions to push our capabilities forward, getting ahead of our customers' needs, getting people incentivized to transform, innovate, and continually improve

Qualifications
- Experience with designing and developing globally distributed resilient applications
- Strong Scala knowledge
- Experience with DevOps tools, processes, and culture
- Experience with distributed systems
- Experience supporting mission-critical systems
- Strong communication skills and ability to work effectively across multiple business and technical teams

Would be a plus
- DevOps experience, Web/UI skills, Security and Network infrastructure knowledge

Posted 2 months ago

5.0 - 8.0 years

13 - 17 Lacs

Gurugram

Work from Office

POSITION SUMMARY:
In this role, you will play a crucial part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliability Engineering practices. Your contribution will be pivotal in ensuring the availability, scalability, and performance of our systems and applications. Using your strong technical skills and expertise in DevOps principles, you will work to enhance the reliability of our infrastructure and minimize downtime, enabling the organization to deliver high-quality software with maximum efficiency.

EXPERIENCE AND REQUIRED SKILL SETS:
- Ensure 24/7 uptime and stability of production systems.
- Investigate and troubleshoot production issues.
- Collaborate with developers to optimize system performance.
- Participate in on-call rotation to provide 24/7 support for critical systems.
- Work on automation and enhancements to reduce manual processes and intervention.
- 5+ years of relevant experience in an SRE / Production / Product Support role, with a track record of implementing SRE practices.
- Basic understanding of cloud solutions from providers such as AWS or Azure.
- Basic to intermediate knowledge of scripting in Bash, Python, or PowerShell.
- Good presentation, communication, and interpersonal skills, with the ability to collaborate effectively with cross-functional teams and stakeholders across different countries and cultures.
- Good problem-solving and troubleshooting skills.
- Continuous learning mindset and willingness to adapt to new technologies and industry trends.
- Good understanding of operating system commands (Linux) and SQL (ability to write and analyze queries and to deduce or build important information per requirement).
- In-depth knowledge of the trading life cycle: the candidate should possess a comprehensive understanding of the trading life cycle, including order management, trade execution, settlement, and post-trade processes. Familiarity with various financial products such as Equities, Derivatives, Currencies, Commodities, and FX is a plus.
- Incident and problem management expertise: the candidate must demonstrate strong problem-solving skills and the ability to manage incidents frequently and efficiently within a fast-paced trading environment. This includes identifying, analyzing, and resolving issues related to trading systems and processes, as well as collaborating with cross-functional teams to implement long-term solutions and improve operational efficiency.
- Good understanding of tools:
  (a) Orchestration: Autosys / Airflow or Cron
  (b) Monitoring & Logging: PagerDuty, Prometheus & Grafana or Datadog, Splunk
  (c) Project Management / ITSM: ServiceNow (basic ability to navigate and create change tickets and incidents), Jira (basic ability to create Jira tickets and filter your work)

EDUCATION:
- Bachelor's or Master's degree in Computer Science, Engineering, Software Engineering, or a relevant field

Posted 2 months ago

5.0 - 10.0 years

6 - 10 Lacs

Kolkata

Work from Office

Seeking an AWS-certified professional with expertise in cloud platforms, serverless architecture, monitoring, and highly available systems to manage, optimize, and secure AWS infrastructure while leading and mentoring teams.

Key Skills:
- AWS Services: IAM, EC2, VPC, ELB/ALB, Auto Scaling, Lambda
- AWS Managed Products: EKS, ECS, ECR, Route 53, SES, ElastiCache, RDS, Redshift
- Cloud Platforms: Expertise in AWS infrastructure and services
- Serverless Development Architecture
- Operating Systems: Linux
- Monitoring and Alerting: Implementing and improving monitoring stacks
- Security: SSH, cloud connectivity, and security protocols
- System Reliability: High availability, production systems, and configuration management
- Automation and Scripting: Installing and enhancing scripts
- Team Leadership: Mentoring and guiding teams on new technologies
- Certifications: AWS Certified Solutions Architect, Developer, DevOps Engineer, SysOps Administrator

Posted 3 months ago

8 - 13 years

30 - 45 Lacs

Bengaluru

Work from Office

Drive SRE implementation and DevOps best practices. Reduce technical debt, automate reliability workflows, and ensure performance, scalability, and observability across cloud-based digital platforms.

Required Candidate profile

Experienced SRE with deep knowledge of Azure cloud, CI/CD, observability, automation, and programming. Strong DevOps mindset, troubleshooting ability, and alignment with digital transformation goals.

Posted 3 months ago

5.0 - 10.0 years

10 - 20 Lacs

Pune

Work from Office

Senior System Reliability Engineer | C2H

Location: Pune, Maharashtra, India (Zip Code: 411057)

Experience & Max. Rate (per day, INR):
- 5 to 6 yrs: 4,500
- 6 to 7 yrs: 5,000
- 7 to 8 yrs: 6,500
- 8 to 9 yrs: 7,000
- 9 to 10 yrs: 7,500

Role Overview
We are seeking a Senior System Reliability Engineer (SRE) to plan, manage, and optimize our production environments. This role involves designing monitoring and alerting solutions, improving platform reliability, automating operational processes, and collaborating with global teams to ensure high availability and performance.

Key Responsibilities
- Plan and oversee production environment operations, ensuring stability and performance
- Define and implement application performance monitoring and optimization strategies
- Respond to incidents, identify root causes, and reduce incident recurrence over time
- Support code deployments across multiple lower environments and drive process automation
- Design, develop, and standardize monitoring and alerting mechanisms for supported applications
- Troubleshoot complex issues across the entire technology stack to minimize recovery time
- Provide feedback to development teams on operational gaps and resiliency improvements
- Participate in capacity planning, system design consulting, and launch reviews
- Maintain and monitor services for availability, latency, and system health
- Lead CI/CD pipeline support and DevOps best practices adoption
- Scale systems sustainably through automation and continuous improvement
- Participate in on-call rotations and occasional off-hours support

Must-Have Skills
- Linux administration and certificate renewals
- Shell scripting
- ITIL / ITSM process knowledge
- SQL and application troubleshooting skills
- Experience with monitoring tools (preferably Splunk or Dynatrace)
- Jenkins CI/CD with Groovy scripting and YAML
- Basic Git/Bitbucket usage
- Networking knowledge (F5, load balancers, HSM, security keys, SSL/TLS certificates)

Good-to-Have Skills
- Knowledge of payment flows, switching, settlements, and authorization flows
- Event framework architecture
- Basic experience with Ansible or Chef

Posted Date not available
