
1121 Monitoring Tools Jobs - Page 43

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

4.0 - 5.0 years

12 - 15 Lacs

noida, gurugram

Work from Office

The Internet of Things (IoT) will unlock trillions of dollars in value over the next 10 years as 50 billion devices are brought online. Aeris is at the forefront of this industry, building networks and applications that enable Fortune 500 clients like Chrysler, Honda and Bosch to fundamentally improve their businesses. We are headquartered in Silicon Valley, with offices in Bucharest, Chicago, London, Delhi, Bangalore, Helsinki, and Tokyo as well as other markets. We rank among the top ten cellular providers for the IoT globally, powering critical projects across energy, transportation, retail, healthcare and more. Built from the ground up for IoT and road-tested at scale, Aeris IoT Services are based on the broadest technology stack in the industry, spanning connectivity up to vertical solutions. As veterans of the industry, we know that implementing an IoT solution can be complex, and we pride ourselves on making it simpler.

Our company is in an enviable spot. We’re profitable, and both our bottom line and our global reach are growing rapidly. We’re playing in an exploding market where technology evolves daily and new IoT solutions and platforms are being created at a fast pace.

A few things to know about us:
- We put our customers first. When making decisions, we always seek to do what is right for our customer first, our company second, our teams third, and individual selves last.
- We do things differently. As a pioneer in a highly competitive industry that is poised to reshape every sector of the global economy, we cannot fall back on old models. Rather, we must chart our own path and strive to out-innovate, out-learn, out-maneuver and out-pace the competition on the way.
- We walk the walk on diversity. We’re a brilliant and eclectic mix of ethnicities, religions, industry experiences, sexual orientations, generations and more – and that’s by design. We see diverse perspectives as a core competitive advantage.
- Integrity is essential. We believe in doing things well – and doing them right. Integrity is a core value here: you’ll see it embodied in our staff, our management approach and growing social impact work (we have a VP devoted to it). You’ll also see it embodied in the way we manage people and our HR issues: we expect employees and managers to deal with issues directly, immediately and with the utmost respect for each other and for the Company.
- We are owners. Strong managers enable and empower their teams to figure out how to solve problems. You will be no exception, and will have the ownership, accountability and autonomy needed to be truly creative.

Aeris is looking for an experienced and visionary Observability Engineer to join our Infrastructure and Operations team. In this role, you will be responsible for designing and implementing robust observability solutions that provide deep insights into our systems, applications, and network infrastructure.

Key Responsibilities

Observability Systems Management:
- Design, deploy, and maintain observability tools and platforms, including monitoring, logging, and tracing systems.
- Ensure optimal configuration and performance of observability tools such as Prometheus, Loki, Grafana, ELK stack (Elasticsearch, Logstash, Kibana), Jaeger, and cloud (AWS/GCP/Azure) observability tools.

Monitoring and Alerting:
- Develop and manage dashboards using Kibana/Grafana and set up alerts with ElastAlert and Prometheus Alertmanager to monitor the health and performance of applications and infrastructure.
- Implement robust alerting mechanisms to detect and notify of anomalies, outages, and system performance issues in real time.

Logging and Tracing:
- Implement centralized logging solutions to aggregate logs from various systems and applications.
- Develop and maintain distributed tracing solutions to provide end-to-end visibility into system transactions.

Performance Analysis and Optimization:
- Analyze system performance metrics and identify bottlenecks and performance degradation.
- Understanding of SLOs and SLIs.
- Work with development and operations teams to remediate performance issues and optimize system performance.

Automation and Scripting:
- Create automation scripts to streamline observability tasks and processes.
- Develop self-healing mechanisms through automated incident response.

Collaboration and Communication:
- Work closely with development, operations, and SRE teams to align observability solutions with business and technical requirements.
- Provide guidance and training on observability tools and best practices to other team members.

Documentation and Reporting:
- Create and maintain detailed documentation for observability systems, processes, and procedures.
- Generate periodic reports and dashboards to provide insights into system performance and reliability.

Qualifications and Experience

Education: Bachelor's degree in Computer Science, Information Technology, or a related field. Advanced degree preferred.

Experience:
- Minimum of 4 years of experience in IT infrastructure, with at least 3 years in an observability or monitoring role.
- Proven experience in observability engineering, including deploying and managing observability solutions.
- Experience with monitoring tools (e.g., Prometheus, Grafana), logging tools (e.g., ELK stack), and tracing tools (e.g., Jaeger, OpenTelemetry).
- Experience with cloud platforms such as AWS, Azure, or Google Cloud, and databases like MySQL.

Technical Skills:
- Strong understanding of observability concepts including metrics, logging, and tracing.
- Proficiency in scripting languages such as Bash, Python, Perl, or Go.
- Familiarity with containerization (e.g., Docker), orchestration tools (e.g., Kubernetes), and CI/CD pipelines.
- Understanding of IP networking and monitoring of network devices (e.g., routers, firewalls).
- Experience with infrastructure as code tools (e.g., Terraform, Ansible).

Soft Skills:
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work independently and in a team-oriented environment.

Preferred Qualifications:
- Experience with APM tools like New Relic, Datadog, or Dynatrace.
- Knowledge of service mesh technologies (e.g., Istio).
- Open-source contributions or relevant certifications in observability tools and methodologies.

What is in it for you?
- You get to build the next leading edge connected vehicle platform and internet of things platform
- The ability to collaborate with our highly skilled groups who work with cutting edge technologies
- High visibility as you support the systems that drive our public facing services
- Career growth opportunities


4.0 - 5.0 years

8 - 11 Lacs

gurugram

Work from Office

Position Overview: We are seeking an SRE to join our high-impact platform engineering team. You will maintain SLAs for real-time services deployed across hybrid clouds and Kubernetes clusters, contributing to automation, observability, and availability goals.

Roles and Responsibilities:
- Monitor application and infrastructure metrics; build dashboards and alerts (Prometheus, Grafana, ELK).
- Automate health checks, incident remediation, and reliability guardrails.
- Manage on-call rotations, conduct root cause analysis, and implement postmortem action plans.
- Define and track SLOs, SLIs, and error budgets.
- Use chaos engineering and resilience testing to ensure fault tolerance.

Must Have Skills:
- 4 - 5 years of experience in managing production-grade Kubernetes clusters and cloud-native platforms.
- Proficiency in Linux system internals, containers, and networking.
- Scripting/automation expertise in Python/Go/Shell.
- Familiarity with incident management, runbooks, and observability standards.
- Exposure to service discovery, DNS routing, and load balancing is a bonus.

Qualification: BE/BTech/MCA/ME/MTech/MS in Computer Science or a related technical field or equivalent practical experience.


4.0 - 5.0 years

8 - 12 Lacs

gurugram

Work from Office

Position Overview: We are looking for a Mid-Level Kubernetes Administrator to support and maintain our on-premises container orchestration infrastructure built on open-source Rancher Kubernetes. This role will focus on day-to-day cluster operations, deployment support, and working closely with DevOps, Infra, and Application teams.

Roles and Responsibilities:
- Manage Rancher-based Kubernetes clusters in an on-premises environment.
- Deploy and monitor containerized applications using Helm and Rancher UI/CLI.
- Support pod scheduling, resource allocation, and namespace management.
- Handle basic troubleshooting of workloads, networking, and storage issues.
- Monitor and report cluster health using Prometheus, Grafana, or similar tools.
- Manage users, roles, and access using Rancher-integrated RBAC.
- Participate in system patching, cluster upgrades, and capacity planning.
- Document standard operating procedures, deployment guides, and issue resolutions.

Must Have Skills:
- 4 - 5 years of experience in Kubernetes administration in on-prem environments.
- Hands-on experience with Rancher for managing K8s clusters.
- Working knowledge of Linux system administration and networking.
- Experience in Docker, Helm, and basic YAML scripting.
- Exposure to CI/CD pipelines and Git-based deployment workflows.
- Experience with monitoring/logging stacks (Prometheus, Grafana).

Good to Have Skills:
- Certified Kubernetes Administrator (CKA).
- Familiarity with RKE (Rancher Kubernetes Engine).
- Experience with bare metal provisioning, VM infrastructure, or storage systems.

Qualification: BE/BTech/MCA/ME/MTech/MS in Computer Science or a related technical field or equivalent practical experience.


3.0 - 6.0 years

9 - 18 Lacs

noida

Hybrid

Position Summary: The DevOps Cloud Infrastructure Engineer L2 plays a critical role in designing, implementing, maintaining, and supporting robust cloud infrastructure solutions within a fast-paced, innovative enterprise environment. This advanced position is tailored for an experienced professional with in-depth knowledge of cloud technologies, automation, infrastructure as code (IaC), and continuous integration and continuous deployment (CI/CD) pipelines, along with foundational networking expertise. The Level 2 designation signifies senior-level responsibilities, including ownership of complex technical issues, mentoring junior staff, and shaping the cloud infrastructure strategy for the organisation.

Key Responsibilities:
- Cloud Infrastructure Design and Implementation: Architect and deploy scalable, highly available, and fault-tolerant cloud environments using industry-leading platforms such as AWS, Azure, or Google Cloud Platform (GCP).
- Automation and Configuration Management: Develop, maintain, and improve infrastructure automation using tools such as Terraform, CloudFormation, Ansible, or similar. Ensure systems can be deployed, configured, monitored, and managed via code.
- CI/CD Pipeline Development and Maintenance: Build and optimise CI/CD pipelines using tools like Jenkins, GitLab CI, Azure DevOps, or similar, to ensure rapid and reliable deployment of software and infrastructure changes.
- Monitoring, Logging, and Observability: Implement and manage monitoring, alerting, and logging solutions (e.g., Prometheus, Grafana, ELK/EFK Stack, CloudWatch, Stackdriver). Ensure high visibility into system operations and the ability to proactively respond to incidents.
- Incident Response and Troubleshooting: Lead the investigation and resolution of complex infrastructure and application issues. Serve as an escalation point for Level 2 engineers. Perform root cause analysis and drive continuous improvement initiatives.
- Security and Compliance: Implement and enforce cloud security best practices, including identity and access management (IAM), encryption, vulnerability management, and network segmentation. Contribute to compliance efforts (e.g., ISO 27001, SOC 2, GDPR) as required.
- Cost Optimisation: Monitor cloud usage and spending; recommend and implement strategies to manage and reduce costs without compromising performance or security.
- Networking: Design and manage cloud networking components such as VPCs, subnets, security groups, firewalls, and VPNs. Apply basic networking principles such as IP addressing, routing, DNS, TCP/IP, and load balancing to ensure secure and efficient connectivity.
- Documentation: Create and maintain technical documentation for infrastructure design, processes, and troubleshooting guides. Ensure knowledge is shared across the team.
- Mentorship and Collaboration: Mentor junior DevOps engineers and collaborate closely with developers, security teams, and other stakeholders to deliver high-quality solutions. Participate in code reviews, architectural discussions, and cross-functional meetings.
- Continuous Improvement: Research and recommend new technologies, tools, and practices that improve reliability, performance, and developer productivity. Contribute to a culture of innovation and learning.

Qualifications and Experience:
- Education: Bachelor's degree in Computer Science, Information Technology, Engineering, or equivalent professional experience.
- Experience: Minimum 3 years of relevant experience in DevOps, Cloud Engineering, or System Administration roles, with demonstrable expertise in cloud infrastructure at scale.
- Certifications (Preferred): Certifications such as AWS Certified Solutions Architect, Azure Solutions Architect Expert, Google Professional Cloud Architect, or equivalent.
- Networking Knowledge: Understanding of basic networking concepts, including but not limited to IP addressing, subnetting, routing, firewalls, DNS, and VPNs.
- Cloud Platform Proficiency: Advanced experience with at least one major cloud provider (AWS, Azure, GCP), including services such as compute, storage, networking, security, and database.
- Automation Tools: Proficient in infrastructure automation and configuration management tools (e.g., Terraform, Ansible, Puppet, CloudFormation).
- CI/CD Tools: Working knowledge of pipeline tools such as Jenkins, GitLab CI, Azure DevOps, or similar.
- Monitoring & Logging: Experience implementing and managing observability platforms (e.g., Prometheus, Grafana, ELK/EFK, cloud-native solutions).
- Scripting: Proficiency in scripting languages such as Python, Bash, PowerShell, or similar for automation and orchestration.
- Operating Systems: Advanced knowledge of Linux/Unix and Windows server environments.
- Security Best Practices: Experience with IAM, encryption, vulnerability management, and incident response in cloud environments.

This position is based in Noida, India. 3 days in office required. Work hours: 9:00 am – 6:00 pm IST


2.0 - 4.0 years

7 - 14 Lacs

bengaluru

Work from Office

We are looking for a skilled Java Developer / SRE who is passionate about building, maintaining, and optimizing high-performance systems. The ideal candidate will have strong expertise in Java coding, troubleshooting, and debugging, along with hands-on experience in DevOps practices, Kubernetes, CI/CD pipelines, and monitoring tools.

Key Responsibilities:
- Develop, maintain, and optimize Java-based applications.
- Troubleshoot and debug production and development issues efficiently.
- Collaborate with DevOps teams to automate builds, deployments, and system monitoring.
- Manage and operate Kubernetes clusters for scalable and resilient applications.
- Set up and maintain CI/CD pipelines for seamless code integration and delivery.
- Implement and maintain monitoring and alerting solutions for system health and performance.
- Participate in incident response, root cause analysis, and continuous improvement.

Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
- Strong proficiency in Java with proven debugging and problem-solving skills.
- Experience with Kubernetes and containerization (Docker).
- Knowledge of CI/CD tools (e.g., Jenkins, GitLab CI, GitHub Actions).
- Familiarity with monitoring tools (Prometheus, Grafana, ELK, Datadog, or similar).
- Good understanding of cloud platforms (AWS, GCP, or Azure).
- Strong collaboration and communication skills.


8.0 - 13.0 years

20 - 30 Lacs

chennai

Hybrid

Role: Senior DevOps Engineer
Location: Chennai
Experience: 8+ Years

Roles and Responsibilities:
- Provision and manage resources in different cloud environments
- Recommend and implement cloud architecture best practices
- Understand and design plans for cloud migrations as per the project requirements
- Contribute to build automation and software deployments on various types of cloud environments
- Provide necessary support to the development team in achieving the project deliverables
- Collaborate closely with the development team and provide timely updates and status reports
- Have a strong problem-solving attitude and resolve issues according to their priority
- Make the infrastructure more resilient

Required Technical and Professional Expertise:
- Minimum 10 years of experience as a DevOps engineer
- Experience with a public cloud provider like Amazon Web Services
- Good experience in DevOps and Agile practices
- Hands-on experience with a Continuous Integration tool like Jenkins, GitLab, etc.
- Expert in scripting and automation using Shell, Python, PowerShell, etc.
- Strong in Docker and experienced in containerization
- Hands-on experience in Kubernetes
- Good experience in Linux distributions and open-source DevOps tools
- Experience in using the Git version control system
- Strong in a configuration management tool such as Ansible or Chef
- Experience with the infrastructure automation tool Terraform
- Strong problem-solving attitude
- Good written and verbal communication skills, with the ability to document and communicate technical information to IT professionals
- Ability to take ownership and accountability of DevOps day-to-day activities

Mandatory Skills:
- Cloud: AWS (VPC, EKS, SQS, Cognito, Lambda, Step Functions, etc.)
- CI/CD: Jenkins | GitLab | GitHub | any other CI/CD tool
- Scripting language: Shell | Python | PowerShell (minimum 1)
- Work experience in application architecture and productionizing the system
- Docker
- DB knowledge
- Kubernetes
- Infrastructure as Code: Terraform | AWS CloudFormation (minimum 1)
- Configuration Management: Ansible | Chef (minimum 1)
- Monitoring: Prometheus, Zabbix, New Relic, Splunk, Amazon CloudWatch (minimum 1)


1.0 - 2.0 years

3 - 5 Lacs

chennai

Work from Office

Role & responsibilities
- Monitor Cloud Infrastructure: Continuously monitor cloud environments (AWS, Azure, and GCP) using Datadog, Apptio, and Nagios to ensure optimal performance and availability.
- Incident Management: Detect, analyse, and resolve Cloud/Infrastructure issues in a timely manner to minimize downtime and impact on services.
- Performance Tuning: Identify and implement optimizations to improve cloud infrastructure performance.
- Reporting: Generate and analyse monitoring reports to provide insights and recommendations for infrastructure improvements.
- Collaboration: Work closely with development, DevOps, and Cloud/Infrastructure teams to ensure seamless integration and performance of cloud services.
- Documentation: Maintain comprehensive documentation of monitoring setups, procedures, and best practices.
- Compliance: Ensure all monitoring practices adhere to industry standards and compliance requirements.

Required Qualifications
- Experience: Minimum of 1 to 2 years of experience in cloud monitoring or a related field.
- Tools and Technologies: Proficient in using Datadog, Grafana, and Nagios for monitoring and analysis.
- Cloud Platforms: Strong knowledge of AWS, Azure, and GCP services and architecture.
- Scripting and Automation: Experience with scripting languages (e.g., Python, Shell) and automation tools.
- Incident Response: Proven experience in incident management and resolution.
- Analytical Skills: Strong analytical and problem-solving skills.
- Communication: Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders.
- OS Competency: Linux & Windows
- Microservices: Docker

Preferred Qualifications
- Experience with Other Monitoring Tools: Familiarity with other monitoring tools and platforms is a plus.
- DevOps Practices: Understanding of DevOps principles and practices.


2.0 - 5.0 years

2 - 3 Lacs

hyderabad

Work from Office

- Plan production based on incoming orders
- Prioritize operations to maximize output, minimize delays
- Coordinate workflow, workforce & materials via MRP
- Monitor progress & resolve issues
- Collaborate with engineering, quality & supply chain teams

Required Candidate profile
- Knowledge in production & BOM entry to ERP
- Strong Problem-Solving & Decision-Making
- Data-savvy with Analytics & Reporting
- Leadership, Adaptability & Communication skills


7.0 - 10.0 years

5 - 13 Lacs

bengaluru

Hybrid

Hiring for Sr. Support Analyst at Bangalore location
Role: Sr. Support Analyst
Exp: 7 - 9 Years
Job location: Bangalore
Work Mode: Hybrid

Responsibilities:
The APS team member will work on standard banking software and in-house applications, and is responsible for providing production support, maintenance of key application platforms, and deployment within the GM TP APS domain.
- Work as level 1/2 support and be an escalation point to whom others in the team bring technical and product issues for resolution.
- Monitor the production environment and act proactively to prevent performance issues or application crashes.
- Resolve support issues using technical expertise, with the flexibility to look for out-of-the-box solutions.
- Handle ITIL methodologies such as Change, Incident, Problem, and Service Management.
- Monitor the night batch and ensure reports are generated correctly and transferred to the client within the defined SLA.
- Monitor recurrent incidents, perform problem management, and escalate to the next level of support or the development team when required.
- Coordinate with Infrastructure teams during server patching and upgrades to ensure the applications are stable and running after the infra work.
- Analyze and document problems, recommend solutions, and initiate corrective action.
- Provide coaching and mentoring to junior colleagues, transferring skills and expertise as required.

Technical & Behavioral Competencies:
- UNIX: Knowledge of the operating system; knowledge of shell scripting
- Scheduler & Monitoring Tools: Knowledge of schedulers (Crontab, Autosys); knowledge of the Geneos tool and Dynatrace
- Scripting: Knowledge of DOS and shell scripting
- WebLogic: Knowledge of MQ and JMS
- Databases: Knowledge of SQL queries; knowledge of troubleshooting


4.0 - 9.0 years

9 - 19 Lacs

hyderabad

Work from Office

• Continuous 24x7 infrastructure monitoring via NMS tools (SolarWinds, ManageEngine, Nagios, etc.)
• Monitor:
  o AS400 systems
  o Database servers (SQL, Oracle, etc.)
  o Application servers
• Raise and track alerts/tickets for system anomalies.


0.0 - 2.0 years

1 - 2 Lacs

gurugram

Work from Office

Role & responsibilities:

1. Ticket Management
- Review, triage, and assign tickets raised by employees.
- Track ticket status and ensure timely resolution.
- Maintain documentation of recurring issues for knowledge sharing.

2. System Monitoring
- Monitor dashboards and alerts to identify performance or system issues.
- Correlate alerts with tickets and incidents.
- Work with engineering/IT teams to resolve issues proactively.

3. Communication & Coordination
- Serve as a liaison between employees and technical teams.
- Provide updates on ticket status and resolution timelines.
- Escalate high-priority issues promptly.

4. Reporting & Process Improvement
- Generate reports on ticket volumes, resolution times, and system health.
- Recommend improvements in processes or monitoring setups.

Preferred candidate profile:
- Relevant technical experience.
- Hands-on Jira experience: creating, assigning, tracking, and reporting on tickets.
- Grafana familiarity: reading dashboards, interpreting metrics, and responding to alerts.
- Basic understanding of IT infrastructure, networking, and application performance.
- Exposure to other ITSM tools (ServiceNow, Freshservice, Zendesk) is a plus.


6.0 - 11.0 years

14 - 24 Lacs

kochi, bengaluru, thiruvananthapuram

Work from Office

Experience: minimum 5 to maximum 8 years.

A Site Reliability Engineer is a professional who acts as a warrior to monitor and protect customer applications, taking charge of operational tasks to ensure the efficient functioning of a system. They are responsible for monitoring, automating, and improving the reliability, performance, and availability of applications.

- Working experience as an SRE Lead or in a techno-functional Site Reliability Engineer (SRE) role at a customer work location in the e-commerce domain is mandatory.
- Must have knowledge of production application support.
- Working experience in interacting with teams, onsite staff, and customers who provide 24x7 coverage, with help and guidance during India's night coverage.
- Should know how to gather SRE requirements from the customer, covering both technical and non-technical aspects.
- Must have excellent knowledge of ensuring reliability and scalability of applications.
- Should have excellent automation skills to automate repetitive tasks and reduce false alarms using Python and/or other languages.
- Working experience in gathering requirements on the health of applications, the services to monitor, and setting service levels.
- Must have Level 1, Level 2 and Level 3 support experience on eCommerce platforms.
- Hands-on experience in monitoring, logging, alerting, dashboarding, and report generation in monitoring tools such as AppDynamics, Splunk, Dynatrace, Datadog, CloudWatch, ELK, Prometheus, or New Relic.
- Must have knowledge of the ITIL framework, specifically alerts, incident and change management, CAB, production deployments, risk and mitigation plans, SLA, and SLI.
- Should be able to lead P1 calls, brief the customer about the P1, and proactively bring leads and customers into the P1 calls until RCA.
- Experience working with Postman.
- Should have knowledge of building and executing SOPs and runbooks, and of handling ITSM platforms (JIRA/ServiceNow/BMC Remedy).
- Must know how to work with the Dev team and cross-functional teams across time zones.
- Should be able to generate WSR/MSR by extracting the tickets from ITSM platforms.

Posted Date not available

Apply

4.0 - 9.0 years

8 - 10 Lacs

kolkata

Work from Office

SUMMARY Job Title: LoadRunner Performance Test Engineer Experience: 4-5 Years Location: Kolkata Must-Have: The candidate should have 3 years of relevant experience in LoadRunner Performance Testing Key Responsibilities: Develop and implement performance test scripts using LoadRunner. Plan and execute load, stress, and endurance tests to assess system performance. Analyze test results, pinpoint bottlenecks, and offer tuning recommendations. Collaborate with development, QA, and infrastructure teams to resolve performance issues. Create and maintain performance test documentation, including test plans, cases, and reports. Monitor system resources (CPU, memory, network, etc.) during test execution. Set up and maintain performance test environments. Required Skills: Proficient in LoadRunner (VuGen scripting, Controller, and Analysis components). Expertise in performance testing concepts (load, stress, endurance, spike testing). Knowledge of protocols like Web (HTTP/HTML), Web Services, and Database. Strong skills in performance analysis, monitoring, and tuning. Ability to interpret performance metrics and logs to identify root causes. Familiarity with performance monitoring tools (e.g., Dynatrace, AppDynamics, Grafana, etc.) preferred. Strong analytical and problem-solving skills. Effective communication skills for cross-functional collaboration. Good to Have: Familiarity with JMeter or other performance testing tools. Experience in CI/CD pipeline integration for performance testing. Educational Qualification: Bachelor’s degree in Computer Science, Engineering, or related field. Requirements: Bachelor’s degree in Computer Science, Engineering, or related field. 4-5 years of experience in performance testing. Strong hands-on experience with LoadRunner. Knowledge of performance monitoring tools (preferred). Good communication and problem-solving skills.

Posted Date not available

Apply

5.0 - 10.0 years

2 - 6 Lacs

surat

Work from Office

Job Title : Kafka Integration Specialist Job Description : We are seeking a highly skilled Kafka Integration Specialist to join our team. The ideal candidate will have extensive experience in designing, developing, and integrating Apache Kafka solutions to support real-time data streaming and distributed systems. Key Responsibilities : - Design, implement, and maintain Kafka-based data pipelines. - Develop integration solutions using Kafka Connect, Kafka Streams, and other related technologies. - Manage Kafka clusters, ensuring high availability, scalability, and performance. - Collaborate with cross-functional teams to understand integration requirements and deliver robust solutions. - Implement best practices for data streaming, including message serialization, partitioning, and replication. - Monitor and troubleshoot Kafka performance, latency, and security issues. - Ensure data integrity and implement failover strategies for critical data pipelines. Required Skills : - Strong experience in Apache Kafka (Kafka Streams, Kafka Connect). - Proficiency in programming languages like Java, Python, or Scala. - Experience with distributed systems and data streaming concepts. - Familiarity with Zookeeper, Confluent Kafka, and Kafka Broker configurations. - Expertise in creating and managing topics, partitions, and consumer groups. - Hands-on experience with integration tools such as REST APIs, MQ, or ESB. - Knowledge of cloud platforms like AWS, Azure, or GCP for Kafka deployment. Nice to Have : - Experience with monitoring tools like Prometheus, Grafana, or Datadog. - Exposure to DevOps practices, CI/CD pipelines, and infrastructure automation. - Knowledge of data serialization formats like Avro, Protobuf, or JSON. Qualifications : - Bachelor's degree in Computer Science, Information Technology, or related field. - 4+ years of hands-on experience in Kafka integration projects.

Posted Date not available

Apply

7.0 - 12.0 years

5 - 10 Lacs

visakhapatnam

Work from Office

Job Description: We are seeking a seasoned and proactive IT Lead to oversee and enhance our organization's IT infrastructure, security posture, and end-user computing environment. This leadership role demands deep technical expertise across networking, device management, server architecture, and cybersecurity, along with a proven track record in leading and mentoring IT support and infrastructure teams. You will be responsible for planning, implementing, optimizing, and maintaining all aspects of IT infrastructure and services while also managing a team of 5-8 IT professionals. Your mission is to ensure a reliable, secure, and scalable technology environment that aligns with business goals. Key Responsibilities Infrastructure & Network Management Design, deploy, and manage network infrastructure, including LAN, WAN, Wi-Fi, firewalls, switches, and routers. Ensure high availability and performance of on-premises and cloud-hosted infrastructure. Proactively monitor and remediate performance issues or risks in network and server environments. Security Oversight Define and implement best-in-class cybersecurity practices, including endpoint protection, patching strategies, firewalls, VPNs, and access control. Conduct vulnerability assessments and initiate remediation processes for security gaps. Ensure compliance with data protection standards (e.g., ISO 27001, GDPR, SOC2). Asset & Device Management Oversee lifecycle management of IT assets: procurement, deployment, monitoring, and decommissioning. Implement and manage MDM/EMM solutions to secure and control endpoints (laptops, mobile devices). Enforce standardization of operating environments and configurations across the organization. Server & Systems Administration Manage physical and virtualized servers (Windows/Linux) and cloud infrastructure (e.g., AWS, Azure, GCP). Administer AD, DNS, DHCP, backup systems, and centralized authentication (SSO, LDAP). 
Implement and optimize automation for server and application deployment. Team Leadership & Collaboration Lead, mentor, and develop a team of 5-8 IT engineers and support specialists. Define and track KPIs and SLAs for support and operational excellence. Collaborate with business units and leadership to align IT strategy with organizational goals. IT Governance & Operational Excellence Develop IT policies, SOPs, and disaster recovery plans. Manage service desk operations ensuring timely resolution of IT incidents and requests. Drive continuous improvement initiatives, vendor evaluations, and tool rationalization. Skills: Technical Proficiency 7+ years of experience in IT infrastructure and network administration, with 2+ years in a leadership capacity. Strong expertise in managing firewalls (e.g., Fortinet, Palo Alto), VPNs, and network security tools. Experience with Windows/Linux servers, Active Directory, Office 365, Exchange, and Azure AD. Familiarity with cloud infrastructure, virtual environments (e.g., VMware/Hyper-V), and automation tools. Proficient in IT asset management systems, ticketing platforms, and monitoring tools. Leadership & Communication Demonstrated ability to lead and scale technical teams. Excellent stakeholder management, reporting, and cross-functional collaboration. Strong problem-solving, crisis response, and strategic planning capabilities. Preferred Qualifications Certifications such as CCNA/CCNP, CompTIA Security+, Microsoft Certified: Azure Administrator, or CISSP. Experience working in regulated industries or with compliance requirements. Knowledge of DevSecOps and integrating security into CI/CD pipelines.

Posted Date not available

Apply

2.0 - 4.0 years

2 - 2 Lacs

chennai

Work from Office

We’re seeking a detail-driven Marketing Operations Associate to manage and optimize our inbound lead process. You’ll be responsible for capturing, qualifying, and handing off leads to our Sales team with speed and accuracy,

Posted Date not available

Apply

5.0 - 9.0 years

15 - 30 Lacs

bengaluru

Hybrid

Say hello to possibilities. It’s not every day that you consider starting a new career. We’re RingCentral, and we’re happy that someone as talented as you is considering this role. First, a little about us: we’re the $2 billion global leader in cloud-based communications and collaboration software. We are fundamentally changing the nature of human interaction, giving people the freedom to connect powerfully and personally from anywhere, at any time, on any device. This is where you and your skills come in. We’re currently looking for a Container Operations Engineer. You will be part of a skilled global team responsible for deploying, supporting, and maintaining RingCentral’s containerization platforms. Your work will directly impact the stability, scalability, and reliability of the systems powering mission-critical services. Responsibilities: Deploy, support, and maintain the company’s Kubernetes clusters (on-premises and AWS EKS). Participate in incident response, troubleshooting, and root cause analysis for production issues. Participate in an on-call rotation covering services in India and U.S. 
regions Drive continuous improvements in platform stability, performance, and automation Qualifications: 5+ years of experience with Linux systems administration Understanding of networking fundamentals Hands-on experience with automation tools such as Ansible, GitLab CI/CD, and scripting(Python or similar) Working knowledge of Kubernetes and container orchestration principles Familiarity with 24/7 production support environments, change management, and operational best practices Experience with AWS cloud platforms is a plus Strong problem-solving skills and a collaborative mindset What we offer: Mediclaim benefits Paid holidays Casual/Sick leave Privilege leave Bereavement leave Maternity & Paternity leave Wellness programs & coaching Employee referral bonus Professional development allowances Night shift allowances RingCentral’s Engineering team works on high-complexity projects that set the standard for performance and reliability at massive scale. What kind of scale? Millions of users today and hundreds of millions tomorrow. This is your chance to help imagine, develop and deliver products that raise the technological bar, and power human connections. If you’re a talented, ambitious, creative thinker, RingCentral is the perfect environment to join a world class team and bring your ideas to life. RingCentral’s work culture is the backbone of our success. And don’t just take our word for it: we are recognized as a Best Place to Work by Glassdoor, the Top Work Culture by Comparably and hold local BPTW awards in every major location. Bottom line: We are committed to hiring and retaining great people because we know you power our success. RingCentral offers on-site, remote and hybrid work options optimized for the ways we work and live now. About RingCentral RingCentral, Inc. (NYSE: RNG) is a leading provider of business cloud communications and contact center solutions based on its powerful Message Video Phone™(MVP™) global platform. 
More flexible and cost effective than legacy on-premises PBX and video conferencing systems that it replaces, RingCentral® empowers modern mobile and distributed workforces to communicate, collaborate, and connect via any mode, any device, and any location. RingCentral is headquartered in Belmont, California, and has offices around the world. RingCentral is an equal opportunity employer that truly values diversity. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Posted Date not available

Apply

5.0 - 10.0 years

2 - 6 Lacs

pune

Work from Office

We are seeking a highly skilled Kafka Integration Specialist to join our team. The ideal candidate will have extensive experience in designing, developing, and integrating Apache Kafka solutions to support real-time data streaming and distributed systems. Key Responsibilities : - Design, implement, and maintain Kafka-based data pipelines. - Develop integration solutions using Kafka Connect, Kafka Streams, and other related technologies. - Manage Kafka clusters, ensuring high availability, scalability, and performance. - Collaborate with cross-functional teams to understand integration requirements and deliver robust solutions. - Implement best practices for data streaming, including message serialization, partitioning, and replication. - Monitor and troubleshoot Kafka performance, latency, and security issues. - Ensure data integrity and implement failover strategies for critical data pipelines. Required Skills : - Strong experience in Apache Kafka (Kafka Streams, Kafka Connect). - Proficiency in programming languages like Java, Python, or Scala. - Experience with distributed systems and data streaming concepts. - Familiarity with Zookeeper, Confluent Kafka, and Kafka Broker configurations. - Expertise in creating and managing topics, partitions, and consumer groups. - Hands-on experience with integration tools such as REST APIs, MQ, or ESB. - Knowledge of cloud platforms like AWS, Azure, or GCP for Kafka deployment. Nice to Have : - Experience with monitoring tools like Prometheus, Grafana, or Datadog. - Exposure to DevOps practices, CI/CD pipelines, and infrastructure automation. - Knowledge of data serialization formats like Avro, Protobuf, or JSON. Qualifications : - Bachelor's degree in Computer Science, Information Technology, or related field. - 4+ years of hands-on experience in Kafka integration projects.

Posted Date not available

Apply

9.0 - 13.0 years

22 - 37 Lacs

noida

Hybrid

Principal Site Reliability Engineers must be passionate about learning and evolving with current technology trends. They strive to innovate and are relentless in pursuing a flawless customer experience. They have an “automate everything” mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability. Primary/Essential Duties and Key Responsibilities: • Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning • Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks • Support services, product, and engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response • Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis • Collaborate closely with engineering professionals within the organization to deliver reliable services • Increase operational efficiency, effectiveness, and quality of services by treating operational challenges as a software engineering problem (reduce toil) • Guide junior team members and serve as a champion for Site Reliability Engineering • Actively participate in incident response, including on-call responsibilities • Partner with stakeholders to influence and help drive the best possible technical and business outcomes. 
Qualifications • Engineering degree, or a related technical discipline, or equivalent work experience • Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java) • Knowledge of Cloud based applications & Containerization Technologies • Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing • Demonstrable fundamentals in 2 of the following: Computer Science, Cloud Architecture, Security, or Network Design fundamentals • Working experience with industry standards like Terraform, Ansible (Experience, Education, Certification, License and Training) • Must have at least 7 years of hands-on experience working in Engineering or Cloud • Minimum 5 years' experience with public cloud platforms (e.g. GCP, AWS, Azure) • Minimum 3 years' Experience in configuration and maintenance of applications and/or systems infrastructure for large scale customer facing company • Experience with distributed system design and architecture.

Posted Date not available

Apply

5.0 - 10.0 years

15 - 25 Lacs

bengaluru

Work from Office

Role Overview: We’re seeking a Senior Operations Engineer to lead the stability, scalability, and reliability of our Java-based microservices platform. In this role, you’ll own production incident response, guide system architecture for resilience, and drive SRE practices such as SLOs and error budgets. You’ll also work across teams to scale observability, automate toil, and embed operational excellence in every service. Key Responsibilities: Lead critical incident response efforts and own the resolution of high-severity outages. Drive root cause analysis and long-term reliability improvements. Architect and maintain observability solutions (Datadog, Prometheus). Define and track service-level objectives (SLOs) and participate in error budget planning. Collaborate with developers on the onboarding of new services with operational readiness standards. Eliminate toil through automation and process optimization. Mentor junior and mid-level engineers, helping define team standards and best practices. Participate in compliance and risk reviews related to operational infrastructure. Skills & Qualifications: 5+ years in a technical operations, DevOps, or site reliability engineering role. Deep experience in production operations of microservices (Java/Spring Boot preferred). Expert in Kafka, MongoDB, Kubernetes, Terraform, and CI/CD pipeline management. Strong background in observability (Datadog, ELK, OpenTelemetry) and monitoring best practices. Familiarity with ITIL-aligned processes and a strong bias for DevOps and SRE principles. Leadership presence, ability to influence cross-functional teams, and ability to mentor others.

Posted Date not available

Apply

5.0 - 10.0 years

2 - 6 Lacs

ahmedabad

Work from Office

We are seeking a highly skilled Kafka Integration Specialist to join our team. The ideal candidate will have extensive experience in designing, developing, and integrating Apache Kafka solutions to support real-time data streaming and distributed systems. Key Responsibilities : - Design, implement, and maintain Kafka-based data pipelines. - Develop integration solutions using Kafka Connect, Kafka Streams, and other related technologies. - Manage Kafka clusters, ensuring high availability, scalability, and performance. - Collaborate with cross-functional teams to understand integration requirements and deliver robust solutions. - Implement best practices for data streaming, including message serialization, partitioning, and replication. - Monitor and troubleshoot Kafka performance, latency, and security issues. - Ensure data integrity and implement failover strategies for critical data pipelines. Required Skills : - Strong experience in Apache Kafka (Kafka Streams, Kafka Connect). - Proficiency in programming languages like Java, Python, or Scala. - Experience with distributed systems and data streaming concepts. - Familiarity with Zookeeper, Confluent Kafka, and Kafka Broker configurations. - Expertise in creating and managing topics, partitions, and consumer groups. - Hands-on experience with integration tools such as REST APIs, MQ, or ESB. - Knowledge of cloud platforms like AWS, Azure, or GCP for Kafka deployment. Nice to Have : - Experience with monitoring tools like Prometheus, Grafana, or Datadog. - Exposure to DevOps practices, CI/CD pipelines, and infrastructure automation. - Knowledge of data serialization formats like Avro, Protobuf, or JSON. Qualifications : - Bachelor's degree in Computer Science, Information Technology, or related field. - 4+ years of hands-on experience in Kafka integration projects.

Posted Date not available

Apply

8.0 - 13.0 years

20 - 27 Lacs

bengaluru

Hybrid

Job Title: Site Reliability Engineer Job Description: We are looking for an experienced Senior Cloud DevOps/Site Reliability Engineer (SRE) to join our growing team. This role demands a focus on enhancing the reliability, efficiency, and performance of our infrastructure and hosted applications. You will collaborate with other SRE team members, applying your expertise in coding, algorithms, complexity analysis, and large-scale system design to tackle complex challenges. The ideal candidate will have hands-on experience in deploying and managing Kubernetes clusters, utilizing Grafana and Prometheus for monitoring, and supporting Microsoft SQL Server and VMware virtualization. Expertise in cloud deployments, Windows operating systems, and network technologies is a plus. Prior experience in DevOps, including CI/CD pipeline management and automation, is highly desirable. We emphasize self-directed, impactful work while providing the necessary support and mentorship for professional growth. Key Responsibilities: Deploy, manage, and scale Kubernetes clusters to support containerized applications. Design, implement, and optimize continuous integration and continuous deployment (CI/CD) pipelines to streamline code deployment and enhance delivery efficiency. Implement and maintain monitoring solutions with Kubernetes, Rancher, Grafana, and Prometheus to ensure system reliability. Drive efficiency by automating manual and repetitive tasks to reduce operational costs. Assist in managing and enhancing the entire production stack to deliver reliable systems for a diverse range of internal and external customers. Assist our internal Database teams to optimize and support Microsoft SQL Server databases, including performance tuning and maintenance. Assist our internal Virtualization/Server teams to ensure high availability and efficient resource use in our technology stack. 
Deploy and manage cloud resources on Azure or AWS, focusing on cost efficiency and performance optimization. Support and troubleshoot Windows operating systems and network technologies. Required Skills: Demonstrated ability to automate processes and improve efficiency. Strong focus on managing and optimizing the full production stack for a wide user base. Proven experience as a DevOps Engineer, with a strong foundation in Kubernetes, Rancher, Grafana, Prometheus, Microsoft SQL Server, and VMware Experience in designing, implementing, and managing CI/CD pipelines using tools like Jenkins, GitHub, or similar. Experience with Windows operating systems and network technologies. Qualifications: Bachelor's degree in computer science, Engineering, or a related field. Relevant experience in a Site Reliability Engineer or similar role, with a background in DevOps and CI/CD pipeline management preferred.

Posted Date not available

Apply

4.0 - 9.0 years

8 - 18 Lacs

guwahati

Work from Office

Job Overview: We are seeking a skilled MySQL Database Administrator to join our team in Guwahati. This mid-level position requires a minimum of 4 years of relevant experience. The successful candidate will play a vital role in managing and maintaining our database systems across both on-premises and cloud environments. Qualifications and Skills: Must have 5+ years of relevant experience as a PostgreSQL Core DBA. Mandatory Skills/Knowledge: 1. Excellent knowledge of PostgreSQL installation and configuration, and of troubleshooting replication tools; troubleshoot issues involving Unix-level commands. 2. Fix issues related to backup and recovery using commands; set up migration tools from Oracle to PostgreSQL. 3. Good hands-on troubleshooting of user-related problems such as access management, space management, and generic errors. 4. Database upgrades and migrations in any cloud environment. 5. Different types of replication concepts, such as group replication and multi-master replication. 6. Excellent performance troubleshooting using pgAdmin commands and tools like pgpool, repmgr, and oswatcher. 7. Troubleshoot issues related to database upgrades and migrations; install, configure, and troubleshoot replication tools. 8. Troubleshoot issues involving Unix, storage, or network problems. 9. Knowledge of VCS Cluster; knowledge of Grafana and Kibana will be an added advantage. Mentor and team builder. Reliable, hardworking, and self-motivated; able to work independently or as a team member and to meet deadlines. Adaptable attitude based on the situation.

Posted Date not available

Apply

4.0 - 6.0 years

6 - 10 Lacs

pune

Hybrid

Role & responsibilities The Role: Define strategies for application performance monitoring and optimization in the Prod environment. Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time. Ensure that batch production scheduling and processes are accurate and timely. Ability to create and execute queries against big data platforms and relational data tables to identify process issues or to perform mass updates is preferred. Perform ad hoc requests from users such as data research, file manipulation/transfer, and research of process issues. Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover. Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement. Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns. Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews. Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices. Maintain services once they are live by measuring and monitoring availability, latency, and overall system health. Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity. Work with a global team spread across tech hubs in multiple geographies and time zones. Ability to share knowledge and explain processes and procedures to others. 
Requirements Skills: Must Have: Linux, Kubernetes, ITIL/ITSM, Application Troubleshooting, any monitoring tool (Splunk/Dynatrace preferred), Jenkins (CI/CD). Good To Have: Even Framework architecture Git basic/bit bucket Ansible/Chef- Basic Shell Scripting - Basic SQL Groovy Scripting/Yaml

Posted Date not available

Apply

5.0 - 7.0 years

10 - 13 Lacs

noida

Work from Office

Roles and Responsibilities: We are seeking a proactive and experienced Operations Team Lead to oversee the day-to-day operations of our IT infrastructure. This role requires strong expertise in server management, system monitoring, security compliance, and customer-facing issue resolution. The ideal candidate will lead a team responsible for ensuring high availability, performance, and security of all operational systems. Key Responsibilities: Lead the operations team in managing and maintaining production and staging servers, ensuring optimal uptime, performance, and security. Monitor system health, logs, and metrics using tools like Nagios, Prometheus, Grafana, or the ELK stack, and proactively identify and resolve issues before they impact services. Drive root cause analysis and resolution for incidents and service disruptions, including performance bottlenecks, software bugs, and configuration issues. Ensure secure configuration and operation of infrastructure by enforcing best practices in firewalling, access control, patch management, and vulnerability mitigation. Implement and refine standard operating procedures (SOPs) for system maintenance, monitoring, deployments, and escalation management. Serve as the escalation point for technical issues, customer concerns, and production incidents, coordinating timely resolution and communication. Collaborate with internal engineering, QA, and support teams to streamline operations, improve system resilience, and enhance customer satisfaction. Plan and manage regular system updates, security audits, and failover testing to ensure business continuity. Provide mentorship, training, and performance oversight for team members, promoting a culture of accountability and continuous improvement. Maintain documentation for infrastructure architecture, operational processes, and incident reports. Desired Candidate Profile: Bachelor's degree in Computer Science, Information Technology, or a related field. 
5+ years of experience in IT operations, system administration, or DevOps, including team leadership or supervisory roles. Strong hands-on experience with Linux/Unix server environments, networking concepts, and infrastructure tools. Proficient in troubleshooting, performance tuning, log analysis, and system debugging. Knowledge of monitoring and alerting systems, automation/scripting (e.g., Bash, Python), and infrastructure-as-code tools is a plus. Familiarity with customer support practices, incident communication, and service-level management. Experience working in environments with high availability, disaster recovery, and security compliance requirements. Strong interpersonal and communication skills, with the ability to manage stakeholders, lead teams, and handle customer-facing situations with professionalism. Certifications such as RHCE, AWS SysOps, or CompTIA Security+ are advantageous.

Posted Date not available

Apply


Featured Companies