4 Monitoring/Observability Tools Jobs

JobPe aggregates job listings for easy access; applications are submitted directly on the original job portal.

2.0 - 6.0 years

0 Lacs

Kerala

On-site

We are hiring! Founded by alumni from IIT and ISB, Osfin.ai is an emerging global B2B SaaS firm on a mission to automate the complex financial operations that enterprises face daily. We partner with global banks, premier fintechs, leading e-commerce platforms, and the next generation of unicorns to transform their FinOps processes, driving operational and commercial excellence. We are looking for professionals who are analytical, proactive, and self-driven, and who thrive in fast-paced environments. At Osfin.ai, ownership and responsibility are valued as much as technical expertise.
As a Site Reliability Engineer at Osfin.ai, you will ensure the reliability, scalability, and performance of our globally distributed SaaS platform. You will design and implement monitoring, alerting, and incident response systems to maintain 24/7 availability. Collaborating with engineering and product teams, you will embed reliability best practices into the development lifecycle. You will also drive automation of deployments, infrastructure provisioning, and system health checks. Improving the resilience of microservice-based architectures, ensuring fault tolerance and rapid recovery, will be crucial. You will also establish SLOs/SLIs/SLAs and continuously optimize for better reliability and customer experience.
To be considered for this role, you should have 2-4 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering. Strong hands-on experience with cloud-native platforms such as AWS, GCP, or Azure, and with container tooling such as Kubernetes and Docker, is required. Expertise in databases like Oracle RDBMS, MS-SQL, or similar, with experience in scaling and performance tuning, is essential. Proficiency in monitoring/observability tools such as Prometheus, Grafana, ELK/EFK, and Datadog is a must. A solid understanding of CI/CD pipelines, configuration management, and infrastructure-as-code using tools like Terraform or Ansible is expected. Knowledge of microservices architecture, system design, and distributed systems reliability is highly desirable. A strong problem-solving mindset with the ability to handle on-call responsibilities is crucial.
Join us at Osfin.ai to work with a high-caliber team of industry experts driving impact in the global FinOps space. We offer competitive compensation and a generous ESOP program. You will have exposure to leading global customers across fintech, banking, and e-commerce. Experience a collaborative and innovative culture where your contributions directly shape the future of financial automation.

Posted 20 hours ago

8.0 - 12.0 years

0 Lacs

Karnataka

On-site

As a member of the SSE Site Reliability Engineering team at Cisco, you will be responsible for the deployment, end-to-end monitoring, observability, automation, compliance, and reporting of Cisco Secure Access services globally. The team is spread across multiple locations, including India, Europe, Canada, and the United States of America. In this role, you will play a critical part in expanding capabilities to manage a larger fleet and support faster troubleshooting of customer issues.
Your responsibilities will include designing, implementing, and maintaining observability solutions for cloud-native applications and infrastructure. You will develop and optimize diagnostics tooling to identify and resolve system or application-level issues efficiently. Monitoring cloud infrastructure for uptime, performance, and scalability, and responding promptly to incidents and outages, will be a key part of your role. Collaboration with development, operations, and support teams to drive improvements in system observability and troubleshooting workflows is essential. You will lead root cause analysis for major incidents, driving long-term fixes to prevent recurrence, and resolve customer-facing operational issues in a timely and effective manner. Automation of operational processes and incident response tasks will be necessary to reduce manual interventions and improve efficiency. Continuous assessment and improvement of cloud observability tools, integrating new features and technologies where necessary, are also part of the role. Additionally, creating and maintaining comprehensive documentation on cloud observability frameworks, tools, and processes is vital.
You will collaborate with various teams, including Software Engineering, Product Management, DevOps, Infrastructure, Customer Support, Security and Compliance, Global Network Operations, and Data Analytics and Reporting. This collaboration will involve ensuring new features and services are reliable, scalable, and observable; engaging with product managers to incorporate reliability and performance into product roadmaps; working with customer support teams to diagnose and resolve incidents; maintaining compliance with industry standards; supporting service reliability and incident response; and creating meaningful dashboards and reports for insights.
To be successful in this role, you should have a Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience, along with 8+ years of experience in cloud engineering, SRE, or DevOps. Expertise with cloud platforms and related monitoring/observability tools, strong experience with diagnostics and troubleshooting tools for cloud services, proficiency in scripting languages and infrastructure-as-code, experience in operational incident management, a solid understanding of containerization and microservices architecture, knowledge of network performance monitoring and debugging techniques, and a desire to solve complex problems are essential. Proactive communication skills and the ability to collaborate with Engineering teams are also crucial for this role.
At Cisco, you will be part of a diverse and inclusive culture that values innovation, creativity, and equality for all. The company focuses on digital transformation, implementing change in digital businesses, and creating solutions that power how humans and technology work together across the physical and digital worlds. Cisco encourages a culture of giving and taking, accountability, bold steps, diversity of thought, and dedication to equality. As part of the team, you will have the opportunity to work on meaningful solutions that have a global impact and revolutionize how data and infrastructure connect and protect organizations in the AI era and beyond. Cisco provides unparalleled security, visibility, and insights across the entire digital footprint, offering limitless opportunities to grow and build solutions that make a difference on a global scale.

Posted 1 week ago

8.0 - 12.0 years

0 Lacs

Karnataka

On-site

As an experienced professional in Site Reliability Engineering (SRE) or Production Support roles, your primary responsibility will be to ensure the reliability and availability of Boomi integration platforms in Linux environments. You will play a crucial role in installing, configuring, monitoring, and managing these platforms to guarantee optimal performance.
In this role, you will design and implement monitoring and alerting solutions to proactively identify and address infrastructure issues before they impact service levels. Root cause analysis for incidents will be a key aspect of your responsibilities, enabling you to facilitate rapid recovery and implement preventive measures effectively. Automation will be a core focus, as you will be expected to develop scripts and tools to streamline operational tasks, enhance efficiency, and minimize manual interventions. Furthermore, you will collaborate with cross-functional teams to diagnose and resolve critical production issues related to Boomi integrations. Maintaining operational documentation, including runbooks, playbooks, and standard operating procedures (SOPs), will be essential to ensure seamless operations. Regular health checks, maintenance activities, and patch management for Boomi environments will also fall under your purview.
Your expertise in system architecture, deployment processes, and performance tuning will be leveraged to identify areas for improvement and implement necessary enhancements. Collaboration with development teams, cloud operations, network teams, and other integration teams will be crucial to implementing performance optimizations and best practices effectively. Staying abreast of the latest developments in Boomi and cloud technologies will be expected, and you will be required to apply relevant advancements to the infrastructure. Maintaining comprehensive documentation of systems, processes, and procedures will be essential for knowledge sharing and operational continuity. Additionally, you will be responsible for preparing and presenting regular reports on system performance, incident response, and reliability metrics.
Experience in cloud platforms such as AWS, Azure, or GCP, as well as familiarity with monitoring and observability tools like Datadog and Grafana, will be advantageous in this role. If you have 8+ years of experience in SRE, Production Support, or related roles with a specific focus on Boomi platform administration, we invite you to apply and contribute your expertise to our team.

Posted 3 weeks ago

3.0 - 7.0 years

0 Lacs

Pune, Maharashtra

On-site

Rojo Integrations, a comprehensive SAP integration leader, was founded in 2011. Partnering with top software vendors like SAP, Coupa, SnapLogic, and Solace, Rojo specializes in seamless enterprise integration and data analytics solutions. Trusted by global blue-chip companies such as Heineken and Siemens, Rojo delivers tailored services to meet unique business needs. The company is headquartered in the Netherlands and operates globally from offices in the Netherlands, Spain, and India, focusing on SAP integration modernization and business processes to improve data integration and business strategies. Rojo's portfolio includes consultancy, software development, and managed services to streamline integration, enhance observability, and drive growth.
The Rojo Managed Services team ensures customer satisfaction with real-time monitoring, error reporting, troubleshooting, and active performance improvements. The team aims to prevent incidents and provide sustainable solutions promptly while tackling new challenges daily. Join the team of puzzlers and contribute to solving the next big challenge.
To succeed in this role, you should have 3-6 years of experience within an IT organization, preferably in Integration Support. Knowledge of monitoring/observability tools like Splunk or Datadog and of leading integration platforms such as SAP CI, SnapLogic, or MuleSoft is essential. A passion for technology and programming, professional English proficiency, a strong customer service orientation, and the ability to work in a diverse, global 24/7 team are required. Familiarity with Event Driven Architecture and past experience with applications like Salesforce, AWS, Snowflake, MS Dynamics CRM, and other ERP tools is beneficial. Basic programming experience, familiarity with JIRA service desk, and flexibility to work rotational/flexible/weekend shifts in a hybrid work environment are necessary. Immediate joiners are preferred.
Additional desired skills include a bachelor's degree in Computer Science, Software Engineering, or equivalent; analytical skills; a continuous improvement mindset; the ability to work according to procedures and best practices; and autonomy in decision-making.
Rojo offers the opportunity to gain work experience in a dynamic environment with growth opportunities, innovative projects, and a supportive learning environment. Training, mentoring, an international atmosphere, and a diverse working climate are provided, along with exciting region-specific benefits. If you are interested in this opportunity, apply now: Rojo values diversity and encourages applicants who may not meet all criteria to still apply.

Posted 1 month ago
