
1098 Monitoring Tools Jobs - Page 22

JobPe aggregates results for easy access, but you apply directly on the original job portal.

1.0 - 4.0 years

5 - 10 Lacs

Bengaluru

Work from Office

UI Developer. P1: HTML, ReactJS, JavaScript, TypeScript, modular CSS, unit testing, Redux, UI debugging skills, writing efficient, quality code, integrating with backend APIs, and Figma or a similar design tool for UI mocks. P2: logging and monitoring tools such as Quantum Metric, Splunk, and Grafana; UI performance engineering and security.

Posted 1 month ago

Apply

4.0 - 6.0 years

0 Lacs

Noida

Remote

Role Summary: While many vendors treat monitoring as a reactive afterthought, we embed Datadog-trained observability engineers directly into our engineering and operations teams to deliver real-time visibility, proactive tuning, and smarter incident management. We are looking for a highly capable Observability & Monitoring Engineer with 4–6 years of experience in Datadog and related observability practices. The engineer will be at the forefront of transforming how systems are monitored: reducing noise, accelerating root-cause discovery, and enabling smarter, correlated event flows across cloud-native environments.

Core Responsibilities:
- Datadog Ownership: Build and maintain Datadog dashboards, monitors, and SLOs with a focus on business and operational relevance. Configure and tune alerts to eliminate noise and reduce false positives, enabling focused responses and intelligent routing.
- Proactive Monitoring & Alert Tuning: Implement proactive alert strategies based on usage patterns and event behavior. Continuously optimize thresholds, baselines, and anomaly detection logic to ensure actionable monitoring signals.
- Observability & Root-Cause Analysis (RCA): Correlate metrics, logs, and traces across distributed systems to enable rapid root-cause triangulation. Drive investigations from high-CPU alerts to middleware issues such as queue overloads, using Datadog APM and tracing.
- Integrated Support & Event Correlation: Work closely with L2/Smart L3 and platform teams to support event correlation, AWS incident flows, and CI/CD telemetry. Participate in day-to-day IT operations, functional system support, and incident escalation workflows.
- SAP CPI API Monitoring: Build and maintain targeted dashboards for SAP CPI APIs to ensure availability, throughput, and performance visibility.

What Makes This Role Unique:
- You are embedded in the core delivery team, not isolated in a separate monitoring silo.
- You work on proactive monitoring, not just reacting to alerts.
- You support a platform aligned with Smart's tooling and architecture, including high-frequency CI tracing and real-time AWS integration.
- You help evolve how we define "observability maturity" by integrating it deeply into development and ops workflows.

Required Skills & Experience:
- 4–6 years of experience in observability, SRE, or DevOps roles with strong exposure to Datadog.
- Experience configuring and managing Datadog dashboards, monitors, APM, and logs.
- Deep understanding of observability principles: metrics, logs, distributed traces, RUM, and synthetic monitoring.
- Experience tracing infrastructure or application alerts (e.g., CPU, latency) to actual service- or middleware-level bottlenecks.
- Familiarity with cloud platforms such as AWS (preferred), Azure, or GCP.
- Hands-on experience in event management, incident support, and RCA documentation.
- Exposure to SAP CPI monitoring or other enterprise integration middleware is a plus.

What You'll Get:
- The opportunity to redefine observability in a modern, fast-paced environment.
- Ownership of critical monitoring pipelines and real-time troubleshooting tools.
- Work with global engineering and platform teams to drive performance and reliability.
- A flexible work environment and access to upskilling resources.
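The alert-tuning work this listing describes (optimizing thresholds, baselines, and anomaly detection to cut noise) can be illustrated with a minimal sketch. The function name and the 3-sigma rule below are illustrative assumptions for a rolling-baseline check, not Datadog's actual anomaly algorithm:

```python
import statistics

def is_anomalous(history, value, n_sigma=3.0):
    """Flag a metric sample as anomalous when it deviates more than
    n_sigma standard deviations from the rolling baseline.

    Widening n_sigma reduces alert noise; narrowing it catches
    regressions earlier -- the trade-off alert tuning balances.
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) > n_sigma * stdev

# Example: CPU usage steady around 40%, then a spike to 95%.
baseline = [38, 41, 40, 39, 42, 40, 41, 39]
print(is_anomalous(baseline, 41))  # within the baseline band
print(is_anomalous(baseline, 95))  # far outside 3 sigma
```

In a real Datadog setup the same idea is expressed declaratively in a monitor definition rather than in application code.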

Posted 1 month ago

Apply

2.0 - 5.0 years

3 - 6 Lacs

Hyderabad

Work from Office

Keyloop bridges the gap between dealers, manufacturers, technology suppliers and car buyers. We empower car dealers and manufacturers to fully embrace digital transformation. How? By creating innovative technology that makes selling cars better for our customers, and buying and owning cars better for theirs. We use cutting-edge technology to link our clients' systems, departments and sites. We provide an open technology platform that's shaping the industry for the future. We use data to help clients become more efficient, increase profitability and give more customers an amazing experience. Want to be part of it?

Overview
We are looking for a motivated DevOps & AWS Engineer to join our Keyloop Infrastructure and Cloud Operations team. You will work closely with senior engineers to support the management of AWS environments, Infrastructure as Code (IaC) with Terraform, and application provisioning with Ansible. This role is ideal for someone who wants to develop strong DevOps skills in a real-world production environment, learn modern automation practices, and grow under the guidance of experienced mentors.

Roles and Responsibilities
- Assist in the implementation and maintenance of CI/CD pipelines using tools like Git and Jenkins under senior team guidance.
- Support the development of infrastructure as code (IaC) using Terraform to automate the provisioning of AWS resources such as EC2, S3, RDS, and VPCs.
- Help maintain configuration management using Ansible, ensuring environments are consistent and compliant.
- Write basic shell scripts for automation tasks under supervision.
- Assist in monitoring and maintaining Windows and Linux servers, helping to ensure performance, uptime, and security.
- Collaborate with development, QA, and operations teams to follow best practices in DevOps workflows.
- Support system monitoring and reporting using tools such as CloudWatch and Grafana.
- Follow security best practices and support the team in maintaining compliance and proper IAM configuration.
- Participate in knowledge-sharing sessions and seek guidance from senior engineers to grow technical skills.
- Support incident response efforts and on-call tasks as part of a team rotation, with mentorship.
- Be open to researching new tools and assisting in proof-of-concept tasks for automation and infrastructure improvement.

Skills / Knowledge & Experience
Essential
- Good interpersonal and communication skills.
- Strong willingness to learn and grow in a dynamic, fast-paced environment.
- 2-5 years of relevant experience in DevOps or cloud engineering.
- Good understanding of AWS services (EC2, S3, IAM, VPC).
- Familiarity with version control using Git.
- Knowledge of Windows and Linux environments is a must.
- Exposure to scripting languages such as Bash or PowerShell.
- Eagerness to work with Terraform and Ansible; hands-on experience is a plus but not mandatory if you can demonstrate foundational knowledge.
- Good troubleshooting skills and a problem-solving mindset.
- Ability to follow instructions, seek clarification when needed, and document work clearly.
- Self-motivated and organised, with attention to detail.

Desirable
- Experience with cloud technologies (AWS, Azure).
- Familiarity with system monitoring tools that support proactive interventions to prevent service impact.
- Understanding of ITIL or other service management frameworks.
- Previous experience working in a 24/7 infrastructure operations environment.

Additional Information
This position is work from office, requiring presence during all shifts, including nights and weekends, as part of the 24/7 Operations Center; office cab transport is provided for all shifts.

Why join us?
We're on a journey to become market leaders in our space, and with that comes some incredible opportunities. Collaborate with and learn from industry experts from all over the globe. Work with game-changing products and services. Get the training and support you need to try new things, adapt to quick changes and explore different paths. Join Keyloop and progress your career, your way.

An inclusive environment to thrive
We're committed to fostering an inclusive work environment, one that respects all dimensions of diversity. We promote an inclusive culture within our business, and we celebrate different employees and lifestyles, not just on key days, but every day.

Be rewarded for your efforts
We believe people should be paid based on their performance, so our pay and benefits reflect this and are designed to attract the very best talent. We encourage everyone in our organisation to explore opportunities that enable them to grow their career, through investment in their development but equally by working in a culture that fosters support and unbridled collaboration. Keyloop doesn't require academic qualifications for this position; we select based on experience and potential, not credentials. We are also an equal opportunity employer committed to building a diverse and inclusive workforce. We value diversity and encourage candidates of all backgrounds to apply.
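The configuration-management duties above rest on one core idea, desired-state convergence: a tool such as Ansible or Terraform compares declared state with actual state and applies only the difference, so repeated runs change nothing. A stdlib-only sketch of that diffing step; the function and the dictionaries are hypothetical stand-ins, not part of either tool's API:

```python
def plan_changes(desired, actual):
    """Return the settings that must change to converge `actual`
    onto `desired`, mimicking a Terraform plan or an Ansible
    'changed' report. Keys present only in `desired` are additions;
    differing values are updates; matching keys are left untouched.
    """
    changes = {}
    for key, want in desired.items():
        if actual.get(key) != want:
            changes[key] = {"from": actual.get(key), "to": want}
    return changes

# Hypothetical server config: only the drifted settings are touched.
desired = {"ntp": "pool.ntp.org", "sshd_port": 22, "swap_gb": 2}
actual = {"ntp": "pool.ntp.org", "sshd_port": 2222}
print(plan_changes(desired, actual))
```

Running the plan a second time, after the changes are applied, reports an empty diff; that property is what makes such tooling safe to re-run (idempotence).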

Posted 1 month ago

Apply

6.0 - 10.0 years

8 - 18 Lacs

Bengaluru

Hybrid

We are seeking a Senior Production Support Engineer with deep expertise in Java, Linux, and system-level troubleshooting to join our dynamic team. This role involves handling critical production issues, managing system uptime, and ensuring performance through proactive support and real-time incident resolution.

Key Responsibilities:
- Troubleshoot, research, and resolve major defects or inconsistencies in application/system functions.
- Handle system alerts, escalations, and outages with speed and accuracy.
- Provide 24/7 on-call support on a rotational basis.
- Perform in-depth code reviews and understand system behavior from both infrastructure and application perspectives.
- Maintain and monitor server space, health, and performance.
- Lead service restoration efforts, focusing on reducing MTTR.
- Collaborate with vendors and internal teams on root cause analysis and permanent fixes.
- Track system performance and proactively suggest improvements.
- Participate in regular project and planning meetings, offering technical insights.
- Mentor junior support engineers and guide them on best practices.
- Act as the subject matter expert (SME) for designated applications.

Required Skills:
- Four-year degree in Computer Science or an equivalent field.
- 5+ years of experience in Java and production support.
- Experience with Spring Boot 2 or 3; Java 17+ is a plus.
- Strong expertise in the Linux OS, command-line operations, and performance tuning.
- Good experience with Oracle or MySQL databases.
- Hands-on with at least one of shell scripting, Python, Perl, or Ruby.
- Excellent communication skills and the ability to coordinate during critical incidents.
- Proven experience leading service restoration and performing root cause analysis.
- Ability to understand and debug large, complex enterprise applications.

Nice to Have:
- Familiarity with Java 17+, microservices, and containerized environments.
- Experience with monitoring tools (Nagios, Prometheus, ELK, etc.).
- Prior experience in telecom, banking, or high-availability platforms.
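The role above emphasizes reducing MTTR (mean time to restore). That metric is just the average of restore-minus-open durations over a window of incidents; the data format in this sketch is an illustrative assumption, not tied to any particular ticketing tool:

```python
from datetime import datetime

def mttr_minutes(incidents):
    """Mean time to restore, in minutes, over (opened, restored) pairs.

    `incidents` is a list of (opened_at, restored_at) ISO-8601 strings;
    an empty window yields 0.0 rather than a division error.
    """
    if not incidents:
        return 0.0
    total_seconds = sum(
        (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
        for start, end in incidents
    )
    return total_seconds / len(incidents) / 60.0

outages = [
    ("2024-05-01T10:00:00", "2024-05-01T10:30:00"),  # 30-minute outage
    ("2024-05-03T22:15:00", "2024-05-03T23:45:00"),  # 90-minute outage
]
print(mttr_minutes(outages))  # mean of the two durations
```

Teams typically track this per service and per month; a falling MTTR is the usual evidence that restoration runbooks and escalation paths are improving.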

Posted 1 month ago

Apply

0.0 - 1.0 years

2 - 4 Lacs

Chennai

Remote

Site Reliability Engineer

Role Description:
1. Incident management (tickets)
2. Partner communication exposure
3. Release management (CI/CD knowledge and hands-on experience)
4. Monitoring and alert management
5. Detailed Linux knowledge
6. Automation experience
7. DevOps tools knowledge
8. Exposure to cloud technology (AWS)
9. Kubernetes knowledge and working experience
10. Ansible, Terraform, and CloudFormation knowledge is an added advantage
11. General: should be flexible during critical issues

Skills Needed:
1. Technology skills: SRE, Java (Struts, JSP, Spring Boot), MySQL, Python, automation
2. Monitoring tools: Nagios, Grafana, Prometheus, New Relic, Kibana
3. Networking basics: TCP/IP, ports, load balancers, network flow
4. Cloud technology basics: AWS infrastructure; CloudWatch knowledge will be an advantage
5. Automation: Ansible, CloudFormation, Terraform

OS: Primarily Linux commands and shell scripting, Python.

Regards, Gladwin

Posted 1 month ago

Apply

5.0 - 9.0 years

0 Lacs

thiruvananthapuram, kerala

On-site

Techvantage.ai is a next-generation technology and product engineering company at the forefront of innovation in Generative AI, Agentic AI, and autonomous intelligent systems. We build intelligent, scalable, and future-ready digital platforms that drive the next wave of AI-powered transformation. We are seeking a highly skilled and experienced Senior Node.js Developer with 5+ years of hands-on experience in backend development. As part of our engineering team, you will be responsible for architecting and building scalable APIs, services, and infrastructure that power high-performance AI-driven applications. You'll collaborate with front-end developers, DevOps, and data teams to ensure fast, secure, and efficient back-end functionality that meets the needs of modern AI-first products.

What we are looking for in an ideal candidate:
- Design, build, and maintain scalable server-side applications and APIs using Node.js and related frameworks.
- Implement RESTful and GraphQL APIs for data-driven and real-time applications.
- Collaborate with front-end, DevOps, and data teams to build seamless end-to-end solutions.
- Optimize application performance, scalability, and security.
- Write clean, maintainable, and well-documented code.
- Integrate with third-party services and internal microservices.
- Apply best practices in code quality, testing (unit/integration), and continuous integration/deployment.
- Troubleshoot production issues and implement monitoring and alerting solutions.

Requirements:
- 5+ years of professional experience in backend development using Node.js.
- Proficiency in JavaScript (ES6+) and strong experience with Express.js, NestJS, or similar frameworks.
- Experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB).
- Strong understanding of API security, authentication (OAuth2, JWT), and rate limiting.
- Experience building scalable microservices and working with message queues (e.g., RabbitMQ, Kafka).
- Familiarity with containerized applications using Docker and orchestration via Kubernetes.
- Proficiency with Git, CI/CD pipelines, and version control best practices.
- Solid understanding of performance tuning, caching, and system design.

Preferred Qualifications:
- Experience with cloud platforms like AWS, GCP, or Azure.
- Exposure to building backends for AI/ML platforms, data pipelines, or analytics dashboards.
- Familiarity with GraphQL, WebSockets, or real-time communication.
- Knowledge of infrastructure-as-code tools like Terraform is a plus.
- Experience with monitoring tools like Prometheus, Grafana, or New Relic.

What We Offer:
- The chance to work on cutting-edge products leveraging AI and intelligent automation.
- A high-growth, innovation-driven environment with global exposure.
- Access to modern development tools and cloud-native technologies.
- Attractive compensation, with no constraints for the right candidate.
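The rate-limiting requirement above is most often met with a token bucket: each request consumes a token, tokens refill at a fixed rate, so short bursts pass while sustained traffic is capped. A minimal sketch of the idea, written in Python for brevity even though the listing's stack is Node.js; the class and its parameters are illustrative:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter. The same idea underlies the rate
    limiting offered by most API gateways and middleware packages.
    """
    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec       # refill rate, tokens per second
        self.capacity = capacity       # burst size
        self.tokens = float(capacity)  # bucket starts full
        self.clock = clock
        self.last = clock()

    def allow(self):
        # Refill based on elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A bucket of capacity 3 with no refill admits exactly 3 calls in a burst.
bucket = TokenBucket(rate_per_sec=0.0, capacity=3)
print([bucket.allow() for _ in range(5)])  # three True, then False
```

Choosing `capacity` sets how bursty a single client may be; choosing `rate_per_sec` sets their long-run throughput, and the two can be tuned independently.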

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

haryana

On-site

As a Kafka Administrator at our merchant e-commerce platform in Noida Sector 62, you will be responsible for managing, maintaining, and optimizing our distributed, multi-cluster Kafka infrastructure in an on-premise environment. You should have a deep understanding of Kafka internals, Zookeeper administration, and performance tuning to ensure operational excellence in high-throughput, low-latency production systems. Experience with API gateway operations (specifically Kong) and observability tooling is considered a plus.

Key Responsibilities:
- Manage multiple Kafka clusters with high-availability Zookeeper setups.
- Provide end-to-end operational support, including deployment, configuration, and health monitoring of Kafka brokers and Zookeeper nodes.
- Conduct capacity planning, partition strategy optimization, and topic lifecycle management.
- Implement backup and disaster recovery processes with defined RPO/RTO targets.
- Enforce security configurations such as TLS encryption, authentication (SASL, mTLS), and ACL management.
- Optimize Kafka producer and consumer performance to meet low-latency, high-throughput requirements.
- Plan and execute Kafka and Zookeeper upgrades and patching with minimal or zero downtime.
- Integrate Kafka with monitoring platforms like Prometheus, Grafana, or similar tools.
- Define and enforce log retention and archival policies in line with compliance requirements.

Additionally, you will integrate Kafka metrics and logs with centralized observability and logging tools, create dashboards and alerts to monitor consumer lag, partition health, and broker performance, and collaborate with DevOps/SRE teams to ensure visibility into Kafka services. You will also apply CIS benchmarks, perform automated security scans across Kafka nodes, manage secret and certificate rotation, support regular vulnerability assessments, and ensure timely remediation.

To be successful in this role, you should have:
- 3+ years of hands-on Kafka administration experience in production environments.
- A strong understanding of Kafka internals and Zookeeper management.
- Experience in Kafka performance tuning, troubleshooting, and security mechanisms.
- Proficiency with monitoring and logging tools, plus scripting skills for operational automation.

Preferred qualifications include experience with API gateways, Kubernetes-based environments, compliance standards, security hardening practices, and Infrastructure as Code (IaC) tools. In return, we offer a mission-critical role managing large-scale real-time data infrastructure, a flexible work environment, opportunities for growth, and access to modern observability and automation tools.
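One number the monitoring duties above revolve around is consumer lag: for each partition, the broker's log-end offset minus the group's committed offset. A stdlib-only sketch of that calculation; the dictionaries stand in for values a tool like `kafka-consumer-groups.sh` or the Admin API would fetch from the cluster:

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition and total lag for one consumer group.

    `end_offsets` and `committed_offsets` map partition id -> offset;
    a partition with no committed offset counts its full log as lag.
    """
    per_partition = {
        p: end - committed_offsets.get(p, 0)
        for p, end in end_offsets.items()
    }
    return per_partition, sum(per_partition.values())

# Topic with three partitions; partition 2 has a backlog building up.
end = {0: 1_000, 1: 1_050, 2: 4_000}
committed = {0: 1_000, 1: 1_040, 2: 1_500}
per_partition, total = consumer_lag(end, committed)
print(per_partition, total)  # partition 2 dominates the total lag
```

Alerting on lag per partition rather than per group is what exposes a single stuck consumer or a skewed partition key before it becomes a group-wide incident.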

Posted 1 month ago

Apply

5.0 - 9.0 years

0 Lacs

coimbatore, tamil nadu

On-site

As a Lead Database Administrator (DBA) at our organization, you will play a crucial role in managing and optimizing our database infrastructure. Your primary responsibilities will include designing, implementing, and overseeing scalable and high-performance database solutions across various environments. You will be working with a variety of relational database management systems such as MySQL, MSSQL, PostgreSQL, and MongoDB, as well as cloud database management on AWS. In this leadership position, you will lead database migrations, both on-premise and to the cloud, ensuring minimal downtime and a smooth transition. You will also be responsible for implementing best practices for database backup, disaster recovery, and security across multiple database systems. Your expertise in database performance tuning and query optimization will be essential in enhancing application performance. Additionally, you will be involved in capacity planning to ensure that our database environments are adequately scaled to meet application demands. Implementing automation tools for database monitoring, reporting, and health checks will also be a part of your responsibilities. You will be required to develop and enforce database policies, procedures, and documentation while staying up to date on industry trends and emerging technologies in database management and cloud platforms. The ideal candidate for this role will possess a strong background in database migrations, AWS cloud services, and various database technologies. Proficiency in database design, optimization, backup, recovery, and high availability is essential. Strong knowledge of database security best practices, automation and scripting, leadership and collaboration, problem-solving skills, and relevant certifications such as AWS Certified Database Specialty or AWS Solutions Architect are preferred. 
Additional qualifications that would be beneficial for this role include experience with big data technologies, CI/CD pipelines, database monitoring tools, and DevOps methodologies. Proficiency with monitoring tools and SQL, experience with high-availability solutions, data security, and performance tuning, and team leadership will be key to succeeding in this position.

Posted 1 month ago

Apply

8.0 - 12.0 years

0 Lacs

chennai, tamil nadu

On-site

As a Performance Testing Engineer at Lexitas India Pvt. Ltd., you will play a crucial role in ensuring optimal performance and load testing for multiple SaaS-based products. You will take ownership of developing, executing, analyzing, and maintaining performance scripts while identifying bottlenecks and recommending areas for improvement. Your responsibilities will also include setting up performance benchmarks, signing off releases from a performance standpoint, and managing environments and infrastructure. You will be expected to organize demos for stakeholders, present comprehensive reports with detailed analysis, understand customer challenges, and provide insightful recommendations. Collaboration with product, development, and QA teams is essential, and you will be responsible for coaching and mentoring junior engineers to enhance their skills.

To excel in this role, you should have a minimum of 8 years of experience in performance testing, with expertise in server-, client-, and database-side performance and load testing. Proficiency in tools such as JMeter for performance and load testing, as well as monitoring tools like Datadog, AppDynamics, or Dynatrace, is crucial. Knowledge of cloud-based environments, particularly Azure, for build, deployment, and infrastructure management is preferred. Experience with version control systems such as Git (Bitbucket), hands-on scripting skills, and the ability to analyze reports and present findings to stakeholders are vital for success in this position. Strong capabilities in performance bottleneck analysis and benchmarking, along with excellent communication and collaboration skills, will be key assets.

Ideally, you should hold a bachelor's degree in computer science; a master's degree is preferred. A proven track record of at least 8 years in performance testing will be instrumental in meeting the demands of this role.
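Performance sign-off of the kind described above usually hinges on latency percentiles rather than averages, since one slow outlier can hide behind a healthy mean. A small sketch of the p95 calculation using the nearest-rank method; real tools such as JMeter report these figures automatically:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at
    least pct% of all samples are <= it. This is the method behind
    the p95/p99 figures quoted in load test reports.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Response times in ms from a load test run; one slow outlier.
latencies = [120, 130, 125, 140, 135, 128, 132, 127, 131, 900]
print("p50:", percentile(latencies, 50))
print("p95:", percentile(latencies, 95))  # exposed by the outlier
```

Here the mean (about 197 ms) looks merely mediocre, but the p95 reveals that one in twenty requests is roughly seven times slower than typical, which is exactly the kind of bottleneck a release sign-off should catch.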

Posted 1 month ago

Apply

4.0 - 8.0 years

0 Lacs

hyderabad, telangana

On-site

Job Description: As an Infrastructure & Performance Test Engineer, you will be responsible for designing, executing, and optimizing load, stress, and distributed testing strategies for cloud-based systems. Your expertise in HTTP/HTTPS traffic analysis, monitoring tools, and reporting, along with a solid understanding of AWS infrastructure and performance at scale, will be crucial in this role.

Key Responsibilities:
- Plan and conduct load testing, stress testing, and distributed load testing to simulate real-world traffic patterns.
- Create and manage test datasets to ensure accurate simulations and validations.
- Monitor and analyze HTTP/HTTPS calls and system metrics (CPU, memory, IOPS, network) during test execution.
- Use tools such as JMeter, Gatling, k6, or Locust for performance testing.
- Automate end-to-end test cases using Selenium for UI validation when necessary.
- Collaborate with DevOps to test upgrades and infrastructure changes, ensuring no degradation in requests per second or latency.
- Leverage AWS services such as CloudWatch, EC2, ELB, and RDS for monitoring and test environment setup.
- Identify bottlenecks and recommend system improvements before major releases.

Required Skills:
- 3-5 years of experience in performance/infrastructure testing or DevOps QA.
- Proficiency with load testing tools such as JMeter, Gatling, or k6.
- Familiarity with Selenium for UI test automation.
- Strong understanding of HTTP/HTTPS protocols and API testing.
- Experience with AWS infrastructure, monitoring tools, distributed test execution, parallel load generation, and latency and response-time checks.
- Scripting skills in Python, Bash, or similar languages.

Preferred Qualifications:
- Experience in continuous integration environments (e.g., Jenkins, GitHub Actions).
- Exposure to Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
- Previous experience with major system upgrades and verifying post-upgrade performance baselines.

Posted 1 month ago

Apply

3.0 - 8.0 years

0 Lacs

karnataka

On-site

Seeking a highly motivated and experienced Couchbase Developer to join our dynamic team. In this role, you will be responsible for providing technical support and maintenance for our mission-critical applications powered by Couchbase Server. You will play a crucial role in ensuring the stability, performance, and security of our Couchbase deployments.

Responsibilities:
- Diagnose and resolve complex issues related to Couchbase Server, including performance bottlenecks, data inconsistencies, replication issues, and query optimization.
- Monitor Couchbase performance metrics, identify areas for improvement, and implement optimizations to ensure optimal performance and scalability, including query tuning, index optimization, and cluster configuration adjustments.
- Implement proactive monitoring solutions to detect and address potential issues before they impact production.
- Perform regular maintenance tasks such as backups, upgrades, and security patching.
- Participate in incident response and resolution, working closely with other teams to minimize downtime and restore service quickly.
- Work closely with developers, operations teams, and other stakeholders to understand their needs and provide technical guidance on Couchbase best practices.
- Communicate effectively with both technical and non-technical audiences.
- Maintain accurate and up-to-date documentation for Couchbase deployments, including configuration settings, troubleshooting guides, and best practices.
- Contribute to capacity planning efforts, forecasting future needs and recommending appropriate hardware and software configurations.
- Implement and maintain security best practices for Couchbase Server, including access control, data encryption, and vulnerability management.
- Develop and maintain automation scripts for routine tasks such as backups, monitoring, and deployments.
- Stay up to date with the latest Couchbase features, best practices, and security updates.

Required Skills:
- Bachelor's degree in Computer Science or a related field.
- 3+ years of experience developing and supporting applications using Couchbase Server.
- Strong understanding of Couchbase architecture, including data modeling, indexing, querying, and replication.
- Experience with the N1QL query language and performance tuning.
- Proficiency in at least one scripting language (e.g., Python, Bash).
- Experience with Linux operating systems.
- Strong troubleshooting and problem-solving skills.
- Excellent communication and collaboration skills.
- Experience with monitoring tools (e.g., Prometheus, Grafana) is a plus.
- Experience with containerization technologies (e.g., Docker, Kubernetes) is a plus.
- Couchbase certifications are a plus.

Posted 1 month ago

Apply

4.0 - 10.0 years

0 Lacs

navi mumbai, maharashtra

On-site

As a Performance Tester, you will be a vital part of our team, ensuring that our applications perform optimally in terms of scalability and reliability. Your responsibilities will include designing and conducting performance tests, analyzing the results, identifying bottlenecks, and collaborating with the development teams to enhance performance. You will collaborate with the business, QA, and development teams to understand performance goals and expectations, develop comprehensive performance test strategies covering various metrics and scenarios, and execute tests using tools like JMeter, LoadRunner, and Gatling. During test execution, you will monitor system performance metrics and work closely with developers to troubleshoot any performance issues. Your role will involve analyzing test results to pinpoint bottlenecks and areas for improvement, creating detailed performance reports with recommendations and KPIs, and presenting findings to both technical and non-technical stakeholders. Additionally, you will work with development teams to implement performance optimization techniques and conduct tuning exercises to enhance system performance. Ideal candidates will possess a strong grasp of performance testing methodologies, proficiency in tools like JMeter and LoadRunner, scripting skills in languages such as Java and Python, knowledge of network protocols and database concepts, as well as excellent analytical and communication skills. Experience with cloud platforms, CI/CD pipelines, automation tools, and monitoring applications will be advantageous. If you are a passionate performance tester with a meticulous approach and a commitment to delivering high-quality software, we invite you to apply for this full-time position based in Bangalore, Chennai, Delhi, Gurgaon, Hyderabad, Kolkata, Navi Mumbai, Noida, Pune, or Vadodara. The role requires onsite work and candidates with 10+ years of experience, including 4+ years in performance testing, are preferred. 
This position follows the UK work shift.

Posted 1 month ago

Apply

4.0 - 8.0 years

0 Lacs

pune, maharashtra

On-site

As a Scala/Akka Actor System Developer in Pune, you will be responsible for designing and implementing large-scale actor-based systems using the Akka toolkit. Your expertise in handling actor supervision, message delivery, and system failure recovery will be essential in building high-performance messaging systems with reactive design. Working closely with engineering leads, you will contribute to the development of reusable and scalable services while maintaining and enhancing existing Akka-based components.

Your in-depth understanding of the Actor Model in Akka, proficiency in Scala and asynchronous programming, and strong skills in stateful actors and fault-tolerant design are crucial for success in this role. Experience with scheduling and timers within actor systems will further enhance your capabilities. Preferred skills such as experience with Akka Cluster and Akka Sharding, performance tuning and load testing of actor-based systems, and familiarity with metrics, tracing, and monitoring tools will be advantageous.

In addition to technical skills, strong logical reasoning and debugging abilities are essential for this role. As an independent contributor and team collaborator, you will play a key role in core system-level engineering work, with deep involvement in scalable backend design and real-time product use cases. If you are looking for a challenging role that offers the opportunity to work on critical backend systems, this position provides a platform for your professional growth and development.
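For readers less familiar with the Actor Model this listing centers on, here is a deliberately tiny, single-threaded Python sketch of its core mechanics: a private mailbox, strictly sequential message processing, and state touched only by the actor itself. Akka layers dispatchers, supervision, and clustering on top; none of that is modeled here, and the class is purely illustrative:

```python
from collections import deque

class CounterActor:
    """Minimal actor: messages queue in a mailbox and are processed
    one at a time, so internal state is never touched concurrently.
    Real Akka actors are driven by a dispatcher; this sketch drains
    the mailbox manually for clarity.
    """
    def __init__(self):
        self.mailbox = deque()
        self.count = 0  # private state, only mutated in receive()

    def tell(self, message):
        self.mailbox.append(message)  # fire-and-forget enqueue

    def drain(self):
        while self.mailbox:
            self.receive(self.mailbox.popleft())

    def receive(self, message):
        if message == "increment":
            self.count += 1
        elif message == "reset":
            self.count = 0

actor = CounterActor()
for _ in range(3):
    actor.tell("increment")
actor.tell("reset")
actor.tell("increment")
actor.drain()
print(actor.count)  # messages applied strictly in arrival order
```

Because only `receive` ever touches `count`, no locks are needed; that isolation is precisely what lets Akka scale the same pattern across threads, nodes, and shards.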

Posted 1 month ago

Apply

2.0 - 6.0 years

0 Lacs

karnataka

On-site

As a Technology Support II team member at JPMorgan Chase, you will be instrumental in maintaining the operational stability, availability, and performance of our production application flows. Your primary responsibilities will include analyzing and troubleshooting production application flows to ensure seamless service delivery, participating in problem management to enhance operational stability and availability, monitoring production environments for anomalies, and communicating effectively with stakeholders to address and resolve issues promptly. You will also be expected to identify trends and provide support for incidents, problems, and changes related to full-stack technology systems, applications, or infrastructure. This role may involve providing on-call coverage during weekends to ensure continuous operational support.

The ideal candidate should have at least 2 years of experience working with data/Python applications in a production environment. Proficiency in a programming or scripting language, particularly Python, is required. Experience with containers and container orchestration (such as Kubernetes), orchestration tools (like Control-M), and cloud platforms (specifically AWS) with infrastructure provisioning using Terraform, as well as exposure to observability and monitoring tools, will be beneficial. Strong communication and collaboration skills are essential for effective engagement in a fast-paced, dynamic environment.

Additionally, preferred qualifications include experience supporting applications on platforms like Databricks, Snowflake, or AWS EMR (with Databricks preferred), a proactive approach to self-education and evaluation of new technologies, and knowledge of virtualization, cloud architecture, services, and automated deployments.

Posted 1 month ago

Apply

10.0 - 15.0 years

0 Lacs

chennai, tamil nadu

On-site

As a Staff Software Engineer specializing in Java at Walmart Global Tech in Chennai, you will play a crucial role in guiding the team in architectural decisions and best practices for building scalable applications. Your responsibilities will include driving the design, development, implementation, and documentation of cutting-edge solutions that impact Walmart associates globally. You will collaborate with engineering teams across different locations, engage with Product Management and Business to drive product agendas, and work closely with architects to ensure solutions meet quality, cost, and delivery standards.

With a Bachelor's/Master's degree in Computer Science or a related field and a minimum of 10 years of experience in software design, development, and automated deployments, you will bring valuable expertise to the team. Your prior experience delivering highly scalable Java applications, strong system design skills, and proficiency in CS fundamentals, microservices, data structures, and algorithms will be essential for success in this role. You should have hands-on experience with CI/CD development environments and tools like Git, Maven, and Jenkins, as well as expertise in writing modular and testable code using frameworks such as JUnit and Mockito. Your experience building Java-based backend systems and working with cloud-based solutions, along with familiarity with technologies like Spring Boot, Kafka, and Spark, will be crucial. Additionally, you should be well versed in microservices architecture, distributed concepts, design patterns, and cloud-native development. Experience with relational and NoSQL databases, caching technologies, event-based systems like Kafka, monitoring tools like Prometheus and Splunk, and containerization tools like Docker and Kubernetes will be highly valuable.

At Walmart Global Tech, you will have the opportunity to work in an innovative environment where your contributions can impact millions of people. The company values diversity, inclusion, and belonging, and offers a flexible, hybrid work environment along with competitive compensation, benefits, and opportunities for personal and professional growth. As an Equal Opportunity Employer, Walmart fosters a workplace culture where every individual is respected and valued, contributing to a welcoming and inclusive environment for all associates, customers, and suppliers.

Posted 1 month ago

Apply

2.0 - 3.0 years

2 - 3 Lacs

Bengaluru

Hybrid

Life on the team
It's an exciting opportunity for a technical role in Computacenter's dynamic and rapidly expanding Network team. You will bring your IT experience, deliver quality services to our customers across the globe, and help us shape the team. You'll get to work with some of the most talented and passionate people in the business and get exposure to leading-edge technologies that will enable you to advance your skills. Once you start your journey at Computacenter, you'll get to know its work lifestyle and culture in no time.

What you'll do
• Remote management: perform a range of technical work activities remotely to meet business and customer requirements.
• Work with industry-leading technology and products, including Cisco, HP, and Aruba.
• Work with monitoring tools including Spectrum, NNMi, SolarWinds, and HP IMC, and ticketing tools like ServiceNow.
• Provide quality of service to our customers for the tickets assigned to you.
• Communicate effectively with customers on the tickets you are working on, to avoid escalations and improve customer satisfaction.
• Work with a dynamic team that supports our customers 24/7.
• Maintain SLA performance targets.
• Understand and adhere to compliance policies and procedures.
• Keep skills up to date with IT industry standards as appropriate to the role/contract.

What you'll need
• 2+ years of industry experience in a network operations/support environment.
• Cisco CCNA or equivalent.
• Technical ability to provide first-level network monitoring, basic fault finding, and fault escalation.
• Experience monitoring a ticket queue and resolving issues within defined SLAs.
• Experience working with a global team to resolve issues.
• Experience monitoring connectivity and troubleshooting.
• Good understanding of incident management activities.
• Ability to follow documented operational processes.
• Strong experience with ticket management and change request management systems.
• Strong experience with network monitoring/management and analysis tools.
• Experience liaising with carriers and vendors for faults.
• Good communication and interpersonal skills.
• A self-starter able to work independently but comfortable working in a team environment.
• Good analytical and problem-solving skills.
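Maintaining SLA performance targets, as the role above mentions, usually comes down to tracking what fraction of tickets were resolved within a priority-dependent target. A minimal sketch (the targets and ticket data below are illustrative, not from any real contract):

```python
from datetime import timedelta

# Hypothetical per-priority resolution targets; real values come from the SLA contract.
SLA_TARGETS = {"P1": timedelta(hours=4), "P2": timedelta(hours=8), "P3": timedelta(hours=24)}

def sla_compliance(tickets):
    """Fraction of tickets resolved within their priority's target."""
    met = sum(1 for prio, elapsed in tickets if elapsed <= SLA_TARGETS[prio])
    return met / len(tickets)

tickets = [
    ("P1", timedelta(hours=3)),   # met
    ("P1", timedelta(hours=5)),   # breached
    ("P2", timedelta(hours=7)),   # met
    ("P3", timedelta(hours=20)),  # met
]
print(sla_compliance(tickets))  # 0.75
```

In practice the elapsed time would be derived from ticket open/close timestamps in the ticketing system, with clock-stop rules for pending states.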

Posted 1 month ago

Apply

1.0 - 5.0 years

2 - 5 Lacs

Chennai

Work from Office

We are seeking a skilled DevOps Engineer to manage and optimize the cloud infrastructure, deployment pipelines, and operational tooling for Bytize. You will work closely with development, QA, and product teams to ensure rapid, secure, and scalable delivery of services.

Key Responsibilities
- Design, implement, and maintain CI/CD pipelines using tools like Jenkins and GitHub Actions.
- Containerize microservices and manage deployments using Docker and Kubernetes (EKS/AKS).
- Manage cloud infrastructure (preferably AWS, Azure, or GCP) using Infrastructure as Code (IaC) tools such as Terraform or ARM templates.
- Ensure high availability, scalability, and monitoring using tools like Prometheus, Grafana, and the ELK stack.
- Implement and enforce DevSecOps practices: security scanning, vulnerability assessment, and secrets management.
- Set up automated testing and deployment strategies across staging and production environments.
- Monitor and troubleshoot infrastructure and deployment issues proactively.
- Support disaster recovery planning and failover automation.

Required Skills & Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Strong experience with CI/CD tools (e.g., Jenkins, GitHub Actions).
- Proficiency in Docker, Kubernetes, and container orchestration.
- Experience with at least one cloud provider (AWS, Azure, GCP).
- Expertise in IaC tools (Terraform, Ansible, Bicep, etc.).
- Familiarity with monitoring/logging tools: ELK, Prometheus/Grafana, CloudWatch.
- Experience with Git and branching strategies (GitFlow, trunk-based).
- Scripting skills (Bash, Python, or similar).
- Working knowledge of networking, security best practices, and performance tuning.
- Strong communication and collaboration skills.
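One common shape of the "automated deployment strategies" this listing mentions is a canary health gate: promote the new version only after several observation intervals stay under an error-rate threshold, otherwise roll back. A toy sketch (thresholds and sample counts are illustrative; real gates also weigh latency, saturation, and business metrics):

```python
def promote_canary(error_rates, threshold=0.02, min_samples=3):
    """Decide what to do with a canary deployment.

    error_rates: per-interval error fractions observed for the canary.
    Returns "wait" until enough intervals are observed, "promote" if
    every interval stays under the threshold, else "rollback".
    """
    if len(error_rates) < min_samples:
        return "wait"
    return "promote" if all(r < threshold for r in error_rates) else "rollback"

print(promote_canary([0.001, 0.002]))          # wait
print(promote_canary([0.001, 0.002, 0.004]))   # promote
print(promote_canary([0.001, 0.05, 0.004]))    # rollback
```

In a real pipeline this decision function would be fed from the monitoring stack (e.g., Prometheus query results) and invoked by the CD tool between rollout stages.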

Posted 1 month ago

Apply

4.0 - 8.0 years

15 - 25 Lacs

Bengaluru

Work from Office

Job Summary: We are looking for a skilled Apache Solr Engineer to design, implement, and maintain scalable and high-performance search solutions. The ideal candidate will have hands-on experience with Solr/SolrCloud, strong analytical skills, and the ability to work in cross-functional teams to deliver efficient search functionality across enterprise and customer-facing applications.

Experience: 4–8 years

Key Responsibilities:
- Design, develop, and maintain enterprise-grade search solutions using Apache Solr and SolrCloud.
- Develop and optimize search indexes and schemas for use cases like product search, document search, or order/invoice search.
- Integrate Solr with backend systems, databases, and APIs.
- Implement full-text search, faceted search, auto-suggestions, ranking, and relevancy tuning.
- Optimize search performance, indexing throughput, and query response time.
- Ensure data consistency and high availability using SolrCloud and ZooKeeper (cluster coordination and configuration management).
- Monitor search system health and troubleshoot issues in production.
- Collaborate with product teams, data engineers, and DevOps teams for smooth delivery.
- Stay up to date with new features of Apache Lucene/Solr and recommend improvements.

Required Skills & Qualifications:
- Strong experience with Apache Solr and SolrCloud.
- Good understanding of Lucene, inverted indexes, analyzers, tokenizers, and search relevance tuning.
- Proficiency in Java or Python for backend integration and development.
- Experience with RESTful APIs, data pipelines, and real-time indexing.
- Familiarity with ZooKeeper, Docker, and Kubernetes (for SolrCloud deployments).
- Knowledge of JSON, XML, and schema design in Solr.
- Experience with log analysis, performance tuning, and monitoring tools like Prometheus/Grafana is a plus.
- Exposure to e-commerce or document management search use cases is an advantage.

Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Experience with Elasticsearch or other search technologies is a plus.
- Working knowledge of CI/CD pipelines and cloud platforms (e.g., Azure).
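The Lucene concepts the listing asks about — analyzers, tokenizers, and the inverted index behind full-text and faceted search — can be illustrated with a toy index. This is a deliberately simplified sketch of the data structure, not Solr's actual implementation (which adds scoring, stemming, stop words, doc values for facets, and much more):

```python
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase + whitespace tokenization.
    (Real Lucene analyzers chain tokenizers and filters.)"""
    return text.lower().split()

class TinyIndex:
    """Minimal inverted index: term -> set of doc ids, plus one facet field."""
    def __init__(self):
        self.postings = defaultdict(set)
        self.facets = {}

    def add(self, doc_id, text, category):
        for term in analyze(text):
            self.postings[term].add(doc_id)
        self.facets[doc_id] = category

    def search(self, query):
        """AND query: intersect posting lists, then count facets over the hits."""
        terms = analyze(query)
        hits = set.intersection(*(self.postings[t] for t in terms)) if terms else set()
        counts = defaultdict(int)
        for d in hits:
            counts[self.facets[d]] += 1
        return sorted(hits), dict(counts)

idx = TinyIndex()
idx.add(1, "red running shoes", "footwear")
idx.add(2, "red cotton shirt", "apparel")
idx.add(3, "blue running shoes", "footwear")
print(idx.search("running shoes"))  # ([1, 3], {'footwear': 2})
```

The intersection-of-postings step is why query latency in Solr depends so heavily on term selectivity, and the per-hit facet count is the simplest form of what Solr computes for faceted navigation.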

Posted 1 month ago

Apply

7.0 - 8.0 years

9 - 10 Lacs

Noida

Remote

Job Title: Azure DevOps Architect (7+ years)

Zorba Consulting India is a leading consultancy firm dedicated to providing top-notch IT solutions and consulting services. Our mission is to empower businesses through innovative technology and operational excellence. We pride ourselves on our diverse team of experts who are passionate about delivering quality service and driving success for our clients. At Zorba, we value collaboration, integrity, and continuous improvement, striving to create a positive impact in the industry. As we expand our operations, we are looking for a skilled Azure DevOps Architect to join our remote team in India.

Role Responsibilities:
- Design and implement scalable Azure DevOps solutions.
- Develop Continuous Integration and Continuous Deployment (CI/CD) pipelines.
- Automate infrastructure provisioning using Infrastructure as Code (IaC) practices.
- Collaborate with software development teams to enhance product delivery.
- Monitor system performance and optimize resource utilization.
- Ensure application security and compliance with industry standards.
- Lead DevOps transformations and best practices implementation.
- Provide technical guidance and support to cross-functional teams.
- Identify and resolve technical issues and bottlenecks.
- Document and maintain architecture designs and deployment procedures.
- Stay updated with the latest technologies and advancements in Azure.
- Facilitate training sessions for team members on DevOps tools.
- Engage with stakeholders to gather requirements and feedback.
- Participate in planning and estimation activities for projects.
- Contribute to a culture of continuous improvement and innovation.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum of 7 years of experience in DevOps engineering.
- Proven experience with Azure DevOps tools and services.
- Strong knowledge of CI/CD tools such as Azure Pipelines, Jenkins, or GitLab CI.
- Experience with Infrastructure as Code tools such as Terraform or ARM Templates.
- Hands-on experience with containerization technologies like Docker and Kubernetes.
- Solid understanding of cloud architecture and deployment strategies.
- Proficiency in scripting languages such as PowerShell, Bash, or Python.
- Familiarity with Agile methodologies and practices.
- Experience with monitoring tools like Azure Monitor or Grafana.
- Excellent communication and collaboration skills.
- Strong analytical and problem-solving abilities.
- Ability to work independently in a remote team environment.
- Certifications in Azure (e.g., Azure Solutions Architect Expert) are a plus.
- A background in software development is advantageous.

Posted 1 month ago

Apply

10.0 - 12.0 years

11 - 15 Lacs

Gurugram

Work from Office

About the Role: We are seeking an experienced and highly skilled Senior AWS Engineer with over 10 years of professional experience to join our dynamic and growing team. This is a fully remote position requiring strong expertise in serverless architectures, AWS services, and infrastructure as code. You will play a pivotal role in designing, implementing, and maintaining robust, scalable, and secure cloud solutions.

Key Responsibilities:
- Design & Implementation: Lead the design and implementation of highly scalable, resilient, and cost-effective cloud-native applications leveraging a wide array of AWS services, with a strong focus on serverless architecture and event-driven design.
- AWS Services Expertise: Architect and develop solutions using core AWS services including AWS Lambda, API Gateway, S3, DynamoDB, Step Functions, SQS, AppSync, Amazon Pinpoint, and Cognito.
- Infrastructure as Code (IaC): Develop, maintain, and optimize infrastructure using the AWS CDK (Cloud Development Kit) to ensure consistent, repeatable, and version-controlled deployments. Drive the adoption and implementation of CodePipeline for automated CI/CD.
- Serverless & Event-Driven Design: Champion serverless patterns and event-driven architectures to build highly efficient and decoupled systems.
- Cloud Monitoring & Observability: Implement comprehensive monitoring and observability solutions using CloudWatch Logs, X-Ray, and custom metrics to proactively identify and resolve issues, ensuring optimal application performance and health.
- Security & Compliance: Enforce stringent security best practices, including robust IAM roles and permission boundaries, PHI/PII tagging, secure configurations with Cognito and KMS, and adherence to HIPAA standards. Implement isolation patterns and fine-grained access control mechanisms.
- Cost Optimization: Proactively identify and implement strategies for AWS cost optimization, including S3 lifecycle policies, leveraging serverless tiers, and strategic service selection (e.g., evaluating Amazon Pinpoint vs. SES for cost-effectiveness).
- Scalability & Resilience: Design and implement highly scalable and resilient systems incorporating auto-scaling, Dead-Letter Queues (DLQs), retry/backoff mechanisms, and circuit breakers to ensure high availability and fault tolerance.
- CI/CD Pipeline: Contribute to the design and evolution of CI/CD pipelines, ensuring automated, efficient, and reliable software delivery.
- Documentation & Workflow Design: Create clear, concise, and comprehensive technical documentation for architectures, workflows, and operational procedures.
- Cross-Functional Collaboration: Collaborate effectively with cross-functional teams, including developers, QA, and product managers, to deliver high-quality solutions.
- AWS Best Practices: Advocate for and ensure adherence to AWS best practices across all development and operational activities.

Required Skills & Experience:
- 10+ years of hands-on experience as an AWS Engineer or in a similar role.
- Deep expertise in AWS services: Lambda, API Gateway, S3, DynamoDB, Step Functions, SQS, AppSync, CloudWatch Logs, X-Ray, EventBridge, Amazon Pinpoint, Cognito, KMS.
- Proficiency in Infrastructure as Code (IaC) with the AWS CDK; experience with CodePipeline is a significant plus.
- Extensive experience with serverless architecture and event-driven design.
- Strong understanding of cloud monitoring and observability tools: CloudWatch Logs, X-Ray, custom metrics.
- Proven ability to implement and enforce security and compliance measures, including IAM role boundaries, PHI/PII tagging, Cognito, KMS, HIPAA standards, isolation patterns, and access control.
- Demonstrated experience with cost optimization techniques (S3 lifecycle policies, serverless tiers, service selection).
- Expertise in designing and implementing scalability and resilience patterns (auto-scaling, DLQs, retry/backoff, circuit breakers).
- Familiarity with CI/CD pipeline concepts.
- Excellent documentation and workflow design skills.
- Exceptional cross-functional collaboration abilities.
- Commitment to implementing AWS best practices.
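Two of the resilience patterns named above, retry with backoff and circuit breakers, are small enough to sketch directly. Below is an illustrative version: the "full jitter" backoff formula follows the widely cited AWS guidance, and the breaker is deliberately minimal (no half-open/reset timer):

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, seed=None):
    """'Full jitter' exponential backoff: each retry waits a random
    time in [0, min(cap, base * 2**attempt)], spreading retries out
    so clients don't hammer a recovering service in lockstep."""
    rng = random.Random(seed)
    return [rng.uniform(0, min(cap, base * 2 ** a)) for a in range(attempts)]

class CircuitBreaker:
    """Minimal breaker: after `max_failures` consecutive failures the
    circuit opens and calls fail fast instead of hitting the broken
    downstream. (Real breakers add a half-open probe after a timeout.)"""
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the failure streak
        return result

cb = CircuitBreaker(max_failures=2)

def flaky():
    raise IOError("downstream unavailable")

for _ in range(2):          # two real failures...
    try:
        cb.call(flaky)
    except IOError:
        pass

try:
    cb.call(flaky)          # ...then the breaker fails fast
except RuntimeError as err:
    print(err)              # circuit open: failing fast
```

On AWS, the same ideas appear as SDK retry configuration and DLQs on SQS/Lambda; the sketch just makes the control flow explicit.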

Posted 1 month ago

Apply

5.0 - 10.0 years

3 - 6 Lacs

Noida

Work from Office

We are seeking a highly skilled Kafka Integration Specialist to join our team. The ideal candidate will have extensive experience in designing, developing, and integrating Apache Kafka solutions to support real-time data streaming and distributed systems.

Key Responsibilities:
- Design, implement, and maintain Kafka-based data pipelines.
- Develop integration solutions using Kafka Connect, Kafka Streams, and other related technologies.
- Manage Kafka clusters, ensuring high availability, scalability, and performance.
- Collaborate with cross-functional teams to understand integration requirements and deliver robust solutions.
- Implement best practices for data streaming, including message serialization, partitioning, and replication.
- Monitor and troubleshoot Kafka performance, latency, and security issues.
- Ensure data integrity and implement failover strategies for critical data pipelines.

Required Skills:
- Strong experience with Apache Kafka (Kafka Streams, Kafka Connect).
- Proficiency in programming languages like Java, Python, or Scala.
- Experience with distributed systems and data streaming concepts.
- Familiarity with ZooKeeper, Confluent Kafka, and Kafka broker configurations.
- Expertise in creating and managing topics, partitions, and consumer groups.
- Hands-on experience with integration tools such as REST APIs, MQ, or ESB.
- Knowledge of cloud platforms like AWS, Azure, or GCP for Kafka deployment.

Nice to Have:
- Experience with monitoring tools like Prometheus, Grafana, or Datadog.
- Exposure to DevOps practices, CI/CD pipelines, and infrastructure automation.
- Knowledge of data serialization formats like Avro, Protobuf, or JSON.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 4+ years of hands-on experience in Kafka integration projects.
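Partitioning, which the listing highlights, is what preserves per-key ordering in Kafka: the producer hashes the message key to pick a partition, so all events for one key land on the same partition. Kafka's default partitioner uses murmur2; the sketch below substitutes MD5 purely to stay dependency-free, so the exact mapping differs from Kafka's, but the invariant (same key, same partition) is the same:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministically map a message key to a partition.
    Kafka's default partitioner uses murmur2; MD5 here is only a
    stand-in so the sketch needs no third-party library."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

keys = [b"customer-17", b"customer-17", b"customer-42"]
parts = [partition_for(k, 6) for k in keys]
assert parts[0] == parts[1]  # same key, same partition -> per-key ordering holds
print(parts)
```

This is also why repartitioning a topic is disruptive: changing `num_partitions` changes the key-to-partition mapping, breaking ordering guarantees for in-flight keys.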

Posted 1 month ago

Apply

5.0 - 10.0 years

3 - 6 Lacs

Pune

Work from Office

We are seeking a highly skilled Kafka Integration Specialist to join our team. The ideal candidate will have extensive experience in designing, developing, and integrating Apache Kafka solutions to support real-time data streaming and distributed systems.

Key Responsibilities:
- Design, implement, and maintain Kafka-based data pipelines.
- Develop integration solutions using Kafka Connect, Kafka Streams, and other related technologies.
- Manage Kafka clusters, ensuring high availability, scalability, and performance.
- Collaborate with cross-functional teams to understand integration requirements and deliver robust solutions.
- Implement best practices for data streaming, including message serialization, partitioning, and replication.
- Monitor and troubleshoot Kafka performance, latency, and security issues.
- Ensure data integrity and implement failover strategies for critical data pipelines.

Required Skills:
- Strong experience with Apache Kafka (Kafka Streams, Kafka Connect).
- Proficiency in programming languages like Java, Python, or Scala.
- Experience with distributed systems and data streaming concepts.
- Familiarity with ZooKeeper, Confluent Kafka, and Kafka broker configurations.
- Expertise in creating and managing topics, partitions, and consumer groups.
- Hands-on experience with integration tools such as REST APIs, MQ, or ESB.
- Knowledge of cloud platforms like AWS, Azure, or GCP for Kafka deployment.

Nice to Have:
- Experience with monitoring tools like Prometheus, Grafana, or Datadog.
- Exposure to DevOps practices, CI/CD pipelines, and infrastructure automation.
- Knowledge of data serialization formats like Avro, Protobuf, or JSON.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 4+ years of hands-on experience in Kafka integration projects.

Posted 1 month ago

Apply

1.0 - 4.0 years

4 - 7 Lacs

Pune

Work from Office

Job Summary: We are seeking a proactive and detail-oriented Site Reliability Engineer (SRE) focused on monitoring to join our observability team. The candidate will be responsible for ensuring the reliability, availability, and performance of our systems through robust monitoring, alerting, and incident response practices.

Key Responsibilities:
- Monitor application and IT infrastructure environments.
- Drive end-to-end incident response and resolution.
- Design, implement, and maintain monitoring and alerting systems for infrastructure and applications.
- Continuously improve observability by integrating logs, metrics, and traces into a unified monitoring platform.
- Collaborate with development and operations teams to define and track SLIs, SLOs, and SLAs.
- Analyze system performance and reliability data to identify trends and potential issues.
- Participate in incident response, root cause analysis, and post-mortem documentation.
- Automate repetitive monitoring tasks and improve alert accuracy to reduce noise.

Required Skills & Qualifications:
- 2+ years of experience in application/system monitoring, SRE, or DevOps roles.
- Proficiency with monitoring tools such as Prometheus, Grafana, ELK, APM, Nagios, Zabbix, Datadog, or similar.
- Strong scripting skills (Python, Bash, or similar) for automation.
- Experience with cloud platforms (AWS, Azure) and container orchestration (Kubernetes).
- Solid understanding of Linux/Unix systems and networking fundamentals.
- Excellent problem-solving and communication skills.
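Tracking SLOs, as the responsibilities above describe, is often framed as an error budget: a 99.9% availability SLO over a million requests tolerates 1,000 failures, and every failure consumes part of that budget. A minimal request-based sketch (the SLO and counts below are illustrative):

```python
def error_budget(slo: float, total_requests: int, failed_requests: int):
    """Remaining error budget for a request-based availability SLO.

    allowed = failures the SLO tolerates over the window;
    consumed_pct = how much of that budget has been burned."""
    allowed = round((1.0 - slo) * total_requests, 6)  # round away float noise
    remaining = allowed - failed_requests
    consumed = 100.0 * failed_requests / allowed if allowed else float("inf")
    return {"allowed_failures": allowed, "remaining": remaining, "consumed_pct": consumed}

budget = error_budget(slo=0.999, total_requests=1_000_000, failed_requests=250)
print(budget)  # {'allowed_failures': 1000.0, 'remaining': 750.0, 'consumed_pct': 25.0}
```

Alerting on the budget's burn rate, rather than on raw error counts, is one of the standard ways to "improve alert accuracy and reduce noise": a fast burn pages immediately, a slow burn files a ticket.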

Posted 1 month ago

Apply

0.0 - 2.0 years

2 - 3 Lacs

Mumbai

Work from Office

- Monitor systems via GCP tools (Stackdriver, Logging)
- Use Linux for log analysis & health checks
- Run SQL queries for DB validation
- Generate infra/service health reports
- Work with tools like Grafana and Splunk
- Escalate issues with clear documentation
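The "SQL queries for DB validation" duty can be illustrated with a self-contained check against an in-memory SQLite table; the table, columns, and checks below are hypothetical stand-ins for whatever production database and health criteria the team actually uses:

```python
import sqlite3

# Hypothetical health check: no NULL ids, and count how many rows failed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "ok"), (2, "ok"), (3, "failed")])

null_ids = conn.execute("SELECT COUNT(*) FROM orders WHERE id IS NULL").fetchone()[0]
failed = conn.execute("SELECT COUNT(*) FROM orders WHERE status = 'failed'").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

report = {"total": total, "failed": failed, "null_ids": null_ids,
          "healthy": null_ids == 0}
print(report)  # {'total': 3, 'failed': 1, 'null_ids': 0, 'healthy': True}
```

In the role above, the same pattern would run against the real database and feed the infra/service health report or a Grafana panel.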

Posted 1 month ago

Apply

5.0 - 9.0 years

13 - 18 Lacs

Bengaluru

Hybrid

SPECIFIC ASSIGNMENTS:
As part of the Food, Feed & Agro Testing Europe Operations L2 Team, you will act as the primary contact supporting infrastructure for:
- Windows Server configuration and operational support
- Backup configuration
- Monitoring alert troubleshooting
- Identity and Access Management queries
- Application-related problems requiring triage to the respective application support group

Main activities include, but are not limited to:
- Maintain current IT User Zones and IT Application Availability with Azure and local infrastructure.
- Maintain IT Infrastructure Transformation / Integration / Segregation throughout Food, Feed & Agro Testing businesses across Europe.
- Work closely with Monitoring, Cloud, Compute, Network, and Application team members to deliver the wider IT Infrastructure project.
- Act based on an end-to-end view of the infrastructure.
- Own incoming requests and manage escalations.
- Support the creation of operational documents, communications, and processes.

REQUIRED SKILLS:
An ideal candidate should have strong experience (5+ years) in all of the below:
- Server (virtual & physical) infrastructure design and transitions
- Windows Server 2016/2019/2022 deployment & support, file share, print server
- Veeam Backup
- Monitoring tools
- Very good command of English, written and spoken

The below skills will be an advantage:
- Automation with PowerShell, Ansible, etc.
- ServiceNow ticketing usage
- ITIL Foundation certificate
- Site24x7 monitoring tool

Posted 1 month ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

