
1162 Prometheus Jobs - Page 23

JobPe aggregates results for easy access, but you apply directly on the original job portal.

6.0 - 9.0 years

10 - 15 Lacs

Ahmedabad

Remote

Design, build, and maintain scalable CI/CD pipelines using tools like Jenkins, GitLab CI, CircleCI, or GitHub Actions. Automate infrastructure provisioning using tools such as Terraform, Ansible, or CloudFormation. Manage container platforms like Kubernetes or ECS.
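The pipeline work described above can be sketched at the scripting layer. The `Pipeline` class, stage names, and rollback callback below are purely illustrative, a tool-agnostic model of staged delivery with rollback on failure, not the API of Jenkins, GitLab CI, or GitHub Actions:

```python
# Illustrative sketch of a staged CI/CD run with rollback on failure.
# All names here are hypothetical; real pipelines are declared in the
# CI tool's own config format (Jenkinsfile, .gitlab-ci.yml, etc.).
class Pipeline:
    def __init__(self, stages):
        self.stages = stages          # list of (name, callable) pairs
        self.completed = []           # stages that finished successfully

    def run(self, rollback):
        for name, step in self.stages:
            try:
                step()
                self.completed.append(name)
            except Exception:
                rollback(self.completed)  # undo completed work on failure
                return False
        return True

if __name__ == "__main__":
    log = []
    ok = Pipeline([
        ("build", lambda: log.append("built")),
        ("test", lambda: log.append("tested")),
        ("deploy", lambda: log.append("deployed")),
    ]).run(rollback=lambda done: log.append(f"rolled back {done}"))
    print(ok, log)
```

A failing stage short-circuits the run and hands the list of completed stages to the rollback hook, which mirrors how real pipelines gate deploys on earlier stages succeeding.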

Posted 4 weeks ago

Apply

7.0 - 12.0 years

30 - 35 Lacs

Gurugram

Remote

Role: Site Reliability Engineer
Shift: Fixed, 5 days working, 7 pm to 4 am IST
Location: Remote (Work From Home)

Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring suites. If you enjoy solving complex business problems, can contribute to building the next generation of modern applications for our customers, and want to help them understand the connections between application performance, user experience, and business outcomes, creating amazing customer experiences with modern interpretations of SRE and Observability using Datadog, New Relic, AppDynamics, or Dynatrace and their suites of products and integrations, then join us! Rackspace enables businesses to accelerate digital transformation through innovative data and integration solutions: tools that help you fix problems quickly, maintain complex systems, and improve code. We believe Datadog, AppDynamics, or New Relic will be a large contributor to what we do, and we want talented, creative, and thoughtful individuals to join our team to shape Observability Engineering for our customers.

Key Responsibilities
- Work with customers to implement Observability solutions.
- Build and maintain scalable systems and robust automation that support engineering goals.
- Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance.
- Proactively gather and analyze both metric and log data from systems and applications to perform anomaly detection, performance tuning, capacity planning, and fault isolation.
- Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, security, and performance standards.
- Collaborate with team members to document and share solutions.
- Maintain a deep understanding of the customer's business as well as their technical environment.
- Identify performance bottlenecks and anomalous system behavior, and resolve the root cause of service issues.

Required
- Bachelor's degree in engineering/computer science or equivalent.
- Experience range: 7 to 14 years.
- Senior-level experience with Site Reliability Engineering, DevOps, code-level application support and troubleshooting, AWS infrastructure design, implementation, and optimization, and automation for deployment, scaling, and reliability.
- Experience with observability tools like Splunk, Datadog, SignalFx, etc.
- Experience deploying, maintaining, and supporting software applications/services in the AWS ecosystem.
- Proactive approach to identifying problems and solutions.
- Experience writing code in one or more interpreted languages such as Python, PHP, Perl, Ruby, or Linux shell.
- Experience with Terraform or CloudFormation scripting.
- Experience with configuration management tools like Ansible, Chef, or Puppet.
- Experience with standard software development best practices and tools such as code repositories (Git preferred).
- Experience executing in an agile software development environment.
- Good understanding of pricing/cost models across AWS services, especially compute, storage, and database offerings.
- A clear understanding of network and system management solutions.
- Excellent organizational and project management skills.
- Excellent communication, critical thinking, and analytical skills.
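The anomaly detection responsibility above can be illustrated with a deliberately simple trailing-window z-score check. This is a sketch only; the observability suites named in the listing (Datadog, New Relic, etc.) use far richer statistical models:

```python
import statistics

def detect_anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value deviates from the trailing window's mean
    by more than `threshold` standard deviations. Minimal sketch of
    metric anomaly detection; window size and threshold are arbitrary."""
    anomalies = []
    for i in range(window, len(series)):
        win = series[i - window:i]
        mean = statistics.mean(win)
        stdev = statistics.stdev(win)
        if stdev == 0:
            stdev = 1e-9  # flat window: any change counts as anomalous
        if abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies
```

Run against a mostly flat metric stream, a sudden spike is flagged while a steady trend is not, which is the basic distinction between fault isolation and normal drift.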

Posted 1 month ago

Apply

2.0 - 4.0 years

10 - 14 Lacs

Pune

Hybrid

So, what's the role all about?
We are seeking a skilled and experienced DevOps Engineer to design, produce, and test high-quality software that meets specified functional and non-functional requirements within the given time and resource constraints.

How will you make an impact?
- Design, implement, and maintain CI/CD pipelines using Jenkins to support automated builds, testing, and deployments.
- Manage and optimize AWS infrastructure for scalability, reliability, and cost-effectiveness.
- Develop automation scripts and tools using shell scripting and other programming languages to streamline operational workflows.
- Collaborate with cross-functional teams (Development, QA, Operations) to ensure seamless software delivery and deployment.
- Monitor and troubleshoot infrastructure, build failures, and deployment issues to ensure high availability and performance.
- Implement and maintain robust configuration management practices and infrastructure-as-code principles.
- Document processes, systems, and configurations to ensure knowledge sharing and operational consistency.
- Perform ongoing maintenance and upgrades (production and non-production).
- Occasional weekend or after-hours work as needed.

Have you got what it takes?
- Experience: 2-4 years in DevOps or a similar role.
- Cloud expertise: proficient in AWS services such as EC2, S3, RDS, Lambda, IAM, CloudFormation, or similar.
- CI/CD tools: hands-on experience with Jenkins pipelines (declarative and scripted).
- Scripting skills: proficiency in either shell scripting or PowerShell.
- Programming knowledge: familiarity with at least one programming language (e.g., Python, Java, or Go).
- Version control: experience with Git and Git-based workflows.
- Monitoring tools: familiarity with tools like CloudWatch, Prometheus, or similar.
- Problem-solving: strong analytical and troubleshooting skills in a fast-paced environment.
- CDK knowledge in AWS DevOps.

You will have an advantage if you also have:
- Development experience (significant advantage).
- Windows system administration (significant advantage).
- Experience with monitoring and log analysis tools.
- Jenkins pipeline knowledge.

What's in it for you?
Join an ever-growing, market-disrupting, global company where the teams, comprised of the best of the best, work in a fast-paced, collaborative, and creative environment! As the market leader, every day at NiCE is a chance to learn and grow, and there are endless internal career opportunities across multiple roles, disciplines, domains, and locations. If you are passionate, innovative, and excited to constantly raise the bar, you may just be our next NiCEr!

Enjoy NiCE-FLEX! At NiCE, we work according to the NiCE-FLEX hybrid model, which enables maximum flexibility: 2 days working from the office and 3 days of remote work each week. Naturally, office days focus on face-to-face meetings, where teamwork and collaborative thinking generate innovation, new ideas, and a vibrant, interactive atmosphere.

Reporting into: Tech Manager
Role Type: Individual Contributor
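The infrastructure-as-code principle mentioned in this listing boils down to diffing desired state against current state and applying only the difference. A toy sketch of that idea, flat dicts standing in for the full resource graphs tools like CloudFormation or CDK actually diff:

```python
def plan_changes(current, desired):
    """Compute the change set needed to move `current` configuration to
    `desired`. A simplified illustration of the plan/apply model used by
    IaC tools; real tools diff typed resource graphs, not flat dicts."""
    create = {k: v for k, v in desired.items() if k not in current}
    update = {k: v for k, v in desired.items()
              if k in current and current[k] != v}
    delete = [k for k in current if k not in desired]
    return {"create": create, "update": update, "delete": delete}
```

Separating the plan (the computed change set) from the apply step is what makes such deployments reviewable and repeatable.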

Posted 1 month ago

Apply

3.0 - 8.0 years

5 - 10 Lacs

Hyderabad, Pune, Bengaluru

Work from Office

Project description
The project is in the Treasury domain, supported by the IT team. The platform operates 24/7, supporting teams in Sydney, and undergoes constant change as it provides services to a number of business stakeholders. Our team is composed of engineers and technology leaders who bring the right mix of skills to enable this transformation. We also work very closely with our business and operations colleagues to support these services, which are critical to the Australian and global economy. Our team is also responsible for driving Engineering Governance, Continuous Delivery, and key technological simplification pillars such as Cloud and payments-based architecture.

Responsibilities
- Design and implement automation for production support activities, including alert triage and resolution flows.
- Reduce testing cycle time from seven to three weeks through creative automation strategies.
- Explore and implement deployment automation and streamline QRM-related alerts.
- Build intuitive UIs for visualizing product and support status, if needed.
- Partner with existing team members to reverse-engineer domain knowledge into reusable automation components.
- Identify repetitive manual tasks and apply DevOps practices to eliminate them.

Skills
Must have:
- Minimum 3+ years of experience as a DevOps Automation Engineer
- AWS EC2, Docker, Kubernetes
- PowerShell, Python (or similar scripting tools)
- Microservices and RESTful API development
- ASP.NET Core, C#, or alternatively Java/Python with React
- CI/CD tools: TeamCity, GitHub Actions, Artifactory, Octopus Deploy (or equivalents)

Highly preferred:
- Instrumentation and observability tools: Grafana, Prometheus, Splunk, Dynatrace, or AppDynamics

Nice to have:
- Exposure to security tools like SonarQube, Checkmarx, or Snyk
- Familiarity with AI for business process automation
- Basic SQL and schema design (MSSQL or Oracle)

Locations: Pune, Bangalore, Hyderabad, Chennai, Noida
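Alert triage automation, the first responsibility above, is at its core a routing table from severity to a resolution flow. A minimal sketch, with hypothetical severity names and action labels standing in for real paging and ticketing integrations:

```python
# Hypothetical severity-to-action routing for automated alert triage.
# The route names are placeholders for real integrations (pager, ITSM
# ticketing, log sink), not any specific product's API.
ROUTES = {
    "critical": "page_oncall",
    "warning": "create_ticket",
    "info": "log_only",
}

def triage(alert):
    """Classify an alert dict by its severity and pick a resolution flow.
    Unknown or missing severities fall back to logging."""
    severity = alert.get("severity", "info").lower()
    action = ROUTES.get(severity, "log_only")
    return {"alert": alert.get("name", "unknown"), "action": action}
```

Keeping the routing in data rather than branching logic makes it easy for a support team to extend the table without touching the triage code.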

Posted 1 month ago

Apply

5.0 - 10.0 years

7 - 12 Lacs

Bengaluru

Work from Office

Project description
The Institutional Banking Data Platform (IDP) is a state-of-the-art cloud platform engineered to streamline the data ingestion, transformation, and distribution workflows that underpin Regulatory Reporting, Market Risk, Credit Risk, Quants, and Trader Surveillance. In your role as Software Engineer, you will be responsible for ensuring the stability of the platform, performing maintenance and support activities, and driving innovative process improvements that add significant business value.

Responsibilities
- Problem solving: advanced analytical and problem-solving skills to analyse complex information for key insights and present it as meaningful information to senior management.
- Communication: excellent verbal and written communication skills, with the ability to lead discussions with varied stakeholders across levels.
- Risk mindset: you are expected to proactively identify and understand, openly discuss, and act on current and future risks.

Skills
Must have:
- Bachelor's degree in computer science, engineering, or a related field/experience.
- 5+ years of proven experience as a Software Engineer or in a similar role, with a strong track record of successfully maintaining and supporting complex applications.
- Strong hands-on experience with Ab Initio GDE, including Express>It, Control Centre, and Continuous>Flow.
- Experience handling and working with XML, JSON, and Web APIs.
- Strong hands-on experience in SQL.
- Hands-on experience in a shell scripting language.
- Experience with batch- and streaming-based integrations.

Nice to have:
- Knowledge of CI/CD tools such as TeamCity, Artifactory, Octopus, Jenkins, SonarQube, etc.
- Knowledge of AWS services including EC2, S3, CloudFormation, CloudWatch, RDS, and others.
- Knowledge of Snowflake and Apache Kafka is highly desirable.
- Experience with configuration management and infrastructure-as-code tools such as Ansible, Packer, and Terraform.
- Experience with monitoring and observability tools like Prometheus/Grafana.
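The XML/JSON handling this role calls for often means translating between the two formats at integration boundaries. A minimal stdlib sketch for flat objects, with the `record` root tag chosen arbitrarily; nested structures and attributes are deliberately out of scope:

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(payload: str, root_tag: str = "record") -> str:
    """Convert a flat JSON object into a simple XML document.
    Illustrative only: real ingestion pipelines must also handle
    nesting, arrays, attributes, and schema validation."""
    root = ET.Element(root_tag)
    for key, value in json.loads(payload).items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Because Python dicts preserve insertion order, the element order mirrors the JSON payload, which matters when a downstream consumer validates against a sequence-ordered schema.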

Posted 1 month ago

Apply

8.0 - 13.0 years

10 - 15 Lacs

Bengaluru

Work from Office

Project description
We are seeking a highly skilled and motivated DevOps Engineer with 8+ years of experience to join our engineering team. You will work in a collaborative environment, automating and streamlining processes related to infrastructure, development, and deployment. As a DevOps Specialist, you will help implement and manage CI/CD pipelines, configure on-prem Windows OS infrastructure, and ensure the reliability and scalability of our systems. The system runs on Windows with Microsoft SQL.

Responsibilities
- CI/CD pipeline management: design from scratch, implement, and manage automated build, test, and deployment pipelines to ensure smooth code integration and delivery.
- Infrastructure as Code (IaC): develop and maintain infrastructure using tools for automated provisioning and management.
- System monitoring and maintenance: set up monitoring systems for production and staging environments, analyze system performance, and provide solutions to increase efficiency.
- Configuration management: deploy and manage configuration using fit-for-purpose tools and scripts with version control, CI, etc.
- Collaboration: work closely with software developers, QA teams, and IT staff to define, develop, and improve DevOps processes and solutions.
- Automation and scripting: create and maintain custom scripts to automate manual processes for deployment, scaling, and monitoring.
- Security: implement security practices and ensure compliance with industry standards and regulations related to cloud infrastructure.
- Troubleshooting and issue resolution: diagnose and resolve issues related to system performance, deployments, and infrastructure.
- Bring DevOps thought leadership and delivery experience to the offshore client delivery team.
- Implement DevOps best practices based on developed patterns.

Skills
Must have:
- 9 to 12 years of total experience as a DevOps Engineer, including 3+ years of experience in AWS.
- Excellent knowledge of DevOps toolchains like GitHub Actions and GitHub Copilot.
- Self-starter, capable of driving solutions from 0 to 1 and able to deliver projects from scratch.
- Familiarity with containerization and orchestration tools (Docker, Kubernetes).
- Working understanding of platform security constructs.
- Good exposure to monitoring tools/dashboards like Grafana, Obstack, or similar monitoring solutions.
- Experience working with Jira and Agile SDLC practices.
- Expert knowledge of CI/CD.
- Excellent written and verbal communication skills, strong collaboration and teamwork skills, and the ability to work cross-functionally with development, operations, and IT teams.
- Proficiency in scripting languages like Python and PowerShell, and database knowledge of MS SQL.
- Experience with Windows and IIS, including installation, configuration, and maintenance.
- Strong troubleshooting skills, with the ability to think critically, work under pressure, and resolve complex issues.
- Security best practices: knowledge of security protocols, network security, and compliance standards.
- Adaptability to new learning and strong attention to detail, with a proactive approach to identifying issues before they arise.

Nice to have:
- Cloud certifications: AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or equivalent.
- IaC pipelines and best practices.
- Snyk and sysdiag knowledge.
- Experience with Windows OS, SRE, and monitoring with Prometheus.
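The pipeline management duties above usually include gating a deployment on a health probe. A small retry-with-delay sketch; `check` is a stand-in for a real probe (an HTTP endpoint, a service status query), not any particular tool's API:

```python
import time

def wait_healthy(check, retries=5, delay=0.5):
    """Poll `check()` until it returns True or retries are exhausted.
    A sketch of a deployment health gate; real pipelines typically use
    exponential backoff and a hard timeout rather than fixed retries."""
    for _ in range(retries):
        if check():
            return True
        time.sleep(delay)
    return False
```

A pipeline script would call this after a deploy step and fail the run (triggering rollback) when it returns False.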

Posted 1 month ago

Apply

8.0 - 13.0 years

10 - 15 Lacs

Bengaluru

Work from Office

As a Sr. Engineer II on the Scale & Performance team, you will play a critical role in ensuring the scalability, performance, and reliability of HashiCorp's cloud and enterprise offerings. Your work will be central to enhancing system resilience, optimizing performance at scale, and ensuring HashiCorp's products deliver high availability in dynamic cloud environments. Drawing on your experience in performance engineering, systems engineering, reliability engineering, or a related field, you will lead efforts to identify performance bottlenecks and address and mitigate operational challenges before they impact our customers. Your expertise in load testing, performance analysis, and system hardening will ensure that our services meet the highest standards of scale and performance excellence.

You'll have the opportunity to dive deep into the architecture of HashiCorp's products, including both our cloud and enterprise offerings. You'll take ownership of building and maintaining an advanced automation framework that powers ephemeral, scalable environments, enabling controlled scaling efforts and performance regression testing. Your work will directly impact how we validate and optimize performance across our systems. From spinning up environments to scaling them dynamically and tearing them down on demand, you'll own the end-to-end lifecycle of our test engines. Beyond that, you'll play an important role in analysing results, creating insightful dashboards, and delivering actionable reports to help teams identify and resolve performance bottlenecks and throttling issues.

What you'll do (responsibilities)
- Implement best practices for system reliability, including proactive identification of potential failure points and the development of automated mitigations.
- Design and execute comprehensive performance testing strategies to identify performance bottlenecks and scalability limits across our cloud products.
- Work with engineering teams to identify potential application and infrastructure bottlenecks and suggest changes.
- Work closely with engineering and product teams to integrate scale and performance readiness into the development lifecycle, enhancing product stability and user satisfaction.
- Build and refine tools and frameworks for automated testing, environment simulation, and incident reproduction, reducing manual effort and increasing test coverage.
- Conduct in-depth analysis of testing results, documenting findings and making actionable recommendations for system enhancements.
- Drive systemic improvements to the products by introducing chaos testing and partnering with product development teams.
- Share your knowledge and expertise with team members, fostering a culture of learning and continuous improvement.
- Develop and implement disaster recovery and backup strategies to ensure data integrity and system resilience.

Required education
Bachelor's degree

Required technical and professional expertise
- 8+ years of experience in performance engineering, systems engineering, reliability engineering, or non-functional testing roles, with a focus on performance testing, load testing, or system scalability.
- Strong programming skills in Python or Golang, and exposure to scripting languages like JavaScript or shell script.
- Experience with version control systems such as Git.
- Strong experience with performance testing tools like k6, Artillery, Vegeta, Locust, or similar tools for deriving key performance metrics for a product.
- Proven track record of leading successful performance testing and optimization initiatives in cloud and on-prem environments.
- Experience creating and managing test environments for automated testing.
- Experience creating CI/CD pipelines and maintaining quality gates for system testing.
- Understanding of monitoring and observability tools such as Datadog or Prometheus, to develop dashboards with metrics that accurately reflect system performance, load break points, and regressions.
- Exposure to cloud technologies (AWS, Azure, or GCP) and container technologies like Nomad or Kubernetes, and/or experience working in a hybrid cloud environment.
- Effective communication and collaboration skills, capable of working with cross-functional teams and articulating technical concepts to diverse audiences.

Preferred technical and professional experience
- Experience using HashiCorp products (Terraform, Packer, Waypoint, Nomad, Vault, Boundary, Consul).
- Experience with JavaScript development or any JavaScript-based test framework is a plus.
- Experience driving systemic improvements through chaos engineering is a plus.
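One of the key performance metrics such load tests derive is tail latency, commonly reported as p95/p99. A nearest-rank percentile sketch; tools like k6 and Locust report these figures using streaming estimators rather than full sorts, so this is illustration only:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of latency samples (ms). Sorts the full
    sample set, so suitable only for small batches; load-testing tools
    use streaming estimators for millions of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # nearest-rank: the value at position ceil(pct/100 * n), 1-indexed
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Tracking p95 rather than the mean is the standard way to surface the throttling and break points this role focuses on, since averages hide the slow tail users actually feel.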

Posted 1 month ago

Apply

2.0 - 5.0 years

3 - 7 Lacs

Bengaluru

Work from Office

Responsibilities
- Review and implement functional business requirements and non-functional technical requirements.
- Translate business requirements into technical design documents and drive implementation with developers.
- Research and analyze new technologies (e.g., libraries, IDEs, tools).
- Develop high-level architecture and detailed design for the backend application stack.
- Assist engineering and operational teams in debugging critical production problems.
- Perform application code reviews and ensure creation and maintenance of appropriate artifacts for architecture and design work.
- Develop back-end portions of web services, focusing primarily on building backend REST API services.
- Implement server-side application logic, design architectures, and create and consume REST services.
- Shift between multiple projects and technologies.
- Write clean code and test it throughout the development process to ensure quality is up to standards.
- Work on software used by millions of people around the world.
- Perform peer reviews and mentor the team to evolve into backend developers.
- Encourage a self-motivated squad model of working, handling design, development, test, and operations for the microservices.

Required education
Bachelor's degree

Required technical and professional expertise
- Kubernetes: deep knowledge of Kubernetes architecture, pods, deployments, services, and persistent volumes.
- Storage classes and volumes: how Kubernetes manages persistent storage and snapshots.
- Networking basics: understanding of Kubernetes networking.
- Container Storage Interface (CSI): familiarity with how storage plugins work in Kubernetes.
- CI/CD pipelines: integrating backup/restore into automation pipelines using Jenkins, GitHub Actions, Travis, etc.
- Scripting: proficiency in Bash, Python, or Go for writing automation scripts.
- Disaster recovery: designing and implementing DR solutions for containerized environments.
- Data replication: understanding of synchronous and asynchronous replication techniques.
- Access control: implementing RBAC (Role-Based Access Control) in Kubernetes.
- 5+ years of experience in back-end services development and microservices architecture.
- Proven experience implementing distributed applications in a container environment (Docker/Kubernetes), along with considerable experience configuring and administering Linux (or other Unix-like) systems.
- Software engineering experience designing enterprise cloud applications with Go, C, C++, Python, etc.
- Proven experience in REST/RESTful API development.
- Expertise in defining business architecture, business process definition and modelling, use cases, and requirements definition, and the associated best-practice processes for defining these artifacts.
- Proven proficiency in grasping requirements and building illustrative features with minimal specifications.
- Experience working in agile development environments.

Good to have:
- Compliance knowledge: GDPR, HIPAA, or other data protection regulations relevant to backup data.
- Monitoring and logging: using tools like Prometheus, Grafana, or ICD to monitor backup jobs and system health.
- Backup tools: experience with tools like Velero, Kasten K10, Rsync, Restic, or Portworx for Kubernetes.

Preferred professional and technical expertise
- Understanding of networking concepts and experience in network development.
- Understanding of cloud storage concepts and experience in cloud storage development.
- Knowledge of security and compliance standards and requirements.
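The backup/restore focus above centers on retention: deciding which snapshots to keep and which to prune. A keep-last-N sketch over ISO-dated snapshot names; tools like Velero and Restic apply richer tiered policies (daily/weekly/monthly), so this is the simplest possible variant:

```python
def prune_snapshots(timestamps, keep_last=3):
    """Split snapshot timestamps into (keep, delete), retaining only the
    most recent `keep_last`. Assumes sortable timestamps (e.g. ISO-8601
    strings); real retention policies also keep periodic tiers."""
    ordered = sorted(timestamps, reverse=True)  # newest first
    return ordered[:keep_last], ordered[keep_last:]
```

ISO-8601 strings sort lexicographically in chronological order, which is why plain string sorting suffices here and why backup tooling often timestamps snapshot names that way.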

Posted 1 month ago

Apply

10.0 - 12.0 years

20 - 30 Lacs

Pune

Work from Office

We have an urgent requirement for a strong Lead Backend Developer with deep expertise in Node.js, microservices architecture, and SaaS platforms. This is a high-priority position, and we need bench profiles on an immediate basis.

Lead Backend Developer, 10-12 years' experience (70% hands-on development, 30% team management)

Key Responsibilities:
- Design and develop scalable microservices for a cloud-native SaaS platform.
- Build and maintain robust RESTful APIs.
- Optimize performance, observability, and security in multi-tenant environments.
- Lead architecture discussions and conduct code reviews.
- Collaborate across teams to deliver high-quality features.
- Ensure best practices in coding standards, DevSecOps, and compliance.

Must-Have Skills:
- Node.js (ES6+), Express.js
- Microservices, SaaS architecture
- RESTful API design and security
- Message brokers: Kafka / RabbitMQ / NATS
- SQL and NoSQL (PostgreSQL / MongoDB), caching, query optimization
- AWS / Azure / GCP (S3, Pub/Sub, etc.)
- CI/CD, Docker, Kubernetes, Git
- Strong understanding of design patterns, clean code, and scalability
- Excellent communication and documentation skills

Preferred:
- Experience in the insurance/financial domain
- Familiarity with Prometheus, Grafana, ELK stack
- API Gateway, RBAC/ABAC implementation

Posted 1 month ago

Apply

6.0 - 8.0 years

8 - 10 Lacs

Bengaluru

Work from Office

Key Roles & Responsibilities
- Minimum 6 years of experience designing, implementing, and maintaining highly available, scalable, and secure infrastructure using Terraform (IaC).
- Manage and operate Kubernetes clusters (EKS) for microservices-based applications.
- Develop and maintain Docker containers and optimize build and deployment pipelines.
- Build and support AWS cloud infrastructure including EC2, S3, IAM, CloudWatch, and networking (VPC, ALB/NLB).
- Monitor infrastructure health and application performance using tools like Prometheus, Grafana, ELK, and CloudWatch.
- Handle incident response, root cause analysis, and postmortems for production issues.
- Collaborate with development teams to enforce CI/CD, automation, and SRE best practices.
- Implement and maintain observability, log aggregation, and alerting systems.
- Contribute to improving deployment reliability, latency, availability, and capacity planning.

Key Qualifications & Skill Sets
- Terraform: hands-on experience writing reusable modules and managing multi-environment deployments.
- Kubernetes: strong knowledge of cluster management, Helm charts, rolling updates, and debugging services.
- Docker: experience building, managing, and optimizing containerized applications.
- AWS: proficiency with core services (EC2, VPC, S3, IAM, EKS, ELB preferred).
- Linux: strong command-line and system-level troubleshooting skills.
- CI/CD: experience with tools like Jenkins or ArgoCD.
- Familiarity with scripting (Bash or Python preferred).
- Understanding of networking concepts, firewalls, DNS, TLS, and load balancing.

Preferred/Good to Have
- Experience with Karpenter, Service Mesh (Istio/Linkerd), or KEDA.
- Knowledge of PostgreSQL, Redis, Kafka, and ELK in production.
- Exposure to security best practices: IAM, secrets management, TLS.
- Certification: AWS Certified SysOps/DevOps, CKA, or Terraform Associate.

Posted 1 month ago

Apply

2.0 - 7.0 years

7 - 14 Lacs

Bengaluru

Hybrid

Job Description
Synchronoss Technologies (Nasdaq: SNCR), a global leader in personal Cloud solutions, empowers service providers to establish secure and meaningful connections with their subscribers. Our SaaS Cloud platform simplifies onboarding processes and fosters subscriber engagement, resulting in enhanced revenue streams, reduced expenses, and faster time-to-market. Millions of subscribers trust Synchronoss to safeguard their most cherished memories and important digital content.

We are seeking a full-time team member for our fantastic Network Operations Center in India. This is a great chance to join a fantastic team with phenomenal career opportunities. We welcome freshers and returners to the workforce. Diversity is at the heart of who we are. We are seeking quick learners with exceptional communication skills. The ideal candidates must have attention to detail, thrive in the NOC arena, and be open to innovation and change. The current team is a diverse mix of dynamic and enthusiastic professionals who enjoy fast-paced career development and have fantastic opportunities to grow their careers with us. The ideal candidates must be comfortable in an NOC environment, ready to troubleshoot client-side issues, monitor the health of our systems and applications, and support specific transformation project activities as we continue to grow our team and business. Our NOC Engineers play a critical role for Synchronoss in delivering highly scalable and reliable cloud platform solutions to our customers. The NOC is at the heart of our Customer Success and all we do.

How you will help:
- Provide world-class customer support to internal and external customers.
- Monitor the networks to ensure availability to all users.
- Manage multiple high-priority issues under pressure.
- Respond to automatically generated alarms produced by NOC monitoring software, using tools and/or engaging Tier 2 and 3 support.
- Facilitate internal bridges and send status updates during outages and severity-1 system issues.
- Perform up-checks on the various applications to ensure connectivity.
- Monitor facility systems such as HVAC and power generators.
- Process and prioritize customer requests by phone and e-mail for service-related assistance across a variety of technologies.
- Identify, diagnose, and resolve problems of moderate complexity affecting network performance.
- Follow prescribed trouble and operational procedures and aid in refining them to suit internal and customer needs.
- Perform notifications and follow prescribed escalation procedures.
- Document trouble tickets as actions are taken.
- Record, update, and maintain outage notifications and announcements.
- Shift-based role that may include weekends; rotational support may be required.
- Any other duties determined by management as suiting the intent of the role.

Expected job duty breakdown: monitoring 80%, troubleshooting 5%, incident management 5%, training 5%, general administration 5%.

Who we have in mind:
- Working knowledge of Linux.
- Basic knowledge of databases, preferably Oracle/Cassandra.
- Basic knowledge of network elements.
- Excellent written and verbal communication skills and bridge-handling capabilities.
- Working knowledge of monitoring tools, including but not limited to Grafana (Prometheus, Thanos), SolarWinds, and Zabbix.
- Proactive production monitoring knowledge, preferably for Cloud and Analytics services.
- Very good analytical skills and initial troubleshooting skills for most production issues.
- Very good documentation and coordination skills (both internal and external).
- Flexibility to work 10-hour shifts based on company need.
- Hands-on experience in Incident Management, ITSM Problem Management (RCAs), CAB, CCT, and change coordination/change management.
- Working knowledge of security alarm monitoring and log analysis (SOC).
- Very good presentation skills.

It would be great if you had:
- An associate degree, equivalent experience and/or training, or a combination of both.
- ITIL Foundation certification or experience with ITIL principles.
- Ability to assist the LAN Operations Support Technician team when required.
- Ability to assist the SNCR SRE team with non-production application builds on RHEL (Red Hat Enterprise Linux) environments. This includes bringing the application down, stopping WebLogic, deploying new code tags, restarting WebLogic, bringing the application back up, and then testing proper functionality.
- Prometheus, Thanos, and Grafana administration, which includes working with development teams to create and apply appropriate PromQL expressions and alerting triggers to alert stakeholders of potential problems in SNCR production environments.

Experience: 2 to 5 years

What we offer:
Synchronoss is proud to be an equal opportunity employer. As a global company, we value and celebrate diversity and are committed to a workplace free from discrimination and harassment. We take pride in fostering an inclusive environment based on mutual respect and merit. We are at our best when our workforce is dynamic in thought, experience, skill set, race, age, gender, sexual orientation, sexual expression, national origin, and beyond.
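The Prometheus administration mentioned above revolves around metrics exposed in Prometheus's plain-text exposition format. A toy parser for a tiny subset of that format, enough to show what the scrape data looks like; it skips HELP/TYPE comments and discards label sets, and would mis-split label values containing spaces:

```python
def parse_exposition(text):
    """Parse a small subset of the Prometheus text exposition format into
    {metric_name: value}. Sketch only: ignores comments, drops labels,
    and assumes no spaces inside label values."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name, _, value = line.rpartition(" ")
        name = name.split("{", 1)[0]  # strip label set, e.g. up{job="node"}
        metrics[name] = float(value)
    return metrics
```

A NOC engineer reading raw `/metrics` output during an incident is looking at exactly this structure: one metric sample per line, optional labels in braces, value last.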

Posted 1 month ago

Apply

5.0 - 8.0 years

14 - 24 Lacs

Kochi

Hybrid

We are looking for someone who thrives in automation, system observability, and high-scale operations, while also supporting CI/CD and deployment pipelines. You will blend operational execution with engineering rigor to support system reliability, incident response, and automation at scale. This role provides a unique opportunity to grow into full-fledged SRE responsibilities while working in tight coordination with our global reliability strategy.

Responsibilities:
- Maintain, standardize, and enhance CI/CD pipelines (GitHub Actions, Azure Pipelines, GitLab).
- Automate testing, deployment, and rollback processes.
- Champion end-to-end CI/CD workflow reliability, including build validation, environment consistency, and deployment rollbacks.
- Deploy and manage observability tools (Datadog, Grafana, Prometheus, ELK).
- Assist in root cause analysis using telemetry and logs.
- Maintain alerting systems and participate in incident drills.
- Shadow and support the Houston-based SRE team during follow-the-sun incident response.
- Create postmortem documentation for incidents and track remediation tasks.
- Develop scripts and tooling to reduce operational toil.
- Contribute to performance tuning of PostgreSQL and containerized services.
- Assist in distributed system optimization efforts (AKKA.NET knowledge is a bonus).
- Participate in rollout strategies, canary releases, and availability planning.

Requirements:
- 5+ years in DevOps, SRE, or Infrastructure Engineering.
- Strong scripting ability (Python, Bash, PowerShell).
- Experience managing Kubernetes clusters and container-based deployments.
- Working knowledge of SQL databases and performance optimization.
- Hands-on experience with CI/CD tools and source control systems (GitHub, GitLab).
- Exposure to monitoring and observability platforms (Datadog, Prometheus, ELK).
- Experience with incident management and postmortems.
- Familiarity with distributed systems (bonus: AKKA.NET or similar frameworks).
- Infrastructure as Code (Terraform) and GitOps practices.
- Exposure to global operations teams and 24/7 handover workflows.

Posted 1 month ago

Apply

7.0 - 12.0 years

20 - 35 Lacs

Chennai

Work from Office

Back-End Developer - Python

Experience Range: 5 - 12 years
Location of Requirement: Chennai

Job Description: Build and maintain the server-side logic, APIs, and databases that power web and mobile applications. Focus on performance, scalability, and data integrity.

Desired Candidate Profile:
- Must-Have Skills: Python, SQL, Docker, Kubernetes
- Good-to-Have Skills: Scala, Clojure, Node.js, TypeScript, REST, GraphQL, Git, Terraform, Helm, GitLab, OpenAPI, OAuth 2.0
- Soft Skills: Communication, English, Documentation
- Tools: GitLab, GitHub, Datadog, JIRA, Prometheus, Grafana (nice-to-have)

Posted 1 month ago

Apply

7.0 - 12.0 years

20 - 35 Lacs

Chennai

Work from Office

Back-End Developer - C#/.NET

Experience Range: 5 - 12 years
Location of Requirement: Chennai

Job Description: Build and maintain the server-side logic, APIs, and databases that power web and mobile applications. Focus on performance, scalability, and data integrity.

Desired Candidate Profile:
- Must-Have Skills: C#, .NET, SQL, Microservices, Docker, Kubernetes
- Good-to-Have Skills: Node.js, TypeScript, REST, GraphQL, Git, Terraform, Helm, GitLab, OpenAPI, OAuth 2.0
- Soft Skills: Communication, English, Documentation
- Tools: GitLab, GitHub, Datadog, JIRA, Prometheus, Grafana (nice-to-have)

Posted 1 month ago

Apply

8.0 - 13.0 years

10 - 19 Lacs

Chennai

Work from Office

Did you know KONE moves two billion people every day? As a global leader in the elevator and escalator industry, we employ over 60,000 driven professionals in more than 60 countries worldwide, joined together by a shared purpose: to shape the future of cities. In 2023, we had annual net sales of EUR 11.0 billion.

Why this role? This role is responsible for designing, implementing, and maintaining the infrastructure and automation solutions that support the software development and delivery lifecycle. It involves applying best practices in CI/CD, cloud infrastructure, system monitoring, and automation to ensure faster, more secure, and reliable deployments. You will collaborate closely with development, QA, and operations teams to streamline build, test, and release processes, ensuring they are efficient, scalable, and resilient. By fostering a culture of collaboration and continuous improvement, the role contributes significantly to improving development speed, operational stability, and overall quality across the R&D organization.

What will you be doing? Key Responsibilities:
- Design, implement, and maintain scalable CI/CD pipelines using Jenkins and GitLab CI.
- Deploy and manage containerized applications using Docker and Kubernetes in AWS cloud environments.
- Develop and support test automation tools and libraries, primarily using Python.
- Monitor, troubleshoot, and enhance Linux-based systems to ensure reliability and performance.
- Collaborate with development, QA, and infrastructure teams to streamline and optimize the software delivery lifecycle.
- Assist in server configuration management using Puppet.
- Implement security practices, including automated compliance and security checks.

Required Qualifications:
- Bachelor's degree in Computer Science, Technology, Engineering, or a related field.
- 8+ years of hands-on experience in Python development, automation, and scripting.
- Proven expertise with Jenkins and GitLab CI/CD pipelines.
- Strong practical experience with AWS, Docker, and Kubernetes.
- Deep understanding of Linux system administration and shell scripting.
- Experience with infrastructure-as-code tools such as Puppet.
- Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK).
- Strong problem-solving skills with a proactive, self-driven approach.
- Excellent interpersonal and communication skills, with the ability to work across functional teams.
- Comfortable working in Agile environments with an emphasis on continuous improvement.
- Awareness of CI/CD security best practices and compliance requirements.

What do we offer?
- Development and growth opportunities within a global organization.
- Warm and friendly international working environment.
- Being part of an industry leader in sustainability.

Posted 1 month ago

Apply

5.0 - 8.0 years

5 - 15 Lacs

Hyderabad

Work from Office

In Time Tec is an award-winning IT & software company. In Time Tec offers progressive software development services, enabling its clients to keep their brightest and most valuable talent focused on innovation. In Time Tec has a leadership team averaging 15 years in software/firmware R&D and 20 years building onshore/offshore R&D teams. We are looking for rare talent to join us. People with a positive mindset and strong organizational skills will be drawn to the position. Your capacity to take initiative, solve problems as they emerge, stay flexible, and act with honesty will be key factors in your success at In Time Tec.

Job Overview: The position requires an experienced, ambitious candidate who is passionate about technology and self-driven. We have a challenging workplace that welcomes innovative ideas and offers growth opportunities and a positive environment for accomplishing goals. Our purpose is to create abundance for everyone we touch. Your primary responsibility will be to ensure the highest availability of services across all environments, including production. You will design, deploy, and manage computing servers and virtual operating environments running critical services such as virtual network routers and firewalls, DNS, DHCP, and LDAP. The role involves evaluating business requirements, implementing solutions, automating workflows, maintaining servers, tuning performance, implementing server monitoring, and performing traditional UNIX sysadmin tasks including security, troubleshooting, preventative maintenance, and SSL certificate management.
In this role, you will:
- Deploy and maintain infrastructure to host critical applications
- Automate repetitive tasks to improve efficiency
- Engineer solutions to enhance the security, availability, and efficiency of infrastructure resources

Key Qualifications:
- Minimum 5 years of hands-on experience as a systems and storage engineer
- Strong knowledge of Linux distributions (OEL, RHEL, CentOS), utilities, and programs
- Expertise with version control systems (Git, SVN) and automated build/testing tools (Jenkins, Vagrant)
- Experience in deployment processes
- Familiarity with backend storage systems such as NetApp, Isilon, or Object Storage is a strong plus, especially in environments leveraging Kubernetes/EKS
- Experience with configuration management tools (e.g., Puppet, Ansible) is required
- Expertise with enterprise-grade x86 hardware platforms
- Proficient in virtualization technologies (KVM, containers)
- Experience managing applications on Kubernetes clusters
- Familiarity with monitoring and alerting tools like Grafana, Prometheus, New Relic, Splunk, Dynatrace, PagerDuty, or similar
- Experience automating workflows using Python, Ruby, or Go
- Strong knowledge of IPv6, DNS, and DHCP
- Knowledge of hardware tuning to meet specific performance goals is a plus
- Expert understanding of operating systems, the networking stack, TCP/IP, Linux bridges, and network interface drivers is highly desired
- Understanding of systems/application security and encryption
- Strong debugging and problem-solving skills
- Excellent verbal and written communication skills

Note: This is a remote opportunity, with the position based out of Hyderabad.

How You'll Grow at In Time Tec: In Time Tec has made significant investments to create a stimulating environment for its people to grow. We want each of our employees to grow in their own way and play their roles while honing their ownership abilities. As part of those efforts, we provide our professionals with a range of educational opportunities to help them grow in their careers.
Our guiding principles of leadership, trust, transparency, and integrity serve as the foundation for everything we do and every success we achieve. We are proud of these fundamental principles, since they demonstrate our dedication to working as "One Team". We value every individual by giving them the freedom to make daily decisions that support their health, well-being, confidence, and awareness. Our leadership team offers a safe base, providing the right environment, instruction, tools, and opportunities necessary for your professional development and for achieving your goals. Our people and culture work together in a collaborative environment, making In Time Tec a thriving place to work. You can find out more about Life at In Time Tec here.

Posted 1 month ago

Apply

4.0 - 6.0 years

10 - 20 Lacs

Bengaluru

Remote

Key Responsibilities:
- Manage and automate cloud infrastructure on AWS.
- Work with DevOps tools like Jenkins or Spinnaker, Terraform, Docker, and Kubernetes.
- Develop automation and integration using Java (mandatory) or Golang/Python.
- Implement and manage monitoring using Dynatrace, Splunk, Datadog, Prometheus, or Grafana (any one).
- Manage and optimize databases using DynamoDB (mandatory).
- Collaborate with cross-functional teams and ensure smooth operations.
- Communicate effectively with stakeholders.

Requirements:
- 4-6 years of IT experience with 4+ years in AWS Cloud.
- Strong expertise in CI/CD, Infrastructure as Code, and container orchestration.
- Programming experience in Java (mandatory) or Golang/Python.
- Hands-on experience with at least one monitoring tool (Dynatrace, Splunk, Datadog, Prometheus, or Grafana).
- DynamoDB experience is mandatory.
- Excellent communication skills.

Posted 1 month ago

Apply

6.0 - 11.0 years

15 - 25 Lacs

Pune, Chennai, Bengaluru

Hybrid

Hiring: Python Developer with Grafana Expertise - Immediate Joiners Preferred
Location: Pan India
Start: July joiners only
Experience: 6+ years

What We're Looking For: We're urgently hiring a Python Developer who also brings strong hands-on experience in Grafana dashboard development and data integration. You'll be part of a team building monitoring and observability solutions for applications developed in Python, GoLang, and Flutter, and visualizing key metrics using Grafana.

Must-Have Skills:
- Strong experience in Python development (backend)
- Deep hands-on experience with Grafana (creating dashboards, data visualization)
- Working knowledge of data sources like Prometheus, InfluxDB, and Elasticsearch
- Experience integrating APIs or logs into dashboards
- Exposure to real-time monitoring and alerting systems

Nice-to-Have Skills:
- Basic understanding of Core Java
- Experience with GoLang or Flutter apps (optional)
- Familiarity with CI/CD pipelines

Perks: No client interview. Quick onboarding. Great project exposure in the monitoring/observability space.
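For context on what "Python development plus Prometheus as a Grafana data source" means in practice: Prometheus scrapes a plain-text /metrics endpoint from each service, and Grafana queries Prometheus for the dashboards. A minimal stdlib-only sketch of such an endpoint is below; a real service would use the official prometheus_client library, and the metric name here is illustrative, not from the posting.

```python
import http.server
import threading
import urllib.request

# In-memory counter store; prometheus_client would manage this for you,
# but the text exposition format is simple enough to hand-roll here.
METRICS = {"app_requests_total": 0}

def render_metrics():
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(METRICS.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):  # keep scrape logs quiet
        pass

if __name__ == "__main__":
    METRICS["app_requests_total"] += 3
    # Serve on an ephemeral port and scrape ourselves once, as Prometheus would.
    server = http.server.HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics").read().decode())
    server.shutdown()
```

Pointing a Prometheus scrape job at this endpoint, then adding Prometheus as a Grafana data source, is the integration path the posting describes.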

Posted 1 month ago

Apply

2.0 - 3.0 years

3 - 3 Lacs

Greater Noida

Work from Office

Key Responsibilities:
- Diagnose and fix performance bottlenecks across backend services, WebSocket connections, and API response times.
- Investigate issues related to high memory usage, CPU spikes, and slow query execution.
- Debug and optimize database queries (PostgreSQL) and ORM (Prisma) performance.
- Implement and fine-tune connection pooling strategies for PostgreSQL and Redis.
- Configure and maintain Kafka brokers, producers, and consumers to ensure high throughput.
- Monitor and debug WebSocket issues like connection drops, latency, and reconnection strategies.
- Optimize Redis usage and troubleshoot memory leaks or blocking commands.
- Set up or maintain Prometheus + Grafana for service and infrastructure monitoring.
- Work on containerized infrastructure using Docker and Kubernetes, including load balancing and scaling services.
- Collaborate with developers to fix memory leaks, inefficient queries, and slow endpoints.
- Maintain high availability and fault tolerance across all backend components.

Requirements:

Technical Skills:
- Strong proficiency in Node.js and TypeScript.
- Deep knowledge of Prisma ORM and PostgreSQL optimization.
- Hands-on experience with Redis (pub/sub, caching, memory tuning).
- Solid understanding of WebSocket performance and reconnection handling.
- Experience working with Kafka (event streaming, partitions, consumer groups).
- Familiarity with Docker, the container lifecycle, and multi-service orchestration.
- Experience with Kubernetes (deployments, pods, autoscaling, resource limits).
- Familiarity with connection pooling strategies for databases and services.
- Comfortable with performance monitoring tools like Prometheus, Grafana, UptimeRobot, etc.

Soft Skills:
- Excellent debugging and analytical skills.
- Able to work independently and solve complex issues.
- Strong communication and documentation habits.

Preferred Qualifications:
- 3+ years of experience in backend development.
- Experience with CI/CD pipelines and production deployments.
- Prior work with large-scale distributed systems is a plus.
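The connection pooling this posting asks for is language-agnostic, even though the stack is Node.js/TypeScript. A minimal Python sketch of the checkout/check-in pattern follows; FakeConn is a stand-in for a real PostgreSQL or Redis client, and production pools (such as the ones inside Prisma or pg) add health checks, timeouts, and resizing on top of this.

```python
import contextlib
import queue
import threading

class ConnectionPool:
    """Fixed-size pool: connections are created up front and checked
    out/in via a thread-safe queue, so at most `size` connections are
    ever open no matter how many workers run concurrently."""

    def __init__(self, factory, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    @contextlib.contextmanager
    def acquire(self, timeout=5.0):
        conn = self._pool.get(timeout=timeout)  # blocks when exhausted
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always return the connection

class FakeConn:
    """Stand-in 'connection' so the sketch stays self-contained."""
    def execute(self, sql):
        return f"ok: {sql}"

pool = ConnectionPool(FakeConn, size=2)
results = []

def worker():
    with pool.acquire() as conn:
        results.append(conn.execute("SELECT 1"))

# Four concurrent workers share only two pooled connections.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 4
```

The context manager guarantees check-in even on errors, which is exactly the failure mode (leaked connections exhausting the pool) that shows up as the "slow endpoints" the posting mentions.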

Posted 1 month ago

Apply

3.0 - 5.0 years

6 - 8 Lacs

Mumbai, Delhi / NCR, Bengaluru

Work from Office

About the Role: We are hiring 4 experienced DevOps Support Engineers for a remote contract position. If you're proficient with Kubernetes, Snowflake, Python, Azure, ADF, and cloud infrastructure support, and are comfortable working a CST-overlap shift, this is for you!

Key Responsibilities:

Platform Support & Maintenance:
- Ensure high availability and performance of Kubernetes clusters and Azure infrastructure
- Monitor Snowflake pipelines, troubleshoot failures, resolve incidents

Infrastructure Automation & Optimization:
- Develop automation in Python for deployment, scaling, and monitoring
- Optimize Kubernetes configurations and Azure usage

Data Pipeline Management:
- Support pipelines in Snowflake and Azure Data Factory (ADF)
- Ensure reliability, integrity, and performance of ETL/ELT processes

Security & Compliance:
- Maintain RBAC, perform audits, enforce cloud security standards

Collaboration & On-Call Support:
- Participate in on-call rotations
- Work closely with QA, engineering, and product teams
- Document runbooks, processes, and incident resolutions

Must-Have Skills:
- Kubernetes: deployment, scaling, monitoring
- Snowflake: data modeling, query optimization, troubleshooting
- Python: scripting and automation
- Azure services: compute, storage, networking, identity
- ADF (Azure Data Factory): pipeline creation and monitoring
- CI/CD & IaC: Jenkins, GitHub Actions, Azure DevOps, Terraform
- Monitoring: Prometheus, Grafana, Azure Monitor
- Strong communication and proactive troubleshooting skills

Location: Remote
Contract Duration: 6 Months
Work Hours: Overlap till 12 PM CST (i.e., 2:30 PM to 11:30 PM IST)

Posted 1 month ago

Apply

0.0 - 3.0 years

2 - 5 Lacs

Bengaluru

Work from Office

Key Responsibilities:
- Deliver engaging and interactive training sessions (24 hours total) based on structured modules.
- Teach integration of monitoring, logging, and observability tools with machine learning.
- Guide learners in real-time anomaly detection, incident management, root cause analysis, and predictive scaling.
- Support learners in deploying tools like Prometheus, Grafana, OpenTelemetry, Neo4j, Falco, and KEDA.
- Conduct hands-on labs using LangChain, Ollama, Prophet, and other AI/ML frameworks.
- Help participants set up smart workflows for alert classification and routing using open-source stacks.
- Prepare learners to handle security, threat detection, and runtime anomaly classification using LLMs.
- Provide post-training support and mentorship when necessary.

Tooling Covered:
- Observability & Monitoring: Prometheus, Grafana, OpenTelemetry, ELK Stack, Fluent Bit
- AI/ML: Python, scikit-learn, Prophet, LangChain, Ollama (LLMs)
- Security Tools: Falco, KubeArmor, Sysdig Secure
- Dev Tools: Docker, VS Code, Jupyter Notebooks
- LLMs & Automation: LangChain, Neo4j, GPT-based explanation tools, Slack webhooks
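The real-time anomaly detection in this curriculum has a tool-free baseline worth knowing before reaching for Prophet or scikit-learn: flag any sample that deviates too far from the rolling statistics of recent samples. A self-contained sketch, where the window size and threshold are illustrative choices rather than values from the posting:

```python
import statistics
from collections import deque

def detect_anomalies(series, window=10, threshold=3.0):
    """Return indices of points lying more than `threshold` standard
    deviations from the mean of the preceding `window` samples (a
    rolling z-score, the baseline that ML models generalize)."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(series):
        if len(recent) == window:
            mean = statistics.fmean(recent)
            stdev = statistics.stdev(recent)
            if stdev > 0 and abs(value - mean) / stdev > threshold:
                anomalies.append(i)
        recent.append(value)
    return anomalies

# A steady latency metric with one obvious spike at index 15.
latencies = [100, 102, 99, 101, 100, 98, 103, 100, 101, 99,
             100, 102, 101, 99, 100, 450, 101, 100]
print(detect_anomalies(latencies))  # [15]
```

The same shape of logic (compute a score from telemetry, compare to a threshold, emit an alert) underlies the Prometheus alerting rules and LLM-based alert classification the training also covers.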

Posted 1 month ago

Apply

6.0 - 11.0 years

11 - 14 Lacs

Bengaluru

Work from Office

Educational Qualification: Bachelor of Engineering
Service Line: Quality

Responsibilities:
- Experience in one or more high-level programming languages like Python, Ruby, or GoLang, and familiarity with object-oriented programming.
- Proficient in designing, deploying, and managing distributed systems and service-oriented architectures.
- Design and implement CI/CD/CT pipelines on one or more tool stacks, like Jenkins, Bamboo, Azure DevOps, and AWS CodePipeline, with hands-on experience in common DevOps tools (Jenkins, Sonar, Maven, Git, Nexus, UCD, etc.).
- Experience in deploying, managing, and monitoring applications and services on one or more cloud and on-premises infrastructures like AWS, Azure, OpenStack, Cloud Foundry, OpenShift, etc.
- Proficiency in one or more Infrastructure as Code tools (e.g., Terraform, CloudFormation, Azure ARM).
- Developing and managing monitoring and log analysis tools to manage operations, with exposure to tools such as AppDynamics, Datadog, Splunk, Kibana, Prometheus, Grafana, Elasticsearch, etc.
- Proven ability to maintain enterprise-scale production software, with knowledge of heterogeneous system landscapes (e.g., Linux, Windows).
- Expertise in analyzing and troubleshooting large-scale distributed systems and microservices, with experience in Unix/Linux operating system internals and administration (e.g., file systems, inodes, system calls) and networking (e.g., TCP/IP, routing, network topologies).

Preferred Skills: Technology-DevOps-DevOps Architecture Consultancy

Posted 1 month ago

Apply

3.0 - 5.0 years

10 - 14 Lacs

Bengaluru

Work from Office

Educational Qualification: Bachelor of Engineering, BTech, BCA, MSc, MTech, MCA
Service Line: Strategic Technology Group

Responsibilities: Infosys has designed the Power Programmer career track for polyglots who are passionate about programming and want to do more than programming as usual. We are looking for highly skilled Specialist Programmers with a reliability engineering focus and Senior Technologists who are passionate about IT resiliency. In this role, you will build and help create robust platforms, marketplaces, and innovative solutions, contribute to and develop open software, and ensure mission-critical systems deliver on their promises. You will collaborate with some of the best talent in the industry to create and implement innovative, high-quality software solutions. You will be part of a learning culture, where teamwork and collaboration are encouraged, excellence is rewarded, and diversity is respected and valued.

Technical and Professional Requirements: Kubernetes, AWS/Azure, DevOps/DevSecOps, monitoring tools (AppDynamics/Dynatrace/New Relic), build and release, Prometheus, Python, Node.js

Preferred Skills: Technology-Cloud Platform-Azure DevOps; Technology-Cloud Platform-Azure IoT-Azure Sphere

Posted 1 month ago

Apply

3.0 - 6.0 years

18 - 22 Lacs

Bengaluru

Work from Office

Job Overview: As a Senior Backend Engineer at Streamlyn, you will be responsible for designing, building, and scaling backend services and APIs that support our ad-serving infrastructure, analytics pipelines, and internal tools. You will work closely with cross-functional teams, mentor junior developers, and contribute to high-performance, scalable architecture in a fast-paced AdTech environment.

Roles and Responsibilities:
- Design and develop robust backend services and APIs using Spring Boot and Java.
- Build and maintain microservices that support scalable ad-serving and reporting infrastructure.
- Write unit, integration, and performance tests to ensure software quality and reliability.
- Work with databases such as MariaDB, Redis, and Kafka for real-time data processing and persistence.
- Collaborate with frontend, DevOps, and product teams to build end-to-end features.
- Mentor junior developers and contribute to system architecture discussions.

Primary (Core Requirements):
- Programming Language: Java 17+
- Framework: Spring Boot 3.x (Web, Security, Data JPA, Validation)
- Build Tool: Maven or Gradle
- ORM/DB: Hibernate / jOOQ + MariaDB / PostgreSQL
- Caching: Redis (Jedis), Caffeine (optional)
- Version Control: Git, GitLab Flow

Focus (Good to Know / Day-to-Day Use):
- Event Streaming: Kafka (producer/consumer)
- Documentation: Swagger/OpenAPI
- Testing: JUnit 5, Mockito, Testcontainers
- CI/CD: GitLab CI, Docker
- Monitoring: Prometheus + Grafana
- Security: JWT, role-based access, OWASP best practices

Additional Exposure (Nice to Have / Growth Areas):
- ETL and reporting pipelines
- Multi-tenant support systems
- Performance tuning and caching strategies
- Familiarity with Kubernetes and service mesh concepts
- Hands-on experience with AWS services or equivalent cloud platforms

Qualifications:
- 4+ years of experience in backend development.
- Strong knowledge of Spring Boot, REST API design, and distributed systems.
- Proficiency in database schema design, optimization, and indexing strategies.
- Experience with asynchronous processing and messaging queues.
- Good communication, documentation, and collaboration skills.

Posted 1 month ago

Apply

3.0 - 6.0 years

3 - 7 Lacs

Mumbai, Mumbai Suburban, Delhi

Work from Office

Job Description

Education: B.E./B.Tech/MCA in Computer Science
Experience: 3 to 6 years of experience in Kubernetes/GKE/AKS/OpenShift administration

Mandatory Skills (Docker and Kubernetes):
- Good understanding of the components of the various types of Kubernetes clusters (Community/AKS/GKE/OpenShift)
- Provisioning experience with the various types of Kubernetes clusters (Community/AKS/GKE/OpenShift)
- Upgrade and monitoring experience with the various types of Kubernetes clusters (Community/AKS/GKE/OpenShift)
- Good experience with container security
- Good experience with container storage
- Good experience with CI/CD workflows (preferably Azure DevOps, Ansible, and Jenkins)
- Good experience/knowledge of cloud platforms, preferably Azure / Google / OpenStack
- Good experience with container runtimes like Docker/containerd
- Basic understanding of application life cycle management on container platforms
- Good understanding of container registries
- Good understanding of Helm and Helm charts
- Good understanding of container monitoring tools like Prometheus, Grafana, and ELK
- Good experience with the Linux operating system
- Basic understanding of enterprise networks and container networks
- Able to handle Severity#2 and Severity#3 incidents
- Good communication skills and the capability to provide support
- Analytical and problem-solving capabilities; ability to work with teams
- Experience with a 24x7 operations support framework
- Knowledge of the ITIL process

Preferred Skills/Knowledge:
- Container Platforms: Docker, Kubernetes, GKE, AKS, or OpenShift
- Automation Platforms: shell scripts, Ansible, Jenkins
- Cloud Platforms: GCP/Azure/OpenStack
- Operating System: Linux/CentOS/Ubuntu
- Container Storage and Backup

Desired Skills:
1. Certified Kubernetes Administrator, OR
2. Certified Red Hat OpenShift Administrator
3. Certification in administration of any cloud platform will be an added advantage

Soft Skills:
1. Must have good troubleshooting skills
2. Must be ready to learn new technologies and acquire new skills
3. Must be a team player
4. Should be good in spoken and written English

Posted 1 month ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.
