648 SRE Jobs - Page 4

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

10.0 - 20.0 years

0 Lacs

Noida, Uttar Pradesh

On-site

As the Vice President of Infrastructure Engineering & Support within Fiserv Technology Services, you will be instrumental in collaborating closely with CTO portfolio leaders and CIO teams to address various infrastructure requirements and transformation initiatives. Your role will involve driving efficiency, optimization, and service delivery while overseeing a significant portion of the global organization based in Global Services. It will be your responsibility to champion service excellence, maintain platform stability, security, and resilience, and enable Fiserv clients and customers. Your deep technical expertise, architectural knowledge, and domain skills, coupled with a commercial mindset and fiscal prudence, will ensure the delivery of world-class solutions that enhance the Fiserv brand.

In this role, you will:
- Take ownership of the end-to-end operating model within FTS.
- Regularly engage with business stakeholders to understand their needs, involve them in joint planning, and ensure high stakeholder satisfaction.
- Serve as the single point of accountability and escalation for technology service provisioning to clients, the business, and the FTS organization.
- Execute enterprise-wide programs and initiatives aligned with the overall strategy.
- Promote the adoption and enhancement of strategic technology tools.
- Utilize both technical and commercial acumen to drive business profitability through the technology solution portfolio.
- Optimize technology utilization across internal and external stakeholders to meet functional and financial objectives.
- Utilize initiative management, new product adoption, AI Ops, automation, and lifecycle management to achieve efficient technology outcomes.
- Advocate for clients while owning the technology change roadmap.

To be successful in this role, you should possess:
- Over 20 years of experience in infrastructure engineering, with a focus on compute and storage technologies, operating systems, database, middleware, cloud, containers, and network services.
- More than 10 years of experience in the banking and financial services industry.
- 15+ years of experience in managing global teams and delivering technology service solutions.
- A Bachelor's degree in engineering or computer science, or equivalent military experience.
- Demonstrated expertise in ITSM, SRE, Automation, and Telemetry/AI Ops.
- Experience in setting up and managing a command center for triaging and quickly restoring services.
- Proficiency in Change Success and Proactive Problem Management.

Additionally, it would be beneficial to have:
- More than 15 years of experience in driving transformational improvements in infrastructure.
- Extensive experience in leading large-scale infrastructure projects, including mergers and acquisitions.
- Proven ability to manage third-party processors, hardware & software vendors, and external infrastructure providers.
- Knowledge of ITIL controls and compliance processes to effectively manage vulnerabilities.
- Strong leadership experience in building and sustaining a diverse workforce aligned with corporate and country goals.

Posted 1 week ago

5.0 - 9.0 years

0 Lacs

Hyderabad, Telangana

On-site

As the Director of Engineering at SIDGS, you will be responsible for leading a high-performing global engineering team structured around PODs and shared service towers. You will play a crucial role in scaling delivery across complex, multi-disciplinary teams including App Dev, Web Dev, DevSecOps, Cloud, SRE, Data & AI, and Solution Architecture. Your strategic leadership in a professional services environment will be essential for ensuring engineering excellence, process governance, talent growth, and technical innovation across our portfolio of client engagements. Your key responsibilities will include leading a matrixed engineering organization, driving alignment across functional engineering towers and shared services, mentoring Engineering Leads, promoting continuous technical and leadership development, and owning end-to-end accountability for engineering deliverables. You will ensure adherence to engineering standards, secure coding practices, and audit readiness, oversee operational metrics, DevSecOps pipelines, SRE operations, and cloud infra governance, and collaborate with Solution Engineering and Sales to support pre-sales, estimation, and proposal development. Additionally, you will champion the adoption of emerging technologies such as GenAI, Vertex AI, and LangChain into client solutions, promote reusable reference architectures, knowledge sharing via COPs, lead Engineering Process Governance & Audit (PG&A), set and monitor KPIs for engineering productivity, code quality, CI/CD performance, and uptime, ensure security compliance, including CVE remediation and DAST/SAST implementation, and partner with internal R&D teams to pilot new technologies and integrate them into practice. You will also promote certification tracks, hackathons, and experimentation in modern cloud-native and AI-based architectures. 
To qualify for this role, you should have 12+ years of experience in software engineering, with at least 5 years in leadership roles, a proven track record managing geographically distributed engineering teams in a services or consulting firm, and a deep understanding of cloud platforms, CI/CD, DevSecOps, SRE tools, full-stack engineering, and AI/ML platforms. Strong program/project management acumen with a bias for execution, demonstrated experience in solution architecture and client delivery management, and certifications in cloud, DevSecOps, or AI/ML are a plus. A bachelor's or master's degree in computer science or a related field is preferred.

Posted 1 week ago

7.0 - 12.0 years

7 - 11 Lacs

Mumbai, Bengaluru

Work from Office

Location: PAN India, as per the company's designated LTIM locations
Shift Type: Rotational shifts, including night shift and weekend availability
Experience: 7 years

Job Summary: We are looking for a skilled and adaptable Site Reliability Engineer (SRE) / Observability Engineer to join our dynamic project team. The ideal candidate will play a critical role in ensuring system reliability, scalability, observability, and performance while collaborating closely with development and operations teams. This position requires strong technical expertise, problem-solving abilities, and a commitment to 24x7 operational excellence.

Key Responsibilities

Site Reliability Engineering:
- Design, build, and maintain scalable and reliable infrastructure.
- Automate system provisioning and configuration using tools like Terraform, Ansible, Chef, or Puppet.
- Develop tools and scripts in Python, Go, Java, or Bash for automation and monitoring.
- Administer and optimize Linux/Unix systems with a strong understanding of TCP/IP, DNS, load balancers, and firewalls.
- Implement and manage cloud infrastructure across AWS or Kubernetes.
- Maintain and enhance CI/CD pipelines using tools like Jenkins and ArgoCD.
- Monitor systems using Prometheus, Grafana, Nagios, or Datadog, and respond to incidents efficiently.
- Conduct postmortems and define SLAs/SLOs for system reliability and performance.
- Plan for capacity and performance using benchmarking tools, and implement autoscaling and failover systems.

Observability Engineering:
- Instrument services with relevant metrics, logs, and traces using OpenTelemetry, Prometheus, Jaeger, Zipkin, etc.
- Build and manage observability pipelines using Grafana, ELK Stack, Splunk, Datadog, or Honeycomb.
- Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation platforms.
- Design actionable alerts and dashboards to improve system observability and reduce alert fatigue.
- Partner with developers to promote observability best practices and define key performance indicators (KPIs).

Required Skills & Qualifications:
- Proven experience as an SRE or Observability Engineer in complex production environments.
- Hands-on expertise in Linux/Unix systems and cloud infrastructure (AWS/Kubernetes).
- Strong programming and scripting skills in Python, Go, Bash, or Java.
- Deep understanding of monitoring, logging, and alerting systems.
- Experience with modern Infrastructure as Code and CI/CD practices.
- Ability to analyze and troubleshoot production issues in real time.
- Excellent communication skills to collaborate with cross-functional teams and stakeholders.
- Flexibility to work in rotational shifts, including night shifts and weekends, as required by project demands.
- A proactive mindset with a focus on continuous improvement and reliability.

Posted 1 week ago

8.0 - 13.0 years

14 - 24 Lacs

Hyderabad

Work from Office

Location: Hyderabad
Mandatory Skills: Application Support | Microservices | Splunk / ThousandEyes Monitoring | CI/CD | GCP | Database

Job Description:
- 8 years of experience in Java/.NET-based application support, including issue resolution, incident management, and RCA creation.
- Strong troubleshooting skills in debugging multi-architecture systems, and experience with microservices architecture patterns, DevOps, and cloud computing (GCP/AWS).
- Very strong communication and stakeholder coordination skills.
- Experience in alerting and monitoring, including ThousandEyes monitoring, Splunk alerts monitoring, and Google Cloud alerts monitoring.
- Experience managing CI/CD pipeline deployments using Harness and Bamboo, and working with Git.
- Experience working with containers, e.g., Docker, Kubernetes, Cloud Foundry.
- Deep knowledge of Internet protocols and web services technologies, e.g., HTTP, DNS, TCP/UDP, SOAP, JSON, and REST.
- Unix shell scripting or any programming language is mandatory.

Posted 1 week ago

7.0 - 9.0 years

6 - 10 Lacs

Remote, India

On-site

Requirements: 7+ years of experience working with Hadoop, preferably Open Source. 3+ years of leading a Big Data, DevOps, SRE, DBA, or development team. Experience setting up and running Hadoop clusters of 1000+ nodes. Solid knowledge of NoSQL databases, preferably Cassandra or ScyllaDB. Experience running and troubleshooting Kafka. Working knowledge of at least one of: Terraform, Ansible, SaltStack, Puppet. Proficiency in shell scripting.

Nice to have: Experience with Prometheus. Experience managing Snowflake. Solid knowledge of Graphite and Grafana. Python or Perl scripting skills. Experience with installing and managing Aerospike. DBA experience with one of: PostgreSQL, MySQL, MariaDB.

Posted 1 week ago

7.0 - 8.0 years

8 - 10 Lacs

Bengaluru, Karnataka, India

On-site

What you'll do day to day: Design and implementation of monitoring strategies. Improving reliability, stability, and performance of production systems. Leading automation of engineering and operations processes. Systems administration and management of production, pre-production, and test environments. Design and optimization of CI/CD pipelines. Maintenance and administration of source control systems. On-call support of production systems.

What you must have: 7+ years of experience as an SRE, DevOps, or TechOps Engineer. 5+ years of tools development or automation using Python, Perl, Java, or Go. 3+ years of containerization and orchestration experience. Solid experience in managing production environments in a public cloud, AWS preferred. Proficiency in Linux system administration. Experience with monitoring and observability tools: Prometheus, Loki, Grafana. Experience with at least two of the following: Puppet, Salt, Ansible, Terraform. Experience in setting up and supporting CI/CD pipelines.

Posted 1 week ago

7.0 - 9.0 years

7 - 9 Lacs

Bengaluru, Karnataka, India

On-site

Main Responsibilities: Technical leadership of infrastructure projects. Driving automation of performance testing environments. Leading containerization and IaC projects. Leading automation of engineering and operations processes. Defining and implementing HA and DR strategies. Design and optimization of CI/CD pipelines. Runbooks automation. On-call support of production systems. Requirements: 7+ years of experience in SRE, DevOps, or TechOps. 5+ years of experience leading technical projects or teams. 3+ years of tools development or automation. 3+ years of containerization and orchestration experience. Proficiency in shell scripting, as well as Python or Go. Ability to define project requirements and milestones. Experience leading cross-functional projects and teams. Solid experience in managing AWS production environments. Monitoring and observability expertise: OTEL, Prometheus, Grafana tools. Experience with at least two of the following: Puppet, Salt, Ansible, Terraform.

Posted 1 week ago

0.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Job Description: The Blockchain program manager must have expertise in leading complex technical programs and delivery through strong customer collaboration and relationship management at all levels of the organization. This individual is responsible for collaborating with the client's management team and leading the planning and implementation of programs in Blockchain.

Key Responsibilities:
- Deliver large, complex programs/projects that require multidisciplinary technical coordination, cross-functional partnership, and engineering interfaces involving design, development, production support, and execution.
- Proficient in analyzing, developing, and proposing various cost models, including T&M, FP, and Unit of Work (UoW), to optimize costs based on project requirements and customer needs.
- Extensive experience leading geographically distributed teams supporting shared infrastructure hosting, planning, maintenance, and migrations, leveraging the Global Delivery Model.
- Ensure interconnected teams are working efficiently and effectively towards program goals.
- Stakeholder communications, negotiations, and problem solving.
- Coordination with internal and other vendors' teams involved in the program.
- Review, research, and manage a queue of client inquiries, and coordinate with development, product, application support, and operational teams to ensure seamless execution of service.
- Interface with infrastructure, release management, change management, QA, DBA, and application teams to expedite resolution of dependencies.
- Escalation management: assemble an incident response team to fast-track service restoration efforts for business-critical, high-priority events.
- Invoke Problem Management procedures to lead root cause analysis investigations.
- Collaborate with clients for requirement workshops and prepare status reports for respective projects.
- Build high-performing teams, mentor lesser-scoped managers and team members, set short- and long-term goals, track performance, build a culture of learnability, and conduct appraisals to drive organizational goals and objectives.
- Well versed in Agile, CSM, SAFe, SRE, and Waterfall methodologies, and in Service Delivery, Operations, Quality, Risk, and Audit Management processes.
- Present weekly, monthly, and quarterly status reports to key stakeholders, executives, and customer leadership.
- Identify improvement areas and build trust by focusing on continuous improvement initiatives.

Preferred Skills: Technology->Enterprise Architecture->API / Microservices Architecture, Technology->Java->Springboot, Technology->Blockchain->Corda, Technology->Blockchain->Blockchain - All

Posted 1 week ago

8.0 - 12.0 years

0 Lacs

Karnataka

On-site

As a Principal Engineer - Site Reliability Engineering (SRE) within the Digital Business team at Sony LIV, you will play a crucial role in ensuring the availability, scalability, and performance of our cutting-edge OTT platform. With a global user base, we are dedicated to providing seamless, high-quality streaming experiences to our audience. Your primary responsibility will be to design, build, and maintain a robust and scalable infrastructure that supports our OTT platform. Leveraging your extensive SRE experience and developer mindset, you will lead initiatives to enhance system reliability and operational efficiency. You will take full ownership of system operations, ensuring application and infrastructure reliability while demonstrating a strong support mindset to address critical incidents, even outside regular business hours. Additionally, you will collaborate closely with cross-functional teams to align goals and enhance operational excellence.

Key responsibilities include managing full system ownership, developing tools and automation to improve reliability, responding to critical system issues promptly, designing and managing infrastructure solutions, driving observability best practices, and continuously improving system reliability and performance.

To excel in this role, you should have at least 8 years of experience, a deep understanding of observability, and the ability to lead reliability initiatives across systems and teams. Strong technical proficiency in containers (Docker, Kubernetes), networking concepts, CDNs, infrastructure-as-code tools, cloud platforms, observability solutions, scripting/programming languages, and incident handling is essential. We are looking for a candidate with a passion for system reliability, scalability, and performance optimization, along with excellent communication, collaboration, and leadership skills. Your willingness to participate in a 24x7 on-call rotation and support critical systems during off-hours will be crucial for success in this role.

Join us at Sony Pictures Networks to be part of a dynamic team that is shaping the future of entertainment in India. With leading entertainment channels and a promising streaming platform like Sony LIV, we are committed to creating a diverse and inclusive workplace where you can thrive and make a meaningful impact.

Posted 1 week ago

5.0 - 9.0 years

0 Lacs

Indore, Madhya Pradesh

On-site

The Modern Data Company is seeking a skilled and experienced DevOps Engineer to join our team. As a DevOps Engineer, you will play a crucial role in designing, implementing, and managing CI/CD pipelines for automated deployments. You will be responsible for maintaining and optimizing cloud infrastructure for scalability and performance, ensuring system security, monitoring, and incident response using industry best practices, and automating infrastructure provisioning using tools such as Terraform, Ansible, or similar technologies. Collaboration with development teams to streamline release processes, improve system reliability, troubleshoot and resolve infrastructure and deployment issues efficiently, implement containerization and orchestration (Docker, Kubernetes), and drive cost optimization strategies for cloud infrastructure will be key responsibilities in this role. The ideal candidate will have at least 5 years of experience in DevOps, SRE, or Cloud Engineering roles, with strong expertise in CI/CD tools such as Jenkins, GitLab CI/CD, or GitHub Actions. Proficiency in cloud platforms like AWS, Azure, or GCP, Infrastructure as Code tools like Terraform, CloudFormation, or Ansible, Kubernetes, Docker, monitoring and logging tools, scripting skills for automation, networking, security best practices, and Linux administration are required. Experience working in agile development environments is a plus. Nice to have skills include experience with serverless architectures, microservices, database management, performance tuning, and exposure to AI/ML deployment pipelines. In return, we offer a competitive salary and benefits package, the opportunity to work on cutting-edge AI technologies and products, a collaborative and innovative work environment, professional development opportunities, and career growth. 
If you are passionate about AI and data products, and eager to work in a dynamic team environment to make a significant impact in the world of AI and data, we encourage you to apply now and join our team at The Modern Data Company.

Posted 1 week ago

5.0 - 9.0 years

0 Lacs

Hyderabad, Telangana

On-site

As the Director/Head of Engineering at SIDGS, you will be responsible for leading a high-performing, global engineering team that is structured around PODs and shared service towers. Your role will require strategic leadership in a professional services environment, overseeing complex, multi-disciplinary teams including App Dev, Web Dev, DevSecOps, Cloud, SRE, Data & AI, and Solution Architecture.

Your key responsibilities will include leading a matrixed engineering organization, ensuring alignment across functional engineering towers and shared services, mentoring Engineering Leads, and promoting continuous technical and leadership development. You will own end-to-end accountability for engineering deliverables, adherence to engineering standards, and oversee operational metrics, DevSecOps pipelines, SRE operations, and cloud infra governance. Collaboration with Solution Engineering and Sales to support pre-sales, estimation, and proposal development will be crucial. You will drive the adoption of emerging technologies into client solutions, champion reusable reference architectures, and lead Engineering Process Governance & Audit. Monitoring KPIs for engineering productivity, code quality, CI/CD performance, and uptime, as well as ensuring security compliance and promoting innovation and R&D activities will also be part of your responsibilities.

To be successful in this role, you should have 12+ years of experience in software engineering, with at least 5 years in leadership roles. You must have a proven track record of managing geographically distributed engineering teams in a services or consulting firm. Deep understanding of cloud platforms, CI/CD, DevSecOps, SRE tools, full-stack engineering, AI/ML platforms, program/project management acumen, and experience in solution architecture and client delivery management are essential. A degree in computer science or related field is preferred, and certifications in cloud, DevSecOps, or AI/ML are a plus.

Posted 1 week ago

5.0 - 9.0 years

0 Lacs

Thiruvananthapuram, Kerala

On-site

You should have a minimum of 5 years of experience in DevOps, SRE, or Infrastructure Engineering. Your expertise should include a strong command of Azure Cloud and Infrastructure-as-Code using tools such as Terraform and CloudFormation. Proficiency in Docker and Kubernetes is essential. You should be hands-on with CI/CD tools and scripting languages like Bash, Python, or Go. A solid knowledge of Linux, networking, and security best practices is required. Experience with monitoring and logging tools such as ELK, Prometheus, and Grafana is expected. Familiarity with GitOps, Helm charts, and automation will be an advantage.

Your key responsibilities will involve designing and managing CI/CD pipelines using tools like Jenkins, GitLab CI/CD, and GitHub Actions. You will be responsible for automating infrastructure provisioning through tools like Terraform, Ansible, and Pulumi. Monitoring and optimizing cloud environments, implementing containerization and orchestration with Docker and Kubernetes (EKS/GKE/AKS), and maintaining logging, monitoring, and alerting systems (ELK, Prometheus, Grafana, Datadog) are crucial aspects of the role. Ensuring system security, availability, and performance tuning, managing secrets and credentials using tools like Vault and Secrets Manager, troubleshooting infrastructure and deployment issues, and implementing blue-green and canary deployments will be part of your responsibilities. Collaboration with developers to enhance system reliability and productivity is key.

Preferred skills include certification as an Azure DevOps Engineer, experience with multi-cloud environments, microservices, and event-driven systems, as well as exposure to AI/ML pipelines and data engineering workflows.

Posted 2 weeks ago

5.0 - 9.0 years

0 Lacs

Hyderabad, Telangana

On-site

As a Manager of Management Information and Reporting at HSBC, you will play a crucial role in supporting the Global Procurement function by designing and developing Qlik dashboards integrated with databases and deploying them on Qlik server environments. Your responsibilities will include contributing to the execution of the Business Intelligence Roadmap for Global Procurement, collaborating with various stakeholders to identify reporting requirements, and performing complex data analysis to address business questions. You will be tasked with translating business problems into analytical frameworks, extracting actionable insights from data, and maintaining and fine-tuning dashboard designs using industry best practices. Additionally, you will support business transition, manage deliverables acceptance, and implement Agile and Waterfall development practices on dashboard projects while ensuring service desk calls are completed within agreed SLAs. Your role will also involve creating and managing data quality dashboards, documenting data lineage and transformation rules, and building advisory relationships with senior managers. Your ability to work in a fast-paced environment, handle multiple outputs simultaneously, and think critically to make data-driven decisions will be essential. To excel in this role, you should have an understanding of the Procurement life cycle, proficiency in SQL, ETL framework, Alteryx, Python, GCP, and Qlik Sense. Exceptional analytical skills, experience with business tools like JIRA and Confluence, and advanced knowledge of data visualization technologies such as Looker and Qlik are also required. Moreover, expertise in SQL/Big Query, hands-on experience with reporting tools like Qlik/Power BI/ML, and proven experience in Qlik developments and Data Governance will be beneficial. 
Join HSBC and make a real impact by leveraging your skills and knowledge to drive business intelligence initiatives and support the growth and success of the organization.

Posted 2 weeks ago

5.0 - 9.0 years

0 Lacs

Pune, Maharashtra

On-site

As a Site Reliability Engineer with us, you will play a crucial role in supporting the successful delivery of Location Strategy projects. Your responsibilities will include ensuring that projects are completed within the planned budget, maintaining agreed quality standards, and adhering to governance standards. You will be at the forefront of driving innovation and excellence, contributing to the evolution of our digital landscape. By utilizing cutting-edge technology, you will work towards revolutionizing our digital offerings to provide unparalleled customer experiences. To excel in this role, you should possess a strong understanding of enterprise design principles to develop secure, fault-tolerant, and scalable systems. Additionally, familiarity with Agile, DevOps, SRE, and CI/CD practices is essential. Hands-on experience in Java or Python, along with proficiency in release and change management processes, source code control systems like Git, and build automation tools such as Maven/Gradle, will be beneficial. Leadership experience in managing small to midsize teams, coupled with a strong work ethic and the ability to offer technical leadership in resolving complex technical issues, are key attributes required for this role. Valued skills that would further enhance your profile include knowledge of Infrastructure as Code (IaC) and automation tools like Ansible, Chef, or Terraform, experience with event bus products such as Kafka, proficiency in testing methodologies like TDD and BDD, and an understanding of architectural and design patterns. Additionally, familiarity with API development using REST/SOAP protocols and the ability to evaluate and select third-party products effectively will be advantageous. In this position based in Pune, your primary objective will be to apply software engineering techniques, automation, and best practices in incident response to ensure the reliability, availability, and scalability of systems, platforms, and technology. 
Your accountabilities will involve ensuring the availability, performance, and scalability of systems and services, proactively monitoring and addressing system outages, developing tools for operational automation, optimizing system performance, and collaborating with development teams to integrate reliability best practices. As an Assistant Vice President, you will be expected to advise on decision-making processes, contribute to policy development, and ensure operational effectiveness. For individuals with leadership responsibilities, fostering a conducive environment for team members to excel and delivering consistently at a high standard will be crucial. The LEAD behaviors (Listen and be authentic, Energize and inspire, Align across the enterprise, and Develop others) will guide your leadership approach. Alternatively, as an individual contributor, you will lead collaborative assignments, guide team members through structured tasks, and identify innovative approaches to meet project objectives by leveraging cross-functional methodologies. Overall, regardless of your role, all colleagues are expected to uphold the Barclays Values of Respect, Integrity, Service, Excellence, and Stewardship, in addition to embodying the Barclays Mindset to Empower, Challenge, and Drive. These principles serve as the moral compass and operating manual for behavior within our organization.

Posted 2 weeks ago

4.0 - 7.0 years

25 - 40 Lacs

Bengaluru

Work from Office

Site Reliability Engineer

We are looking for engineers who are passionate about reliability, performance, and efficiency, and who have experience in building tools, services, and automation to manage and improve production services.

Role and Responsibilities:
- Work to improve the reliability and performance of the next generation of distributed systems and containerized deployments.
- Diagnose and troubleshoot complex distributed systems handling millions of queries per second.
- Day-to-day work is heavily command-line driven, which requires a strong understanding of Linux.
- Troubleshoot issues across the entire stack: hardware, software, application, and network.
- Participate in 24x7 on-call rotations.
- Design, build, and maintain core infrastructure that enables PhonePe to scale to support hundreds of thousands of concurrent users.
- Actively take part in the analysis and system improvement plan.
- Drive performance testing, capacity planning, and high availability practices.
- Own implementations of new technologies while ensuring proper testing and documentation.
- Proactively monitor, identify, and solve issues that could have a potential impact on our infrastructure.
- Be a natural team player with a resourceful attitude.
- Buddy new team members and get them production ready.

Preferred candidate profile:
- Experience in Azure.
- Systems internals/security, Linux, network, and monitoring.
- Knowledge of Linux cloud services using KVM/QEMU/LVM.
- In-depth knowledge of Perl/GoLang/Python to automate tasks with minimal intervention.
- Knowledge of database technologies, specifically MySQL/NoSQL, is good to have.

Posted 2 weeks ago

15.0 - 20.0 years

90 - 150 Lacs

Pune, Bengaluru

Hybrid

Northern Trust is seeking an experienced Principal, Technology Resiliency Enablement Office. The APAC Technology Resiliency Enablement Office will lead the regional strategy, implementation, and governance of technology resilience initiatives. This role ensures alignment with global best practices, particularly those established by the US Practice Lead, to strengthen operational resilience, business continuity, and regulatory compliance.

Posted 2 weeks ago

6.0 - 9.0 years

16 - 22 Lacs

Bengaluru

Work from Office

Key Skills: Angular, React, NodeJS, Python, C#, .NET Core, Java, Golang, SQL, NoSQL, AWS, Azure, GCP, Microservices, AI/ML, GenAI, DevOps, Agile, SRE, GitHub, SonarQube, FaaS, PaaS. Key Responsibilities: Design, develop, and maintain cloud-native applications and services using Angular, React, NodeJS, Python, C#, .NET Core, Java, and Golang. Work on microservices architecture, leveraging cloud services (FaaS/PaaS) on platforms like AWS, Azure, and GCP. Implement AI/ML and GenAI technologies into products to enhance functionality and performance. Follow best practices in coding, data-structures, algorithms, and software engineering principles. Collaborate with cross-functional teams to deliver high-quality, scalable, and efficient solutions. Use DevOps methodologies to ensure rapid product delivery, continuous integration, and deployment pipelines. Participate in product design and architecture discussions, offering insights to improve system performance and user experience. Work within Agile frameworks such as XP, Lean, SAFe, and DevSecOps, ensuring high-quality products are delivered. Utilize tools like ADO, GitHub, and SonarQube to manage version control, code quality, and project tracking. Experience Requirement: 6-9 years of proven experience in software engineering with hands-on expertise in Angular, React, NodeJS, Python, C#, .NET Core, Java, Golang, SQL/NoSQL databases. Experience in cloud-native engineering using cloud platforms like AWS, Azure, and GCP, with knowledge of microservices, FaaS, and PaaS. Experience with AI/ML technologies, with a preference for candidates with experience in Generative AI (GenAI). Strong understanding of software engineering methodologies like XP, Lean, SAFe, DevSecOps, and SRE. Education: Any Graduation.

Posted 2 weeks ago

5.0 - 10.0 years

17 - 30 Lacs

Pune, Bengaluru

Work from Office

About Position: As a Senior Service Reliability Engineer at Proofpoint, you will develop a deep understanding of the various services and applications that come together to deliver Proofpoint's next-generation security products. You will be responsible for maintaining and extending the Elasticsearch and Splunk clusters used for critical near-real-time data analysis. In this role, you will continually evaluate the performance of our Elasticsearch and Splunk clusters to spot developing problems, plan changes for upcoming high-load events, apply security fixes, test and perform incremental upgrades, and extend and improve our monitoring and alert infrastructure. Role: SRE Elasticsearch Administrator Location: Pune & Bengaluru Experience: 5 to 12 years Job Type: Full Time Employment Mandatory Mention 3 skills: Elasticsearch cluster, Elasticsearch administrative, cluster What You'll Do: In this role, you will continually evaluate the performance of our Elasticsearch and Splunk clusters to spot developing problems, plan changes for upcoming high-load events, apply security fixes, test and perform incremental upgrades, and extend and improve our monitoring and alert infrastructure. You'll also be involved in maintaining other parts of the data pipeline, which may include serverless or server-based systems for feeding data into the Elasticsearch pipeline. We're continually trying to optimize our cost-vs-performance position, so testing new types of hosts or configurations is an ongoing focus. We do much of our work with declarative tools such as Puppet and various scripting mechanisms (depending on the target environment). In general, we want to automate as much as possible and aim for a ‘build once/run everywhere’ system. Some of our Elasticsearch clusters are in the public cloud, some are in Kubernetes clusters, and some are in private datacenters.
This will be an opportunity to work with a variety of types of infrastructure and operations teams. Build long-lasting, effective partnerships across the organization to foster collaboration between Product, Engineering, and Operations teams. Participate in an on-call rotation and be willing to jump on escalated issues as needed. Expertise You'll Bring: Bachelor’s degree in computer science, information technology, engineering, or related discipline required. Expertise in administration and management of Elasticsearch clusters (primary). Expertise in administration and management of Splunk clusters (secondary). Strong knowledge of provisioning and configuration management tools like Puppet, Ansible, Rundeck, etc. Experience in building automations and Infrastructure as Code using Terraform, Packer, or CloudFormation templates (a plus). Experience with monitoring and logging tools like Splunk, Prometheus, PagerDuty, etc. Experience with scripting languages like Python, Bash, Go, Ruby, Perl, etc. Experience with CI/CD tools like Jenkins, Pipelines, Artifactory, etc. An inquisitive mind with the ability to learn where the data exists in a large and disparate system and what that data means. The skills to do effective troubleshooting, following a problem wherever it may lead. Benefits: Competitive salary and benefits package. Culture focused on talent development with quarterly promotion cycles and company-sponsored higher education and certifications. Opportunity to work with cutting-edge technologies. Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards. Annual health check-ups. Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents. Inclusive Environment: Persistent Ltd. is dedicated to fostering diversity and inclusion in the workplace.
We invite applications from all qualified individuals, including those with disabilities, and regardless of gender or gender preference. We welcome diverse candidates from all backgrounds. We offer hybrid work options and flexible working hours to accommodate various needs and preferences. Our office is equipped with accessible facilities, including adjustable workstations, ergonomic chairs, and assistive technologies to support employees with physical disabilities. If you are a person with disabilities and have specific requirements, please inform us during the application process or at any time during your employment. We are committed to creating an inclusive environment where all employees can thrive. Our company fosters a value-driven and people-centric work environment that enables our employees to: Accelerate growth, both professionally and personally Impact the world in powerful, positive ways, using the latest technologies Enjoy collaborative innovation, with diversity and work-life wellbeing at the core Unlock global opportunities to work and learn with the industry’s best Let’s unleash your full potential at Persistent “Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.”

Posted 2 weeks ago

5.0 - 10.0 years

7 - 11 Lacs

Mumbai

Hybrid

Job Title: Application Support SRE Experience: 5-10 Years Location: Mumbai - Hybrid Job Description: Responsibilities: 5 to 8 years in a similar role as a hands-on application middleware specialist. Prior experience working in a global financial organization is an advantage. The client is looking to onboard an application support and SRE specialist for their Application and Data Engineering (ADE) group. ADE provides application engineering, tooling, automation, and elevated production support services that conform to company security blueprints and focus on performance, reliability, and scalability. The work involves understanding technical requirements from application owners and the business, participating in technical evaluations of vendors and vendor technologies, conducting proofs of concept, and packaging and deploying middleware products. Skills: Linux, Python/Shell, databases (Sybase, DB2), web servers.

Posted 2 weeks ago

10.0 - 15.0 years

12 - 17 Lacs

Chennai, Bengaluru

Work from Office

Strong knowledge of Linux internals (preferably RHEL, Ubuntu). Essential knowledge of Windows internals. Comprehensive understanding of DevOps, SRE, IaC, and 12-Factor principles. Excellent hands-on experience in configuration management, orchestration, and IaC tools (Ansible, Jenkins, Terraform). Strong understanding of virtualization technologies (KVM, Libvirt, oVirt, KubeVirt, OVM, OpenStack). Strong understanding of software-defined storage technologies (Ceph, GlusterFS). Strong understanding of repository and artifact management tools (Red Hat Satellite, Spacewalk, Nexus). Strong understanding of container technologies (Docker, Kubernetes, OpenShift). Strong understanding of ELK and its Beats (Auditbeat, Filebeat). Strong understanding of OS compliance policies (CIS Benchmark). Agile methodologies and their ceremonies. Architect, write, and implement software that improves the stability, scalability, and availability of products. Own multiple services and have the autonomy to do what suits the business and our customers in IT. Solve recurring problems and create solutions and automation to prevent them from happening again. Plan for reliability so that systems work across multiple datacenters/environments and handle outages. Conceptual understanding of infrastructure and how it works: DNS (authoritative and non-authoritative DNS, dynamic and BIND DNS, forwarders), SSL communication (handshake of SSL traffic, cipher suites, encryption algorithms), Active Directory (security OUs, policies), certificates (SAN, client authentication, keystores, mutual SSL), load balancers, site selectors, firewalls, vault tools (CyberArk, HashiCorp), high availability. Knowledge of API communications (REST/SOAP), including developing a new consumer/publisher for any API. Excellent scripting in Groovy (writing Jenkinsfiles), Bash, PowerShell, and Python. GitOps-driven configuration management and deployment.
Familiar with and open-minded toward open-source technologies. Team player with quick adaptation to context changes. Security awareness. Strong troubleshooting skills: deep-dive into an issue, read logs, track the clues, and identify the problem. Strategic thinking with a research-and-development mindset.

Posted 2 weeks ago

5.0 - 7.0 years

7 - 10 Lacs

Hyderabad

Work from Office

Position Summary: The Network SRE III will be responsible for ensuring the reliability, availability, and scalability of networks, systems, and associated platforms. Systems under the care of a Network SRE III must operate effectively and reliably through scalable builds, deployments, releases, and complex architectures that encompass modern technologies. You will work closely with technical and non-technical teams within the organization to facilitate the design and implementation of scalable solutions, drive multiple automation initiatives, and monitor and maintain the performance of critical systems. What You'll Do: Apply modern engineering principles and practices to operational functions and employ this methodology throughout the full system lifecycle, from initial concept and architecture through deployment, daily operation, and overall optimization, and apply these practices to refining existing systems. Support and maintain networking systems to ensure optimal performance, reliability, and security. Build and deploy modern network designs translated from user stories and requirements. Manage the system lifecycle from implementation to shutdown and decommissioning. Scale network systems sustainably through mechanisms such as automation, and evolve systems by fostering changes that improve velocity. Leverage automation and configuration management to streamline administration. Troubleshoot and resolve complex issues, including failures, connectivity problems, and performance bottlenecks. Partner with cross-functional teams to design and implement scalable and robust network architecture to improve services on an ongoing basis. Evaluate and recommend new solutions, tools, and methodologies to enhance network efficiency, security, and productivity. Monitor network performance and proactively identify potential issues, generate meaningful reports, and implement preventive measures to maintain health and uptime.
Conduct performance tuning and optimization activities to maximize efficiency and response times. Respond to and help manage incidents. Participate in RCA activities and assist with defining SLOs and SLIs for business stakeholders. Implement and enforce security best practices to protect our networks, data, and infrastructure against unauthorized access, cyber threats, and vulnerabilities. Create and maintain comprehensive knowledge bases for system documentation, including standard operating procedures, configurations, and troubleshooting guides, to support end-users' ability to use the networks effectively. Participate in on-call rotation. Responsible for upholding F5's Business Code of Ethics and for promptly reporting violations of the Code or other company policies. Performs other related duties as assigned. The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change. What You'll Bring: Strong experience in network automation with a code-first approach to managing resources on-premises, in cloud platforms, SaaS platforms, containerization, and orchestration. Proficiency in Agile delivery, SRE, DevOps principles, and associated tools and technologies. Demonstrated experience in modern network principles, network infrastructure design, implementation, and maintenance. Extensive knowledge of network protocols (OSPF, BGP, STP, TCP, UDP, IPv6). Experience with observability tooling including logging infrastructure, tracing systems, alert definitions, etc. Solid experience with network security technologies and protocols, including firewalls, intrusion detection and prevention systems, and VPNs. Solid understanding of cybersecurity principles and best practices. Strong understanding of network performance optimization and capacity planning.
Familiarity with compliance and regulatory guidelines. Demonstrated ability to work both independently and as an integral member of an agile team. Proficient communication, planning, problem-solving, troubleshooting, and organization skills. Flexibility to adapt to changing project requirements and timelines. Qualifications: BS/BA or equivalent work experience. 5+ years of demonstrated experience in network operations, engineering, or a similar technical role. Proficiency in network automation, using scripting/programming languages, tools, and software lifecycle methodologies. Hands-on experience with technology systems tools, protocols, and platforms. F5 Inc. is an equal opportunity employer and strongly supports diversity in the workplace.

Posted 2 weeks ago

6.0 - 8.0 years

18 - 30 Lacs

Hyderabad

Work from Office

Key Skills: Hadoop, Cloudera, HDFS, YARN, Spark, Delta Lake, Linux, Docker, Kubernetes, Jenkins, REST API, Prometheus, Grafana, Splunk, PySpark, Python, Terraform, Ansible, GCP, DevOps, CI/CD, SRE, Agile, Infrastructure Automation Roles & Responsibilities: Lead and support technology teams in designing, developing, and managing data engineering and CI/CD pipelines, and infrastructure. Act as an Infrastructure/DevOps SME in designing and implementing solutions for risk analytics systems transformation, both tactical and strategic, aligned with regulatory and business initiatives. Collaborate with other technology teams, IT support teams, and architects to drive improvements in product delivery. Manage daily interactions with IT and central DevOps/infrastructure teams to ensure continuous support and delivery. Grow the technical expertise within the engineering community by mentoring and sharing knowledge. Design, maintain, and improve the full software delivery lifecycle. Enforce process discipline and improvements in areas like agile software delivery, production support, and DevOps pipeline development. Experience Requirement: 6-8 years of experience in platform engineering, SRE roles, and managing distributed/big data infrastructures. Strong hands-on experience with the Hadoop ecosystem, big data pipelines, and Delta Lake. Proven expertise in Cloudera Hadoop cluster management including HDFS, YARN, and Spark. In-depth knowledge of networking, Linux, HDFS, and DevSecOps tools like Docker, Kubernetes, and Jenkins. Skilled in containerization with Docker and orchestration using Kubernetes. Hands-on experience with designing and managing large-scale tech projects, including REST API standards. Experience with monitoring and logging tools such as Prometheus, Grafana, and Splunk. Global collaboration experience with IT and support teams across geographies. Strong coding skills in Spark (PySpark) and Python with at least 3 years of experience. 
Expertise in Infrastructure as Code (IaC) tools such as Terraform and Ansible. Working knowledge of GCP or other cloud platforms and their data engineering products is preferred. Familiarity with agile methodologies, with strong problem-solving and team collaboration skills. Education: B.Tech M.Tech (Dual), B.Tech, M. Tech.

Posted 2 weeks ago

5.0 - 10.0 years

20 - 35 Lacs

Bengaluru

Hybrid

SRE/Site Reliability Engineer: GCP, Terraform, Python

Posted 2 weeks ago

7.0 - 10.0 years

13 - 20 Lacs

Hyderabad

Work from Office

• 7+ years of experience in cloud infrastructure engineering or SRE roles. • Deep expertise in automating infrastructure using modern DevOps and IaC practices. • Proficient in building and maintaining CI/CD pipelines. • Strong background in microservices architecture and Docker. • Mid-level experience supporting Java or .NET applications. • Hands-on experience with cloud platforms such as AWS, Azure, or GCP. • Strong knowledge of networking, load balancing, and cloud security best practices. • Excellent analytical, problem-solving, and communication skills.

Posted 2 weeks ago


Featured Companies