652 SRE Jobs - Page 22

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

6.0 - 10.0 years

27 - 42 Lacs

Kolkata

Work from Office

Azure DevOps, AWS (major AWS services such as RDS-MSQL, Elastic Beanstalk), EKS, Terraform, Python, Kubernetes, SRE ( Man)

Job Summary
We are seeking an experienced Infra. Technology Specialist with 6 to 10 years of experience to join our team. The ideal candidate will have expertise in SRE, Automation, Database and SQL, Database Basics, Terraform, AWS, and Python. This hybrid role involves rotational shifts and does not require travel. The candidate will play a crucial role in ensuring the stability and efficiency of our infrastructure.

Responsibilities
- Oversee the stability and performance of the company's infrastructure.
- Implement automation solutions to streamline operations and reduce manual tasks.
- Manage and optimize databases, ensuring data integrity and availability.
- Use SQL to perform complex queries and data analysis.
- Apply Terraform to manage infrastructure as code and automate provisioning.
- Leverage AWS services to build and maintain scalable and secure cloud environments.
- Develop and maintain Python scripts to automate routine tasks and processes.
- Monitor system performance and troubleshoot issues to ensure high availability.
- Collaborate with cross-functional teams to design and implement infrastructure solutions.
- Provide technical support and guidance to team members on infrastructure-related matters.
- Ensure compliance with security policies and best practices in all infrastructure activities.
- Participate in on-call rotations to provide 24/7 support for critical infrastructure components.
- Continuously evaluate and adopt new technologies to improve infrastructure efficiency.

Qualifications
- Possess a strong background in Site Reliability Engineering (SRE) with hands-on experience.
- Demonstrate proficiency in automation tools and techniques to enhance operational efficiency.
- Have in-depth knowledge of database management and SQL for data manipulation and analysis.
- Show expertise in using Terraform for infrastructure as code and automated provisioning.
- Exhibit experience with AWS cloud services for building and maintaining cloud environments.
- Be skilled in Python programming for scripting and automation purposes.
- Display excellent problem-solving abilities and attention to detail.
- Have strong communication skills to collaborate effectively with team members.
- Be adaptable to rotational shifts and able to work in a hybrid work model.
- Maintain a proactive approach to learning and adopting new technologies.
- Ensure a high level of security and compliance in all infrastructure activities.
- Provide mentorship and support to junior team members.
- Demonstrate a commitment to continuous improvement and innovation.

Posted 2 months ago

6.0 - 10.0 years

27 - 42 Lacs

Kolkata

Work from Office

Azure DevOps, AWS (major AWS services such as RDS-MSQL, Elastic Beanstalk), EKS, Terraform, Python, Kubernetes, SRE

Job Summary
We are seeking an experienced Infra. Technology Specialist with 6 to 10 years of experience to join our team. The ideal candidate will have expertise in SRE, Automation, Database and SQL, Database Basics, Terraform, AWS, and Python. This hybrid role involves rotational shifts and does not require travel. The candidate will play a crucial role in ensuring the stability and efficiency of our infrastructure.

Responsibilities
- Oversee the stability and performance of the company's infrastructure.
- Implement automation solutions to streamline operations and reduce manual tasks.
- Manage and optimize databases, ensuring data integrity and availability.
- Use SQL to perform complex queries and data analysis.
- Apply Terraform to manage infrastructure as code and automate provisioning.
- Leverage AWS services to build and maintain scalable and secure cloud environments.
- Develop and maintain Python scripts to automate routine tasks and processes.
- Monitor system performance and troubleshoot issues to ensure high availability.
- Collaborate with cross-functional teams to design and implement infrastructure solutions.
- Provide technical support and guidance to team members on infrastructure-related matters.
- Ensure compliance with security policies and best practices in all infrastructure activities.
- Participate in on-call rotations to provide 24/7 support for critical infrastructure components.
- Continuously evaluate and adopt new technologies to improve infrastructure efficiency.

Qualifications
- Possess a strong background in Site Reliability Engineering (SRE) with hands-on experience.
- Demonstrate proficiency in automation tools and techniques to enhance operational efficiency.
- Have in-depth knowledge of database management and SQL for data manipulation and analysis.
- Show expertise in using Terraform for infrastructure as code and automated provisioning.
- Exhibit experience with AWS cloud services for building and maintaining cloud environments.
- Be skilled in Python programming for scripting and automation purposes.
- Display excellent problem-solving abilities and attention to detail.
- Have strong communication skills to collaborate effectively with team members.
- Be adaptable to rotational shifts and able to work in a hybrid work model.
- Maintain a proactive approach to learning and adopting new technologies.
- Ensure a high level of security and compliance in all infrastructure activities.
- Provide mentorship and support to junior team members.
- Demonstrate a commitment to continuous improvement and innovation.

Posted 2 months ago

10.0 - 14.0 years

35 - 50 Lacs

Hyderabad

Work from Office

Azure DevOps, AWS EKS, Terraform, Python, Kubernetes, SRE

Job Summary
We are seeking an experienced Infra Ops Specialist with 10 to 14 years of experience to join our team. The ideal candidate will have expertise in Kubernetes, Azure DevOps, AWS EKS, Elastic Beanstalk, Automation, Python, AWS, GCP, SRE, Ansible, and Terraform. This role requires a strong background in Consumer Lending. The work model is hybrid and the shift is day. No travel is required.

Responsibilities
- Lead the design and implementation of infrastructure solutions using Kubernetes, AWS EKS, and Elastic Beanstalk.
- Oversee the deployment and management of applications using Azure DevOps and Terraform.
- Provide automation solutions using Python and Ansible to streamline operations.
- Ensure the reliability and availability of infrastructure through SRE practices.
- Collaborate with cross-functional teams to support Consumer Lending applications.
- Monitor and optimize cloud infrastructure on AWS and GCP.
- Develop and maintain CI/CD pipelines for efficient software delivery.
- Implement security best practices and compliance standards in cloud environments.
- Troubleshoot and resolve infrastructure issues in a timely manner.
- Document infrastructure configurations and operational procedures.
- Mentor junior team members and provide technical guidance.
- Stay updated with the latest industry trends and technologies.
- Contribute to the continuous improvement of infrastructure processes.

Qualifications
- Must have extensive experience with Kubernetes, AWS EKS, and Elastic Beanstalk.
- Should have strong expertise in Azure DevOps and Terraform.
- Must be proficient in automation using Python and Ansible.
- Should have a solid understanding of SRE practices.
- Must have experience with AWS and GCP cloud platforms.
- Should have a background in the Consumer Lending domain.

Posted 2 months ago

15.0 - 20.0 years

10 - 14 Lacs

Navi Mumbai

Work from Office

Project Role: Application Lead
Project Role Description: Lead the effort to design, build and configure applications, acting as the primary point of contact.
Must have skills: Automation in Application Maintenance
Good to have skills: NA
Minimum 7.5 year(s) of experience is required
Educational Qualification: 15 years full time education

Role Description:
The SRE and Automations Manager will be responsible for driving the reliability, scalability, and efficiency of AMS operations by leading the automation initiatives and SRE practices across both SAP and non-SAP landscapes. This individual will work closely with application support, infrastructure, DevOps, and ITSM teams to ensure high availability and performance of critical business applications.

Key Responsibilities:

SRE Responsibilities:
- Establish and implement SRE practices such as myWizard and GenWizard app components across supported applications.
- Collaborate with support teams to identify improvement areas in incident handling through runbooks, self-healing scripts, and observability tools.
- Design and enforce proactive monitoring and alerting strategies for SAP and non-SAP applications using availabl .
- Participate in capacity planning, performance tuning, and disaster recovery strategy formulation for delivery teams.

Automation Responsibilities:
- Define and execute the automation strategy for repetitive operational tasks including system health checks, report generation, job monitoring, user provisioning, and ticket triaging.
- Drive the development of automation scripts using Python, PowerShell, Shell, ABAP (for SAP), or other tools as needed.
- Partner with application SMEs and functional teams to identify automation use cases and deliver continuous value.
- Ensure all automation activities are documented, version-controlled, and aligned with security policies.

Technical Skills & Tools:
- Strong knowledge of SRE principles and automation frameworks.
- Familiarity with non-SAP technologies such as Java, .NET, Oracle, SQL, or custom-built apps.
- Tools: ServiceNow, Splunk, AppDynamics, Grafana, Prometheus, Jenkins, Git, Ansible, Python, Shell scripting, ABAP (basic automation).
- Good to have exposure to cloud platforms (AWS/Azure/GCP) and hybrid environments.

Leadership & Soft Skills:
- Ability to lead a small team of SREs and automation engineers.
- Excellent analytical, problem-solving, and communication skills.
- Strong stakeholder management skills and experience working in multi-vendor environments.
- Agile/DevOps mindset with a focus on continuous improvement.

Additional Information:
- The candidate should have minimum 7.5 years of experience in Automation in Application Maintenance.
- This position is based in Mumbai.
- A 15 years full time education is required.

Qualification: 15 years full time education

Posted 2 months ago

6.0 - 8.0 years

12 - 16 Lacs

Bengaluru

Work from Office

Job Title: Site Reliability Engineer
Location: Bangalore, India
Corporate Title: Associate

Role Description
You will work closely with application teams to ensure stable, well-monitored applications that are resilient to faults. You will agree and review Service Level Objectives (SLOs) to achieve high availability for applications based on their criticality. You will maintain Error Budgets for the application teams and prevent releases in the event of production instability and reduced availability. You will focus on reducing manual toil, improving operational reliability and driving automation-first practices. This is a hands-on role with a strong focus on implementing SRE practices and reducing toil for Developer Tools.

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Drive stability, performance and reliability improvements for TDI Engineering applications.
- Build monitoring and alerting solutions that alert on failures and performance issues across TDI Engineering applications, to help us provide the optimum service level to users.
- Provide feedback loops to continually improve application resilience across multiple application teams.
- Collaborate with product owners and the engineering team to prioritize the reliability and stability of these applications.
- Define, measure and maintain SLOs and Error Budgets to ensure availability for end users and to achieve appropriate levels of application stability.
- Identify opportunities for automation and self-service capabilities and implement them to eliminate toil for both the application teams and the SRE team, to optimise effectiveness.
- Manage outage resolution and agree actions to reduce the likelihood of failure in future by owning RCA and conducting blameless postmortems.

Your skills and experience
- Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience or diploma).
- 4+ years of experience in IT in large corporate environments, specifically in controlled production environments.
- Demonstrable Site Reliability Engineering experience of at least 2 years.
- Excellent analytical and problem-solving skills.
- Experience in implementing observability solutions using any industry-standard tools.
- Scripting skills (Groovy, shell, Bash, Cron or any equivalent).
- Experience in mid-range technologies and platforms, i.e. UNIX/Linux, Oracle database and Nginx.

Good to have:
- Understanding of and experience in Developer Tools (Jira, Confluence, Bitbucket, TeamCity, Artifactory, Udeploy) as an enterprise-level administrator experienced in managing applications with a large user base.
- Knowledge and experience of observability tools like Grafana and Prometheus.

How we'll support you
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs

Posted 2 months ago

6.0 - 8.0 years

5 - 9 Lacs

Pune

Work from Office

Job Title: Production Support Analyst, AS
Location: Pune, India

Role Description
L2 Technical Application Support

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Provide hands-on technical support for a suite of applications/platforms within Deutsche Bank.
- Build up technical subject matter expertise on the applications/platforms being supported, including business flows, the application architecture, and the hardware configuration.
- Resolve service requests submitted by the application end users to the best of L2 ability and escalate any issues that cannot be resolved to L3.
- Conduct real-time monitoring to ensure application SLAs are achieved and maximum application availability (up time).
- Assist in the process to approve all new releases and production configuration changes, keep stakeholders informed and conduct any release tasks assigned to support.
- Manage incidents through to resolution, keeping all stakeholders abreast of the situation and working to minimize impact wherever possible.
- Conduct post-mortems of incidents and drive relevant feedback into Incident, Problem and Change management programs.
- Build and maintain effective and productive relationships with the stakeholders in business, development, infrastructure, and third-party systems / data providers & vendors.
- Ensure all knowledge is documented and that support runbooks and knowledge articles are kept up to date.
- Approach support with a proactive attitude, working to improve the environment before issues occur.
- The candidate may have to work in shifts as part of a rota covering APAC and EMEA hours between 06:30 IST and 10:30 PM IST (2 shifts). Weekend coverage may need to be provided on a rotational basis.

Your skills and experience
- 6 to 8 years providing hands-on IT support and interacting with application end users.
- Preferred: Experience in an investment bank, financial institution, or large corporation.
- Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience or diploma).
- Good analytical and problem-solving skills.
- Exceptional written and oral communication skills, including the ability to communicate technical information to a non-technical audience and with executive levels.
- Understanding of ITIL / SRE best practices for supporting a production environment.
- Preferred: Experience in Google Cloud.
- Understanding of how to get things done in large organizations, where to use processes and how to build and operate a network.
- Ability to work across countries, regions, and time zones with a broad range of cultures and technical capability.

Technical competencies
- Experience using operating systems such as UNIX, Linux and Wintel from the command line interface. Knowledge of the commands needed to navigate, troubleshoot issues and provide status of these systems.
- Preferred: Familiarity with coding languages such as Java and .NET, or Perl/Shell scripting.
- Preferred: Experience with Java hosting environments like WebSphere, Tomcat, etc.
- Ability to write SQL to extract and patch data in Oracle databases as well as monitor database health and performance.
- Experience with monitoring tools such as Geneos and New Relic.
- Hands-on experience with IBM BPM, Camunda, WAS, etc. (preferred).
- Experience with RPA platforms like Blue Prism/Chatbots (preferred).
- Experience in DevOps/SRE.
- Knowledge and development experience in Ansible automation.
- Experience in shell scripting and Python.

How we'll support you
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs

Posted 2 months ago

6.0 - 8.0 years

37 - 40 Lacs

Pune

Work from Office

Job Title: Production Specialist, AVP
Location: Pune, India

Role Description
Our organization within Deutsche Bank is AFC Production Services. We are responsible for providing technical L2 application support for business applications. The AFC (Anti-Financial Crime) line of business has a current portfolio of 25+ applications. The organization is in the process of transforming itself using Google Cloud and many new technology offerings. As an Assistant Vice President, your role will include hands-on production support and active involvement in resolving technical issues across multiple applications. You will also work as application lead and will be responsible for technical and operational processes for all applications you support.

Deutsche Bank's Corporate Bank division is a leading provider of cash management, trade finance and securities finance. We complete green-field projects that deliver the best Corporate Bank - Securities Services products in the world. Our team is diverse, international, and driven by a shared focus on clean code and valued delivery. At every level, agile minds are rewarded with competitive pay, support, and opportunities to excel. You will work as part of a cross-functional agile delivery team. You will bring an innovative approach to software development, focusing on using the latest technologies and practices, as part of a relentless focus on business value. You will be someone who sees engineering as a team activity, with a predisposition to open code, open discussion and creating a supportive, collaborative environment. You will be ready to contribute to all stages of software delivery, from initial analysis right through to production support.

What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender neutral parental leaves
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for 35 yrs. and above

Your key responsibilities
- Provide technical support by handling and consulting on BAU, incidents, emails and alerts for the respective applications.
- Perform post-mortems and root cause analysis using ITIL standards of Incident Management, Service Request fulfillment, Change Management, Knowledge Management, and Problem Management.
- Manage the regional L2 team and vendor teams supporting the application. Ensure the team is up to speed and picks up the support duties.
- Build up technical subject matter expertise on the applications being supported, including business flows, application architecture, and hardware configuration.
- Define and track KPIs, SLAs and operational metrics to measure and improve application stability and performance.
- Conduct real-time monitoring to ensure application SLAs are achieved and maximum application availability (up time) using an array of monitoring tools.
- Build and maintain effective and productive relationships with the stakeholders in business, development, infrastructure, and third-party systems / data providers & vendors.
- Assist in the process to approve application code releases as well as tasks assigned to support to perform.
- Keep key stakeholders informed using communication templates.
- Approach support with a proactive attitude, a desire to seek root cause and in-depth analysis, and strive to reduce inefficiencies and manual efforts.
- Mentor and guide junior team members, fostering technical upskilling and knowledge sharing.
- Provide strategic input into disaster recovery planning, failover strategies and business continuity procedures.
- Collaborate and deliver on initiatives and install these initiatives to drive stability in the environment.
- Perform reviews of all open production items with the development team and push for updates and resolutions to outstanding tasks and recurring issues.
- Drive service resilience by implementing SRE (site reliability engineering) principles, ensuring proactive monitoring, automation and operational efficiency.
- Ensure regulatory and compliance adherence, managing audits, access reviews, and security controls in line with organizational policies.
- The candidate will have to work in shifts as part of a rota covering APAC and EMEA hours between 07:00 IST and 09:00 PM IST (2 shifts). In the event of major outages or issues we may ask for flexibility to help provide appropriate cover. Weekend on-call coverage needs to be provided on a rotational/need basis.

Your skills and experience
- 9-15 years of experience in providing hands-on IT application support.
- Experience in managing vendor teams providing 24x7 support.
- Preferred: team lead role experience; experience in an investment bank or financial institution.
- Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience/diploma/certification).
- Preferred: ITIL v3 foundation certification or higher.
- Knowledgeable in cloud products like Google Cloud Platform (GCP) and hybrid applications.
- Strong understanding of ITIL / SRE / DevOps best practices for supporting a production environment.
- Understanding of KPIs, SLOs, SLAs and SLIs.
- Monitoring tools: knowledge of Elastic Search, Control M, Grafana, Geneos, OpenShift, Prometheus, Google Cloud Monitoring, Airflow, Splunk.
- Working knowledge of creating dashboards and reports for senior management.
- Red Hat Enterprise Linux (RHEL) professional skill in searching logs, process commands, starting/stopping processes, and using OS commands to aid in tasks needed to resolve or investigate issues. Shell scripting knowledge a plus.
- Understanding of database concepts and exposure to working with Oracle, MS SQL, Big Query, etc. databases.
- Ability to work across countries, regions, and time zones with a broad range of cultures and technical capability.

Skills that will help you excel
- Strong written and oral communication skills, including the ability to communicate technical information to a non-technical audience, and good analytical and problem-solving skills.
- Proven experience in leading L2 support teams, including managing vendor teams and offshore resources.
- Able to train, coach, and mentor, and know where each technique is best applied.
- Experience with GCP or another public cloud provider to build applications.
- Experience in an investment bank, financial institution or large corporation using enterprise hardware and software.
- Knowledge of Actimize, Mantas, and case management software is good to have.
- Working knowledge of Big Data Hadoop/Secure Data Lake is a plus.
- Prior experience in automation projects is great to have.
- Exposure to Python, shell, Ansible or other scripting languages for automation and process improvement.
- Strong stakeholder management skills, ensuring seamless coordination between business, development, and infrastructure teams.
- Ability to manage high-pressure issues, coordinating across teams to drive swift resolution.
- Strong negotiation skills with interface teams to drive process improvements and efficiency gains.

How we'll support you
- Training and development to help you excel in your career.
- Coaching and support from experts in your team.
- A culture of continuous learning to aid progression.
- A range of flexible benefits that you can tailor to suit your needs.

About us and our teams
Please visit our company website for further information: https://www.db.com/company/company.htm
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively. Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair and inclusive work environment.

Posted 2 months ago

4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office

At BCE Global Tech, immerse yourself in exciting projects that are shaping the future of both consumer and enterprise telecommunications. This involves building innovative mobile apps to enhance user experiences and enable seamless connectivity on-the-go. Thrive in diverse roles like Full Stack Developer, Backend Developer, UI/UX Designer, DevOps Engineer, Cloud Engineer, Data Science Engineer, and Scrum Master, at a workplace that encourages you to freely share your bold and different ideas. If you are passionate about technology and eager to make a difference, we want to hear from you! Apply now to join our dynamic team in Bengaluru.

We are seeking a talented Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in software engineering and systems administration, with a passion for building scalable and reliable systems. As an SRE, you will collaborate with development and operations teams to ensure our services are reliable, performant, and highly available.

Key Responsibilities
- Ensure the 24/7 operations and reliability of data services in our production GCP and on-premise Hadoop environments.
- Collaborate with the data engineering development team to design, build, and maintain scalable, reliable, and secure data pipelines and systems.
- Develop and implement monitoring, alerting, and incident response strategies to proactively identify and resolve issues before they impact production.
- Drive the implementation of security and reliability best practices across the software development life cycle.
- Contribute to the development of tools and automation to streamline the management and operation of data services.
- Participate in on-call rotation and respond to incidents in a timely and effective manner.
- Continuously evaluate and improve the reliability, scalability, and performance of data services.

Technology Skills
- 4+ years of experience in site reliability engineering or a similar role.
- Strong experience with Google Cloud Platform (GCP) services, including BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
- Experience with on-premise Hadoop environments and related technologies (HDFS, Hive, Spark, etc.).
- Proficiency in at least one programming language (Python, Scala, Java, Go, etc.).

Required Qualifications To Be Successful In This Role
- Bachelor's degree in computer science, engineering, or a related field.
- 8-10 years of experience as an SRE.
- Proven experience as an SRE, DevOps engineer, or similar role.
- Strong problem-solving skills and ability to work under pressure.
- Excellent communication and collaboration skills.
- Flexible to work in EST time zones (9-5 EST).

Additional Information
Job Type: Full Time
Work Profile: Hybrid (Work from Office / Remote)
Years of Experience: 8-10 Years
Location: Bangalore

What We Offer
- Competitive salaries and comprehensive health benefits
- Flexible work hours and remote work options
- Professional development and training opportunities
- A supportive and inclusive work environment

Posted 2 months ago

4.0 - 8.0 years

6 - 10 Lacs

Bengaluru

Work from Office

At BCE Global Tech, immerse yourself in exciting projects that are shaping the future of both consumer and enterprise telecommunications. This involves building innovative mobile apps to enhance user experiences and enable seamless connectivity on-the-go. Thrive in diverse roles like Full Stack Developer, Backend Developer, UI/UX Designer, DevOps Engineer, Cloud Engineer, Data Science Engineer, and Scrum Master, at a workplace that encourages you to freely share your bold and different ideas. If you are passionate about technology and eager to make a difference, we want to hear from you! Apply now to join our dynamic team in Bengaluru.

We're seeking a dedicated Site Reliability Engineer to join our team. In this role, you will be responsible for maintaining the reliability, scalability, and performance of our systems. You'll implement best practices for monitoring, incident response, and automation to ensure seamless operations. Your expertise will help us build resilient infrastructure, reduce downtime, and enhance the overall user experience.

Key Responsibilities
- Experience working with various monitoring tools (e.g. ELK, Dynatrace, CloudWatch, Cloud Logging, Cloud Monitoring, BMC Surveyor, BMC Patrol, Grafana, Prometheus).
- Ensure monitoring and self-healing strategies are implemented and maintained to proactively prevent production incidents.
- Perform root cause analysis of production issues.
- Design and manage on-call and escalation processes (nice to have).
- Participate in design reviews and production reviews for new features, products, or pieces of infrastructure.
- Design and implement ELK (Elasticsearch, Logstash and Kibana) stack, Prometheus and Grafana solutions for monitoring and alerting.
- Debug production issues across services and levels of the stack.
- Establish KPIs to demonstrate maturity, efficiency, and value to our business partners.
- Work as an integral part of the DevOps team with complementary skills and common goals.
- L3 support experience is an asset.
- Work to create a release management process and help with out-of-business-hours deployments and support (rotation with team members).
- Familiar and comfortable with agile development techniques.

Technology Skills (Mandatory)
ELK, Dynatrace, CloudWatch, Cloud Logging, Cloud Monitoring, BMC Surveyor, BMC Patrol, Grafana, Prometheus

Required Qualifications To Be Successful In This Role
- Bachelor's degree in computer science, engineering, or a related field.
- 8-10 years of experience as an SRE.
- Proven experience as an SRE, DevOps engineer, or similar role.
- Strong programming skills in languages such as Python, Go, Java, or Ruby.
- Strong problem-solving skills and ability to work under pressure.
- Excellent communication and collaboration skills.
- Flexible to work in EST time zones (9-5 EST).

- Competitive salaries and comprehensive health benefits
- Flexible work hours and remote work options
- Professional development and training opportunities
- A supportive and inclusive work environment

Posted 2 months ago

Apply

3.0 - 6.0 years

10 - 15 Lacs

Bengaluru

Work from Office

We are looking for a Senior Site Reliability Engineer to join our Service Reliability and Operation group. We provide innovative team collaboration and an opportunity to build, operate and support scalable and reliable services that underpin Thomson Reuters’ products. About the Role: In this opportunity as a Senior Site Reliability Engineer, you will be responsible for the following. Be a Professional SRE: Implement site reliability engineering and DevOps best practices. Feed non-functional requirements into the product backlog, such as, but not limited to, high availability, scalability, self-healing, observability, continuous delivery, and security. Build and maintain monitoring for all aspects of infrastructure, microservices and the platform, and implement alerting mechanisms using cloud-native solutions. Provide primary operational support and engineering for distributed platforms. Act as the go-to person for any production issue: troubleshoot and monitor until successful mitigation, communicate effectively, run the postmortem and implement the learnings. Maintain IaC and CI/CD and promote best practices for our CI/CD processes. Focus on continuous improvement and technical standards: drive improvements in productivity, monitoring and tooling, and set industry best practices. On-call Rotation: Participate in on-call/shift rotations (L2). About You: You’re a fit for the role of Senior Site Reliability Engineer if your background includes: Bachelor’s degree in computer science or related field (a must). Minimum of 7 years of experience as a DevOps/SRE engineer and Cloud engineer with hands-on experience in AWS cloud technologies. Highly skilled in UNIX/Linux-based systems. Proven experience in building and operating production cloud-native infrastructure, applications, and services on AWS or Azure.
Experience or knowledge of container technology such as Docker, Kubernetes and Istio service mesh. Must have experience using AWS services (such as CloudFront, EKS, ECS, RDS, threat detection and other security controls) or Azure services (such as AKS, ACR, Entra ID, Network Security Group). Must have 2+ years of scripting and programming experience (PowerShell, Bash). Experience or knowledge of observability tools: Datadog, ELK, SumoLogic, CloudWatch, Azure Monitor. Experience or knowledge with version control and CI/CD (Git / Azure DevOps / JFrog Artifactory). Experience or knowledge writing Infrastructure as Code (IaC) (Terraform / CloudFormation / ArgoCD / other). Team player with a can-do attitude. What’s in it For You: Hybrid Work Model: We’ve adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected. Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance. Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow’s challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans to include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing. Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together. Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives. Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world. About Us: Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world leading provider of trusted journalism and news. We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments.
At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward. As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace. We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here. Learn more on how to protect yourself from fraudulent job postings here. More information about Thomson Reuters can be found on thomsonreuters.com.

Posted 2 months ago

Apply

6.0 - 10.0 years

10 - 15 Lacs

Bengaluru

Work from Office

We are looking for a Senior Site Reliability Engineer to join our Service Reliability and Operation group. We provide innovative team collaboration and an opportunity to build, operate and support scalable and reliable services that underpin Thomson Reuters’ products. About the Role: In this opportunity as a Senior Site Reliability Engineer, you will: Be a Professional SRE: Implement site reliability engineering and DevOps best practices. Feed non-functional requirements into the product backlog, such as, but not limited to, high availability, scalability, self-healing, observability, continuous delivery, and security. Build and maintain monitoring for all aspects of infrastructure, microservices and the platform, and implement alerting mechanisms using cloud-native solutions. Provide primary operational support and engineering for distributed platforms. Act as the go-to person for any production issue: troubleshoot and monitor until successful mitigation, communicate effectively, run the postmortem and implement the learnings. Maintain IaC and CI/CD and promote best practices for our CI/CD processes. Focus on continuous improvement and technical standards: drive improvements in productivity, monitoring and tooling, and set industry best practices. On-call Rotation: Participate in on-call/shift rotations. When on call, you are expected to drive the troubleshooting and mitigation activities while working on an incident. Be innovative and curious: Maintain end-to-end security, ensuring that we meet best-practice standards. Keep up to date with emerging cloud technology trends, especially around DevOps, Service Reliability and Security. Adopt pan-TR operation principles to ensure consistency and efficiency. Document “tribal” knowledge.
Constant upkeep of documentation and runbooks can ensure that teams get the information they need right when they need it. Be collaborative: Extreme collaboration within our teams in Canada, US, Mexico and India. About You: You’re a fit for the role of Senior Site Reliability Engineer if you have: Bachelor’s degree in computer science or related field (a must). 6-10 years of experience as a DevOps/SRE engineer and Cloud engineer with hands-on experience in AWS cloud technologies. Highly skilled in UNIX/Linux-based systems. Proven experience in building and operating production cloud-native infrastructure, applications, and services on AWS. Experience or knowledge of container technology such as Docker, Kubernetes and Istio service mesh. Must have experience using AWS services (such as CloudFront, EKS, ECS, RDS, threat detection and other security controls). Must have 2+ years of scripting and programming experience (PowerShell, Bash). Experience or knowledge of observability tools: Datadog, ELK, SumoLogic, CloudWatch. Experience or knowledge with version control and CI/CD (Git / Azure DevOps / JFrog Artifactory). Experience or knowledge writing Infrastructure as Code (IaC) (Terraform / CloudFormation / other). Team player with a can-do attitude.

Posted 2 months ago

Apply

3.0 - 6.0 years

10 - 15 Lacs

Hyderabad

Work from Office

As a senior site reliability engineer, you will work in our global organization to provide operational support for all Thomson Reuters products, including development tools and infrastructure used by engineering teams to build and test their applications. You will also collaborate with engineering teams on continuous integration/continuous deployment (CI/CD), monitoring, alerts, and other areas of operations support. About the Role: Develop, Deliver, and Support: By applying modern SRE operational and development practices, you will be involved in the entire cycle of operational support, monitoring, automation, building, and delivering high-quality solutions for the team. Be a Team Player: Working in a collaborative, team-oriented environment, you will share information, value diverse ideas, and partner with cross-functional and remote teams. Be an Agile Person: With a strong sense of urgency and a desire to work in a fast-paced, dynamic environment, you will deliver solutions against strict timelines. Be Innovative: You are empowered to try new approaches and learn new technologies. You will contribute innovative ideas, create solutions, and be accountable for end-to-end deliveries. Be an Effective Communicator: Through active engagement and communication with cross-functional partners and team members, you will effectively articulate ideas and collaborate on technical developments. About You: Experienced Site Reliability Engineer with 6+ years of experience in DevOps or SRE roles. Keen to learn complex architectures and come up to speed quickly. A self-learner, self-driven, and able to operate with minimal supervision. Able to demonstrate ownership of accountabilities. Able to successfully communicate with business partners, management, and technical team members. Experienced SRE with a development or DevOps background who has worked on enterprise-scale applications. Proficient user of AWS, OCI and monitoring tools like Datadog. AWS SysOps Associate or DevOps Professional certification is a plus.
Proactive in raising problems and identifying solutions. Strong sense of customer service. Able to work in a highly collaborative team setting. Approaches work with a DevOps and continuous improvement mindset.

Posted 2 months ago

Apply

4.0 - 5.0 years

8 - 12 Lacs

Bengaluru

Work from Office

As an SRE at Thomson Reuters, you will be responsible for supporting our cloud infrastructure by designing, implementing, maintaining and monitoring systems that make it easy to build, deploy, monitor and scale applications. You will also play a key role in building a culture of reliability within the organization and collaborating closely with development teams to ensure the highest standards of system availability are achieved. About the Role: We are looking for candidates who have experience working with complex technical environments and possess strong communication skills to effectively articulate issues and solutions to both technical and non-technical audiences. You will collaborate with senior management and other stakeholders on projects. About You: 4-5 years of experience required in AWS Cloud, DevOps, Terraform, CloudFormation, CI/CD, monitoring, observability, enterprise application support, Infrastructure as Code, cloud security, database fundamentals, Java application knowledge, cloud administration and app support. Experience in system administration (Linux) and cloud automation. Implement best practices for security and compliance within AWS environments. Automate infrastructure provisioning, monitoring, and scaling using tools like Terraform, CloudFormation, or AWS SDKs. Excellent interpersonal, presentation, verbal and written communication skills. Strong analytical and problem-solving skills. A bachelor’s degree in Computer Science or related field; master’s degree preferred.

Posted 2 months ago

Apply

6.0 - 11.0 years

22 - 37 Lacs

Gurugram, Delhi / NCR

Hybrid

The SRE team at GreyOrange is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents for quicker resolution, and establishing BAU. The team also manages and maintains internal tools/infra which are consumed by other development teams. The experienced SRE will play a crucial role in ensuring the reliability, scalability, capacity planning, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies. Requirements: Should have 6 to 11 years of experience. Well-versed with scripting/programming languages (Python/Bash/PowerShell, etc.) to automate manual work, particularly within cloud environments. Well-versed with observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging solutions to identify and address potential issues, especially in cloud infrastructure. Working experience with automation tools (Jenkins, GitLab, Ansible/Chef for configuration management) and processes to streamline deployment, monitoring, and management of systems and applications in the cloud. Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments. Well aware of SLI, SLO, SLA, and Error Budget concepts and their implementations; provide on-call support and participate in incident management and response activities as needed. Expert at troubleshooting production issues and bugs. Good knowledge of Unix systems, networking, web technologies, and databases. Incident management experience coupled with effective communication skills for production workloads. Working knowledge of at least one cloud platform (AWS or GCP). What you'll do: Lead reliability engineering projects and drive them to closure.
Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues. Design, build and maintain efficient, reliable, and scalable cloud-based infrastructure and services. Automate processes and find opportunities to improve the observability and availability of the platform to reduce toil. Implement and manage observability tools for comprehensive monitoring, alerting, and logging. Own end-to-end availability and performance of different services and tools. Practice sustainable incident response and blameless postmortems. Provide on-call support for incident management and participate actively in response activities.

Posted 2 months ago

Apply

6.0 - 11.0 years

20 - 25 Lacs

Chennai, Bengaluru

Work from Office

Job Summary: We are seeking an experienced IT Technical Service Delivery Manager to lead end-to-end service delivery across a diverse technology landscape. The ideal candidate will have a strong background in managing high-performing teams, coordinating global delivery models (onshore/offshore), maintaining strong client relationships, and driving complex IT initiatives using Agile and Scrum frameworks. This role requires deep technical acumen in modern technologies including Azure, AWS, DevOps, SRE, low-code/no-code platforms, and generative AI. The candidate must be a proactive leader who can operate at both strategic and technical levels, ensuring quality delivery, customer satisfaction, and continuous improvement. Key Responsibilities: • Lead and manage cross-functional IT delivery teams, including onshore and offshore resources. • Serve as the primary point of contact for clients; build and maintain strong client relationships. • Drive delivery of complex technical projects, ensuring alignment with scope, timelines, budgets, and quality standards. • Oversee project planning, estimation, budgeting, resource allocation, and risk management. • Ensure adherence to Agile and Scrum methodologies, facilitating sprint planning, reviews, and retrospectives. • Identify, manage, and resolve project risks, issues, and escalations proactively. • Work closely with engineering and product teams to ensure seamless integration of DevOps, SRE, and platform engineering practices. • Promote innovation by exploring and integrating low-code/no-code tools and generative AI solutions. • Continuously assess team performance, provide coaching, and foster a culture of accountability and excellence. • Prepare and present status reports, KPIs, and performance metrics to stakeholders and clients.

Posted 2 months ago

Apply

8.0 - 13.0 years

15 - 30 Lacs

Bengaluru

Work from Office

Job Title: Cloud Architect AWS. Location: Bangalore. Shift: Rotational. Experience Required: 8 to 13 years. Type: Full-time. Job Summary: We are looking for a highly skilled and experienced AWS Cloud Architect with a strong foundation in Site Reliability Engineering (SRE) practices to lead cloud transformation initiatives. The ideal candidate will have hands-on experience with AWS infrastructure, DevOps automation, security governance, cost optimization, and infrastructure as code. You will work on high-impact projects in a cloud-first environment, collaborating with cross-functional teams to ensure scalable, secure, and reliable infrastructure. Key Responsibilities: Cloud Architecture & Deployment: Design, build, and optimize cloud-native architectures on AWS. Lead the migration of on-premises workloads to AWS using best practices. Define and enforce cloud governance, tagging policies, and account management standards. Security, IAM, and Compliance: Implement AWS IAM, PIM, PAM, and manage VPC Security Groups, NACLs, and encryption policies. Conduct cloud security assessments; enforce SOC2, ISO 27001, or HIPAA compliance. Work with AWS Config, AWS CloudTrail, AWS GuardDuty, and Security Hub. Infrastructure Automation & DevOps: Build and maintain CI/CD pipelines using Jenkins, Git, Terraform, CloudFormation, and Ansible. Manage containerized workloads using Docker, Kubernetes, and orchestration tools. Implement Infrastructure as Code (IaC) and Configuration Management for consistent deployments. Cost Optimization & Performance Tuning: Use AWS Cost Explorer, Budgets, and Trusted Advisor to monitor and reduce costs. Optimize workloads through auto-scaling, spot instances, Savings Plans, and rightsizing. Regularly audit and report on cloud spend and performance KPIs. Monitoring & Reliability: Set up logging and monitoring using CloudWatch, Prometheus, Nagios, or Datadog.
Define and maintain SLA/SLO/SLI metrics, runbooks, and incident response procedures. Implement blue/green deployments, rollback strategies, and chaos engineering principles. Documentation & Collaboration: Maintain architecture diagrams using Lucidchart, Draw.io, or Visio. Document SOPs for cloud operations, deployments, and recovery scenarios. Collaborate with engineering, security, product, and QA teams. Skills and Experience Required: Cloud Platform: AWS (EC2, S3, RDS, CloudFront, Lambda, VPC, CloudFormation). DevOps Tools: Terraform, Ansible, Jenkins, GitHub Actions, Docker, Kubernetes. Security & IAM: AWS IAM, PIM/PAM, encryption, CloudTrail, GuardDuty, Security Hub. Scripting: Python, Bash, Shell scripting. Monitoring & Logging: CloudWatch, ELK Stack, Datadog, Nagios. Networking: VPC, Subnetting, Route Tables, NAT Gateway, VPNs. Certifications (preferred): AWS Certified Solutions Architect Professional, AWS Certified DevOps Engineer Professional. Preferred Qualifications: Experience in hybrid environments (AWS + on-prem). Working knowledge of Azure or GCP is a plus. Familiarity with microservices architecture. Hands-on with CI/CD using GitOps or DevOps pipelines. Background in ERP, retail, or high-availability enterprise SaaS platforms. Soft Skills: Excellent communication and stakeholder management. Strong analytical and problem-solving mindset. Ability to work in 24x7 environments and rotational shifts. Self-motivated and team-oriented approach.

Posted 2 months ago

Apply

5.0 - 10.0 years

5 - 10 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

What You'll Need: 5+ years of practical application and experience in managing SRE lifecycle practices for a mid-to-large organization. 10+ years of experience working on various infrastructure technologies, including Linux platforms, storage platforms, networking protocols, DNS/LDAP, and databases. 5+ years of direct experience troubleshooting and solving infrastructure problems. Proven process-oriented approach to solving problems and improving reliability. Proven experience with automation, focusing on developing custom monitors and automated testing processes.

Posted 2 months ago

Apply

7.0 - 12.0 years

20 - 35 Lacs

Chennai

Work from Office

Responsibilities of the Site Reliability Engineer (SRE)
SREs monitor performance, collaborate with developers, and implement system improvements to prevent failures. They also enhance uptime and balance development speed with system stability.
Design and Implement Systems: SREs design and implement robust systems that ensure high availability and reliability. This involves creating architectures that are resilient to failures and can handle large traffic volumes.
Automate Operational Tasks: A key responsibility is automating repetitive operational tasks, which improves efficiency and reduces the risk of human error. This includes creating and maintaining automation scripts and tools.
Monitor and Maintain System Health: SREs continuously monitor system performance using various tools and dashboards. They analyze metrics, logs, and alerts to ensure systems are running smoothly and address any issues that arise.
Manage Incidents and Troubleshoot Issues: When incidents occur, SREs are responsible for troubleshooting and resolving issues quickly. They perform root cause analysis to prevent future occurrences and improve system resilience.
Ensure Service Level Objectives (SLOs) and Service Level Agreements (SLAs): SREs work to meet and exceed defined SLOs and SLAs. They measure system performance against these objectives and take corrective actions if performance deviates from expected levels.
Collaborate with Development Teams: SREs collaborate with development teams to integrate reliability best practices into the software development lifecycle. They ensure that new features and services meet reliability standards before deployment.
Required Skills and Qualifications
Coding, system architecture, and proficiency with incident management systems are essential SRE competencies. Success in this position usually requires a background in computer science or a similar discipline, along with expertise in software development or operations.
Proficiency in Programming Languages: SREs should be proficient in programming languages such as Python, Bash/Shell scripting, Java, Perl, C/C++, JavaScript, PowerShell (for Windows environments), and SQL (for database management).

Posted 2 months ago

Apply

3.0 - 5.0 years

5 - 7 Lacs

Chennai

Work from Office

The SRE, Java, Power BI, UiPath role involves working with relevant technologies, ensuring smooth operations, and contributing to business objectives. Responsibilities include analysis, development, implementation, and troubleshooting within the SRE, Java, Power BI, and UiPath domain.

Posted 2 months ago

Apply

3.0 - 5.0 years

15 - 18 Lacs

Pune

Work from Office

Experience: 3 to 5 years in cloud infrastructure operations, L1 incident management, automation support, and observability, with team coordination or mentoring experience.
Location: Pune
Shift: 24x7 Support (Rotational Shifts)
Education: BE/B.Tech (relevant certifications preferred: AWS Cloud Practitioner/Associate, Azure Fundamentals, CKA, Terraform Associate)
Job Summary: We are seeking an L1 Lead – Site Reliability Engineer (SRE) to guide and manage the frontline SRE team in ensuring the stability, availability, and efficiency of enterprise-scale cloud infrastructure operations. This role involves supervising incident response, ensuring adherence to runbooks and SOPs, providing technical guidance to L1 engineers, and being the key escalation point for L1 issues. You will be responsible for monitoring cloud services, triaging alerts, validating remediation efforts, mentoring junior engineers, and collaborating with L2/L3 teams for escalations and root cause analysis.
Responsibilities: Lead and mentor the L1 SRE team during shifts, ensuring timely response and proper handling of incidents, service requests, and alerts. Oversee infrastructure and application monitoring using tools such as Prometheus, Grafana, AWS CloudWatch, and Azure Monitor. Validate and guide remediation actions like pod restarts, disk space cleanup, scaling, and alert verification. Ensure SOPs, runbooks, and shift handover notes are followed and updated regularly. Execute and validate predefined Ansible playbooks, Terraform scripts, and CI/CD pipelines with junior team members. Act as the first point of escalation for unresolved L1 issues and coordinate with L2/L3 teams for resolution and RCA. Govern and track shift performance, including SLA compliance, FCR (First Call Resolution), and ticket hygiene. Coordinate patching, backup checks, standard changes, and validations in AWS/Azure environments. Facilitate onboarding of new L1 engineers, and deliver knowledge-sharing and refresher training sessions. Support automation initiatives by identifying repetitive tasks and creating/reviewing simple scripts. Prepare weekly/monthly shift reports and participate in SRE governance and review calls with operations leadership. Monitor the health of Kubernetes clusters and guide the team in basic pod/node/service troubleshooting.
Skills/Expertise: 3+ years of experience in cloud infrastructure operations with at least 1 year in a lead or mentoring role. Strong troubleshooting, coordination, documentation, and escalation management skills. Proven ability to lead shifts in a 24x7 support model. Familiarity with ITSM practices and SLA management (ServiceNow or similar). Proactive and structured communicator, capable of shift planning, reporting, and stakeholder updates.
Technical Skills: Experience monitoring and operating cloud-based environments with basic troubleshooting for system and application-level issues. Familiarity with cloud services and concepts across AWS, such as EC2, S3, IAM, and VPC, and with Azure DevOps services. Basic knowledge of container platforms such as Docker and Kubernetes (understanding pod/service basics, logs, etc.). Exposure to scripting using Shell, Bash, or Python for automation of routine tasks. Basic understanding of version control systems like Git, GitHub, or GitLab. Awareness of infrastructure-as-code and automation tools such as Ansible, Terraform, or CloudFormation (execution under guidance). Familiar with CI/CD concepts and tools like Jenkins or GitLab CI (executing builds, monitoring pipelines). Understanding of alerting and monitoring tools like Grafana, ELK, Site24x7, CloudWatch, and Prometheus. Hands-on with ITSM tools such as ServiceNow for incident and ticket tracking.

Posted 2 months ago

Apply

2.0 - 6.0 years

5 - 9 Lacs

Kochi

Work from Office

Your Role and Responsibilities
* Software developers at IBM are the backbone of our overall strategy, and software development is the essential activity that drives the success of IBM and our clients worldwide. At IBM, you will use the latest software development tools, techniques and technologies and work with leading minds in the industry to build products, path-breaking technologies, and solutions that you can be proud of.
* Do you have the skills and passion for building the future? If yes, come and be part of a niche team at IBM Software Labs focused on building an AI-driven Digital Labor platform, Watson Orchestrate, an AI platform that offers digeys (aka digital employees) with custom skills that can automate today’s businesses. More details are available at https://www.ibm.com/products/watson-orchestrate
* We seek a DevOps technical leader/architect with robust expertise in designing distributed SaaS platforms and associated end-to-end build, deployment, CI/CD pipelines, frameworks, and tooling. The role calls for experience in quickly isolating problems and identifying root causes in complex production systems. The ideal candidate would have rich experience in understanding enterprise architecture and complex systems and be able to architect a solution that eases deployment, identifies issues via monitoring, and ensures the system is always highly available, reliable and resilient.
Required education: Bachelor's Degree
Preferred education: Master's Degree
Required technical and professional expertise
* 7+ years of experience, with at least 5 years of experience as a DevOps/SRE Architect
* Designed, implemented, and supported complex distributed SaaS platforms.
* Deep understanding and working experience on Kubernetes, Containers, Red Hat OpenShift Clusters on AWS, AWS services, ArgoCD, Jenkins, Grafana and other pipelines, and monitoring tools.
* Takes a systematic approach to troubleshooting and has a deep sense of ownership
* Maintains personal responsibility and commitment to address and respond to incidents quickly
* Passionate about automation and innovations that improve productivity and reliability.
* Experience in technically coaching and mentoring junior SRE/DevOps Engineers
Preferred technical and professional experience
* Good communication, collaboration, negotiation skills and technical leadership qualities
* Strong Go skills

Posted 2 months ago

Apply

4.0 - 7.0 years

2 - 6 Lacs

Bengaluru

Work from Office

Design, implement, and maintain scalable and reliable compute infrastructure, with a focus on Wintel, Linux, VMware, and Red Hat KVM environments. Collaborate with development teams to ensure applications are designed for reliability and performance across different operating systems and virtualization platforms. Automate repetitive tasks to improve efficiency and reduce manual intervention, specifically within Wintel and Linux systems. Monitor system performance, identify bottlenecks, and implement solutions to improve overall system reliability in VMware and Red Hat KVM environments. Develop and maintain tools for deployment, monitoring, and operations tailored to Wintel, Linux, VMware, and Red Hat KVM. Troubleshoot and resolve issues in development, test, and production environments, focusing on compute-related challenges. Participate in on-call rotations and respond to incidents promptly, ensuring high availability of compute resources. Implement best practices for security, compliance, and data protection within Wintel, Linux, VMware, and Red Hat KVM systems. Document processes, procedures, and system configurations specific to the compute infrastructure.
Primary Skills: Site Reliability Engineer (SRE), Compute Infrastructure, Wintel Administration, Linux Administration, VMware Administration, Red Hat. Proficiency in scripting languages: Python, Java, C/C++, Bash. Infrastructure tools: Terraform, Ansible. Experience with monitoring and logging tools: Prometheus, Grafana, ELK stack. Solid understanding of networking, security, and system administration within Wintel and Linux environments. Experience with CI/CD pipelines and tools: Jenkins, GitLab CI. Knowledge of database management systems: MySQL, PostgreSQL.

Posted 2 months ago

Apply

4.0 - 8.0 years

3 - 7 Lacs

Bengaluru

Work from Office

Design, implement, and maintain scalable and reliable compute infrastructure, with a focus on Wintel, Linux, VMware, and Red Hat KVM environments. Collaborate with development teams to ensure applications are designed for reliability and performance across different operating systems and virtualization platforms. Automate repetitive tasks to improve efficiency and reduce manual intervention, specifically within Wintel and Linux systems. Monitor system performance, identify bottlenecks, and implement solutions to improve overall system reliability in VMware and Red Hat KVM environments. Develop and maintain tools for deployment, monitoring, and operations tailored to Wintel, Linux, VMware, and Red Hat KVM. Troubleshoot and resolve issues in development, test, and production environments, focusing on compute-related challenges. Participate in on-call rotations and respond to incidents promptly, ensuring high availability of compute resources. Implement best practices for security, compliance, and data protection within Wintel, Linux, VMware, and Red Hat KVM systems. Document processes, procedures, and system configurations specific to the compute infrastructure.
Primary Skills: Site Reliability Engineer (SRE), Compute Infrastructure, Wintel Administration, Linux Administration, VMware Administration, Red Hat. Proficiency in scripting languages: Python, Java, C/C++, Bash. Infrastructure tools: Terraform, Ansible. Experience with monitoring and logging tools: Prometheus, Grafana, ELK stack. Solid understanding of networking, security, and system administration within Wintel and Linux environments. Experience with CI/CD pipelines and tools: Jenkins, GitLab CI. Knowledge of database management systems: MySQL, PostgreSQL.

Posted 2 months ago

Apply

4.0 - 8.0 years

3 - 7 Lacs

Bengaluru

Work from Office

Primary Skills: Site Reliability Engineer (SRE), Compute Infrastructure, Wintel Administration, Linux Administration, VMware Administration, Red Hat. Proficiency in scripting languages: Python, Java, C/C++, Bash. Infrastructure tools: Terraform, Ansible. Experience with monitoring and logging tools: Prometheus, Grafana, ELK stack. Solid understanding of networking, security, and system administration within Wintel and Linux environments. Experience with CI/CD pipelines and tools: Jenkins, GitLab CI. Knowledge of database management systems: MySQL, PostgreSQL.
Secondary Skills: Proven experience as an SRE or similar role, with a focus on compute infrastructure, particularly Wintel, Linux, VMware, and Red Hat KVM. Proficiency in scripting languages (e.g., Python, Java, C/C++, Bash) and infrastructure-as-code tools (e.g., Terraform, Ansible). Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Solid understanding of networking, security, and system administration within Wintel and Linux environments. Excellent problem-solving skills and attention to detail. Strong communication and collaboration skills. Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI). Knowledge of database management systems (e.g., MySQL, PostgreSQL). Familiarity with microservices architecture and related technologies. Ability to work in a 24x7 on-call, after-hours rotation environment. Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN). Proactive approach to identifying complex problems, performance bottlenecks, and areas for improvement. Advocate for DevOps/SRE best practices, leading postmortems/RCAs, incident retrospectives, and operational readiness reviews. Relevant experience with the following products is a plus: ServiceNow, BigFix, Tenable, CrowdStrike, Splunk, and SQL Server.

Posted 2 months ago

Apply

3.0 - 8.0 years

19 - 25 Lacs

Gurugram, Bengaluru

Work from Office

Position: Sr. DevSecOps Engineer
Location: Bangalore
Experience: 3-6 years
Are you ready to build scalable, secure, and cloud-native systems that make a real impact? We're looking for a passionate DevSecOps Engineer to join a fast-moving team working on cutting-edge healthcare technology.
What You’ll Do: Build and maintain secure, scalable cloud infrastructure. Implement and optimize CI/CD pipelines. Manage containerized deployments using Kubernetes. Enhance DevOps workflows following SRE principles. Collaborate across teams to ensure seamless product releases.
Key Skills: Azure – Strong experience with cloud services and infrastructure. Git – Proficient in version control and collaboration. Terraform – Infrastructure as Code for scalable environments. Kubernetes – Orchestration of containerized workloads. SRE – Familiarity with Site Reliability Engineering practices.
Bring your expertise, creativity, and drive, and let’s build something incredible together.

Posted 2 months ago

Apply