6.0 - 11.0 years
10 - 15 Lacs
Hyderabad
Work from Office
As a Senior Site Reliability Engineer, you will work in our global organization to provide operational support for all Thomson Reuters products, including the development tools and infrastructure that engineering teams use to build and test their applications. You will also collaborate with engineering teams on continuous integration/continuous deployment (CI/CD), monitoring, alerting, and other areas of operations support.

About the Role:
Experienced Site Reliability Engineer with 6+ years of experience in DevOps or SRE roles.
- Develop, Deliver, and Support: By applying modern SRE operational and development practices, you will be involved in all aspects of operational support, monitoring, automation, and building and delivering high-quality solutions for the team.
- Be a Team Player: Working in a collaborative, team-oriented environment, you will share information, value diverse ideas, and partner with cross-functional and remote teams.
- Be an Agile Person: With a strong sense of urgency and a desire to work in a fast-paced, dynamic environment, you will deliver solutions against strict timelines.
- Be Innovative: You are empowered to try new approaches and learn new technologies. You will contribute innovative ideas, create solutions, and be accountable for end-to-end deliveries.
- Be an Effective Communicator: Through active engagement and communication with cross-functional partners and team members, you will effectively articulate ideas and collaborate on technical developments.

About You:
- Keen to learn complex architectures and come up to speed quickly.
- A self-driven self-learner, able to operate with minimal supervision.
- Able to demonstrate ownership of accountabilities.
- Able to communicate successfully with business partners, management, and technical team members.
- Experienced SRE with a development or DevOps background who has worked on enterprise-scale applications.
- Proficient user of AWS, OCI, and monitoring tools such as Datadog.
- AWS SysOps Associate or DevOps Professional certification is a plus.
- Proactive in raising problems and identifying solutions.
- Strong sense of customer service.
- Able to work in a highly collaborative team setting.
- Approaches work with a DevOps and continuous improvement mindset.
- Experience with rotational on-call.

What's in it For You?
- Hybrid Work Model: We've adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
- Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.
- Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow's challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
- Industry Competitive Benefits: We offer comprehensive benefit plans that include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
- Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.
- Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.
- Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.

Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world-leading provider of trusted journalism and news.

We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments. At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward.

As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace.

We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here. Learn more on how to protect yourself from fraudulent job postings here. More information about Thomson Reuters can be found on thomsonreuters.com.
Posted 1 month ago
2.0 - 7.0 years
4 - 9 Lacs
Bengaluru
Work from Office
The DevOps Engineer provides application support for the delivery, support, and maintenance of production applications under LSM scope. This includes mitigating the customer impact of issues, taking action to resolve issues, and determining root cause to prevent future problems. This role also helps drive initiatives for reducing incident rates and provides input to the planning and direction for setting application standards.

About the Role:
In this opportunity as a DevOps Engineer, you will:
- Analyze customer problems of high complexity, assess the scope of impact, identify options for problem resolution, and take action to resolve issues according to defined service levels.
- Provide a high level of technical and subject matter expertise in one or more technologies and serve as a point of escalation for technical issues related to your specialty.
- Innovate by suggesting technology and/or process improvements to reduce the volume of incidents and mean time to recover.
- Collaborate with business, third-party vendors, developers, production support, and technical operations groups to determine the appropriate software/hardware needed and to resolve any issues impacting the application processes.
- Mitigate the customer impact of issues and define, review, and execute workarounds.
- Conduct root cause analysis and correlation of other system and/or application problems of high complexity.
- Carry out unit testing and implement application changes developed or modified, ensuring application behavior meets the needs of the client and business.
- Communicate the status of outstanding issues to customers and ensure the ticketing system is always up to date with the most recent actions and status.
- Proactively monitor production environments and/or applications and conduct health assessments to identify areas for improvement.
- Develop, configure, or support tools for system monitoring and/or troubleshooting.
- Provide input for technical plans and solutions.
- Provide advice or training to users about application functionality.
- Provide technical guidance to less experienced team members.
- Manage multiple and sometimes competing priorities.
- Perform actions aligned with defined standards and best practices.
- Perform occasional work outside standard business hours as part of an on-call rotation.

About You:
You're a fit for the role of DevOps Engineer if you have:
- Bachelor's degree in computer science or a related technical field.
- 2+ years of experience as a Cloud Engineer/SRE or in a similar role.
- 2+ years of hands-on experience with Azure SQL, AKS clusters, Azure DevOps, Azure CLI, and ADO Pipelines.
- Good knowledge of Bicep deployments and GitHub Actions.
- Good knowledge of cloud services and the ability to manage cloud infrastructure.
- 2+ years of hands-on experience with DevOps technologies such as Git, CI/CD, infrastructure automation, ADO Pipelines, containerization, and orchestration.
- Scripting knowledge, such as shell script, PowerShell, Bash, YAML, and Groovy.
- Experience with tools such as GitHub Actions, Kubernetes, Ansible, and Docker.
- Working experience with clusters and high-availability solutions.
- Hands-on experience with Linux/Windows infrastructure and knowledge of networking.
- Ability to collaborate and communicate with other team members to decompose large tasks into small, testable tasks.
- Experience integrating into a team that employs Agile and Scrum methodologies.
Posted 1 month ago
6.0 - 11.0 years
9 - 13 Lacs
Bengaluru
Work from Office
About the role:
In this opportunity, you will:

Be a professional SRE:
- Implement site reliability engineering and DevOps best practices.
- Feed non-functional requirements into the product backlog, including but not limited to high availability, scalability, self-healing, observability, continuous delivery, and security.
- Build and maintain monitoring for all aspects of infrastructure, microservices, and the platform, and implement alerting mechanisms using cloud-native solutions.
- Provide primary operational support and engineering for distributed platforms.
- Act as the go-to person for any production issue: troubleshoot and monitor until successful mitigation, communicate effectively, run the postmortem, and implement the learnings.
- Maintain IaC and CI/CD, and promote best practices for our CI/CD processes.
- Focus on continuous improvement and technical standards: drive improvements in productivity, monitoring, and tooling, and set industry best practices.
- On-call rotation: Participate in on-call/shift rotations. When on call, you are expected to drive the troubleshooting and mitigation activities while working on the incident.

Be innovative and curious:
- Maintain end-to-end security, ensuring that we meet best-practice standards.
- Keep up to date with emerging cloud technology trends, especially around DevOps, Service Reliability, and Security.
- Adopt pan-TR operation principles to ensure consistency and efficiency.
- Document tribal knowledge. Constant upkeep of documentation and runbooks ensures that teams get the information they need right when they need it.

Be collaborative:
- Extreme collaboration within our teams across Canada, the US, Mexico, and India.

About you:
You're a fit for the role if you have:
- Bachelor's degree in computer science or a related field (a must).
- Minimum of 6 years of experience as a DevOps/SRE engineer and cloud engineer, with hands-on experience in AWS cloud technologies.
- Strong skills in UNIX/Linux-based systems.
- Proven experience in building and operating production cloud-native infrastructure, applications, and services on AWS.
- Experience with or knowledge of container technology such as Docker, Kubernetes, and the Istio service mesh.
- Experience using AWS services such as CloudFront, EKS, ECS, RDS, threat detection, and other security controls (a must).
- 2+ years of scripting and programming experience in PowerShell or Bash (a must).
- Experience with or knowledge of observability tools: Datadog, ELK, Sumo Logic, CloudWatch.
- Experience with or knowledge of version control and CI/CD (Git / Azure DevOps / JFrog Artifactory).
- Experience with or knowledge of writing Infrastructure as Code (IaC) (Terraform / CloudFormation / other).
- Team player with a can-do attitude.
Posted 1 month ago
14.0 - 24.0 years
50 - 60 Lacs
Noida, Hyderabad, Pune
Work from Office
Expectations:
- Prior experience serving as an architect in Practice, COE, and HBUs, creating service offerings, solution accelerators, and unique selling propositions.
- Play a critical role in driving automation, continuous integration/continuous delivery (CI/CD), and monitoring capabilities to enhance the development and operations processes.
- Lead the design, definition, and prototyping of the end-to-end unified observability system leveraging the New Relic, Splunk, and Grafana stack.
- Define build, implementation, and deployment strategies for DevOps, Observability, and Site Reliability Engineering.
- Market technology and domain solutions and service offerings to internal and external stakeholders.
- Manage business relationships with technology partners and start-up ecosystems, and demonstrate an edge over the competition.
- Be passionate about technology and customer success, with excellent communication and articulation skills.
- Have prior experience presenting capabilities and solutions to end customers.
- Build initial prototypes of the observability solution and lead the demo sessions with customer teams.

Behavioral Competencies:
- Excellent communication, interpersonal, and presentation skills
- People management
- Conflict resolution
- Solutioning
- Customer service
- Accountability
- Judgement and decision making
- Ability to build and maintain relationships with stakeholders

Technical Skills:
- At least 4 years of pre-sales experience, working with RFIs/RFPs and developing and presenting technical designs and solutions to internal and external stakeholders.
- Extensive experience in assessing SRE, DevOps, and Observability maturity, with the ability to define a maturity improvement roadmap.
- Extensive experience in defining and implementing SRE, DevOps, and Observability strategies for 3 or more large-scale projects.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- Deep expertise in time series database configuration and implementation on the AWS cloud.
- Experience as an architect on observability projects at scale, covering the design, implementation, and cloud deployment of observability for containerized (Azure AKS or AWS EKS) applications using the New Relic, Splunk, and Grafana stack or open-source Grafana and Prometheus tools.
- Deep expertise in designing and implementing end-to-end distributed tracing using DaemonSets/agents and telemetry-gathering patterns.
- 3+ years in monitoring and observability automation using the New Relic, Splunk, and Grafana stack, including Prometheus-based alerting.
- Deep expertise in observability tools such as Splunk, New Relic, AWS CloudWatch, AWS OpenSearch, and ELK.
Posted 1 month ago
10.0 - 15.0 years
14 - 18 Lacs
Hyderabad, Bengaluru
Work from Office
About the Role:
Grade Level (for internal use): 11

We are looking for a highly driven Senior Platform & Full Stack Engineer who brings passion, innovation, and deep technical experience to join our high-performing DevOps and SRE team. In this role, you'll help us define, build, and scale the next generation of cloud-native, cloud-agnostic CI/CD pipelines, reusable Infrastructure as Code (IaC) workflows, and AI-driven autonomous deployments.

Key Responsibilities:
- Lead the design and implementation of reusable IaC workflows and standardized CI/CD blueprints across multiple teams.
- Architect and maintain cloud-agnostic deployment solutions with deep expertise in AWS and Kubernetes (EKS).
- Implement and optimize configuration-as-code practices using tools like Terraform and GitHub Actions.
- Partner with developers and SREs to define end-to-end infrastructure workflows covering compute, network, and storage automation.
- Contribute as a hands-on developer to internal tools, platforms, and APIs (Java, Go, or similar).
- Collaborate on cutting-edge initiatives such as Agentic AI workflows and autonomous chat-based deployments using MCP and LLM orchestration.
- Foster a culture of continuous innovation, high energy, and performance excellence.

Required Skills & Experience:
- 10+ years of experience in DevOps, Platform Engineering, or Full Stack Development with platform ownership.
- Proven experience designing Infrastructure as Code using Terraform at scale.
- Solid programming skills; Java, Python, JavaScript, and Go preferred.
- Expertise in CI/CD pipeline design and orchestration using GitHub Actions (and optionally ArgoCD, GitLab, Jenkins, etc.).
- Strong knowledge of AWS services, with hands-on experience in EKS, IAM, networking (VPCs, Route53, ALBs), storage (EBS, S3), and compute.
- End-to-end understanding of modern cloud infrastructure, DevSecOps, observability, and release practices.
- Ability to translate product/platform needs into reliable, secure, scalable infrastructure solutions.
- Excellent problem-solving skills and a mindset for performance, scalability, and resilience.
- Passion for innovation, high energy, and eagerness to experiment with emerging tech like LLMs and Agentic AI.

Additional Skills:
- Experience with multi-cloud environments (Azure, GCP).
- Knowledge of Agentic AI systems, LLMs, or AI Ops use cases.
- Exposure to platform-as-product or internal developer platforms.
- Familiarity with Kubernetes Operators, Helm charts, and service mesh (Istio, Linkerd).

Why Join Us?
- Be part of a forward-thinking DevOps and SRE team pushing the boundaries of platform automation.
- Work on AI-powered workflows and define how infrastructure can be deployed through intelligent assistants.
- Build developer-centric platforms that make a real impact on engineering productivity and product reliability.
- Enjoy a culture of innovation, energy, and excellence where your ideas will be heard and executed.

What's In It For You?

Our Purpose:
Progress is not a self-starter. It requires a catalyst to be set in motion. Information, imagination, people, technology: the right combination can unlock possibility and change the world. Our world is in transition and getting more complex by the day. We push past expected observations and seek out new levels of understanding so that we can help companies, governments, and individuals make an impact on tomorrow. At S&P Global we transform data into Essential Intelligence, pinpointing risks and opening possibilities. We Accelerate Progress.

Our People:

Our Values: Integrity, Discovery, Partnership
At S&P Global, we focus on Powering Global Markets. Throughout our history, the world's leading organizations have relied on us for the Essential Intelligence they need to make confident decisions about the road ahead. We start with a foundation of integrity in all we do, bring a spirit of discovery to our work, and collaborate in close partnership with each other and our customers to achieve shared goals.

Benefits:
We take care of you, so you can take care of business. We care about our people. That's why we provide everything you, and your career, need to thrive at S&P Global.
- Health & Wellness: Health care coverage designed for the mind and body.
- Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
- Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
- Family Friendly Perks: It's not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
- Beyond the Basics: From retail discounts to referral incentive awards, small perks can make a big difference.
For more information on benefits by country, visit https://spgbenefits.com/benefit-summaries

Global Hiring and Opportunity at S&P Global:
At S&P Global, we are committed to fostering a connected and engaged workplace where all individuals have access to opportunities based on their skills, experience, and contributions. Our hiring practices emphasize fairness, transparency, and merit, ensuring that we attract and retain top talent. By valuing different perspectives and promoting a culture of respect and collaboration, we drive innovation and power global markets.

-----------------------------------------------------------
Equal Opportunity Employer
S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.

If you need an accommodation during the application process due to a disability, please send an email to EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person.

US Candidates Only: The EEO is the Law Poster http://www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf describes discrimination protections under federal law. Pay Transparency Nondiscrimination Provision - https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf
-----------------------------------------------------------
IFTECH202.2 - Middle Professional Tier II (EEO Job Group)
Posted 1 month ago
5.0 - 10.0 years
15 - 25 Lacs
Bengaluru
Remote
Hiring for a European cybersecurity company.

Role & responsibilities:
- Act as the highest point of technical escalation within the team for cloud infrastructure, platform, and application-related issues.
- Troubleshoot and resolve complex incidents involving AWS services, containerized environments, application availability, and customer-raised tickets.
- Ensure that application uptime is 99.99% and that customer tickets are answered promptly.
- Own major incident response: coordinate efforts across teams, lead root cause analysis (RCA), and deliver clear post-mortem reports.
- Monitor system health and security events using advanced tools.
- Collaborate with Care Efficiency Leads and Engineering teams to improve observability and automation.
- Maintain internal runbooks, playbooks, and documentation for recurring or high-risk scenarios.
- Implement proactive improvements to reduce MTTR (mean time to resolution) and eliminate recurring incidents.
- Support change management processes and participate in planned maintenance windows.
- Mentor other Care engineers, encouraging a culture of knowledge-sharing and continuous improvement.
- Participate in 24/7 coverage, including on-call shifts or rotations as needed.

Preferred candidate profile:
- 5+ years of experience in NOC or SRE roles, with at least 2 years in a Tier 3 or senior escalation role.
- AWS Architecture Expertise: In-depth knowledge of AWS architecture principles, security best practices, and advanced services relevant to cloud infrastructure.
- Containerization Knowledge: Experience with Docker and container orchestration using Amazon ECS/EKS to maintain a microservices architecture.
- Advanced Monitoring Solutions: Hands-on experience with Dynatrace, Datadog, CloudWatch, or similar tools. Expertise in implementing and optimizing monitoring solutions for complex, distributed cybersecurity infrastructure.
- Linux System Administration: Strong command of Linux environments, with the ability to perform system hardening aligned with security best practices.
- Scripting & Automation: Proficiency in Bash, Python, or PowerShell scripting for automating routine tasks, analyzing logs, and creating incident response procedures. Development of complex automation workflows to reduce MTTR and eliminate manual processes.
- CI/CD Pipeline Expertise: Experience with continuous integration/continuous deployment pipelines for security products.
- Multi-Cloud Strategy: Experience with hybrid or multi-cloud environments to support diverse infrastructure needs.
- Performance Optimization: Advanced skills in tuning cloud resources for optimal performance while maintaining the uptime requirement.

Operational Skills:
- Technical Leadership: Ability to lead technical response teams during critical incidents affecting partners.
- Mentorship: Skills in mentoring L1 and L2 engineers on specific technologies and processes.
- Root Cause Analysis: Expert-level problem investigation and root cause analysis skills to prevent recurring issues in the environment.
- Process Improvement: Ability to identify inefficiencies in operational processes and implement improvements to enhance partner support.
- Cross-team Collaboration: Strong collaboration skills to work effectively with Engineering, Product, and Partner teams during complex incidents.

Bonus Points For:
- Business Impact Awareness: Deep understanding of how technical issues impact customer business operations and end users.
- Custom Partner Solutions: Experience with customized deployments for high-tier partners and the ability to troubleshoot complex partner-specific configurations.
Posted 1 month ago
5.0 - 10.0 years
18 - 22 Lacs
Hyderabad, Chennai, Bengaluru
Hybrid
Key Responsibilities: Design and automate workflows using Power Automate and IcM. Build dashboards and reports in Power BI. Query and analyze data using Kusto Query Language (KQL) in Azure Monitor/Log Analytics. Manage incidents and operational workflows via Geneva/Jarvis. Develop and deploy solutions in the Azure environment. Use Azure DevOps for pipeline creation, environment management, and project tracking. Preferred Skills: Strong knowledge of Power Apps and Microsoft Azure infrastructure and services. Experience with Microsoft automation and monitoring tools. Familiarity with DevOps methodologies and tools. Excellent troubleshooting and problem-solving abilities. Mandatory Skills: Power BI, KQL, DevOps, Azure
Posted 1 month ago
5.0 - 10.0 years
20 - 25 Lacs
Chennai
Work from Office
Hi, Greetings from GSN! Pleasure connecting with you. We have been in Corporate Search Services, identifying and bringing in stellar, talented professionals for our reputed IT / Non-IT clients in India, and have been successfully meeting our clients' needs for the last 20 years. At present, GSN is hiring for SRE Production Support for one of our leading MNC clients. Please find the details below: Experience: 6+ Yrs. Budget: 15 LPA - 25 LPA. Work Location: Chennai. Mode: WFO (5 days in office). Work Timing: 24/7 (cab facility and shift allowance will be provided). Whom are we looking for? We are looking for an experienced SRE (Site Reliability Engineer). Should have worked in Application Support (Java/.Net). Experience in L2 or L3 application support (alert configuration + dashboard creation). Experience with Release Management and Production Deployment. Experience in Splunk. Experience with Grafana (added advantage). If interested, kindly APPLY for an IMMEDIATE response. Thanks & Rgds, SHOBANA | GSN | Shobana@gsnhr.net | Google Reviews: https://g.co/kgs/UAsF9W
Posted 1 month ago
4.0 - 8.0 years
6 - 10 Lacs
Bengaluru
Work from Office
Automation: Develop and maintain automation tools and scripts to streamline deployment, monitoring, and management of the infrastructure and applications. Monitoring and Alerting: Set up and maintain monitoring and alerting systems to proactively identify and resolve issues before they impact customers or services. Including participation in on-call rotations to respond promptly to high priority incidents. Performance Optimization: Identify opportunities for performance optimization and work with development teams to implement improvements. Documentation: Maintain up-to-date documentation for the infrastructure, processes, and procedures. Collaboration: Work closely with development teams, product managers, and other stakeholders to understand requirements and ensure the reliability of the platform. Continuous Improvement: Participate in post-incident reviews, retrospectives, and other forums to identify areas for improvement and drive continuous improvement initiatives. Required education Bachelor's Degree Preferred education Master's Degree Required technical and professional expertise Strong Linux systems engineering background with CentOS/RHEL or Debian including experience building, maintaining and troubleshooting these systems. Automation and Scripting: Strong scripting skills (e.g., Bash, Python) and experience with configuration management tools (e.g., Ansible, Chef, Puppet) to automate deployment and management tasks. Excellent Git skills (merges, branching, forking) Experience with Cloud Platforms: Strong experience with cloud platforms such as IBM, AWS, Azure, or Google Cloud Platform, including expertise in: o Deploying and managing services in these environments. o Managing, and troubleshooting containerized applications. Troubleshooting and Problem Solving: Strong troubleshooting skills and the ability to quickly identify and resolve complex issues in a production environment, including experience with incident response and post-incident analysis. 
Preferred technical and professional experience DevOps Culture: Experience working in a DevOps culture and mindset, including a strong understanding of the collaboration between development and operations teams to achieve business goals. Container Orchestration: Proficiency in container orchestration tools such as Nomad or Kubernetes, including experience with Hashicorp Consul/Vault or equivalents. Monitoring and Logging: Experience with monitoring and logging tools (e.g., ELK stack, Grafana, Prometheus) to monitor the health and performance of infrastructure and applications. Including experience building and maintaining these tools. Security: Knowledge of implementing security best practices and maintaining compliance standards (Center for Internet Security (CIS) Benchmarks, FedRAMP). Security: Ability to patch software or adjust configurations to mitigate Common Vulnerabilities and Exposures (CVE) in a timely fashion. Experience with clustered time series database technologies such as InfluxDB as well as experience with distributed event streaming platforms using Kafka and Telegraf. CI/CD: Experience with application deployment using CI/CD tools such as Jenkins and Tekton. Working knowledge with GitHub, JIRA, Confluence, and ServiceNow.
Posted 1 month ago
6.0 - 10.0 years
6 - 10 Lacs
Hyderabad / Secunderabad, Telangana, India
On-site
Job Summary Role : Site Reliability Engineer & Azure Experience : 6 to 10 Years Skills : SRE with Azure and OCP/Open Shift cloud platform Location : Bangalore/Hyderabad Role: Software Development - Other Industry Type: IT Services & Consulting Department: Engineering - Software & QA Employment Type: Full Time, Permanent Role Category: Software Development
Posted 1 month ago
14.0 - 20.0 years
50 - 70 Lacs
Bengaluru
Hybrid
Overview: As an SRE manager, you are responsible for the availability and reliability of Calix's cloud. At Calix, Site Reliability Engineering combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. You will lead a team of Site Reliability Engineers and oversee the reliability, scalability, and maintainability of Calix's critical infrastructure. This includes building and maintaining automation tools, managing on-call rotations, collaborating with development teams, and ensuring systems meet service level objectives (SLOs), while prioritizing continuous improvement and a strong focus on infrastructure health and stability within the Calix platform. The role leverages tools like Terraform, observability frameworks from the Grafana Labs ecosystem, and Google Cloud Platform. Qualifications: - Strong experience as an SRE manager with a proven track record of managing large-scale, highly available systems. - Expertise in cloud computing platforms (preferably Google Cloud Platform). - Knowledge of core operating system principles, networking fundamentals, and systems management. - Programming skills in languages like Python and Go. - Proven experience building and leading SRE teams, including hiring, coaching, and performance management. - Deep understanding and expertise in building and maintaining scalable open-source monitoring tools and backend storage. - Experience with incident management processes and best practices. - Excellent communication and collaboration skills to work with cross-functional teams. - Knowledge of SRE principles, including error budgets, fault analysis, and reliability engineering concepts. Education: - B.S. or M.S. in Computer Science or equivalent field.
Posted 1 month ago
3.0 - 8.0 years
5 - 15 Lacs
Hyderabad, Pune, Bengaluru
Hybrid
Job description: Hiring for Site Reliability Engineer - AWS/Azure DevOps, with 3+ years of experience. Mandatory Skills: Java, Kubernetes, AWS/Azure, DevOps/DevSecOps, Monitoring Tools (AppDynamics/Dynatrace/New Relic), Build and Release, Prometheus, Python, Node.js. Education: BE/B.Tech/MCA/M.Tech/MSc./MSts
Posted 1 month ago
5.0 - 10.0 years
10 - 20 Lacs
Chennai
Work from Office
Good communication skills: the person should be able to talk in large forums. This is very important. Application production support: 4 to 10 years. Hands-on experience in Splunk. Hands-on experience in building CI/CD pipelines. Candidate should have experience in technical production support (functional support is not a good fit). Cloud knowledge. Good to have: scripting skills, container, and Kubernetes knowledge. Technical Skills: Technology: Java / .NET Framework, C# basics, Splunk, Cloud (preferably PCF). Experience in production deployment using CI/CD pipelines. Splunk Query Skills: Ability to write effective Splunk queries for data analysis and monitoring. Linux Administration: Experience with Linux server troubleshooting of common failures, health checks, and administration tasks. Real-time troubleshooting of critical application workflows and incorporating feedback into product development. Should have good knowledge of Splunk, and should be able to write Splunk queries and create alerts. Good knowledge of ITSM framework. Should be good at analysis and able to find solutions without much help. Triage alerts and diagnose/resolve critical issues; manage implementation of changes. Perform root cause analysis of critical incidents/alerts. Initiate and drive the Techlines in case of outages/major incidents/batch abends and ensure service restoration in the least time possible. Act quickly on application alerts and batch job failures. Identify manual toil and repetitive issues, and work with stakeholders on an improvement plan. Should have basic experience in .NET Framework and C# to fix bugs. Able to write basic PowerShell commands. Knowledge of one or more message brokers such as RabbitMQ or IBM MQ. Knowledge of JIRA, Confluence, and Remedy ticketing systems.
Posted 1 month ago
12.0 - 17.0 years
14 - 19 Lacs
Bengaluru
Work from Office
Your Role & Responsibilities: Looking to make a significant impact? This is your chance to become a key part of a dynamic team of talented professionals, leading the development and deployment of innovative, industry-leading, cloud-based AI services. We are seeking an experienced AI & Cloud Software Engineer to join us. This role involves designing, developing, and deploying AI-based services. You will be instrumental in problem-solving, automating a wide range of tasks, and interfacing with other teams to solve complex problems. Responsibilities: Develop AI capabilities in IBM Cloud based applications. Design, and be an avid coder who gets hands-on and is involved in the coding at the deepest level. Work in an agile environment of continuous delivery. You’ll have access to all the technical training courses you need to become the expert you want to be. Define all aspects of development, from appropriate technology and workflow to coding standards. Collaborate with other professionals to determine functional and non-functional requirements. Participate in technical reviews of requirements, specifications, designs, code and other artifacts. Learn new skills and adopt new practices readily in order to develop innovative and cutting-edge software products that maintain the company’s technical leadership position. Required education: Bachelor's Degree. Required technical and professional expertise: Full Stack & AI/ML: 7–12 years' experience with AI/ML tools (scikit-learn, TensorFlow, PyTorch, LLMs), model deployment, and full-stack development. Backend & APIs: Strong in Java, Python, Node.js, REST APIs, Kafka, and databases like Cassandra, PostgreSQL. Cloud & DevOps: Expertise in IBM Cloud/AWS/Azure, Kubernetes, Docker, microservices, CI/CD, and SRE practices. Web & Architecture: Proficient in web technologies (HTTP, JSON, HTML, JS) and modern cloud/microservices architecture with API design skills.
Preferred technical and professional experience: Messaging & OS: Experience with Kafka, RabbitMQ, and Linux environments (Red Hat, Ubuntu). Networking & Tools: Knowledge of TCP/IP, HTTP protocols, GitHub, Maven/Gradle. SaaS & CI/CD: Background in SaaS apps, CI/CD pipelines, and agile development cycles. Testing & Automation: Familiarity with UI test tools like Selenium or Puppeteer. Mindset: Ownership, adaptability, global collaboration, and eagerness to solve complex problems with new tech.
Posted 1 month ago
10.0 - 15.0 years
12 - 17 Lacs
Bengaluru
Work from Office
Consult with clients and propose architectural solutions to help move & improve infra from on-premises to cloud or help optimize cloud spend from one public cloud to the other. Be the first one to experiment on new age cloud offerings, help define the best practice as a thought leader for cloud, automation & Dev-Ops, be a solution visionary and technology expert across multiple channels. Good understanding of cloud design principles, sizing, multi-zone/cluster setup, resiliency and DR design. Solution Architect or similar certifications from Azure is must. Good business judgment, a comfortable, open communication style, and a willingness and ability to work with customers and teams. Strong communication skills and ability to lead discussions with client technical experts, application team & Vendors to drive collaboration, design thinking model towards reaching the desired objective Required education Bachelor's Degree Preferred education Master's Degree Required technical and professional expertise Experience in participating in technical reviews of requirements, designs, code, and other artifacts and use your experience in Multicloud to build hybrid-cloud solutions for customers. Provide leadership to project teams and facilitate the definition of project deliverables around core Cloud based technology and methods. Define tracking mechanisms and ensure IT standards and methodology are met; deliver quality results. Sound knowledge of SRE principles and ability to address performance issues through design or coding is must. Implement observability, develop, and support pipeline model to deploy key features, changes. 
Security, Risk and Compliance - Advise customers on best practices around access management, network setup, regulatory compliance, and related areas Preferred technical and professional experience 10 - 15 years of experience with at least 5+ years of hands-on experience in Azure Cloud Computing and IT operational experience in a global enterprise environment. Experience in Azure Databricks is preferred. Must have Azure DevOps experience and expertise in all Azure services and Database and Operating Systems experience and good experience in Automation skills like Terraform Ansible etc. Should work in IBM Cloud project as and when needed
Posted 1 month ago
3.0 - 8.0 years
6 - 11 Lacs
Hyderabad / Secunderabad, Telangana, India
On-site
How will you fulfill your potential? Work with a global team of highly motivated platform engineers and software developers building integrated architectures for secure, scalable infrastructure services serving a diverse set of use cases. Partner with colleagues from across technology and risk to ensure an outstanding platform is delivered. Help to provide frictionless integration with the firm's runtime, deployment and SDLC technologies. Collaborate on feature design and problem solving. Help to ensure reliability; define, measure, and meet service level objectives. Quality coding & integration, testing, release, and demise of software products supporting AWM functions. Engage in quality assurance and production troubleshooting. Help to communicate and promote best practices for software engineering across the Asset Management tech stack. Basic Qualifications: A strong grounding in software engineering concepts and implementation of architecture design patterns. A good understanding of multiple aspects of software development in microservices architecture, full stack development experience, identity / access management and technology risk. Sound SDLC practices and tooling experience - version control, CI/CD and configuration management tools. Ability to communicate technical concepts effectively, both in writing and orally, as well as the interpersonal skills required to collaborate effectively with colleagues across diverse technology teams. Experience meeting demands for high availability and scalable system requirements. Ability to reason about performance, security, and process interactions in complex distributed systems. Ability to understand and effectively debug both new and existing software. Experience with metrics and monitoring tooling, including the ability to use metrics to rationally derive system health and availability information. Experience in auditing and supporting software based on sound SRE principles.
Preferred Qualifications 3+ Years of Experience using and/or supporting Java based frameworks & SQL / NOSQL data stores. Experience with deploying software to containerized environments - Kubernetes/Docker. Scripting skills using Python, Shell or bash. Experience with Terraform or similar infrastructure-as-code platforms. Experience building services using public cloud providers such as AWS, Azure or GCP
Posted 1 month ago
7.0 - 9.0 years
35 - 60 Lacs
Gurugram, Bengaluru
Hybrid
Role & responsibilities As a Site Reliability Engineer, you'll use your advanced development and operations knowledge to identify and prioritize issues. Find universal solutions to common problems and mentor and support junior staff. Additionally, you will: Enlighten, Enable and Empower a fast-growing set of multi-disciplinary teams, across multiple applications and locations. Tackle complex development, automation and business process problems. Champion Cvent standards and best practices. Ensure the scalability, performance, and resilience of our suite of products. Work with the development and product team of a new application to establish the right monitoring and alerting strategy. Develop build, test and deployment automation that seamlessly targets multiple on-premises and AWS regions. Help a dev team working on a legacy code base to realize zero-down-time deployments. • Give back by working on and contributing to Open Source projects Automate all the things! Preferred candidate profile Experience with SDLC methodologies (preferably Agile software development methodology). Scripting languages like Ruby, Groovy, Bash, PowerShell, or Python. Exposure to managing AWS services / operational knowledge of managing applications in AWS Experience with configuration management tools such as Chef, Puppet, Ansible or equivalent Hands-on experience with Windows and Linux/Unix Administration Working with APM, monitoring, and logging tools (New Relic, DataDog, Splunk) Good understanding of containerization concepts - docker, ECS, EKS, Kubernetes Experience managing 3 tier application stacks Experience with build tools such as Jenkins Working experience with NoSQL databases such as MongoDB, couchbase, postgres etc F5 load balancing concepts Understanding of basic networking concepts Experience with package managers such as nexus, artifactory or equivalent Good communication skills
Posted 1 month ago
10.0 - 13.0 years
35 - 50 Lacs
Chennai
Work from Office
Job Summary: We are seeking an experienced R2 Architect with 10 to 13 years of experience in SRE DevOps and SRE Concepts. The ideal candidate will work in a hybrid model, primarily during the day shift. This role does not require travel. The candidate will play a crucial role in ensuring the reliability and efficiency of our systems, contributing to the company's overall success and societal impact. Responsibilities: Lead the design and implementation of SRE practices to enhance system reliability and performance. Oversee the development and maintenance of automated solutions for system monitoring and incident response. Provide technical guidance and mentorship to the SRE team to ensure best practices are followed. Collaborate with cross-functional teams to identify and address system bottlenecks and performance issues. Implement and manage CI/CD pipelines to streamline software delivery processes. Develop and maintain comprehensive documentation for SRE processes and procedures. Conduct regular system audits and performance reviews to ensure optimal operation. Implement robust incident management protocols to minimize downtime and service disruptions. Monitor system health and performance metrics to proactively address potential issues. Drive continuous improvement initiatives to enhance system reliability and efficiency. Ensure compliance with industry standards and best practices in SRE and DevOps. Facilitate effective communication and collaboration between development and operations teams. Utilize data-driven insights to inform decision-making and optimize system performance. Qualifications: Possess extensive experience in SRE DevOps and SRE Concepts. Demonstrate proficiency in implementing and managing CI/CD pipelines. Exhibit strong problem-solving skills and the ability to address complex system issues. Have a solid understanding of automated monitoring and incident response solutions.
Show excellent communication and collaboration skills to work effectively with cross-functional teams. Maintain a proactive approach to system health and performance monitoring. Display a commitment to continuous improvement and staying updated with industry trends. Hold relevant certifications in SRE or DevOps practices. Bring a proven track record of enhancing system reliability and efficiency. Demonstrate the ability to mentor and guide team members in best practices. Exhibit strong organizational skills and attention to detail. Have experience in developing and maintaining comprehensive documentation. Show a commitment to ensuring compliance with industry standards and best practices.
Posted 1 month ago
3.0 - 5.0 years
5 - 7 Lacs
Bengaluru
Work from Office
Job Title: Site Reliability Engineer Department: Engineering / Infrastructure Reports To: SRE Manager / DevOps Lead Location: Bangalore, India Role Summary The Site Reliability Engineer (SRE) will be responsible for ensuring the availability, performance, and scalability of critical systems. This role involves managing CI/CD pipelines, monitoring production environments, automating operations, and driving platform reliability improvements in collaboration with development and infrastructure teams. Key Responsibilities Manage alerts and monitoring of critical production systems. Operate and enhance CI/CD pipelines and improve deployment and rollback strategies. Work with central platform teams on reliability initiatives. Automate testing, regression, and build tooling for operational efficiency. Execute NFR testing on production systems. Plan and implement Debian version migrations with minimal disruption. Required Qualifications & Skills CI/CD and Packaging Tools: Hands-on experience with Jenkins, Docker, JFrog for packaging and deployment. Operating System Expertise: Experience in Debian OS migration and upgrade processes. Monitoring Systems: Knowledge of Grafana, Nagios, and other observability tools. Configuration Management: Proficiency with Ansible, Puppet, or Chef. Version Control: Working knowledge of Git and related version control systems. Kubernetes: Deep understanding of Kubernetes architecture, deployment pipelines, and debugging. Ability to deploy components with detailed insights into: Configuration parameters and system requirements Monitoring and alerting needs Performance tuning Designing for high availability and fault tolerance Networking: Understanding of TCP/IP, UDP, Multicast, Broadcast. Experience with TCPDump, Wireshark for network diagnostics. Linux & Databases: Strong skills in Linux tools and scripting. Familiarity with MySQL and NoSQL database systems. 
Soft Skills Strong problem-solving and analytical skills Effective communication and collaboration with cross-functional teams Ownership mindset and accountability Adaptability to fast-paced and dynamic environments Detail-oriented and proactive approach Preferred Qualifications Bachelor’s degree in Computer Science, Engineering, or related technical field Certifications in Kubernetes (CKA/CKAD), Linux, or DevOps practices Experience with cloud platforms (AWS, GCP, Azure) Exposure to service mesh, observability stacks, or SRE toolkits Key Relationships Internal: DevOps, Infrastructure, Software Development, QA, Security Teams External: Tool vendors, platform service providers (if applicable) Role Dimensions Impact on uptime and reliability of business-critical services Ownership of CI/CD and production deployment processes Contributor to cross-team reliability and scalability initiatives Success Measures (KPIs) System uptime and availability (SLA adherence) Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) incidents Deployment success rate and rollback frequency Automation coverage of operational tasks Completion of OS migration and infrastructure upgrade projects Competency Framework Alignment Technical Mastery: Infrastructure, automation, CI/CD, Kubernetes, monitoring Execution Excellence: Timely project delivery, process improvements Collaboration: Cross-functional team engagement and support Resilience: Problem solving under pressure and incident response Innovation: Continuous improvement of operational reliability and performance
Posted 1 month ago
8.0 - 13.0 years
40 - 65 Lacs
Hyderabad
Remote
Technical Head of Cloud & DevOps Location: 100% Remote (India, Eastern Europe, UK, or U.S.-based candidates; occasional travel to company hubs or conferences as needed) Type: Full-time, Senior Technical Leadership Role Overview We are seeking a Head of Cloud & DevOps to lead the hands-on management, scaling, and continuous improvement of our decentralized compute infrastructure. This position will serve as the primary technical leader for cloud operations, Kubernetes orchestration, infrastructure management, and DevOps pipelines, ensuring platform reliability, performance, and scalability. You will work closely with the CTO, product management, and cross-functional engineering teams to operationalize our company's evolving platform, drive our migration to in-house Distributed Kubernetes Service (DKS), and ensure high uptime and SLA adherence for enterprise customers. This role requires deep technical expertise combined with strong leadership to guide and mentor teams, while remaining actively engaged in architecture reviews, troubleshooting, and hands-on problem solving. This role is designed for candidates who aspire to grow into a future CTOO position, taking on expanded enterprise leadership responsibilities as the platform scales globally.
Mandatory Skills Kubernetes orchestration (multi-cluster, DKS, service mesh) Cloud infrastructure scaling (AWS, hybrid, AI workloads) DevOps & CI/CD leadership (Jenkins, GitOps, version control) Infrastructure as Code (IaC) (Terraform, Helm, Ansible) Incident response and uptime optimization (SRE, observability, 99.9%+ SLAs) Security & Compliance knowledge (SOC 2, ISO 27001, access control, encryption) Team leadership in DevOps/SRE/Cloud Ops Monitoring and alerting systems Platform reliability and SLA adherence 8+ years in Cloud Infrastructure, 4+ in Kubernetes/DevOps leadership Non-Mandatory Skills Experience with Distributed Kubernetes Service (DKS) migrations Passion for decentralized computing / Web3 / blockchain NXQ Token or similar token incentive familiarity Cloud-native architecture for AI workloads Experience with hybrid or bare-metal Kubernetes deployments Global infrastructure experience Knowledge of performance-based DevOps metrics (error budgets, SLOs) Key Responsibilities Infrastructure Ownership & Uptime Leadership Own the full operational lifecycle of our company's decentralized compute infrastructure, spanning Kubernetes, VMs, AI workloads, hybrid cloud integrations, and blockchain components. • Develop and execute infrastructure scaling plans to meet growth demands while maintaining enterprise-grade SLAs (99.9%+ uptime). • Build robust monitoring, observability, alerting, and incident response systems to proactively manage global NanoServer operations. • Maintain deep involvement in diagnosing and resolving performance, capacity, and stability issues. Kubernetes Platform Management & DKS Migration Lead the architecture, deployment, and ongoing optimization of our company's Distributed Kubernetes Service (DKS). • Manage the transition from AWS EKS to DKS with zero downtime, thorough testing, rollbacks, and security assurance. • Ensure DKS delivers parity or superiority to leading cloud providers' managed Kubernetes offerings.
DevOps Leadership Drive maturity in CI/CD pipelines, infrastructure-as-code, configuration management, and automated testing practices. • Oversee deployment reliability, version control, rollbacks, and release management. • Lead incident response runbooks, playbooks, SRE error budgets, and continuous reliability improvements. Security & Compliance Implement strong security controls for Kubernetes clusters, network access, identity management, data privacy, and blockchain-related assets. • Collaborate with compliance teams on certifications (SOC 2, ISO 27001, etc.) as required by enterprise clients. • Maintain operational adherence to security standards and best practices. Team Leadership & Execution Lead, mentor, and grow cross-functional cloud operations teams: DevOps, SRE, infrastructure engineers, and backend developers. • Foster a culture of accountability, continuous improvement, operational excellence, and proactive ownership. • Set clear objectives, performance metrics, and technical execution roadmaps aligned to business goals. Collaboration & Stakeholder Alignment • Partner closely with the CTO, product management, and engineering leadership to translate platform objectives into actionable infrastructure projects. • Represent technical operations in cross-functional planning sessions and communicate platform health, SLAs, and operational risks. Qualifications & Experience 8+ years of experience managing complex cloud infrastructure, with at least 4+ years leading DevOps/SRE/Kubernetes operations at scale. • Strong hands-on expertise with Kubernetes orchestration, multi-cluster management, service mesh, container security, and high-scale distributed systems. • Proven success in infrastructure scaling, uptime optimization, incident response, and capacity planning. • In-depth knowledge of DevOps pipelines, CI/CD frameworks, Infrastructure-as-Code (Terraform, Helm), and automated deployments. 
• Demonstrated ability to lead migrations from managed cloud services to in-house infrastructure. • Strong understanding of cloud security, access controls, encryption, data privacy, and enterprise compliance . • Passion for decentralized cloud computing, Web3/blockchain concepts, or AI-driven infrastructure is a plus. • Excellent leadership, communication, and cross-functional collaboration skills. • Bachelors or Master’s degree in Computer Science, Engineering, or a related field; equivalent experience considered. Compensation & Benefits Competitive base salary depending on candidate location • Equity participation aligned to long-term growth of our company • Performance-based annual bonuses • NXQ token incentives aligned with ecosystem growth • Comprehensive healthcare coverage • Remote work flexibility with home office stipends • Opportunities for global collaboration and occasional travel • High-impact leadership role shaping the future of cloud technology • Structured career path to grow into CTOO based on organizational maturity and demonstrated leadership
Posted 1 month ago
8.0 - 13.0 years
0 - 0 Lacs
Hyderabad, Chennai, Bengaluru
Hybrid
Job Title: Site Reliability Engineer (SRE) / Observability Engineer
Shift Type: Rotational shifts, including night shift and weekend availability
Experience: 7 years of experience
Job Summary
We are looking for a skilled and adaptable Site Reliability Engineer (SRE) / Observability Engineer to join our dynamic project team. The ideal candidate will play a critical role in ensuring system reliability, scalability, observability, and performance while collaborating closely with development and operations teams. This position requires strong technical expertise, problem-solving abilities, and a commitment to 24/7 operational excellence.
Key Responsibilities
Site Reliability Engineering
• Design, build, and maintain scalable and reliable infrastructure.
• Automate system provisioning and configuration using tools like Terraform, Ansible, Chef, or Puppet.
• Develop tools and scripts in Python, Go, Java, or Bash for automation and monitoring.
• Administer and optimize Linux/Unix systems, with a strong understanding of TCP/IP, DNS, load balancers, and firewalls.
• Implement and manage cloud infrastructure across AWS or Kubernetes.
• Maintain and enhance CI/CD pipelines using tools like Jenkins and ArgoCD.
• Monitor systems using Prometheus, Grafana, Nagios, or Datadog, and respond to incidents efficiently.
• Conduct postmortems and define SLAs/SLOs for system reliability and performance.
• Plan for capacity and performance using benchmarking tools, and implement autoscaling and failover systems.
Observability Engineering
• Instrument services with relevant metrics, logs, and traces using OpenTelemetry, Prometheus, Jaeger, Zipkin, etc.
• Build and manage observability pipelines using Grafana, ELK Stack, Splunk, Datadog, or Honeycomb.
• Work with time-series databases (e.g., InfluxDB, Prometheus) and log aggregation platforms.
• Design actionable alerts and dashboards to improve system observability and reduce alert fatigue.
• Partner with developers to promote observability best practices and define key performance indicators (KPIs).
Required Skills & Qualifications
• Proven experience as an SRE or Observability Engineer in complex production environments.
• Hands-on expertise in Linux/Unix systems and cloud infrastructure (AWS/Kubernetes).
• Strong programming and scripting skills in Python, Go, Bash, or Java.
• Deep understanding of monitoring, logging, and alerting systems.
• Experience with modern Infrastructure as Code and CI/CD practices.
• Ability to analyze and troubleshoot production issues in real time.
• Excellent communication skills to collaborate with cross-functional teams and stakeholders.
• Flexibility to work in rotational shifts, including night shifts and weekends, as required by project demands.
• A proactive mindset with a focus on continuous improvement and reliability.
Mandatory Skills: Ansible, AWS Automation Services, AWS CloudFormation, AWS CodePipeline, AWS CodeDeploy, AWS DevOps Services
Posted 1 month ago
1.0 - 6.0 years
1 - 6 Lacs
Bengaluru / Bangalore, Karnataka, India
On-site
Our strategic new platform is a greenfield initiative to converge all our businesses onto a single technology platform, providing much better scalability and improved resiliency and reducing the time needed to develop and deliver new features. The programme is sponsored by our division's senior leadership and is one of the key strategic deliveries of the next few years. Highly developed analytical and technical skills, combined with a commercial and collaborative approach to problem solving, are essential to our success. Responsibilities and Qualifications. Who we are looking for: A self-guided, pragmatic individual with a proven track record in designing and delivering complex software solutions in the financial services industry. Motivated by the opportunity to make impactful deliveries for our businesses and clients. Strong analytical skills. Hungry to learn new concepts and technologies. Able to work efficiently within a global team. Effective written and verbal communication skills. Able to keep a commercial outlook while maintaining a focus on technical quality and attention to detail. Skills and experience we are looking for: 1+ years of strong programming experience in Java. Comfortable multi-tasking and working as part of a global team. Experience building and maintaining scalable and distributed systems. Familiarity with Test Driven Development and Business Driven Development. Performant data structures and algorithms. Experience with databases (SQL, NoSQL). Linux/Unix skills, including shell scripting. Knowledge of financial markets, asset classes, and market infrastructure. Basic experience in SRE practices and incident management.
Posted 1 month ago
3.0 - 8.0 years
3 - 8 Lacs
Bengaluru / Bangalore, Karnataka, India
On-site
Maplelabs Solutions (A Unit of Xoriant Corporation) is conducting a virtual weekend drive for the SRE Engineer role on the 8th and 9th of March (Bangalore/Chennai). Educational Qualification: BE/MCA/ME/M Tech/MSc with Computer Science. Experience: 3-10 years of experience in site reliability engineering or systems engineering. Experience in managing cloud environments (preferably AWS, GCP, or Azure). Experience in building and maintaining CI/CD pipelines. Familiarity with containerization technologies (Docker, Kubernetes). Knowledge of infrastructure automation tools such as Terraform, Ansible, or Chef. Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, Splunk, etc.). Technical Skills: Strong proficiency in at least one scripting language (Python, Go, Bash, etc.). Experience with load balancing, caching, and distributed systems architecture. Understanding of version control tools (Git, GitLab, GitHub). Expertise in troubleshooting production systems. Understanding of networking, including DNS, HTTP, and TCP/IP.
Posted 1 month ago
3.0 - 8.0 years
3 - 8 Lacs
Bengaluru / Bangalore, Karnataka, India
On-site
Who we are looking for: A self-guided, pragmatic individual with a proven track record in designing and delivering complex software solutions in the financial services industry. Motivated by the opportunity to make impactful deliveries for our businesses and clients. Strong analytical skills. Hungry to learn new concepts and technologies. Able to work efficiently within a global team. Effective written and verbal communication skills. Able to keep a commercial outlook while maintaining a focus on technical quality and attention to detail. Skills and experience we are looking for: 3+ years of strong programming experience in Java. Comfortable multi-tasking and working as part of a global team. Experience building and maintaining scalable and distributed systems. Familiarity with Test Driven Development and Business Driven Development. Performant data structures and algorithms. Experience with databases (SQL, NoSQL). Linux/Unix skills, including shell scripting. Knowledge of financial markets, asset classes, and market infrastructure. Basic experience in SRE practices and incident management.
Posted 1 month ago
10.0 - 20.0 years
35 - 65 Lacs
Bengaluru
Work from Office
Ensure reliability, scalability, and security of Wells Real-time Ops platforms. Lead SRE practices, observability, incident management, and digital enablement of O&G wellsite workflows. Required Candidate Profile: Experienced SRE with deep Azure/multi-cloud knowledge and Oil & Gas Wells domain expertise. Proven in driving platform reliability, observability, Agile delivery, and digital enablement.
Posted 1 month ago
Accenture
39581 Jobs | Dublin
Wipro
19070 Jobs | Bengaluru
Accenture in India
14409 Jobs | Dublin 2
EY
14248 Jobs | London
Uplers
10536 Jobs | Ahmedabad
Amazon
10262 Jobs | Seattle,WA
IBM
9120 Jobs | Armonk
Oracle
8925 Jobs | Redwood City
Capgemini
7500 Jobs | Paris,France
Virtusa
7132 Jobs | Southborough