2.0 - 6.0 years
1 - 3 Lacs
Singapore
On-site
Description: We are looking for a Site Reliability Engineer to join our team. The ideal candidate will have 2-6 years of experience in a similar role and will be responsible for ensuring the reliability, scalability, and performance of our systems.
Responsibilities:
- Design, build, and maintain highly available systems
- Monitor and respond to system alerts and incidents
- Identify and resolve performance issues
- Develop and implement automation tools for system management
- Collaborate with development teams to ensure seamless deployment and operation of applications
- Maintain documentation of system architecture and processes
- Participate in the on-call rotation for after-hours support
Skills and Qualifications:
- Bachelor's degree in Computer Science or a related field
- 2-6 years of experience as a Site Reliability Engineer or in a similar role
- Strong knowledge of Linux systems administration
- Experience with configuration management tools such as Puppet, Chef, or Ansible
- Experience with cloud infrastructure providers such as AWS, Azure, or GCP
- Strong scripting skills in at least one language such as Python, Ruby, or Bash
- Familiarity with monitoring tools such as Nagios, Zabbix, or Prometheus
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills
Posted 1 week ago
5.0 - 10.0 years
15 - 25 Lacs
Bengaluru
Hybrid
Dear candidate,
We are looking for an SRE (Site Reliability Engineer) for our Bangalore location.
Requirement 1: SRE (GitLab)
* GitLab setup and administration
* Implement best practices to improve pipeline performance
* AWS with Terraform coding
* Linux administration and troubleshooting
* Strong coding skills in any language (preferably Python)
* Familiarity with container technologies (Docker / Kubernetes)
* Good knowledge of infrastructure and application monitoring (Prometheus / Grafana / CloudWatch)
Requirement 2: SRE (Artifactory)
* JFrog Artifactory setup and administration
* JFrog Xray setup and administration
* AWS with Terraform coding
* Linux administration and troubleshooting
* Strong coding skills in any language (preferably Python)
* Familiarity with container technologies (Docker / Kubernetes)
* Good knowledge of infrastructure and application monitoring (Prometheus / Grafana / CloudWatch)
Location: Bangalore (Whitefield)
Work mode: Hybrid
Interview mode: Face to face (Monday - Friday)
If interested, please share your CV at ruchika.gahlawat@innovasolutions.com.
Posted 1 week ago
5.0 - 10.0 years
4 - 8 Lacs
Bengaluru
Work from Office
We are looking for an experienced Senior BT Reliability Engineer to join our Business Technology team to maintain and continually improve our cloud-based services. The Site Reliability Engineering team in Bangalore is brand new and builds foundational back-end infrastructure services and tooling for Okta's corporate teams. We enable teams to build infrastructure at scale and automate their software reliably and predictably. SREs are team players and innovators who build and operate technology using best practices and an agile mindset. We are looking for a smart, innovative, and passionate engineer for this role, someone who has a passion for designing and implementing complex cloud-based infrastructure. This is a new team, and the ideal candidate welcomes the challenge of building something new. They enjoy seeing their designs run at scale with automation, testing, and an excellent operational mindset. If you exemplify the ethic of "If you have to do something more than once, automate it," we want to hear from you!
Responsibilities:
- Build and run development tools, pipelines, and infrastructure with a security-first mindset
- Actively participate in Agile ceremonies, write stories, and support team members through demos, knowledge sharing, and architecture sessions
- Promote and apply best practices for building secure, scalable, and reliable cloud infrastructure
- Develop and maintain technical documentation, network diagrams, runbooks, and procedures
- Design, build, run, and monitor Okta's IT infrastructure and cloud services
- Drive initiatives to evolve our current cloud platforms to increase efficiency and keep them in line with current security standards and best practices
- Recommend, develop, implement, and manage appropriate policy, standards, processes, and procedural updates
- Work with software engineers to ensure that development follows established processes and works as intended
- Create and maintain centralized technical processes, including container and image management
- Provide excellent customer service to our internal users and be an advocate for SRE services and DevOps practices
Qualifications:
- 5+ years of experience as an SRE, DevOps engineer, Systems Engineer, or equivalent
- Demonstrated ability to develop complex applications for cloud infrastructure at scale and deliver projects on schedule and within budget
- Proficient in managing AWS multi-account environments, AWS authentication and governance, and the AWS organization management suite, including but not limited to AWS Organizations, AWS IAM, AWS IAM Identity Center, and StackSets
- Proficient with automating systems and infrastructure via Terraform
- Proficient in developing applications running on AWS or other cloud infrastructure resources, including compute, storage, networking, and virtualization
- Proficient with Git and with building deployment pipelines using commercial tools, especially GitHub Actions
- Proficient with developing tooling and automation using Python
- Proficient with AWS container-based workloads and concepts, especially EKS, ECS, and ECR
- Experience with monitoring tools, especially Splunk, CloudWatch, and Grafana
- Experience with reliability engineering concepts and security best practices on public cloud platforms
- Experience with image creation and management, especially for container and EC2-based workloads
- Knowledge of Linux system administration
- Familiar with configuration management tools, such as Ansible and SSM
- Familiar with GitHub Actions Runner Controller self-hosted runners
- Good communication skills, with the ability to influence others and communicate complex technical concepts to different audiences
Posted 1 week ago
6.0 - 8.0 years
13 - 18 Lacs
Gurugram
Work from Office
Responsibilities:
- Define and enforce SLOs, SLIs, and error budgets across microservices
- Architect an observability stack (metrics, logs, traces) and drive operational insights
- Automate toil and manual ops with robust tooling and runbooks
- Own the incident response lifecycle: detection, triage, RCA, and postmortems
- Collaborate with product teams to build fault-tolerant systems
- Champion performance tuning, capacity planning, and scalability testing
- Optimise costs while maintaining the reliability of cloud infrastructure
Must-have skills:
- 6+ years in SRE/Infrastructure/Backend-related roles using cloud-native technologies
- 2+ years in an SRE-specific capacity
- Strong experience with monitoring/observability tools (Datadog, Prometheus, Grafana, ELK, etc.)
- Experience with infrastructure-as-code (Terraform/Ansible)
- Proficiency in Kubernetes, service mesh (Istio/Linkerd), and container orchestration
- Deep understanding of distributed systems, networking, and failure domains
- Expertise in automation with Python, Bash, or Go
- Proficient in incident management, SLAs/SLOs, and system tuning
- Hands-on experience with GCP (preferred)/AWS/Azure and cloud cost optimisation
- Participation in on-call rotations and running large-scale production systems
Nice-to-have skills:
- Familiarity with chaos engineering practices and tools (Gremlin, Litmus)
- Background in performance testing and load simulation (Gatling, Locust, k6, JMeter)
Posted 1 week ago
5.0 - 8.0 years
4 - 7 Lacs
Bengaluru
Work from Office
Key Responsibilities
Building Software Applications
- Build software applications using relevant development languages and applying knowledge of the systems, services, and tools appropriate for the business area; guide more junior members of the team in this topic.
- Refactor and simplify code by introducing design patterns when necessary; guide more junior members of the team in this topic.
- Ensure the quality of the application by following standard testing techniques and methods that adhere to the test strategy.
- Write readable and reusable code by applying standard patterns and using standard libraries.
- Maintain data security, integrity, and quality by following company standards and best practices.
Software Systems Design
- Evaluate possible architecture solutions, taking into account cost, business requirements, technology requirements, and emerging technologies.
- Describe the implications of changing an existing system or adding a new system to a specific area, drawing on a broad, high-level understanding of the infrastructure and architecture of our systems.
- Help grow the business and/or accelerate software development by applying engineering techniques (e.g., prototyping, spiking, and vendor evaluation) and standards.
- Meet business needs by designing solutions that satisfy current requirements and are adaptable for future enhancements.
End-to-End System Ownership
- Own a service end to end by actively monitoring application health and performance, setting and monitoring relevant metrics, and acting accordingly when they are violated.
- Reduce business continuity risks and bus factor by applying state-of-the-art practices and tools and writing the appropriate documentation, such as runbooks and OpDocs.
- Reduce risk and obtain customer feedback by using continuous delivery and experimentation frameworks.
- Independently manage an application or service by working through deployment and operations in production; guide more junior members of the team in this topic.
- Maintain data security, integrity, and quality by following company standards and best practices.
Technical Incident Management
- Address and resolve live production issues by mitigating customer impact within SLA.
- Improve the overall reliability of systems by producing long-term solutions through root cause analysis.
- Keep track of incidents by contributing to postmortem processes and logging live issues.
Automation and Toil Reduction
- Ensure that infrastructure stays current by reducing technical debt, searching for bottlenecks, and preparing for scaling.
- Reduce the cost of operations and maintenance by leveraging new technologies and automation, and partner with vendors to ensure we stay current.
- Reduce human labour by writing small software features that address availability, scalability, latency, and efficiency.
Monitoring and Alerting Improvements
- Review and verify the performance of production systems and network infrastructure by continuously monitoring appropriate observability metrics, business KPIs, and capacity planning.
- Improve application reliability by partnering with development teams to advise on setting appropriate observability metrics.
Critical Thinking
- Systematically identify patterns and underlying issues in complex situations, and find solutions by applying logical and analytical thinking.
- Constructively evaluate and develop ideas, plans, and solutions by reviewing them objectively, taking into account external knowledge, initiating 'SMART' improvements, and articulating their rationale.
Continuous Quality and Process Improvement
- Identify opportunities for process, system, and structural improvements (e.g., performance gains) by examining and evaluating current process flows, methods, and standards.
- Design and implement relevant improvements by defining adapted or new process flows, standards, and practices that enable business performance.
Effective Communication
- Have sufficient knowledge to deliver clear, well-structured, and meaningful information to a target audience, using suitable communication mediums and language tailored to that audience.
- Have sufficient knowledge to achieve mutually agreeable solutions by staying adaptable, communicating ideas in clear, coherent language, and practising active listening.
- Have sufficient knowledge to ask relevant follow-up questions that properly engage the speaker and show real understanding of what they are saying, by applying listening and reflection techniques.
Architectural Guidance
- Advise product teams towards a technical solution that meets the functional, non-functional, and architectural requirements by challenging the rationale for an application design and providing context in the wider architectural landscape.
- Have sufficient knowledge to set a clear direction for a technical capability by evaluating and aligning target architecture improvements and reframing architectural designs and decisions for varied stakeholders.
Coaching/Mentoring
- Have basic knowledge to coach, guide, and improve the overall performance of stakeholders and colleagues at all levels, when appropriate, by sharing experience, knowledge, and approaches to work.
Communication (stakeholder: type, frequency)
- Track members: cooperation, persuasion, information (continuous)
- Product stakeholders: cooperation, persuasion (frequent)
- Peers: cooperation, persuasion (frequent)
Level of education: Master's degree
Years of relevant job knowledge: Advanced knowledge (5-8 years)
Requirements of special knowledge/skills: Building Software Applications, Software System Design, End-to-End System Ownership, Technical Incident Management, Operations (Automation & Toil), Observability (Monitoring & Alerting), Critical Thinking, Continuous Quality & Process Improvement, Effective Communication, Architectural Guidance, Coaching & Mentoring
Posted 1 week ago
7.0 - 12.0 years
18 - 22 Lacs
Pune
Work from Office
We are looking for a highly skilled Site Reliability Engineer (SRE) with strong engineering and architectural expertise to design, implement, and manage large-scale, mission-critical infrastructure across multiple data centers and cloud providers. As an SRE, you will be responsible for architecting and optimizing our global infrastructure, enabling development teams to roll out new features efficiently while maintaining high availability and reliability. You will be hands-on with automation, performance tuning, infrastructure scalability, and cloud-native technologies to ensure a seamless user experience for millions of customers.
Key Responsibilities:
1. Architect and implement highly scalable, fault-tolerant, and distributed systems across multi-cloud (OCI, AWS, GCP) and on-premise environments using modern DevOps and SRE principles.
2. Design and deploy next-generation cloud infrastructure with a strong focus on automation, self-healing systems, and performance optimization. Develop and maintain infrastructure-as-code (IaC) using Terraform, and configuration management tools such as Ansible and Puppet, for automated provisioning and orchestration.
3. Build and optimize containerized environments using Kubernetes and Docker for seamless deployment and scaling.
4. Drive performance, scalability, and security improvements across our cloud and on-prem infrastructure, ensuring high availability and disaster recovery capabilities. Monitor, troubleshoot, and resolve complex system issues by implementing advanced observability solutions, logging, and real-time monitoring frameworks.
5. Develop and enforce SRE best practices, including SLI/SLO definition, capacity planning, and incident management strategies.
6. Eliminate toil and automate repetitive tasks using scripting languages such as Python, Golang, or Shell scripting to improve operational efficiency.
7. Collaborate closely with engineering, architecture, and security teams to improve system resiliency, optimize application performance, and streamline CI/CD workflows. Lead the transition of legacy systems to modern, cloud-native architectures, advocating for DevOps and infrastructure automation.
8. Participate in 24/7 on-call rotations, ensuring rapid response to critical incidents and driving post-mortem analysis for continuous improvement.
Requirements:
1. 7+ years of hands-on experience in a Site Reliability Engineering (SRE) role, with a strong focus on designing, implementing, and managing cloud-native infrastructure. Proficient with any cloud platform (preferably OCI), with not just operational experience but actual design and implementation expertise.
2. Proven experience in building, deploying, and optimizing infrastructure-as-code (IaC) using Terraform.
3. Strong automation mindset with proficiency in Ansible, Puppet, or other configuration management tools.
4. Hands-on experience with container orchestration using Kubernetes, Docker, and microservices architecture.
5. Advanced scripting and automation skills in Python, Golang, or Shell scripting to eliminate manual operations.
6. Working knowledge of load balancing technologies (HAProxy, Nginx, F5, Varnish, dnsdist) and web servers (Apache, Nginx).
7. Strong understanding of networking, distributed systems, and observability tools (Prometheus, Grafana, ELK stack, Datadog).
8. Experience in designing and implementing highly available, scalable, and secure architectures across cloud and hybrid environments.
9. AWS and/or GCP certifications are a plus but not required.
10. This is not a support-focused role; we are looking for engineers who have built, deployed, and optimized complex distributed systems from the ground up.
Posted 2 weeks ago
5.0 - 8.0 years
13 - 17 Lacs
Gurugram
Work from Office
POSITION SUMMARY: In this role, you will play a crucial part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliability Engineering practices. Your contribution will be pivotal in ensuring the availability, scalability, and performance of our systems and applications. Leveraging your strong technical skills and expertise in DevOps principles, you will work towards enhancing the reliability of our infrastructure and minimizing downtime, enabling the organization to deliver high-quality software with maximum efficiency.
EXPERIENCE AND REQUIRED SKILL SETS:
- Ensure 24/7 uptime and stability of production systems
- Investigate and troubleshoot production issues
- Collaborate with developers to optimize system performance
- Participate in the on-call rotation to provide 24/7 support for critical systems
- Work on automation and enhancements to reduce manual processes and intervention
- 5+ years of relevant experience in an SRE, production support, or product support role, with a track record of implementing SRE practices
- Basic understanding of cloud solutions from providers such as AWS or Azure
- Basic to intermediate scripting knowledge in at least one of Bash, Python, or PowerShell
- Good presentation, communication, and interpersonal skills, with the ability to collaborate effectively with cross-functional teams and stakeholders across different countries and cultures
- Good problem-solving and troubleshooting skills
- Continuous learning mindset and willingness to adapt to new technologies and industry trends
- Good understanding of operating system commands (Linux) and SQL (ability to write and analyze queries and to derive or build important information as required)
- In-depth knowledge of the trading life cycle: the candidate should possess a comprehensive understanding of the trading life cycle, including order management, trade execution, settlement, and post-trade processes. Familiarity with various financial products such as Equities, Derivatives, Currencies, Commodities, and FX is a plus.
- Incident and problem management expertise: the candidate must demonstrate strong problem-solving skills and the ability to manage frequent incidents efficiently within a fast-paced trading environment. This includes identifying, analyzing, and resolving issues related to trading systems and processes, as well as collaborating with cross-functional teams to implement long-term solutions and improve operational efficiency.
- Good understanding of tools:
  (a) Orchestration: Autosys, Airflow, or cron
  (b) Monitoring and logging: PagerDuty, Prometheus and Grafana or Datadog, Splunk
  (c) Project management / ITSM: ServiceNow (basic ability to navigate and create change tickets and incidents), Jira (basic ability to create Jira tickets and filter your work)
EDUCATION:
- Bachelor's or master's degree in Computer Science, Engineering, Software Engineering, or a relevant field
Posted 2 weeks ago
5.0 - 7.0 years
3 - 7 Lacs
Pune
Remote
We are seeking a Grafana Implementation Expert with deep expertise in Grafana and Prometheus, focusing on core development and customization rather than SRE or DevOps responsibilities. This role requires a monitoring-tools specialist who will design, develop, and optimize Grafana dashboards, plugins, and data sources to provide real-time observability and analytics.
Key Responsibilities:
- Develop, customize, and optimize Grafana dashboards with advanced visualizations, queries, and alerting mechanisms.
- Integrate Grafana with Prometheus and other data sources (e.g., Loki, InfluxDB, Elasticsearch, MySQL, PostgreSQL, OpenTelemetry).
- Extend Grafana capabilities by developing custom plugins, panels, and data sources using JavaScript, TypeScript, React, and Go.
- Optimize Prometheus queries (PromQL) and storage solutions to ensure efficient data retrieval and visualization.
- Automate dashboard provisioning using JSON, Terraform, or Grafana APIs for seamless deployment across environments.
- Work closely with engineering teams to translate monitoring requirements into scalable and maintainable solutions.
- Troubleshoot and enhance Grafana performance, including load balancing, scaling, and security hardening.
- Implement advanced alerting mechanisms using Alertmanager, Grafana Alerts, and webhook integrations.
- Stay updated on Grafana ecosystem advancements and contribute to best practices in observability tooling.
- Document configurations, implementation guidelines, and best practices for internal stakeholders.
Required Skills & Experience:
- 5+ years of experience in monitoring and observability tools, with a strong focus on Grafana and Prometheus.
- Expertise in Grafana internals, including API usage, dashboard templating, and custom plugin development.
- Strong hands-on experience with Prometheus, including metric collection, relabeling, and PromQL queries.
- Proficiency in JavaScript, TypeScript, React, and Go for Grafana plugin and dashboard development.
- Familiarity with infrastructure monitoring, including Kubernetes, cloud services (AWS, GCP, Azure), and system-level metrics.
- Experience with time-series databases and log aggregation tools (e.g., Loki, Elasticsearch, InfluxDB).
- Knowledge of security best practices in Grafana, including authentication, RBAC, and API security.
- Experience with automation and infrastructure-as-code (IaC) for monitoring stack deployment.
- Strong problem-solving skills, with the ability to debug and optimize dashboards and alerting configurations.
- Excellent communication and documentation skills to collaborate with cross-functional teams.
Preferred Qualifications:
- Grafana Certified Observability Engineer or equivalent certifications.
- Experience contributing to open-source Grafana projects or plugin development.
- Knowledge of distributed tracing tools like Jaeger or Zipkin.
- Familiarity with service meshes (Istio, Linkerd) and their monitoring strategies.
This is a high-impact role focused on developing and enhancing Grafana-based monitoring solutions for enterprise-grade observability.
Posted 3 weeks ago
3.0 - 8.0 years
16 - 20 Lacs
Mumbai
Work from Office
What will you do at Fynd?
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Improve the reliability, quality, and time-to-market of our suite of software solutions.
- Be the first person to report an incident.
- Debug production issues across services and levels of the stack.
- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns, and frameworks to realise it.
- Build automated tools in Python, Java, Go, Ruby, etc.
- Help Platform and Engineering teams gain visibility into our infrastructure.
- Lead the design of software components and systems to ensure the availability, scalability, latency, and efficiency of our services.
- Participate actively in detecting, remediating, and reporting on production incidents, ensuring SLAs are met and driving problem management for permanent remediation.
- Participate in the on-call rotation to ensure coverage for planned and unplanned events.
- Perform other tasks such as load testing and generating system health reports.
- Periodically check the readiness of all dashboards.
- Engage with other engineering organizations to implement processes, identify improvements, and drive consistent results.
- Work with your SRE and Engineering counterparts to drive game days, training, and other response-readiness efforts.
- Participate in 24x7 support coverage as needed.
- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.
- Collaborate with Service Engineering organizations to build and automate tooling, and implement best practices to observe and manage the services in production and consistently achieve our market-leading SLA.
- Improve the scalability and reliability of our systems in production.
- Evaluate, design, and implement new system architectures.
Some specific requirements:
- B.Tech. in Engineering, Computer Science, or another technical degree, or equivalent work experience
- At least 3 years of managing production infrastructure
- Leading or managing a team is a huge plus
- Experience with cloud platforms such as AWS and GCP
- Experience developing and operating large-scale distributed systems with Kubernetes, Docker, and serverless (Lambda)
- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
- Comfortable with Python, Go, or any relevant programming language
- Experience with monitoring and alerting using technologies such as New Relic, Zabbix, Prometheus, Grafana, CloudWatch, Kafka, PagerDuty, etc.
- Experience with one or more orchestration and deployment tools, e.g., CloudFormation, Terraform, Ansible, Packer, Chef
- Experience with configuration management systems such as Ansible, Chef, or Puppet
- Knowledge of load testing methodologies and tools such as Gatling and Apache JMeter
- Comfortable working in a Unix shell
- Experience running hybrid clouds and on-prem infrastructure on Red Hat Enterprise Linux / CentOS
- A focus on delivering high-quality code through strong testing practices
Posted 3 weeks ago
6.0 - 10.0 years
13 - 17 Lacs
Hyderabad
Remote
Mode of interview: 2-3 rounds (virtual/in-person)
Notice period: Immediate to 15 days max
Technical skill requirements: ServiceNow Business Analyst, ITIL, ITSM, dashboard creation, APM, scripting, Datadog
Role and responsibilities:
- 6+ years of experience as an SRE engineer, with thorough knowledge of ITIL/ITSM processes
- Certification in the ITIL v4 framework and deep knowledge of ITSM platforms preferred
- Hands-on experience with the APM tool Datadog
- Demonstrable ability to implement complex process workflows and evidence performance through metrics-driven reporting
- Strong understanding of IT operations
- Strong written and verbal communication skills, with the ability to understand complex technical information and present it clearly and concisely to a variety of audiences, including executive leadership
- Ability to develop strategic relationships with other teams, departments, business stakeholders, and third parties
- Ability to understand business requirements and define KPIs that showcase the stability of the application in production and give meaningful insights to the business
- Proven troubleshooting experience and a strong focus on incident reduction
- Able to surface recurring issues and toil and suggest automations
- Strong problem-solving skills and the ability to think quickly and execute within short time frames
Posted 3 weeks ago
8.0 - 13.0 years
15 - 25 Lacs
Hyderabad
Work from Office
Greetings from AIS!
AIS (Applied Information Sciences) is a highly regarded software and systems engineering firm providing professional application development services to commercial and government clients since 1982. One of Microsoft's oldest and largest Managed Gold partners in the U.S., AIS is exclusively focused on building enterprise-class custom applications using Microsoft technologies. As we continue to experience extraordinary growth, we are seeking professionals to join our AIS team in India. For more information, please visit: http://www.ais.com https://www.ais.com/blog/
Job Summary:
Role: Site Reliability Engineer
Mode of hire: Full-time / Contract opportunity
Responsibilities:
The Site Reliability Engineer will bring enhanced reliability, performance, and security to the project, including:
- Implementing comprehensive monitoring solutions to track system performance, detect anomalies, and prevent outages
- Setting up real-time alerts to respond quickly to issues, minimizing downtime and ensuring continuous service availability
- Automating routine tasks such as deployments, backups, and scaling, which reduces manual intervention and increases efficiency
- Integrating Continuous Integration/Continuous Deployment (CI/CD) pipelines to streamline the development and deployment process
- Optimizing the use of cloud resources to ensure cost-effectiveness and high performance
- Implementing load balancing strategies to distribute traffic evenly and prevent bottlenecks
- Applying security best practices to protect sensitive data and ensure compliance with regulatory requirements
- Regularly scanning for and addressing vulnerabilities to maintain a secure environment
- Developing and implementing incident response plans to quickly address and resolve issues
- Establishing disaster recovery protocols to ensure data integrity and service continuity in case of failures
- Working closely with development, operations, and business teams to align technical solutions with business goals
- Creating detailed documentation and providing training to ensure all team members are equipped to handle the system
Requirements:
- Oracle Cloud Infrastructure experience
- Proficiency in Oracle databases, including performance tuning and optimization
- Scripting skills in Python, and working knowledge of JSON
- Familiarity with CI/CD pipelines to ensure smooth deployments
- Understanding of security principles and practices to protect data and systems
- Knowledge of regulatory requirements and how to implement them within Oracle Cloud
- Ability to work effectively with cross-functional teams, including developers and operations
- Communication skills: strong verbal and written communication skills to articulate technical issues and solutions
If you are interested, please reply to meghana.mandhala@ais.com.
Thanks & Regards,
Meghana Reddy M
Sr. Talent Acquisition Business Partner
Posted 3 weeks ago
9.0 - 14.0 years
20 - 35 Lacs
Bengaluru
Work from Office
Lead automation and expense management initiatives across global network platforms. Ensure secure, cost-effective operations, enhance reliability via SRE practices, and oversee vendor telecom expense management (TEM) performance, reporting, and billing accuracy.
Required candidate profile: Experience in network automation, CI/CD, and cost governance. Skilled in SRE, telecom expense management, circuit cleanup, vendor coordination, and performance reporting using Power BI and Microsoft 365.
Posted 3 weeks ago
10.0 - 18.0 years
30 - 45 Lacs
Bengaluru
Work from Office
Lead and support RF, Voice/IPT, telephony, and mobile infrastructure globally. Drive innovation, reliability, and automation across network platforms, ensuring secure, scalable, and high-performance communication systems.
Required Candidate Profile: Experienced in RF design, VoIP/IPT systems, UC tools, wireless/mobility, and SRE practices. Skilled in Tier-3 support, automation, and vendor management.
Posted 3 weeks ago
5 - 10 years
7 - 12 Lacs
Bengaluru
Work from Office
Engineering Manager - Site Reliability
The role of the Engineering Manager - Site Reliability is primarily to manage, mentor, and develop a team of Site Reliability Engineers, ensuring that the development of both the individuals and the team as a whole is in line with organizational objectives and direction. The role holder manages all activities in scope, directing the design of new products and the modification of existing designs, and ensures that deliverables are on time and of acceptable quality. The role holder is required to analyze technology trends, human resource needs, and market demand to plan projects that ensure resilience in line with current demand and future ambition. In addition, the role will confer with leaders, production, key stakeholders, and marketing teams to determine engineering feasibility, cost-effectiveness, scalability, and time-to-market for new and existing products.
What you'll be doing:
Managing People
• Inspire, grow, and develop individuals by helping them create their personal development plans, leveraging available learning resources, and offering stretch opportunities.
• Get things done in the right way by taking ownership, being proactive, and collaborating with business counterparts, peers, other craft managers, and stakeholders.
• Ensure delivery by tracking team health metrics and KPIs, monitoring roadmap progress, and identifying blockers and resolving or escalating them.
End-to-End System Ownership
• Own a service end to end by actively monitoring application health and performance, setting and monitoring relevant metrics, and acting accordingly when they are violated.
• Reduce business continuity risks and the bus factor by applying state-of-the-art practices and tools and by writing the appropriate documentation, such as runbooks and OpDocs.
• Independently manage an application or service by working through deployment and operations in production, and guide more junior members of the team in this area.
Technical Incident Management
• Address and resolve live production issues by mitigating the customer impact within SLA.
• Improve the overall reliability of systems by producing long-term solutions through root cause analysis.
• Keep track of incidents by contributing to postmortem processes and logging live issues.
Building Software Applications
• Build software applications by using relevant development languages and applying knowledge of systems, services, and tools appropriate for the business area.
• Write readable and reusable code by applying standard patterns and using standard libraries.
• Refactor and simplify code by introducing design patterns when necessary.
• Ensure the quality of the application by following standard testing techniques and methods that adhere to the test strategy.
• Maintain data security, integrity, and quality by effectively following company standards and best practices.
Architectural Guidance
• Have sufficient knowledge to advise product teams toward a technical solution that meets the functional, non-functional, and architectural requirements by challenging the rationale for an application design and providing context in the wider architectural landscape.
• Set a clear direction for a technical capability by evaluating and aligning target architecture improvements and reframing architectural designs and decisions for varied stakeholders.
What you'll bring:
• Strong people management skills and experience
• Excellent communicator with strong stakeholder management experience, good commercial awareness, and technical vision
• You are a humble and thoughtful technology leader: you lead by example and gain your teammates' respect through actions, not the title
• Experience in software development, building complex and scalable solutions
• Proven experience leading and managing a team of engineers in a fast-paced and complex environment
• Solid experience in at least one programming language (Java, C/C++, Python, Go)
• Ability to formulate software solutions from scratch
• Solid understanding of Service-Oriented Architecture, microservices, and OOP patterns
• Hands-on experience in Linux administration and troubleshooting
• Creative approach to problem-solving
• Practical experience in understanding and defining SLIs and SLOs
• Past experience with Payments or FinTech and working in a regulated environment is a plus
• Strong analytical skills and a data-driven mindset
FinTech is a complex, competitive, and exciting industry. To accomplish Booking.com's mission (making it easier for everyone to experience the world), we aim to offer frictionless payment experiences to our guests and partners. The FinTech business unit creates best-in-class payment products that offer choice to guests and help Booking's business partners grow their business.
Posted 1 month ago
3 - 8 years
19 - 22 Lacs
Kolkata, Hyderabad, Pune
Work from Office
Experienced in .NET (3–5 yrs), DevOps/SRE (3+ yrs), CI/CD, Git, IaC, Agile, cloud-native apps, observability, KQL/SQL, and cross-functional DevOps solutions in production environments. Mail: kowsalya.k@srsinfoway.com
Posted 1 month ago
5 - 9 years
22 - 27 Lacs
Pune, Chennai, Bengaluru
Hybrid
Hiring for the position below. Immediate joiners or a notice period of up to 15 days.
Job Title: Senior .NET Developer
Experience: 5-9 years
Job Location: Pan India (Hybrid)
Key Requirements:
• Proficiency in writing production code in an industry-standard programming language using Agile methodologies
• Proficiency practicing Infrastructure as Code and Configuration as Code techniques
• Proficiency managing multiple code bases in Git
• Proficiency creating Continuous Integration builds and deployment automation, for example CI/CD pipelines
• Proficiency building cloud-native applications in a major public cloud
• Proficiency implementing observability, application monitoring, and log aggregation solutions
• Proficiency working with cross-functional teams to provide DevOps-inspired solutions
Delivery Insights Team-Specific Skills:
• Experience in building customer-facing data insights and reporting that span the enterprise
• Proficiency with the Grafana Cloud stack; comfortable configuring various Grafana Cloud components, including data sources, permissions, and the expanded feature set
• Proficiency with Kusto Query Language (KQL): building and using complex queries that include various merge, join, and sort operations. Equivalent SQL syntax knowledge will be accepted for certain applicants.
• Experience in Azure Function Apps: building, supporting, and operating a modern .NET code base across the entire development life cycle
• Experience with Azure SQL or Postgres database systems
• Experience with various components of Azure DevOps
• Webhook configuration and creation
• REST API knowledge and the ability to map reporting needs directly to data availability
• Comfort with how teams use Azure DevOps to complete the SDLC process, including work item management, repositories, pipelines, and access control
If you are interested, please share your updated CV at aashifjabarulla@tsit.co.in or kousalya.v@tsit.co.in, +91 9047052352.
Posted 1 month ago
10 - 14 years
8 - 12 Lacs
Pune
Work from Office
Site Reliability Engineers at UKG are team members with a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation. Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an "automate everything" mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.
Primary/Essential Duties and Key Responsibilities:
• Proficient in Splunk/ELK and Datadog.
• Experience with observability tools such as Prometheus/InfluxDB and Grafana.
• Strong knowledge of at least one scripting language such as Python, Bash, PowerShell, or another relevant language.
• Design, develop, and maintain observability tools and infrastructure.
• Collaborate with other teams to ensure observability best practices are followed.
• Develop and maintain dashboards and alerts for monitoring system health.
• Troubleshoot and resolve issues related to observability tools and infrastructure.
• Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning.
• Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks.
• Support services, product, and engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response.
• Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis.
• Collaborate closely with engineering professionals within the organization to deliver reliable services.
• Identify and eliminate operational toil by treating operational challenges as a software engineering problem.
• Actively participate in incident response, including on-call responsibilities.
• Partner with stakeholders to influence and help drive the best possible technical and business outcomes.
• Guide junior team members and serve as a champion for Site Reliability Engineering.
• Engineering degree or a related technical discipline, and 10+ years of experience in SRE.
• Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java).
• Knowledge of cloud-based applications and containerization technologies.
• Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing.
• Ability to analyze the technology and engineering practices currently used within the company and develop steps and processes to improve and expand upon them.
• Working experience with industry-standard tools such as Terraform and Ansible.
(Experience, Education, Certification, License and Training)
• Must have hands-on experience working within Engineering or Cloud.
• Experience with public cloud platforms (e.g., GCP, AWS, Azure).
• Experience in configuration and maintenance of applications and systems infrastructure.
• Experience with distributed system design and architecture.
• Experience building and managing CI/CD pipelines.
Posted 2 months ago
5 - 8 years
7 - 10 Lacs
Bengaluru
Work from Office
Job Summary
As a Cloud Infrastructure/Site Reliability Engineer, you will operate at the intersection of development and operations. Your role will involve engaging in and enhancing the lifecycle of cloud services, from design through deployment, operation, and refinement. You will be responsible for maintaining these services by measuring and monitoring their availability, latency, and overall system health. You will play a crucial role in sustainably scaling systems through automation and driving changes that improve reliability and velocity.
As part of your responsibilities, you will administer cloud-based environments that support our SaaS/IaaS offerings, which are implemented on a microservices, container-based architecture (Kubernetes). In addition, you will oversee a portfolio of customer-centric cloud services (SaaS/IaaS), ensuring their overall availability, performance, and security. You will work closely with both NetApp and cloud service provider teams, including those from Google, located across the globe in regions such as RTP, Reykjavík, Bangalore, Sunnyvale, Redmond, and more. Due to the critical nature of the services we support, this position involves participation in a rotation-based on-call schedule as part of our global team.
This role offers the opportunity to work in a dynamic, global environment, ensuring the smooth operation of vital cloud services. To be successful in this role, you should be a motivated self-starter and self-learner, possess strong problem-solving skills, and embrace challenges.
Job Requirements
Incident Response and Troubleshooting: Address and perform root cause analysis (RCA) of complex live production incidents and cross-platform issues involving OS, networking, and databases in cloud-based SaaS/IaaS environments. Implement SRE best practices for effective resolution.
Analysis and Infrastructure Maintenance: Continuously monitor, analyze, and measure system health, availability, and latency using tools like Prometheus, Stackdriver, Elasticsearch, Grafana, and SolarWinds. Develop strategies to enhance system and application performance, availability, and reliability. In addition, maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure. Document system knowledge as you acquire it, create runbooks, and ensure critical system information is readily accessible.
Security Management: Stay updated with security protocols and proactively identify, diagnose, and resolve complex security issues.
Automation and Efficiency: Identify tasks and areas where automation can be applied to achieve time efficiencies and risk reduction. Develop software for deployment automation, packaging, and monitoring visibility.
Issue Tracking and Resolution: Use Atlassian Jira, Google Buganizer, and Google IRM to track and resolve issues based on their priority.
Team Collaboration and Influence: Work in tandem with other Cloud Infrastructure Engineers and developers to ensure maximum performance, reliability, and automation of our deployments and infrastructure. Additionally, consult and influence developers on new feature development and software architecture to ensure scalability.
Debugging, Troubleshooting, and Advanced Support: Undertake debugging and troubleshooting of service bottlenecks throughout the entire software stack. Additionally, provide advanced Tier 2 and Tier 3 support for NetApp's Cloud Data Services solutions.
Directly influence the decisions and outcomes related to solution implementation: measure and monitor availability, latency, and overall system health.
Proficiency in Linux/Unix and CoreOS.
Demonstrated experience in scripting and infrastructure automation using tools such as Ansible, Python, Go, or Ruby.
Deep working knowledge of containers, Kubernetes, and serverless computing implementation.
Knowledge of DevOps development methodologies.
Familiarity with distributed systems design patterns using tools such as Kubernetes.
Experience with cloud platforms such as AWS, Azure, or Google Cloud.
Education
A minimum of 5-8 years of experience is required. A Bachelor of Science degree in Computer Science, a Master's degree, or equivalent experience is required.
Posted 2 months ago
4 - 8 years
20 - 35 Lacs
Chennai
Remote
Senior Software Engineer
Experience: 4-6 years
Salary: Up to INR 35,00,000 / year
Preferred Notice Period: Within 30 days
Shift: 10:00 AM to 7:00 PM IST
Opportunity Type: Remote
Placement Type: Permanent
(*Note: This is a requirement for one of Uplers' clients)
Must-have skills: CI/CD, Site Reliability, Terraform, AWS, Kubernetes
Good-to-have skills: Ansible, AWS Certification, DevSecOps, Azure, Docker
Axelerant Technologies (one of Uplers' clients) is looking for:
A Senior Software Engineer who is passionate about their work, eager to learn and grow, and committed to delivering exceptional results. If you are a team player with a positive attitude and a desire to make a difference, then we want to hear from you.
Role Overview
Why Should You Become a Site Reliability Engineer at Axelerant?
Does managing and extending massive cloud platforms get your creative mind racing? Do you love building and debugging open-source LAMP-stack architectures using cutting-edge technologies? Are you ready to ditch the commute and work from wherever is comfortable for you? If yes, then Axelerant is looking for a Senior Engineer like you.
As a Senior Software Engineer - Site Reliability, you will implement automated solutions for multiple customers in various industries. Further, you will employ industry-leading continuous integration, delivery, and deployment patterns while working collaboratively with peers to execute them toward successful solutions. This role is hands-on development and operations, and you will be committing code to repositories daily.
Responsibilities
• Responsible for understanding and implementing solutions that sustainably meet desired business outcomes and standards
• Responsible for the design, deployment, and support automation of continuous integration, continuous delivery, and continuous deployment (CI/CD) pipeline operations per account and organizational requirements
• Participate in the design, build, and on-call support of various cloud, container, and on-premises platforms through standalone and integration operations
• Strategize, review, design, and implement safe, secure, scalable, and easily maintained IT infrastructure for the organization and clients
• Responsible for ensuring that operational and service-level agreements are met through SLI and SLO monitoring, analysis, and incident response
• Participate in planning team structure and activities, and in project management activities and their proactive support
Skills, Knowledge, and Expertise
• 4+ years of professional site reliability career experience
• 1+ years of experience using agile methodologies, Git source code versioning, and pull requests
• Experience with a scripting language like Python, Java, JavaScript, C#, Ruby, PHP, or SmashTest
• Experience with automation platform tools like TravisCI, CircleCI, Jenkins, or GitLab CI
• Experience with the cloud provider Amazon Web Services (AWS)
• Experience with the container orchestration technology Kubernetes
• Experience with the Infrastructure as Code (IaC) and configuration management tool Terraform
• Experience with monitoring, APM, and alerting tools like New Relic, Pingdom, or PagerDuty
• Experience with the twelve-factor methodology for building software-as-a-service applications
• Ability to automate tasks using a scripting language
• Ability to use the shell extensively and use regular expressions comfortably
• Strong communication skills and ability to partner across organizations
Good To Have
• Certifications: Amazon Web Services and/or Kubernetes Administrator
• Experience with governance best practices and DevSecOps methodologies
• Familiarity with the 3factor application architecture pattern
• Gets the big picture
• Experience in managing multiple priorities and competing demands
• Exposure to cloud providers like Microsoft Azure or GCP
• Exposure to IaC and configuration management tools like Ansible, Chef, Puppet, or Salt
• Knowledge of other container and orchestration technologies like Docker or OpenShift
What Would Success Look Like For You?
As a Senior Software Engineer, success means maintaining system availability, scalability, and reliability through proactive monitoring, automation, and incident management. It involves collaborating with development teams, staying current on DevOps and SRE practices, and potentially advancing to a Staff Software Engineer role in four years at Axelerant.
Your Work's Impact
As a Site Reliability Engineer, your work directly impacts the stability and performance of our digital infrastructure, enabling our applications to deliver exceptional experiences to users without interruption. By championing reliability and scalability, you enhance our company's reputation for providing dependable services, boosting customer trust and satisfaction. Collaborating cross-functionally, you contribute to a culture of operational excellence, fostering innovation and ensuring our systems can withstand the challenges of a rapidly evolving technological landscape.
Your role is integral to maintaining our company's competitive edge in a digital-first world.
How to apply for this opportunity:
Easy 3-Step Process:
1. Click on Apply, and register or log in on our portal
2. Upload your updated resume and complete the screening form
3. Increase your chances of getting shortlisted and meeting the client for the interview!
About Our Client:
As a global company that puts care into employee happiness, engineering excellence, and customer success, we are in striking contrast to the typical outsourcing option. We are a diverse team working remotely across many time zones, with success stories that back up our capabilities and a reputation for an unconventional work environment that empowers. We are the individuals directly challenging what it means to do global delivery differently for employees and partners.
About Uplers:
Our goal is to make hiring and getting hired reliable, simple, and fast. Our role is to help all our talents find and apply for relevant product and engineering job opportunities and progress in their careers.
(Note: There are many more opportunities apart from this on the portal.)
So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!
Posted 2 months ago
4 - 9 years
6 - 11 Lacs
Mumbai
Hybrid
Role: Site Reliability Engineers (SREs) with expertise in Google Cloud Platform (GCP) and Red Hat OpenShift administration
Responsibilities:
• System Reliability: Ensure the reliability and uptime of critical services and infrastructure.
• Google Cloud Expertise: Design, implement, and manage cloud infrastructure using Google Cloud services.
• Automation: Develop and maintain automation scripts and tools to improve system efficiency and reduce manual intervention.
• Monitoring and Incident Response: Implement monitoring solutions and respond to incidents to minimize downtime and ensure quick recovery.
• Collaboration: Work closely with development and operations teams to improve system reliability and performance.
• Capacity Planning: Conduct capacity planning and performance tuning to ensure systems can handle future growth.
• Documentation: Create and maintain comprehensive documentation for system configurations, processes, and procedures.
Qualifications:
• Education: Bachelor's degree in Computer Science, Engineering, or a related field.
• Experience: 4+ years of experience in site reliability engineering or a similar role.
Skills:
• Proficiency in Google Cloud services (Compute Engine, Kubernetes Engine, Cloud Storage, BigQuery, Pub/Sub, etc.).
• Familiarity with Google BI and AI/ML tools (Looker, BigQuery ML, Vertex AI, etc.).
• Experience with automation tools (Terraform, Ansible, Puppet).
• Familiarity with CI/CD pipelines and tools (Azure Pipelines, Jenkins, GitLab CI, etc.).
• Strong scripting skills (Python, Bash, etc.).
• Knowledge of networking concepts and protocols.
• Experience with monitoring tools (Prometheus, Grafana, etc.).
Preferred Certifications:
• Google Cloud Professional DevOps Engineer
• Google Cloud Professional Cloud Architect
• Red Hat Certified Engineer (RHCE) or similar Linux certification
Employee Type: Permanent
Posted 3 months ago
8 - 12 years
25 - 30 Lacs
Bengaluru
Work from Office
Role Description
Deutsche Bank's API Platforms and Integration Services team orchestrates internal and external API platforms, portals, enabling services, and embedded finance products at a global level. The team is a highly skilled and innovative group dedicated to developing cutting-edge solutions and services that leverage the power of APIs to drive digital transformation and enhance the banking experience for clients worldwide.
Your key responsibilities
Application Monitoring
• Proactively monitor application stability using Splunk and New Relic.
• Set up alerting and automated responses to minimize downtime.
• Perform root cause analysis and manage incidents for issue resolution.
• Monitor system performance, identify bottlenecks, and collaborate on optimizations.
User Support
• Assist users with UI-related issues and provide effective resolutions.
• Create and maintain user-friendly documentation for self-service support.
• Develop and maintain incident response procedures for rapid issue resolution.
• Enhance troubleshooting tools and processes for improved efficiency.
• Be proficient in the web application's user interface for user support.
• Diagnose and resolve complex technical issues affecting the web application.
Collaboration with Scrum Team
• Collaborate with UI/UX designers and developers to enhance the user experience.
• Actively participate in Scrum team activities, including stand-ups and sprint planning.
• Ensure seamless integration of reliability and performance enhancements into development.
• Collaborate with the team to prioritize and track the progress of defects and improvement requests.
Product Continuous Improvement
• Maintain open communication with the Product Owner for product alignment.
• Ensure SRE tasks align with the product's strategic goals.
• Participate in backlog refinement meetings to prioritize SRE-related work items.
• Suggest UI improvements based on user feedback and usage patterns.
• Identify, document, and communicate defects and improvement opportunities.
Other
• Execute releases and contribute to the deployment process.
• Provide on-call support as part of a rotation for 24/7 incident response.
Your skills and experience
• Utilize containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes).
• Knowledge of GCP or other similar cloud technologies.
• Strong API knowledge, ensuring availability and reliability.
Posted 3 months ago
3 - 5 years
0 - 3 Lacs
Bengaluru, Hyderabad
Work from Office
Job Title: SRE/DevOps Engineer with PowerShell Expertise
Location: Bengaluru/Hyderabad
Job Type: Full-Time
Job Summary: The SRE/DevOps Engineer with PowerShell expertise will be responsible for automating and streamlining IT processes, managing system configurations, and enhancing operational efficiency through the development and implementation of PowerShell scripts.
Key Responsibilities:
• Infrastructure Management: Design, implement, and manage scalable and reliable infrastructure solutions using cloud platforms such as AWS, Azure, or Google Cloud Platform.
• Automation and Scripting: Develop and maintain automation scripts using PowerShell to streamline and improve processes such as deployment, configuration management, and monitoring.
• CI/CD Pipelines: Create and manage continuous integration and continuous deployment (CI/CD) pipelines to ensure efficient and error-free software delivery.
• Monitoring and Performance: Implement and maintain monitoring and alerting systems to ensure system reliability and performance, using tools like Prometheus, Grafana, or similar.
• Incident Management: Respond to and resolve incidents, ensuring minimal downtime and impact on users. Conduct root cause analysis and implement preventive measures.
• Collaboration: Work closely with development, QA, and other teams to promote a culture of reliability and operational excellence.
• Security and Compliance: Ensure systems are secure and compliant with industry standards and regulations.
• Documentation: Maintain clear and concise documentation for infrastructure, processes, and procedures.
Qualifications:
Technical Skills:
• Proficiency in PowerShell scripting and automation.
• Experience with cloud platforms (AWS, Azure, GCP).
• Knowledge of configuration management tools like Ansible, Puppet, or Chef.
• Familiarity with containerization and orchestration tools (Docker, Kubernetes).
Experience: years of experience in a DevOps or SRE role.
• Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
• Experience with monitoring tools like Prometheus, Grafana, or Datadog.
Soft Skills:
• Strong problem-solving and analytical skills.
• Excellent communication and collaboration abilities.
• Ability to work in a fast-paced, dynamic environment.
Education: Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
The following certifications will be an added advantage:
• Microsoft Certified: Windows Server Hybrid Administrator Associate
• Microsoft Certified: Azure Administrator Associate
• Microsoft Certified: Modern Desktop Administrator Associate
• Microsoft Certified: DevOps Engineer Expert
• Microsoft Certified: Power Platform Fundamentals
• CompTIA Server+
• Microsoft 365 Certified: Enterprise Administrator Expert
Posted 3 months ago
7 - 12 years
9 - 19 Lacs
Chennai, Bengaluru, Kolkata
Hybrid
Hi, hope you are doing well!
Greetings from Randstad Digital.
Job Title: Site Reliability Engineer
Job Location: Pan India
Experience: 6-12 years
Payroll: Randstad Digital
Need an alternate email ID.
Dhanush, Senior Consultant - IT Recruitment
Randstad Digital
M: +91 8637 641979 | E: Dhanush.Ponnuswamy@randstaddigital.com
Posted 3 months ago