0.0 years
0 Lacs
Hyderabad, Telangana, India
On-site
Job Summary
We are looking for a highly skilled and adaptable Site Reliability Engineer to become a key member of our Cloud Engineering team. In this role, you will be instrumental in designing and refining our cloud infrastructure with a strong focus on reliability, security, and scalability. As an SRE, you'll apply software engineering principles to solve operational challenges, ensuring the operational resilience and continuous stability of our systems. This position requires a blend of managing live production environments and contributing to engineering efforts such as automation and system improvements.

Key Responsibilities:
Cloud Infrastructure Architecture and Management: Design, build, and maintain resilient cloud infrastructure solutions to support the development and deployment of scalable and reliable applications. This includes managing and optimizing cloud platforms for high availability, performance, and cost efficiency.
Enhancing Service Reliability: Lead reliability best practices by establishing and managing monitoring and alerting systems to proactively detect and respond to anomalies and performance issues. Utilize SLI, SLO, and SLA concepts to measure and improve reliability. Identify and resolve potential bottlenecks and areas for enhancement.
Driving Automation and Efficiency: Contribute to the automation, provisioning, and standardization of infrastructure resources and system configurations. Identify and implement automation for repetitive tasks to significantly reduce operational overhead. Develop Standard Operating Procedures (SOPs) and automate workflows using tools like Rundeck or Jenkins.
Incident Response and Resolution: Participate in and help resolve major incidents, conduct thorough root cause analyses, and implement permanent solutions. Effectively manage incidents within the production environment using a systematic problem-solving approach.
Collaboration and Innovation: Work closely with diverse stakeholders and cross-functional teams, including software engineers, to integrate cloud solutions, gather requirements, and execute Proofs of Concept (POCs). Foster strong collaboration and communication. Guide designs and processes with a focus on resilience and minimizing manual effort. Promote the adoption of common tooling and components, and implement software and tools to enhance resilience and automate operations. Be open to adopting new tools and approaches as needed.

Required Skills and Experience:
Cloud Platforms: Demonstrated expertise in at least one major cloud platform (AWS, Azure, or GCP).
Infrastructure Management: Proven proficiency in on-premises hosting and virtualization platforms (VMware, Hyper-V, or KVM). Solid understanding of storage internals (NAS, SAN, EFS, NFS) and protocols (FTP, SFTP, SMTP, NTP, DNS, DHCP). Experience with networking and firewall technologies. Strong hands-on experience with Linux internals and operating systems (RHEL, CentOS, Rocky Linux). Experience with Windows operating systems to support varied environments. Extensive experience with containerization (Docker) and orchestration (Kubernetes) technologies.
Automation & IaC: Proficiency in scripting languages (shell and Python). Experience with configuration management tools (Ansible or Puppet). Must have exposure to Infrastructure as Code (IaC) tools (Terraform or CloudFormation).
Monitoring & Observability: Experience setting up and configuring monitoring tools (Prometheus, Grafana, or the ELK stack). Hands-on experience implementing OpenTelemetry for observability. Familiarity with monitoring and logging tools for cloud-based applications.
Service Reliability Concepts: A strong understanding of SLI, SLO, SLA, and error budgeting.
Soft Skills & Mindset: Excellent communication and interpersonal skills for effective teamwork.
We value proactive individuals who are eager to learn and adapt in a dynamic environment. Must possess a pragmatic and adaptable mindset, with a willingness to step outside comfort zones and acquire new skills. Ability to consider the broader system impact of your work. Must be a change advocate for reliability initiatives.

Desired/Bonus Skills:
Experience with DevOps toolchain elements like Git, Jenkins, Rundeck, ArgoCD, or Crossplane.
Experience with database management, particularly MySQL and Hadoop.
Knowledge of cloud cost management and optimization strategies.
Understanding of cloud security best practices, including data encryption, access controls, and identity management.
Experience implementing disaster recovery and business continuity plans.
Familiarity with ITIL (Information Technology Infrastructure Library) processes.
Posted 4 days ago
3.0 - 8.0 years
5 - 10 Lacs
Hyderabad
Work from Office
Job Profile Summary & Description:
A Platform Operations Engineer is responsible for supporting multiple applications, consisting of different technologies, in an enterprise hosted environment. The individual provides escalation support for multiple platforms and their services. They perform monthly, quarterly, and yearly upgrades of the applications in the environment and work within teams to create solutions to identified issues. They are also responsible for communication with end users.

Shift Timing: Monthly Rotational
Support Role: 24/7

Roles / Responsibilities:
Fully functional and self-directed.
Resolve issues, manage workload, and balance priorities through frequent interruptions while meeting specific, time-sensitive deadlines.
Analyze client and team requests to solve short- and long-term technical issues.
Engineer solutions to meet the company's SLAs and client expectations.
Monitor and help tune applications in the environment through project initiatives, enhancements, and integration.
Perform upgrades of the applications in the hosted environment.
Provide formal mentorship.
High complexity assignments: owner.
Moderate complexity assignments: owner (1 or more).
Low complexity assignments: provide oversight/review.
Regularly lead self and others and/or be established as Product SME and/or as a specialist.
See the whole picture and adjust work accordingly.
Mentor others with less experience.
Work with the Senior Platform Operations Engineer to create and maintain documentation for all production environments and review it regularly.
Engage with Sr. Engineers/Team to document Standard Operating Procedures and design changes, and review them prior to installation/implementation.

Required Qualification:
Typically requires a minimum of 1 - 3 years of related experience in a professional job role with a Bachelor's degree in Computer Science or a related field; or 3 years and a Master's degree; or a PhD without experience; or equivalent work experience.
Basic knowledge of OS, database, and networking concepts.
Excellent problem-solving and communication abilities.
Strong knowledge of Linux administration and Linux server administration.
Thorough understanding of protocols such as DNS, HTTP, LDAP, SMTP, and SNMP.
Extensive understanding of Linux, including RedHat, CentOS, and Rocky Linux.
Strong understanding of web servers, application servers, DNS, and mail servers.
Should be proficient in at least one scripting language (Shell or Python).
Knowledge of configuration management tools like Puppet or Ansible is a plus.
Industry certifications for the applications being supported are a plus.
Experience with configuring and managing zones.
Experience providing day-to-day administration and monitoring of servers, including:
Provide support to ensure Linux servers are operational.
Provide file and system security management, log analysis, and statistical report generation.
Analyze security scans and assist with vulnerability remediation.
Good analytical ability (basic knowledge of ELK and Tableau).
Should be ready to work in rotational shifts.
Posted 1 month ago
3.0 - 5.0 years
5 - 7 Lacs
Hyderabad
Work from Office
A Platform Operations Engineer is responsible for supporting multiple applications, consisting of different technologies, in an enterprise hosted environment. The individual provides escalation support for multiple platforms and their services. They perform monthly, quarterly, and yearly upgrades of the applications in the environment and work within teams to create solutions to identified issues. They are also responsible for communication with end users.

Job Location: Hyderabad
Shift Timing: Rotational Shift (24/7)
Required Certification: RedHat certifications (RHCSA / RHCE)

Duties & Responsibilities:
Fully functional and self-directed.
Resolve issues, manage workload, and balance priorities through frequent interruptions while meeting specific, time-sensitive deadlines.
Analyze client and team requests to solve short- and long-term technical issues.
Engineer solutions to meet the company's SLAs and client expectations.
Monitor and help tune applications in the environment through project initiatives, enhancements, and integration.
Perform upgrades of the applications in the hosted environment.
Provide formal mentorship.
High complexity assignments: owner.
Moderate complexity assignments: owner (1 or more).
Low complexity assignments: provide oversight/review.
Regularly lead self and others and/or be established as Product SME and/or as a specialist.
See the whole picture and adjust work accordingly.
Mentor others with less experience.
Work with the Senior Platform Operations Engineer to create and maintain documentation for all production environments and review it regularly.
Engage with Sr. Engineers/Team to document Standard Operating Procedures and design changes, and review them prior to installation/implementation.
Required Qualifications:
Typically requires a minimum of 3 - 5 years of related experience in a professional job role with a Bachelor's degree in Computer Science or a related field; or 3 years and a Master's degree; or a PhD without experience; or equivalent work experience.
Basic knowledge of OS, database, and networking concepts.
Excellent problem-solving and communication abilities.
Strong knowledge of Linux administration and Linux server administration.
Thorough understanding of protocols such as DNS, HTTP, LDAP, SMTP, and SNMP.
Extensive understanding of Linux, including RedHat, CentOS, and Rocky Linux.
Strong understanding of web servers, application servers, DNS, and mail servers.
Should be proficient in at least one scripting language (Shell or Python).
Knowledge of configuration management tools like Puppet or Ansible is a plus.
Industry certifications for the applications being supported are a plus.
Experience with configuring and managing zones.
Experience providing day-to-day administration and monitoring of servers, including:
Provide support to ensure Linux servers are operational.
Provide file and system security management, log analysis, and statistical report generation.
Analyze security scans and assist with vulnerability remediation.
Good analytical ability (basic knowledge of ELK and Tableau).
Should be ready to work in rotational shifts.
Posted 2 months ago
3 - 8 years
17 - 32 Lacs
Hyderabad
Work from Office
We are looking for a Site Reliability Engineer (SRE) with 3-6 years of experience to join our growing Cloud Engineering team. The ideal candidate should have hands-on experience managing cloud infrastructure, automating operations, and ensuring service reliability in production environments. This role is perfect for someone who is technically strong, process-oriented, and passionate about automation, observability, and reliability in cloud-native environments.

Key Responsibilities:
Deploy, monitor, and maintain cloud infrastructure (AWS or GCP)
Automate infrastructure provisioning using Terraform/CloudFormation
Implement and manage CI/CD pipelines and routine operational tasks
Support production systems, respond to incidents, and conduct root cause analysis
Collaborate with development teams to improve system performance and availability
Set up monitoring and logging tools (e.g., Prometheus, ELK)

Key Skills:
Cloud: Hands-on with AWS or GCP
OS: Proficient with Linux-based systems (CentOS, Rocky Linux)
Automation: Ansible, shell scripting
IaC: Terraform or CloudFormation
Containers: Docker, Kubernetes (basic orchestration skills)
Monitoring/Logging: Prometheus, Grafana, ELK stack
CI/CD Tools: Jenkins, Rundeck, ArgoCD
Version Control: Git
Posted 2 months ago