Site Reliability Engineer 2

4 - 8 years

4 - 8 Lacs

Posted: 16 hours ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities:

  • Architect and manage secure, scalable cloud infrastructure and services, focusing on automation, reliability, and proactive cost management to ensure efficient operations.
  • Implement and refine observability and monitoring solutions using DataDog, ensuring proactive issue identification and efficient resource utilization.
  • Lead CI/CD pipeline development, maintenance, and optimization with Jenkins, integrating AWS services to enhance development workflows and infrastructure automation.
  • Drive the containerization and orchestration of applications using Kubernetes, enhancing scalability, deployment efficiency, and cost-effectiveness.
  • Monitor application and infrastructure performance in AWS, applying tuning and optimizations to ensure optimal resource utilization and user experience while managing costs.
  • Design and manage disaster recovery and backup strategies on AWS, prioritizing data integrity, system availability, and cost efficiency.
  • Provide expert troubleshooting and problem-solving across various platforms and applications within AWS, aiming for minimal disruption and quick resolution.
  • Ensure strict adherence to AWS security standards and compliance with data protection regulations, with a keen eye on cost implications.
  • Keep abreast of new cloud technologies and trends, recommending and implementing improvements for competitive advantage and cost savings.
  • Mentor and support junior team members, fostering a culture of learning, collaboration, and cost-consciousness.
  • Work closely with cross-functional teams to understand requirements and deliver AWS-based solutions that meet business objectives efficiently and cost-effectively.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
  • A minimum of 3 years of experience in Site Reliability Engineering, Cloud Engineering, or a similar role, with a demonstrated track record of problem-solving in complex, cloud-based environments. This should include extensive experience designing, implementing, and managing scalable, highly available, and fault-tolerant systems.
  • Strong expertise in managing cloud environments (preferably in AWS), with hands-on experience in observability platforms such as DataDog.
  • Proficiency in automation and scripting languages (e.g., Python, Bash) and infrastructure as code (IaC) tools (e.g., Terraform, Ansible).
  • Extensive experience with CI/CD tools, notably Jenkins, and familiarity with containerization and orchestration technologies like Kubernetes.
  • Solid understanding of networking, cloud security best practices, performance optimization, and cost management strategies.
  • Demonstrated commitment to implementing industry-standard site reliability principles and a proactive approach to cost management in daily operations.
  • Proven leadership skills and the ability to mentor junior team members, guide teams through complex operational challenges, and foster a culture of continuous improvement.
  • Excellent verbal and written communication skills, with the ability to work effectively in a team environment and communicate complex technical concepts to a non-technical audience.
