Senior Engineer, Production Operations

Experience: 2 - 5 years

Salary: 4 - 7 Lacs

Posted: Just now | Platform: Naukri


Work Mode: Work from Office

Job Type: Full Time

Job Description

As a Senior Engineer, you will be a key individual contributor within our Production Operations team. You will be instrumental in designing, building, and maintaining highly reliable, scalable, and performant cloud infrastructure and systems that support Greenlight's mission-critical services. This role is for a seasoned engineer who thrives on solving complex operational challenges, enhancing system stability, and improving efficiency through automation and best practices.
 
What you will be doing:
    • Contribute to the design, implementation, and maintenance of Greenlight's core cloud infrastructure and Site Reliability Engineering (SRE) practices to ensure high availability, scalability, and performance.
    • Develop, maintain, and optimize our cloud infrastructure using Infrastructure as Code (primarily Terraform) and other automation tools.
    • Collaborate closely with development and security teams to embed SRE principles into the software development lifecycle, promoting secure and reliable coding practices.
    • Design and implement robust monitoring, logging, and alerting solutions to provide comprehensive visibility into system health.
    • Actively participate in and support incident response, perform deep-dive root cause analysis, and contribute to actionable, blameless postmortems to prevent recurrence.
    • Identify and implement architectural improvements to enhance system reliability, resilience, and operational efficiency.
    • Automate operational tasks and processes to reduce toil and improve efficiency.
    • Research, evaluate, and advocate for new technologies and tools that can improve our operational posture and efficiency.
    • Enhance existing services and applications to increase availability, reliability, and scalability in a microservices environment.
    • Build and improve engineering tooling, processes, and standards to enable faster, more consistent, more reliable, and highly repeatable application delivery.
What you'll bring to the team:
    • 5+ years of experience in a Site Reliability Engineering, Production Operations, or similar role, with a strong focus on cloud infrastructure and distributed systems.
    • Proven experience architecting, building, and maintaining highly available, secure, and scalable systems in a public cloud environment (AWS strongly preferred).
    • Strong proficiency with IaC tools, particularly Terraform.
    • Demonstrated experience in automating operational tasks using scripting languages (e.g., Python, Go, Bash) and automation platforms.
    • Expertise in designing and implementing comprehensive monitoring, logging, and alerting solutions (e.g., Datadog, Prometheus, Grafana, ELK stack).
    • Solid understanding of incident response best practices, with experience in troubleshooting and resolving complex production issues.
    • Strong understanding of distributed systems, microservices architectures, and containerization technologies (Docker, Kubernetes/EKS).
    • Exceptional analytical and problem-solving skills, with a track record of debugging complex issues in production environments.
    • Excellent communication, collaboration, and interpersonal skills, with the ability to clearly articulate technical concepts to both technical and non-technical audiences.
    • A passion for identifying and implementing improvements in system reliability, performance, and operational efficiency.
Technologies we use:
    • AWS
    • MySQL, DynamoDB, Redis
    • GitHub Actions for CI pipelines
    • Kubernetes (specifically EKS)
    • Ambassador, Helm, Argo CD, Linkerd
    • REST, gRPC, GraphQL
    • React, Redux, Swift, Node.js, Kotlin, Java, Go, Python
    • Datadog, Prometheus
