Staff Site Reliability Engineer, Cloud Efficiency

8 - 13 years

18 - 22 Lacs

Posted: 3 days ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

We are looking for a Staff Site Reliability Engineer to join Okta's Infrastructure Platform FinOps team. The FinOps team creates tooling and workflows that enable cloud cost optimization, discount management, project costing and financial accountability, helping the organization make informed decisions about cloud spending. Sitting inside the SRE organization, our work directly supports the company's efficiency and scalability goals by helping align engineering and finance teams on responsible cloud usage. You will act as technical lead for a small team of junior engineers focused on accelerating our FinOps and other related best practices for the SRE organization. The ideal candidate exemplifies the ethic of "If you have to do something more than once, automate it" and can rapidly self-educate on new concepts and tools.
What you'll be doing
  • Work within, and act as technical lead for, a specialized SRE team that designs, builds, runs, and monitors FinOps tooling to project-cost, measure, estimate, tag, and cost-optimize our global production infrastructure.
  • Design and implement FinOps tooling & pipelines that collect, analyze and present usage and cost data from infrastructure in a variety of instance- and container-based architectures such as EC2, ECS and EKS across multiple environments.
  • Continuously evolve and maintain our cost monitoring tools and platforms as a full-stack developer for critical internal tooling, using technologies such as Python, Bash, NodeJS or React/Angular, Nginx, AppSync, S3, IAM, EC2/Fargate, Lambda, DynamoDB, Glue, and Athena.
  • Deploy FinOps tooling and policies into containerized environments using Terraform, Cloud Custodian (c7n), and CloudFormation.
  • Leverage and manage data pipelines that feed our CUR data into analytics tooling such as QuickSight and Tableau.
  • Set up budgeting and alerting from AWS Cost Explorer and other self-managed tooling using SQS and SNS, including GHG inventory management, and the equivalent in GCP or other cloud infrastructure environments.
  • Be an evangelist for FinOps best practices and also lead initiatives/projects to strengthen our cost optimization posture for critical infrastructure.
  • Develop and maintain technical documentation, runbooks, and procedures, and help develop training and FinOps updates for the broader Engineering team.
  • Be a technical SME on cost optimization and FinOps practices for a broader Engineering team that designs and builds Okta's production infrastructure, focusing on security at scale in the cloud.

What you'll bring to the role
  • Are always willing to go the extra mile: see a problem, fix the problem.
  • Are passionate about encouraging the development of engineering peers and leading by example.
  • Have experience with AWS and/or GCP cloud environments.
  • Have an understanding of and familiarity with configuration management tools like Terraform and CloudFormation, and with Cloud Custodian for this role.
  • Have expert-level abilities in operational tooling languages such as Python, Go, and Bash for back-end work and NodeJS for front-end work, and in the use of source control systems.
  • Have knowledge of various types of data stores and data pipelines, particularly DynamoDB, Athena, Glue.
  • Have experience with industry-standard FinOps tooling such as Cloud Custodian and OpenCost/Kubecost.
  • Have knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and IP protocols.
  • Are skilled in using CloudWatch, Grafana, and Splunk for real-time monitoring and proactive incident detection.
  • Are able to lead small technical teams, collaborate well with cross-functional teams, and promote a cost-aware engineering culture.

Experience in the following
  • 8+ years of experience running and managing complex AWS or other cloud networking infrastructure resources including architecture, security, scalability and cost optimization
  • 8+ years of coding/automation experience using Python, Bash, NodeJS
  • 5+ years of experience with Terraform or other Infrastructure as Code
  • 5+ years of experience in automating CI/CD pipelines using tooling such as Spinnaker, ArgoCD or general GitOps with an emphasis on integrating security throughout the process.
  • 4+ years of Kubernetes, ECS or related container-management experience
  • Exposure to implementing monitoring and observability solutions such as Grafana or Splunk to enhance security and detect incidents in real time.
  • Strong leadership and collaboration skills with experience working cross-functionally with site reliability engineers and developers to encourage cost-optimization best practices and policies.
  • Strong Linux understanding and experience.
  • Strong security background and knowledge.
  • BS in computer science (or equivalent experience).