Manager - Tech Consulting - FS - CNS - TC - Platforms

8 - 13 years

20 - 25 Lacs

Posted: 22 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Your key responsibilities 
  
  • Ensure system reliability, stability and performance by maintaining service-level objectives (SLOs) and minimizing downtime and incidents.
  • Collaborate with internal teams to assess system health, stability and resilience, providing architectural and design recommendations for reliability.
  • Lead incident management and post-incident reviews, diagnosing issues, deploying fixes and implementing preventive measures.
  • Drive automation of operational tasks, including deployments, monitoring, scaling and system recovery, to improve efficiency and reduce manual intervention.
  • Define and track key performance indicators (KPIs) such as availability, latency and error rates to optimize system performance and inform decision-making.
  • Plan and execute chaos engineering experiments to test system resilience and coordinate performance testing for reliability improvements.
  • Ensure alignment between service-level indicators (SLIs) and service-level objectives (SLOs) across the product family.
  • Develop and maintain product-level runbooks for incident response, collaborating with SRE teams to ensure effective recovery processes.
  • Provide leadership in tool selection and best practices for site reliability engineering (SRE), making final decisions on tools, libraries and standards.
  • Work closely with development teams to improve software reliability, scalability and resilience by offering feedback on design and architecture.
  • Lead troubleshooting and triage efforts during user-impacting incidents, ensuring swift resolution and minimal disruption.
  • Participate in special projects and continuous improvement initiatives, supporting long-term reliability and scalability goals.

Skills and attributes for success

  • A team player with strong analytical, communication and interpersonal skills
  • A habit of keeping up to date with new technologies in the market
  • A winning personality and the ability to become a trusted advisor to stakeholders

To qualify for the role, you must have

  • Minimum 8 years of related experience, with at least 5 years in software development.
  • Bachelor's degree (B.E./B.Tech) in Computer Science or IT, or Bachelor's in Computer Applications (BCA) from a recognized institution.
  • Expertise in Site Reliability Engineering (SRE), DevOps, and system reliability, ensuring high availability and performance.
  • Strong experience in mobile platform reliability (Android, iOS), including performance monitoring and optimization.
  • Proficiency in observability and resiliency tools such as Splunk, Honeycomb, Datadog, Prometheus, or Grafana.
  • Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization/orchestration tools like Kubernetes, Docker, ECS, or Fargate.
  • Solid understanding of automation, Infrastructure-as-Code (IaC), and configuration management using Terraform, Ansible, or CloudFormation.
  • Strong programming and scripting skills in Python, Go, Bash, or Java, with experience in automating operational tasks.
  • Experience with CI/CD pipelines, deployment automation, and version control tools like GitHub, Bitbucket, Jenkins, or Bamboo.
  • Deep knowledge of incident management, root cause analysis, and post-incident reviews, focusing on continuous improvement.

Ideally, you'll also have

  • Strong verbal and written communication, facilitation, relationship-building, presentation and negotiation skills.
  • A highly flexible, adaptable and creative approach.
  • Comfort interacting with senior executives (within the firm and at the client).

EY

Professional Services

London
