Cloud Operations L2 Support Engineer

4 - 8 years

9 - 16 Lacs

Posted: 4 days ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Job Summary:

Key Responsibilities:

  • Platform Reliability & Availability (SRE Focus):

    • Run the production environment by proactively monitoring availability and taking a holistic view of system health for our cloud-based RAN and Core Network platforms.
    • Improve the reliability and quality of the system through automation, process refinement, and best practices for both RAN and Core cloud components.
    • Measure and optimize system performance to ensure efficient resource utilization and optimal user experience for network services.
    • Ensure that services are available and the underlying infrastructure is functioning properly, and monitor critical applications and related services to guarantee system availability for RAN and Core functions.

  • Cloud Operations & Kubernetes Management:

    • Design, deploy, and manage Kubernetes clusters and related cloud infrastructure for both RAN and Core Network application deployments.
    • Implement and maintain containerization strategies and orchestration best practices for telecom workloads.
    • Manage and troubleshoot Robin storage solutions within the Kubernetes environment, supporting the unique storage needs of RAN and Core applications.
    • Implement and manage CI/CD pipelines for cloud-native RAN and Core applications.
    • Handle cloud resource provisioning, scaling, and cost optimization for all deployed network functions.

  • Incident & Problem Management:

    • Collaborate on high-priority incident tickets (e.g., MIC Reported Incident, Serious/Medium/Small Network Incidents, RIUD Faults), ensuring rapid system recovery for impacted RAN and Core services.
    • Be on standby to work with developers when issues are escalated, providing immediate technical insights and support for cloud-native network functions.
    • Lead Problem Management efforts, including Root Cause Analysis (RCA), for complex incidents affecting RAN and Core cloud deployments.
    • Identify bugs and work with development teams to prioritize and implement fixes for cloud-native network elements.

  • Monitoring & Alerting:

    • Implement and maintain robust monitoring, logging, and alerting solutions for cloud infrastructure and applications supporting RAN and Core services.
    • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical RAN and Core services running in the cloud.

  • Automation & Tooling:

    • Develop and implement automation scripts and tools to streamline operational tasks, deployments, and incident response for cloud-native RAN and Core components.
    • Evaluate and integrate new tools and technologies to enhance operational efficiency.

  • Collaboration & Knowledge Sharing:

    • Support Governance Reports by providing technical data and insights on cloud platform performance for RAN and Core.
    • Handle customer queries with technical expertise and provide timely resolutions related to cloud-deployed network services.
    • Provide training and mentorship to junior team members on cloud technologies and SRE practices, specifically in the context of telecom networks.
    • Work closely with development, network, and security teams to ensure seamless service delivery across the entire network architecture.

Technical Requirements (Most Visible):

  • Deep expertise in Kubernetes:

    • Cluster deployment, management, and troubleshooting for high-performance telecom workloads.
    • Container orchestration, Pod lifecycle, Deployments, Services, Ingress.
    • Helm charts, Kustomize.
    • Advanced networking within Kubernetes (CNI, CoreDNS, service mesh concepts).
    • Security best practices in Kubernetes, especially for critical network functions.

  • Proficiency in Cloud Platforms:

    Experience with at least one major cloud provider (e.g., AWS, Azure, GCP), with a focus on enterprise-grade infrastructure.
  • Containerization Technologies:

    Docker, container.
  • Robin Storage:

    Hands-on experience with Robin.io or similar distributed persistent storage solutions for Kubernetes, particularly for stateful RAN and Core applications.
  • Infrastructure as Code (IaC):

    Terraform, Ansible, or similar tools for automating cloud and Kubernetes deployments.
  • Scripting & Automation:

    Strong proficiency in Python, Go, Bash, or similar for developing automation and tooling.
  • Monitoring & Logging Tools:

    Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, or similar, with experience in large-scale data ingestion and analysis.
  • CI/CD Tools:

    Jenkins, GitLab CI/CD, Argo CD, or similar, for continuous deployment of network functions.
  • Operating Systems:

    Expert-level knowledge of Linux (e.g., CentOS, Ubuntu, RHEL).
  • Networking Fundamentals:

    Deep understanding of TCP/IP, DNS, Load Balancing, Firewalls, VPNs, and advanced network concepts relevant to telecom (e.g., SRv6, Segment Routing, GTP-U/C).
  • Telecommunications Network Knowledge:

    • Strong understanding of Radio Access Network (RAN) architecture, components, and interfaces (e.g., O-RAN, vRAN concepts).

    • Strong understanding of Core Network (EPC/5GC) architecture, functions (e.g., AMF, SMF, UPF, MME, SGW, PGW), and protocols.

    • Familiarity with network function virtualization (NFV) and software-defined networking (SDN) principles.

Qualifications:

  • Education:

    Bachelor's degree in Computer Science, Engineering, or a related field.
  • Experience:

    Minimum of 5-6 years of experience in a Cloud Engineering, DevOps, or SRE role, with a significant focus on Kubernetes and cloud operations, ideally within a telecommunications or high-availability environment.
  • Problem-Solving:

    Exceptional analytical and problem-solving skills, with a methodical approach to debugging complex distributed systems.
  • Communication:

    Excellent verbal and written communication skills, capable of effectively collaborating with technical and non-technical stakeholders.
  • Proactive Mindset:

    Ability to anticipate issues, identify risks, and propose preventative solutions.
  • Incident Response:

    Proven experience in responding to and resolving critical production incidents in a fast-paced environment.
  • Continuous Improvement:

    A strong desire to learn, adapt, and drive continuous improvement in processes and systems.

Rakuten Symphony

Telecommunications