Cloud System Debug Engineer

4 - 6 years

0 Lacs

Posted: 1 day ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Job Title: Cloud System Debug Engineer

Key Responsibilities

  • Debug complex issues across large-scale public, private, and hybrid cloud environments.
  • Apply knowledge of microservices debugging and cloud-native application behavior.
  • Investigate failures in cloud infrastructure components such as networking, storage, virtualization, and orchestration layers.
  • Diagnose and resolve system issues in Kubernetes clusters, including nodes, pods, networking (CNI), and storage (CSI).
  • Troubleshoot problems with container runtimes such as Docker, containerd, and CRI-O.
  • Debug OpenStack components including Nova, Neutron, Cinder, Keystone, Glance, Horizon, and related APIs.
  • Debug and optimize Ceph storage clusters, including OSD issues, MON behavior, CRUSH map analysis, and performance bottlenecks.
  • Perform deep Linux system debugging, including kernel-level issues, network stack debugging, storage subsystem issues, and performance anomalies.
  • Conduct thorough Root Cause Analysis (RCA) and implement long-term corrective actions.
  • Improve system observability by enhancing monitoring, logging, and tracing using tools like Prometheus, Grafana, ELK/EFK, and Jaeger.
  • Develop and refine internal tools and automation for diagnostics, system debugging, and infrastructure monitoring.
  • Support production operations through an on-call rotation, addressing high-impact incidents quickly and effectively.
  • Optimize cloud and on-premises infrastructure for performance, scalability, and reliability.
  • Collaborate with DevOps, SRE, platform engineering, and development teams to resolve infrastructure and cloud platform issues.
  • Produce high-quality technical documentation, runbooks, and troubleshooting guides for system and cloud operations.

Required Skills & Qualifications

  • 4+ years of experience in cloud infrastructure, distributed systems, Linux administration, or systems engineering.
  • Good expertise with cloud platforms (AWS, GCP, Azure) or large-scale private cloud environments.
  • Strong proficiency with Kubernetes cluster debugging, scaling, and cloud-native architectures.
  • Hands-on experience with OpenStack cloud components and troubleshooting.
  • Good knowledge of Ceph distributed storage systems and cluster tuning.
  • In-depth understanding of Linux internals, including networking, kernel behavior, process management, and storage subsystems.
  • Strong scripting/automation experience (Bash, Python, Ansible, Terraform, Helm).
  • Experience analyzing system logs, traces, crashes, and performance metrics in distributed systems.
  • Proficiency with observability stacks such as Prometheus, Grafana, and OpenTelemetry.
  • Ability to debug complex interactions between cloud services, orchestration tools, and infrastructure layers.
  • Strong analytical, communication, and documentation skills.

Preferred Qualifications

  • Certifications in AWS/Azure/GCP, CKA/CKAD/CKS, OpenStack, or Ceph.
  • Experience with cloud networking (VXLAN, BGP, SDN, overlay networks).
  • Experience designing, analyzing, or operating high-availability, multi-region distributed architectures.

Education

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).

Ola

Transportation / Mobility

Bangalore
