Software Platform Engineer

5 years

0 Lacs

Posted: 1 week ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

Overview

We are seeking a skilled Platform Engineer to join our team and drive the development, deployment, and supportability of our Kubernetes-based microservices platform, which customers deploy on-premises. You will build comprehensive observability, enable log and report extraction for service cases where real-time access to the customer environment is not available, and reduce our over-reliance on Kafka by integrating Redis and batch processing. This role requires expertise in Kubernetes, Azure DevOps, C++ service support, deployment sizing, and designing for reliability, availability, and serviceability (RAS).

Responsibilities
  • Build Comprehensive Observability: Implement centralized metrics, logging, and tracing (e.g., Prometheus, Fluentd, OpenTelemetry) for .NET, Python, Java, C++, Kafka, and Redis, ensuring supportability in on-premises environments.
  • Enable Log/Report Extraction: Design customer-facing tools (e.g., CLI scripts, Helm chart options) to collect and export logs/metrics from on-premises deployments for service cases, without real-time access.
  • Optimize Kafka Usage: Audit and optimize Kafka configurations (e.g., topics, partitions, compression) to reduce metadata streaming overhead, monitored with Prometheus or Azure Monitor.
  • Implement Alternatives: Integrate Redis (e.g., Azure Cache for Redis) for metadata caching/pub-sub and batch processing (e.g., Azure Data Factory, Kubernetes Jobs) for high-volume data, reducing Kafka dependency.
  • Troubleshoot Customer Environments: Debug issues in on-premises customer deployments for services (C++, .NET, Python, Java), Kafka, and Redis, using exported logs and metrics.
  • Enhance Product Supportability: Build Azure DevOps pipelines and installers (e.g., Helm charts) for consistent, supportable deployments, with documentation for customer support.
  • Contribute to RAS: Own serviceability by building observability and diagnostic tools; support reliability/availability via Kubernetes optimization, autoscaling, and fault-tolerant designs.
  • Enforce Standards: Implement and enforce structured logging (e.g., JSON with correlation IDs) and resource sizing standards via Azure DevOps pipelines.
  • Optimize Deployment Sizing: Set Kubernetes resource requests/limits and autoscaling policies (e.g., HPA, VPA) for services, Kafka, Redis, and batch jobs, based on profiling.
  • Evaluate Service Meshes: Assess service meshes (e.g., Linkerd) for improving microservice and data platform observability and communication.
  • Support C++ Services: Assist developers in containerizing, deploying, and debugging C++ services, ensuring integration with observability, Kafka, Redis, or batch workflows.
  • Automate with Azure DevOps: Build CI/CD pipelines in Azure DevOps for automated builds, tests, and deployments, integrating with AKS, Kafka, and Redis.
Qualifications
  • Experience: 3–5 years with Kubernetes, Azure DevOps (AKS, pipelines), and Kafka administration.
  • Technical Skills:
      • Expert in Kubernetes (CKA/CKAD preferred) and Azure DevOps (YAML pipelines, AKS integration).
      • Proficient in observability tools (e.g., Prometheus, Grafana, Fluentd, OpenTelemetry, Azure Monitor) for metrics, logs, and tracing.
      • Experience with on-premises Kubernetes deployments and log/report extraction for service cases.
      • Proficient in Kafka optimization (e.g., topic management, consumer groups) and monitoring.
      • Knowledge of Redis (e.g., Azure Cache for Redis, pub/sub) and batch processing (e.g., Azure Data Factory, Kubernetes Jobs).
      • Familiarity with C++ build systems (e.g., CMake) and debugging (e.g., gdb) in Kubernetes.
      • Proficiency in Kubernetes resource management and autoscaling (e.g., HPA, VPA).
      • Scripting skills (e.g., Python, Bash) for automation, diagnostics, and log extraction.
  • Customer Focus: Proven ability to troubleshoot on-premises customer environments and build supportable deployment and observability tools.
  • Standards Enforcement: Experience enforcing logging, sizing, and data platform standards via Azure DevOps pipelines.
  • RAS Expertise: Ability to design for serviceability (observability, diagnostics) and contribute to reliability/availability through platform optimization.
Nice-to-Haves
  • Experience with service meshes (e.g., Linkerd, Istio) and their integration with Azure.
  • Familiarity with .NET, Python, or Java for developer collaboration.
  • Knowledge of air-gapped Kubernetes deployments (e.g., Kubeadm, K3s).