
3 Helmcharts Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

12.0 - 18.0 years

0 Lacs

, India

On-site

Foundit logo

General Overview
Functional Area: Engineering
Career Stream: Design - Software Engineering
Job Code: SSE-ENG-DSE
Job Level: Level 11
IC/MGR: Individual Contributor
Direct/Indirect Indicator: Indirect

Summary
Celestica is looking for skilled and enthusiastic software engineers to join our team in developing cutting-edge data centers that leverage advanced GPU technologies. In this dynamic role, you will build orchestration software for the entire rack, develop integrated visualization tools for rack components, and create comprehensive diagnostics to optimize GPU server utilization. The ideal candidate will have a strong background in orchestration software development and experience creating solutions for the data center industry.

Detailed Description
Architect and develop a full-stack application that eases the design, deployment, and monitoring of a next-generation data center, including GPU/AI compute elements.
Use cloud-native development methods to support Kubernetes deployments for different scenarios.
Build template-driven rack design techniques to support various rack element compositions.
Build scalable software that gathers data from a large number of devices, monitors them, and makes trends easy to visualize.
Build network validation techniques for GPU-centric traffic patterns.
Build agile software that reacts immediately to operational issues and self-heals the deployments.
Optimize code for performance, efficiency, and scalability.
Adopt GenAI tools for development efficiency.
Work effectively in a team environment, collaborating with engineers and peer functional leads from different disciplines to innovate solutions, triage issues, and speed execution.
Mentor and coach team members on the technical skills and approaches to solve problems.
Present the innovation and added value of our software in technical forums and customer interactions.

Knowledge/Skills/Competencies
Strong programming skills: Extensive programming in Python and Go.
Database system knowledge: Experience with SQL databases such as PostgreSQL, NoSQL databases such as MongoDB, and time-series databases (TSDB) such as Prometheus.
Kubernetes deployment skills: Experience in container orchestration, pod health checks, networking, Helm charts, and deployment strategies.
Familiarity with UI frameworks, REST API frameworks, and Backend for Frontend design methodologies.
Debugging and testing skills: Ability to identify and resolve software issues.
Problem-solving skills: Strong analytical and problem-solving abilities.
Experience with data center deployments: Prior experience in data center architectures and in developing and maintaining software for deployments is a must.
Clear communication: Proven ability to articulate requirements and vision to large and diverse audiences, through written documents such as architecture specifications and verbal presentations in technical forums, is required.

Physical Demands
Duties of this position are performed in a normal office environment. Duties may require extended periods of sitting and sustained visual concentration on a computer monitor or on numbers and other detailed data. Repetitive manual movements (e.g., data entry, using a computer mouse, using a calculator) are frequently required. Occasional travel may be required.

Typical Experience
12 to 18 years

Typical Education
Bachelor's degree or consideration of an equivalent combination of education and experience. Educational requirements may vary by geography.

Notes
This job description is not intended to be an exhaustive list of all duties and responsibilities of the position. Employees are held accountable for all duties of the job. Job duties and the % of time identified for any function are subject to change at any time.

Posted 3 weeks ago


4 - 6 years

6 - 8 Lacs

Bengaluru

Work from Office

Naukri logo

We are looking for a Site Reliability Engineer!

The SRE L1 Commander is responsible for ensuring the stability, availability, and performance of critical systems and services. As the first line of defense in incident management and monitoring, the role requires real-time response, proactive problem solving, and strong coordination skills to address production issues efficiently.

You'll make a difference by:
Monitoring and alerting: proactively monitoring system health, performance, and uptime using tools such as Datadog and Prometheus.
Serving as the primary responder for incidents to troubleshoot and resolve issues quickly, ensuring minimal impact on end users.
Accurately categorizing incidents, prioritizing them based on severity, and escalating to L2/L3 teams when necessary.
Ensuring systems meet Service Level Objectives (SLOs) and maintain uptime as per SLAs.
Collaborating with DevOps and L2 teams to automate manual processes for incident response and operational tasks.
Performing root cause analysis (RCA) of incidents using log aggregators and observability tools to identify patterns and recurring issues.
Following predefined runbooks/playbooks to resolve known issues and documenting fixes for new problems.

You'd describe yourself as:
An experienced professional with 4 to 6 years of relevant experience in SRE, DevOps, or Production Support with monitoring tools (e.g., Prometheus, Datadog).
Having working knowledge of Linux/Unix operating systems, basic scripting skills (Python, GitLab actions), and cloud platforms (AWS, Azure, or GCP).
Familiar with container orchestration (Kubernetes, Docker, Helm charts) and CI/CD pipelines.
Having exposure to ArgoCD for implementing GitOps workflows and automated deployments for containerized applications.
Experienced in monitoring (Datadog), infrastructure (AWS EC2, Lambda, ECS/EKS, RDS), networking (VPC, Route 53, ELB), and storage (S3, EFS, Glacier).
Having strong troubleshooting and analytical skills to resolve production incidents effectively.
Having a basic understanding of networking concepts (DNS, load balancers, firewalls).
Having good communication and interpersonal skills for incident communication and escalation.

Preferred certifications: AWS Certified SysOps Administrator Associate, AWS Certified Solutions Architect Associate, or AWS Certified DevOps Engineer Professional.

Posted 1 month ago


4 - 6 years

6 - 8 Lacs

Bengaluru

Work from Office

Naukri logo

We are looking for a Site Reliability Engineer!

The SRE L1 Commander is responsible for ensuring the stability, availability, and performance of critical systems and services. As the first line of defense in incident management and monitoring, the role requires real-time response, proactive problem solving, and strong coordination skills to address production issues efficiently.

You'll make a difference by:
Monitoring and alerting: proactively monitoring system health, performance, and uptime using tools such as Datadog and Prometheus.
Serving as the primary responder for incidents to troubleshoot and resolve issues quickly, ensuring minimal impact on end users.
Accurately categorizing incidents, prioritizing them based on severity, and escalating to L2/L3 teams when necessary.
Ensuring systems meet Service Level Objectives (SLOs) and maintain uptime as per SLAs.
Collaborating with DevOps and L2 teams to automate manual processes for incident response and operational tasks.
Performing root cause analysis (RCA) of incidents using log aggregators and observability tools to identify patterns and recurring issues.
Following predefined runbooks/playbooks to resolve known issues and documenting fixes for new problems.

You'd describe yourself as:
An experienced professional with 4 to 6 years of relevant experience in SRE, DevOps, or Production Support with monitoring tools (e.g., Prometheus, Datadog).
Having working knowledge of Linux/Unix operating systems, basic scripting skills (Python, GitLab actions), and cloud platforms (AWS, Azure, or GCP).
Familiar with container orchestration (Kubernetes, Docker, Helm charts) and CI/CD pipelines.
Having exposure to ArgoCD for implementing GitOps workflows and automated deployments for containerized applications.
Experienced in monitoring (Datadog), infrastructure (AWS EC2, Lambda, ECS/EKS, RDS), networking (VPC, Route 53, ELB), and storage (S3, EFS, Glacier).
Having strong troubleshooting and analytical skills to resolve production incidents effectively.
Having a basic understanding of networking concepts (DNS, load balancers, firewalls).
Having good communication and interpersonal skills for incident communication and escalation.

Preferred certifications: AWS Certified SysOps Administrator Associate, AWS Certified Solutions Architect Associate, or AWS Certified DevOps Engineer Professional.

Posted 2 months ago
